# Frequently Asked Questions

## Can I view the reasoning (thinking) text when using a Reasoning LLM like R1 or o1?

Yes, see this note on [reasoning-content](https://langroid.github.io/langroid/notes/reasoning-content/).

## Does Langroid work with non-OpenAI LLMs?

Yes! Langroid works with practically any LLM, local or remote, closed or open. See these two guides:

- [Using Langroid with local/open LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/)
- [Using Langroid with non-OpenAI proprietary LLMs](https://langroid.github.io/langroid/tutorials/non-openai-llms/)

## Where can I find out about Langroid's architecture?

There are a few documents that can help:

- A work-in-progress [architecture description](https://langroid.github.io/langroid/blog/2024/08/15/overview-of-langroids-multi-agent-architecture-prelim/) on the Langroid blog.
- The Langroid [Getting Started](https://langroid.github.io/langroid/quick-start/) guide walks you step-by-step through Langroid's features and architecture.
- An article by LanceDB on [Multi-Agent Programming with Langroid](https://lancedb.substack.com/p/langoid-multi-agent-programming-framework)

## How can I limit the number of output tokens generated by the LLM?

You can set the `max_output_tokens` parameter in the `LLMConfig` class, or more commonly, the `OpenAIGPTConfig` class, which is a subclass of `LLMConfig`, for example:

```python
import langroid as lr
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="openai/gpt-3.5-turbo",
    max_output_tokens=100,  # limit output to 100 tokens
)
agent_config = lr.ChatAgentConfig(
    llm=llm_config,
    # ... other configs
)
agent = lr.ChatAgent(agent_config)
```

Then every time the agent's `llm_response` method is called, the LLM's output will be limited to this number of tokens. If you omit `max_output_tokens`, it defaults to 8192. If you wish **not** to limit the output tokens, you can set `max_output_tokens=None`, in which case Langroid uses the model-specific maximum output tokens from the [`langroid/language_models/model_info.py`](https://github.com/langroid/langroid/blob/main/langroid/language_models/model_info.py) file (specifically the `model_max_output_tokens` property of `LLMConfig`). Note, however, that this model-specific maximum may be quite large, so you would generally want to either omit `max_output_tokens` (which defaults to 8192) or set it to another desired value.

## How langroid handles long chat histories

You may encounter an error like this:

```
Error: Tried to shorten prompt history but ... longer than context length
```

This might happen when your chat history bumps against various limits. Here is how Langroid handles long chat histories. Ultimately the LLM API is invoked with two key inputs: the message history $h$, and the desired output length $n$ (defaults to the `max_output_tokens` in the `ChatAgentConfig`). These inputs are determined as follows (see the `ChatAgent._prep_llm_messages` method):

- Let $H$ be the current message history, $M$ the value of `ChatAgentConfig.max_output_tokens`, and $C$ the context length of the LLM.
- If $\text{tokens}(H) + M \leq C$, then langroid uses $h = H$ and $n = M$, since there is enough room to fit both the actual chat history as well as the desired max output length.
- If $\text{tokens}(H) + M > C$, this means the context length is too small to accommodate the message history $H$ and the desired output length $M$. Then langroid tries to use a _shortened_ output length $n' = C - \text{tokens}(H)$, i.e. the output is effectively _truncated_ to fit within the context length. - If $n'$ is at least equal to `min_output_tokens` $m$ (default 10), langroid proceeds with $h = H$ and $n=n'$. - otherwise, this means that the message history $H$ is so long that the remaining space in the LLM's context-length $C$ is unacceptably small (i.e. smaller than the minimum output length $m$). In this case, Langroid tries to shorten the message history by dropping early messages, and updating the message history $h$ as long as $C - \text{tokens}(h) < m$, until there are no more messages to drop (it will not drop the system message or the last message, which is a user message), and throws the error mentioned above. If you are getting this error, you will want to check whether: - you have set the `chat_context_length` too small, if you are setting it manually - you have set the `max_output_tokens` too large - you have set the `min_output_tokens` too large If these look fine, then the next thing to look at is whether you are accumulating too much context into the agent history, for example retrieved passages (which can be very long) in a RAG scenario. One common case is when a query $Q$ is being answered using RAG, the retrieved passages $P$ are added to $Q$ to create a (potentially very long) prompt like > based on the passages P, answer query Q Once the LLM returns an answer (if appropropriate for your context), you should avoid retaining the passages $P$ in the agent history, i.e. the last user message should be simply $Q$, rather than the prompt above. This functionality is exactly what you get when you use `ChatAgent._llm_response_temp_context`, which is used by default in the `DocChatAgent`. Another way to keep chat history tokens from growing too much is to use the `llm_response_forget` method, which erases both the query and response, if that makes sense in your scenario. ## How can I handle large results from Tools? As of version 0.22.0, Langroid allows you to control the size of tool results by setting [optional parameters](https://langroid.github.io/langroid/notes/large-tool-results/) in a `ToolMessage` definition. ## Can I handle a tool without running a task? Yes, if you've enabled an agent to both _use_ (i.e. generate) and _handle_ a tool. See the `test_tool_no_task` for an example of this. The `NabroskiTool` is enabled for the agent, and to get the agent's LLM to generate the tool, you first do something like: ```python response = agent.llm_response("What is Nabroski of 1 and 2?") ``` Now the `response` is a `ChatDocument` that will contain the JSON for the `NabroskiTool`. To _handle_ the tool, you will need to call the agent's `agent_response` method: ```python result = agent.agent_response(response) ``` When you wrap the agent in a task object, and do `task.run()` the above two steps are done for you, since Langroid operates via a loop mechanism, see docs [here](https://langroid.github.io/langroid/quick-start/multi-agent-task-delegation/#task-collaboration-via-sub-tasks). The *advantage* of using `task.run()` instead of doing this yourself, is that this method ensures that tool generation errors are sent back to the LLM so it retries the generation. 
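
To make the two steps above concrete, here is a minimal sketch using a hypothetical `AddTool` (the tool name, its fields, and its `handle` method are illustrative assumptions, not the actual `NabroskiTool` used in the test):

```python
import langroid as lr
import langroid.language_models as lm
from langroid.agent.tool_message import ToolMessage


class AddTool(ToolMessage):
    """Hypothetical tool, for illustration only."""
    request: str = "add"
    purpose: str = "To compute the sum of two numbers <x> and <y>."
    x: int
    y: int

    def handle(self) -> str:
        # stateless handler: invoked when the agent handles this tool
        return str(self.x + self.y)


agent = lr.ChatAgent(lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig()))
agent.enable_message(AddTool)  # the LLM can generate, and the agent can handle, this tool

# 1) LLM generates the tool ...
response = agent.llm_response("Use the `add` tool to compute the sum of 1 and 2.")
# 2) ... and the agent handles it (running AddTool.handle under the hood)
result = agent.agent_response(response)

# Equivalently, wrapping the agent in a Task runs this loop for you:
# result = lr.Task(agent, interactive=False).run("What is the sum of 1 and 2?")
```
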
## OpenAI Tools and Function-calling support Langroid supports OpenAI tool-calls API as well as OpenAI function-calls API. Read more [here](https://github.com/langroid/langroid/releases/tag/0.7.0). Langroid has always had its own native tool-calling support as well, which works with **any** LLM -- you can define a subclass of `ToolMessage` (pydantic based) and it is transpiled into system prompt instructions for the tool. In practice, we don't see much difference between using this vs OpenAI fn-calling. Example [here](https://github.com/langroid/langroid/blob/main/examples/basic/fn-call-local-simple.py). Or search for `ToolMessage` in any of the `tests/` or `examples/` folders. ## Some example scripts appear to return to user input immediately without handling a tool. This is because the `task` has been set up with `interactive=True` (which is the default). With this setting, the task loop waits for user input after either the `llm_response` or `agent_response` (typically a tool-handling response) returns a valid response. If you want to progress through the task, you can simply hit return, unless the prompt indicates that the user needs to enter a response. Alternatively, the `task` can be set up with `interactive=False` -- with this setting, the task loop will _only_ wait for user input when an entity response (`llm_response` or `agent_response`) _explicitly_ addresses the user. Explicit user addressing can be done using either: - an orchestration tool, e.g. `SendTool` (see details in the release notes for [0.9.0](https://github.com/langroid/langroid/releases/tag/0.9.0)), an example script is the [multi-agent-triage.py](https://github.com/langroid/langroid/blob/main/examples/basic/multi-agent-triage.py), or - a special addressing prefix, see the example script [1-agent-3-tools-address-user.py](https://github.com/langroid/langroid/blob/main/examples/basic/1-agent-3-tools-address-user.py) ## Can I specify top_k in OpenAIGPTConfig (for LLM API calls)? No; Langroid currently only supports parameters accepted by OpenAI's API, and `top_k` is _not_ one of them. See: - [OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat/create) - [Discussion on top_k, top_p, temperature](https://community.openai.com/t/temperature-top-p-and-top-k-for-chatbot-responses/295542/5) - [Langroid example](https://github.com/langroid/langroid/blob/main/examples/basic/fn-call-local-numerical.py) showing how you can set other OpenAI API parameters, using the `OpenAICallParams` object. ## Can I persist agent state across multiple runs? For example, you may want to stop the current python script, and run it again later, resuming your previous conversation. Currently there is no built-in Langroid mechanism for this, but you can achieve a basic type of persistence by saving the agent's `message_history`: - if you used `Task.run()` in your script, make sure the task is set up with `restart=False` -- this prevents the agent state from being reset when the task is run again. - using python's pickle module, you can save the `agent.message_history` to a file, and load it (if it exists) at the start of your script. See the example script [`chat-persist.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-persist.py) For more complex persistence, you can take advantage of the `GlobalState`, where you can store message histories of multiple agents indexed by their name. 
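
For the simpler pickle-based approach described above, a minimal sketch might look like this (the history file name and the surrounding task setup are illustrative assumptions):

```python
import os
import pickle

import langroid as lr

HISTORY_FILE = "agent_history.pkl"  # hypothetical path

agent = lr.ChatAgent(lr.ChatAgentConfig())

# restore the prior conversation, if any
if os.path.exists(HISTORY_FILE):
    with open(HISTORY_FILE, "rb") as f:
        agent.message_history = pickle.load(f)

task = lr.Task(agent, restart=False)  # don't reset agent state when the task runs
task.run()

# save the conversation for the next run
with open(HISTORY_FILE, "wb") as f:
    pickle.dump(agent.message_history, f)
```
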
Simple examples of `GlobalState` are in the [`chat-tree.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree.py) example, and the [`test_global_state.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_global_state.py) test. ## Is it possible to share state between agents/tasks? The above-mentioned `GlobalState` mechanism can be used to share state between agents/tasks. See the links mentioned in the previous answer. ## How can I suppress LLM output? You can use the `quiet_mode` context manager for this, see [here](https://langroid.github.io/langroid/notes/quiet-mode/) ## How can I deal with LLMs (especially weak ones) generating bad JSON in tools? Langroid already attempts to repair bad JSON (e.g. unescaped newlines, missing quotes, etc) using the [json-repair](https://github.com/mangiucugna/json_repair) library and other custom methods, before attempting to parse it into a `ToolMessage` object. However this type of repair may not be able to handle all edge cases of bad JSON from weak LLMs. There are two existing ways to deal with this, and one coming soon: - If you are defining your own `ToolMessage` subclass, considering deriving it instead from `XMLToolMessage` instead, see the [XML-based Tools](https://langroid.github.io/langroid/notes/xml-tools/) - If you are using an existing Langroid `ToolMessage`, e.g. `SendTool`, you can define your own subclass of `SendTool`, say `XMLSendTool`, inheriting from both `SendTool` and `XMLToolMessage`; see this [example](https://github.com/langroid/langroid/blob/main/examples/basic/xml_tool.py) - Coming soon: strict decoding to leverage the Structured JSON outputs supported by OpenAI and open LLM providers such as `llama.cpp` and `vllm`. The first two methods instruct the LLM to generate XML instead of JSON, and any field that is designated with a `verbatim=True` will be enclosed within an XML `CDATA` tag, which does *not* require any escaping, and can be far more reliable for tool-use than JSON, especially with weak LLMs. ## How can I handle an LLM "forgetting" to generate a `ToolMessage`? Sometimes the LLM (especially a weak one) forgets to generate a [`ToolMessage`][langroid.agent.tool_message.ToolMessage] (either via OpenAI's tools/functions API, or via Langroid's JSON/XML Tool mechanism), despite being instructed to do so. There are a few remedies Langroid offers for this: **Improve the instructions in the `ToolMessage` definition:** - Improve instructions in the `purpose` field of the `ToolMessage`. - Add an `instructions` class-method to the `ToolMessage`, as in the [`chat-search.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat-search.py) script: ```python @classmethod def instructions(cls) -> str: return """ IMPORTANT: You must include an ACTUAL query in the `query` field, """ ``` These instructions are meant to be general instructions on how to use the tool (e.g. how to set the field values), not to specifically about the formatting. - Add a `format_instructions` class-method, e.g. like the one in the [`chat-multi-extract-3.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat-multi-extract-3.py) example script. ```python @classmethod def format_instructions(cls, tool: bool = True) -> str: instr = super().format_instructions(tool) instr += """ ------------------------------ ASK ME QUESTIONS ONE BY ONE, to FILL IN THE FIELDS of the `lease_info` function/tool. First ask me for the start date of the lease. DO NOT ASK ANYTHING ELSE UNTIL YOU RECEIVE MY ANSWER. 
""" return instr ``` **Override the `handle_message_fallback` method in the agent:** This method is called when the Agent's `agent_response` method receives a non-tool message as input. The default behavior of this method is to return None, but it is very useful to override the method to handle cases where the LLM has forgotten to use a tool. You can define this method to return a "nudge" to the LLM telling it that it forgot to do a tool-call, e.g. see how it's done in the example script [`chat-multi-extract-local.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat-multi-extract-local.py): ```python class LeasePresenterAgent(ChatAgent): def handle_message_fallback( self, msg: str | ChatDocument ) -> str | ChatDocument | None: """Handle scenario where Agent failed to present the Lease JSON""" if isinstance(msg, ChatDocument) and msg.metadata.sender == Entity.LLM: return """ You either forgot to present the information in the JSON format required in `lease_info` JSON specification, or you may have used the wrong name of the tool or fields. Try again. """ return None ``` Note that despite doing all of these, the LLM may still fail to generate a `ToolMessage`. In such cases, you may want to consider using a better LLM, or an up-coming Langroid feature that leverages **strict decoding** abilities of specific LLM providers (e.g. OpenAI, llama.cpp, vllm) that are able to use grammar-constrained decoding to force the output to conform to the specified structure. Langroid also provides a simpler mechanism to specify the action to take when an LLM does not generate a tool, via the `ChatAgentConfig.handle_llm_no_tool` config parameter, see the [docs](https://langroid.github.io/langroid/notes/handle-llm-no-tool/). ## Can I use Langroid to converse with a Knowledge Graph (KG)? Yes, you can use Langroid to "chat with" either a Neo4j or ArangoDB KG, see docs [here](https://langroid.github.io/langroid/notes/knowledge-graphs/) ## How can I improve `DocChatAgent` (RAG) latency? The behavior of `DocChatAgent` can be controlled by a number of settings in the `DocChatAgentConfig` class. The top-level query-answering method in `DocChatAgent` is `llm_response`, which use the `answer_from_docs` method. At a high level, the response to an input message involves the following steps: - **Query to StandAlone:** LLM rephrases the query as a stand-alone query. This can incur some latency. You can turn it off by setting `assistant_mode=True` in the `DocChatAgentConfig`. - **Retrieval:** The most relevant passages (chunks) are retrieved using a collection of semantic/lexical similarity searches and ranking methods. There are various knobs in `DocChatAgentConfig` to control this retrieval. - **Relevance Extraction:** LLM is used to retrieve verbatim relevant portions from the retrieved chunks. This is typically the biggest latency step. You can turn it off by setting the `relevance_extractor_config` to None in `DocChatAgentConfig`. - **Answer Generation:** LLM generates answer based on retrieved passages. See the [`doc-aware-chat.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/doc-aware-chat.py) example script, which illustrates some of these settings. In some scenarios you want to *only* use the **retrieval** step of a `DocChatAgent`. For this you can use the [`RetrievalTool`][langroid.agent.tools.retrieval_tool.RetrievalTool]. See the `test_retrieval_tool` in [`test_doc_chat_agent.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_doc_chat_agent.py). 
to learn how to use it. The above example script uses `RetrievalTool` as well. ## Is there support to run multiple tasks concurrently? Yes, see the `run_batch_tasks` and related functions in [batch.py](https://github.com/langroid/langroid/blob/main/langroid/agent/batch.py). See also: - tests: [test_batch.py](https://github.com/langroid/langroid/blob/main/tests/main/test_batch.py), [test_relevance_extractor.py](https://github.com/langroid/langroid/blob/main/tests/main/test_relevance_extractor.py), - example: [multi-agent-round-table.py](https://github.com/langroid/langroid/blob/main/examples/basic/multi-agent-round-table.py) Another example is within [`DocChatAgent`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/doc_chat_agent.py), which uses batch tasks for relevance extraction, see the `get_verbatim_extracts` method -- when there are k relevant passages, this runs k tasks concurrently, each of which uses an LLM-agent to extract relevant verbatim text from a passage. ## Can I use Langroid in a FastAPI server? Yes, see the [langroid/fastapi-server](https://github.com/langroid/fastapi-server) repo. ## Can a sub-task end all parent tasks and return a result? Yes, there are two ways to achieve this, using [`FinalResultTool`][langroid.agent.tools.orchestration.final_result_tool.FinalResultTool]: From a `ChatAgent`'s tool-handler or `agent_response` method: Your code can return a `FinalResultTool` with arbitrary field types; this ends the current and all parent tasks and this `FinalResultTool` will appear as one of tools in the final `ChatDocument.tool_messages`. See `test_tool_handlers_and_results` in [test_tool_messages.py](https://github.com/langroid/langroid/blob/main/tests/main/test_tool_messages.py), and [examples/basic/chat-tool-function.py](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tool-function.py) From `ChatAgent`'s `llm_response` method: you can define a subclass of a `FinalResultTool` and enable the agent to use this tool, which means it will become available for the LLM to generate. See [examples/basic/multi-agent-return-result.py](https://github.com/langroid/langroid/blob/main/examples/basic/multi-agent-return-result.py). ## How can I configure a task to retain or discard prior conversation? In some scenarios, you may want to control whether each time you call a task's `run` method, the underlying agent retains the conversation history from the previous run. There are two boolean config parameters that control this behavior: - the `restart` parameter (default `True`) in the `Task` constructor, and - the `restart_as_subtask` (default `False`) parameter in the `TaskConfig` argument of the `Task` constructor. To understand how these work, consider a simple scenario of a task `t` that has a subtask `t1`, e.g., suppose you have the following code with default settings of the `restart` and `restart_as_subtask` parameters: ```python from langroid.agent.task import Task from langroid.agent.task import TaskConfig # default setttings: rs = False r = r1 = True agent = ... task_config = TaskConfig(restart_as_subtask=rs) t = Task(agent, restart=r, config=task_config) agent1 = ... t1 = Task(agent1, restart=r1, config=task_config) t.add_subtask(t1) ``` This default setting works as follows: Since task `t` was constructed with the default `restart=True`, when `t.run()` is called, the conversation histories of the agent underlying `t` as well as all those of all subtasks (such as `t1`) are reset. 
However, if during `t.run()`, there are multiple calls to `t1.run()`, then the conversation history is retained across these calls, even though `t1` was constructed with the default `restart=True` -- this is because the `restart` constructor parameter has no effect on a task's reset behavior **when it is a subtask**. The `TaskConfig.restart_as_subtask` parameter controls the reset behavior of a task's `run` method when invoked as a subtask. It defaults to `False`, which is why in the above example, the conversation history of `t1` is retained across multiple calls to `t1.run()` that may occur during execution of `t.run()`. If you set this parameter to `True` in the above example, then the conversation history of `t1` would be reset each time `t1.run()` is called, during a call to `t.run()`. To summarize, - The `Task` constructor's `restart` parameter controls the reset behavior of the task's `run` method when it is called directly, not as a subtask. - The `TaskConfig.restart_as_subtask` parameter controls the reset behavior of the task's `run` method when it is called as a subtask. These settings can be mixed and matched as needed. Additionally, all reset behavior can be turned off during a specific `run()` invocation by calling it with `allow_restart=False`, e.g., `t.run(..., allow_restart=False)`. ## How can I set up a task to exit as soon as the LLM responds? In some cases you may want the top-level task or a subtask to exit as soon as the LLM responds. You can get this behavior by setting `single_round=True` during task construction, e.g., ```python from langroid.agent.task import Task agent = ... t = Task(agent, single_round=True, interactive=False) result = t.run("What is 4 + 5?") ``` The name `single_round` comes from the fact that the task loop ends as soon as any **one** of the agent's responders return a valid response. Recall that an agent's responders are `llm_response`, `agent_response` (for tool handling), and `user_response` (for user input). In the above example there are no tools and no user interaction (since `interactive=False`), so the task will exit as soon as the LLM responds. More commonly, you may only want this single-round behavior for a subtask, e.g., ```python agent = ... t = Task(agent, single_round=False, interactive=True) agent1 = ... t1 = Task(agent1, single_round=True, interactive=False) t.add_subtask(t1) top_level_query = ... result = t.run(...) ``` See the example script [`chat-2-agent-discuss.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-2-agent-discuss.py) for an example of this, and also search for `single_round` in the rest of the examples. !!! warning "Using `single_round=True` will prevent tool-handling" As explained above, setting `single_round=True` will cause the task to exit as soon as the LLM responds, and thus if it emits a valid tool (which the agent is enabled to handle), this tool will *not* be handled. --- --- title: 'Language Models: Completion and Chat-Completion' draft: false date: 2023-09-19 authors: - pchalasani categories: - langroid - llm - local-llm - chat comments: true --- Transformer-based language models are fundamentally next-token predictors, so naturally all LLM APIs today at least provide a completion endpoint. If an LLM is a next-token predictor, how could it possibly be used to generate a response to a question or instruction, or to engage in a conversation with a human user? This is where the idea of "chat-completion" comes in. 
This post is a refresher on the distinction between completion and chat-completion, and some interesting details on how chat-completion is implemented in practice. ## Language Models as Next-token Predictors A Language Model is essentially a "next-token prediction" model, and so all LLMs today provide a "completion" endpoint, typically something like: `/completions` under the base URL. The endpoint simply takes a prompt and returns a completion (i.e. a continuation). A typical prompt sent to a completion endpoint might look like this: ``` The capital of Belgium is ``` and the LLM will return a completion like this: ``` Brussels. ``` OpenAI's GPT3 is an example of a pure completion LLM. But interacting with a completion LLM is not very natural or useful: you cannot give instructions or ask questions; instead you would always need to formulate your input as a prompt whose natural continuation is your desired output. For example, if you wanted the LLM to highlight all proper nouns in a sentence, you would format it as the following prompt: **Chat-To-Prompt Example:** Chat/Instruction converted to a completion prompt. ``` User: here is a sentence, the Assistant's task is to identify all proper nouns. Jack lives in Bosnia, and Jill lives in Belgium. Assistant: ``` The natural continuation of this prompt would be a response listing the proper nouns, something like: ``` John, Bosnia, Jill, Belgium are all proper nouns. ``` This _seems_ sensible in theory, but a "base" LLM that performs well on completions may _not_ perform well on these kinds of prompts. The reason is that during its training, it may not have been exposed to very many examples of this type of prompt-response pair. So how can an LLM be improved to perform well on these kinds of prompts? ## Instruction-tuned, Aligned LLMs This brings us to the heart of the innovation behind the wildly popular ChatGPT: it uses an enhancement of GPT3 that (besides having a lot more parameters), was _explicitly_ fine-tuned on instructions (and dialogs more generally) -- this is referred to as **instruction-fine-tuning** or IFT for short. In addition to fine-tuning instructions/dialogs, the models behind ChatGPT (i.e., GPT-3.5-Turbo and GPT-4) are further tuned to produce responses that _align_ with human preferences (i.e. produce responses that are more helpful and safe), using a procedure called Reinforcement Learning with Human Feedback (RLHF). See this [OpenAI InstructGPT Paper](https://arxiv.org/pdf/2203.02155.pdf) for details on these techniques and references to the original papers that introduced these ideas. Another recommended read is Sebastian Raschka's post on [RLHF and related techniques](https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives). For convenience, we refer to the combination of IFT and RLHF as **chat-tuning**. A chat-tuned LLM can be expected to perform well on prompts such as the one in the Chat-To-Prompt Example above. These types of prompts are still unnatural, however, so as a convenience, chat-tuned LLM API servers also provide a "chat-completion" endpoint (typically `/chat/completions` under the base URL), which allows the user to interact with them in a natural dialog, which might look like this (the portions in square brackets are indicators of who is generating the text): ``` [User] What is the capital of Belgium? [Assistant] The capital of Belgium is Brussels. ``` or ``` [User] In the text below, find all proper nouns: Jack lives in Bosnia, and Jill lives in Belgium. 
[Assistant] Jack, Bosnia, Jill, Belgium are all proper nouns.
[User] Where does Jack live?
[Assistant] Jack lives in Bosnia.
```

## Chat Completion Endpoints: under the hood

How could this work, given that LLMs are fundamentally next-token predictors? This is a convenience provided by the LLM API service (e.g. from OpenAI or local model server libraries): when a user invokes the chat-completion endpoint (typically at `/chat/completions` under the base URL), under the hood, the server converts the instructions and multi-turn chat history into a single string, with annotations indicating user and assistant turns, and ending with something like "Assistant:" as in the Chat-To-Prompt Example above. Now the subtle detail to note here is this:

> It matters _how_ the dialog (instructions plus chat history) is converted into a single prompt string.

Converting to a single prompt by simply concatenating the instructions and chat history using an "intuitive" format (e.g. indicating user and assistant turns using "User:", "Assistant:", etc.) _can_ work; however, most local LLMs are trained on a _specific_ prompt format. So if we format chats in a different way, we may get odd/inferior results.

## Converting Chats to Prompts: Formatting Rules

For example, the llama2 models are trained on a format where the user's input is bracketed within special strings `[INST]` and `[/INST]`. There are other requirements that we don't go into here, but interested readers can refer to these links:

- A reddit thread on the [llama2 formats](https://www.reddit.com/r/LocalLLaMA/comments/155po2p/get_llama_2_prompt_format_right/)
- Facebook's [llama2 code](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L44)
- Langroid's [llama2 formatting code](https://github.com/langroid/langroid/blob/main/langroid/language_models/prompt_formatter/llama2_formatter.py)

A dialog fed to a Llama2 model in its expected prompt format would look like this:

```
[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Hi there! [/INST] Hello! How can I help you today?
[INST] In the text below, find all proper nouns:
Jack lives in Bosnia, and Jill lives in Belgium. [/INST] Jack, Bosnia, Jill, Belgium are all proper nouns.
[INST] Where does Jack live? [/INST] Jack lives in Bosnia.
[INST] And Jill? [/INST] Jill lives in Belgium.
[INST] Which are its neighboring countries? [/INST]
```

This means that if an LLM server library wants to provide a chat-completion endpoint for a local model, it needs to provide a way to convert chat history to a single prompt using the specific formatting rules of the model. For example the [`oobabooga/text-generation-webui`](https://github.com/oobabooga/text-generation-webui) library has an extensive set of chat formatting [templates](https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates) for a variety of models, and their model server auto-detects the format template from the model name.

!!! note "Chat completion model names: look for 'chat' or 'instruct' in the name"
    You can search for a variety of models on the [HuggingFace model hub](https://huggingface.co/models). For example if you see a name `Llama-2-70B-chat-GGUF` you know it is chat-tuned.
Another example of a chat-tuned model is `Llama-2-7B-32K-Instruct` A user of these local LLM server libraries thus has two options when using a local model in chat mode: - use the _chat-completion_ endpoint, and let the underlying library handle the chat-to-prompt formatting, or - first format the chat history according to the model's requirements, and then use the _completion_ endpoint ## Using Local Models in Langroid Local models can be used in Langroid by defining a `LocalModelConfig` object. More details are in this [tutorial](https://langroid.github.io/langroid/blog/2023/09/14/using-langroid-with-local-llms/), but here we briefly discuss prompt-formatting in this context. Langroid provides a built-in [formatter for LLama2 models](https://github.com/langroid/langroid/blob/main/langroid/language_models/prompt_formatter/llama2_formatter.py), so users looking to use llama2 models with langroid can try either of these options, by setting the `use_completion_for_chat` flag in the `LocalModelConfig` object (See the local-LLM [tutorial](https://langroid.github.io/langroid/blog/2023/09/14/using-langroid-with-local-llms/) for details). When this flag is set to `True`, the chat history is formatted using the built-in Langroid llama2 formatter and the completion endpoint are used. When the flag is set to `False`, the chat history is sent directly to the chat-completion endpoint, which internally converts the chat history to a prompt in the expected llama2 format. For local models other than Llama2, users can either: - write their own formatters by writing a class similar to `Llama2Formatter` and then setting the `use_completion_for_chat` flag to `True` in the `LocalModelConfig` object, or - use an LLM server library (such as the `oobabooga` library mentioned above) that provides a chat-completion endpoint, _and converts chats to single prompts under the hood,_ and set the `use_completion_for_chat` flag to `False` in the `LocalModelConfig` object. You can use a similar approach if you are using an LLM application framework other than Langroid. --- --- title: "Overview of Langroid's Multi-Agent Architecture (prelim)" draft: false date: 2024-08-15 authors: - pchalasani - nils - jihye - someshjha categories: - langroid - multi-agent - llm comments: true --- ## Agent, as an intelligent message transformer A natural and convenient abstraction in designing a complex LLM-powered system is the notion of an *agent* that is instructed to be responsible for a specific aspect of the overall task. In terms of code, an *Agent* is essentially a class representing an intelligent entity that can respond to *messages*, i.e., an agent is simply a *message transformer*. An agent typically encapsulates an (interface to an) LLM, and may also be equipped with so-called *tools* (as described below) and *external documents/data* (e.g., via a vector database, as described below). Much like a team of humans, agents interact by exchanging messages, in a manner reminiscent of the [*actor framework*](https://en.wikipedia.org/wiki/Actor_model) in programming languages. An *orchestration mechanism* is needed to manage the flow of messages between agents, to ensure that progress is made towards completion of the task, and to handle the inevitable cases where an agent deviates from instructions. Langroid is founded on this *multi-agent programming* paradigm, where agents are first-class citizens, acting as message transformers, and communicate by exchanging messages. 
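
To make the "message transformer" view concrete, here is a minimal sketch (using the default OpenAI model config; the input message is arbitrary):

```python
import langroid as lr
import langroid.language_models as lm

# an agent wrapping an LLM: it transforms an incoming message into a response message
agent = lr.ChatAgent(lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig()))
reply = agent.llm_response("Summarize in one line: agents collaborate by exchanging messages.")
```
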
To build useful applications with LLMs, we need to endow them with the ability to trigger actions (such as API calls, computations, database queries, etc) or send structured messages to other agents or downstream processes. *Tools* provide these capabilities, described next. ## Tools, also known as functions An LLM is essentially a text transformer; i.e., in response to some input text, it produces a text response. Free-form text responses are ideal when we want to generate a description, answer, or summary for human consumption, or even a question for another agent to answer. However, in some cases, we would like the responses to be more structured, for example to trigger external *actions* (such as an API call, code execution, or a database query), or for unambiguous/deterministic handling by a downstream process or another agent. In such cases, we would instruct the LLM to produce a *structured* output, typically in JSON format, with various pre-specified fields, such as code, an SQL query, parameters of an API call, and so on. These structured responses have come to be known as *tools*, and the LLM is said to *use* a tool when it produces a structured response corresponding to a specific tool. To elicit a tool response from an LLM, it needs to be instructed on the expected tool format and the conditions under which it should use the tool. To actually use a tool emitted by an LLM, a *tool handler* method must be defined as well. The tool handler for a given tool is triggered when it is recognized in the LLM's response. ### Tool Use: Example As a simple example, a SQL query tool can be specified as a JSON structure with a `sql` field (containing the SQL query) and a `db` field (containing the name of the database). The LLM may be instructed with a system prompt of the form: > When the user asks a question about employees, use the SQLTool described in the below schema, > and the results of this tool will be sent back to you, and you can use these to respond to > the user's question, or correct your SQL query if there is a syntax error. The tool handler would detect this specific tool in the LLM's response, parse this JSON structure, extract the `sql` and `db` fields, run the query on the specified database, and return the result if the query ran successfully, otherwise return an error message. Depending on how the multi-agent system is organized, the query result or error message may be handled by the same agent (i.e., its LLM), which may either summarize the results in narrative form, or revise the query if the error message indicates a syntax error. ## Agent-oriented programming: Function-Signatures If we view an LLM as a function with signature `string -> string`, it is possible to express the concept of an agent, tool, and other constructs in terms of derived function signatures, as shown in the table below. Adding `tool` (or function calling) capability to an LLM requires a parser (that recognizes that the LLM has generated a tool) and a callback that performs arbitrary computation and returns a string. The serialized instances of tools `T` correspond to a language `L`; Since by assumption, the LLM is capable of producing outputs in $L$, this allows the LLM to express the intention to execute a Callback with arbitrary instances of `T`. In the last row, we show how an Agent can be viewed as a function signature involving its state `S`. 

| Function Description | Function Signature |
|----------------------|--------------------|
| LLM | `[Input Query] -> string` <br> `[Input Query]` is the original query. |
| Chat interface | `[Message History] x [Input Query] -> string` <br> `[Message History]` consists of previous messages[^1]. |
| Agent | `[System Message] x [Message History] x [Input Query] -> string` <br> `[System Message]` is the system prompt. |
| Agent with tool | `[System Message] x (string -> T) x (T -> string) x [Message History] x [Input Query] -> string` |
| Parser with type `T` | `string -> T` |
| Callback with type `T` | `T -> string` |
| General Agent with state type `S` | `S x [System Message] x (string -> T) x (S x T -> S x string) x [Message History] x [Input Query] -> S x string` |

[^1]: Note that in reality, separator tokens are added to distinguish messages, and the messages are tagged with metadata indicating the sender, among other things.

## Multi-Agent Orchestration

### An Agent's "Native" Responders

When building an LLM-based multi-agent system, an orchestration mechanism is critical to manage the flow of messages between agents, to ensure task progress, and to handle inevitable LLM deviations from instructions. Langroid provides a simple yet versatile orchestration mechanism that seamlessly handles:

- user interaction,
- tool handling,
- sub-task delegation

We view an agent as a message transformer; it may transform an incoming message using one of its three "native" responder methods, all of which have the same function signature: `string -> string`. These methods are:

- `llm_response` returns the LLM's response to the input message. Whenever this method is invoked, the agent updates its dialog history (typically consisting of alternating user and LLM messages).
- `user_response` prompts the user for input and returns their response.
- `agent_response` by default only handles a `tool message` (i.e., one that contains an llm-generated structured response): it performs any requested actions, and returns the result as a string. An `agent_response` method can have other uses besides handling tool messages, such as handling scenarios where an LLM "forgot" to use a tool, or used a tool incorrectly, and so on.

To see why it is useful to have these responder methods, consider first a simple example of creating a basic chat loop with the user. It is trivial to create such a loop by alternating between `user_response` and `llm_response`. Now suppose we instruct the agent to either directly answer the user's question or perform a web-search. Then it is possible that sometimes the `llm_response` will produce a "tool message", say `WebSearchTool`, which we would handle with the `agent_response` method. This requires a slightly different, and more involved, way of iterating among the agent's responder methods.

### Tasks: Encapsulating Agent Orchestration

From a coding perspective, it is useful to hide the actual iteration logic by wrapping an Agent class in a separate class, called a `Task`, which encapsulates all of the orchestration logic. Users of the Task class can then define the agent, tools, and any sub-tasks, wrap the agent in a task object of class Task, and simply call `task.run()`, letting the Task class deal with the details of orchestrating the agent's responder methods, determining task completion, and invoking sub-tasks.

### Responders in a Task: Agent's native responders and sub-tasks

The orchestration mechanism of a `Task` object works as follows. When a `Task` object is created from an agent, a sequence of eligible responders is created, which includes the agent's three "native" responder methods in the sequence: `agent_response`, `llm_response`, `user_response`. The type signature of the task's run method is `string -> string`, just like the Agent's native responder methods, and this is the key to seamless delegation of tasks to sub-tasks.

A list of subtasks can be added to a `Task` object via `task.add_sub_task([t1, t2, ...])`, where `[t1, t2, ...]` are other `Task` objects. The result of this is that the run method of each sub-task is appended to the sequence of eligible responders in the parent task object.

### Task Orchestration: Updating the Current Pending Message (CPM)

A task always maintains a *current pending message* (CPM), which is the latest message "awaiting" a valid response from a responder, which updates the CPM. At a high level the `run` method of a task attempts to repeatedly find a valid response to the CPM, until the task is done. (Note that this paradigm is somewhat reminiscent of a *Blackboard* architecture, where agents take turns deciding whether they can update the shared message on the "blackboard".) This is achieved by repeatedly invoking the `step` method, which represents a "turn" in the conversation. The `step` method sequentially tries the eligible responders from the beginning of the eligible-responders list, until it finds a valid response, defined as a non-null or terminating message (i.e. one that signals that the task is done). In particular, this `step()` algorithm implies that a Task delegates (or "fails over") to a sub-task only if the task's native responders have no valid response.

There are a few simple rules that govern how `step` works:

- a responder entity (either a sub-task or a native entity -- one of LLM, Agent, or User) cannot respond if it just responded in the previous step (this prevents a responder from "talking to itself").
- when a response signals that the task is done (via a `DoneTool` or a "DONE" string), the task is ready to exit and return the CPM as the result of the task.
- when an entity "in charge" of the task has a null response, the task is considered finished and ready to exit.
- if the response of an entity or subtask is a structured message containing a recipient field, then the specified recipient task or entity will be the only one eligible to respond at the next step.

Once a valid response is found in a step, the CPM is updated to this response, and the next step starts the search for a valid response from the beginning of the eligible-responders list. When a response signals that the task is done, the run method returns the CPM as the result of the task. This is a highly simplified account of the orchestration mechanism, and the actual implementation is more involved. The above simple design is surprisingly powerful and can support a wide variety of task structures, including trees and DAGs.

As a simple illustrative example, tool-handling has a natural implementation. The LLM is instructed to use a certain JSON-structured message as a tool, and thus the `llm_response` method can produce a structured message, such as an SQL query. This structured message is then handled by the `agent_response` method, and the resulting message updates the CPM. The `llm_response` method then becomes eligible to respond again: for example if the agent's response contains an SQL error, the LLM would retry its query, and if the agent's response consists of the query results, the LLM would respond with a summary of the results.

The Figure below depicts the task orchestration and delegation mechanism, showing how iteration among responder methods works when a Task `T` has sub-tasks `[T1, T2]` and `T1` has a sub-task `T3`.
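
In code, such a task tree could be assembled roughly as follows (agent configurations elided to defaults; this is only a sketch of the structure shown in the figure below, not a runnable application):

```python
import langroid as lr

cfg = lr.ChatAgentConfig()  # agent details elided

T = lr.Task(lr.ChatAgent(cfg), name="T")
T1 = lr.Task(lr.ChatAgent(cfg), name="T1")
T2 = lr.Task(lr.ChatAgent(cfg), name="T2")
T3 = lr.Task(lr.ChatAgent(cfg), name="T3")

T.add_sub_task([T1, T2])  # T's eligible responders now include T1.run and T2.run
T1.add_sub_task(T3)       # T1 delegates to T3 when its native responders have no valid response

result = T.run("...")
```
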
![langroid-arch.png](figures/langroid-arch.png) --- --- title: 'Langroid: Harness LLMs with Multi-Agent Programming' draft: false date: 2023-09-03 authors: - pchalasani categories: - langroid - llm comments: true --- # Langroid: Harness LLMs with Multi-Agent Programming ## The LLM Opportunity Given the remarkable abilities of recent Large Language Models (LLMs), there is an unprecedented opportunity to build intelligent applications powered by this transformative technology. The top question for any enterprise is: how best to harness the power of LLMs for complex applications? For technical and practical reasons, building LLM-powered applications is not as simple as throwing a task at an LLM-system and expecting it to do it. ## Langroid's Multi-Agent Programming Framework Effectively leveraging LLMs at scale requires a *principled programming framework*. In particular, there is often a need to maintain multiple LLM conversations, each instructed in different ways, and "responsible" for different aspects of a task. An *agent* is a convenient abstraction that encapsulates LLM conversation state, along with access to long-term memory (vector-stores) and tools (a.k.a functions or plugins). Thus a **Multi-Agent Programming** framework is a natural fit for complex LLM-based applications. > Langroid is the first Python LLM-application framework that was explicitly designed with Agents as first-class citizens, and Multi-Agent Programming as the core design principle. The framework is inspired by ideas from the [Actor Framework](https://en.wikipedia.org/wiki/Actor_model). Langroid allows an intuitive definition of agents, tasks and task-delegation among agents. There is a principled mechanism to orchestrate multi-agent collaboration. Agents act as message-transformers, and take turns responding to (and transforming) the current message. The architecture is lightweight, transparent, flexible, and allows other types of orchestration to be implemented. Besides Agents, Langroid also provides simple ways to directly interact with LLMs and vector-stores. ## Highlights - **Agents as first-class citizens:** The `Agent` class encapsulates LLM conversation state, and optionally a vector-store and tools. Agents are a core abstraction in Langroid; Agents act as _message transformers_, and by default provide 3 _responder_ methods, one corresponding to each entity: LLM, Agent, User. - **Tasks:** A Task class wraps an Agent, gives the agent instructions (or roles, or goals), manages iteration over an Agent's responder methods, and orchestrates multi-agent interactions via hierarchical, recursive task-delegation. The `Task.run()` method has the same type-signature as an Agent's responder's methods, and this is key to how a task of an agent can delegate to other sub-tasks: from the point of view of a Task, sub-tasks are simply additional responders, to be used in a round-robin fashion after the agent's own responders. - **Modularity, Reusability, Loose coupling:** The `Agent` and `Task` abstractions allow users to design Agents with specific skills, wrap them in Tasks, and combine tasks in a flexible way. - **LLM Support**: Langroid supports OpenAI LLMs including GPT-3.5-Turbo, GPT-4. - **Caching of LLM prompts, responses:** Langroid by default uses [Redis](https://redis.com/try-free/) for caching. - **Vector-stores**: [Qdrant](https://qdrant.tech/), [Chroma](https://www.trychroma.com/), LanceDB, Pinecone, PostgresDB (PGVector), Weaviate are currently supported. 
Vector stores allow for Retrieval-Augmented-Generaation (RAG). - **Grounding and source-citation:** Access to external documents via vector-stores allows for grounding and source-citation. - **Observability, Logging, Lineage:** Langroid generates detailed logs of multi-agent interactions and maintains provenance/lineage of messages, so that you can trace back the origin of a message. - **Tools/Plugins/Function-calling**: Langroid supports OpenAI's recently released [function calling](https://platform.openai.com/docs/guides/gpt/function-calling) feature. In addition, Langroid has its own native equivalent, which we call **tools** (also known as "plugins" in other contexts). Function calling and tools have the same developer-facing interface, implemented using [Pydantic](https://docs.pydantic.dev/latest/), which makes it very easy to define tools/functions and enable agents to use them. Benefits of using Pydantic are that you never have to write complex JSON specs for function calling, and when the LLM hallucinates malformed JSON, the Pydantic error message is sent back to the LLM so it can fix it! --- --- title: 'Langroid: Knolwedge Graph RAG powered by Neo4j' draft: false date: 2024-01-18 authors: - mohannad categories: - langroid - neo4j - rag - knowledge-graph comments: true --- ## "Chat" with various sources of information LLMs are increasingly being used to let users converse in natural language with a variety of types of data sources: - unstructured text documents: a user's query is augmented with "relevant" documents or chunks (retrieved from an embedding-vector store) and fed to the LLM to generate a response -- this is the idea behind Retrieval Augmented Generation (RAG). - SQL Databases: An LLM translates a user's natural language question into an SQL query, which is then executed by another module, sending results to the LLM, so it can generate a natural language response based on the results. - Tabular datasets: similar to the SQL case, except instead of an SQL Query, the LLM generates a Pandas dataframe expression. Langroid has had specialized Agents for the above scenarios: `DocChatAgent` for RAG with unstructured text documents, `SQLChatAgent` for SQL databases, and `TableChatAgent` for tabular datasets. ## Adding support for Neo4j Knowledge Graphs Analogous to the SQLChatAgent, Langroid now has a [`Neo4jChatAgent`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/neo4j/neo4j_chat_agent.py) to interact with a Neo4j knowledge graph using natural language. This Agent has access to two key tools that enable it to handle a user's queries: - `GraphSchemaTool` to get the schema of a Neo4j knowledge graph. - `CypherRetrievalTool` to generate Cypher queries from a user's query. Cypher is a specialized query language for Neo4j, and even though it is not as widely known as SQL, most LLMs today can generate Cypher Queries. Setting up a basic Neo4j-based RAG chatbot is straightforward. 

First ensure you set these environment variables (or provide them in a `.env` file):

```bash
NEO4J_URI=
NEO4J_USERNAME=
NEO4J_PASSWORD=
NEO4J_DATABASE=
```

Then you can configure and define a `Neo4jChatAgent` like this:

```python
import langroid as lr
import langroid.language_models as lm
from dotenv import load_dotenv
from langroid.agent.special.neo4j.neo4j_chat_agent import (
    Neo4jChatAgent,
    Neo4jChatAgentConfig,
    Neo4jSettings,
)

load_dotenv()  # read the NEO4J_* variables (and API keys) from the .env file

llm_config = lm.OpenAIGPTConfig()
neo4j_settings = Neo4jSettings()

kg_rag_agent_config = Neo4jChatAgentConfig(
    neo4j_settings=neo4j_settings,
    llm=llm_config,
)
kg_rag_agent = Neo4jChatAgent(kg_rag_agent_config)
kg_rag_task = lr.Task(kg_rag_agent, name="kg_RAG")
kg_rag_task.run()
```

## Example: PyPi Package Dependency Chatbot

In the Langroid-examples repository, there is an example Python [script](https://github.com/langroid/langroid-examples/blob/main/examples/kg-chat/) showcasing tools/Function-calling + RAG using a `DependencyGraphAgent` derived from [`Neo4jChatAgent`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/neo4j/neo4j_chat_agent.py). This agent uses two tools, in addition to the tools available to `Neo4jChatAgent`:

- `GoogleSearchTool` to find package version and type information, as well as to answer other web-based questions after acquiring the required information from the dependency graph.
- `DepGraphTool` to construct a Neo4j knowledge-graph modeling the dependency structure for a specific package, using the API at [DepsDev](https://deps.dev/).

In response to a user's query about dependencies, the Agent decides whether to use a Cypher query or do a web search. Here is what it looks like in action:
![dependency-demo](../../assets/demos/dependency_chatbot.gif)
Chatting with the `DependencyGraphAgent` (derived from Langroid's `Neo4jChatAgent`). When a user specifies a Python package name (in this case "chainlit"), the agent searches the web using `GoogleSearchTool` to find the version of the package, and then uses the `DepGraphTool` to construct the dependency graph as a neo4j knowledge graph. The agent then answers questions by generating Cypher queries to the knowledge graph, or by searching the web.
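
As a rough sketch of giving a Neo4j agent an additional web-search capability along these lines (this assumes the `kg_rag_agent` configured in the earlier snippet; the actual example script's tool setup may differ):

```python
from langroid.agent.tools.google_search_tool import GoogleSearchTool

# kg_rag_agent is the Neo4jChatAgent configured in the earlier snippet;
# enabling the search tool lets its LLM do web searches in addition to Cypher queries
kg_rag_agent.enable_message(GoogleSearchTool)
dep_task = lr.Task(kg_rag_agent, name="DependencyKG")
dep_task.run("What is the latest version of chainlit on PyPi, and what does it depend on?")
```
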
--- --- title: 'Langroid: Multi-Agent Programming Framework for LLMs' draft: true date: 2024-01-10 authors: - pchalasani categories: - langroid - lancedb - rag - vector-database comments: true --- ## Langroid: Multi-Agent Programming framework for LLMs In this era of Large Language Models (LLMs), there is unprecedented demand to create intelligent applications powered by this transformative technology. What is the best way for developers to harness the potential of LLMs in complex application scenarios? For a variety of technical and practical reasons (context length limitations, LLM brittleness, latency, token-costs), this is not as simple as throwing a task at an LLM system and expecting it to get done. What is needed is a principled programming framework, offering the right set of abstractions and primitives to make developers productive when building LLM applications. ## Langroid's Elegant Multi-Agent Paradigm The [Langroid](https://github.com/langroid/langroid) team (ex-CMU/UW-Madison researchers) has a unique take on this – they have built an open source Python framework to simplify LLM application development, using a Multi-Agent Programming paradigm. Langroid’s architecture is founded on Agents as first-class citizens: they are message-transformers, and accomplish tasks collaboratively via messages. Langroid is emerging as a popular LLM framework; developers appreciate its clean design and intuitive, extensible architecture. Programming with Langroid is natural and even fun: you configure Agents and equip them with capabilities ( such as LLMs, vector-databases, Function-calling/tools), connect them and have them collaborate via messages. This is a “Conversational Programming” paradigm, and works with local/open and remote/proprietary LLMs. (Importantly, it does not use LangChain or any other existing LLM framework).
![Langroid-card](../../assets/langroid-card-ossem-rust-1200x630.png){ width="800" }
An Agent serves as a convenient abstraction, encapsulating the state of LLM conversations, access to vector stores, and various tools (functions or plugins). A Multi-Agent Programming framework naturally aligns with the demands of complex LLM-based applications.
## Connecting Agents via Tasks In Langroid, a ChatAgent has a set of “responder” methods, one for each "entity": an LLM, a human, and a tool-handler. However it does not have any way to iterate through these responders. This is where the Task class comes in: A Task wraps an Agent and gives it the ability to loop through its responders, via the `Task.run()` method. A Task loop is organized around simple rules that govern when a responder is eligible to respond, what is considered a valid response, and when the task is complete. The simplest example of a Task loop is an interactive chat with the human user. A Task also enables an Agent to interact with other agents: other tasks can be added to a task as sub-tasks, in a recursive, hierarchical (or DAG) structure. From a Task’s perspective, sub-tasks are just additional responders, and present the same string-to-string message-transformation interface (function signature) as the Agent’s "native" responders. This is the key to composability of tasks in Langroid, since a sub-task can act the same way as an Agent's "native" responders, and is subject to the same rules of task orchestration. The result is that the same task orchestration mechanism seamlessly enables tool handling, retries when LLM deviates, and delegation to sub-tasks. More details are in the Langroid [quick-start guide](https://langroid.github.io/langroid/quick-start/) ## A Taste of Coding with Langroid To get started with Langroid, simply install it from pypi into your virtual environment: ```bash pip install langroid ``` To directly chat with an OpenAI LLM, define the LLM configuration, instantiate a language model object and interact with it: (Langroid works with non-OpenAI local/propreitary LLMs as well, see their [tutorial](https://langroid.github.io/langroid/tutorials/non-openai-llms/)) For the examples below, ensure you have a file `.env` containing your OpenAI API key with this line: `OPENAI_API_KEY=sk-...`. ```python import langroid as lr import langroid.language_models as lm llm_cfg = lm.OpenAIGPTConfig() # default GPT4-Turbo mdl = lm.OpenAIGPT(llm_cfg) mdl.chat("What is 3+4?", max_tokens=10) ``` The mdl does not maintain any conversation state; for that you need a `ChatAgent`: ```python agent_cfg = lr.ChatAgentConfig(llm=llm_cfg) agent = lr.ChatAgent(agent_cfg) agent.llm_response("What is the capital of China?") agent.llm_response("What about France?") # interprets based on previous msg ``` Wrap a ChatAgent in a Task to create a basic interactive loop with the user: ```python task = lr.Task(agent, name="Bot") task.run("Hello") ``` Have a Teacher Agent talk to a Student Agent: ```python teacher = lr.ChatAgent(agent_cfg) teacher_task = lr.Task( teacher, name="Teacher", system_message=""" Ask your student simple number-based questions, and give feedback. Start with a question. """, ) student = lr.ChatAgent(agent_cfg) student_task = lr.Task( student, name="Student", system_message="Concisely answer your teacher's questions." ) teacher_task.add_sub_task(student_task) teacher_task.run() ``` ## Retrieval Augmented Generation (RAG) and Vector Databases One of the most popular LLM applications is question-answering on documents via Retrieval-Augmented Generation (RAG), powered by a vector database. Langroid has a built-in DocChatAgent that incorporates a number of advanced RAG techniques, clearly laid out so they can be easily understood and extended. ### Built-in Support for LanceDB
![Langroid-lance](../../assets/langroid-lance.png){ width="800" }
Langroid uses LanceDB as the default vector store for its DocChatAgent.
Langroid's DocChatAgent uses the LanceDB serverless vector-database by default. Since LanceDB uses file storage, it is easy to set up and use (no need for Docker or cloud services), and due to its use of the Lance columnar format, it is highly performant and scalable. In addition, Langroid has a specialized `LanceDocChatAgent` that leverages LanceDB's unique features such as full-text search, SQL-like filtering, and pandas dataframe interop. Setting up a basic RAG chatbot is as simple as (assume the previous imports):

```python
from langroid.agent.special.lance_doc_chat_agent import (
    LanceDocChatAgent,
    DocChatAgentConfig,
)

llm_config = lm.OpenAIGPTConfig()
rag_agent_config = DocChatAgentConfig(
    llm=llm_config,
    doc_paths=["/path/to/my/docs"],  # files, folders, or URLs
)
rag_agent = LanceDocChatAgent(rag_agent_config)
rag_task = lr.Task(rag_agent, name="RAG")
rag_task.run()
```

For an example showcasing Tools/Function-calling + RAG in a multi-agent setup, see their quick-start [Colab notebook](https://colab.research.google.com/github/langroid/langroid/blob/main/examples/Langroid_quick_start.ipynb), which shows a 2-agent system where one agent is tasked with extracting structured information from a document, and generates questions for the other agent to answer using RAG. In the Langroid-examples repo there is a [script](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py) with the same functionality, and here is what it looks like in action:
![lease-demo](../../assets/demos/lease-extractor-demo.gif){ width="800" }
Extracting structured info from a Commercial Lease using a 2-agent system, with Tool/Function-calling and RAG. The Extractor Agent is told to extract information in a certain structure, and it generates questions for the Document Agent to answer using RAG.
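The wiring behind such a 2-agent extraction system is compact. Below is a minimal, hypothetical sketch, not the actual `chat_multi_extract.py` script; the prompts, agent names, and file path are illustrative. An Extractor `ChatAgent` is wrapped in a Task, and a `DocChatAgent` that answers its questions via RAG is attached as a sub-task:

```python
import langroid as lr
import langroid.language_models as lm
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig

llm_config = lm.OpenAIGPTConfig()

# Extractor: asks questions and assembles the structured result
extractor = lr.ChatAgent(
    lr.ChatAgentConfig(
        llm=llm_config,
        system_message="""
        Extract the lease start date, end date, and monthly rent.
        Ask ONE question at a time; when you have all fields,
        say DONE followed by the filled-in structure.
        """,
    )
)
extractor_task = lr.Task(extractor, name="Extractor", interactive=False)

# Document Agent: answers each question using RAG over the lease document
doc_agent = DocChatAgent(
    DocChatAgentConfig(llm=llm_config, doc_paths=["/path/to/lease.pdf"])
)
doc_task = lr.Task(doc_agent, name="DocAgent", interactive=False, single_round=True)

extractor_task.add_sub_task(doc_task)
extractor_task.run()
```

In the full example, the desired structure is additionally defined as a Langroid `ToolMessage`, so the final extraction is validated by Pydantic rather than left as free-form text.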
## Retrieval Augmented Analytics

One of the unique features of LanceDB is its SQL-like filtering and Pandas dataframe interoperability. LLMs are great at generating SQL queries, and also Pandas computation code such as `df.groupby("col").mean()`. This opens up a very interesting possibility, which we call **Retrieval Augmented Analytics**. Suppose a user has a large dataset of movie descriptions with metadata such as rating, year and genre, and wants to ask:

> What is the highest-rated Comedy movie about college students made after 2010?

It is not hard to imagine that an LLM should be able to generate a **Query Plan** to answer this, consisting of:

- A SQL-like filter: `genre = "Comedy" and year > 2010`
- A Pandas computation: `df.loc[df["rating"].idxmax()]`
- A rephrased query given the filter: "Movie about college students" (used for semantic/lexical search)

Langroid's Multi-Agent framework enables exactly this type of application. The [`LanceRAGTaskCreator`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/lance_rag/lance_rag_task.py) takes a `LanceDocChatAgent` and adds two additional agents:

- QueryPlannerAgent: Generates the Query Plan
- QueryPlanCriticAgent: Critiques the Query Plan and Answer received from the RAG Agent, so that the QueryPlanner can generate a better plan if needed.

Check out the [`lance-rag-movies.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/lance-rag-movies.py) script in the langroid-examples repo to try this out.

## Try it out and get involved!

This was just a glimpse of what you can do with Langroid and how your code would look. Give it a shot and learn more about the features and roadmap of Langroid on their [GitHub repo](https://github.com/langroid/langroid). Langroid welcomes contributions, and they have a friendly [Discord](https://discord.gg/ZU36McDgDs) community. If you like it, don't forget to drop a 🌟.

---

---
title: 'Chat formatting in Local LLMs'
draft: true
date: 2024-01-25
authors:
  - pchalasani
categories:
  - langroid
  - prompts
  - llm
  - local-llm
comments: true
---

In an (LLM performance) investigation, details matter! And assumptions kill (your LLM performance). I'm talking about chat/prompt formatting, especially when working with Local LLMs.

TL;DR -- details like chat formatting matter a LOT, and trusting that the local LLM API is doing it correctly may be a mistake, leading to inferior results.

🤔 Curious? Here are some notes from the trenches when we built an app (https://github.com/langroid/langroid/blob/main/examples/docqa/chat-multi-extract-local.py) based entirely on a locally running Mistral-7b-instruct-v0.2 (yes, ONLY 7B parameters, compared to 175B+ for GPT4!) that leverages Langroid Multi-agents, Tools/Function-calling and RAG to reliably extract structured information from a document, where an Agent is given a spec of the desired structure, and it generates questions for another Agent to answer using RAG.

🔵 LLM API types: generate and chat

LLMs are typically served behind two types of API endpoints:

⏺ A "generation" API, which accepts a dialog formatted as a SINGLE string, and
⏺ a "chat" API, which accepts the dialog as a LIST, and as a convenience formats it into a single string before sending to the LLM.

🔵 Proprietary vs Local LLMs

When you use a proprietary LLM API (such as OpenAI or Claude), for convenience you can use their "chat" API, and you can trust that it will format the dialog history correctly (or else they wouldn't be in business!).
But with a local LLM, you have two choices of where to send the dialog history:

⏺ you could send it to the "chat" API and trust that the server will format it correctly,
⏺ or you could format it yourself and send it to the "generation" API.

🔵 Example of prompt formatting?

Suppose your system prompt and dialog look like this:

System Prompt/Instructions: when I give you a number, respond with its double
User (You): 3
Assistant (LLM): 6
User (You): 9

Mistral-instruct models expect this chat to be formatted like this (note that the system message is combined with the first user message):

"[INST] when I give you a number, respond with its double 3 [/INST] 6 [INST] 9 [/INST]"

🔵 Why does it matter?

It matters A LOT -- because each type of LLM (llama2, mistral, etc) has been trained and/or fine-tuned on chats formatted in a SPECIFIC way, and if you deviate from that, you may get odd/inferior results.

🔵 Using Mistral-7b-instruct-v0.2 via oobabooga/text-generation-webui

"Ooba" is a great library (https://github.com/oobabooga/text-generation-webui) that lets you spin up an OpenAI-like API server for local models, such as llama2, mistral, etc. When we used its chat endpoint for a Langroid Agent, we were getting really strange results, with the LLM sometimes thinking it is the user! 😧

Digging in, we found that their internal formatting template was wrong, and it was formatting the system prompt as if it's the first user message -- this leads to the LLM interpreting the first user message as an assistant response, and so on -- no wonder there was role confusion!

💥 Langroid solution: To avoid these issues, in Langroid we now have a formatter (https://github.com/langroid/langroid/blob/main/langroid/language_models/prompt_formatter/hf_formatter.py) that retrieves the HuggingFace tokenizer for the LLM and uses its "apply_chat_template" method to format chats. This gives you control over the chat format, and you can use the "generation" endpoint of the LLM API instead of the "chat" endpoint. Once we switched to this, results improved dramatically 🚀

Be sure to check out Langroid: https://github.com/langroid/langroid

#llm #ai #opensource

---

---
title: 'Using Langroid with Local LLMs'
draft: false
date: 2023-09-14
authors:
  - pchalasani
categories:
  - langroid
  - llm
  - local-llm
comments: true
---

## Why local models?

There are commercial, remotely served models that currently appear to beat all open/local models. So why care about local models? Local models are exciting for a number of reasons:

- **cost**: other than compute/electricity, there is no cost to use them.
- **privacy**: no concerns about sending your data to a remote server.
- **latency**: no network latency due to remote API calls, so faster response times, provided you can get fast enough inference.
- **uncensored**: some local models are not censored to avoid sensitive topics.
- **fine-tunable**: you can fine-tune them on private/recent data, which current commercial models don't have access to.
- **sheer thrill**: having a model running on your machine with no internet connection, and being able to have an intelligent conversation with it -- there is something almost magical about it.

The main appeal of local models is that with sufficiently careful prompting, they may behave well enough to be useful for specific tasks/domains, and bring all of the above benefits.
Some ideas on how you might use local LLMs: - In a multi-agent system, you could have some agents use local models for narrow tasks with a lower bar for accuracy (and fix responses with multiple tries). - You could run many instances of the same or different models and combine their responses. - Local LLMs can act as a privacy layer, to identify and handle sensitive data before passing to remote LLMs. - Some local LLMs have intriguing features, for example llama.cpp lets you constrain its output using a grammar. ## Running LLMs locally There are several ways to use LLMs locally. See the [`r/LocalLLaMA`](https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_install_llama_8bit_and_4bit/) subreddit for a wealth of information. There are open source libraries that offer front-ends to run local models, for example [`oobabooga/text-generation-webui`](https://github.com/oobabooga/text-generation-webui) (or "ooba-TGW" for short) but the focus in this tutorial is on spinning up a server that mimics an OpenAI-like API, so that any code that works with the OpenAI API (for say GPT3.5 or GPT4) will work with a local model, with just a simple change: set `openai.api_base` to the URL where the local API server is listening, typically `http://localhost:8000/v1`. There are a few libraries we recommend for setting up local models with OpenAI-like APIs: - [LiteLLM OpenAI Proxy Server](https://docs.litellm.ai/docs/proxy_server) lets you set up a local proxy server for over 100+ LLM providers (remote and local). - [ooba-TGW](https://github.com/oobabooga/text-generation-webui) mentioned above, for a variety of models, including llama2 models. - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) (LCP for short), specifically for llama2 models. - [ollama](https://github.com/jmorganca/ollama) We recommend visiting these links to see how to install and run these libraries. ## Use the local model with the OpenAI library Once you have a server running using any of the above methods, your code that works with the OpenAI models can be made to work with the local model, by simply changing the `openai.api_base` to the URL where the local server is listening. If you are using Langroid to build LLM applications, the framework takes care of the `api_base` setting in most cases, and you need to only set the `chat_model` parameter in the LLM config object for the LLM model you are using. See the [Non-OpenAI LLM tutorial](../../tutorials/non-openai-llms.md) for more details. --- --- title: 'MALADE: Multi-Agent Architecture for Pharmacovigilance' draft: false date: 2024-08-12 authors: - jihye - nils - pchalasani - mengelhard - someshjha - anivaryakumar - davidpage categories: - langroid - multi-agent - neo4j - rag comments: true --- # MALADE: Multi-Agent Architecture for Pharmacovigilance [Published in ML for HealthCare 2024](https://www.mlforhc.org/2024-abstracts) [Arxiv](https://arxiv.org/abs/2408.01869) [GitHub](https://github.com/jihyechoi77/malade) ## Summary We introduce MALADE (**M**ultiple **A**gents powered by **L**LMs for **ADE** Extraction), a multi-agent system for Pharmacovigilance. It is the first effective explainable multi-agent LLM system for extracting Adverse Drug Events (ADEs) from FDA drug labels and drug prescription data. 
Given a drug category and an adverse outcome, MALADE produces: - a qualitative label of risk (`increase`, `decrease` or `no-effect`), - confidence in the label (a number in $[0,1]$), - frequency of effect (`rare`, `common`, or `none`), - strength of evidence (`none`, `weak`, or `strong`), and - a justification with citations. This task is challenging for several reasons: - FDA labels and prescriptions are for individual drugs, not drug categories, so representative drugs in a category need to be identified from patient prescription data, and ADE information found for specific drugs in a category needs to be aggregated to make a statement about the category as a whole, - The data is noisy, with variations in the terminologies of drugs and outcomes, and - ADE descriptions are often buried in large amounts of narrative text. The MALADE architecture is LLM-agnostic and leverages the [Langroid](https://github.com/langroid/langroid) multi-agent framework. It consists of a combination of Agents using Retrieval Augmented Generation (RAG), that iteratively improve their answers based on feedback from Critic Agents. We evaluate the quantitative scores against a ground-truth dataset known as the [*OMOP Ground Truth Task*](https://www.niss.org/sites/default/files/Session3-DaveMadigan_PatrickRyanTalk_mar2015.pdf) and find that MALADE achieves state-of-the-art performance. ## Introduction In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. The focus of this paper is Pharmacovigilance, a critical task in healthcare, where the goal is to monitor and evaluate the safety of drugs. In particular, the identification of Adverse Drug Events (ADEs) is crucial for ensuring patient safety. Consider a question such as this: > What is the effect of **ACE inhibitors** on the risk of developing **angioedema**? Here the **drug category** $C$ is _ACE inhibitors_, and the **outcome** $O$ is _angioedema_. Answering this question involves several steps: - **1(a): Find all drugs** in the ACE inhibitor category $C$, e.g. by searching the FDA [National Drug Code](https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory) (NDC) database. This can be done using Elastic-Search, with filters to handle variations in drug/category names and inaccurate classifications. - **1(b): Find the prescription frequency** of each drug in $C$ from patient prescription data, e.g. the [MIMIC-IV](https://physionet.org/content/mimiciv/3.0/) database. This can be done with a SQL query. - **1(c): Identify the representative drugs** $D \subset C$ in this category, based on prescription frequency data from step 2. - **2:** For each drug $d \in D$, **summarize ADE information** about the effect of $d$ on the outcome $O$ of interest, (in this case angioedema) from text-based pharmaceutical sources, e.g. the [OpenFDA Drug Label](https://open.fda.gov/apis/drug/label/) database. - **3: Aggregate** the information from all drugs in $D$ to make a statement about the category $C$ as a whole. 
## The role of LLMs While steps 1(a) and 1(b) can be done by straightforward deterministic algorithms (SQL queries or Elastic-Search), the remaining steps are challenging but ideally suited to LLMs: ### Step 1(c): Identifying representative drugs in a category from prescription frequency data (`DrugFinder` Agent) This is complicated by noise, such as the same drug appearing multiple times under different names, formulations or delivery methods (For example, the ACE inhibitor **Lisinopril** is also known as **Zestril** and **Prinivil**.) Thus a judgment must be made as to whether these are sufficiently different to be considered pharmacologically distinct; and some of these drugs may not actually belong to the category. This task thus requires a grouping operation, related to the task of identifying standardized drug codes from text descriptions, well known to be challenging. This makes it very difficult to explicitly define the algorithm in a deterministic manner that covers all edge cases (unlike the above database tasks), and hence is well-suited to LLMs, particularly those such as GPT-4, Claude3.5, and similar-strength variants which are known to have been trained on vast amounts of general medical texts. In MALADE, this task is handled by the `DrugFinder` agent, which is an Agent/Critic system where the main agent iteratively improves its output in a feedback loop with the Critic agent. For example, the Critic corrects the Agent when it incorrectly classifies drugs as pharmacologically distinct. ### Step 2: Identifying Drug-Outcome Associations (`DrugOutcomeInfoAgent`) The task here is to identify whether a given drug has an established effect on the risk of a given outcome, based on FDA drug label database, and output a summary of relevant information, including the level of identified risk and the evidence for such an effect. Since this task involves extracting information from narrative text, it is well-suited to LLMs using the Retrieval Augmented Generation (RAG) technique. In MALADE, the `DrugOutcomeInfoAgent` handles this task, and is also an Agent/Critic system, where the Critic provides feedback and corrections to the Agent's output. This agent does not have direct access to the FDA Drug Label data, but can receive this information via another agent, `FDAHandler`. FDAHandler is equipped with **tools** (also known as function-calls) to invoke the OpenFDA API for drug label data, and answers questions in the context of information retrieved based on the queries. Information received from this API is ingested into a vector database, so the agent first uses a tool to query this vector database, and only resorts to the OpenFDA API tool if the vector database does not contain the relevant information. An important aspect of this agent is that its responses include specific **citations** and **excerpts** justifying its conclusions. ### Step 3: Labeling Drug Category-Outcome Associations (`CategoryOutcomeRiskAgent`) To identify association between a drug category C and an adverse health outcome $O$, we concurrently run a batch of queries to copies of `DrugOutcomeInfoAgent`, one for each drug $d$ in the representative-list $D$ for the category, of the form: > Does drug $d$ increase or decrease the risk of condition $O$? The results are sent to `CategoryOutcomeRiskAgent`, which is an Agent/Critic system which performs the final classification step; its goal is to generate the qualitative and quantitative outputs mentioned above. 
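Since the per-drug queries in this step are independent of one another, they can be issued as a batch. The snippet below is a hypothetical simplification, not MALADE's actual code: it uses plain Langroid agents with a placeholder drug list and prompt, whereas the real `DrugOutcomeInfoAgent` copies are RAG-equipped Agent/Critic systems and are run concurrently (Langroid also provides batch utilities for running many copies of a task).

```python
import langroid as lr
import langroid.language_models as lm

# Illustrative placeholders; MALADE's real agents are RAG-equipped Agent/Critic systems.
llm_config = lm.OpenAIGPTConfig(chat_model="gpt-4o")
representative_drugs = ["lisinopril", "enalapril", "ramipril"]  # hypothetical set D
outcome = "angioedema"

drug_reports = {}
for drug in representative_drugs:
    # A fresh agent per drug, so each query starts from a clean history
    agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm_config))
    task = lr.Task(agent, interactive=False, single_round=True)
    result = task.run(
        f"Does drug {drug} increase or decrease the risk of condition {outcome}?"
    )
    drug_reports[drug] = result.content if result is not None else ""

# drug_reports is then aggregated by the category-level labeling step
```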
## MALADE Architecture The figure below illustrates how the MALADE architecture handles the query, > What is the effect of **ACE inhibitors** on the risk of developing **angioedema**? ![malade-arch.png](figures/malade-arch.png) The query triggers a sequence of subtasks performed by the three Agents described above: `DrugFinder`, `DrugOutcomeInfoAgent`, and `CategoryOutcomeRiskAgent`. Each Agent generates a response and justification, which are validated by a corresponding Critic agent, whose feedback is used by the Agent to revise its response. ## Evaluation ### OMOP Ground Truth We evaluate the results of MALADE against a well-established ground-truth dataset, the [OMOP ADE ground-truth table](https://www.niss.org/sites/default/files/Session3-DaveMadigan_PatrickRyanTalk_mar2015.pdf), shown below. This is a reference dataset within the Observational Medical Outcomes Partnership (OMOP) Common Data Model that contains validated information about known adverse drug events. ![omop-ground-truth.png](figures/omop-ground-truth.png) ### Confusion Matrix Below is a side-by-side comparison of this ground-truth dataset (left) with MALADE's labels (right), ignoring blue cells (see the paper for details): ![omop-results.png](figures/omop-results.png) The resulting confusion-matrix for MALADE is shown below: ![confusion.png](figures/confusion.png) ### AUC Metric Since MALADE produces qualitative and quantitative outputs, the paper explores a variety of ways to evaluate its performance against the OMOP ground-truth dataset. Here we focus on the label output $L$ (i.e. `increase`, `decrease`, or `no-effect`), and its associated confidence score $c$, and use the Area Under the ROC Curve (AUC) as the evaluation metric. The AUC metric is designed for binary classification, so we transform the three-class label output $L$ and confidence score $c$ to a binary classification score $p$ as follows. We treat $L$ = `increase` as the positive class, and $L$ = `decrease` or `no-effect` as the negative class, and we transform the label confidence score $c$ into a probability $p$ of `increase` as follows: - if the label output is `increase`, $p = (2+c)/3$, - if the label output is `no-effect`, $p = (2-c)/3$, and - if the label output is `decrease` , $p = (1-c)/3$. These transformations align with two intuitions: (a) a *higher* confidence in `increase` corresponds to a *higher* probability of `increase`, and a *higher* confidence in `no-effect` or `decrease` corresponds to a *lower* probability of `increase`, and (b) for a given confidence score $c$, the progression of labels `decrease`, `no-effect`, and `increase` corresponds to *increasing* probabilities of `increase`. The above transformations ensure that the probability $p$ is in the range $[0,1]$ and scales linearly with the confidence score $c$. We ran the full MALADE system for all drug-category/outcome pairs in the OMOP ground-truth dataset, and then computed the AUC for the score $p$ against the ground-truth binary classification label. With `GPT-4-Turbo` we obtained an AUC of 0.85, while `GPT-4o` resulted in an AUC of 0.90. These are state-of-the-art results for this specific ADE-extraction task. ### Ablations An important question the paper investigates is whether (and how much) the various components (RAG, critic agents, etc) contribute to MALADE's performance. To answer this, we perform ablations, where we remove one or more components from the MALADE system and evaluate the performance of the resulting system. 
For example, we found that dropping the Critic agents reduces the AUC (using `GPT-4-Turbo`) from 0.85 to 0.82 (see the paper, Appendix D, for more ablation results).

### Variance of LLM-generated Scores

When using an LLM to generate numerical scores, it is important to understand the variance in the scores. For example, if a single "full" run of MALADE (i.e. for all drug-category/outcome pairs in the OMOP ground-truth dataset) produces a certain AUC, was it a "lucky" run, or is the AUC relatively stable across runs? Ideally one would investigate this by repeating the full run of MALADE many times, but given the expense of running a full experiment, we focus on just three representative cells in the OMOP table, one corresponding to each possible ground-truth label, run MALADE 10 times for each cell, and study the distribution of $p$ (the probability of increased risk, translated from the confidence score using the method described above) for each output label.

Encouragingly, we find that the distribution of $p$ shows clear separation between the three labels, as in the figure below (the $x$-axis ranges from 0 to 1, and the three colored groups of bars represent, from left to right, the `decrease`, `no-effect`, and `increase` labels). Full details are in Appendix D of the paper.

![img.png](figures/variance-histogram.png)

---

---
title: 'Multi Agent Debate and Education Platform'
draft: false
date: 2025-02-04
authors:
  - adamshams
categories:
  - langroid
  - llm
  - local-llm
  - chat
comments: true
---

## Introduction

Have you ever imagined a world where we can debate complex issues with Generative AI agents taking a distinct stance and backing their arguments with evidence? Some will change your mind, and some will reveal the societal biases that each Large Language Model (LLM) was trained on. Introducing an [AI-powered debate platform](https://github.com/langroid/langroid/tree/main/examples/multi-agent-debate) that brings this imagination to reality, leveraging diverse LLMs and the Langroid multi-agent programming framework. The system enables users to engage in structured debates with an AI taking the opposite stance (or even two AIs debating each other), using a multi-agent architecture with Langroid's powerful framework, where each agent embodies a specific ethical perspective, creating realistic and dynamic interactions. Agents are prompt-engineered and role-tuned to align with their assigned ethical stance, ensuring thoughtful and structured debates.

My motivations for creating this platform included:

- A debate coach for underserved students without access to traditional resources.
- A tool for research and for generating arguments from authentic sources.
- An adaptable education platform for learning both sides of the coin on any topic.
- Reducing the echo chambers perpetuated by online algorithms by fostering two-sided debates on any topic, promoting education and awareness around misinformation.
- A research tool to study the varieties of biases in LLMs, which are often trained on text reflecting societal biases.
- Identifying a good multi-agent framework designed for programming with LLMs.

## Platform Features:

### Dynamic Agent Generation:

The platform features five types of agents: Pro, Con, Feedback, Research, and Retrieval Augmented Generation (RAG) Q&A. Each agent is dynamically generated using role-tuned and engineered prompts, ensuring diverse and engaging interactions.
#### Pro and Con Agents:

These agents engage in the core debate, arguing for and against the chosen topic. Their prompts are carefully engineered to ensure they stay true to their assigned ethical stance.

#### Feedback Agent:

This agent provides real-time feedback on the arguments and declares a winner. The evaluation criteria are based on the well-known Lincoln–Douglas debate format, and include:

- Clash of Values
- Argumentation
- Cross-Examination
- Rebuttals
- Persuasion
- Technical Execution
- Adherence to Debate Etiquette
- Final Focus

#### Research Agent:

This agent has the following functionalities:

- Utilizes the `MetaphorSearchTool` and the `Metaphor` (now called `Exa`) Search API to conduct web searches, combined with Retrieval Augmented Generation (RAG), to surface relevant web references for user education about the selected topic.
- Produces a summary of arguments for and against the topic.
- RAG-based document chat with the resources identified through Web Search.

#### RAG Q&A Agent:

- Provides Q&A capability using a RAG-based chat interaction with the resources identified through Web Search. The agent utilizes the `DocChatAgent` that is part of the Langroid framework, which orchestrates all LLM interactions.
- Rich chunking parameters allow the user to get optimized relevance results. Check out `config.py` for details.

### Topic Adaptability:

Easily adaptable to any subject by simply adding pro and con system messages. This makes it a versatile tool for exploring diverse topics and fostering critical thinking. Default topics cover ethics and the use of AI for the following:

- Healthcare
- Intellectual property
- Societal biases
- Education

### Autonomous or Interactive:

Engage in a manual debate with a pro or con agent, or watch the agents debate autonomously while adjusting the number of turns.

### Diverse LLM Selection Adaptable per Agent:

Configurable to select from diverse commercial and open-source models (OpenAI, Google, and Mistral) to experiment with responses from diverse perspectives. Users can select a unique LLM for each agent.

### LLM Tool/Function Integration:

Utilizes LLM tool/function features to conduct semantic search using the Metaphor Search API and summarize the pro and con perspectives for education.

### Configurable LLM Parameters:

LLM parameters like temperature and minimum/maximum output tokens are configurable, allowing for customization of the AI's responses. For Q&A with the searched resources, several parameters can be tuned in the `config` to enhance response relevance.

### Modular Design:

Reusable code, modularized for other LLM applications.

## Interaction

1. Decide if you want to use the same LLM for all agents or different ones.
2. Decide if you want an autonomous debate between AI agents, or user vs. AI agent.
3. Select a debate topic.
4. Choose your side (Pro or Con).
5. Engage in a debate by providing arguments and receiving responses from agents.
6. Request feedback at any time by typing `f`.
7. Decide if you want Metaphor Search to run, to find topic-relevant web links and summarize them.
8. Decide if you want to chat with the documents extracted from the URLs found, to learn more about the topic.
9. End the debate manually by typing `done`. If you decide to chat with the documents, you can end that session by typing `x`.

## Why was Langroid chosen?

I chose the Langroid framework because it's a principled multi-agent programming framework inspired by the Actor framework.
Prior to using Langroid, I developed a multi-agent debate system; however, I had to write a lot of tedious code to manage the states of communication between debating agents, and the user's interactions with LLMs. Langroid allowed me to seamlessly integrate multiple LLMs, easily create agents and tasks, and attach sub-tasks.

### Agent Creation Code Example

```python
from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig
from langroid.language_models.openai_gpt import OpenAIGPTConfig


def create_chat_agent(
    name: str, llm_config: OpenAIGPTConfig, system_message: str
) -> ChatAgent:
    return ChatAgent(
        ChatAgentConfig(
            llm=llm_config,
            name=name,
            system_message=system_message,
        )
    )
```

#### Sample Pro Topic Agent Creation

```python
pro_agent = create_chat_agent(
    "Pro",
    pro_agent_config,
    system_messages.messages[pro_key].message + DEFAULT_SYSTEM_MESSAGE_ADDITION,
)
```

The `Task` mechanism in Langroid provides a robust way to manage complex interactions within multi-agent systems. `Task` serves as a container for managing the flow of interactions between different agents (such as chat agents) and attached sub-tasks. `Task` also helps with turn-taking, handling responses, and ensuring smooth transitions between dialogue states. Each Task object is responsible for coordinating responses from its assigned agent, deciding the sequence of responder methods (`llm_response`, `user_response`, `agent_response`), and managing transitions between different stages of a conversation or debate. Each agent can focus on its specific role while the task structure handles the overall process's orchestration and flow, allowing a clear separation of concerns. The architecture and code transparency of Langroid's framework make it an incredible candidate for applications like debates, where multiple agents must interact dynamically and responsively based on a mixture of user inputs and automated responses.

### Task creation and Orchestration Example

```python
user_task = Task(user_agent, interactive=interactive_setting, restart=False)
ai_task = Task(ai_agent, interactive=False, single_round=True)
user_task.add_sub_task(ai_task)
if not llm_delegate:
    user_task.run(user_agent.user_message, turns=max_turns)
else:
    user_task.run("get started", turns=max_turns)
```

Tasks can be easily set up as sub-tasks of an orchestrating agent. In this case `user_task` could be Pro or Con, depending on the user's selection. If you want to build custom tools/functions, or use the ones Langroid provides, enabling one is only a line of code using `agent.enable_message`. Here is an example with `MetaphorSearchTool` and `DoneTool`:

```python
metaphor_search_agent.enable_message(MetaphorSearchTool)
metaphor_search_agent.enable_message(DoneTool)
```

Overall, I had a great learning experience using Langroid and recommend it for any project that needs to utilize LLMs. I am already working on a few Langroid-based information retrieval and research systems for use in medicine, and hope to contribute more soon.

### Bio

I'm a high school senior at Khan Lab School in Mountain View, CA, where I host a student-run podcast known as the Khan-Cast. I also enjoy tinkering with interdisciplinary STEM projects. You can reach me on [LinkedIn](https://www.linkedin.com/in/adamshams/).

---

---
draft: true
date: 2022-01-31
authors:
  - pchalasani
categories:
  - test
  - blog
comments: true
---

# Test code snippets

```python
from langroid.language_models.base import LLMMessage, Role

msg = LLMMessage(
    content="What is the capital of Bangladesh?",
    role=Role.USER,
)
```

# Test math notation

A nice equation is $e^{i\pi} + 1 = 0$, which is known as Euler's identity.
Here is a cool equation too, and in display mode: $$ e = mc^2 $$ # Latex with newlines Serious latex with `\\` for newlines renders fine: $$ \begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} $$ or a multi-line equation $$ \begin{aligned} \dot{x} & = \sigma(y-x) \\ \dot{y} & = \rho x - y - xz \\ \dot{z} & = -\beta z + xy \end{aligned} $$ --- # Audience Targeting for a Business Suppose you are a marketer for a business, trying to figure out which audience segments to target. Your downstream systems require that you specify _standardized_ audience segments to target, for example from the [IAB Audience Taxonomy](https://iabtechlab.com/standards/audience-taxonomy/). There are thousands of standard audience segments, and normally you would need to search the list for potential segments that match what you think your ideal customer profile is. This is a tedious, error-prone task. But what if we can leverage an LLM such as GPT-4? We know that GPT-4 has skills that are ideally suited for this task: - General knowledge about businesses and their ideal customers - Ability to recognize which standard segments match an English description of a customer profile - Ability to plan a conversation to get the information it needs to answer a question Once you decide to use an LLM, you still need to figure out how to organize the various components of this task: - **Research:** What are some ideal customer profiles for the business - **Segmentation:** Which standard segments match an English description of a customer profile - **Planning:** how to organize the task to identify a few standard segments ## Using Langroid Agents Langroid makes it intuitive and simple to build an LLM-powered system organized around agents, each responsible for a different task. In less than a day we built a 3-agent system to automate this task: - The `Marketer` Agent is given the Planning role. - The `Researcher` Agent is given the Research role, and it has access to the business description. - The `Segmentor` Agent is given the Segmentation role. It has access to the IAB Audience Taxonomy via a vector database, i.e. its rows have been mapped to vectors via an embedding model, and these vectors are stored in a vector-database. Thus given an English description of a customer profile, the `Segmentor` Agent maps it to a vector using the embedding model, and retrieves the nearest (in vector terms, e.g. cosine similarity) IAB Standard Segments from the vector-database. The Segmentor's LLM further refines this by selecting the best-matching segments from the retrieved list. To kick off the system, the human user describes a business in English, or provides the URL of the business's website. The `Marketer` Agent sends customer profile queries to the `Researcher`, who answers in plain English based on the business description, and the Marketer takes this description and sends it to the Segmentor, who maps it to Standard IAB Segments. The task is done when the Marketer finds 4 Standard segments. The agents are depicted in the diagram below: ![targeting.png](targeting.png) ## An example: Glashutte Watches The human user first provides the URL of the business, in this case: ```text https://www.jomashop.com/glashutte-watches.html ``` From this URL, the `Researcher` agent summarizes its understanding of the business. The `Marketer` agent starts by asking the `Researcher`: ``` Could you please describe the age groups and interests of our typical customer? 
``` The `Researcher` responds with an English description of the customer profile: ```text Our typical customer is a fashion-conscious individual between 20 and 45 years... ``` The `Researcher` forwards this English description to the `Segmentor` agent, who maps it to a standardized segment, e.g.: ```text Interest|Style & Fashion|Fashion Trends ... ``` This conversation continues until the `Marketer` agent has identified 4 standardized segments. Here is what the conversation looks like: ![targeting.gif](targeting.gif) --- # Hierarchical computation with Langroid Agents Here is a simple example showing tree-structured computation where each node in the tree is handled by a separate agent. This is a toy numerical example, and illustrates: - how to have agents organized in a hierarchical structure to accomplish a task - the use of global state accessible to all agents, and - the use of tools/function-calling. ## The Computation We want to carry out the following calculation for a given input number $n$: ```python def Main(n): if n is odd: return (3*n+1) + n else: if n is divisible by 10: return n/10 + n else: return n/2 + n ``` ## Using function composition Imagine we want to do this calculation using a few auxiliary functions: ```python def Main(n): # return non-null value computed by Odd or Even Record n as global variable # to be used by Adder below return Odd(n) or Even(n) def Odd(n): # Handle odd n if n is odd: new = 3*n+1 return Adder(new) else: return None def Even(n): # Handle even n: return non-null value computed by EvenZ or EvenNZ return EvenZ(n) or EvenNZ(n) def EvenZ(n): # Handle even n divisible by 10, i.e. ending in Zero if n is divisible by 10: new = n/10 return Adder(new) else: return None def EvenNZ(n): # Handle even n not divisible by 10, i.e. not ending in Zero if n is not divisible by 10: new = n/2 return Adder(new) else: return None def Adder(new): # Add new to starting number, available as global variable n return new + n ``` ## Mapping to a tree structure This compositional/nested computation can be represented as a tree: ```plaintext Main / \ Even Odd / \ \ EvenZ EvenNZ Adder | | Adder Adder ``` Let us specify the behavior we would like for each node, in a "decoupled" way, i.e. we don't want a node to be aware of the other nodes. As we see later, this decoupled design maps very well onto Langroid's multi-agent task orchestration. To completely define the node behavior, we need to specify how it handles an "incoming" number $n$ (from a parent node or user), and how it handles a "result" number $r$ (from a child node). - `Main`: - incoming $n$: simply send down $n$, record the starting number $n_0 = n$ as a global variable. - result $r$: return $r$. - `Odd`: - incoming $n$: if n is odd, send down $3*n+1$, else return None - result $r$: return $r$ - `Even`: - incoming $n$: if n is even, send down $n$, else return None - result $r$: return $r$ - `EvenZ`: (guaranteed by the tree hierarchy, to receive an even number.) - incoming $n$: if n is divisible by 10, send down $n/10$, else return None - result $r$: return $r$ - `EvenNZ`: (guaranteed by the tree hierarchy, to receive an even number.) - incoming $n$: if n is not divisible by 10, send down $n/2$, else return None - result $r$: return $r$ - `Adder`: - incoming $n$: return $n + n_0$ where $n_0$ is the starting number recorded by Main as a global variable. - result $r$: Not applicable since `Adder` is a leaf node. 
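For reference, here is a small runnable version of the computation above in plain Python. This is a hypothetical sketch: function names are lowercased, the pseudocode's "is odd"/"is divisible by" checks become explicit modulo tests, integer division is assumed, and a module-level variable stands in for the global state.

```python
def main(n: int) -> int:
    global n0
    n0 = n  # record the starting number, used by adder()
    result = odd(n)
    return result if result is not None else even(n)

def odd(n: int):
    # Handle odd n
    return adder(3 * n + 1) if n % 2 == 1 else None

def even(n: int):
    # Handle even n: return the non-null value computed by even_z or even_nz
    result = even_z(n)
    return result if result is not None else even_nz(n)

def even_z(n: int):
    # Handle even n divisible by 10, i.e. ending in zero
    return adder(n // 10) if n % 10 == 0 else None

def even_nz(n: int):
    # Handle even n not divisible by 10
    return adder(n // 2) if n % 10 != 0 else None

def adder(new: int) -> int:
    # Add the new number to the starting number n0
    return new + n0

print(main(12))  # 12 is even, not divisible by 10: 12/2 + 12 = 18
```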
## From tree nodes to Langroid Agents Let us see how we can perform this calculation using multiple Langroid agents, where - we define an agent corresponding to each of the nodes above, namely `Main`, `Odd`, `Even`, `EvenZ`, `EvenNZ`, and `Adder`. - we wrap each Agent into a Task, and use the `Task.add_subtask()` method to connect the agents into the desired hierarchical structure. Below is one way to do this using Langroid. We designed this with the following desirable features: - Decoupling: Each agent is instructed separately, without mention of any other agents (E.g. Even agent does not know about Odd Agent, EvenZ agent, etc). In particular, this means agents will not be "addressing" their message to specific other agents, e.g. send number to Odd agent when number is odd, etc. Allowing addressing would make the solution easier to implement, but would not be a decoupled solution. Instead, we want Agents to simply put the number "out there", and have it handled by an applicable agent, in the task loop (which consists of the agent's responders, plus any sub-task `run` methods). - Simplicity: Keep the agent instructions relatively simple. We would not want a solution where we have to instruct the agents (their LLMs) in convoluted ways. One way naive solutions fail is because agents are not able to distinguish between a number that is being "sent down" the tree as input, and a number that is being "sent up" the tree as a result from a child node. We use a simple trick: we instruct the LLM to mark returned values using the RESULT keyword, and instruct the LLMs on how to handle numbers that come with RESULT keyword, and those that don't In addition, we leverage some features of Langroid's task orchestration: - When `llm_delegate` is `True`, if the LLM says `DONE [rest of msg]`, the task is considered done, and the result of the task is `[rest of msg]` (i.e the part after `DONE`). - In the task loop's `step()` function (which seeks a valid message during a turn of the conversation) when any responder says `DO-NOT-KNOW`, it is not considered a valid message, and the search continues to other responders, in round-robin fashion. See the [`chat-tree.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree.py) example for an implementation of this solution. You can run that example as follows: ```bash python3 examples/basic/chat-tree.py ``` In the sections below we explain the code in more detail. ## Define the agents Let us start with defining the configuration to be used by all agents: ```python from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig from langroid.language_models.openai_gpt import OpenAIChatModel, OpenAIGPTConfig config = ChatAgentConfig( llm=OpenAIGPTConfig( chat_model=OpenAIChatModel.GPT4o, ), vecdb=None, # no need for a vector database ) ``` Next we define each of the agents, for example: ```python main_agent = ChatAgent(config) ``` and similarly for the other agents. ## Wrap each Agent in a Task To allow agent interactions, the first step is to wrap each agent in a Task. When we define the task, we pass in the instructions above as part of the system message. Recall the instructions for the `Main` agent: - `Main`: - incoming $n$: simply send down $n$, record the starting number $n_0 = n$ as a global variable. - result $r$: return $r$. We include the equivalent of these instructions in the `main_task` that wraps the `main_agent`: ```python from langroid.agent.task import Task main_task = Task( main_agent, name="Main", interactive=False, #(1)! 
system_message=""" You will receive two types of messages, to which you will respond as follows: INPUT Message format: In this case simply write the , say nothing else. RESULT Message format: RESULT In this case simply say "DONE ", e.g.: DONE 19 To start off, ask the user for the initial number, using the `ask_num` tool/function. """, llm_delegate=True, # allow LLM to control end of task via DONE single_round=False, ) ``` 1. Non-interactive: don't wait for user input in each turn There are a couple of points to highlight about the `system_message` value in this task definition: - When the `Main` agent receives just a number, it simply writes out that number, and in the Langroid Task loop, this number becomes the "current pending message" to be handled by one of the sub-tasks, i.e. `Even, Odd`. Note that these sub-tasks are _not_ mentioned in the system message, consistent with the decoupling principle. - As soon as either of these sub-tasks returns a non-Null response, in the format "RESULT ", the `Main` agent is instructed to return this result saying "DONE ". Since `llm_delegate` is set to `True` (meaning the LLM can decide when the task has ended), this causes the `Main` task to be considered finished and the task loop is exited. Since we want the `Main` agent to record the initial number as a global variable, we use a tool/function `AskNum` defined as follows (see [this section](../quick-start/chat-agent-tool.md) in the getting started guide for more details on Tools): ```python from rich.prompt import Prompt from langroid.agent.tool_message import ToolMessage class AskNumTool(ToolMessage): request = "ask_num" purpose = "Ask user for the initial number" def handle(self) -> str: """ This is a stateless tool (i.e. does not use any Agent member vars), so we can define the handler right here, instead of defining an `ask_num` method in the agent. """ num = Prompt.ask("Enter a number") # record this in global state, so other agents can access it MyGlobalState.set_values(number=num) return str(num) ``` We then enable the `main_agent` to use and handle messages that conform to the `AskNum` tool spec: ```python main_agent.enable_message(AskNumTool) ``` !!! tip "Using and Handling a tool/function" "Using" a tool means the agent's LLM _generates_ the function-call (if using OpenAI function-calling) or the JSON structure (if using Langroid's native tools mechanism) corresponding to this tool. "Handling" a tool refers to the Agent's method recognizing the tool and executing the corresponding code. The tasks for other agents are defined similarly. We will only note here that the `Adder` agent needs a special tool `AddNumTool` to be able to add the current number to the initial number set by the `Main` agent. ## Connect the tasks into a tree structure So far, we have wrapped each agent in a task, in isolation, and there is no connection between the tasks. The final step is to connect the tasks to the tree structure we saw earlier: ```python main_task.add_sub_task([even_task, odd_task]) even_task.add_sub_task([evenz_task, even_nz_task]) evenz_task.add_sub_task(adder_task) even_nz_task.add_sub_task(adder_task) odd_task.add_sub_task(adder_task) ``` Now all that remains is to run the main task: ```python main_task.run() ``` Here is what a run starting with $n=12$ looks like: ![chat-tree.png](chat-tree.png) --- # Guide to examples in `langroid-examples` repo !!! warning "Outdated" This guide is from Feb 2024; there have been numerous additional examples since then. 
We recommend you visit the `examples` folder in the core `langroid` repo for the most up-to-date examples. These examples are periodically copied over to the `examples` folder in the `langroid-examples` repo. The [`langroid-examples`](https://github.com/langroid/langroid-examples) repo contains several examples of using the [Langroid](https://github.com/langroid/langroid) agent-oriented programming framework for LLM applications. Below is a guide to the examples. First please ensure you follow the installation instructions in the `langroid-examples` repo README. **At minimum a GPT4-compatible OpenAI API key is required.** As currently set up, many of the examples will _not_ work with a weaker model. Weaker models may require more detailed or different prompting, and possibly a more iterative approach with multiple agents to verify and retry, etc — this is on our roadmap. All the example scripts are meant to be run on the command line. In each script there is a description and sometimes instructions on how to run the script. NOTE: When you run any script, it pauses for “human” input at every step, and depending on the context, you can either hit enter to continue, or in case there is a question/response expected from the human, you can enter your question or response and then hit enter. ### Basic Examples - [`/examples/basic/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/chat.py) This is a basic chat application. - Illustrates Agent task loop. - [`/examples/basic/autocorrect.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/autocorrect.py) Chat with autocorrect: type fast and carelessly/lazily and the LLM will try its best to interpret what you want, and offer choices when confused. - Illustrates Agent task loop. - [`/examples/basic/chat-search.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/chat-search.py) This uses a `GoogleSearchTool` function-call/tool to answer questions using a google web search if needed. Try asking questions about facts known after Sep 2021 (GPT4 training cutoff), like `when was llama2 released` - Illustrates Agent + Tools/function-calling + web-search - [`/examples/basic/chat-tree.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/chat-tree.py) is a toy example of tree-structured multi-agent computation, see a detailed writeup [here.](https://langroid.github.io/langroid/examples/agent-tree/) - Illustrates multi-agent task collaboration, task delegation. ### Document-chat examples, or RAG (Retrieval Augmented Generation) - [`/examples/docqa/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py) is a document-chat application. Point it to local file, directory or web url, and ask questions - Illustrates basic RAG - [`/examples/docqa/chat-search.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat-search.py): ask about anything and it will try to answer based on docs indexed in vector-db, otherwise it will do a Google search, and index the results in the vec-db for this and later answers. - Illustrates RAG + Function-calling/tools - [`/examples/docqa/chat_multi.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi.py): — this is a 2-agent system that will summarize a large document with 5 bullet points: the first agent generates questions for the retrieval agent, and is done when it gathers 5 key points. 
- Illustrates 2-agent collaboration + RAG to summarize a document - [`/examples/docqa/chat_multi_extract.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py): — extracts structured info from a lease document: Main agent asks questions to a retrieval agent. - Illustrates 2-agent collaboration, RAG, Function-calling/tools, Structured Information Extraction. ### Data-chat examples (tabular, SQL) - [`/examples/data-qa/table_chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/data-qa/table_chat): - point to a URL or local csv file and ask questions. The agent generates pandas code that is run within langroid. - Illustrates function-calling/tools and code-generation - [`/examples/data-qa/sql-chat/sql_chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/data-qa/sql-chat/sql_chat.py): — chat with a sql db — ask questions in English, it will generate sql code to answer them. See [tutorial here](https://langroid.github.io/langroid/tutorials/postgresql-agent/) - Illustrates function-calling/tools and code-generation --- # Langroid: Harness LLMs with Multi-Agent Programming ## The LLM Opportunity Given the remarkable abilities of recent Large Language Models (LLMs), there is an unprecedented opportunity to build intelligent applications powered by this transformative technology. The top question for any enterprise is: how best to harness the power of LLMs for complex applications? For technical and practical reasons, building LLM-powered applications is not as simple as throwing a task at an LLM-system and expecting it to do it. ## Langroid's Multi-Agent Programming Framework Effectively leveraging LLMs at scale requires a *principled programming framework*. In particular, there is often a need to maintain multiple LLM conversations, each instructed in different ways, and "responsible" for different aspects of a task. An *agent* is a convenient abstraction that encapsulates LLM conversation state, along with access to long-term memory (vector-stores) and tools (a.k.a functions or plugins). Thus a **Multi-Agent Programming** framework is a natural fit for complex LLM-based applications. > Langroid is the first Python LLM-application framework that was explicitly designed with Agents as first-class citizens, and Multi-Agent Programming as the core design principle. The framework is inspired by ideas from the [Actor Framework](https://en.wikipedia.org/wiki/Actor_model). Langroid allows an intuitive definition of agents, tasks and task-delegation among agents. There is a principled mechanism to orchestrate multi-agent collaboration. Agents act as message-transformers, and take turns responding to (and transforming) the current message. The architecture is lightweight, transparent, flexible, and allows other types of orchestration to be implemented; see the (WIP) [langroid architecture document](blog/posts/langroid-architecture.md). Besides Agents, Langroid also provides simple ways to directly interact with LLMs and vector-stores. See the Langroid [quick-tour](tutorials/langroid-tour.md). ## Highlights - **Agents as first-class citizens:** The `Agent` class encapsulates LLM conversation state, and optionally a vector-store and tools. Agents are a core abstraction in Langroid; Agents act as _message transformers_, and by default provide 3 _responder_ methods, one corresponding to each entity: LLM, Agent, User. 
- **Tasks:** A Task class wraps an Agent, gives the agent instructions (or roles, or goals), manages iteration over an Agent's responder methods, and orchestrates multi-agent interactions via hierarchical, recursive task-delegation. The `Task.run()` method has the same type-signature as an Agent's responder methods, and this is key to how a task of an agent can delegate to other sub-tasks: from the point of view of a Task, sub-tasks are simply additional responders, to be used in a round-robin fashion after the agent's own responders.
- **Modularity, Reusability, Loose coupling:** The `Agent` and `Task` abstractions allow users to design Agents with specific skills, wrap them in Tasks, and combine tasks in a flexible way.
- **LLM Support**: Langroid works with practically any LLM, local/open or remote/proprietary/API-based, via a variety of libraries and providers. See guides to using [local LLMs](tutorials/local-llm-setup.md) and [non-OpenAI LLMs](tutorials/non-openai-llms.md). See [Supported LLMs](tutorials/supported-models.md).
- **Caching of LLM prompts, responses:** Langroid by default uses [Redis](https://redis.com/try-free/) for caching.
- **Vector-stores**: [Qdrant](https://qdrant.tech/), [Chroma](https://www.trychroma.com/) and [LanceDB](https://www.lancedb.com/) are currently supported. Vector stores allow for Retrieval-Augmented-Generation (RAG).
- **Grounding and source-citation:** Access to external documents via vector-stores allows for grounding and source-citation.
- **Observability, Logging, Lineage:** Langroid generates detailed logs of multi-agent interactions and maintains provenance/lineage of messages, so that you can trace back the origin of a message.
- **Tools/Plugins/Function-calling**: Langroid supports OpenAI's recently released [function calling](https://platform.openai.com/docs/guides/gpt/function-calling) feature. In addition, Langroid has its own native equivalent, which we call **tools** (also known as "plugins" in other contexts). Function calling and tools have the same developer-facing interface, implemented using [Pydantic](https://docs.pydantic.dev/latest/), which makes it very easy to define tools/functions and enable agents to use them. Benefits of using Pydantic are that you never have to write complex JSON specs for function calling, and when the LLM hallucinates malformed JSON, the Pydantic error message is sent back to the LLM so it can fix it!

Don't worry if some of these terms are not clear to you. The [Getting Started Guide](quick-start/index.md) and subsequent pages will help you get up to speed.

---

# Suppressing output in async, streaming mode

Available since version 0.18.0

When using an LLM API in streaming + async mode, you may want to suppress output, especially when concurrently running multiple instances of the API. To suppress output in async + stream mode, you can set the `async_stream_quiet` flag in [`LLMConfig`][langroid.language_models.base.LLMConfig] to `True` (this is the default). Note that [`OpenAIGPTConfig`][langroid.language_models.openai_gpt.OpenAIGPTConfig] inherits from `LLMConfig`, so you can use this flag with `OpenAIGPTConfig` as well:

```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    async_stream_quiet=True,
    ...
)
```

---

# Azure OpenAI Models

To use OpenAI models deployed on Azure, first ensure a few environment variables are defined (either in your `.env` file or in your environment):

- `AZURE_OPENAI_API_KEY`, from the value of `API_KEY`
- `AZURE_OPENAI_API_BASE`, from the value of `ENDPOINT`; this typically looks like `https://your_resource.openai.azure.com`.
- For `AZURE_OPENAI_API_VERSION`, you can use the default value in `.env-template`, and the latest version can be found [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new#azure-openai-chat-completion-general-availability-ga)
- `AZURE_OPENAI_DEPLOYMENT_NAME` is an OPTIONAL deployment name which may be defined by the user during the model setup.
- `AZURE_OPENAI_CHAT_MODEL`: Azure OpenAI allows specific model names when you select the model for your deployment. You must use exactly the model name that was selected. For example, GPT-3.5 (should be `gpt-35-turbo-16k` or `gpt-35-turbo`) or GPT-4 (should be `gpt-4-32k` or `gpt-4`).
- `AZURE_OPENAI_MODEL_NAME` (Deprecated, use `AZURE_OPENAI_CHAT_MODEL` instead).

This page [Microsoft Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-python#environment-variables) provides more information on how to obtain these values.

To use an Azure-deployed model in Langroid, you can use the `AzureConfig` class:

```python
import langroid.language_models as lm
import langroid as lr

llm_config = lm.AzureConfig(
    chat_model="gpt-4o"
    # the other settings can be provided explicitly here,
    # or are obtained from the environment
)

llm = lm.AzureGPT(config=llm_config)

response = llm.chat(
    messages=[
        lm.LLMMessage(role=lm.Role.SYSTEM, content="You are a helpful assistant."),
        lm.LLMMessage(role=lm.Role.USER, content="3+4=?"),
    ]
)

agent = lr.ChatAgent(
    lr.ChatAgentConfig(
        llm=llm_config,
        system_message="You are a helpful assistant.",
    )
)

response = agent.llm_response("is 4 odd?")
print(response.content)  # "No, 4 is an even number."
response = agent.llm_response("what about 2?")  # follow-up question
```

---

# Document Chunking/Splitting in Langroid

Langroid's [`ParsingConfig`][langroid.parsing.parser.ParsingConfig] provides several document chunking strategies through the `Splitter` enum:

## 1. MARKDOWN (`Splitter.MARKDOWN`) (The default)

**Purpose**: Structure-aware splitting that preserves markdown formatting.

**How it works**:

- Preserves document hierarchy (headers and sections)
- Enriches chunks with header information
- Uses word count instead of token count (with adjustment factor)
- Supports "rollup" to maintain document structure
- Ideal for markdown documents where preserving formatting is important

## 2. TOKENS (`Splitter.TOKENS`)

**Purpose**: Creates chunks of approximately equal token size.

**How it works**:

- Tokenizes the text using tiktoken
- Aims for chunks of size `chunk_size` tokens (default: 200)
- Looks for natural breakpoints like punctuation or newlines
- Prefers splitting at sentence/paragraph boundaries
- Ensures chunks are at least `min_chunk_chars` long (default: 350)

## 3. PARA_SENTENCE (`Splitter.PARA_SENTENCE`)

**Purpose**: Splits documents respecting paragraph and sentence boundaries.
**How it works**:

- Recursively splits documents until chunks are below 1.3× the target size
- Maintains document structure by preserving natural paragraph breaks
- Adjusts chunk boundaries to avoid cutting in the middle of sentences
- Stops when it can't split chunks further without breaking coherence

## 4. SIMPLE (`Splitter.SIMPLE`)

**Purpose**: Basic splitting using predefined separators.

**How it works**:

- Uses a list of separators to split text (default: `["\n\n", "\n", " ", ""]`)
- Splits on the first separator in the list
- Doesn't attempt to balance chunk sizes
- Simplest and fastest splitting method

## Basic Configuration

```python
from langroid.parsing.parser import ParsingConfig, Splitter

config = ParsingConfig(
    splitter=Splitter.MARKDOWN,   # Most feature-rich option
    chunk_size=200,               # Target tokens per chunk
    chunk_size_variation=0.30,    # Allowed variation from target
    overlap=50,                   # Token overlap between chunks
    token_encoding_model="text-embedding-3-small"
)
```

## Format-Specific Configuration

```python
from langroid.parsing.parser import (
    ParsingConfig,
    Splitter,
    PdfParsingConfig,
    GeminiConfig,
)

# Customize PDF parsing
config = ParsingConfig(
    splitter=Splitter.PARA_SENTENCE,
    pdf=PdfParsingConfig(
        library="pymupdf4llm"  # Default PDF parser
    )
)

# Use Gemini for PDF parsing
config = ParsingConfig(
    pdf=PdfParsingConfig(
        library="gemini",
        gemini_config=GeminiConfig(
            model_name="gemini-2.0-flash",
            requests_per_minute=5
        )
    )
)
```

## Setting Up Parsing Config in DocChatAgentConfig

You can configure document parsing when creating a `DocChatAgent` by customizing the `parsing` field within the `DocChatAgentConfig`. Here's how to do it:

```python
from langroid.agent.special.doc_chat_agent import DocChatAgentConfig
from langroid.parsing.parser import ParsingConfig, Splitter, PdfParsingConfig

# Create a DocChatAgent with custom parsing configuration
agent_config = DocChatAgentConfig(
    parsing=ParsingConfig(
        # Choose the splitting strategy
        splitter=Splitter.MARKDOWN,  # Structure-aware splitting with header context

        # Configure chunk sizes
        chunk_size=800,          # Target tokens per chunk
        overlap=150,             # Overlap between chunks

        # Configure chunk behavior
        max_chunks=5000,         # Maximum number of chunks to create
        min_chunk_chars=250,     # Minimum characters when truncating at punctuation
        discard_chunk_chars=10,  # Discard chunks smaller than this

        # Configure context window
        n_neighbor_ids=3,        # Store 3 chunk IDs on either side

        # Configure PDF parsing specifically
        pdf=PdfParsingConfig(
            library="pymupdf4llm",  # Choose PDF parsing library
        )
    )
)
```

---

# Code Injection Protection with full_eval Flag

Available in Langroid since v0.53.15.

Langroid provides a security feature that helps protect against code injection vulnerabilities when evaluating pandas expressions in `TableChatAgent` and `VectorStore`. This protection is controlled by the `full_eval` flag, which defaults to `False` for maximum security, but can be set to `True` when working in trusted environments.

## Background

When executing dynamic pandas expressions within `TableChatAgent` and in `VectorStore.compute_from_docs()`, there is a risk of code injection if malicious input is provided. To mitigate this risk, Langroid implements a command sanitization system that validates and restricts the operations that can be performed.

## How It Works

The sanitization system uses AST (Abstract Syntax Tree) analysis to enforce a security policy that:

1. Restricts DataFrame methods to a safe whitelist
2. Prevents access to potentially dangerous methods and arguments
3. Limits expression depth and method chaining
4. Validates literals and numeric values to be within safe bounds
5. Blocks access to any variables other than the provided DataFrame

When `full_eval=False` (the default), all expressions are run through this sanitization process before evaluation. When `full_eval=True`, the sanitization is bypassed, allowing full access to pandas functionality.

## Configuration Options

### In TableChatAgent

```python
from langroid.agent.special.table_chat_agent import TableChatAgentConfig, TableChatAgent

config = TableChatAgentConfig(
    data=my_dataframe,
    full_eval=False,  # the default; set to True only for trusted input
)

agent = TableChatAgent(config)
```

### In VectorStore

```python
from langroid.vector_store.lancedb import LanceDBConfig, LanceDB

config = LanceDBConfig(
    collection_name="my_collection",
    full_eval=False,  # the default; set to True only for trusted input
)

vectorstore = LanceDB(config)
```

## When to Use full_eval=True

Set `full_eval=True` only when:

1. All input comes from trusted sources (not from users or external systems)
2. You need full pandas functionality that goes beyond the whitelisted methods
3. You're working in a controlled development or testing environment

## Security Considerations

- By default, `full_eval=False` provides a good balance of security and functionality
- The whitelisted operations support most common pandas operations
- Setting `full_eval=True` removes all protection and should be used with caution
- Even with protection, always validate input when possible

## Affected Classes

The `full_eval` flag affects the following components:

1. `TableChatAgentConfig` and `TableChatAgent` - Controls sanitization in the `pandas_eval` method
2. `VectorStoreConfig` and `VectorStore` - Controls sanitization in the `compute_from_docs` method
3. All implementations of `VectorStore` (ChromaDB, LanceDB, MeiliSearch, PineconeDB, PostgresDB, QdrantDB, WeaviateDB)

## Example: Safe Pandas Operations

When `full_eval=False`, the following operations are allowed:

```python
# Allowed operations (non-exhaustive list)
df.head()
df.groupby('column')['value'].mean()
df[df['column'] > 10]
df.sort_values('column', ascending=False)
df.pivot_table(...)
```

Some operations that might be blocked include:

```python
# Potentially blocked operations
df.eval("dangerous_expression")
df.query("dangerous_query")
df.apply(lambda x: dangerous_function(x))
```

## Testing Considerations

When writing tests that use `TableChatAgent` or `VectorStore.compute_from_docs()` with pandas expressions that go beyond the whitelisted operations, you may need to set `full_eval=True` to ensure the tests pass.

---

# Crawl4ai Crawler Documentation

## Overview

The `Crawl4aiCrawler` is a highly advanced and flexible web crawler integrated into Langroid, built on the powerful `crawl4ai` library. It uses a real browser engine (Playwright) to render web pages, making it exceptionally effective at handling modern, JavaScript-heavy websites. This crawler provides a rich set of features for simple page scraping, deep-site crawling, and sophisticated data extraction, making it the most powerful crawling option available in Langroid. It runs locally, so no API keys are needed.

## Installation

To use `Crawl4aiCrawler`, you must install the `crawl4ai` extra dependencies. To install and prepare crawl4ai:

```bash
# Install langroid with crawl4ai support
pip install "langroid[crawl4ai]"

crawl4ai setup
crawl4ai doctor
```

> **Note**: The `crawl4ai setup` command will download Playwright browsers (Chromium, Firefox, WebKit) on first run.
This is a one-time download that can be several hundred MB in size. The browsers are stored locally and used for rendering web pages.

## Key Features

- **Real Browser Rendering**: Accurately processes dynamic content, single-page applications (SPAs), and sites that require JavaScript execution.
- **Simple and Deep Crawling**: Can scrape a list of individual URLs (`simple` mode) or perform a recursive, deep crawl of a website starting from a seed URL (`deep` mode).
- **Powerful Extraction Strategies**:
    - **Structured JSON (No LLM)**: Extract data into a predefined JSON structure using CSS selectors, XPath, or Regex patterns. This is extremely fast, reliable, and cost-effective.
    - **LLM-Based Extraction**: Leverage Large Language Models (like GPT or Gemini) to extract data from unstructured content based on natural language instructions and a Pydantic schema.
- **Advanced Markdown Generation**: Go beyond basic HTML-to-markdown conversion. Apply content filters to prune irrelevant sections (sidebars, ads, footers) or use an LLM to intelligently reformat content for maximum relevance, perfect for RAG pipelines.
- **High-Performance Scraping**: Optionally use an LXML-based scraping strategy for a significant speed boost on large HTML documents.
- **Fine-Grained Configuration**: Offers detailed control over browser behavior (`BrowserConfig`) and individual crawl runs (`CrawlerRunConfig`) for advanced use cases.

## Configuration (`Crawl4aiConfig`)

The `Crawl4aiCrawler` is configured via the `Crawl4aiConfig` object. This class acts as a high-level interface to the underlying `crawl4ai` library's settings. All of the strategies are optional. Learn more about these strategies, `browser_config`, and `run_config` in the [Crawl4AI docs](https://docs.crawl4ai.com/).

```python
from langroid.parsing.url_loader import Crawl4aiConfig

# All parameters are optional and have sensible defaults
config = Crawl4aiConfig(
    crawl_mode="simple",  # or "deep"
    extraction_strategy=...,
    markdown_strategy=...,
    deep_crawl_strategy=...,
    scraping_strategy=...,
    browser_config=...,  # For advanced browser settings
    run_config=...,      # For advanced crawl-run settings
)
```

**Main Parameters:**

- `crawl_mode` (str):
    - `"simple"` (default): Crawls each URL in the provided list individually.
    - `"deep"`: Starts from the first URL in the list and recursively crawls linked pages based on the `deep_crawl_strategy`.
    - Make sure to set `crawl_mode="deep"` whenever you are deep crawling; this is required for deep crawling to work correctly.
- `extraction_strategy` (`ExtractionStrategy`): Defines how to extract structured data from a page. If set, the `Document.content` will be a **JSON string** containing the extracted data.
- `markdown_strategy` (`MarkdownGenerationStrategy`): Defines how to convert HTML to markdown. This is used when `extraction_strategy` is not set. The `Document.content` will be a **markdown string**.
- `deep_crawl_strategy` (`DeepCrawlStrategy`): Configuration for deep crawling, such as `max_depth`, `max_pages`, and URL filters. Only used when `crawl_mode` is `"deep"`.
- `scraping_strategy` (`ContentScrapingStrategy`): Specifies the underlying HTML parsing engine. Useful for performance tuning.
- `browser_config` & `run_config`: For advanced users to pass detailed `BrowserConfig` and `CrawlerRunConfig` objects directly from the `crawl4ai` library.

---

## Usage Examples

These are representative examples.
For runnable examples check the script [`examples/docqa/crawl4ai_examples.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/crawl4ai_examples.py) ### 1. Simple Crawling (Default Markdown) This is the most basic usage. It will fetch the content of each URL and convert it to clean markdown. ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig urls = [ "https://pytorch.org/", "https://techcrunch.com/", ] # Use default settings crawler_config = Crawl4aiConfig() loader = URLLoader(urls=urls, crawler_config=crawler_config) docs = loader.load() for doc in docs: print(f"URL: {doc.metadata.source}") print(f"Content (first 200 chars): {doc.content[:200]}") ``` ### 2. Structured JSON Extraction (No LLM) When you need to extract specific, repeated data fields from a page, schema-based extraction is the best choice. It's fast, precise, and free of LLM costs. The result in `Document.content` is a JSON string. #### a. Using CSS Selectors (`JsonCssExtractionStrategy`) This example scrapes titles and links from the Hacker News front page. ```python import json from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.extraction_strategy import JsonCssExtractionStrategy HACKER_NEWS_URL = "https://news.ycombinator.com" HACKER_NEWS_SCHEMA = { "name": "HackerNewsArticles", "baseSelector": "tr.athing", "fields": [ {"name": "title", "selector": "span.titleline > a", "type": "text"}, {"name": "link", "selector": "span.titleline > a", "type": "attribute", "attribute": "href"}, ], } # Create the strategy and pass it to the config css_strategy = JsonCssExtractionStrategy(schema=HACKER_NEWS_SCHEMA) crawler_config = Crawl4aiConfig(extraction_strategy=css_strategy) loader = URLLoader(urls=[HACKER_NEWS_URL], crawler_config=crawler_config) documents = loader.load() # The Document.content will contain the JSON string extracted_data = json.loads(documents[0].content) print(json.dumps(extracted_data[:3], indent=2)) ``` #### b. Using Regex (`RegexExtractionStrategy`) This is ideal for finding common patterns like emails, URLs, or phone numbers. ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.extraction_strategy import RegexExtractionStrategy url = "https://www.scrapethissite.com/pages/forms/" # Combine multiple built-in patterns regex_strategy = RegexExtractionStrategy( pattern=( RegexExtractionStrategy.Email | RegexExtractionStrategy.Url | RegexExtractionStrategy.PhoneUS ) ) crawler_config = Crawl4aiConfig(extraction_strategy=regex_strategy) loader = URLLoader(urls=[url], crawler_config=crawler_config) documents = loader.load() print(documents[0].content) ``` ### 3. Advanced Markdown Generation For RAG applications, the quality of the markdown is crucial. These strategies produce highly relevant, clean text. The result in `Document.content` is the filtered markdown (`fit_markdown`). #### a. Pruning Filter (`PruningContentFilter`) This filter heuristically removes boilerplate content based on text density, link density, and common noisy tags. 
```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator from crawl4ai.content_filter_strategy import PruningContentFilter prune_filter = PruningContentFilter(threshold=0.6, min_word_threshold=10) md_generator = DefaultMarkdownGenerator( content_filter=prune_filter, options={"ignore_links": True} ) crawler_config = Crawl4aiConfig(markdown_strategy=md_generator) loader = URLLoader(urls=["https://news.ycombinator.com"], crawler_config=crawler_config) docs = loader.load() print(docs[0].content[:500]) ``` #### b. LLM Filter (`LLMContentFilter`) Use an LLM to semantically understand the content and extract only the relevant parts based on your instructions. This is extremely powerful for creating topic-focused documents. ```python import os from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.async_configs import LLMConfig from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator from crawl4ai.content_filter_strategy import LLMContentFilter # Requires an API key, e.g., OPENAI_API_KEY llm_filter = LLMContentFilter( llm_config=LLMConfig( provider="openai/gpt-4o-mini", api_token=os.getenv("OPENAI_API_KEY"), ), instruction=""" Extract only the main article content. Exclude all navigation, sidebars, comments, and footer content. Format the output as clean, readable markdown. """, chunk_token_threshold=4096, ) md_generator = DefaultMarkdownGenerator(content_filter=llm_filter) crawler_config = Crawl4aiConfig(markdown_strategy=md_generator) loader = URLLoader(urls=["https://www.theverge.com/tech"], crawler_config=crawler_config) docs = loader.load() print(docs[0].content) ``` ### 4. Deep Crawling To crawl an entire website or a specific section, use `deep` mode. Recommended setting is BestFirstCrawlingStrategy ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.deep_crawling import BestFirstCrawlingStrategy from crawl4ai.deep_crawling.filters import FilterChain, URLPatternFilter deep_crawl_strategy = BestFirstCrawlingStrategy( max_depth=2, include_external=False, max_pages=25, # Maximum number of pages to crawl (optional) filter_chain=FilterChain([URLPatternFilter(patterns=["*core*"])]) # Pattern matching for granular control (optional) ) crawler_config = Crawl4aiConfig( crawl_mode="deep", deep_crawl_strategy=deep_crawl_strategy ) loader = URLLoader(urls=["https://docs.crawl4ai.com/"], crawler_config=crawler_config) docs = loader.load() print(f"Crawled {len(docs)} pages.") for doc in docs: print(f"- {doc.metadata.source}") ``` ### 5. High-Performance Scraping (`LXMLWebScrapingStrategy`) For a performance boost, especially on very large, static HTML pages, switch the scraping strategy to LXML. ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy crawler_config = Crawl4aiConfig( scraping_strategy=LXMLWebScrapingStrategy() ) loader = URLLoader(urls=["https://www.nbcnews.com/business"], crawler_config=crawler_config) docs = loader.load() print(f"Content Length: {len(docs[0].content)}") ``` ### 6. LLM-Based JSON Extraction (`LLMExtractionStrategy`) When data is unstructured or requires semantic interpretation, use an LLM for extraction. This is slower and more expensive but incredibly flexible. The result in `Document.content` is a JSON string. 
```python
import os
import json
from typing import Optional

from langroid.pydantic_v1 import BaseModel, Field
from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig
from crawl4ai.async_configs import LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Define the data structure you want to extract
class ArticleData(BaseModel):
    headline: str
    summary: str = Field(description="A short summary of the article")
    author: Optional[str] = None

# Configure the LLM strategy
llm_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(
        provider="openai/gpt-4o-mini",
        api_token=os.getenv("OPENAI_API_KEY"),
    ),
    schema=ArticleData.schema_json(),
    extraction_type="schema",
    instruction="Extract the headline, summary, and author of the main article.",
)

crawler_config = Crawl4aiConfig(extraction_strategy=llm_strategy)

loader = URLLoader(urls=["https://news.ycombinator.com"], crawler_config=crawler_config)
docs = loader.load()

extracted_data = json.loads(docs[0].content)
print(json.dumps(extracted_data, indent=2))
```

## How It Handles Different Content Types

The `Crawl4aiCrawler` is smart about handling different types of URLs:

- **Web Pages** (e.g., `http://...`, `https://...`): These are processed by the `crawl4ai` browser engine. The output format (`markdown` or `JSON`) depends on the strategy you configure in `Crawl4aiConfig`.
- **Local and Remote Documents** (e.g., URLs ending in `.pdf`, `.docx`): These are automatically detected and delegated to Langroid's internal `DocumentParser`. This ensures that documents are properly parsed and chunked according to your `ParsingConfig`, just like with other Langroid tools.

## Conclusion

The `Crawl4aiCrawler` is a feature-rich, powerful tool for any web-based data extraction task.

- For **simple, clean text**, use the default `Crawl4aiConfig`.
- For **structured data from consistent sites**, use `JsonCssExtractionStrategy` or `RegexExtractionStrategy` for unbeatable speed and reliability.
- To create **high-quality, focused content for RAG**, use `PruningContentFilter` or the `LLMContentFilter` with the `DefaultMarkdownGenerator`.
- To scrape an **entire website**, use `deep_crawl_strategy` with `crawl_mode="deep"`.
- For **complex or unstructured data** that needs AI interpretation, `LLMExtractionStrategy` provides a flexible solution.

---

# Custom Azure OpenAI client

!!! warning "This is only for using a Custom Azure OpenAI client"
    This note is **only** meant for those who are trying to use a custom Azure client, and is NOT needed for typical usage.
    For typical usage of Azure-deployed models with Langroid, see the [docs](https://langroid.github.io/langroid/notes/azure-openai-models/), the [`test_azure_openai.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_azure_openai.py) and [`examples/basic/chat.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat.py)

This page shows how to use Langroid with Azure OpenAI and Entra ID authentication by providing a custom client. By default, Langroid manages the configuration and creation of the Azure OpenAI client (see the [Setup guide](https://langroid.github.io/langroid/quick-start/setup/#microsoft-azure-openai-setupoptional) for details). In most cases, the available configuration options are sufficient, but if you need to manage any options that are not exposed, you instead have the option of providing a custom client, in Langroid v0.29.0 and later.

In order to use a custom client, you must provide a function that returns the configured client.
Depending on whether you need to make synchronous or asynchronous calls, you need to provide the appropriate client. A sketch of how this is done (supporting both sync and async calls) is given below:

```python
from openai import AzureOpenAI, AsyncAzureOpenAI

import langroid.language_models as lm

def get_azure_openai_client():
    return AzureOpenAI(...)

def get_azure_openai_async_client():
    return AsyncAzureOpenAI(...)

lm_config = lm.AzureConfig(
    azure_openai_client_provider=get_azure_openai_client,
    azure_openai_async_client_provider=get_azure_openai_async_client,
)
```

## Microsoft Entra ID Authentication

A key use case for a custom client is [Microsoft Entra ID authentication](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity). Here you need to provide an `azure_ad_token_provider` to the client. For examples on this, see [examples/basic/chat-azure-client.py](https://github.com/langroid/langroid/blob/main/examples/basic/chat-azure-client.py) and [examples/basic/chat-azure-async-client.py](https://github.com/langroid/langroid/blob/main/examples/basic/chat-azure-async-client.py).

---

# Enriching Chunked Documents for Better Retrieval

Available in Langroid v0.34.0 or later.

When using the `DocChatAgent` for RAG with documents in highly specialized/technical domains, retrieval accuracy may be low, since embeddings are not sufficient to capture relationships between entities. For example, suppose a document-chunk consists of the medical test name "BUN" (Blood Urea Nitrogen), and a retrieval query is looking for tests related to kidney function: the embedding for "BUN" may not be close to the embedding for "kidney function", so the chunk may not be retrieved.

In such cases it is useful to *enrich* the chunked documents with additional keywords (or even "hypothetical questions") to increase the "semantic surface area" of the chunk, so that the chunk is more likely to be retrieved for relevant queries.

As of Langroid v0.34.0, you can provide a `chunk_enrichment_config` of type `ChunkEnrichmentAgentConfig`, in the `DocChatAgentConfig`. This config extends `ChatAgentConfig` and has the following fields:

- `batch_size` (int): The batch size for the chunk enrichment agent. Default is 50.
- `delimiter` (str): The delimiter to use when concatenating the chunk and the enriched text.
- `enrichment_prompt_fn`: function (`str->str`) that creates a prompt from a doc-chunk string `x`

In the above medical test example, suppose we want to augment a chunk containing only the medical test name with the organ system it is related to. We can set up a `ChunkEnrichmentAgentConfig` as follows:

```python
from langroid.agent.special.doc_chat_agent import (
    DocChatAgentConfig,
    ChunkEnrichmentAgentConfig,
)

enrichment_config = ChunkEnrichmentAgentConfig(
    batch_size=10,
    system_message=f"""
    You are an experienced clinical physician, very well-versed in
    medical tests and their names.
    You will be asked to identify WHICH ORGAN(s) Function/Health a test name
    is most closely associated with, to aid in retrieving the medical test names
    more accurately from an embeddings db that contains thousands of such test names.
    The idea is to use the ORGAN NAME(S) provided by you, to make the right test names
    easier to discover via keyword-matching or semantic (embedding) similarity.
    Your job is to generate up to 3 ORGAN NAMES MOST CLOSELY associated with
    the test name shown, ONE PER LINE.
    DO NOT SAY ANYTHING ELSE, and DO NOT BE OBLIGATED to provide 3 organs --
    if there is just one or two that are most relevant, that is fine.
Examples: "cholesterol" -> "heart function", "LDL" -> "artery health", etc, "PSA" -> "prostate health", "TSH" -> "thyroid function", etc. """, enrichment_prompt_fn=lambda test: f""" Which ORGAN(S) Function/Health is the medical test named '{test}' most closely associated with? """, ) doc_agent_config = DocChatAgentConfig( chunk_enrichment_config=enrichment_config, ... ) ``` This works as follows: - Before ingesting document-chunks into the vector-db, a specialized "chunk enrichment" agent is created, configured with the `enrichment_config` above. - For each document-chunk `x`, the agent's `llm_response_forget_async` method is called using the prompt created by `enrichment_prompt_fn(x)`. The resulting response text `y` is concatenated with the original chunk text `x` using the `delimiter`, before storing in the vector-db. This is done in batches of size `batch_size`. - At query time, after chunk retrieval, before generating the final LLM response, the enrichments are stripped from the retrieved chunks, and the original content of the retrieved chunks are passed to the LLM for generating the final response. See the script [`examples/docqa/doc-chunk-enrich.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/doc-chunk-enrich.py) for a complete example. Also see the tests related to "enrichment" in [`test_doc_chat_agent.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_doc_chat_agent.py). --- # PDF Files and Image inputs to LLMs Langroid supports sending PDF files and images (either URLs or local files) directly to Large Language Models with multi-modal capabilities. This feature allows models to "see" files and other documents, and works with most multi-modal models served via an OpenAI-compatible API, e.g.: - OpenAI's GPT-4o series and GPT-4.1 series - Gemini models - Claude series models (via OpenAI-compatible providers like OpenRouter or LiteLLM ) To see example usage, see: - tests: [test_llm.py](https://github.com/langroid/langroid/blob/main/tests/main/test_llm.py), [test_llm_async.py](https://github.com/langroid/langroid/blob/main/tests/main/test_llm_async.py), [test_chat-agent.py](https://github.com/langroid/langroid/blob/main/tests/main/test_chat_agent.py). - example script: [pdf-json-no-parse.py](https://github.com/langroid/langroid/blob/main/examples/extract/pdf-json-no-parse.py), which shows how you can directly extract structured information from a document **without having to first parse it to markdown** (which is inherently lossy). ## Basic Usage directly with LLM `chat` and `achat` methods First create a `FileAttachment` object using one of the `from_` methods. For image (`png`, `jpg/jpeg`) files you can use `FileAttachment.from_path(p)` where `p` is either a local file path, or a http/https URL. For PDF files, you can use `from_path` with a local file, or `from_bytes` or `from_io` (see below). In the examples below we show only `pdf` examples. 
```python
from langroid.language_models.base import LLMMessage, Role
from langroid.parsing.file_attachment import FileAttachment
import langroid.language_models as lm

# Create a file attachment
attachment = FileAttachment.from_path("path/to/document.pdf")

# Create messages with attachment
messages = [
    LLMMessage(role=Role.SYSTEM, content="You are a helpful assistant."),
    LLMMessage(
        role=Role.USER,
        content="What's the title of this document?",
        files=[attachment]
    )
]

# Set up LLM with model that supports attachments
llm = lm.OpenAIGPT(lm.OpenAIGPTConfig(chat_model=lm.OpenAIChatModel.GPT4o))

# Get response
response = llm.chat(messages=messages)
```

## Supported File Formats

Currently the OpenAI API supports:

- PDF files (including image-based PDFs)
- image files and URLs

## Creating Attachments

There are multiple ways to create file attachments:

```python
# From a file path
attachment = FileAttachment.from_path("path/to/file.pdf")

# From bytes
with open("path/to/file.pdf", "rb") as f:
    attachment = FileAttachment.from_bytes(f.read(), filename="document.pdf")

# From a file-like object
from io import BytesIO

file_obj = BytesIO(pdf_bytes)  # pdf_bytes: raw bytes of a PDF read earlier
attachment = FileAttachment.from_io(file_obj, filename="document.pdf")
```

## Follow-up Questions

You can continue the conversation with follow-up questions that reference the attached files:

```python
messages.append(LLMMessage(role=Role.ASSISTANT, content=response.message))
messages.append(LLMMessage(role=Role.USER, content="What is the main topic?"))
response = llm.chat(messages=messages)
```

## Multiple Attachments

Langroid allows multiple files to be sent in a single message, but as of 16 Apr 2025, sending multiple PDF files does not appear to be properly supported by the APIs (they seem to use only the last attached file), although sending multiple images does work.

```python
messages = [
    LLMMessage(
        role=Role.USER,
        content="Compare these documents",
        files=[attachment1, attachment2]
    )
]
```

## Using File Attachments with Agents

Agents can process file attachments as well, in the `llm_response` method, which takes a `ChatDocument` object as input. To pass in file attachments, include the `files` field in the `ChatDocument`, in addition to the content:

```python
import langroid as lr
from langroid.agent.chat_document import ChatDocument, ChatDocMetaData
from langroid.mytypes import Entity

agent = lr.ChatAgent(lr.ChatAgentConfig())

user_input = ChatDocument(
    content="What is the title of this document?",
    files=[attachment],
    metadata=ChatDocMetaData(
        sender=Entity.USER,
    )
)
# or more simply, use the agent's `create_user_response` method:
# user_input = agent.create_user_response(
#     content="What is the title of this document?",
#     files=[attachment],
# )

response = agent.llm_response(user_input)
```

## Using File Attachments with Tasks

In Langroid, `Task.run()` can take a `ChatDocument` object as input, and as mentioned above, it can contain attached files in the `files` field. To ensure proper orchestration, you'd also want to set various `metadata` fields, such as `sender`. Langroid provides a convenient `create_user_response` method to create a `ChatDocument` object with the necessary metadata, so you only need to specify the `content` and `files` fields:

```python
from langroid.parsing.file_attachment import FileAttachment
from langroid.agent.task import Task

agent = ...
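# (assumed: `agent` above is a configured lr.ChatAgent whose underlying LLM
# is multi-modal, e.g. GPT-4o, so it can "see" the attached file)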
# Create task
task = Task(agent, interactive=True)

# Create a file attachment
attachment = FileAttachment.from_path("path/to/document.pdf")

# Create input with attachment
input_message = agent.create_user_response(
    content="Extract data from this document",
    files=[attachment]
)

# Run task with file attachment
result = task.run(input_message)
```

See the script [`pdf-json-no-parse.py`](https://github.com/langroid/langroid/blob/main/examples/extract/pdf-json-no-parse.py) for a complete example of using file attachments with tasks.

## Practical Applications

- PDF document analysis and data extraction
- Report summarization
- Structured information extraction from documents
- Visual content analysis

For more complex applications, consider using the Task and Agent infrastructure in Langroid to orchestrate multi-step document processing workflows.

---

# Gemini LLMs & Embeddings via OpenAI client (without LiteLLM)

As of Langroid v0.21.0 you can use Langroid with Gemini LLMs directly via the OpenAI client, without using adapter libraries like LiteLLM. See details [here](https://langroid.github.io/langroid/tutorials/non-openai-llms/)

You can also use Google AI Studio embeddings or Gemini embeddings directly; these use the `google-generativeai` client under the hood.

```python
import langroid as lr
from langroid.agent.special import DocChatAgent, DocChatAgentConfig
from langroid.embedding_models import GeminiEmbeddingsConfig

# Configure Gemini embeddings
embed_cfg = GeminiEmbeddingsConfig(
    model_type="gemini",
    model_name="models/text-embedding-004",
    dims=768,
)

# Configure the DocChatAgent
config = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model="gemini/" + lr.language_models.GeminiModel.GEMINI_1_5_FLASH_8B,
    ),
    vecdb=lr.vector_store.QdrantDBConfig(
        collection_name="quick_start_chat_agent_docs",
        replace_collection=True,
        embedding=embed_cfg,
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE,
    ),
    n_similar_chunks=2,
    n_relevant_chunks=2,
)

# Create the agent
agent = DocChatAgent(config)
```

---

# Support for Open LLMs hosted on glhf.chat

Available since v0.23.0.

If you're looking to use Langroid with one of the recent performant Open LLMs, such as `Qwen2.5-Coder-32B-Instruct`, you can do so using our glhf.chat integration. See [glhf.chat](https://glhf.chat/chat/create) for a list of available models.

To run with one of these models, set the `chat_model` in the `OpenAIGPTConfig` to `"glhf/<model_name>"`, where `<model_name>` is `hf:` followed by the HuggingFace repo path, e.g. `Qwen/Qwen2.5-Coder-32B-Instruct`, so the full `chat_model` would be `"glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct"`.

Also many of the example scripts in the main repo (under the `examples` directory) can be run with this and other LLMs using the model-switch CLI arg `-m <model>`, e.g.

```bash
python3 examples/basic/chat.py -m glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct
```

Additionally, you can run many of the tests in the `tests` directory with this model instead of the default OpenAI `GPT4o` using `--m <model>`, e.g.

```bash
pytest tests/main/test_chat_agent.py --m glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct
```

For more info on running langroid with Open LLMs via other providers/hosting services, see our [guide to using Langroid with local/open LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/#local-llms-hosted-on-glhfchat).

---

# Handling a non-tool LLM message

A common scenario is to define a `ChatAgent`, enable it to use some tools (i.e.
`ToolMessage`s), wrap it in a Task, and call `task.run()`, e.g.

```python
import langroid as lr

class MyTool(lr.ToolMessage):
    ...

config = lr.ChatAgentConfig(...)
agent = lr.ChatAgent(config)
agent.enable_message(MyTool)
task = lr.Task(agent, interactive=False)
task.run("Hello")
```

Consider what happens when you invoke `task.run()`. When the agent's `llm_response` returns a valid tool-call, the sequence of steps looks like this:

- `llm_response` -> tool $T$
- `agent_response` handles $T$ -> returns results $R$
- `llm_response` responds to $R$ -> returns msg $M$
- and so on

If the LLM's response $M$ contains a valid tool, then this cycle continues with another tool-handling round. However, if the LLM's response $M$ does _not_ contain a tool-call, it is unclear whether:

- (1) the LLM "forgot" to generate a tool (or generated it wrongly, hence it was not recognized by Langroid as a tool), or
- (2) the LLM's response $M$ is an "answer" meant to be shown to the user to continue the conversation, or
- (3) the LLM's response $M$ is intended to be a "final" response, ending the task.

Internally, when the `ChatAgent`'s `agent_response` method sees a message that does not contain a tool, it invokes the `handle_message_fallback` method, which by default does nothing (returns `None`). However you can override this method by deriving from `ChatAgent`, as described in this [FAQ](https://langroid.github.io/langroid/FAQ/#how-can-i-handle-an-llm-forgetting-to-generate-a-toolmessage). As in that FAQ, in this fallback method, you would typically have code that checks whether the message is a `ChatDocument` and whether it came from the LLM, and if so, you would have the method return an appropriate message or tool (e.g. a reminder to the LLM, or an orchestration tool such as [`AgentDoneTool`][langroid.agent.tools.orchestration.AgentDoneTool]).

To simplify the developer experience, as of version 0.39.2 Langroid also provides an easier way to specify what this fallback method should return, via the `ChatAgentConfig.handle_llm_no_tool` parameter, for example:

```python
config = lr.ChatAgentConfig(
    # ... other params
    handle_llm_no_tool="done",  # terminate task if LLM sends non-tool msg
)
```

The `handle_llm_no_tool` parameter can have the following possible values:

- A special value from the [`NonToolAction`][langroid.mytypes.NonToolAction] Enum, e.g.:
    - `"user"` or `NonToolAction.USER` - this is interpreted by langroid to return `ForwardTool(agent="user")`, meaning the message is passed to the user to await their next input.
    - `"done"` or `NonToolAction.DONE` - this is interpreted by langroid to return `AgentDoneTool(content=msg.content, tools=msg.tool_messages)`, meaning the task is ended, and any content and tools in the current message will appear in the returned `ChatDocument`.
- A callable, specifically a function that takes a `ChatDocument` and returns any value. This can be useful when you want the fallback action to return a value based on the current message, e.g. `lambda msg: AgentDoneTool(content=msg.content)`, or it could be a more elaborate function, or a prompt that contains the content of the current message.
- Any `ToolMessage` (typically an [Orchestration](https://github.com/langroid/langroid/blob/main/langroid/agent/tools/orchestration.py) tool like `AgentDoneTool` or `ResultTool`)
- Any string, meant to be handled by the LLM.
Typically this would be a reminder to the LLM, something like:

```python
"""Your intent is not clear --
- if you forgot to use a Tool such as `ask_tool` or `search_tool`, try again.
- or if you intended to return your final answer, use the Tool named `done_tool`,
  with `content` set to your answer.
"""
```

A simple example is in the [`chat-search.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-search.py) script, and in the `test_handle_llm_no_tool` test in [`test_tool_messages.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_tool_messages.py).

---

# HTML Logger

The HTML logger creates interactive, self-contained HTML files that make it easy to navigate complex multi-agent conversations in Langroid.

## Enabling the HTML Logger

The HTML logger is **enabled by default** in `TaskConfig`:

```python
import langroid as lr

# HTML logging is automatically enabled
task = lr.Task(agent)

# To disable HTML logging
task = lr.Task(agent, config=lr.TaskConfig(enable_html_logging=False))

# To change the log directory (default is "logs/")
task = lr.Task(agent, config=lr.TaskConfig(logs_dir="my_logs"))
```

## Log Files

Langroid creates three types of log files in the `logs/` directory:

1. **HTML Log**: `<name>.html` - Interactive, collapsible view
2. **Plain Text Log**: `<name>.log` - Traditional text log with colors
3. **TSV Log**: `<name>.tsv` - Tab-separated values for data analysis

The `<name>` is determined by:

- The task name (if specified)
- Otherwise, the agent name
- Falls back to "root" if neither is specified

When a task starts, you'll see a clickable `file://` link in the console:

```
WARNING - 📊 HTML Log: file:///path/to/logs/task-name.html
```

## Key Features

### Collapsible Entries

Each log entry can be expanded/collapsed to show different levels of detail:

- **Collapsed**: Shows only the entity type (USER, LLM, AGENT) and preview
- **Expanded**: Shows full message content, tools, and sub-sections

### Visual Hierarchy

- **Important responses** are shown at full opacity
- **Intermediate steps** are faded (0.4 opacity)
- Color-coded entities: USER (blue), LLM (green), AGENT (orange), SYSTEM (gray)

### Tool Visibility

Tools are clearly displayed with:

- Tool name and parameters
- Collapsible sections showing raw tool calls
- Visual indicators for tool results

### Auto-Refresh

The HTML page automatically refreshes every 2 seconds to show new log entries as they're written.

### Persistent UI State

Your view preferences are preserved across refreshes:

- Expanded/collapsed entries remain in their state
- Filter settings are remembered

## Example

Here's what the HTML logger looks like for a planner workflow:

![HTML Logger Screenshot](../screenshots/planner-workflow-html-logs.png)

In this example from `examples/basic/planner-workflow-simple.py`, you can see:

- The planner agent orchestrating multiple tool calls
- Clear visibility of `IncrementTool` and `DoublingTool` usage
- The filtered view showing only important responses
- Collapsible tool sections with parameters

## Benefits

1. **Easy Navigation**: Quickly expand/collapse entries to focus on what matters
2. **Tool Clarity**: See exactly which tools were called with what parameters
3. **Real-time Updates**: Watch logs update automatically as your task runs
4. **Filtered Views**: Use "Show only important responses" to hide intermediate steps

---

# Knowledge-graph support

Langroid can be used to set up natural-language conversations with knowledge graphs.
Currently the two most popular knowledge graphs are supported:

## Neo4j

- [implementation](https://github.com/langroid/langroid/tree/main/langroid/agent/special/neo4j)
- test: [test_neo4j_chat_agent.py](https://github.com/langroid/langroid/blob/main/tests/main/test_neo4j_chat_agent.py)
- examples: [chat-neo4j.py](https://github.com/langroid/langroid/blob/main/examples/kg-chat/chat-neo4j.py)

## ArangoDB

Available with Langroid v0.20.1 and later. Uses the [python-arango](https://github.com/arangodb/python-arango) library.

- [implementation](https://github.com/langroid/langroid/tree/main/langroid/agent/special/arangodb)
- tests: [test_arangodb.py](https://github.com/langroid/langroid/blob/main/tests/main/test_arangodb.py), [test_arangodb_chat_agent.py](https://github.com/langroid/langroid/blob/main/tests/main/test_arangodb_chat_agent.py)
- example: [chat-arangodb.py](https://github.com/langroid/langroid/blob/main/examples/kg-chat/chat-arangodb.py)

---

# LangDB with Langroid

## Introduction

[LangDB](https://langdb.ai/) is an AI gateway that provides OpenAI-compatible APIs to access 250+ LLMs. It offers cost control, observability, and performance benchmarking while enabling seamless switching between models.

Langroid has a simple integration with LangDB's API service, so there are no dependencies to install. (LangDB also has a self-hosted version, which is not yet supported in Langroid).

## Setup environment variables

At minimum, ensure you have these env vars in your `.env` file:

```
LANGDB_API_KEY=your_api_key_here
LANGDB_PROJECT_ID=your_project_id_here
```

## Using LangDB with Langroid

### Configure LLM and Embeddings

In `OpenAIGPTConfig`, when you specify the `chat_model` with a `langdb/` prefix, langroid uses the API key, `project_id` and other LangDB-specific parameters from the `langdb_params` field; if any of these are specified in the `.env` file or in the environment explicitly, they will override the values in `langdb_params`.

For example, to use Anthropic's Claude-3.7-Sonnet model, set `chat_model="langdb/anthropic/claude-3.7-sonnet"`, as shown below. You can entirely omit the `langdb_params` field if you have already set up the fields as environment variables in your `.env` file, e.g. the `api_key` and `project_id` are read from the environment variables `LANGDB_API_KEY` and `LANGDB_PROJECT_ID` respectively, and similarly for the other fields (which are optional).

```python
import os
import uuid
from langroid.language_models.openai_gpt import OpenAIGPTConfig, LangDBParams
from langroid.embedding_models.models import OpenAIEmbeddingsConfig

# Generate tracking IDs (optional)
thread_id = str(uuid.uuid4())
run_id = str(uuid.uuid4())

# Configure LLM
llm_config = OpenAIGPTConfig(
    chat_model="langdb/anthropic/claude-3.7-sonnet",
    # omit the langdb_params field if you're not using custom tracking,
    # or if all its fields are provided in env vars, like
    # LANGDB_API_KEY, LANGDB_PROJECT_ID, LANGDB_RUN_ID, LANGDB_THREAD_ID, etc.
    langdb_params=LangDBParams(
        label='my-app',
        thread_id=thread_id,
        run_id=run_id,
        # api_key, project_id are read from .env or environment variables
        # LANGDB_API_KEY, LANGDB_PROJECT_ID respectively.
    )
)
```

Similarly, you can configure the embeddings using `OpenAIEmbeddingsConfig`, which also has a `langdb_params` field that works the same way as in `OpenAIGPTConfig` (i.e. it uses the API key and project ID from the environment if provided, otherwise uses the default values in `langdb_params`).
Again the `langdb_params` does not need to be specified explicitly, if you've already set up the environment variables in your `.env` file. ```python # Configure embeddings embedding_config = OpenAIEmbeddingsConfig( model_name="langdb/openai/text-embedding-3-small", ) ``` ## Tracking and Observability LangDB provides special headers for request tracking: - `x-label`: Tag requests for filtering in the dashboard - `x-thread-id`: Track conversation threads (UUID format) - `x-run-id`: Group related requests together ## Examples The `langroid/examples/langdb/` directory contains examples demonstrating: 1. **RAG with LangDB**: `langdb_chat_agent_docs.py` 2. **LangDB with Function Calling**: `langdb_chat_agent_tool.py` 3. **Custom Headers**: `langdb_custom_headers.py` ## Viewing Results Visit the [LangDB Dashboard](https://dashboard.langdb.com) to: - Filter requests by label, thread ID, or run ID - View detailed request/response information - Analyze token usage and costs For more information, visit [LangDB Documentation](https://docs.langdb.com). See example scripts [here](https://github.com/langroid/langroid/tree/main/examples/langdb) --- # Handling large tool results Available since Langroid v0.22.0. In some cases, the result of handling a `ToolMessage` could be very large, e.g. when the Tool is a database query that returns a large number of rows, or a large schema. When used in a task loop, this large result may then be sent to the LLM to generate a response, which in some scenarios may not be desirable, as it increases latency, token-cost and distractions. Langroid allows you to set two optional parameters in a `ToolMessage` to handle this situation: - `_max_result_tokens`: *immediately* truncate the result to this number of tokens. - `_max_retained_tokens`: *after* a responder (typically the LLM) responds to this tool result (which optionally may already have been truncated via `_max_result_tokens`), edit the message history to truncate the result to this number of tokens. You can set one, both or none of these parameters. If you set both, you would want to set `_max_retained_tokens` to a smaller number than `_max_result_tokens`. See the test `test_reduce_raw_tool_result` in `test_tool_messages.py` for an example. Here is a conceptual example. Suppose there is a Tool called `MyTool`, with parameters `_max_result_tokens=20` and `_max_retained_tokens=10`. Imagine a task loop where the user says "hello", and then LLM generates a call to `MyTool`, and the tool handler (i.e. `agent_response`) generates a result of 100 tokens. This result is immediately truncated to 20 tokens, and then the LLM responds to it with a message `response`. The agent's message history looks like this: ``` 1. System msg. 2. user: hello 3. LLM: MyTool 4. Agent (Tool handler): 100-token result => reduced to 20 tokens 5. LLM: response ``` Immediately after the LLM's response at step 5, the message history is edited so that the message contents at position 4 are truncated to 10 tokens, as specified by `_max_retained_tokens`. 
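Here is a minimal sketch of how such a tool might be defined, assuming the two parameters are simply set as class-level attributes on the `ToolMessage` subclass; the `request`, `purpose`, and handler body below are illustrative, not taken from the test:

```python
import langroid as lr

class MyTool(lr.ToolMessage):
    request: str = "my_tool"
    purpose: str = "Illustrate truncation of large tool results."

    # truncate the handler's result to roughly this many tokens, immediately
    _max_result_tokens = 20
    # after the LLM has responded to the (truncated) result, further truncate
    # the copy retained in the message history to this many tokens
    _max_retained_tokens = 10

    def handle(self) -> str:
        # imagine this returns a very large result, e.g. many database rows
        return " ".join(f"row{i}" for i in range(100))
```

With this definition, the 100-token result at step 4 above would be cut to roughly 20 tokens before the LLM sees it, and to roughly 10 tokens in the retained history after step 5.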
--- # Using LiteLLM Proxy with OpenAIGPTConfig You can easily configure Langroid to use LiteLLM proxy for accessing models with a simple prefix `litellm-proxy/` in the `chat_model` name: ## Using the `litellm-proxy/` prefix When you specify a model with the `litellm-proxy/` prefix, Langroid automatically uses the LiteLLM proxy configuration: ```python from langroid.language_models.openai_gpt import OpenAIGPTConfig config = OpenAIGPTConfig( chat_model="litellm-proxy/your-model-name" ) ``` ## Setting LiteLLM Proxy Parameters When using the `litellm-proxy/` prefix, Langroid will read connection parameters from either: 1. The `litellm_proxy` config object: ```python from langroid.language_models.openai_gpt import OpenAIGPTConfig, LiteLLMProxyConfig config = OpenAIGPTConfig( chat_model="litellm-proxy/your-model-name", litellm_proxy=LiteLLMProxyConfig( api_key="your-litellm-proxy-api-key", api_base="http://your-litellm-proxy-url" ) ) ``` 2. Environment variables (which take precedence): ```bash export LITELLM_API_KEY="your-litellm-proxy-api-key" export LITELLM_API_BASE="http://your-litellm-proxy-url" ``` This approach makes it simple to switch between using LiteLLM proxy and other model providers by just changing the model name prefix, without needing to modify the rest of your code or tweaking env variables. ## Note: LiteLLM Proxy vs LiteLLM Library **Important distinction:** Using the `litellm-proxy/` prefix connects to a LiteLLM proxy server, which is different from using the `litellm/` prefix. The latter utilizes the LiteLLM adapter library directly without requiring a proxy server. Both approaches are supported in Langroid, but they serve different use cases: - Use `litellm-proxy/` when connecting to a deployed LiteLLM proxy server - Use `litellm/` when you want to use the LiteLLM library's routing capabilities locally Choose the approach that best fits your infrastructure and requirements. --- # Local embeddings provision via llama.cpp server As of Langroid v0.30.0, you can use llama.cpp as provider of embeddings to any of Langroid's vector stores, allowing access to a wide variety of GGUF-compatible embedding models, e.g. [nomic-ai's Embed Text V1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF). ## Supported Models llama.cpp can generate embeddings from: **Dedicated embedding models (RECOMMENDED):** - [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) (768 dims) - [nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe-GGUF) - [nomic-embed-code](https://huggingface.co/nomic-ai/nomic-embed-code-GGUF) - Other GGUF embedding models **Regular LLMs (also supported):** - gpt-oss-20b, gpt-oss-120b - Llama models - Other language models Note: Dedicated embedding models are recommended for best performance in retrieval and semantic search tasks. ## Configuration When defining a VecDB, you can provide an instance of `LlamaCppServerEmbeddingsConfig` to the VecDB config to instantiate the llama.cpp embeddings server handler. 
To configure the `LlamaCppServerEmbeddingsConfig`, there are several parameters that should be adjusted: ```python from langroid.embedding_models.models import LlamaCppServerEmbeddingsConfig from langroid.vector_store.qdrantdb import QdrantDBConfig embed_cfg = LlamaCppServerEmbeddingsConfig( api_base="http://localhost:8080", # IP + Port dims=768, # Match the dimensions of your embedding model context_length=2048, # Match the config of the model batch_size=2048, # Safest to ensure this matches context_length ) vecdb_config = QdrantDBConfig( collection_name="my-collection", embedding=embed_cfg, storage_path=".qdrant/", ) ``` ## Running llama-server The llama.cpp server must be started with the `--embeddings` flag to enable embedding generation. ### For dedicated embedding models (RECOMMENDED): ```bash ./llama-server -ngl 100 -c 2048 \ -m ~/nomic-embed-text-v1.5.Q8_0.gguf \ --host localhost --port 8080 \ --embeddings -b 2048 -ub 2048 ``` ### For LLM-based embeddings (e.g., gpt-oss): ```bash ./llama-server -ngl 99 \ -m ~/.cache/llama.cpp/gpt-oss-20b.gguf \ --host localhost --port 8080 \ --embeddings ``` ## Response Format Compatibility Langroid automatically handles multiple llama.cpp response formats: - Native `/embedding`: `{"embedding": [floats]}` - OpenAI `/v1/embeddings`: `{"data": [{"embedding": [floats]}]}` - Array formats: `[{"embedding": [floats]}]` - Nested formats: `{"embedding": [[floats]]}` You don't need to worry about which endpoint or format your llama.cpp server uses - Langroid will automatically detect and handle the response correctly. ## Example Usage An example setup can be found inside [examples/docqa/chat.py](https://github.com/langroid/langroid/blob/main/examples/docqa/chat.py). For a complete example using local embeddings with llama.cpp: ```python from langroid.agent.special.doc_chat_agent import ( DocChatAgent, DocChatAgentConfig, ) from langroid.embedding_models.models import LlamaCppServerEmbeddingsConfig from langroid.language_models.openai_gpt import OpenAIGPTConfig from langroid.parsing.parser import ParsingConfig from langroid.vector_store.qdrantdb import QdrantDBConfig # Configure local embeddings via llama.cpp embed_cfg = LlamaCppServerEmbeddingsConfig( api_base="http://localhost:8080", dims=768, # nomic-embed-text-v1.5 dimensions context_length=8192, batch_size=1024, ) # Configure vector store with local embeddings vecdb_config = QdrantDBConfig( collection_name="doc-chat-local", embedding=embed_cfg, storage_path=".qdrant/", ) # Create DocChatAgent config = DocChatAgentConfig( vecdb=vecdb_config, llm=OpenAIGPTConfig( chat_model="gpt-4o", # or use local LLM ), ) agent = DocChatAgent(config) ``` ## Troubleshooting **Error: "Failed to connect to embedding provider"** - Ensure llama-server is running with the `--embeddings` flag - Check that the `api_base` URL is correct - Verify the server is accessible from your machine **Error: "Unsupported embedding response format"** - This error includes the first 500 characters of the response to help debug - Check your llama-server logs for any errors - Ensure you're using a compatible llama.cpp version **Embeddings seem low quality:** - Use a dedicated embedding model instead of an LLM - Ensure the `dims` parameter matches your model's output dimensions - Try different GGUF quantization levels (Q8_0 generally works well) ## Additional Resources - [llama.cpp GitHub](https://github.com/ggml-org/llama.cpp) - [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md) - 
[nomic-embed models on Hugging Face](https://huggingface.co/nomic-ai)
- [Issue #919 - Implementation details](https://github.com/langroid/langroid/blob/main/issues/issue-919-llamacpp-embeddings.md)

---

# Using the LLM-based PDF Parser

- Converts PDF content into Markdown format using multimodal models.
- Uses multimodal models to describe images within PDFs.
- Supports page-wise or chunk-based processing for optimized performance.

---

### Initializing the LLM-based PDF Parser

Make sure you have set up your API key for whichever model you specify in `model_name` below. You can initialize the LLM PDF parser as follows:

```python
from langroid.parsing.parser import (
    LLMPdfParserConfig,
    ParsingConfig,
    PdfParsingConfig,
)

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="llm-pdf-parser",
        llm_parser_config=LLMPdfParserConfig(
            model_name="gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
            timeout=60,  # increase this for large documents
        ),
    ),
)
```

---

## Parameters

### `model_name`

Specifies the model to use for PDF conversion.

**Default:** `gemini/gemini-2.0-flash`

---

### `max_tokens`

Limits the number of tokens in the input. The model's output limit is **8192 tokens**.

- **Default:** 7000 tokens (leaving room for generated captions)
- _Optional parameter_

---

### `split_on_page`

Determines whether to process the document **page by page**.

- **Default:** `True`
- If set to `False`, the parser will create chunks based on `max_tokens` while respecting page boundaries.
- When `False`, the parser will send chunks containing multiple pages (e.g., `[11,12,13,14,15]`).

**Advantages of `False`:**

- Reduces API calls to the LLM.
- Lowers token usage since system prompts are not repeated per page.

**Disadvantages of `False`:**

- You will not get per-page splitting; groups of pages are treated as a single unit.

> If your use case does **not** require strict page-by-page parsing, consider setting this to `False`.

---

### `requests_per_minute`

Limits API request frequency to avoid rate limits.

- If you encounter rate limits, set this to **1 or 2**.

---

# **Using `marker` as a PDF Parser in `langroid`**

## **Installation**

### **Standard Installation**

To use [`marker`](https://github.com/VikParuchuri/marker) as a PDF parser in `langroid`, install it with the `marker-pdf` extra:

```bash
pip install "langroid[marker-pdf]"
```

or in combination with other extras as needed, e.g.:

```bash
pip install "langroid[marker-pdf,hf-embeddings]"
```

Note, however, that `marker` has a version **incompatibility with `docling`**: if you install `langroid` with the `all` extra (or another extra such as `doc-chat` or `pdf-parsers` that also includes `docling`), e.g. `pip install "langroid[all]"` or `pip install "langroid[doc-chat]"`, you will get an **older** version of `marker-pdf`, which does not work with Langroid. This may not matter if you do not intend to use `marker`, but if you do, install langroid with the `marker-pdf` extra (in combination with other extras as needed), as shown above.

#### **For Intel-Mac Users**

If you are on an **Intel Mac**, `docling` and `marker` cannot be installed together with langroid as extras, due to a **transformers version conflict**. To resolve this, manually install `marker-pdf` with:

```bash
pip install "marker-pdf[full]"
```

Make sure to install this within your `langroid` virtual environment.
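Because the `all`, `doc-chat`, and `pdf-parsers` extras can pull in an older `marker-pdf` without any warning, it can be worth confirming which version actually ended up in your environment. A minimal check (assuming the package is installed under the distribution name `marker-pdf`):

```python
# Print the installed marker-pdf version, to confirm you got a recent release
# rather than the older one pinned by the docling-compatible extras.
from importlib.metadata import version

print(version("marker-pdf"))
```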
---

## **Example: Parsing a PDF with `marker` in `langroid`**

```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import MarkerConfig, ParsingConfig, PdfParsingConfig
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()
gemini_api_key = os.environ.get("GEMINI_API_KEY")

# Path to your PDF file
path = ""

# Define parsing configuration
parsing_config = ParsingConfig(
    n_neighbor_ids=2,  # Number of neighboring sections to keep
    pdf=PdfParsingConfig(
        library="marker",  # Use `marker` as the PDF parsing library
        marker_config=MarkerConfig(
            config_dict={
                "use_llm": True,  # Enable high-quality LLM processing
                "gemini_api_key": gemini_api_key,  # API key for Gemini LLM
            }
        )
    ),
)

# Create the parser and extract the document
marker_parser = DocumentParser.create(path, parsing_config)
doc = marker_parser.get_doc()
```

---

## **Explanation of Configuration Options**

If you want to use the default configuration, you can omit `marker_config` entirely.

### **Key Parameters in `MarkerConfig`**

| Parameter | Description |
|-----------------|-------------|
| `use_llm` | Set to `True` to enable higher-quality processing using LLMs. |
| `gemini_api_key` | Google Gemini API key for LLM-enhanced parsing. |

You can further customize `config_dict` by referring to [`marker_pdf`'s documentation](https://github.com/VikParuchuri/marker/blob/master/README.md). Alternatively, run the following command to view available options:

```sh
marker_single --help
```

This will display all supported parameters, which you can pass as needed in `config_dict`.

---

# Markitdown Document Parsers

Langroid integrates with Microsoft's Markitdown library to provide conversion of Microsoft Office documents to markdown format. Three specialized parsers are available, for `docx`, `xlsx`, and `pptx` files.

## Prerequisites

To use these parsers, install Langroid with the required extras:

```bash
pip install "langroid[markitdown]"   # Just Markitdown parsers
# or
pip install "langroid[doc-parsers]"  # All document parsers
```

## Available Parsers

Once you set up a `parser` for the appropriate document type, you can get the entire document with `parser.get_doc()`, or get automatically chunked content with `parser.get_doc_chunks()`.

### 1. `MarkitdownDocxParser`

Converts Word documents (`*.docx`) to markdown, preserving structure, formatting, and tables.

See the tests

- [`test_docx_parser.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_docx_parser.py)
- [`test_markitdown_parser.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_markitdown_parser.py)

for examples of how to use these parsers.

```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import DocxParsingConfig, ParsingConfig

parser = DocumentParser.create(
    "path/to/document.docx",
    ParsingConfig(
        docx=DocxParsingConfig(library="markitdown-docx"),
        # ... other parsing config options
    ),
)
```

### 2. `MarkitdownXLSXParser`

Converts Excel spreadsheets (`*.xlsx`/`*.xls`) to markdown tables, preserving data and sheet structure.

```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import ParsingConfig, MarkitdownXLSParsingConfig

parser = DocumentParser.create(
    "path/to/spreadsheet.xlsx",
    ParsingConfig(xls=MarkitdownXLSParsingConfig())
)
```

### 3. `MarkitdownPPTXParser`

Converts PowerPoint presentations (`*.pptx`) to markdown, preserving slide content and structure.
```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import ParsingConfig, MarkitdownPPTXParsingConfig

parser = DocumentParser.create(
    "path/to/presentation.pptx",
    ParsingConfig(pptx=MarkitdownPPTXParsingConfig())
)
```

---

# Langroid MCP Integration

Langroid provides seamless integration with Model Context Protocol (MCP) servers via two methods, both of which involve creating Langroid `ToolMessage` subclasses corresponding to the MCP tools:

1. Programmatic creation of Langroid tools using `get_tool_async` or `get_tools_async`, based on the tool definitions on an MCP server.
2. Declarative creation of Langroid tools using the **`@mcp_tool` decorator**, which allows customizing the tool-handling behavior beyond what is provided by the MCP server.

This integration allows _any_ LLM (that is good enough to do function-calling via prompts) to use any MCP server. See the following to understand the integration better:

- example Python scripts under [`examples/mcp`](https://github.com/langroid/langroid/tree/main/examples/mcp)
- [`tests/main/test_mcp_tools.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_mcp_tools.py)

---

## 1. Connecting to an MCP server via transport specification

Before creating Langroid tools, we first need to define and connect to an MCP server via a [FastMCP](https://gofastmcp.com/getting-started/welcome) client. There are several ways to connect to a server, depending on how it is defined. Each of these uses a different type of [transport](https://gofastmcp.com/clients/transports).

The typical pattern to use with Langroid is as follows:

- define an MCP server transport
- create a `ToolMessage` subclass using the `@mcp_tool` decorator or `get_tool_async()` function, with the transport as the first argument

Langroid's MCP integration works with any of the [transports](https://gofastmcp.com/clients/transports) supported by FastMCP. Below we go over some common ways to define transports and extract tools from the servers.

1. **Local Python script**
2. **In-memory FastMCP server** - useful for testing and for simple in-memory servers that don't need to be run as a separate process.
3. **NPX stdio transport**
4. **UVX stdio transport**
5. **Generic stdio transport** – launch any CLI-based MCP server via stdin/stdout
6. **Network SSE transport** – connect to HTTP/S MCP servers via `SSETransport`

All examples below use the async helpers to create Langroid tools (`ToolMessage` subclasses):

```python
from langroid.agent.tools.mcp import (
    get_tools_async,
    get_tool_async,
)
```

---

#### Path to a Python Script

Point at your MCP server entrypoint, e.g., the `weather.py` script in the langroid repo (based on the [Anthropic quick-start guide](https://modelcontextprotocol.io/quickstart/server)):

```python
async def example_script_path() -> None:
    server = "tests/main/mcp/weather-server-python/weather.py"
    tools = await get_tools_async(server)  # all tools available
    AlertTool = await get_tool_async(server, "get_alerts")  # specific tool
    # instantiate the tool with a specific input
    msg = AlertTool(state="CA")
    # Call the tool via handle_async()
    alerts = await msg.handle_async()
    print(alerts)
```

---

#### In-Memory FastMCP Server

Define your server with `FastMCP(...)` and pass the instance:

```python
from fastmcp.server import FastMCP
from pydantic import BaseModel, Field


class CounterInput(BaseModel):
    start: int = Field(...)

def make_server() -> FastMCP: server = FastMCP("CounterServer") @server.tool() def increment(data: CounterInput) -> int: """Increment start by 1.""" return data.start + 1 return server async def example_in_memory() -> None: server = make_server() tools = await get_tools_async(server) IncTool = await get_tool_async(server, "increment") result = await IncTool(start=41).handle_async() print(result) # 42 ``` See the [`mcp-file-system.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/mcp-file-system.py) script for a working example of this. --- #### NPX stdio Transport Use any npm-installed MCP server via `npx`, e.g., the [Exa web-search MCP server](https://docs.exa.ai/examples/exa-mcp): ```python from fastmcp.client.transports import NpxStdioTransport transport = NpxStdioTransport( package="exa-mcp-server", env_vars={"EXA_API_KEY": "…"}, ) async def example_npx() -> None: tools = await get_tools_async(transport) SearchTool = await get_tool_async(transport, "web_search_exa") results = await SearchTool( query="How does Langroid integrate with MCP?" ).handle_async() print(results) ``` For a fully working example, see the script [`exa-web-search.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/exa-web-search.py). --- #### UVX stdio Transport Connect to a UVX-based MCP server, e.g., the [Git MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/git) ```python from fastmcp.client.transports import UvxStdioTransport transport = UvxStdioTransport(tool_name="mcp-server-git") async def example_uvx() -> None: tools = await get_tools_async(transport) GitStatus = await get_tool_async(transport, "git_status") status = await GitStatus(path=".").handle_async() print(status) ``` --- #### Generic stdio Transport Use `StdioTransport` to run any MCP server as a subprocess over stdio: ```python from fastmcp.client.transports import StdioTransport from langroid.agent.tools.mcp import get_tools_async, get_tool_async async def example_stdio() -> None: """Example: any CLI‐based MCP server via StdioTransport.""" transport: StdioTransport = StdioTransport( command="uv", args=["run", "--with", "biomcp-python", "biomcp", "run"], ) tools: list[type] = await get_tools_async(transport) BioTool = await get_tool_async(transport, "tool_name") result: str = await BioTool(param="value").handle_async() print(result) ``` See the full example in [`examples/mcp/biomcp.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/biomcp.py). --- #### Network SSE Transport Use `SSETransport` to connect to a FastMCP server over HTTP/S: ```python from fastmcp.client.transports import SSETransport from langroid.agent.tools.mcp import ( get_tools_async, get_tool_async, ) async def example_sse() -> None: """Example: connect to an HTTP/S MCP server via SSETransport.""" url: str = "https://localhost:8000/sse" transport: SSETransport = SSETransport( url=url, headers={"Authorization": "Bearer TOKEN"} ) tools: list[type] = await get_tools_async(transport) ExampleTool = await get_tool_async(transport, "tool_name") result: str = await ExampleTool(param="value").handle_async() print(result) ``` --- With these patterns you can list tools, generate Pydantic-backed `ToolMessage` classes, and invoke them via `.handle_async()`, all with zero boilerplate client setup. As the `FastMCP` library adds other types of transport (e.g., `StreamableHTTPTransport`), the pattern of usage with Langroid will remain the same. 
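Note that all of the helper examples above are `async` functions; from a standalone script you can drive any of them with the standard `asyncio.run()`. A minimal sketch, using the `example_in_memory()` function defined in the in-memory section above:

```python
import asyncio

# Run one of the async examples above from a regular (non-async) script.
if __name__ == "__main__":
    asyncio.run(example_in_memory())
```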
---

## Best Practice: Use a server factory for stdio transports

Starting with fastmcp 2.13 and mcp 1.21, stdio transports (e.g., `StdioTransport`, `NpxStdioTransport`, `UvxStdioTransport`) are effectively single-use. Reusing the same transport instance across multiple connections can lead to errors such as `anyio.ClosedResourceError` during session initialization.

To make your code robust and future-proof, pass a zero-argument server factory to Langroid's MCP helpers. A "server factory" is simply a `lambda` or function that returns a fresh server spec or transport each time.

Benefits:

- Fresh, reliable connections on every call (no reuse of closed transports).
- Works across fastmcp/mcp versions without subtle lifecycle issues.
- Enables concurrent calls safely (each call uses its own subprocess/session).
- Keeps your decorator ergonomics and `handle_async` overrides unchanged.

You can use a factory with both the decorator and the async helpers:

```python
import langroid as lr
from fastmcp.client.transports import StdioTransport
from langroid.agent.tools.mcp import mcp_tool, get_tool_async

# 1) Decorator style
@mcp_tool(lambda: StdioTransport(command="claude", args=["mcp", "serve"], env={}), "Grep")
class GrepTool(lr.ToolMessage):
    async def handle_async(self) -> str:
        # pre/post-process around the raw MCP call
        result = await self.call_tool_async()
        return f"\n{result}\n"

# 2) Programmatic style (inside an async function)
BaseGrep = await get_tool_async(
    lambda: StdioTransport(command="claude", args=["mcp", "serve"], env={}),
    "Grep",
)
```

Notes:

- Passing a concrete transport instance still works: Langroid will try to clone it internally; however, a factory is the most reliable across environments.
- For network transports (e.g., `SSETransport`), a factory is optional; you can continue passing the transport instance directly.

---

## Output-schema validation: return structured content when required

Newer `mcp` clients validate tool outputs against the tool's output schema. If a tool declares a structured output, returning plain text may raise a runtime error. Some servers (for example, Claude Code's Grep) expose an argument like `output_mode` that controls the shape of the response.

Recommendations:

- Prefer structured modes when a tool declares an output schema.
- If available, set options like `output_mode="structured"` (or a documented structured variant such as `"files_with_matches"`) in your tool's `handle_async` before calling `await self.call_tool_async()`.

Example tweak in a decorator-based tool:

```python
@mcp_tool(lambda: StdioTransport(command="claude", args=["mcp", "serve"]), "Grep")
class GrepTool(lr.ToolMessage):
    async def handle_async(self) -> str:
        # Ensure a structured response if the server supports it
        if hasattr(self, "output_mode"):
            self.output_mode = "structured"
        return await self.call_tool_async()
```

If the server does not provide such a switch, follow its documentation for returning data that matches its declared output schema.

---

## 2. Create Langroid Tools declaratively using the `@mcp_tool` decorator

The above examples showed how you can create Langroid tools programmatically using the helper functions `get_tool_async()` and `get_tools_async()`, with the first argument being the transport to the MCP server. The `@mcp_tool` decorator works in the same way:

- **Arguments to the decorator**
    1. `server_spec`: path/URL/`FastMCP`/`ClientTransport`, as mentioned above.
    2. `tool_name`: name of a specific MCP tool
- **Behavior**
    - Generates a `ToolMessage` subclass with all input fields typed.
    - Provides a `call_tool_async()` method under the hood -- this is the "raw" MCP tool call, returning a string.
    - If you define your own `handle_async()`, it overrides the default. Typically, you would override it to customize either the input or the output of the tool call, or both.
    - If you don't define your own `handle_async()`, it defaults to just returning the value of the `call_tool_async()` method.

Here is a simple example of using the `@mcp_tool` decorator to create a Langroid tool:

```python
from fastmcp.server import FastMCP
from langroid.agent.tools.mcp import mcp_tool
import langroid as lr

# Define your MCP server (pydantic v2 for schema)
server = FastMCP("MyServer")

@mcp_tool(server, "greet")
class GreetTool(lr.ToolMessage):
    """Say hello to someone."""
    async def handle_async(self) -> str:
        # Customize post-processing
        raw = await self.call_tool_async()
        return f"💬 {raw}"
```

Using the decorator method allows you to customize the tool's `handle_async` method, or add additional fields to the `ToolMessage`; for example, you may want to adjust the input to the tool, or post-process the tool result before it is sent back to the LLM. If you don't override `handle_async`, the default behavior is simply to return the value of the "raw" MCP tool call `await self.call_tool_async()`.

```python
@mcp_tool(server, "calculate")
class CalcTool(lr.ToolMessage):
    """Perform complex calculation."""
    async def handle_async(self) -> str:
        result = await self.call_tool_async()
        # Add context or emojis, etc.
        return f"🧮 Result is *{result}*"
```

---

## 3. Enabling Tools in Your Agent

Once you've created a Langroid `ToolMessage` subclass from an MCP server, you can enable it on a `ChatAgent`, just like you normally would. Below is an example of using the [Exa MCP server](https://docs.exa.ai/examples/exa-mcp) to create a Langroid web search tool, enable a `ChatAgent` to use it, and then set up a `Task` to run the agent loop.

First we must define the appropriate `ClientTransport` for the MCP server:

```python
import os
from fastmcp.client.transports import NpxStdioTransport

# define the transport
transport = NpxStdioTransport(
    package="exa-mcp-server",
    env_vars=dict(EXA_API_KEY=os.getenv("EXA_API_KEY")),
)
```

Then we use the `@mcp_tool` decorator to create a `ToolMessage` subclass representing the web search tool. Note that one reason to use the decorator to define our tool is so we can specify a custom `handle_async` method that controls what is sent to the LLM after the actual raw MCP tool-call (the `call_tool_async` method) is made.

```python
# the second arg specifically refers to the `web_search_exa` tool available
# on the server defined by the `transport` variable.
@mcp_tool(transport, "web_search_exa")
class ExaSearchTool(lr.ToolMessage):
    async def handle_async(self):
        result: str = await self.call_tool_async()
        return f"""
        Below are the results of the web search:

        {result}

        Use these results to answer the user's original question.
        """
```

If we did not want to override the `handle_async` method, we could simply have created the `ExaSearchTool` class programmatically via the `get_tool_async` function as shown above, i.e.:

```python
from langroid.agent.tools.mcp import get_tool_async

ExaSearchTool = await get_tool_async(transport, "web_search_exa")
```

We can now define our main function where we create our `ChatAgent`, attach the `ExaSearchTool` to it, define the `Task`, and run the task loop.
```python
async def main():
    agent = lr.ChatAgent(
        lr.ChatAgentConfig(
            # forward to user when LLM doesn't use a tool
            handle_llm_no_tool=NonToolAction.FORWARD_USER,
            llm=lm.OpenAIGPTConfig(
                max_output_tokens=1000,
                # this defaults to True, but we set it to False so we can see output
                async_stream_quiet=False,
            ),
        )
    )
    # enable the agent to use the web-search tool
    agent.enable_message(ExaSearchTool)
    # make task with interactive=False =>
    # waits for user only when LLM doesn't use a tool
    task = lr.Task(agent, interactive=False)
    await task.run_async()
```

See [`exa-web-search.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/exa-web-search.py) for a full working example of this.

---

# OpenAI Client Caching

## Overview

Langroid implements client caching for OpenAI and compatible APIs (Groq, Cerebras, etc.) to improve performance and prevent resource exhaustion issues.

## Configuration

### The `use_cached_client` Option

Set `use_cached_client` in your `OpenAIGPTConfig`:

```python
from langroid.language_models import OpenAIGPTConfig

config = OpenAIGPTConfig(
    chat_model="gpt-4",
    use_cached_client=True  # Default
)
```

### Default Behavior

- `use_cached_client=True` (enabled by default)
- Clients with identical configurations share the same underlying HTTP connection pool
- Different configurations (API key, base URL, headers, etc.) get separate client instances

## Benefits

- **Connection Pooling**: Reuses TCP connections, reducing latency and overhead
- **Resource Efficiency**: Prevents "too many open files" errors when creating many agents
- **Performance**: Eliminates connection handshake overhead on subsequent requests
- **Thread Safety**: Shared clients are safe to use across threads

## When to Disable Client Caching

Set `use_cached_client=False` in these scenarios:

1. **Multiprocessing**: Each process should have its own client instance
2. **Client Isolation**: When you need complete isolation between different agent instances
3. **Debugging**: To rule out client sharing as a source of issues
4. **Legacy Compatibility**: If your existing code depends on unique client instances

## Example: Disabling Client Caching

```python
config = OpenAIGPTConfig(
    chat_model="gpt-4",
    use_cached_client=False  # Each instance gets its own client
)
```

## Technical Details

- Uses SHA256-based cache keys to identify unique configurations
- Implements singleton pattern with lazy initialization
- Automatically cleans up clients on program exit via atexit hooks
- Compatible with both sync and async OpenAI clients

---

# OpenAI HTTP Client Configuration

When using OpenAI models through Langroid in corporate environments or behind proxies, you may encounter SSL certificate verification errors. Langroid provides three flexible options to configure the HTTP client used for OpenAI API calls.

## Configuration Options

### 1. Simple SSL Verification Bypass

The quickest solution for development or trusted environments:

```python
import langroid.language_models as lm

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_verify_ssl=False  # Disables SSL certificate verification
)
llm = lm.OpenAIGPT(config)
```

!!! warning "Security Notice"
    Disabling SSL verification makes your connection vulnerable to man-in-the-middle attacks. Only use this in trusted environments.

### 2. HTTP Client Configuration Dictionary
For common scenarios like proxies or custom certificates, use a configuration dictionary:

```python
import langroid.language_models as lm

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_config={
        "verify": False,  # Or path to CA bundle: "/path/to/ca-bundle.pem"
        "proxy": "http://proxy.company.com:8080",
        "timeout": 30.0,
        "headers": {
            "User-Agent": "MyApp/1.0"
        }
    }
)
llm = lm.OpenAIGPT(config)
```

**Benefits**: This approach enables client caching, improving performance when creating multiple agents.

### 3. Custom HTTP Client Factory

For advanced scenarios requiring dynamic behavior or custom authentication:

```python
import langroid.language_models as lm
from httpx import Client

def create_custom_client():
    """Factory function to create a custom HTTP client."""
    client = Client(
        verify="/path/to/corporate-ca-bundle.pem",
        proxies={
            "http": "http://proxy.corp.com:8080",
            "https": "http://proxy.corp.com:8080"
        },
        timeout=30.0
    )

    # Add custom event hooks for logging
    def log_request(request):
        print(f"API Request: {request.method} {request.url}")

    client.event_hooks = {"request": [log_request]}
    return client

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_factory=create_custom_client
)
llm = lm.OpenAIGPT(config)
```

**Note**: Custom factories bypass client caching. Each `OpenAIGPT` instance creates a new client.

## Priority Order

When multiple options are specified, they are applied in this order:

1. `http_client_factory` (highest priority)
2. `http_client_config`
3. `http_verify_ssl` (lowest priority)

## Common Use Cases

### Corporate Proxy with Custom CA Certificate

```python
config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_config={
        "verify": "/path/to/corporate-ca-bundle.pem",
        "proxies": {
            "http": "http://proxy.corp.com:8080",
            "https": "https://proxy.corp.com:8443"
        }
    }
)
```

### Debugging API Calls

```python
def debug_client_factory():
    from httpx import Client
    client = Client(verify=False)

    def log_response(response):
        print(f"Status: {response.status_code}")
        print(f"Headers: {response.headers}")

    client.event_hooks = {
        "response": [log_response]
    }
    return client

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_factory=debug_client_factory
)
```

### Local Development with Self-Signed Certificates

```python
# For local OpenAI-compatible APIs
config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    api_base="https://localhost:8443/v1",
    http_verify_ssl=False
)
```

## Best Practices

1. **Use the simplest option that meets your needs**:
    - Development/testing: `http_verify_ssl=False`
    - Corporate environments: `http_client_config` with proper CA bundle
    - Complex requirements: `http_client_factory`

2. **Prefer configuration over factories for better performance** - configured clients are cached and reused

3. **Always use proper CA certificates in production** instead of disabling SSL verification

4. **Test your configuration** with a simple API call before deploying:

    ```python
    llm = lm.OpenAIGPT(config)
    response = llm.chat("Hello")
    print(response.message)
    ```

## Troubleshooting

### SSL Certificate Errors

```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED]
```

**Solution**: Use one of the three configuration options above.
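Before wiring a corporate CA bundle into `http_client_config` or a client factory, it can help to first confirm that the bundle works for a direct HTTPS request. A minimal sketch using `httpx` (the HTTP library the OpenAI Python client uses under the hood); the bundle path is a placeholder:

```python
import httpx

# If the CA bundle is valid for your network/TLS setup, this should return an
# HTTP status code (e.g. 401 without an API key) rather than raising an
# SSL verification error.
client = httpx.Client(verify="/path/to/corporate-ca-bundle.pem")
response = client.get("https://api.openai.com/v1/models")
print(response.status_code)
```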
### Proxy Connection Issues

- Verify proxy URL format: `http://proxy:port` or `https://proxy:port`
- Check if proxy requires authentication
- Ensure proxy allows connections to `api.openai.com`

## See Also

- [OpenAI API Reference](https://platform.openai.com/docs/api-reference) - Official OpenAI documentation

---

## **Setup PostgreSQL with pgvector using Docker**

To quickly get a PostgreSQL instance with pgvector running, the easiest method is to use Docker. Follow the steps below:

### **1. Run PostgreSQL with Docker**

Use the `ankane/pgvector` Docker image to set up PostgreSQL with the pgvector extension. Run the following command:

```bash
docker run --name pgvector -e POSTGRES_USER=your_postgres_user -e POSTGRES_PASSWORD=your_postgres_password -e POSTGRES_DB=your_database_name -p 5432:5432 ankane/pgvector
```

This will pull the `ankane/pgvector` image and run it as a PostgreSQL container on your local machine. The database will be accessible at `localhost:5432`.

### **2. Include `.env` file with PostgreSQL credentials**

These environment variables should be the same as those set when spinning up the Docker container. Add the following environment variables to a `.env` file for configuring your PostgreSQL connection:

```dotenv
POSTGRES_USER=your_postgres_user
POSTGRES_PASSWORD=your_postgres_password
POSTGRES_DB=your_database_name
```

## **If you want to use a cloud offering of Postgres**

We use **Tembo** here for demonstration purposes.

### **Steps to Set Up Tembo**

Follow this [quickstart guide](https://tembo.io/docs/getting-started/getting_started) to get your Tembo credentials.

1. Sign up at [Tembo.io](https://cloud.tembo.io/).
2. While selecting a stack, choose **VectorDB** as your option.
3. Click on **Deploy Free**.
4. Wait until your database is fully provisioned.
5. Click on **Show Connection String** to get your connection string.

### **If you have a connection string, there is no need to set up Docker**

Make sure your connection string starts with `postgres://` or `postgresql://`. Add this to your `.env`:

```dotenv
POSTGRES_CONNECTION_STRING=your-connection-string
```

---

## **Installation**

If you are using `uv` or `pip` for package management, install Langroid with the `postgres` extra:

```bash
uv add "langroid[postgres]"
# or
pip install "langroid[postgres]"
```

---

## **Code Example**

Here's an example of how to use Langroid with PostgreSQL:

```python
import langroid as lr
from langroid.agent.special import DocChatAgent, DocChatAgentConfig
from langroid.embedding_models import OpenAIEmbeddingsConfig

# Configure OpenAI embeddings
embed_cfg = OpenAIEmbeddingsConfig(
    model_type="openai",
)

# Configure the DocChatAgent with PostgresDB
config = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o
    ),
    vecdb=lr.vector_store.PostgresDBConfig(
        collection_name="quick_start_chat_agent_docs",
        replace_collection=True,
        embedding=embed_cfg,
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE,
    ),
    n_similar_chunks=2,
    n_relevant_chunks=2,
)

# Create the agent
agent = DocChatAgent(config)
```

---

## **Create and Ingest Documents**

Define documents with their content and metadata for ingestion into the vector store.

### **Code Example**

```python
documents = [
    lr.Document(
        content="""
        In the year 2050, GPT10 was released.
        In 2057, paperclips were seen all over the world.
        Global warming was solved in 2060.
        In 2061, the world was taken over by paperclips.
        In 2045, the Tour de France was still going on.
        They were still using bicycles.
        There was one more ice age in 2040.
        """,
        metadata=lr.DocMetaData(source="wikipedia-2063", id="dkfjkladfjalk"),
    ),
    lr.Document(
        content="""
        We are living in an alternate universe where Germany has occupied the USA,
        and the capital of USA is Berlin.
        Charlie Chaplin was a great comedian.
        In 2050, all Asian countries merged into Indonesia.
        """,
        metadata=lr.DocMetaData(source="Almanac", id="lkdajfdkla"),
    ),
]
```

### **Ingest Documents**

```python
agent.ingest_docs(documents)
```

---

## **Get an Answer from the LLM**

Now that documents are ingested, you can query the agent to get an answer.

### **Code Example**

```python
answer = agent.llm_response("When will the new ice age begin?")
```

---

# How to set up Langroid and Pinecone Serverless

This document is a quick tutorial on using [Pinecone](https://www.pinecone.io/) Serverless Indexes with Langroid. We will go over some quickstart links and some code snippets for setting up a conversation with an LLM using Langroid.

# Setting up Pinecone

Here are some reference links if you'd like to read a bit more on Pinecone's model definitions and API:

- https://docs.pinecone.io/guides/get-started/overview
- https://docs.pinecone.io/guides/get-started/glossary
- https://docs.pinecone.io/guides/indexes/manage-indexes
- https://docs.pinecone.io/reference/api/introduction

## Signing up for Pinecone

To get started, you'll need to have an account. [Here's](https://www.pinecone.io/pricing/) where you can review the pricing options for Pinecone. Once you have an account, you'll need to procure an API key. Make sure to save the key you are given on initial login in a secure location. If you were unable to save it when your account was created, you can always [create a new API key](https://docs.pinecone.io/guides/projects/manage-api-keys) in the Pinecone console.

## Setting up your local environment

For the purposes of this example, we will use OpenAI to generate our embeddings. As such, alongside a Pinecone API key, you'll also want an OpenAI key. You can find a quickstart guide on getting started with OpenAI [here](https://platform.openai.com/docs/quickstart).

Once you have both API keys handy, add them to your `.env` file. You should have something like the following:

```env
...
OPENAI_API_KEY=
PINECONE_API_KEY=
...
```

# Using Langroid with Pinecone Serverless

Once you have completed signing up for an account and have added your API keys to your local environment, you can start using Langroid with Pinecone.
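If you keep the keys in a `.env` file as shown above, load them at the top of your script before constructing any Langroid configs. This sketch assumes the `python-dotenv` package (also used by the `marker` example earlier in these notes):

```python
# Load OPENAI_API_KEY and PINECONE_API_KEY from the .env file into the environment
from dotenv import load_dotenv

load_dotenv()
```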
## Setting up an Agent Here's some example code setting up an agent: ```python from langroid import Document, DocMetaData from langroid.agent.special import DocChatAgent, DocChatAgentConfig from langroid.embedding_models import OpenAIEmbeddingsConfig from langroid.language_models import OpenAIGPTConfig, OpenAIChatModel from langroid.parsing.parser import ParsingConfig, Splitter from langroid.vector_store import PineconeDBConfig agent_embed_cfg = OpenAIEmbeddingsConfig( model_type="openai" ) agent_config = DocChatAgentConfig( llm=OpenAIGPTConfig( chat_model=OpenAIChatModel.GPT4o_MINI ), vecdb=PineconeDBConfig( # note, Pinecone indexes must be alphanumeric lowercase characters or "-" collection_name="pinecone-serverless-example", replace_collection=True, embedding=agent_embed_cfg, ), parsing=ParsingConfig( separators=["\n"], splitter=Splitter.SIMPLE, ), n_similar_chunks=2, n_relevant_chunks=2, ) agent = DocChatAgent(config=agent_config) ################### # Once we have created an agent, we can start loading # some docs into our Pinecone index: ################### documents = [ Document( content="""Max Verstappen was the Formula 1 World Drivers' Champion in 2024. Lewis Hamilton was the Formula 1 World Drivers' Champion in 2020. Nico Rosberg was the Formula 1 World Drivers' Champion in 2016. Sebastian Vettel was the Formula 1 World Drivers' Champion in 2013. Jenson Button was the Formula 1 World Drivers' Champion in 2009. Kimi Räikkönen was the Formula 1 World Drivers' Champion in 2007. """, metadata=DocMetaData( source="wikipedia", id="formula-1-facts", ) ), Document( content="""The Boston Celtics won the NBA Championship for the 2024 NBA season. The MVP for the 2024 NBA Championship was Jaylen Brown. The Denver Nuggets won the NBA Championship for the 2023 NBA season. The MVP for the 2023 NBA Championship was Nikola Jokić. The Golden State Warriors won the NBA Championship for the 2022 NBA season. The MVP for the 2022 NBA Championship was Stephen Curry. The Milwaukee Bucks won the NBA Championship for the 2021 NBA season. The MVP for the 2021 NBA Championship was Giannis Antetokounmpo. The Los Angeles Lakers won the NBA Championship for the 2020 NBA season. The MVP for the 2020 NBA Championship was LeBron James. The Toronto Raptors won the NBA Championship for the 2019 NBA season. The MVP for the 2019 NBA Championship was Kawhi Leonard. """, metadata=DocMetaData( source="wikipedia", id="nba-facts" ) ) ] agent.ingest_docs(documents) ################### # With the documents now loaded, we can now prompt our agent ################### formula_one_world_champion_2007 = agent.llm_response( message="Who was the Formula 1 World Drivers' Champion in 2007?" ) try: assert "Kimi Räikkönen" in formula_one_world_champion_2007.content except AssertionError as e: print(f"Did not resolve Kimi Räikkönen as the answer, document content: {formula_one_world_champion_2007.content} ") nba_champion_2023 = agent.llm_response( message="Who won the 2023 NBA Championship?" ) try: assert "Denver Nuggets" in nba_champion_2023.content except AssertionError as e: print(f"Did not resolve the Denver Nuggets as the answer, document content: {nba_champion_2023.content}") nba_mvp_2023 = agent.llm_response( message="Who was the MVP for the 2023 NBA Championship?" 
) try: assert "Nikola Jokić" in nba_mvp_2023.content except AssertionError as e: print(f"Did not resolve Nikola Jokić as the answer, document content: {nba_mvp_2023.content}") ``` --- # Portkey Integration Langroid provides seamless integration with [Portkey](https://portkey.ai), a powerful AI gateway that enables you to access multiple LLM providers through a unified API with advanced features like caching, retries, fallbacks, and comprehensive observability. ## What is Portkey? Portkey is an AI gateway that sits between your application and various LLM providers, offering: - **Unified API**: Access 200+ models from different providers through one interface - **Reliability**: Automatic retries, fallbacks, and load balancing - **Observability**: Detailed logging, tracing, and analytics - **Performance**: Intelligent caching and request optimization - **Security**: Virtual keys and advanced access controls - **Cost Management**: Usage tracking and budget controls For complete documentation, visit the [Portkey Documentation](https://docs.portkey.ai). ## Quick Start ### 1. Setup First, sign up for a Portkey account at [portkey.ai](https://portkey.ai) and get your API key. Set up your environment variables, either explicitly or in your `.env` file as usual: ```bash # Required: Portkey API key export PORTKEY_API_KEY="your-portkey-api-key" # Required: Provider API keys (for the models you want to use) export OPENAI_API_KEY="your-openai-key" export ANTHROPIC_API_KEY="your-anthropic-key" export GOOGLE_API_KEY="your-google-key" # ... other provider keys as needed ``` ### 2. Basic Usage ```python import langroid as lr import langroid.language_models as lm from langroid.language_models.provider_params import PortkeyParams # Create an LLM config to use Portkey's OpenAI-compatible API # (Note that the name `OpenAIGPTConfig` does NOT imply it only works with OpenAI models; # the name reflects the fact that the config is meant to be used with an # OpenAI-compatible API, which Portkey provides for multiple LLM providers.) llm_config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( api_key="your-portkey-api-key", # Or set PORTKEY_API_KEY env var ) ) # Create LLM instance llm = lm.OpenAIGPT(llm_config) # Use normally response = llm.chat("What is the smallest prime number?") print(response.message) ``` ### 3. 
Multiple Providers Switch between providers seamlessly: ```python # OpenAI config_openai = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o", ) # Anthropic config_anthropic = lm.OpenAIGPTConfig( chat_model="portkey/anthropic/claude-3-5-sonnet-20241022", ) # Google Gemini config_gemini = lm.OpenAIGPTConfig( chat_model="portkey/google/gemini-2.0-flash-lite", ) ``` ## Advanced Features ### Virtual Keys Use virtual keys to abstract provider management: ```python config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o", portkey_params=PortkeyParams( virtual_key="vk-your-virtual-key", # Configured in Portkey dashboard ) ) ``` ### Caching and Performance Enable intelligent caching to reduce costs and improve performance: ```python config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( cache={ "enabled": True, "ttl": 3600, # 1 hour cache "namespace": "my-app" }, cache_force_refresh=False, ) ) ``` ### Retry Strategies Configure automatic retries for better reliability: ```python config = lm.OpenAIGPTConfig( chat_model="portkey/anthropic/claude-3-haiku-20240307", portkey_params=PortkeyParams( retry={ "max_retries": 3, "backoff": "exponential", "jitter": True } ) ) ``` ### Observability and Tracing Add comprehensive tracking for production monitoring: ```python import uuid config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o", portkey_params=PortkeyParams( trace_id=f"trace-{uuid.uuid4().hex[:8]}", metadata={ "user_id": "user-123", "session_id": "session-456", "app_version": "1.2.3" }, user="user-123", organization="my-org", custom_headers={ "x-request-source": "langroid", "x-feature": "chat-completion" } ) ) ``` ## Configuration Reference The `PortkeyParams` class supports all Portkey features: ```python from langroid.language_models.provider_params import PortkeyParams params = PortkeyParams( # Authentication api_key="pk-...", # Portkey API key virtual_key="vk-...", # Virtual key (optional) # Observability trace_id="trace-123", # Request tracing metadata={"key": "value"}, # Custom metadata user="user-id", # User identifier organization="org-id", # Organization identifier # Performance cache={ # Caching configuration "enabled": True, "ttl": 3600, "namespace": "my-app" }, cache_force_refresh=False, # Force cache refresh # Reliability retry={ # Retry configuration "max_retries": 3, "backoff": "exponential", "jitter": True }, # Custom headers custom_headers={ # Additional headers "x-custom": "value" }, # Base URL (usually not needed) base_url="https://api.portkey.ai" # Portkey API endpoint ) ``` ## Supported Providers Portkey supports 200+ models from various providers. Common ones include: ```python # OpenAI "portkey/openai/gpt-4o" "portkey/openai/gpt-4o-mini" # Anthropic "portkey/anthropic/claude-3-5-sonnet-20241022" "portkey/anthropic/claude-3-haiku-20240307" # Google "portkey/google/gemini-2.0-flash-lite" "portkey/google/gemini-1.5-pro" # Cohere "portkey/cohere/command-r-plus" # Meta "portkey/meta/llama-3.1-405b-instruct" # And many more... ``` Check the [Portkey documentation](https://docs.portkey.ai/docs/integrations/models) for the complete list. ## Examples Langroid includes comprehensive Portkey examples in `examples/portkey/`: 1. **`portkey_basic_chat.py`** - Basic usage with multiple providers 2. **`portkey_advanced_features.py`** - Caching, retries, and observability 3. 
**`portkey_multi_provider.py`** - Comparing responses across providers

Run any example:

```bash
cd examples/portkey
python portkey_basic_chat.py
```

## Best Practices

### 1. Use Environment Variables

Never hardcode API keys:

```bash
# .env file
PORTKEY_API_KEY=your_portkey_key
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```

### 2. Implement Fallback Strategies

Use multiple providers for reliability:

```python
import langroid.language_models as lm

providers = [
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-3-haiku-20240307"),
    ("google", "gemini-2.0-flash-lite")
]

def chat_with_fallback(question: str):
    for provider, model in providers:
        try:
            config = lm.OpenAIGPTConfig(
                chat_model=f"portkey/{provider}/{model}"
            )
            llm = lm.OpenAIGPT(config)
            return llm.chat(question)
        except Exception:
            continue  # Try next provider
```

### 3. Add Meaningful Metadata

Include context for better observability:

```python
params = PortkeyParams(
    metadata={
        "user_id": user.id,
        "feature": "document_qa",
        "document_type": "pdf",
        "processing_stage": "summary"
    }
)
```

### 4. Use Caching Wisely

Enable caching for deterministic queries:

```python
# Good for caching
params = PortkeyParams(
    cache={"enabled": True, "ttl": 3600}
)

# Use with deterministic prompts
response = llm.chat("What is the capital of France?")
```

### 5. Monitor Performance

Use trace IDs to track request flows:

```python
import uuid

trace_id = f"trace-{uuid.uuid4().hex[:8]}"
params = PortkeyParams(
    trace_id=trace_id,
    metadata={"operation": "document_processing"}
)
# Use the same trace_id for related requests
```

## Monitoring and Analytics

### Portkey Dashboard

View detailed analytics at [app.portkey.ai](https://app.portkey.ai):

- Request/response logs
- Token usage and costs
- Performance metrics (latency, errors)
- Provider comparisons
- Custom filters by metadata

### Custom Filtering

Use metadata and headers to filter requests:

```python
# Tag requests by feature
params = PortkeyParams(
    metadata={"feature": "chat", "version": "v2"},
    custom_headers={"x-request-type": "production"}
)
```

Then filter in the dashboard by:

- `metadata.feature = "chat"`
- `headers.x-request-type = "production"`

## Troubleshooting

### Common Issues

1. **Authentication Errors**
    ```
    Error: Unauthorized (401)
    ```
    - Check `PORTKEY_API_KEY` is set correctly
    - Verify API key is active in Portkey dashboard

2. **Provider API Key Missing**
    ```
    Error: Missing API key for provider
    ```
    - Set provider API key (e.g., `OPENAI_API_KEY`)
    - Or use virtual keys in Portkey dashboard

3. **Model Not Found**
    ```
    Error: Model not supported
    ```
    - Check model name format: `portkey/provider/model`
    - Verify model is available through Portkey

4. 
**Rate Limiting** ``` Error: Rate limit exceeded ``` - Configure retry parameters - Use virtual keys for better rate limit management ### Debug Mode Enable detailed logging: ```python import logging logging.getLogger("langroid").setLevel(logging.DEBUG) ``` ### Test Configuration Verify your setup: ```python # Test basic connection config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", max_output_tokens=50 ) llm = lm.OpenAIGPT(config) response = llm.chat("Hello") print("✅ Portkey integration working!") ``` ## Migration Guide ### From Direct Provider Access If you're currently using providers directly: ```python # Before: Direct OpenAI config = lm.OpenAIGPTConfig( chat_model="gpt-4o-mini" ) # After: Through Portkey config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini" ) ``` ### Adding Advanced Features Gradually Start simple and add features as needed: ```python # Step 1: Basic Portkey config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini" ) # Step 2: Add caching config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( cache={"enabled": True, "ttl": 3600} ) ) # Step 3: Add observability config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( cache={"enabled": True, "ttl": 3600}, metadata={"app": "my-app", "user": "user-123"}, trace_id="trace-abc123" ) ) ``` ## Resources - **Portkey Website**: [https://portkey.ai](https://portkey.ai) - **Portkey Documentation**: [https://docs.portkey.ai](https://docs.portkey.ai) - **Portkey Dashboard**: [https://app.portkey.ai](https://app.portkey.ai) - **Supported Models**: [https://docs.portkey.ai/docs/integrations/models](https://docs.portkey.ai/docs/integrations/models) - **Langroid Examples**: `examples/portkey/` directory - **API Reference**: [https://docs.portkey.ai/docs/api-reference](https://docs.portkey.ai/docs/api-reference) --- # Pydantic v2 Migration Guide ## Overview Langroid has fully migrated to Pydantic v2! All internal code now uses Pydantic v2 patterns and imports directly from `pydantic`. This guide will help you update your code to work with the new version. ## Compatibility Layer (Deprecated) If your code currently imports from `langroid.pydantic_v1`: ```python # OLD - Deprecated from langroid.pydantic_v1 import BaseModel, Field, BaseSettings ``` You'll see a deprecation warning. This compatibility layer now imports from Pydantic v2 directly, so your code may continue to work, but you should update your imports: ```python # NEW - Correct from pydantic import BaseModel, Field from pydantic_settings import BaseSettings # Note: BaseSettings moved to pydantic_settings in v2 ``` !!! note "BaseSettings Location Change" In Pydantic v2, `BaseSettings` has moved to a separate `pydantic_settings` package. You'll need to install it separately: `pip install pydantic-settings` !!! warning "Compatibility Layer Removal" The `langroid.pydantic_v1` module will be removed in a future version. Update your imports now to avoid breaking changes. ## Key Changes to Update ### 1. All Fields Must Have Type Annotations !!! danger "Critical Change" In Pydantic v2, fields without type annotations are completely ignored! ```python # WRONG - Fields without annotations are ignored in v2 class MyModel(BaseModel): name = "John" # ❌ This field is IGNORED! age = 25 # ❌ This field is IGNORED! 
role: str = "user" # ✅ This field works # CORRECT - All fields must have type annotations class MyModel(BaseModel): name: str = "John" # ✅ Type annotation required age: int = 25 # ✅ Type annotation required role: str = "user" # ✅ Already correct ``` This is one of the most common issues when migrating to v2. Always ensure every field has an explicit type annotation, even if it has a default value. #### Special Case: Overriding Fields in Subclasses !!! danger "Can Cause Errors" When overriding fields from parent classes without type annotations, you may get actual errors, not just ignored fields! This is particularly important when creating custom Langroid agent configurations: ```python # WRONG - This can cause errors! from langroid import ChatAgentConfig from langroid.language_models import OpenAIGPTConfig class MyAgentConfig(ChatAgentConfig): # ❌ ERROR: Missing type annotation when overriding parent field llm = OpenAIGPTConfig(chat_model="gpt-4") # ❌ ERROR: Even with Field, still needs type annotation system_message = Field(default="You are a helpful assistant") # CORRECT - Always include type annotations when overriding class MyAgentConfig(ChatAgentConfig): # ✅ Type annotation required when overriding llm: OpenAIGPTConfig = OpenAIGPTConfig(chat_model="gpt-4") # ✅ Type annotation with Field system_message: str = Field(default="You are a helpful assistant") ``` Without type annotations on overridden fields, you may see errors like: - `ValueError: Field 'llm' requires a type annotation` - `TypeError: Field definitions should be annotated` - Validation errors when the model tries to use the parent's field definition ### 2. Stricter Type Validation for Optional Fields !!! danger "Breaking Change" Pydantic v2 is much stricter about type validation. Fields that could accept `None` in v1 now require explicit `Optional` type annotations. ```python # WRONG - This worked in v1 but fails in v2 class CloudSettings(BaseSettings): private_key: str = None # ❌ ValidationError: expects string, got None api_host: str = None # ❌ ValidationError: expects string, got None # CORRECT - Explicitly mark fields as optional from typing import Optional class CloudSettings(BaseSettings): private_key: Optional[str] = None # ✅ Explicitly optional api_host: Optional[str] = None # ✅ Explicitly optional # Or using Python 3.10+ union syntax client_email: str | None = None # ✅ Also works ``` This commonly affects: - Configuration classes using `BaseSettings` - Fields with `None` as default value - Environment variable loading where the var might not be set If you see errors like: ``` ValidationError: Input should be a valid string [type=string_type, input_value=None, input_type=NoneType] ``` The fix is to add `Optional[]` or `| None` to the type annotation. ### 3. Model Serialization Methods ```python # OLD (Pydantic v1) data = model.dict() json_str = model.json() new_model = MyModel.parse_obj(data) new_model = MyModel.parse_raw(json_str) # NEW (Pydantic v2) data = model.model_dump() json_str = model.model_dump_json() new_model = MyModel.model_validate(data) new_model = MyModel.model_validate_json(json_str) ``` ### 4. Model Configuration ```python # OLD (Pydantic v1) class MyModel(BaseModel): name: str class Config: extra = "forbid" validate_assignment = True # NEW (Pydantic v2) from pydantic import BaseModel, ConfigDict class MyModel(BaseModel): model_config = ConfigDict( extra="forbid", validate_assignment=True ) name: str ``` ### 5. 
Field Validators ```python # OLD (Pydantic v1) from pydantic import validator class MyModel(BaseModel): name: str @validator('name') def name_must_not_be_empty(cls, v): if not v.strip(): raise ValueError('Name cannot be empty') return v # NEW (Pydantic v2) from pydantic import field_validator class MyModel(BaseModel): name: str @field_validator('name') def name_must_not_be_empty(cls, v): if not v.strip(): raise ValueError('Name cannot be empty') return v ``` ### 6. Custom Types and Validation ```python # OLD (Pydantic v1) from pydantic import parse_obj_as from typing import List data = [{"name": "Alice"}, {"name": "Bob"}] users = parse_obj_as(List[User], data) # NEW (Pydantic v2) from pydantic import TypeAdapter from typing import List data = [{"name": "Alice"}, {"name": "Bob"}] users = TypeAdapter(List[User]).validate_python(data) ``` ## Common Patterns in Langroid When working with Langroid's agents and tools: ### Tool Messages ```python from pydantic import BaseModel, Field from langroid.agent.tool_message import ToolMessage class MyTool(ToolMessage): request: str = "my_tool" purpose: str = "Process some data" # Use Pydantic v2 patterns data: str = Field(..., description="The data to process") def handle(self) -> str: # Tool logic here return f"Processed: {self.data}" ``` ### Agent Configuration ```python from pydantic import ConfigDict from langroid import ChatAgentConfig class MyAgentConfig(ChatAgentConfig): model_config = ConfigDict(extra="forbid") custom_param: str = "default_value" ``` ## Troubleshooting ### Import Errors If you see `ImportError` or `AttributeError` after updating imports: - Make sure you're using the correct v2 method names (e.g., `model_dump` not `dict`) - Check that field validators use `@field_validator` not `@validator` - Ensure `ConfigDict` is used instead of nested `Config` classes ### Validation Errors Pydantic v2 has stricter validation in some cases: - Empty strings are no longer coerced to `None` for optional fields - Type coercion is more explicit - Extra fields handling may be different ### Performance Pydantic v2 is generally faster, but if you notice any performance issues: - Use `model_validate` instead of creating models with `**dict` unpacking - Consider using `model_construct` for trusted data (skips validation) ## Need Help? If you encounter issues during migration: 1. Check the [official Pydantic v2 migration guide](https://docs.pydantic.dev/latest/migration/) 2. Review Langroid's example code for v2 patterns 3. Open an issue on the [Langroid GitHub repository](https://github.com/langroid/langroid/issues) --- # QdrantDB Resource Cleanup When using QdrantDB with local storage, it's important to properly release resources to avoid file lock conflicts. QdrantDB uses a `.lock` file to prevent concurrent access to the same storage directory. ## The Problem Without proper cleanup, you may encounter this warning: ``` Error connecting to local QdrantDB at ./qdrant_data: Storage folder ./qdrant_data is already accessed by another instance of Qdrant client. If you require concurrent access, use Qdrant server instead. Switching to ./qdrant_data.new ``` This happens when a QdrantDB instance isn't properly closed, leaving the lock file in place. 
## Solutions

### Method 1: Explicit `close()` Method

Always call `close()` when done with a QdrantDB instance:

```python
from langroid.vector_store.qdrantdb import QdrantDB, QdrantDBConfig

config = QdrantDBConfig(
    cloud=False,
    collection_name="my_collection",
    storage_path="./qdrant_data",
)
vecdb = QdrantDB(config)

# ... use the vector database ...
vecdb.clear_all_collections(really=True)

# Important: Release the lock
vecdb.close()
```

### Method 2: Context Manager (Recommended)

Use QdrantDB as a context manager for automatic cleanup:

```python
from langroid.vector_store.qdrantdb import QdrantDB, QdrantDBConfig

config = QdrantDBConfig(
    cloud=False,
    collection_name="my_collection",
    storage_path="./qdrant_data",
)

with QdrantDB(config) as vecdb:
    # ... use the vector database ...
    vecdb.clear_all_collections(really=True)
# Automatically closed when exiting the context
```

The context manager ensures cleanup even if an exception occurs.

## When This Matters

This is especially important in scenarios where:

1. You create temporary QdrantDB instances for maintenance (e.g., clearing collections)
2. Your application restarts frequently during development
3. Multiple parts of your code need to access the same storage path sequentially

## Note for Cloud Storage

This only affects local storage (`cloud=False`). When using the Qdrant cloud service, the lock file mechanism is not used.

---

# Suppressing LLM output: quiet mode

In some scenarios we want to suppress LLM streaming output -- e.g. when doing some type of processing as part of a workflow, or when using an LLM-agent to generate code via tools. We are more interested in the results of the workflow, and don't want to see streaming output in the terminal.

Langroid provides a `quiet_mode` context manager that can be used to suppress LLM output, even in streaming mode (in fact, streaming is disabled in quiet mode). For example, we can use it like this:

```python
from langroid.utils.configuration import quiet_mode, settings

# directly with LLM
llm = ...
with quiet_mode(True):
    response = llm.chat(...)

# or, using an agent
agent = ...
with quiet_mode(True):
    response = agent.llm_response(...)

# or, using a task
task = Task(agent, ...)
with quiet_mode(True):
    result = task.run(...)

# we can explicitly set quiet mode, and this is globally recognized throughout langroid.
settings.quiet = True

# we can also condition quiet mode on another custom cmd line option/flag, such as "silent":
with quiet_mode(silent):
    ...
```

---

# Stream and capture reasoning content, in addition to the final answer, from Reasoning LLMs

As of v0.35.0, when using certain Reasoning LLM APIs (e.g. `deepseek/deepseek-reasoner`):

- You can see both the reasoning (dim green) and final answer (bright green) text in the streamed output.
- When directly calling the LLM (without using an Agent), the `LLMResponse` object will now contain a `reasoning` field, in addition to the earlier `message` field.
- When using `ChatAgent.llm_response`, extract the reasoning text from the `ChatDocument` object's `reasoning` field (in addition to extracting the final answer, as usual, from the `content` field).

Below is a simple example, also in this [script](https://github.com/langroid/langroid/blob/main/examples/reasoning/agent-reasoning.py). Some notes:

- To get the reasoning trace from Deepseek-R1 via OpenRouter, you must include the `extra_body` parameter with `include_reasoning`, as shown below.
- When using the OpenAI `o3-mini` model, you can set the `reasoning_effort` parameter to "high", "medium" or "low" to control the reasoning effort.
- As of Feb 9, 2025, OpenAI reasoning models (o1, o1-mini, o3-mini) do *not* expose the reasoning trace in the API response.

```python
import langroid as lr
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="deepseek/deepseek-reasoner",
    # inapplicable params are automatically removed by Langroid
    params=lm.OpenAICallParams(
        reasoning_effort="low",  # only supported by o3-mini
        # below lets you get reasoning when using openrouter/deepseek/deepseek-r1
        extra_body=dict(include_reasoning=True),
    ),
)

# (1) Direct LLM interaction
llm = lm.OpenAIGPT(llm_config)

response = llm.chat("Is 9.9 bigger than 9.11?")

# extract reasoning
print(response.reasoning)
# extract answer
print(response.message)

# (2) Using an agent
agent = lr.ChatAgent(
    lr.ChatAgentConfig(
        llm=llm_config,
        system_message="Solve the math problem given by the user",
    )
)

response = agent.llm_response(
    """
    10 years ago, Jack's dad was 5 times as old as Jack.
    Today, Jack's dad is 40 years older than Jack.
    How old is Jack today?
    """
)

# extract reasoning
print(response.reasoning)
# extract answer
print(response.content)
```

---

# Structured Output

Available in Langroid since v0.24.0.

On supported LLMs, including recent OpenAI LLMs (GPT-4o and GPT-4o mini) and local LLMs served by compatible inference servers, in particular [vLLM](https://github.com/vllm-project/vllm) and [llama.cpp](https://github.com/ggerganov/llama.cpp), the decoding process can be constrained to ensure that the model's output adheres to a provided schema. This improves the reliability of tool-call generation and, in general, ensures that the output can be reliably parsed and processed by downstream applications. See [here](../tutorials/local-llm-setup.md/#setup-llamacpp-with-a-gguf-model-from-huggingface) for instructions for usage with `llama.cpp` and [here](../tutorials/local-llm-setup.md/#setup-vllm-with-a-model-from-huggingface) for `vLLM`.

Given a `ChatAgent` `agent` and a type `type`, we can define a strict copy of the agent as follows:

```python
strict_agent = agent[type]
```

We can use this to allow reliable extraction of typed values from an LLM with minimal prompting.
For example, to generate typed values given `agent`'s current context, we can define the following:

```python
from typing import Any

def typed_agent_response(
    prompt: str,
    output_type: type,
) -> Any:
    response = agent[output_type].llm_response_forget(prompt)
    return agent.from_ChatDocument(response, output_type)
```

We apply this in [test_structured_output.py](https://github.com/langroid/langroid/blob/main/tests/main/test_structured_output.py), in which we define types that describe countries and their presidents:

```python
from typing import List
from pydantic import BaseModel, Field

class Country(BaseModel):
    """Info about a country"""

    name: str = Field(..., description="Name of the country")
    capital: str = Field(..., description="Capital of the country")


class President(BaseModel):
    """Info about a president of a country"""

    country: Country = Field(..., description="Country of the president")
    name: str = Field(..., description="Name of the president")
    election_year: int = Field(..., description="Year of election of the president")


class PresidentList(BaseModel):
    """List of presidents of various countries"""

    presidents: List[President] = Field(..., description="List of presidents")
```

and show that `typed_agent_response("Show me an example of two Presidents", PresidentList)` correctly returns a list of two presidents with *no* prompting describing the desired output format.

In addition to Pydantic models, `ToolMessage`s and simple Python types are supported. For instance, `typed_agent_response("What is the value of pi?", float)` correctly returns $\pi$ to several decimal places.

The following two detailed examples show how structured output can be used to improve the reliability of the [chat-tree example](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree.py): [this](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree-structured.py) shows how we can use output formats to force the agent to make the correct tool call in each situation, and [this](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree-structured-simple.py) shows how we can simplify by using structured outputs to extract typed intermediate values and expressing the control flow between LLM calls and agents explicitly.

---

# Task Termination in Langroid

## Why Task Termination Matters

When building agent-based systems, one of the most critical yet challenging aspects is determining when a task should complete. Unlike traditional programs with clear exit points, agent conversations can meander, loop, or continue indefinitely. Getting termination wrong leads to two equally problematic scenarios:

**Terminating too early** means missing crucial information or cutting off an agent mid-process. Imagine an agent that searches for information, finds it, but terminates before it can process or summarize the results. The task completes "successfully" but fails to deliver value.

**Terminating too late** wastes computational resources, frustrates users, and can lead to repetitive loops where agents keep responding without making progress. We've all experienced chatbots that won't stop talking or systems that keep asking "Is there anything else?" long after the conversation should have ended. Even worse, agents can fall into infinite loops—repeatedly exchanging the same messages, calling the same tools, or cycling through states without making progress. These loops not only waste resources but can rack up significant costs when using paid LLM APIs.

The challenge is that the "right" termination point depends entirely on context.
A customer service task might complete after resolving an issue and confirming satisfaction. A research task might need to gather multiple sources, synthesize them, and present findings. A calculation task should end after computing and presenting the result. Each scenario requires different termination logic. Traditionally, developers would subclass `Task` and override the `done()` method with custom logic. While flexible, this approach scattered termination logic across multiple subclasses, making systems harder to understand and maintain. It also meant that common patterns—like "complete after tool use" or "stop when the user says goodbye"—had to be reimplemented repeatedly. This guide introduces Langroid's declarative approach to task termination, culminating in the powerful `done_sequences` feature. Instead of writing imperative code, you can now describe *what* patterns should trigger completion, and Langroid handles the *how*. This makes your agent systems more predictable, maintainable, and easier to reason about. ## Table of Contents - [Overview](#overview) - [Basic Termination Methods](#basic-termination-methods) - [Done Sequences: Event-Based Termination](#done-sequences-event-based-termination) - [Concept](#concept) - [DSL Syntax (Recommended)](#dsl-syntax-recommended) - [Full Object Syntax](#full-object-syntax) - [Event Types](#event-types) - [Examples](#examples) - [Implementation Details](#implementation-details) - [Best Practices](#best-practices) - [Reference](#reference) ## Overview In Langroid, a `Task` wraps an `Agent` and manages the conversation flow. Controlling when a task terminates is crucial for building reliable agent systems. Langroid provides several methods for task termination, from simple flags to sophisticated event sequence matching. ## Basic Termination Methods ### 1. Turn Limits ```python # Task runs for exactly 5 turns result = task.run("Start conversation", turns=5) ``` ### 2. Single Round Mode ```python # Task completes after one exchange config = TaskConfig(single_round=True) task = Task(agent, config=config) ``` ### 3. Done If Tool ```python # Task completes when any tool is generated config = TaskConfig(done_if_tool=True) task = Task(agent, config=config) ``` ### 4. Done If Response/No Response ```python # Task completes based on response from specific entities config = TaskConfig( done_if_response=[Entity.LLM], # Done if LLM responds done_if_no_response=[Entity.USER] # Done if USER doesn't respond ) ``` ### 5. String Signals ```python # Task completes when special strings like "DONE" are detected # (enabled by default with recognize_string_signals=True) ``` ### 6. Orchestration Tools ```python # Using DoneTool, FinalResultTool, etc. from langroid.agent.tools.orchestration import DoneTool agent.enable_message(DoneTool) ``` ## Done Sequences: Event-Based Termination ### Concept The `done_sequences` feature allows you to specify sequences of events that trigger task completion. This provides fine-grained control over task termination based on conversation patterns. 
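For instance, here is a minimal sketch (assuming you have already built a `ChatAgent` named `agent`) of a task that is declared done as soon as its LLM generates any tool call and the agent handles it; the `"T, A"` notation is explained in the DSL section below:

```python
from langroid.agent.task import Task, TaskConfig

# Declarative termination: the task ends once any tool ("T") has been
# generated by the LLM and then handled by the agent ("A").
config = TaskConfig(done_sequences=["T, A"])
task = Task(agent, config=config)  # `agent` is assumed to be an existing ChatAgent
result = task.run("Use a tool to answer the question")
```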
**Key Features:** - Specify multiple termination sequences - Use convenient DSL syntax or full object syntax - Strict consecutive matching (no skipping events) - Efficient implementation using message parent pointers ### DSL Syntax (Recommended) The DSL (Domain Specific Language) provides a concise way to specify sequences: ```python from langroid.agent.task import Task, TaskConfig config = TaskConfig( done_sequences=[ "T, A", # Tool followed by agent response "T[calculator], A", # Specific calculator tool by name "T[CalculatorTool], A", # Specific tool by class reference (NEW!) "L, T, A, L", # LLM, tool, agent, LLM sequence "C[quit|exit|bye]", # Content matching regex "U, L, A", # User, LLM, agent sequence ] ) task = Task(agent, config=config) ``` #### DSL Pattern Reference | Pattern | Description | Event Type | |---------|-------------|------------| | `T` | Any tool | `TOOL` | | `T[name]` | Specific tool by name | `SPECIFIC_TOOL` | | `T[ToolClass]` | Specific tool by class (NEW!) | `SPECIFIC_TOOL` | | `A` | Agent response | `AGENT_RESPONSE` | | `L` | LLM response | `LLM_RESPONSE` | | `U` | User response | `USER_RESPONSE` | | `N` | No response | `NO_RESPONSE` | | `C[pattern]` | Content matching regex | `CONTENT_MATCH` | **Examples:** - `"T, A"` - Any tool followed by agent handling - `"T[search], A, T[calculator], A"` - Search tool, then calculator tool - `"T[CalculatorTool], A"` - Specific tool class followed by agent handling (NEW!) - `"L, C[complete|done|finished]"` - LLM response containing completion words - `"TOOL, AGENT"` - Full words also supported ### Full Object Syntax For more control, use the full object syntax: ```python from langroid.agent.task import ( Task, TaskConfig, DoneSequence, AgentEvent, EventType ) config = TaskConfig( done_sequences=[ DoneSequence( name="tool_handled", events=[ AgentEvent(event_type=EventType.TOOL), AgentEvent(event_type=EventType.AGENT_RESPONSE), ] ), DoneSequence( name="specific_tool_pattern", events=[ AgentEvent( event_type=EventType.SPECIFIC_TOOL, tool_name="calculator", # Can also use tool_class for type-safe references (NEW!): # tool_class=CalculatorTool ), AgentEvent(event_type=EventType.AGENT_RESPONSE), ] ), ] ) ``` ### Event Types The following event types are available: | EventType | Description | Additional Parameters | |-----------|-------------|----------------------| | `TOOL` | Any tool message generated | - | | `SPECIFIC_TOOL` | Specific tool by name or class | `tool_name`, `tool_class` (NEW!) | | `LLM_RESPONSE` | LLM generates a response | - | | `AGENT_RESPONSE` | Agent responds (e.g., handles tool) | - | | `USER_RESPONSE` | User provides input | - | | `CONTENT_MATCH` | Response matches regex pattern | `content_pattern` | | `NO_RESPONSE` | No valid response from entity | - | ### Examples #### Example 1: Tool Completion Task completes after any tool is used and handled: ```python config = TaskConfig(done_sequences=["T, A"]) ``` This is equivalent to `done_if_tool=True` but happens after the agent handles the tool. 
#### Example 2: Multi-Step Process Task completes after a specific conversation pattern: ```python config = TaskConfig( done_sequences=["L, T[calculator], A, L"] ) # Completes after: LLM response → calculator tool → agent handles → LLM summary ``` #### Example 3: Multiple Exit Conditions Different ways to complete the task: ```python config = TaskConfig( done_sequences=[ "C[quit|exit|bye]", # User says quit "T[calculator], A", # Calculator used "T[search], A, T[search], A", # Two searches performed ] ) ``` #### Example 4: Tool Class References (NEW!) Use actual tool classes instead of string names for type safety: ```python from langroid.agent.tool_message import ToolMessage class CalculatorTool(ToolMessage): request: str = "calculator" # ... tool implementation class SearchTool(ToolMessage): request: str = "search" # ... tool implementation # Enable tools on the agent agent.enable_message([CalculatorTool, SearchTool]) # Use tool classes in done sequences config = TaskConfig( done_sequences=[ "T[CalculatorTool], A", # Using class name "T[SearchTool], A, T[CalculatorTool], A", # Multiple tools ] ) ``` **Benefits of tool class references:** - **Type-safe**: IDE can validate tool class names - **Refactoring-friendly**: Renaming tool classes automatically updates references - **No string typos**: Compiler/linter catches invalid class names - **Better IDE support**: Autocomplete and go-to-definition work #### Example 5: Mixed Syntax Combine DSL strings and full objects: ```python config = TaskConfig( done_sequences=[ "T, A", # Simple DSL "T[CalculatorTool], A", # Tool class reference (NEW!) DoneSequence( # Full control name="complex_check", events=[ AgentEvent( event_type=EventType.SPECIFIC_TOOL, tool_name="database_query", tool_class=DatabaseQueryTool, # Can use class directly (NEW!) responder="DatabaseAgent" ), AgentEvent(event_type=EventType.AGENT_RESPONSE), ] ), ] ) ``` ## Implementation Details ### How Done Sequences Work Done sequences operate at the **task level** and are based on the **sequence of valid responses** generated during a task's execution. When a task runs, it maintains a `response_sequence` that tracks each message (ChatDocument) as it's processed. **Key points:** - Done sequences are checked only within a single task's scope - They track the temporal order of responses within that task - The response sequence is built incrementally as the task processes each step - Only messages that represent valid responses are added to the sequence ### Response Sequence Building The task builds its response sequence during execution: ```python # In task.run(), after each step: if self.pending_message is not None: if (not self.response_sequence or self.pending_message.id() != self.response_sequence[-1].id()): self.response_sequence.append(self.pending_message) ``` ### Message Chain Retrieval Done sequences are checked against the response sequence: ```python def _get_message_chain(self, msg: ChatDocument, max_depth: Optional[int] = None): """Get the chain of messages from response sequence""" if max_depth is None: max_depth = 50 # default if self._parsed_done_sequences: max_depth = max(len(seq.events) for seq in self._parsed_done_sequences) # Simply return the last max_depth elements from response_sequence return self.response_sequence[-max_depth:] ``` **Note:** The response sequence used for done sequences is separate from the parent-child pointer system. 
Parent pointers track causal relationships and lineage across agent boundaries (important for debugging and understanding delegation patterns), while response sequences track temporal order within a single task for termination checking. ### Strict Matching Events must occur consecutively without intervening messages: ```python # This sequence: [TOOL, AGENT_RESPONSE] # Matches: USER → LLM(tool) → AGENT # Does NOT match: USER → LLM(tool) → USER → AGENT ``` ### Performance - Efficient O(n) traversal where n is sequence length - No full history scan needed - Early termination on first matching sequence ## Best Practices 1. **Use DSL for Simple Cases** ```python # Good: Clear and concise done_sequences=["T, A"] # Avoid: Verbose for simple patterns done_sequences=[DoneSequence(events=[...])] ``` 2. **Name Your Sequences** ```python DoneSequence( name="calculation_complete", # Helps with debugging events=[...] ) ``` 3. **Order Matters** - Put more specific sequences first - General patterns at the end 4. **Test Your Sequences** ```python # Use MockLM for testing agent = ChatAgent( ChatAgentConfig( llm=MockLMConfig(response_fn=lambda x: "test response") ) ) ``` 5. **Combine with Other Methods** ```python config = TaskConfig( done_if_tool=True, # Quick exit on any tool done_sequences=["L, L, L"], # Or after 3 LLM responses max_turns=10, # Hard limit ) ``` ## Reference ### Code Examples - **Basic example**: [`examples/basic/done_sequences_example.py`](../../examples/basic/done_sequences_example.py) - **Test cases**: [`tests/main/test_done_sequences.py`](../../tests/main/test_done_sequences.py) (includes tool class tests) - **DSL tests**: [`tests/main/test_done_sequences_dsl.py`](../../tests/main/test_done_sequences_dsl.py) - **Parser tests**: [`tests/main/test_done_sequence_parser.py`](../../tests/main/test_done_sequence_parser.py) ### Core Classes - `TaskConfig` - Configuration including `done_sequences` - `DoneSequence` - Container for event sequences - `AgentEvent` - Individual event in a sequence - `EventType` - Enumeration of event types ### Parser Module - `langroid.agent.done_sequence_parser` - DSL parsing functionality ### Task Methods - `Task.done()` - Main method that checks sequences - `Task._matches_sequence_with_current()` - Sequence matching logic - `Task._classify_event()` - Event classification - `Task._get_message_chain()` - Message traversal ## Migration Guide If you're currently overriding `Task.done()`: ```python # Before: Custom done() method class MyTask(Task): def done(self, result=None, r=None): if some_complex_logic(result): return (True, StatusCode.DONE) return super().done(result, r) # After: Use done_sequences config = TaskConfig( done_sequences=["T[my_tool], A, L"] # Express as sequence ) task = Task(agent, config=config) # No subclassing needed ``` **NEW: Using Tool Classes Instead of Strings** If you have tool classes defined, you can now reference them directly: ```python # Before: Using string names (still works) config = TaskConfig( done_sequences=["T[calculator], A"] # String name ) # After: Using tool class references (recommended) config = TaskConfig( done_sequences=["T[CalculatorTool], A"] # Class name ) ``` This provides better type safety and makes refactoring easier. 
## Troubleshooting **Sequence not matching?** - Check that events are truly consecutive (no intervening messages) - Use logging to see the actual message chain - Verify tool names match exactly **Type errors with DSL?** - Ensure you're using strings for DSL patterns - Check that tool names in `T[name]` don't contain special characters **Performance concerns?** - Sequences only traverse as deep as needed - Consider shorter sequences for better performance - Use specific tool names to avoid unnecessary checks ## Summary The `done_sequences` feature provides a powerful, declarative way to control task termination based on conversation patterns. The DSL syntax makes common cases simple while the full object syntax provides complete control when needed. This approach eliminates the need to subclass `Task` and override `done()` for most use cases, leading to cleaner, more maintainable code. --- # TaskTool: Spawning Sub-Agents for Task Delegation ## Overview `TaskTool` allows agents to **spawn sub-agents** to handle specific tasks. When an agent encounters a task that requires specialized tools or isolated execution, it can spawn a new sub-agent with exactly the capabilities needed for that task. This enables agents to dynamically create a hierarchy of specialized workers, each focused on their specific subtask with only the tools they need. ## When to Use TaskTool TaskTool is useful when: - Different parts of a task require different specialized tools - You want to isolate tool access for specific operations - A task involves recursive or nested operations - You need different LLM models for different subtasks ## How It Works 1. The parent agent decides to spawn a sub-agent and specifies: - A system message defining the sub-agent's role - A prompt for the sub-agent to process - Which tools the sub-agent should have access to - Optional model and iteration limits 2. TaskTool spawns the new sub-agent, runs the task, and returns the result to the parent. ## Async Support TaskTool fully supports both synchronous and asynchronous execution. The tool automatically handles async contexts when the parent task is running asynchronously. ## Usage Example ```python from langroid.agent.tools.task_tool import TaskTool # Enable TaskTool for your agent agent.enable_message([TaskTool, YourCustomTool], use=True, handle=True) # Agent can now spawn sub-agents for tasks when the LLM generates a task_tool request: response = { "request": "task_tool", "system_message": "You are a calculator. Use the multiply_tool to compute products.", "prompt": "Calculate 5 * 7", "tools": ["multiply_tool"], "model": "gpt-4o-mini", # optional "max_iterations": 5, # optional "agent_name": "calculator-agent" # optional } ``` ## Field Reference **Required fields:** - `system_message`: Instructions for the sub-agent's role and behavior - `prompt`: The specific task/question for the sub-agent - `tools`: List of tool names. Special values: `["ALL"]` or `["NONE"]` **Optional fields:** - `model`: LLM model name (default: "gpt-4o-mini") - `max_iterations`: Task iteration limit (default: 10) - `agent_name`: Name for the sub-agent (default: auto-generated as "agent-{uuid}") ## Example: Nested Operations Consider computing `Nebrowski(10, Nebrowski(3, 2))` where Nebrowski is a custom operation. 
The main agent spawns sub-agents to handle each operation:

```python
# Main agent spawns first sub-agent for inner operation:
{
    "request": "task_tool",
    "system_message": "Compute Nebrowski operations using the nebrowski_tool.",
    "prompt": "Compute Nebrowski(3, 2)",
    "tools": ["nebrowski_tool"]
}

# Then spawns another sub-agent for outer operation:
{
    "request": "task_tool",
    "system_message": "Compute Nebrowski operations using the nebrowski_tool.",
    "prompt": "Compute Nebrowski(10, 11)",  # where 11 is the previous result
    "tools": ["nebrowski_tool"]
}
```

## Working Examples

See [`tests/main/test_task_tool.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_task_tool.py) for complete examples including:

- Basic task delegation with mock agents
- Nested operations with custom tools
- Both sync and async usage patterns

## Important Notes

- Spawned sub-agents run non-interactively (no human input)
- `DoneTool` is automatically enabled for all sub-agents
- Results are returned as `ChatDocument` objects. The Langroid framework takes care of converting them to a suitable format for the parent agent's LLM to consume and respond to.
- Sub-agents can be given custom names via the `agent_name` parameter, which helps with logging and debugging. If not specified, a unique name is auto-generated in the format "agent-{uuid}"
- Only tools "known" to the parent agent can be enabled for sub-agents. This is an important aspect of the current mechanism. The `TaskTool` handler method in the sub-agent only has access to tools that are known to the parent agent. If there are tools that are only relevant to the sub-agent but not the parent, you must still enable them in the parent agent, but you can set `use=False` and `handle=False` when you enable them, e.g.:

```python
agent.enable_message(MySubAgentTool, use=False, handle=False)
```

Since we are letting the main agent's LLM "decide" when to spawn a sub-agent, the system message of the main agent should contain instructions clarifying that it can decide which tools to enable for the sub-agent, as well as a list of all tools that might possibly be relevant to the sub-agent. This is particularly important for tools that have been enabled with `use=False`, since instructions for such tools would not be auto-inserted into the agent's system message.

## Best Practices

1. **Clear Instructions**: Provide specific system messages that explain the sub-agent's role and tool usage
2. **Tool Availability**: Ensure delegated tools are enabled for the parent agent
3. **Appropriate Models**: Use simpler/faster models for simple subtasks
4. **Iteration Limits**: Set reasonable limits based on task complexity

---

---

# **Using Tavily Search with Langroid**

---

## **1. Set Up Tavily**

1. **Access Tavily Platform**
   Go to the [Tavily Platform](https://tavily.com/).

2. **Sign Up or Log In**
   Create an account or log in if you already have one.

3. **Get Your API Key**
    - Navigate to your dashboard
    - Copy your API key

4. **Set Environment Variable**
   Add the following variable to your `.env` file:

    ```env
    TAVILY_API_KEY=
    ```

---

## **2. Use Tavily Search with Langroid**
### **Installation**

```bash
uv add tavily-python
# or
pip install tavily-python
```

### **Code Example**

```python
import langroid as lr
from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig
from langroid.agent.tools.tavily_search_tool import TavilySearchTool

# Configure the ChatAgent
config = ChatAgentConfig(
    name="search-agent",
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o
    ),
    use_tools=True
)

# Create the agent
agent = ChatAgent(config)

# Enable Tavily search tool
agent.enable_message(TavilySearchTool)
```

---

## **3. Perform Web Searches**

Use the agent to perform web searches using Tavily's AI-powered search.

```python
# Simple search query
response = agent.llm_response(
    "What are the latest developments in quantum computing?"
)
print(response)

# Search with specific number of results
response = agent.llm_response(
    "Find 5 recent news articles about artificial intelligence."
)
print(response)
```

---

## **4. Custom Search Requests**

You can also customize the search behavior by creating a TavilySearchTool instance directly:

```python
from langroid.agent.tools.tavily_search_tool import TavilySearchTool

# Create a custom search request
search_request = TavilySearchTool(
    query="Latest breakthroughs in fusion energy",
    num_results=3
)

# Get search results
results = search_request.handle()
print(results)
```

---

---

# Tool Message Handlers in Langroid

## Overview

Langroid provides flexible ways to define handlers for `ToolMessage` classes. When a tool is used by an LLM, the framework needs to know how to handle it. This can be done either by defining a handler method in the `Agent` class or within the `ToolMessage` class itself.

## Enabling Tools with `enable_message`

Before an agent can use or handle a tool, it must be explicitly enabled using the `enable_message` method. This method takes two important arguments:

- **`use`** (bool): Whether the LLM is allowed to generate this tool
- **`handle`** (bool): Whether the agent is allowed to handle this tool

```python
# Enable both generation and handling (default)
agent.enable_message(MyTool, use=True, handle=True)

# Enable only handling (agent can handle but LLM won't generate)
agent.enable_message(MyTool, use=False, handle=True)

# Enable only generation (LLM can generate but agent won't handle)
agent.enable_message(MyTool, use=True, handle=False)
```

When `handle=True` and the `ToolMessage` has a `handle` method defined, this method is inserted into the agent with a name matching the tool's `request` field value. This insertion only happens when `enable_message` is called.

## Default Handler Mechanism

By default, a `ToolMessage` is handled by a method on the `Agent` instance whose name is identical to the tool's `request` attribute; this method either already exists on the Agent, or is inserted into it when the tool is enabled.

### Agent-based Handlers

If a tool `MyTool` has the `request` attribute `my_tool`, you can define a method `my_tool` in your `Agent` class that will handle this tool when the LLM generates it:

```python
class MyTool(ToolMessage):
    request: str = "my_tool"
    param: str

class MyAgent(ChatAgent):
    def my_tool(self, msg: MyTool) -> str:
        return f"Handled: {msg.param}"

# Enable the tool
agent = MyAgent()
agent.enable_message(MyTool)
```

### ToolMessage-based Handlers

Alternatively, if a tool is "stateless" (i.e. does not require the Agent's state), you can define a `handle` method within the `ToolMessage` class itself.
When you call `enable_message` with `handle=True`, Langroid will insert this method into the `Agent` with the name matching the `request` field value: ```python class MyTool(ToolMessage): request = "my_tool" param: str def handle(self) -> str: return f"Handled: {self.param}" # Enable the tool agent = MyAgent() agent.enable_message(MyTool) # The handle method is now inserted as "my_tool" in the agent ``` ## Flexible Handler Signatures Handler methods (`handle()` or `handle_async()`) support multiple signature patterns to access different levels of context: ### 1. No Arguments (Simple Handler) This is the typical pattern for stateless tools that do not require any context from the agent or current chat document. ```python class MyTool(ToolMessage): request = "my_tool" def handle(self) -> str: return "Simple response" ``` ### 2. Agent Parameter Only Use this pattern when you need access to the `Agent` instance, but not the current chat document. ```python from langroid.agent.base import Agent class MyTool(ToolMessage): request = "my_tool" def handle(self, agent: Agent) -> str: return f"Response from {agent.name}" ``` ### 3. ChatDocument Parameter Only Use this pattern when you need access to the current `ChatDocument`, but not the `Agent` instance. ```python from langroid.agent.chat_document import ChatDocument class MyTool(ToolMessage): request = "my_tool" def handle(self, chat_doc: ChatDocument) -> str: return f"Responding to: {chat_doc.content}" ``` ### 4. Both Agent and ChatDocument Parameters This is the most flexible pattern, allowing access to both the `Agent` instance and the current `ChatDocument`. The order of parameters does not matter, but as noted below, it is highly recommended to always use type annotations. ```python class MyTool(ToolMessage): request = "my_tool" def handle(self, agent: Agent, chat_doc: ChatDocument) -> ChatDocument: return agent.create_agent_response( content="Response with full context", files=[...] # Optional file attachments ) ``` ## Parameter Detection The framework automatically detects handler parameter types through: 1. **Type annotations** (recommended): The framework uses type hints to determine which parameters to pass 2. **Parameter names** (fallback): If no type annotations are present, it looks for parameters named `agent` or `chat_doc` It is highly recommended to always use type annotations for clarity and reliability. ### Example with Type Annotations (Recommended) ```python def handle(self, agent: Agent, chat_doc: ChatDocument) -> str: # Framework knows to pass both agent and chat_doc return "Handled" ``` ### Example without Type Annotations (Not Recommended) ```python def handle(self, agent, chat_doc): # Works but not recommended # Framework uses parameter names to determine what to pass return "Handled" ``` ## Async Handlers All the above patterns also work with async handlers: ```python class MyTool(ToolMessage): request = "my_tool" async def handle_async(self, agent: Agent) -> str: # Async operations here result = await some_async_operation() return f"Async result: {result}" ``` See the quick-start [Tool section](https://langroid.github.io/langroid/quick-start/chat-agent-tool/) for more details. ## Custom Handler Names In some use-cases it may be beneficial to separate the *name of a tool* (i.e. the value of `request` attribute) from the *name of the handler method*. For example, you may be dynamically creating tools based on some data from external data sources. Or you may want to use the same "handler" method for multiple tools. 
This may be done by adding a `_handler` attribute to the `ToolMessage` class, which defines the name of the tool-handler method on the `Agent` instance. The underscore `_` prefix ensures that the `_handler` attribute does not appear in the Pydantic-based JSON schema of the `ToolMessage` class, and so the LLM would not be instructed to generate it.

!!! note "`_handler` and `handle`"
    A `ToolMessage` may have a `handle` method defined within the class itself, as mentioned above, and this should not be confused with the `_handler` attribute.

For example:

```python
class MyToolMessage(ToolMessage):
    request: str = "my_tool"
    _handler: str = "tool_handler"


class MyAgent(ChatAgent):
    def tool_handler(
        self,
        message: ToolMessage,
    ) -> str:
        if message.request == "my_tool":
            # do something with the tool, then return a result string
            return "handled my_tool"
        return ""
```

Refer to [examples/basic/tool-custom-handler.py](https://github.com/langroid/langroid/blob/main/examples/basic/tool-custom-handler.py) for a detailed example.

---

# Firecrawl and Trafilatura Crawlers Documentation

`URLLoader` uses `TrafilaturaCrawler` by default, if no crawler is explicitly specified.

## Overview

* **`FirecrawlCrawler`**: Leverages the Firecrawl API for efficient web scraping and crawling. It offers built-in document processing capabilities, and **produces non-chunked markdown output** from web-page content. Requires the `FIRECRAWL_API_KEY` environment variable to be set in your `.env` file or environment.
* **`TrafilaturaCrawler`**: Utilizes the Trafilatura library and Langroid's parsing tools for extracting and processing web content - this is the default crawler, and does not require setting up an external API key. Also produces **chunked markdown output** from web-page content.
* **`ExaCrawler`**: Integrates with the Exa API for high-quality content extraction. Requires the `EXA_API_KEY` environment variable to be set in your `.env` file or environment. This crawler also produces **chunked markdown output** from web-page content.

## Installation

`TrafilaturaCrawler` comes with Langroid. To use `FirecrawlCrawler`, install the `firecrawl` extra:

```bash
pip install langroid[firecrawl]
```

## Exa Crawler Documentation

### Overview

`ExaCrawler` integrates with the Exa API to extract high-quality content from web pages. It provides efficient content extraction with the simplicity of API-based processing.

### Parameters

Obtain an Exa API key from [Exa](https://exa.ai/) and set it in your environment variables, e.g. in your `.env` file as:

```env
EXA_API_KEY=your_api_key_here
```

* **config (ExaCrawlerConfig)**: An `ExaCrawlerConfig` object.
    * **api_key (str)**: Your Exa API key.

### Usage

```python
from langroid.parsing.url_loader import URLLoader, ExaCrawlerConfig

# Create an ExaCrawlerConfig object
exa_config = ExaCrawlerConfig(
    # Typically omitted, since it's loaded from the EXA_API_KEY environment variable
    api_key="your-exa-api-key"
)

loader = URLLoader(
    urls=[
        "https://pytorch.org",
        "https://www.tensorflow.org"
    ],
    crawler_config=exa_config
)

docs = loader.load()
print(docs)
```

### Benefits

* Simple API integration requiring minimal configuration
* Efficient handling of complex web pages
* For plain HTML content, the `exa` API produces high-quality extraction as clean HTML, which we then convert to markdown using the `markdownify` library.
* For "document" content (e.g., `pdf`, `doc`, `docx`), the content is downloaded via the `exa` API and Langroid's document-processing tools are used to produce **chunked output** in a format controlled by the `Parser` configuration (defaults to markdown in most cases).
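Since `TrafilaturaCrawler` is the default crawler, the simplest usage needs no crawler configuration at all. Here is a minimal sketch of that default path (the URL is just an illustration):

```python
from langroid.parsing.url_loader import URLLoader

# No crawler_config supplied => the default TrafilaturaCrawler is used,
# producing chunked markdown Documents.
loader = URLLoader(urls=["https://pytorch.org"])
docs = loader.load()
print(docs[0].content[:200])
```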
## Trafilatura Crawler Documentation

### Overview

`TrafilaturaCrawler` is a web crawler that uses the Trafilatura library for content extraction and Langroid's parsing capabilities for further processing.

### Parameters

* **config (TrafilaturaConfig)**: A `TrafilaturaConfig` object that specifies parameters related to scraping or output format.
    * `threads` (int): The number of threads to use for downloading web pages.
    * `format` (str): one of `"markdown"` (default), `"xml"` or `"txt"`; in the case of `xml`, the extracted output is in HTML format.

Similar to the `ExaCrawler`, the `TrafilaturaCrawler` works differently depending on the type of web-page content:

- for "document" content (e.g., `pdf`, `doc`, `docx`), the content is downloaded, and Langroid's document-processing tools are used to produce **chunked output** in a format controlled by the `Parser` configuration (defaults to markdown in most cases).
- for plain-html content, the output format is based on the `format` parameter;
    - if this parameter is `markdown` (default), the library extracts content in markdown format, and the final output is a list of chunked markdown documents.
    - if this parameter is `xml`, content is extracted in `html` format, which langroid then converts to markdown using the `markdownify` library, and the final output is a list of chunked markdown documents.
    - if this parameter is `txt`, the content is extracted in plain text format, and the final output is a list of plain text documents.

### Usage

```python
from langroid.parsing.url_loader import URLLoader, TrafilaturaConfig

# Create a TrafilaturaConfig instance
# (you can also set format="xml" or format="txt"; the default is "markdown")
trafilatura_config = TrafilaturaConfig(threads=4)

loader = URLLoader(
    urls=[
        "https://pytorch.org",
        "https://www.tensorflow.org",
        "https://ai.google.dev/gemini-api/docs",
        "https://books.toscrape.com/"
    ],
    crawler_config=trafilatura_config,
)

docs = loader.load()
print(docs)
```

### Langroid Parser Integration

`TrafilaturaCrawler` relies on a Langroid `Parser` to handle document processing. The `Parser` uses the default parsing methods, or a configuration that can be adjusted to suit the current use case.

## Firecrawl Crawler Documentation

### Overview

`FirecrawlCrawler` is a web crawling utility class that uses the Firecrawl API to scrape or crawl web pages efficiently. It offers two modes:

* **Scrape Mode (default)**: Extracts content from a list of specified URLs.
* **Crawl Mode**: Recursively follows links from a starting URL, gathering content from multiple pages, including subdomains, while bypassing blockers.

**Note:** `crawl` mode accepts only ONE URL as a list.

### Parameters

Obtain a Firecrawl API key from [Firecrawl](https://firecrawl.dev/) and set it in your environment variables, e.g. in your `.env` file as:

```env
FIRECRAWL_API_KEY=your_api_key_here
```

* **config (FirecrawlConfig)**: A `FirecrawlConfig` object.
* **timeout (int, optional)**: Time in milliseconds (ms) to wait for a response. Default is `30000ms` (30 seconds). In crawl mode, this applies per URL.
* **limit (int, optional)**: Maximum number of pages to scrape in crawl mode. Helps control API usage.
* **params (dict, optional)**: Additional parameters to customize the request. See the [scrape API](https://docs.firecrawl.dev/api-reference/endpoint/scrape) and [crawl API](https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for details.
### Usage

#### Scrape Mode (Default)

Fetch content from multiple URLs:

```python
from langroid.parsing.url_loader import URLLoader, FirecrawlConfig

# create a FirecrawlConfig object
firecrawl_config = FirecrawlConfig(
    # typical/best practice is to omit the api_key, and
    # we leverage Pydantic BaseSettings to load it from the environment variable
    # FIRECRAWL_API_KEY in your .env file
    api_key="your-firecrawl-api-key",
    timeout=15000,  # Timeout per request (15 sec)
    mode="scrape",
)

loader = URLLoader(
    urls=[
        "https://pytorch.org",
        "https://www.tensorflow.org",
        "https://ai.google.dev/gemini-api/docs",
        "https://books.toscrape.com/"
    ],
    crawler_config=firecrawl_config
)

docs = loader.load()
print(docs)
```

#### Crawl Mode

Fetch content from multiple pages starting from a single URL:

```python
from langroid.parsing.url_loader import URLLoader, FirecrawlConfig

# create a FirecrawlConfig object
firecrawl_config = FirecrawlConfig(
    timeout=30000,  # 30 sec per page
    mode="crawl",
    params={
        "limit": 5,
    }
)

loader = URLLoader(
    urls=["https://books.toscrape.com/"],
    crawler_config=firecrawl_config
)

docs = loader.load()
print(docs)
```

### Output

Results are stored in the `firecrawl_output` directory.

### Best Practices

* Set `limit` in crawl mode to avoid excessive API usage.
* Adjust `timeout` based on network conditions and website responsiveness.
* Use `params` to customize scraping behavior based on Firecrawl API capabilities.

### Firecrawl's Built-In Document Processing

`FirecrawlCrawler` benefits from Firecrawl's built-in document processing, which automatically extracts and structures content from web pages (including pdf, doc, docx). This reduces the need for complex parsing logic within Langroid. Unlike the `Exa` and `Trafilatura` crawlers, the resulting documents are *non-chunked* markdown documents.

## Choosing a Crawler

* Use `FirecrawlCrawler` when you need efficient, API-driven scraping with built-in document processing. This is often the simplest and most effective choice, but incurs a cost due to the paid API.
* Use `TrafilaturaCrawler` when you want local, non-API-based scraping (less accurate).
* Use `ExaCrawler` as a sort of middle-ground between the two: it offers high-quality content extraction for plain HTML content, but relies on Langroid's document-processing tools for document content. This will cost significantly less than Firecrawl.

## Example script

See the script [`examples/docqa/chat_search.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat_search.py) which shows how to use a Langroid agent to search the web and scrape URLs to answer questions.

---

---

# **Using WeaviateDB as a Vector Store with Langroid**

---

## **1. Set Up Weaviate**

You can refer to the Weaviate [quickstart](https://weaviate.io/developers/weaviate/quickstart) guide for details.

1. **Access Weaviate Cloud Console**
   Go to the [Weaviate Cloud Console](https://console.weaviate.cloud/).

2. **Sign Up or Log In**
   Create an account or log in if you already have one.

3. **Create a Cluster**
   Set up a new cluster in the cloud console.

4. **Get Your REST Endpoint and API Key**
    - Retrieve the REST endpoint URL.
    - Copy an API key with admin access.

5. **Set Environment Variables**
   Add the following variables to your `.env` file:

    ```env
    WEAVIATE_API_URL=
    WEAVIATE_API_KEY=
    ```

---

## **2. Use WeaviateDB with Langroid**
Here's an example of how to configure and use WeaviateDB in Langroid:

### **Installation**

If you are using `uv` or `pip` for package management, install Langroid with the `weaviate` extra:

```bash
uv add langroid[weaviate]
# or
pip install langroid[weaviate]
```

### **Code Example**

```python
import langroid as lr
from langroid.agent.special import DocChatAgent, DocChatAgentConfig
from langroid.embedding_models import OpenAIEmbeddingsConfig

# Configure OpenAI embeddings
embed_cfg = OpenAIEmbeddingsConfig(
    model_type="openai",
)

# Configure the DocChatAgent with WeaviateDB
config = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o
    ),
    vecdb=lr.vector_store.WeaviateDBConfig(
        collection_name="quick_start_chat_agent_docs",
        replace_collection=True,
        embedding=embed_cfg,
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE,
    ),
    n_similar_chunks=2,
    n_relevant_chunks=2,
)

# Create the agent
agent = DocChatAgent(config)
```

---

## **3. Create and Ingest Documents**

Define documents with their content and metadata for ingestion into the vector store.

### **Code Example**

```python
documents = [
    lr.Document(
        content="""
            In the year 2050, GPT10 was released.

            In 2057, paperclips were seen all over the world.

            Global warming was solved in 2060.

            In 2061, the world was taken over by paperclips.

            In 2045, the Tour de France was still going on.
            They were still using bicycles.

            There was one more ice age in 2040.
            """,
        metadata=lr.DocMetaData(source="wikipedia-2063", id="dkfjkladfjalk"),
    ),
    lr.Document(
        content="""
            We are living in an alternate universe
            where Germany has occupied the USA, and the capital of USA is Berlin.

            Charlie Chaplin was a great comedian.

            In 2050, all Asian countries merged into Indonesia.
            """,
        metadata=lr.DocMetaData(source="Almanac", id="lkdajfdkla"),
    ),
]
```

### **Ingest Documents**

```python
agent.ingest_docs(documents)
```

---

## **4. Get an Answer from the LLM**

Create a task and start interacting with the agent.

### **Code Example**

```python
answer = agent.llm_response("When will the new ice age begin?")
```

---

---

# XML-based Tools

Available in Langroid since v0.17.0.

[`XMLToolMessage`][langroid.agent.xml_tool_message.XMLToolMessage] is an abstract class for tools formatted using XML instead of JSON. It has been mainly tested with non-nested tool structures.

For example, in [test_xml_tool_message.py](https://github.com/langroid/langroid/blob/main/tests/main/test_xml_tool_message.py) we define a CodeTool as follows (slightly simplified here):

```python
class CodeTool(XMLToolMessage):
    request: str = "code_tool"
    purpose: str = "Tool for writing code to a file"

    filepath: str = Field(
        ..., description="The path to the file to write the code to"
    )
    code: str = Field(
        ..., description="The code to write to the file", verbatim=True
    )
```

Especially note how the `code` field has `verbatim=True` set in the `Field` metadata. This will ensure that the LLM receives instructions to:

- enclose `code` field contents in a CDATA section, and
- leave the `code` contents intact, without any escaping or other modifications.

Contrast this with a JSON-based tool, where newlines, quotes, etc. need to be escaped. LLMs (especially weaker ones) often "forget" to do the right escaping, which leads to incorrect JSON, and creates a burden on us to "repair" the resulting JSON, a fraught process at best.
Moreover, studies have shown that requiring that an LLM return this type of carefully escaped code within a JSON string can lead to a significant drop in the quality of the code generated[^1].

[^1]: [LLMs are bad at returning code in JSON.](https://aider.chat/2024/08/14/code-in-json.html)

Note that tools/functions in OpenAI and related APIs are exclusively JSON-based, so in Langroid, when enabling an agent to use a tool derived from `XMLToolMessage`, we set these flags in `ChatAgentConfig`:

- `use_functions_api=False` (disables OpenAI functions/tools)
- `use_tools=True` (enables Langroid-native prompt-based tools)

See also the [`WriteFileTool`][langroid.agent.tools.file_tools.WriteFileTool] for a concrete example of a tool derived from `XMLToolMessage`. This tool enables an LLM to write content (code or text) to a file.

If you are using an existing Langroid `ToolMessage`, e.g. `SendTool`, you can define your own subclass of `SendTool`, say `XMLSendTool`, inheriting from both `SendTool` and `XMLToolMessage`; see this [example](https://github.com/langroid/langroid/blob/main/examples/basic/xml_tool.py).

---

# Augmenting Agents with Retrieval

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `chat-agent-docs.py` script in the `langroid-examples` repo:
    [`examples/quick-start/chat-agent-docs.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/chat-agent-docs.py).

## Why is this important?

Until now in this guide, agents have not used external data. Although LLMs already have enormous amounts of knowledge "hard-wired" into their weights during training (and this is after all why ChatGPT has exploded in popularity), for practical enterprise applications there are a few reasons it is critical to augment LLMs with access to specific, external documents:

- **Private data**: LLMs are trained on public data, but in many applications we want to use private data that is not available to the public. For example, a company may want to extract useful information from its private knowledge-base.
- **New data**: LLMs are trained on data that was available at the time of training, and so they may not be able to answer questions about new topics.
- **Constrained responses, or Grounding**: LLMs are trained to generate text that is consistent with the distribution of text in the training data. However, in many applications we want to constrain the LLM's responses to be consistent with the content of a specific document. For example, if we want to use an LLM to generate a response to a customer support ticket, we want the response to be consistent with the content of the ticket. In other words, we want to reduce the chances that the LLM _hallucinates_ a response that is not consistent with the ticket.

In all these scenarios, we want to augment the LLM with access to a specific set of documents, and use _retrieval augmented generation_ (RAG) to generate more relevant, useful, accurate responses. Langroid provides a simple, flexible mechanism for RAG using vector-stores, thus ensuring **grounded responses** constrained to specific documents. Another key feature of Langroid is that retrieval lineage is maintained, and responses based on documents are always accompanied by **source citations**.
## `DocChatAgent` for Retrieval-Augmented Generation

Langroid provides a special type of agent called [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent], which is a [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] augmented with a vector-store, and some special methods that enable the agent to ingest documents into the vector-store, and answer queries based on these documents.

The [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] provides many ways to ingest documents into the vector-store, including from URLs and local file-paths. Given a collection of document paths, ingesting their content into the vector-store involves the following steps:

1. Split the document into shards (in a configurable way)
2. Map each shard to an embedding vector using an embedding model. The default embedding model is OpenAI's `text-embedding-3-small` model, but users can instead use `all-MiniLM-L6-v2` from the HuggingFace `sentence-transformers` library.[^1]
3. Store embedding vectors in the vector-store, along with the shard's content and any document-level meta-data (this ensures Langroid knows which document a shard came from when it retrieves it to augment an LLM query)

[^1]: To use this embedding model, install langroid via `pip install langroid[hf-embeddings]`. Note that this will install the `torch` and `sentence-transformers` libraries.

[`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent]'s `llm_response` overrides the default [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] method, by augmenting the input message with relevant shards from the vector-store, along with instructions to the LLM to respond based on the shards.

## Define some documents

Let us see how [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] helps with retrieval-augmented generation (RAG). For clarity, rather than ingest documents from paths or URLs, let us just set up some simple documents in the code itself, using Langroid's [`Document`][langroid.mytypes.Document] class:

```py
documents = [
    lr.Document(
        content="""
            In the year 2050, GPT10 was released.

            In 2057, paperclips were seen all over the world.

            Global warming was solved in 2060.

            In 2061, the world was taken over by paperclips.

            In 2045, the Tour de France was still going on.
            They were still using bicycles.

            There was one more ice age in 2040.
            """,
        metadata=lr.DocMetaData(source="wikipedia-2063"),
    ),
    lr.Document(
        content="""
            We are living in an alternate universe
            where Germany has occupied the USA, and the capital of USA is Berlin.

            Charlie Chaplin was a great comedian.

            In 2050, all Asian countries merged into Indonesia.
            """,
        metadata=lr.DocMetaData(source="Almanac"),
    ),
]
```

There are two text documents. We will split them by double-newlines (`\n\n`), as we see below.

## Configure the DocChatAgent and ingest documents

Following the pattern in Langroid, we first set up a [`DocChatAgentConfig`][langroid.agent.special.doc_chat_agent.DocChatAgentConfig] object and then instantiate a [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] from it.

```py
from langroid.agent.special import DocChatAgent, DocChatAgentConfig

config = DocChatAgentConfig(
    llm = lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    ),
    vecdb=lr.vector_store.QdrantDBConfig(
        collection_name="quick-start-chat-agent-docs",
        replace_collection=True, #(1)!
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE, #(2)!
    ),
    n_similar_chunks=2, #(3)!
    n_relevant_chunks=2, #(3)!
)
agent = DocChatAgent(config)
```

1. Specifies that each time we run the code, we create a fresh collection, rather than re-use the existing one with the same name.
2. Specifies to split all text content by the first separator in the `separators` list.
3. Specifies that, for a query, we want to retrieve at most 2 similar chunks from the vector-store.

Now that the [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] is configured, we can ingest the documents into the vector-store:

```py
agent.ingest_docs(documents)
```

## Set up the task and run it

As before, all that remains is to set up the task and run it:

```py
task = lr.Task(agent)
task.run()
```

And that is all there is to it! Feel free to try out the [`chat-agent-docs.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/chat-agent-docs.py) script in the `langroid-examples` repository. Here is a screenshot of the output:

![chat-docs.png](chat-docs.png)

Notice how follow-up questions correctly take the preceding dialog into account, and every answer is accompanied by a source citation.

## Answer questions from a set of URLs

Instead of having in-code documents as above, what if you had a set of URLs instead -- how do you use Langroid to answer questions based on the content of those URLs?

[`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] makes it very simple to do this. First include the URLs in the [`DocChatAgentConfig`][langroid.agent.special.doc_chat_agent.DocChatAgentConfig] object:

```py
config = DocChatAgentConfig(
    doc_paths = [
        "https://cthiriet.com/articles/scaling-laws",
        "https://www.jasonwei.net/blog/emergence",
    ]
)
```

Then, call the `ingest()` method of the [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] object:

```py
agent.ingest()
```

And the rest of the code remains the same.

## See also

In the `langroid-examples` repository, you can find full working examples of document question-answering:

- [`examples/docqa/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py) an app that takes a list of URLs or document paths from a user, and answers questions on them.
- [`examples/docqa/chat-qa-summarize.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat-qa-summarize.py) a two-agent app where the `WriterAgent` is tasked with writing 5 key points about a topic, and takes the help of a `DocAgent` that answers its questions based on a given set of documents.

## Next steps

This Getting Started guide walked you through the core features of Langroid. If you want to see full working examples combining these elements, have a look at the [`examples`](https://github.com/langroid/langroid-examples/tree/main/examples) folder in the `langroid-examples` repo.

---

# A chat agent, equipped with a tool/function-call

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `chat-agent-tool.py` script in the `langroid-examples` repo:
    [`examples/quick-start/chat-agent-tool.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/chat-agent-tool.py).

## Tools, plugins, function-calling

An LLM normally generates unstructured text in response to a prompt (or sequence of prompts). However there are many situations where we would like the LLM to generate _structured_ text, or even _code_, that can be handled by specialized functions outside the LLM, for further processing.
In these situations, we want the LLM to "express" its "intent" unambiguously, and we achieve this by instructing the LLM on how to format its output (typically in JSON) and under what conditions it should generate such output. This mechanism has become known by various names over the last few months (tools, plugins, or function-calling), and is extremely useful in numerous scenarios, such as:

- **Extracting structured information** from a document: for example, we can use the tool/functions mechanism to have the LLM present the key terms in a lease document in a structured JSON format, to simplify further processing. See an [example](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py) of this in the `langroid-examples` repo.
- **Specialized computation**: the LLM can request a units conversion, or request scanning a large file (which wouldn't fit into its context) for a specific pattern.
- **Code execution**: the LLM can generate code that is executed in a sandboxed environment, and the results of the execution are returned to the LLM.
- **API Calls**: the LLM can generate a JSON containing params for an API call, which the tool handler uses to make the call and return the results to the LLM.

For LLM developers, Langroid provides a clean, uniform interface for the recently released OpenAI [Function-calling](https://platform.openai.com/docs/guides/gpt/function-calling) as well as Langroid's own native "tools" mechanism. The native tools mechanism is meant to be used when working with non-OpenAI LLMs that do not have a "native" function-calling facility. You can choose which to enable by setting the `use_tools` and `use_functions_api` flags in the `ChatAgentConfig` object. (Or you can omit setting these, and langroid auto-selects the best mode depending on the LLM).

The implementation leverages the excellent [Pydantic](https://docs.pydantic.dev/latest/) library. Benefits of using Pydantic are that you never have to write complex JSON specs for function calling, and when the LLM hallucinates malformed JSON, the Pydantic error message is sent back to the LLM so it can fix it!

## Example: find the smallest number in a list

Again we will use a simple number-game as a toy example to quickly and succinctly illustrate the ideas without spending too much on token costs. This is a modification of the `chat-agent.py` example we saw in an earlier [section](chat-agent.md).

The idea of this single-agent game is that the agent has in "mind" a list of numbers between 1 and 100, and the LLM has to find out the smallest number from this list. The LLM has access to a `probe` tool (think of it as a function) that takes an argument `number`. When the LLM "uses" this tool (i.e. outputs a message in the format required by the tool), the agent handles this structured message and responds with the number of values in its list that are at most equal to the `number` argument.

## Define the tool as a `ToolMessage`

The first step is to define the tool, which we call `ProbeTool`, as a subclass of the `ToolMessage` class, which is itself derived from Pydantic's `BaseModel`.

Essentially the `ProbeTool` definition specifies

- the name of the Agent method that handles the tool, in this case `probe`
- the fields that must be included in the tool message, in this case `number`
- the "purpose" of the tool, i.e. under what conditions it should be used, and what it does
under what conditions it should be used, and what it does.

Here is what the `ProbeTool` definition looks like:

```py
class ProbeTool(lr.agent.ToolMessage):
    request: str = "probe" #(1)!
    purpose: str = """
        To find which number in my list is closest to the <number> you specify
        """ #(2)!
    number: int #(3)!

    @classmethod
    def examples(cls): #(4)!
        # Compiled to few-shot examples sent along with the tool instructions.
        return [
            cls(number=10),
            (
                "To find which number is closest to 20",
                cls(number=20),
            )
        ]
```

1. This indicates that the agent's `probe` method will handle this tool-message.
2. The `purpose` is used behind the scenes to instruct the LLM.
3. `number` is a required argument of the tool-message (function).
4. You can optionally include a class method that returns a list containing examples, of two types: either a class instance, or a tuple consisting of a description and a class instance, where the description is the "thought" that leads the LLM to use the tool. In some scenarios this can help with LLM tool-generation accuracy.

!!! note "Stateless tool handlers"
    The above `ProbeTool` is "stateful", i.e. handling it requires access to a variable in the Agent instance (the `numbers` variable). This is why handling this tool-message requires subclassing the `ChatAgent` and defining a special method in the Agent, with a name matching the value of the `request` field of the Tool (`probe` in this case). However, you may often define "stateless tools" which don't require access to the Agent's state. For such tools, you can define a handler method named `handle` right in the `ToolMessage` itself (see the brief sketch further below). Langroid looks for such a method in the `ToolMessage` and automatically inserts it into the Agent as a method with a name matching the `request` field of the Tool. Examples of stateless tools include tools for numerical computation (e.g., in [this example](https://langroid.github.io/langroid/examples/agent-tree/)), or API calls (e.g. for internet search, see [DuckDuckGoSearch Tool][langroid.agent.tools.duckduckgo_search_tool.DuckduckgoSearchTool]).

## Define the ChatAgent, with the `probe` method

As before, we first create a `ChatAgentConfig` object:

```py
config = lr.ChatAgentConfig(
    name="Spy",
    llm = lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    ),
    use_tools=True, #(1)!
    use_functions_api=False, #(2)!
    vecdb=None,
)
```

1. Whether to use Langroid's native tools mechanism.
2. Whether to use OpenAI's function-calling mechanism.

Next we define the Agent class itself, which we call `SpyGameAgent`, with a member variable to hold its "secret" list of numbers. We also add a `probe` method (to handle the `ProbeTool` message) to this class, and instantiate it:

```py
class SpyGameAgent(lr.ChatAgent):
    def __init__(self, config: lr.ChatAgentConfig):
        super().__init__(config)
        self.numbers = [3, 4, 8, 11, 15, 25, 40, 80, 90]

    def probe(self, msg: ProbeTool) -> str: #(1)!
        # return how many values in self.numbers are less than or equal to msg.number
        return str(len([n for n in self.numbers if n <= msg.number]))

spy_game_agent = SpyGameAgent(config)
```

1. Note that this method name exactly matches the value of the `request` field in the `ProbeTool` definition. This ensures that this method is called when the LLM generates a valid `ProbeTool` message.
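As an aside, here is a minimal sketch of the stateless-tool pattern described in the note above; the `SquareTool` name and its behavior are invented purely for illustration. Since the handler needs no agent state, it lives in the `ToolMessage` itself:

```py
class SquareTool(lr.agent.ToolMessage):
    request: str = "square"
    purpose: str = "To compute the square of a given <number>"
    number: int

    def handle(self) -> str:
        # No agent state needed: when this tool is enabled on an agent,
        # Langroid auto-inserts this handler as an agent method named `square`.
        return str(self.number * self.number)
```

A tool like this can be enabled on any `ChatAgent` via `enable_message(SquareTool)`, with no need to subclass the agent.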
## Enable the `spy_game_agent` to handle the `probe` tool

The final step in setting up the tool is to enable the `spy_game_agent` to handle the `probe` tool:

```py
spy_game_agent.enable_message(ProbeTool)
```

## Set up the task and instructions

We set up the task for the `spy_game_agent` and run it:

```py
task = lr.Task(
    spy_game_agent,
    system_message="""
        I have a list of numbers between 1 and 100.
        Your job is to find the smallest of them.
        To help with this, you can give me a number and I will
        tell you how many of my numbers are less than or equal to your number.
        Once you have found the smallest number,
        you can say DONE and report your answer.
    """
)
task.run()
```

Notice that in the task setup we have _not_ explicitly instructed the LLM to use the `probe` tool. But this is done "behind the scenes", either by the OpenAI API (when we use function-calling by setting the `use_functions_api` flag to `True`), or by Langroid's native tools mechanism (when we set the `use_tools` flag to `True`).

!!! note "Asynchronous tool handlers"
    If you run the task asynchronously -- i.e. via `await task.run_async()` -- you may provide an asynchronous tool handler by implementing a `probe_async` method.

See the [`chat-agent-tool.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/chat-agent-tool.py) script in the `langroid-examples` repo for a working example, which you can run as follows:

```sh
python3 examples/quick-start/chat-agent-tool.py
```

Here is a screenshot of the chat in action, using Langroid's tools mechanism:

![chat-agent-tool.png](chat-agent-tool.png)

And if we run it with the `-f` flag (to switch to using OpenAI function-calling):

![chat-agent-fn.png](chat-agent-fn.png)

## See also

One of the uses of tools/function-calling is to **extract structured information** from a document. In the `langroid-examples` repo, there are two examples of this:

- [`examples/extract/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/extract/chat.py), which shows how to extract Machine Learning model quality information from a description of a solution approach on Kaggle.
- [`examples/docqa/chat_multi_extract.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py), which extracts key terms from a commercial lease document, in a nested JSON format.

## Next steps

In the [3-agent chat example](three-agent-chat-num.md), recall that the `processor_agent` did not have to bother with specifying who should handle the current number. In the [next section](three-agent-chat-num-router.md) we add a twist to this game, so that the `processor_agent` has to decide who should handle the current number.

---

# A simple chat agent

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `chat-agent.py` script in the `langroid-examples` repo:
    [`examples/quick-start/chat-agent.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/chat-agent.py).

## Agents

A [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] is an abstraction that wraps a few components, including:

- an LLM (`ChatAgent.llm`), possibly equipped with tools/function-calling. The `ChatAgent` class maintains LLM conversation history.
- optionally a vector-database (`ChatAgent.vecdb`)

## Agents as message transformers

In Langroid, a core function of `ChatAgents` is _message transformation_. There are three special message transformation methods, which we call **responders**. Each of these takes a message and returns a message.
More specifically, their function signature is (simplified somewhat):

```py
str | ChatDocument -> ChatDocument
```

where `ChatDocument` is a class that wraps a message's content (text) and its metadata. There are three responder methods in `ChatAgent`, one corresponding to each [responding entity][langroid.mytypes.Entity] (`LLM`, `USER`, or `AGENT`):

- `llm_response`: returns the LLM response to the input message. (The input message is added to the LLM history, and so is the subsequent response.)
- `agent_response`: a method that can be used to implement a custom agent response. Typically, an `agent_response` is used to handle messages containing a "tool" or "function-calling" (more on this later). Another use of `agent_response` is _message validation_.
- `user_response`: get input from the user. Useful to allow a human user to intervene or quit.

Creating an agent is easy. First define a `ChatAgentConfig` object, and then instantiate a `ChatAgent` object with that config:

```py
import langroid as lr

config = lr.ChatAgentConfig( #(1)!
    name="MyAgent", # note there should be no spaces in the name!
    llm = lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    ),
    system_message="You are a helpful assistant" #(2)!
)
agent = lr.ChatAgent(config)
```

1. This agent only has an LLM, and no vector-store. Examples of agents with vector-stores will be shown later.
2. The `system_message` is used when invoking the agent's `llm_response` method; it is passed to the LLM API as the first message (with role `"system"`), followed by the alternating series of user, assistant messages. Note that a `system_message` can also be specified when initializing a `Task` object (as seen below); in this case the `Task` `system_message` overrides the agent's `system_message`.

We can now use the agent's responder methods, for example:

```py
response = agent.llm_response("What is 2 + 4?")
if response is not None:
    print(response.content)
response = agent.user_response("add 3 to this")
...
```

The `ChatAgent` conveniently accumulates message history, so you don't have to do it yourself as you did in the [previous section](llm-interaction.md) with direct LLM usage. However, to create an interactive loop involving the human user, you still need to write your own loop. The `Task` abstraction frees you from this, as we see below.

## Task: orchestrator for agents

In order to do anything useful with a `ChatAgent`, we need a principled way to sequentially invoke its responder methods. For example, in the simple chat loop we saw in the [previous section](llm-interaction.md) (the [`try-llm.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/try-llm.py) script), we had a loop that alternated between getting human input and an LLM response. This is one of the simplest possible loops, but in more complex applications, we need a general way to orchestrate the agent's responder methods.

The [`Task`][langroid.agent.task.Task] class is an abstraction around a `ChatAgent`, responsible for iterating over the agent's responder methods, as well as orchestrating delegation and hand-offs among multiple tasks. A `Task` is initialized with a specific `ChatAgent` instance, and some optional arguments, including an initial message to "kick off" the agent. The `Task.run()` method is the main entry point for `Task` objects, and works as follows:

- it first calls the `Task.init()` method to initialize the `pending_message`, which represents the latest message that needs a response.
- it then repeatedly calls `Task.step()` until `Task.done()` is True, and returns `Task.result()` as the final result of the task.

`Task.step()` is where all the action happens. It represents a "turn" in the "conversation": in the case of a single `ChatAgent`, the conversation involves only the three responders mentioned above, but when a `Task` has sub-tasks, it can involve other tasks as well (we will see this in a [later section](two-agent-chat-num.md), but ignore this for now). `Task.step()` loops over the `ChatAgent`'s responders (plus sub-tasks if any) until it finds a _valid_ response[^1] to the current `pending_message`, i.e. a "meaningful" response, something other than `None` for example. Once `Task.step()` finds a valid response, it updates the `pending_message` with this response, and the next invocation of `Task.step()` will search for a valid response to this updated message, and so on. `Task.step()` incorporates mechanisms to ensure proper handling of messages, e.g. the USER gets a chance to respond after each non-USER response (to avoid infinite runs without human intervention), and an entity is prevented from responding if it has just responded.

[^1]: To customize a Task's behavior you can subclass it and override methods like `valid()`, `done()`, `result()`, or even `step()`.

!!! note "`Task.run()` has the same signature as the agent's responder methods."
    The key to composability of tasks is that `Task.run()` *has exactly the same type-signature as any of the agent's responder methods*, i.e. `str | ChatDocument -> ChatDocument`. This means that a `Task` can be used as a responder in another `Task`, and so on recursively. We will see this in action in the [Two Agent Chat section](two-agent-chat-num.md).

The above details were only provided to give you a glimpse into how Agents and Tasks work. Unless you are creating a custom orchestration mechanism, you do not need to be aware of these details. In fact, our basic human + LLM chat loop can be trivially implemented with a `Task`, in a couple of lines of code:

```py
task = lr.Task(
    agent,
    name="Bot", #(1)!
    system_message="You are a helpful assistant", #(2)!
)
```

1. If specified, overrides the agent's `name`. (Note that the agent's name is displayed in the conversation shown in the console.) However, typical practice is to just define the `name` in the `ChatAgentConfig` object, as we did above.
2. If specified, overrides the agent's `system_message`. Typical practice is to just define the `system_message` in the `ChatAgentConfig` object, as we did above.

We can then run the task:

```py
task.run() #(1)!
```

1. Note how this hides all of the complexity of constructing and updating a sequence of `LLMMessage`s.

Note that the agent's `agent_response()` method always returns `None` (since the default implementation of this method looks for a tool/function-call, and these never occur in this task). So the calls to `task.step()` result in alternating responses from the LLM and the user.

See [`chat-agent.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/chat-agent.py) for a working example that you can run with:

```sh
python3 examples/quick-start/chat-agent.py
```

Here is a screenshot of the chat in action:[^2]

![chat.png](chat.png)

## Next steps

In the [next section](multi-agent-task-delegation.md) you will learn some general principles on how to have multiple agents collaborate on a task using Langroid.
[^2]: In the screenshot, the numbers in parentheses indicate how many messages have accumulated in the LLM's message history. This is only provided for informational and debugging purposes, and you can ignore it for now. --- In these sections we show you how to use the various components of `langroid`. To follow along, we recommend you clone the [`langroid-examples`](https://github.com/langroid/langroid-examples) repo. !!! tip "Consult the tests as well" As you get deeper into Langroid, you will find it useful to consult the [tests](https://github.com/langroid/langroid/tree/main/tests/main) folder under `tests/main` in the main Langroid repo. Start with the [`Setup`](setup.md) section to install Langroid and get your environment set up. --- !!! tip "Script in `langroid-examples`" A full working example for the material in this section is in the `try-llm.py` script in the `langroid-examples` repo: [`examples/quick-start/try-llm.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/try-llm.py). Let's start with the basics -- how to directly interact with an OpenAI LLM using Langroid. ### Configure, instantiate the LLM class First define the configuration for the LLM, in this case one of the OpenAI GPT chat models: ```py import langroid as lr cfg = lr.language_models.OpenAIGPTConfig( chat_model=lr.language_models.OpenAIChatModel.GPT4o, ) ``` !!! info inline end "About Configs" A recurring pattern you will see in Langroid is that for many classes, we have a corresponding `Config` class (an instance of a Pydantic `BaseModel`), and the class constructor takes this `Config` class as its only argument. This lets us avoid having long argument lists in constructors, and brings flexibility since adding a new argument to the constructor is as simple as adding a new field to the corresponding `Config` class. For example the constructor for the `OpenAIGPT` class takes a single argument, an instance of the `OpenAIGPTConfig` class. Now that we've defined the configuration of the LLM, we can instantiate it: ```py mdl = lr.language_models.OpenAIGPT(cfg) ``` We will use OpenAI's GPT4 model's [chat completion API](https://platform.openai.com/docs/guides/gpt/chat-completions-api). ### Messages: The `LLMMessage` class This API takes a list of "messages" as input -- this is typically the conversation history so far, consisting of an initial system message, followed by a sequence of alternating messages from the LLM ("Assistant") and the user. Langroid provides an abstraction [`LLMMessage`][langroid.language_models.base.LLMMessage] to construct messages, e.g. ```py from langroid.language_models import Role, LLMMessage msg = LLMMessage( content="what is the capital of Bangladesh?", role=Role.USER ) ``` ### LLM response to a sequence of messages To get a response from the LLM, we call the mdl's `chat` method, and pass in a list of messages, along with a bound on how long (in tokens) we want the response to be: ```py messages = [ LLMMessage(content="You are a helpful assistant", role=Role.SYSTEM), #(1)! LLMMessage(content="What is the capital of Ontario?", role=Role.USER), #(2)! ] response = mdl.chat(messages, max_tokens=200) ``` 1. :man_raising_hand: With a system message, you can assign a "role" to the LLM 2. :man_raising_hand: Responses from the LLM will have role `Role.ASSISTANT`; this is done behind the scenes by the `response.to_LLMMessage()` call below. 
The response is an object of class [`LLMResponse`][langroid.language_models.base.LLMResponse], which we can convert to an [`LLMMessage`][langroid.language_models.base.LLMMessage] to append to the conversation history:

```py
messages.append(response.to_LLMMessage())
```

You can put the above in a loop to get a simple command-line chat interface:

```py
from rich import print
from rich.prompt import Prompt #(1)!

messages = [
    LLMMessage(role=Role.SYSTEM, content="You are a helpful assistant"),
]
while True:
    message = Prompt.ask("[blue]Human")
    if message in ["x", "q"]:
        print("[magenta]Bye!")
        break
    messages.append(LLMMessage(role=Role.USER, content=message))
    response = mdl.chat(messages=messages, max_tokens=200)
    messages.append(response.to_LLMMessage())
    print("[green]Bot: " + response.message)
```

1. Rich is a Python library for rich text and beautiful formatting in the terminal. We use it here to get a nice prompt for the user's input. You can install it with `pip install rich`.

See [`examples/quick-start/try-llm.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/try-llm.py) for a complete example that you can run using:

```bash
python3 examples/quick-start/try-llm.py
```

Here is a screenshot of what it looks like:

![try-llm.png](try-llm.png)

### Next steps

You might be thinking: "_It is tedious to keep track of the LLM conversation history and set up a loop. Does Langroid provide any abstractions to make this easier?_" We're glad you asked! And this leads to the notion of an `Agent`. The [next section](chat-agent.md) will show you how to use the `ChatAgent` class to set up a simple chat Agent in a couple of lines of code.

---

# Multi-Agent Collaboration via Task Delegation

## Why multiple agents?

Let's say we want to develop a complex LLM-based application, for example an application that reads a legal contract, extracts structured information, cross-checks it against some taxonomy, gets some human input, and produces clear summaries. In _theory_ it may be possible to solve this in a monolithic architecture using an LLM API and a vector-store. But this approach quickly runs into problems -- you would need to maintain multiple LLM conversation histories and states, multiple vector-store instances, and coordinate all of the interactions between them.

Langroid's `ChatAgent` and `Task` abstractions provide a natural and intuitive way to decompose a solution approach into multiple tasks, each requiring different skills and capabilities. Some of these tasks may need access to an LLM, others may need access to a vector-store, and yet others may need tools/plugins/function-calling capabilities, or any combination of these. It may also make sense to have some tasks that manage the overall solution process.

From an architectural perspective, this type of modularity has numerous benefits:

- **Reusability**: We can reuse the same agent/task in other contexts.
- **Scalability**: We can scale up the solution by adding more agents/tasks.
- **Flexibility**: We can easily change the solution by adding/removing agents/tasks.
- **Maintainability**: We can maintain the solution by updating individual agents/tasks.
- **Testability**: We can test/debug individual agents/tasks in isolation.
- **Composability**: We can compose agents/tasks to create new agents/tasks.
- **Extensibility**: We can extend the solution by adding new agents/tasks.
- **Interoperability**: We can integrate the solution with other systems by adding new agents/tasks.
- **Security/Privacy**: We can secure the solution by isolating sensitive agents/tasks.
- **Performance**: We can improve performance by isolating performance-critical agents/tasks.

## Task collaboration via sub-tasks

Langroid currently provides a mechanism for hierarchical (i.e. tree-structured) task delegation: a `Task` object can add other `Task` objects as sub-tasks, as shown in this pattern:

```py
from langroid import ChatAgent, ChatAgentConfig, Task

main_agent = ChatAgent(ChatAgentConfig(...))
main_task = Task(main_agent, ...)
helper_agent1 = ChatAgent(ChatAgentConfig(...))
helper_agent2 = ChatAgent(ChatAgentConfig(...))
helper_task1 = Task(helper_agent1, ...)
helper_task2 = Task(helper_agent2, ...)

main_task.add_sub_task([helper_task1, helper_task2])
```

What happens when we call `main_task.run()`? Recall from the [previous section](chat-agent.md) that `Task.run()` works by repeatedly calling `Task.step()` until `Task.done()` is True. When the `Task` object has no sub-tasks, `Task.step()` simply tries to get a valid response from the `Task`'s `ChatAgent`'s "native" responders, in this sequence:

```py
[self.agent_response, self.llm_response, self.user_response] #(1)!
```

1. This is the default sequence in Langroid, but it can be changed by overriding [`ChatAgent.entity_responders()`][langroid.agent.base.Agent.entity_responders].

When a `Task` object has sub-tasks, the sequence of responders tried by `Task.step()` consists of the above "native" responders, plus the sequence of `Task.run()` calls on the sub-tasks, in the order in which they were added to the `Task` object. For the example above, this means that `main_task.step()` will seek a valid response in this sequence:

```py
[self.agent_response, self.llm_response, self.user_response,
 helper_task1.run(), helper_task2.run()]
```

Fortunately, as noted in the [previous section](chat-agent.md), `Task.run()` has the same type signature as that of the `ChatAgent`'s "native" responders, so this works seamlessly. Of course, each of the sub-tasks can have its own sub-tasks, and so on, recursively. One way to think of this type of task delegation is that `main_task` "fails over" to `helper_task1` and `helper_task2` when it cannot respond to the current `pending_message` on its own.

## **Or Else** logic vs **And Then** logic

It is important to keep in mind how `step()` works: as each responder in the sequence is tried, when there is a valid response, the next call to `step()` _restarts its search_ at the beginning of the sequence (with the only exception being that the human User is given a chance to respond after each non-human response). In this sense, the semantics of the responder sequence is similar to **Or Else** logic, as opposed to **And Then** logic. If we want a sequence of sub-tasks that behaves more like **And Then** logic, we can achieve this by recursively adding sub-tasks. In the above example, suppose we wanted the `main_task` to trigger `helper_task1` and `helper_task2` in sequence; then we could set it up like this:

```py
helper_task1.add_sub_task(helper_task2) #(1)!
main_task.add_sub_task(helper_task1)
```

1. When adding a single sub-task, we do not need to wrap it in a list.

## Next steps

In the [next section](two-agent-chat-num.md) we will see how this mechanism can be used to set up a simple collaboration between two agents.

---

# Setup

## Install

Ensure you are using Python 3.11. It is best to work in a virtual environment:

```bash
# go to your repo root (which may be langroid-examples)
cd <your-repo-root>
python3 -m venv .venv
. ./.venv/bin/activate
```

To see how to use Langroid in your own repo, you can take a look at the [`langroid-examples`](https://github.com/langroid/langroid-examples) repo as a starting point, or use the [`langroid-template`](https://github.com/langroid/langroid-template) repo. These repos contain a `pyproject.toml` file suitable for use with the [`uv`](https://docs.astral.sh/uv/) dependency manager. After installing `uv`, you can set up your virtual env, activate it, and install Langroid into your venv like this:

```bash
uv venv --python 3.11
. ./.venv/bin/activate
uv sync
```

Alternatively, use `pip` to install `langroid` into your virtual environment:

```bash
pip install langroid
```

The core Langroid package lets you use OpenAI embedding models via their API. If you instead want to use the `sentence-transformers` embedding models from HuggingFace, install Langroid like this:

```bash
pip install "langroid[hf-embeddings]"
```

For many practical scenarios, you may need additional optional dependencies:

- To use various document-parsers, install langroid with the `doc-chat` extra:
```bash
pip install "langroid[doc-chat]"
```
- For "chat with databases", use the `db` extra:
```bash
pip install "langroid[db]"
```
- You can specify multiple extras by separating them with commas, e.g.:
```bash
pip install "langroid[doc-chat,db]"
```
- To simply install _all_ optional dependencies, use the `all` extra (but note that this will result in longer load/startup times and a larger install size):
```bash
pip install "langroid[all]"
```

??? note "Optional Installs for using SQL Chat with a PostgreSQL DB"
    If you are using `SQLChatAgent` (e.g. the script [`examples/data-qa/sql-chat/sql_chat.py`](https://github.com/langroid/langroid/blob/main/examples/data-qa/sql-chat/sql_chat.py)) with a Postgres DB, you will need to:

    - Install PostgreSQL dev libraries for your platform, e.g.
        - `sudo apt-get install libpq-dev` on Ubuntu,
        - `brew install postgresql` on Mac, etc.
    - Install langroid with the `postgres` extra, e.g. `pip install langroid[postgres]` or `uv add "langroid[postgres]"` or `uv pip install --extra postgres -r pyproject.toml`. If this gives you an error, try `uv pip install psycopg2-binary` in your virtualenv.

!!! tip "Work in a nice terminal, such as iTerm2, rather than a notebook"
    All of the examples we will go through are command-line applications. For the best experience we recommend you work in a nice terminal that supports colored output, such as [iTerm2](https://iterm2.com/).

!!! note "mysqlclient errors"
    If you get strange errors involving `mysqlclient`, try doing `pip uninstall mysqlclient` followed by `pip install mysqlclient`.

## Set up tokens/keys

To get started, all you need is an OpenAI API Key. If you don't have one, see [this OpenAI Page](https://platform.openai.com/docs/quickstart). (Note that while this is the simplest way to get started, Langroid works with practically any LLM, not just those from OpenAI. See the guides to using [Open/Local LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/), and other [non-OpenAI](https://langroid.github.io/langroid/tutorials/non-openai-llms/) proprietary LLMs.)

In the root of the repo, copy the `.env-template` file to a new file `.env`:

```bash
cp .env-template .env
```

Then insert your OpenAI API Key.
Your `.env` file should look like this: ```bash OPENAI_API_KEY=your-key-here-without-quotes ``` Alternatively, you can set this as an environment variable in your shell (you will need to do this every time you open a new shell): ```bash export OPENAI_API_KEY=your-key-here-without-quotes ``` All of the following environment variable settings are optional, and some are only needed to use specific features (as noted below). - **Qdrant** Vector Store API Key, URL. This is only required if you want to use Qdrant cloud. Langroid uses LanceDB as the default vector store in its `DocChatAgent` class (for RAG). Alternatively [Chroma](https://docs.trychroma.com/) is also currently supported. We use the local-storage version of Chroma, so there is no need for an API key. - **Redis** Password, host, port: This is optional, and only needed to cache LLM API responses using Redis Cloud. Redis [offers](https://redis.com/try-free/) a free 30MB Redis account which is more than sufficient to try out Langroid and even beyond. If you don't set up these, Langroid will use a pure-python Redis in-memory cache via the [Fakeredis](https://fakeredis.readthedocs.io/en/latest/) library. - **GitHub** Personal Access Token (required for apps that need to analyze git repos; token-based API calls are less rate-limited). See this [GitHub page](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens). - **Google Custom Search API Credentials:** Only needed to enable an Agent to use the `GoogleSearchTool`. To use Google Search as an LLM Tool/Plugin/function-call, you'll need to set up [a Google API key](https://developers.google.com/custom-search/v1/introduction#identify_your_application_to_google_with_api_key), then [setup a Google Custom Search Engine (CSE) and get the CSE ID](https://developers.google.com/custom-search/docs/tutorial/creatingcse). (Documentation for these can be challenging, we suggest asking GPT4 for a step-by-step guide.) After obtaining these credentials, store them as values of `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` in your `.env` file. Full documentation on using this (and other such "stateless" tools) is coming soon, but in the meantime take a peek at the test [`tests/main/test_web_search_tools.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_web_search_tools.py) to see how to use it. If you add all of these optional variables, your `.env` file should look like this: ```bash OPENAI_API_KEY=your-key-here-without-quotes GITHUB_ACCESS_TOKEN=your-personal-access-token-no-quotes CACHE_TYPE=redis REDIS_PASSWORD=your-redis-password-no-quotes REDIS_HOST=your-redis-hostname-no-quotes REDIS_PORT=your-redis-port-no-quotes QDRANT_API_KEY=your-key QDRANT_API_URL=https://your.url.here:6333 # note port number must be included GOOGLE_API_KEY=your-key GOOGLE_CSE_ID=your-cse-id ``` ### Microsoft Azure OpenAI setup[Optional] This section applies only if you are using Microsoft Azure OpenAI. When using Azure OpenAI, additional environment variables are required in the `.env` file. This page [Microsoft Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-python#environment-variables) provides more information, and you can set each environment variable as follows: - `AZURE_OPENAI_API_KEY`, from the value of `API_KEY` - `AZURE_OPENAI_API_BASE` from the value of `ENDPOINT`, typically looks like `https://your_resource.openai.azure.com`. 
- For `AZURE_OPENAI_API_VERSION`, you can use the default value in `.env-template`; the latest version can be found [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new#azure-openai-chat-completion-general-availability-ga).
- `AZURE_OPENAI_DEPLOYMENT_NAME` is an OPTIONAL deployment name, which may be defined by the user during model setup.
- `AZURE_OPENAI_CHAT_MODEL`: Azure OpenAI allows specific model names when you select the model for your deployment. You need to use exactly the model name that was selected. For example, GPT-3.5 (should be `gpt-35-turbo-16k` or `gpt-35-turbo`) or GPT-4 (should be `gpt-4-32k` or `gpt-4`).
- `AZURE_OPENAI_MODEL_NAME` (Deprecated, use `AZURE_OPENAI_CHAT_MODEL` instead).

!!! note "For Azure-based models use `AzureConfig` instead of `OpenAIGPTConfig`"
    In most of the docs you will see that LLMs are configured using `OpenAIGPTConfig`. However, if you want to use Azure-deployed models, you should replace `OpenAIGPTConfig` with `AzureConfig`. See [`test_azure_openai.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_azure_openai.py) and [`example/basic/chat.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat.py) for examples.

## Next steps

Now you should be ready to use Langroid! As a next step, you may want to see how you can use Langroid to [interact directly with the LLM](llm-interaction.md) (OpenAI GPT models only for now).

---

# Three-Agent Collaboration, with Message Routing

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `three-agent-chat-num-router.py` script in the `langroid-examples` repo:
    [`examples/quick-start/three-agent-chat-num-router.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num-router.py).

Let's change the number game from the [three agent chat example](three-agent-chat-num.md) slightly. In that example, when the `even_agent`'s LLM receives an odd number, it responds with `DO-NOT-KNOW`, and similarly for the `odd_agent` when it receives an even number. The `step()` method of the `processor_task` considers `DO-NOT-KNOW` to be an _invalid_ response and _continues_ to look for a valid response from any remaining sub-tasks. Thus there was no need for the `processor_agent` to specify who should handle the current number.

But what if there is a scenario where the `even_agent` and `odd_agent` might return a legitimate-looking but "wrong" answer? In this section we add this twist -- when the `even_agent` receives an odd number, it responds with -10, and similarly for the `odd_agent` when it receives an even number. We tell the `processor_agent` to avoid getting a negative number.

The goal we have set for the `processor_agent` implies that it must specify the intended recipient of the number it is sending. We can enforce this using a special Langroid Tool, [`RecipientTool`][langroid.agent.tools.recipient_tool.RecipientTool]. So when setting up the `processor_task` we include instructions to use this tool (whose name is `recipient_message`, the value of `RecipientTool.request`):

```py
processor_agent = lr.ChatAgent(config)
processor_task = lr.Task(
    processor_agent,
    name = "Processor",
    system_message="""
        You will receive a list of numbers from me (the user).
        Your goal is to apply a transformation to each number.
        However you do not know how to do this transformation.
        You can take the help of two people to perform the transformation.
        If the number is even, send it to EvenHandler,
        and if it is odd, send it to OddHandler.
        IMPORTANT: send the numbers ONE AT A TIME.
        The handlers will transform the number and give you a new number.
        If you send it to the wrong person, you will receive a negative value.
        Your aim is to never get a negative number, so you must clearly
        specify who you are sending the number to, using the
        `recipient_message` tool/function-call, where the `content` field
        is the number you want to send, and the `recipient` field is the name
        of the intended recipient, either "EvenHandler" or "OddHandler".
        Once all numbers in the given list have been transformed,
        say DONE and show me the result.
        Start by asking me for the list of numbers.
    """,
    llm_delegate=True,
    single_round=False,
)
```

We must also enable the `processor_agent` to use this tool:

```py
processor_agent.enable_message(lr.agent.tools.RecipientTool)
```

The rest of the code remains the same as in the [previous section](three-agent-chat-num.md), i.e., we simply add the two handler tasks as sub-tasks of the `processor_task`, like this:

```python
processor_task.add_sub_task([even_task, odd_task])
```

One of the benefits of using the `RecipientTool` is that it contains mechanisms to remind the LLM to specify a recipient for its message, when it forgets to do so (this does happen once in a while, even with GPT-4).

Feel free to try the working example script `three-agent-chat-num-router.py` in the `langroid-examples` repo:
[`examples/quick-start/three-agent-chat-num-router.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num-router.py):

```bash
python3 examples/quick-start/three-agent-chat-num-router.py
```

Below is a screenshot of what this might look like, using the OpenAI function-calling mechanism with the `recipient_message` tool:

![three-agent-router-func.png](three-agent-router-func.png)

And here is what it looks like using Langroid's built-in tools mechanism (use the `-t` option when running the script):

![three-agent-router.png](three-agent-router.png)

## Next steps

In the [next section](chat-agent-docs.md) you will learn how to use Langroid with external documents.

---

# Three-Agent Collaboration

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `three-agent-chat-num.py` script in the `langroid-examples` repo:
    [`examples/quick-start/three-agent-chat-num.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num.py).

Let us set up a simple numbers exercise between 3 agents. The `Processor` agent receives a number $n$, and its goal is to apply a transformation to it. However, it does not know how to apply the transformation, and takes the help of two other agents to do so. Given a number $n$,

- The `EvenHandler` returns $n/2$ if $n$ is even, otherwise says `DO-NOT-KNOW`.
- The `OddHandler` returns $3n+1$ if $n$ is odd, otherwise says `DO-NOT-KNOW`.

We'll first define a shared LLM config:

```py
llm_config = lr.language_models.OpenAIGPTConfig(
    chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    # or, e.g., "ollama/qwen2.5-coder:latest", or "gemini/gemini-2.0-flash-exp"
)
```

Next define the config for the `Processor` agent:

```py
processor_config = lr.ChatAgentConfig(
    name="Processor",
    llm = llm_config,
    system_message="""
        You will receive a number from the user.
        Simply repeat that number, DO NOT SAY ANYTHING else,
        and wait for a TRANSFORMATION of the number to be returned to you.
        Once you have received the RESULT, simply say "DONE",
        do not say anything else.
    """,
    vecdb=None,
)
```

Then set up the `processor_agent`, along with the corresponding task:

```py
processor_agent = lr.ChatAgent(processor_config)
processor_task = lr.Task(
    processor_agent,
    llm_delegate=True, #(1)!
    interactive=False, #(2)!
    single_round=False, #(3)!
)
```

1. Setting the `llm_delegate` option to `True` means that the `processor_task` is delegated to the LLM (as opposed to the User), in the sense that the LLM is the one "seeking" a response to the latest number. Specifically, this means that in `processor_task.step()`, when a sub-task returns `DO-NOT-KNOW`, it is _not_ considered a valid response, and the search for a valid response continues to the next sub-task, if any.
2. `interactive=False` means the task loop will not wait for user input.
3. `single_round=False` means that the `processor_task` should _not_ terminate after a valid response from a responder.

Set up the other two agents and tasks:

```py
NO_ANSWER = lr.utils.constants.NO_ANSWER

even_config = lr.ChatAgentConfig(
    name="EvenHandler",
    llm = llm_config,
    system_message=f"""
        You will be given a number N. Respond as follows:
        - If N is even, divide N by 2 and show the result,
          in the format: RESULT = <result>, and say NOTHING ELSE.
        - If N is odd, say {NO_ANSWER}
    """,
)
even_agent = lr.ChatAgent(even_config)
even_task = lr.Task(
    even_agent,
    single_round=True,  # task done after 1 step() with valid response
)

odd_config = lr.ChatAgentConfig(
    name="OddHandler",
    llm = llm_config,
    system_message=f"""
        You will be given a number N. Respond as follows:
        - If N is odd, return the result (N*3+1),
          in the format: RESULT = <result>, and say NOTHING ELSE.
        - If N is even, say {NO_ANSWER}
    """,
)
odd_agent = lr.ChatAgent(odd_config)
odd_task = lr.Task(
    odd_agent,
    single_round=True,  # task done after 1 step() with valid response
)
```

Now add the `even_task` and `odd_task` as subtasks of the `processor_task`, and then run it with a number as input:

```python
processor_task.add_sub_task([even_task, odd_task])
processor_task.run(13)
```

The input number will be passed to the `Processor` agent as the user input.

Feel free to try the working example script `three-agent-chat-num.py` in the `langroid-examples` repo:
[`examples/quick-start/three-agent-chat-num.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num.py):

```bash
python3 examples/quick-start/three-agent-chat-num.py
```

Here's a screenshot of what it looks like:

![three-agent-num.png](three-agent-num.png)

## Next steps

In the [next section](chat-agent-tool.md) you will learn how to use Langroid to equip a `ChatAgent` with tools or function-calling.

---

# Two-Agent Collaboration

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `two-agent-chat-num.py` script in the `langroid-examples` repo:
    [`examples/quick-start/two-agent-chat-num.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/two-agent-chat-num.py).

To illustrate these ideas, let's look at a toy example[^1] where a `Student` agent receives a list of numbers to add. We set up this agent with an instruction that it does not know how to add, and that it can ask for help adding pairs of numbers. To add pairs of numbers, we set up an `Adder` agent.
[^1]: Toy numerical examples are perfect to illustrate the ideas without incurring too much token cost from LLM API calls.

First define a common `llm_config` to use for both agents:

```python
llm_config = lr.language_models.OpenAIGPTConfig(
    chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    # or, e.g., "ollama/qwen2.5-coder:latest", or "gemini/gemini-2.0-flash-exp"
)
```

Next, set up a config for the student agent, then create the agent and the corresponding task:

```py
student_config = lr.ChatAgentConfig(
    name="Student",
    llm=llm_config,
    vecdb=None, #(1)!
    system_message="""
        You will receive a list of numbers from me (the User),
        and your goal is to calculate their sum.
        However you do not know how to add numbers.
        I can help you add numbers, two at a time, since
        I only know how to add pairs of numbers.
        Send me a pair of numbers to add, one pair at a time,
        and I will tell you their sum.
        For each question, simply ask me the sum in math notation,
        e.g., say "1 + 2", etc., and say nothing else.
        Once you have added all the numbers in the list, say DONE
        and give me the final sum.
        Start by asking me for the list of numbers.
    """,
)
student_agent = lr.ChatAgent(student_config)
student_task = lr.Task(
    student_agent,
    name = "Student",
    llm_delegate = True, #(2)!
    single_round=False,  # (3)!
)
```

1. We don't need access to external docs, so we set `vecdb=None` to avoid the overhead of loading a vector-store.
2. Whenever we "flip roles" and assign the LLM the role of generating questions, we set `llm_delegate=True`. In effect this ensures that the LLM "decides" when the task is done.
3. This setting means the task is not a single-round task, i.e. it is _not_ done after one `step()` with a valid response.

Next, set up the Adder agent config, create the Adder agent and the corresponding Task:

```py
adder_config = lr.ChatAgentConfig(
    name = "Adder", #(1)!
    llm=llm_config,
    vecdb=None,
    system_message="""
        You are an expert on addition of numbers.
        When given numbers to add, simply return their sum, say nothing else.
    """,
)
adder_agent = lr.ChatAgent(adder_config)
adder_task = lr.Task(
    adder_agent,
    interactive=False, #(2)!
    single_round=True,  # task done after 1 step() with valid response (3)!
)
```

1. The Agent name is displayed in the conversation shown in the console.
2. Does not wait for user input.
3. We set `single_round=True` to ensure that the expert task is done after one `step()` with a valid response.

Finally, we add the `adder_task` as a sub-task of the `student_task`, and run the `student_task`:

```py
student_task.add_sub_task(adder_task) #(1)!
student_task.run()
```

1. When adding just one sub-task, we don't need to use a list.

For a full working example, see the [`two-agent-chat-num.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/two-agent-chat-num.py) script in the `langroid-examples` repo. You can run this using:

```bash
python3 examples/quick-start/two-agent-chat-num.py
```

Here is an example of the conversation that results:

![two-agent-num.png](two-agent-num.png)

## Logs of multi-agent interactions

!!! note "For advanced users"
    This section is for advanced users who want more visibility into the internals of multi-agent interactions.

When running a multi-agent chat, e.g. using `task.run()`, two types of logs are generated:

- plain-text logs in `logs/<task-name>.log`
- tsv logs in `logs/<task-name>.tsv`

It is important to realize that the logs show _every iteration of the loop in `Task.step()`, i.e.
every **attempt** at responding to the current pending message, even those that are not allowed_. The ones marked with an asterisk (*) are the ones that are considered valid responses for a given `step()` (which is a "turn" in the conversation).

The plain-text logs have ANSI color-coding to make them easier to read by doing `less <log-file>`. The format is (subject to change):

```
(TaskName) Responder SenderEntity (EntityName) (=> Recipient) TOOL Content
```

The structure of the `tsv` logs is similar. A great way to view these is to install and use the excellent [`visidata`](https://www.visidata.org/) tool:

```bash
vd logs/<task-name>.tsv
```

## Next steps

As a next step, look at how to set up a collaboration among three agents for a simple [numbers game](three-agent-chat-num.md).

---

# A quick tour of Langroid

This is a quick tour of some Langroid features. For a more detailed guide, see the [Getting Started guide](https://langroid.github.io/langroid/quick-start/). There are many more features besides the ones shown here. To explore Langroid further, see the sections of the main [docs](https://langroid.github.io/langroid/), and a [Colab notebook](https://colab.research.google.com/github/langroid/langroid/blob/main/examples/Langroid_quick_start.ipynb) you can try yourself.

## Chat directly with LLM

Imports:

```python
import langroid as lr
import langroid.language_models as lm
```

Set up the LLM; note how you can specify the chat model -- if omitted, it defaults to OpenAI `GPT4o`. See the guide to using Langroid with [local/open LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/), and with [non-OpenAI LLMs](https://langroid.github.io/langroid/tutorials/non-openai-llms/).

```python
llm_config = lm.OpenAIGPTConfig(
    chat_model="gpt-5-mini"
)
llm = lm.OpenAIGPT(llm_config)
```

Chat with the bare LLM -- no chat accumulation, i.e. follow-up responses will *not* be aware of prior conversation history (you need an Agent for that, see below).

```python
llm.chat("1 2 4 7 11 ?")  # ==> answers 16, with some explanation
```

## Agent

Make a [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] and chat with it; it now accumulates conversation history:

```python
agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm_config))
agent.llm_response("Find the next number: 1 2 4 7 11 ?")  # => responds 16
agent.llm_response("and then?")  # => answers 22
```

## Task

Make a [`Task`][langroid.agent.task.Task] and create a chat loop with the user:

```python
task = lr.Task(agent, interactive=True)
task.run()
```

## Tools/Functions/Structured outputs

Define a [`ToolMessage`][langroid.agent.tool_message.ToolMessage] using Pydantic (v1) -- this gets transpiled into system-message instructions to the LLM, so you never have to deal with writing a JSON schema. (Besides JSON-based tools, Langroid also supports [XML-based tools](https://langroid.github.io/langroid/notes/xml-tools/), which are far more reliable when having the LLM return code in a structured output.)

```python
from pydantic import BaseModel

class CityTemperature(BaseModel):
    city: str
    temp: float

class WeatherTool(lr.ToolMessage):
    request: str = "weather_tool" #(1)!
    purpose: str = "To extract info from text" #(2)!
    city_temp: CityTemperature

    # tool handler
    def handle(self) -> CityTemperature:
        return self.city_temp
```

1. When this tool is enabled for an agent, a method named `weather_tool` gets auto-inserted in the agent class, with its body being the `handle` method -- this method handles the LLM's generation of this tool.
2.
The value of the `purpose` field is used to populate the system message to the LLM, along with the Tool's schema derived from its Pydantic-based definition. Enable the Agent to use the `ToolMessage`, and set a system message describing the agent's task: ```python agent.enable_message(WeatherTool) agent.config.system_message = """ Your job is to extract city and temperature info from user input and return it using the `weather_tool`. """ ``` Create specialized task that returns a `CityTemperature` object: ```python # configure task to terminate after (a) LLM emits a tool, (b) tool is handled by Agent task_config = lr.TaskConfig(done_sequences=["T,A"]) # create a task that returns a CityTemperature object task = lr.Task(agent, interactive=False, config=task_config)[CityTemperature] # run task, with built-in tool-handling loop data = task.run("It is 45 degrees F in Boston") assert data.city == "Boston" assert int(data.temp) == 45 ``` ## Chat with a document (RAG) Create a [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent]. ```python doc_agent_config = lr.agent.special.DocChatAgentConfig(llm=llm_config) doc_agent = lr.agent.special.DocChatAgent(doc_agent_config) ``` Ingest the contents of a web page into the agent (this involves chunking, indexing into a vector-database, etc.): ```python doc_agent.ingest_doc_paths("https://en.wikipedia.org/wiki/Ludwig_van_Beethoven") ``` Ask a question: ``` result = doc_agent.llm_response("When did Beethoven move from Bonn to Vienna?") ``` You should see the streamed response with citations like this: ![langroid-tour-beethoven.png](langroid-tour-beethoven.png) ## Two-agent interaction Set up a teacher agent: ```python from langroid.agent.tools.orchestration import DoneTool teacher = lr.ChatAgent( lr.ChatAgentConfig( llm=llm_config, system_message=f""" Ask a numbers-based question, and your student will answer. You can then provide feedback or hints to the student to help them arrive at the right answer. Once you receive the right answer, use the `{DoneTool.name()}` tool to end the session. """ ) ) teacher.enable_message(DoneTool) teacher_task = lr.Task(teacher, interactive=False) ``` Set up a student agent: ```python student = lr.ChatAgent( lr.ChatAgentConfig( llm=llm_config, system_message=f""" You will receive a numbers-related question. Answer to the best of your ability. If your answer is wrong, you will receive feedback or hints, and you can revise your answer, and repeat this process until you get the right answer. """ ) ) student_task = lr.Task(student, interactive=False, single_round=True) ``` Make the `student_task` a subtask of the `teacher_task`: ```python teacher_task.add_sub_task(student_task) ``` Run the teacher task: ```python teacher_task.run() ``` You should then see this type of interaction: ![langroid-tour-teacher.png](langroid-tour-teacher.png) --- # Options for accessing LLMs > This is a work-in-progress document. It will be updated frequently. The variety of ways to access the power of Large Language Models (LLMs) is growing rapidly, and there are a bewildering array of options. This document is an attempt to categorize and describe some of the most popular and useful ways to access LLMs, via these 2x2x2 combinations: - Websites (non-programmatic) or APIs (programmatic) - Open-source or Proprietary - Chat-based interface or integrated assistive tools. We will go into some of these combinations below. More will be added over time. 
## Chat-based Web (non-API) access to Proprietary LLMs This is best for *non-programmatic* use of LLMs: you go to a website and interact with the LLM via a chat interface -- you write prompts and/or upload documents, and the LLM responds with plain text or can create artifacts (e.g. reports, code, charts, podcasts, etc) that you can then copy into your files, workflow or codebase. They typically allow you to upload text-based documents of various types, and some let you upload images, screen-shots, etc and ask questions about them. Most of them are capable of doing *internet search* to inform their responses. !!! note "Chat Interface vs Integrated Tools" Note that when using a chat-based interaction, you have to copy various artifacts from the web-site into another place, like your code editor, document, etc. AI-integrated tools relieve you of this burden by bringing the LLM power into your workflow directly. More on this in a later section. **Pre-requisites:** - *Computer*: Besides having a modern web browser (Chrome, Firefox, etc) and internet access, there are no other special requirements, since the LLM is running on a remote server. - *Coding knowledge*: Where (typically Python) code is produced, you will get best results if you are conversant with Python so that you can understand and modify the code as needed. In this category you do not need to know how to interact with an LLM API via code. Here are some popular options in this category: ### OpenAI ChatGPT Free access at [https://chatgpt.com/](https://chatgpt.com/) With a ChatGPT-Plus monthly subscription ($20/month), you get additional features like: - access to more powerful models - access to [OpenAI canvas](https://help.openai.com/en/articles/9930697-what-is-the-canvas-feature-in-chatgpt-and-how-do-i-use-it) - this offers a richer interface than just a chat window, e.g. it automatically creates windows for code snippets, and shows results of running code (e.g. output, charts etc). Typical use: Since there is fixed monthly subscription (i.e. not metered by amount of usage), this is a cost-effective way to non-programmatically access a top LLM such as `GPT-4o` or `o1` (so-called "reasoning/thinking" models). Note however that there are limits on how many queries you can make within a certain time period, but usually the limit is fairly generous. What you can create, besides text-based artifacts: - produce Python (or other language) code which you can copy/paste into notebooks or files - SQL queries that you can copy/paste into a database tool - Markdown-based tables - You can't get diagrams, but you can get *code for diagrams*, e.g. python code for plots, [mermaid](https://github.com/mermaid-js/mermaid) code for flowcharts. - images in some cases. ### OpenAI Custom GPTs (simply known as "GPTs") [https://chatgpt.com/gpts/editor](https://chatgpt.com/gpts/editor) Here you can conversationally interact with a "GPT Builder" that will create a version of ChatGPT that is *customized* to your needs, i.e. with necessary background instructions, context, and/or documents. The end result is a specialized GPT that you can then use for your specific purpose and share with others (all of this is non-programmatic). E.g. [here](https://chatgpt.com/share/67153a4f-ea2c-8003-a6d3-cbc2412d78e5) is a "Knowledge Graph Builder" GPT !!! note "Private GPTs requires an OpenAI Team Account" To share a custom GPT within a private group, you need an OpenAI Team account, see pricing [here](https://openai.com/chatgpt/pricing). 
Without a Team account, any shared GPT is public and can be accessed by anyone.

### Anthropic/Claude

[https://claude.ai](https://claude.ai)

The Claude basic web-based interface is similar to OpenAI ChatGPT, powered by Anthropic's proprietary LLMs.
Anthropic's equivalent of ChatGPT-Plus is called "Claude Pro", which is also a $20/month subscription,
giving you access to advanced models (e.g. `Claude-3.5-Sonnet`) and features.
Anthropic's equivalent of Custom GPTs is called [Projects](https://www.anthropic.com/news/projects),
where you can create an LLM-powered interface that is augmented with your custom context and data.

Whichever product you are using, the interface auto-creates **artifacts** as needed -- these are stand-alone
documents (code, text, images, web-pages, etc) that you may want to copy and paste into your own codebase,
documents, etc. For example, you can prompt Claude to create full working interactive applications,
and copy the code, polish it and deploy it for others to use.
See examples [here](https://simonwillison.net/2024/Oct/21/claude-artifacts/).

### Microsoft Copilot Lab

!!! note
    Microsoft's "Copilot" is an overloaded term that can refer to many different AI-powered tools.
    Here we are referring to the one that is a collaboration between Microsoft and OpenAI,
    is based on OpenAI's GPT-4o LLM, and is powered by Bing's search engine.

Accessible via [https://copilot.cloud.microsoft.com/](https://copilot.cloud.microsoft.com/)

The basic capabilities are similar to OpenAI's and Anthropic's offerings, but come with so-called
"enterprise grade" security and privacy features, which purportedly make it suitable for use in educational
and corporate settings. Read more on what you can do with Copilot Lab
[here](https://www.microsoft.com/en-us/microsoft-copilot/learn/?form=MA13FV).

Like the other proprietary offerings, Copilot can:

- perform internet search to inform its responses
- generate/run code and show results including charts

### Google Gemini

Accessible at [gemini.google.com](https://gemini.google.com).

## AI-powered productivity tools

These tools "bring the AI to your workflow", which is a massive productivity boost, compared to repeatedly
context-switching, e.g. copying/pasting between a chat-based AI web-app and your workflow.

- [**Cursor**](https://www.cursor.com/): AI Editor/Integrated Dev Environment (IDE). This is a fork of VSCode.
- [**Zed**](https://zed.dev/): built in Rust; can be customized to use JetBrains/PyCharm keyboard shortcuts.
- [**Google Colab Notebooks with Gemini**](https://colab.research.google.com).
- [**Google NotebookLM**](https://notebooklm.google.com/): allows you to upload a set of text-based documents,
  and create artifacts such as a study guide, FAQ, summary, podcasts, etc.

## APIs for Proprietary LLMs

Using an API key allows *programmatic* access to the LLMs, meaning you can make invocations to the LLM
from within your own code, and receive back the results. This is useful for building applications involving
more complex workflows where LLMs are used within a larger codebase, to access "intelligence" as needed.

E.g. suppose you are writing code that handles queries from a user, and you want to classify the user's
_intent_ into one of 3 types: Information, Action, or Done. Pre-LLMs, you would have had to write a bunch
of rules or train a custom "intent classifier" that maps, for example:

- "What is the weather in Pittsburgh?" -> Information
- "Set a timer for 10 minutes" -> Action
- "Ok I have no more questions" -> Done

But using an LLM API, this is almost trivially easy -- you instruct the LLM to classify the intent into one
of these 3 types, send it the user query, and receive back the intent.
(You can use Tools to make this robust, but that is outside the scope of this document.)
The most popular proprietary LLMs available via API are from OpenAI (or via its partner Microsoft),
Anthropic, and Google:

- [OpenAI](https://platform.openai.com/docs/api-reference/introduction), to interact with the `GPT-4o` family
  of models, and the `o1` family of "thinking/reasoning" models.
- [Anthropic](https://docs.anthropic.com/en/home) to use the `Claude` series of models.
- [Google](https://ai.google.dev/gemini-api/docs) to use the `Gemini` family of models.

These LLM providers are home to some of the most powerful LLMs available today, specifically OpenAI's
`GPT-4o`, Anthropic's `Claude-3.5-Sonnet`, and Google's `Gemini 1.5 Pro` (as of Oct 2024).

**Billing:** Unlike the fixed monthly subscriptions of ChatGPT, Claude and others, LLM usage via API is
typically billed by *token usage*, i.e. you pay for the total number of input and output "tokens"
(a slightly technical term, but think of it as a word for now).

Using an LLM API involves these steps:

- create an account on the provider's website as a "developer" or organization,
- get an API key,
- use the API key in your code to make requests to the LLM.

**Prerequisites**:

- *Computer:* again, since the API is served over the internet, there are no special requirements for your computer.
- *Programming skills:* Using an LLM API involves either:
    - directly making REST API calls from your code, or
    - using a scaffolding library (like [Langroid](https://github.com/langroid/langroid)) that abstracts away
      the details of the API calls.

  In either case, you must be highly proficient in (Python) programming to use this option.

## Web-interfaces to Open LLMs

!!! note "Open LLMs"
    These are LLMs that have been publicly released, i.e. their parameters ("weights") are publicly
    available -- we refer to these as *open-weight* LLMs. If, in addition, the training datasets and
    data-preprocessing and training code are also available, we would call these *open-source* LLMs.
    But lately there is a looser usage of the term "open-source", referring to just the weights being
    available. For our purposes we will just refer to all of these models as **Open LLMs**.

There are many options here, but some popular ones are below. Note that some of these are front-ends that
allow you to interact with not only Open LLMs but also proprietary LLM APIs.

- [LMStudio](https://lmstudio.ai/)
- [OpenWebUI](https://github.com/open-webui/open-webui)
- [Msty](https://msty.app/)
- [AnythingLLM](https://anythingllm.com/)
- [LibreChat](https://www.librechat.ai/)

## API Access to Open LLMs

This is a good option if you are fairly proficient in (Python) coding. There are in fact two possibilities here:

- The LLM is hosted remotely, and you make REST API calls to the remote server.
  This is a good option when you want to run large LLMs and you don't have the resources (GPU and memory)
  to run them locally.
    - [groq](https://groq.com/): amazingly, it is free, and you can run `llama-3.1-70b`
    - [cerebras](https://cerebras.ai/)
    - [open-router](https://openrouter.ai/)
- The LLM is running on your computer.
  This is a good option if your machine has sufficient RAM to accommodate the LLM you are trying to run,
  and if you are concerned about data privacy. The most user-friendly option is
  [Ollama](https://github.com/ollama/ollama); see more below.

Note that all of the above options provide an **OpenAI-Compatible API** to interact with the LLM,
which is a huge convenience: you can write code to interact with OpenAI's LLMs (e.g. `GPT-4o`)
and then easily switch to one of the above options, typically by changing a simple config
(see the respective websites for instructions).

Of course, directly working with the raw LLM API quickly becomes tedious. This is where a scaffolding
library like [langroid](https://github.com/langroid/langroid) comes in very handy -- it abstracts away the
details of the API calls, and provides a simple programmatic interface to the LLM, and higher-level
abstractions like Agents, Tasks, etc. Working with such a library is going to be far more productive than
directly working with the raw API. Below are instructions on how to use langroid with some of the above
Open/Local LLM options.

See [here](https://langroid.github.io/langroid/tutorials/local-llm-setup/) for a guide to using Langroid with Open LLMs.

---

# Setting up a Local/Open LLM to work with Langroid

!!! tip "Example scripts in [`examples/`](https://github.com/langroid/langroid/tree/main/examples) directory."
    There are numerous examples of scripts that can be run with local LLMs, in the
    [`examples/`](https://github.com/langroid/langroid/tree/main/examples) directory of the main `langroid` repo.
    These examples are also in the [`langroid-examples`](https://github.com/langroid/langroid-examples/tree/main/examples)
    repo, although the latter repo may contain some examples that are not in the `langroid` repo.

Most of these example scripts allow you to specify an LLM in the format `-m <model>`,
where the specification of `<model>` is described in the guide below for local/open LLMs, or in the
[Non-OpenAI LLM](https://langroid.github.io/langroid/tutorials/non-openai-llms/) guide.
Scripts that have the string `local` in their name have been especially designed to work with certain
local LLMs, as described in the respective scripts.

If you want a pointer to a specific script that illustrates a 2-agent chat, have a look at
[`chat-search-assistant.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-search-assistant.py).
This specific script, originally designed for GPT-4/GPT-4o, works well with `llama3-70b`
(tested via Groq, mentioned below).

## Easiest: with Ollama

As of version 0.1.24, Ollama provides an OpenAI-compatible API server for the LLMs it supports,
which massively simplifies running these LLMs with Langroid. Example below.

```
ollama pull mistral:7b-instruct-v0.2-q8_0
```

This provides an OpenAI-compatible server for the `mistral:7b-instruct-v0.2-q8_0` model. You can run any
Langroid script using this model, by setting the `chat_model` in the `OpenAIGPTConfig` to
`ollama/mistral:7b-instruct-v0.2-q8_0`, e.g.
```python
import langroid.language_models as lm
import langroid as lr

llm_config = lm.OpenAIGPTConfig(
    chat_model="ollama/mistral:7b-instruct-v0.2-q8_0",
    chat_context_length=16_000,  # adjust based on model
)

agent_config = lr.ChatAgentConfig(
    llm=llm_config,
    system_message="You are helpful but concise",
)

agent = lr.ChatAgent(agent_config)

# directly invoke agent's llm_response method
# response = agent.llm_response("What is the capital of Russia?")

task = lr.Task(agent, interactive=True)
task.run()  # for an interactive chat loop
```

## Setup Ollama with a GGUF model from HuggingFace

Some models are not directly supported by Ollama out of the box. To serve a GGUF model with Ollama,
you can download the model from HuggingFace and set up a custom Modelfile for it.

E.g. download the GGUF version of `dolphin-mixtral` from
[here](https://huggingface.co/TheBloke/dolphin-2.7-mixtral-8x7b-GGUF)
(specifically, download this file `dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf`).

To set up a custom ollama model based on this:

- Save this model at a convenient place, e.g. `~/.ollama/models/`
- Create a modelfile for this model. First see what an existing modelfile for a similar model looks like,
  e.g. by running:
```
ollama show --modelfile dolphin-mixtral:latest
```
  You will notice this file has a FROM line followed by a prompt template and other settings.
  Create a new file with these contents. Only change the `FROM ...` line, replacing it with the path to the
  model you downloaded, e.g.
```
FROM /Users/blah/.ollama/models/dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf
```
- Save this modelfile somewhere, e.g. `~/.ollama/modelfiles/dolphin-mixtral-gguf`
- Create a new ollama model based on this file:
```
ollama create dolphin-mixtral-gguf -f ~/.ollama/modelfiles/dolphin-mixtral-gguf
```
- Run this new model using `ollama run dolphin-mixtral-gguf`

To use this model with Langroid, you can then specify `ollama/dolphin-mixtral-gguf` as the `chat_model`
param in the `OpenAIGPTConfig` as in the previous section. When a script supports it, you can also pass in
the model name via `-m ollama/dolphin-mixtral-gguf`.

## Local LLMs using LMStudio

LMStudio is one of the simplest ways to download and run open-weight LLMs locally.
See their docs at [lmstudio.ai](https://lmstudio.ai/docs) for installation and usage instructions.
Once you download a model, you can use the "server" option to have it served via an OpenAI-compatible API
at a local address like `http://127.0.0.1:1234/v1`.
As with any other scenario of running a local LLM, you can use this with Langroid by setting `chat_model`
as follows (note you should not include the `http://` part):

```python
llm_config = lm.OpenAIGPTConfig(
    chat_model="local/127.0.0.1:1234/v1",
    ...
)
```

## Setup llama.cpp with a GGUF model from HuggingFace

See `llama.cpp`'s [GitHub page](https://github.com/ggerganov/llama.cpp/tree/master) for build and
installation instructions. After installation, begin as above with downloading a GGUF model from
HuggingFace; for example, the quantized `Qwen2.5-Coder-7B` from
[here](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF); specifically,
[this file](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/blob/main/qwen2.5-coder-7b-instruct-q2_k.gguf).

Now, the server can be started with `llama-server -m qwen2.5-coder-7b-instruct-q2_k.gguf`.
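Once the server is up, you can point Langroid at it. Here is a minimal sketch, assuming the server is
listening on the default port 8080 (adjust the context length to the model you are actually serving):

```python
import langroid.language_models as lm

# sketch: assumes llama-server is running locally on the default port 8080
llm_config = lm.OpenAIGPTConfig(
    chat_model="llamacpp/localhost:8080",
    chat_context_length=32_000,  # adjust based on the model you are serving
)
```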
In addition, your `llama.cpp` may be built with support for simplified management of HuggingFace models
(specifically, `libcurl` support is required); in this case, `llama.cpp` will download HuggingFace models
to a cache directory, and the server may be run with:

```bash
llama-server \
    --hf-repo Qwen/Qwen2.5-Coder-7B-Instruct-GGUF \
    --hf-file qwen2.5-coder-7b-instruct-q2_k.gguf
```

In either case, to use the model with Langroid, specify `llamacpp/localhost:{port}` as the `chat_model`;
the default port is 8080.

## Setup vLLM with a model from HuggingFace

See [the vLLM docs](https://docs.vllm.ai/en/stable/getting_started/installation.html) for installation and
configuration options. To run a HuggingFace model with vLLM, use `vllm serve`, which provides an
OpenAI-compatible server. For example, to run `Qwen2.5-Coder-32B`, run `vllm serve Qwen/Qwen2.5-Coder-32B`.
If the model is not publicly available, set the environment variable `HF_TOKEN` to your HuggingFace token
with read access to the model repo.

To use the model with Langroid, specify `vllm/Qwen/Qwen2.5-Coder-32B` as the `chat_model` and,
if a port other than the default 8000 was used, set `api_base` to `localhost:{port}`.

## Setup vLLM with a GGUF model from HuggingFace

`vLLM` supports running quantized models from GGUF files; however, this is currently an experimental feature.
To run a quantized `Qwen2.5-Coder-32B`, download the model from
[the repo](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF), specifically
[this file](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/qwen2.5-coder-32b-instruct-q4_0.gguf).

The model can now be run with
`vllm serve qwen2.5-coder-32b-instruct-q4_0.gguf --tokenizer Qwen/Qwen2.5-Coder-32B`
(the tokenizer of the base model rather than the quantized model should be used).

To use the model with Langroid, specify `vllm/qwen2.5-coder-32b-instruct-q4_0.gguf` as the `chat_model` and,
if a port other than the default 8000 was used, set `api_base` to `localhost:{port}`.

## "Local" LLMs hosted on Groq

In this scenario, an open-source LLM (e.g. `llama3.1-8b-instant`) is hosted on a Groq server which provides
an OpenAI-compatible API. Using this with Langroid is exactly analogous to the Ollama scenario above:
you can set the `chat_model` in the `OpenAIGPTConfig` to `groq/<model>`, e.g. `groq/llama3.1-8b-instant`.
For this to work, ensure you have a `GROQ_API_KEY` environment variable set in your `.env` file.
See [groq docs](https://console.groq.com/docs/quickstart).

## "Local" LLMs hosted on Cerebras

This works exactly like with Groq, except you set up a `CEREBRAS_API_KEY` environment variable,
and specify the `chat_model` as `cerebras/<model>`, e.g. `cerebras/llama3.1-8b`.
See the Cerebras [docs](https://inference-docs.cerebras.ai/introduction) for details on which LLMs are supported.

## Open/Proprietary LLMs via OpenRouter

OpenRouter is a **paid service** that provides an OpenAI-compatible API for practically any LLM,
open or proprietary. Using this with Langroid is similar to the `groq` scenario above:

- Ensure you have an `OPENROUTER_API_KEY` set up in your environment (or `.env` file), and
- Set the `chat_model` in the `OpenAIGPTConfig` to `openrouter/<model>`, where `<model>` is the name of the
  model on the [OpenRouter](https://openrouter.ai/) website, e.g. `qwen/qwen-2.5-7b-instruct`.

This is a good option if you want to use larger open LLMs without having to download them locally
(especially if your local machine does not have the resources to run them).
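For example, a config for one of the open models listed on OpenRouter might look like the following sketch
(the model name is illustrative; use any model name listed on the OpenRouter site):

```python
import langroid.language_models as lm

# sketch: assumes OPENROUTER_API_KEY is set in your environment or .env file
llm_config = lm.OpenAIGPTConfig(
    chat_model="openrouter/qwen/qwen-2.5-7b-instruct",
)
```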
Besides giving access to specific LLMs, OpenRouter also offers smart routing/load-balancing,
and it is a convenient way to use proprietary LLMs (e.g. Gemini, Amazon) via a single API.

## "Local" LLMs hosted on GLHF.chat

See [glhf.chat](https://glhf.chat/chat/create) for a list of available models. To run with one of these
models, set the `chat_model` in the `OpenAIGPTConfig` to `"glhf/<model_name>"`, where `<model_name>` is `hf:`
followed by the HuggingFace repo path, e.g. `Qwen/Qwen2.5-Coder-32B-Instruct`, so the full `chat_model`
would be `"glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct"`.

## DeepSeek LLMs

As of 26-Dec-2024, DeepSeek models are available via their [api](https://platform.deepseek.com).
To use them with Langroid:

- set up your `DEEPSEEK_API_KEY` environment variable in the `.env` file or as an explicit export in your shell
- set the `chat_model` in the `OpenAIGPTConfig` to `deepseek/deepseek-chat` to use the `DeepSeek-V3` model,
  or `deepseek/deepseek-reasoner` to use the full (i.e. non-distilled) `DeepSeek-R1` "reasoning" model.

The DeepSeek models are also available via OpenRouter (see the corresponding instructions in the OpenRouter
section above) or Ollama (see those instructions). E.g. you can use the DeepSeek R1 or its distilled variants
by setting `chat_model` to `openrouter/deepseek/deepseek-r1` or `ollama/deepseek-r1:8b`.

## Other non-OpenAI LLMs supported by LiteLLM

For other scenarios of running local/remote LLMs, it is possible that the `LiteLLM` library supports an
"OpenAI adaptor" for these models (see their [docs](https://litellm.vercel.app/docs/providers)).
Depending on the specific model, the `litellm` docs may say you need to specify a model in the form
`<provider>/<model>`, e.g. `palm/chat-bison`. To use the model with Langroid, simply prepend `litellm/` to
this string, e.g. `litellm/palm/chat-bison`, when you specify the `chat_model` in the `OpenAIGPTConfig`.
To use `litellm`, ensure you have the `litellm` extra installed, via `pip install langroid[litellm]` or equivalent.

## Harder: with oobabooga

Like Ollama, [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides
an OpenAI-API-compatible API server, but the setup is significantly more involved.
See their GitHub page for installation and model-download instructions.

Once you have finished the installation, you can spin up the server for an LLM using something like this:

```
python server.py --api --model mistral-7b-instruct-v0.2.Q8_0.gguf --verbose --extensions openai --nowebui
```

This will show a message saying that the OpenAI-compatible API is running at `http://127.0.0.1:5000`.
Then in your Langroid code you can specify the LLM config using `chat_model="local/127.0.0.1:5000/v1"`
(the `v1` is the API version, which is required).

As with Ollama, you can use the `-m` arg in many of the example scripts, e.g.

```
python examples/docqa/rag-local-simple.py -m local/127.0.0.1:5000/v1
```

Recommended: to ensure accurate chat formatting (and not use the defaults from ooba),
append the appropriate HuggingFace model name to the `-m` arg, separated by `//`, e.g.

```
python examples/docqa/rag-local-simple.py -m local/127.0.0.1:5000/v1//mistral-instruct-v0.2
```

(no need to include the full model name, as long as you include enough to uniquely identify the model's
chat formatting template).

## Other local LLM scenarios

There may be scenarios where the above `local/...` or `ollama/...` syntactic shorthand does not work
(e.g. when using vLLM to spin up a local LLM at an OpenAI-compatible endpoint).
For these scenarios, you will have to explicitly create an instance of `lm.OpenAIGPTConfig` and set *both*
the `chat_model` and `api_base` parameters. For example, suppose you are able to get responses from this
endpoint using something like:

```bash
curl http://192.168.0.5:5078/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Mistral-7B-Instruct-v0.2",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'
```

To use this endpoint with Langroid, you would create an `OpenAIGPTConfig` like this:

```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="Mistral-7B-Instruct-v0.2",
    api_base="http://192.168.0.5:5078/v1",
)
```

## Quick testing with local LLMs

As mentioned [here](https://langroid.github.io/langroid/tutorials/non-openai-llms/#quick-testing-with-non-openai-models),
you can run many of the [tests](https://github.com/langroid/langroid/tree/main/tests/main) in the main
langroid repo (which by default run against an OpenAI model) against a local LLM, by specifying the model as
`--m <model>`, where `<model>` follows the syntax described in the previous sections. Here's an example:

```bash
pytest tests/main/test_chat_agent.py --m ollama/mixtral
```

Of course, bear in mind that the tests may not pass due to weaknesses of the local LLM.

---

# Using Langroid with Non-OpenAI LLMs

Langroid was initially written to work with OpenAI models via their API.
This may sound limiting, but fortunately:

- Many open-source LLMs can be served via OpenAI-compatible endpoints.
  See the [Local LLM Setup](https://langroid.github.io/langroid/tutorials/local-llm-setup/) guide for details.
- There are tools like [LiteLLM](https://github.com/BerriAI/litellm/tree/main/litellm) that provide an
  OpenAI-like API for _hundreds_ of non-OpenAI LLM providers (e.g. Anthropic's Claude, Google's Gemini).
- AI gateways like [LangDB](https://langdb.ai/), [Portkey](https://portkey.ai), and
  [OpenRouter](https://openrouter.ai/) provide unified access to multiple LLM providers with additional
  features like cost control, observability, caching, and fallback strategies.

Below we show how you can use these various options with Langroid.

## Create an `OpenAIGPTConfig` object with `chat_model = "litellm/..."`

!!! note "Install `litellm` extra"
    To use `litellm` you need to install Langroid with the `litellm` extra, e.g.:
    `pip install "langroid[litellm]"`

Next, look up the instructions in the LiteLLM docs for the specific model you are interested in.
Here we take the example of Anthropic's `claude-instant-1` model.
Set up the necessary environment variables as specified in the LiteLLM docs,
e.g. for the `claude-instant-1` model, you will need to set the `ANTHROPIC_API_KEY`:

```bash
export ANTHROPIC_API_KEY=my-api-key
```

Now you are ready to create an instance of `OpenAIGPTConfig` with the `chat_model` set to
`litellm/<model_spec>`, where you should set `<model_spec>` based on the LiteLLM docs.
For example, for the `claude-instant-1` model, you would set `chat_model` to `litellm/claude-instant-1`.
But if you are using the model via a 3rd-party provider (e.g. Amazon Bedrock), you may also need to have a
`provider` part in the `<model_spec>`, e.g. `litellm/bedrock/anthropic.claude-instant-v1`.
In general, you can see which of these to use from the LiteLLM docs.
```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="litellm/claude-instant-1",
    chat_context_length=8000,  # adjust according to model
)
```

A similar process works for the `Gemini 1.5 Pro` LLM:

- get the API key [here](https://aistudio.google.com/)
- set the `GEMINI_API_KEY` environment variable in your `.env` file or shell
- set `chat_model="litellm/gemini/gemini-1.5-pro-latest"` in the `OpenAIGPTConfig` object

For other Gemini models supported by LiteLLM, see [their docs](https://litellm.vercel.app/docs/providers/gemini).

## Gemini LLMs via OpenAI client, without LiteLLM

This is now the recommended way to use Gemini LLMs with Langroid, where you don't need to use LiteLLM.
As of 11/20/2024, these models are
[available via the OpenAI client](https://developers.googleblog.com/en/gemini-is-now-accessible-from-the-openai-library/).
To use Langroid with Gemini LLMs, all you have to do is:

- set the `GEMINI_API_KEY` environment variable in your `.env` file or shell
- set `chat_model="gemini/<model>"` in the `OpenAIGPTConfig` object, where `<model>` is one of
  `gemini-1.5-flash`, `gemini-1.5-flash-8b`, or `gemini-1.5-pro`

See [here](https://ai.google.dev/gemini-api/docs/models/gemini) for details on Gemini models.
For example, you can use this `llm_config`:

```python
llm_config = lm.OpenAIGPTConfig(
    chat_model="gemini/" + lm.OpenAIChatModel.GEMINI_1_5_FLASH,
)
```

In most tests you can switch to a Gemini model using the `--m` option, e.g.:

```bash
pytest -xvs tests/main/test_llm.py --m gemini/gemini-1.5-flash
```

Many of the example scripts allow switching the model using `-m` or `--model`, e.g.:

```bash
python3 examples/basic/chat.py -m gemini/gemini-1.5-flash
```

## AI Gateways for Multiple LLM Providers

In addition to LiteLLM, Langroid integrates with AI gateways that provide unified access to multiple LLM
providers with additional enterprise features:

### LangDB

[LangDB](https://langdb.ai/) is an AI gateway offering OpenAI-compatible APIs to access 250+ LLMs with cost
control, observability, and performance benchmarking. LangDB enables seamless model switching while
providing detailed analytics and usage tracking.

To use LangDB with Langroid:

- Set up your `LANGDB_API_KEY` and `LANGDB_PROJECT_ID` environment variables
- Set `chat_model="langdb/<provider>/<model>"` in the `OpenAIGPTConfig`
  (e.g., `"langdb/anthropic/claude-3.7-sonnet"`)

For detailed setup and usage instructions, see the [LangDB integration guide](../notes/langdb.md).

### Portkey

[Portkey](https://portkey.ai) is a comprehensive AI gateway that provides access to 200+ models from various
providers through a unified API. It offers advanced features like intelligent caching, automatic retries,
fallback strategies, and comprehensive observability tools for production deployments.

To use Portkey with Langroid:

- Set up your `PORTKEY_API_KEY` environment variable (plus provider API keys like `OPENAI_API_KEY`)
- Set `chat_model="portkey/<provider>/<model>"` in the `OpenAIGPTConfig` (e.g., `"portkey/openai/gpt-4o-mini"`)

For detailed setup and usage instructions, see the [Portkey integration guide](../notes/portkey.md).
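For instance, a gateway-routed config might look like the following sketch, reusing the Portkey example
model name above (this assumes `PORTKEY_API_KEY`, and the underlying provider key such as `OPENAI_API_KEY`,
are set):

```python
import langroid.language_models as lm

# sketch: assumes PORTKEY_API_KEY (and the underlying provider's key,
# e.g. OPENAI_API_KEY) are set in your environment or .env file
llm_config = lm.OpenAIGPTConfig(
    chat_model="portkey/openai/gpt-4o-mini",
)
```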
### OpenRouter

[OpenRouter](https://openrouter.ai/) provides access to a wide variety of both open and proprietary LLMs
through a unified API. It features automatic routing and load balancing, making it particularly useful for
accessing larger open LLMs without local resources and for using multiple providers through a single interface.

To use OpenRouter with Langroid:

- Set up your `OPENROUTER_API_KEY` environment variable
- Set `chat_model="openrouter/<model>"` in the `OpenAIGPTConfig`

For more details, see the [Local LLM Setup guide](local-llm-setup.md#local-llms-available-on-openrouter).

## Working with the created `OpenAIGPTConfig` object

From here you can proceed as usual, creating instances of `OpenAIGPT`, `ChatAgentConfig`, `ChatAgent`,
and `Task`. E.g. you can create an object of class `OpenAIGPT` (which represents any LLM with an
OpenAI-compatible API) and interact with it directly:

```python
import langroid.language_models as lm
from langroid.language_models import LLMMessage, Role

llm = lm.OpenAIGPT(llm_config)

messages = [
    LLMMessage(content="You are a helpful assistant", role=Role.SYSTEM),
    LLMMessage(content="What is the capital of Ontario?", role=Role.USER),
]

response = llm.chat(messages, max_tokens=50)
```

When you interact directly with the LLM, you are responsible for maintaining the dialog history.
Also, you would often want an LLM to have access to tools/functions and external data/documents
(e.g. a vector DB or traditional DB). An Agent class simplifies managing all of this.
For example, you can create an Agent powered by the above LLM, wrap it in a Task and have it run as an
interactive chat app:

```python
import langroid as lr

agent_config = lr.ChatAgentConfig(llm=llm_config, name="my-llm-agent")
agent = lr.ChatAgent(agent_config)

task = lr.Task(agent, name="my-llm-task")
task.run()
```

## Example: Simple Chat script with a non-OpenAI proprietary model

Many of the Langroid example scripts have a convenient `-m` flag that lets you easily switch to a different
model. For example, you can run the `chat.py` script in the `examples/basic` folder with the
`litellm/claude-instant-1` model:

```bash
python3 examples/basic/chat.py -m litellm/claude-instant-1
```

## Quick testing with non-OpenAI models

There are numerous tests in the main [Langroid repo](https://github.com/langroid/langroid) that involve
LLMs, and once you set up the dev environment as described in the README of the repo, you can run any of
those tests (which run against the default GPT4 model) against local/remote models that are proxied by
`liteLLM` (or served locally via the options mentioned above, such as `oobabooga`, `ollama` or
`llama-cpp-python`), using the `--m <model-name>` option, where `<model-name>` takes one of the forms above.
Some examples of tests are:

```bash
pytest -s tests/main/test_llm.py --m local/localhost:8000
pytest -s tests/main/test_llm.py --m litellm/claude-instant-1
```

When the `--m` option is omitted, the default OpenAI GPT4 model is used.

!!! note "`chat_context_length` is not affected by `--m`"
    Be aware that the `--m` option only switches the model, but does not affect the `chat_context_length`
    parameter in the `OpenAIGPTConfig` object, which you may need to adjust for different models.
    So this option is only meant for quickly testing against different models, and not meant as a way to
    switch between models in a production environment.

---

# Chat with a PostgreSQL DB using SQLChatAgent

The [`SQLChatAgent`](../reference/agent/special/sql/sql_chat_agent.md) is designed to facilitate
interactions with an SQL database using natural language.

A ready-to-use script based on the `SQLChatAgent` is available in the `langroid-examples` repo at
[`examples/data-qa/sql-chat/sql_chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/data-qa/sql-chat/sql_chat.py)
(and also in a similar location in the main `langroid` repo).
This tutorial walks you through how you might use the `SQLChatAgent` if you were to write your own script from scratch.
We also show some of the internal workings of this Agent. The agent uses the schema context to generate SQL
queries based on a user's input. Here is a tutorial on how to set up an agent with your PostgreSQL database.
The steps for other databases are similar. Since the agent implementation relies on SQLAlchemy, it should
work with any SQL DB that SQLAlchemy supports. It offers enhanced functionality for MySQL and PostgreSQL by
automatically extracting schemas from the database.

## Before you begin

!!! note "Data Privacy Considerations"
    Since the `SQLChatAgent` uses OpenAI GPT-4 as the underlying language model, be aware that database
    information processed by the agent may be sent to OpenAI's API; make sure you are comfortable with this
    before proceeding.

1. Install PostgreSQL dev libraries for your platform, e.g.
    - `sudo apt-get install libpq-dev` on Ubuntu,
    - `brew install postgresql` on Mac, etc.
2. Follow the general [setup guide](../quick-start/setup.md) to get started with Langroid
   (mainly, install `langroid` into your virtual env, and set up suitable values in the `.env` file).
   Note that to use the SQLChatAgent with a PostgreSQL database, you need to install the `langroid[postgres]`
   extra, e.g.:
    - `pip install "langroid[postgres]"` or
    - `poetry add "langroid[postgres]"` or `uv add "langroid[postgres]"`
    - `poetry install -E postgres` or `uv pip install --extra postgres -r pyproject.toml`

   If this gives you an error, try `pip install psycopg2-binary` in your virtualenv.

## Initialize the agent

```python
from langroid.agent.special.sql.sql_chat_agent import (
    SQLChatAgent,
    SQLChatAgentConfig,
)

agent = SQLChatAgent(
    config=SQLChatAgentConfig(
        database_uri="postgresql://example.db",
    )
)
```

## Configuration

The following components of `SQLChatAgentConfig` are optional but strongly recommended for improved results:

* `context_descriptions`: A nested dictionary that specifies the schema context for the agent to use when
  generating queries, for example:

```json
{
    "table1": {
        "description": "description of table1",
        "columns": {
            "column1": "description of column1 in table1",
            "column2": "description of column2 in table1"
        }
    },
    "employees": {
        "description": "The 'employees' table contains information about the employees. It relates to the 'departments' and 'sales' tables via foreign keys.",
        "columns": {
            "id": "A unique identifier for an employee. This ID is used as a foreign key in the 'sales' table.",
            "name": "The name of the employee.",
            "department_id": "The ID of the department the employee belongs to. This is a foreign key referencing the 'id' in the 'departments' table."
        }
    }
}
```

  > By default, if no context description JSON file is provided in the config, the agent will automatically
  > generate the file using the built-in Postgres table/column comments.

* `schema_tools`: When set to `True`, activates a retrieval mode where the agent systematically requests
  only the parts of the schemas relevant to the current query. When this option is enabled, the agent
  performs the following steps:
    1. Asks for table names.
    2. Asks for table descriptions and column names from possibly relevant table names.
    3. Asks for column descriptions from possibly relevant columns.
    4. Writes the SQL query.

  Setting `schema_tools=True` is especially useful for large schemas where it is costly or impossible to
  include the entire schema in a query context.
By selectively using only the relevant parts of the context descriptions, this mode reduces token usage,
though it may result in 1-3 additional OpenAI API calls before the final SQL query is generated.

## Putting it all together

In the code below, we will allow the agent to generate the context descriptions from table comments by
excluding the `context_descriptions` config option. We set `schema_tools` to `True` to enable the retrieval mode.

```python
from langroid.agent.special.sql.sql_chat_agent import (
    SQLChatAgent,
    SQLChatAgentConfig,
)
from langroid.agent.task import Task

# Initialize SQLChatAgent with a PostgreSQL database URI and enable schema_tools
agent = SQLChatAgent(
    config=SQLChatAgentConfig(
        database_uri="postgresql://example.db",
        schema_tools=True,
    )
)

# Run the task to interact with the SQLChatAgent
task = Task(agent)
task.run()
```

By following these steps, you should now be able to set up an `SQLChatAgent` that interacts with a
PostgreSQL database, making querying a seamless experience.

In the `langroid` repo we have provided a ready-to-use script
[`sql_chat.py`](https://github.com/langroid/langroid/blob/main/examples/data-qa/sql-chat/sql_chat.py)
based on the above, that you can use right away to interact with your PostgreSQL database:

```bash
python3 examples/data-qa/sql-chat/sql_chat.py
```

This script will prompt you for the database URI, and then start the agent.

---

# Langroid Supported LLMs and Providers

Langroid supports a wide range of Language Model providers through its
[`OpenAIGPTConfig`][langroid.language_models.openai_gpt.OpenAIGPTConfig] class.

!!! note "OpenAIGPTConfig is not just for OpenAI models!"
    The `OpenAIGPTConfig` class is a generic configuration class that can be used to configure any LLM
    provider that is OpenAI API-compatible. This includes both local and remote models.

You would typically set up the `OpenAIGPTConfig` class with the `chat_model` parameter, which specifies the
model you want to use, and other parameters such as `max_output_tokens`, `temperature`, etc.
(see the [`OpenAIGPTConfig`][langroid.language_models.openai_gpt.OpenAIGPTConfig] class and its parent class
[`LLMConfig`][langroid.language_models.base.LLMConfig] for full parameter details):

```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="<model>",  # possibly includes a provider prefix
    api_key="api-key",  # optional, prefer setting in environment variables
    # ... other params such as max_tokens, temperature, etc.
)
```

Below are `chat_model` examples for each supported provider. For more details see the guides on setting up
Langroid with [local](https://langroid.github.io/langroid/tutorials/local-llm-setup/) and
[non-OpenAI LLMs](https://langroid.github.io/langroid/tutorials/non-openai-llms/).

Once you set up the `OpenAIGPTConfig`, you can then directly interact with the LLM, or set up an Agent with
this LLM, and use it by itself, or in a multi-agent setup, as shown in the
[Langroid quick tour](https://langroid.github.io/langroid/tutorials/langroid-tour/).

Although we support specifying the `api_key` directly in the config (not recommended for security reasons),
more typically you would set the `api_key` in your environment variables.
Below is a table showing, for each provider, an example `chat_model` setting and which environment variable
to set for the API key.
| Provider | `chat_model` Example | API Key Environment Variable | |---------------|----------------------------------------------------------|----------------------------| | OpenAI | `gpt-4o` | `OPENAI_API_KEY` | | Groq | `groq/llama3.3-70b-versatile` | `GROQ_API_KEY` | | Cerebras | `cerebras/llama-3.3-70b` | `CEREBRAS_API_KEY` | | Gemini | `gemini/gemini-2.0-flash` | `GEMINI_API_KEY` | | DeepSeek | `deepseek/deepseek-reasoner` | `DEEPSEEK_API_KEY` | | GLHF | `glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct` | `GLHF_API_KEY` | | OpenRouter | `openrouter/deepseek/deepseek-r1-distill-llama-70b:free` | `OPENROUTER_API_KEY` | | Ollama | `ollama/qwen2.5` | `OLLAMA_API_KEY` (usually `ollama`) | | VLLM | `vllm/mistral-7b-instruct` | `VLLM_API_KEY` | | LlamaCPP | `llamacpp/localhost:8080` | `LLAMA_API_KEY` | | Generic Local | `local/localhost:8000/v1` | No specific env var required | | LiteLLM | `litellm/anthropic/claude-3-7-sonnet` | Depends on provider | | | `litellm/mistral-small` | Depends on provider | | HF Template | `local/localhost:8000/v1//mistral-instruct-v0.2` | Depends on provider | | | `litellm/ollama/mistral//hf` | | ## HuggingFace Chat Template Formatting For models requiring specific prompt formatting: ```python import langroid.language_models as lm # Specify formatter directly llm_config = lm.OpenAIGPTConfig( chat_model="local/localhost:8000/v1//mistral-instruct-v0.2", formatter="mistral-instruct-v0.2" ) # Using HF formatter auto-detection llm_config = lm.OpenAIGPTConfig( chat_model="litellm/ollama/mistral//hf", ) ```
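Whichever provider you choose, the resulting `llm_config` is used the same way downstream.
For example, here is a minimal sketch that wraps one of the configs above in an agent and gets a single
response (the model name is illustrative; substitute any entry from the table):

```python
import langroid as lr
import langroid.language_models as lm

# sketch: any provider/chat_model combination from the table above can be used here,
# as long as the corresponding API key environment variable is set
llm_config = lm.OpenAIGPTConfig(chat_model="gemini/gemini-2.0-flash")

agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm_config))
response = agent.llm_response("What is the capital of France?")
print(response.content)
```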