# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_api_server/home.html

# GPT4All API Server

GPT4All provides a local API server that allows you to run LLMs over an HTTP API.

## Key Features

* **Local Execution**: Run models on your own hardware for privacy and offline use.
* **LocalDocs Integration**: Run the API with relevant text snippets provided to your LLM from a [LocalDocs collection](../gpt4all_desktop/localdocs.html).
* **OpenAI API Compatibility**: Use existing OpenAI-compatible clients and tools with your local models.

## Activating the API Server

1. Open the GPT4All Chat Desktop Application.
2. Go to `Settings` > `Application` and scroll down to `Advanced`.
3. Check the box for the `"Enable Local API Server"` setting.
4. The server listens on port 4891 by default. You can choose another port number in the `"API Server Port"` setting.

## Connecting to the API Server

The base URL for the API server is `http://localhost:4891/v1` (or `http://localhost:<port>/v1` if you are using a different port number). The server only accepts HTTP connections (not HTTPS) and only listens on the IPv4 localhost address `127.0.0.1`, not the IPv6 localhost address `::1`.
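
Because the server is OpenAI-compatible, any OpenAI-style client can point at it. Below is a minimal sketch using the `openai` Python package; it assumes the server is enabled on the default port, that the local server does not enforce an API key, and that "Phi-3 Mini Instruct" has already been downloaded in the desktop app.

```python
# Minimal sketch: talk to the GPT4All local API server with the openai package.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4891/v1",  # local GPT4All API server
    api_key="not-needed",                 # placeholder; assumed not to be checked locally
)

response = client.chat.completions.create(
    model="Phi-3 Mini Instruct",
    messages=[{"role": "user", "content": "Who is Lionel Messi?"}],
    max_tokens=50,
    temperature=0.28,
)
print(response.choices[0].message.content)
```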
## Examples

Example GPT4All API calls:

cURL:

```bash
curl -X POST http://localhost:4891/v1/chat/completions -d '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role": "user", "content": "Who is Lionel Messi?"}],
  "max_tokens": 50,
  "temperature": 0.28
}'
```

PowerShell:

```powershell
Invoke-WebRequest -URI http://localhost:4891/v1/chat/completions -Method POST -ContentType application/json -Body '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role": "user", "content": "Who is Lionel Messi?"}],
  "max_tokens": 50,
  "temperature": 0.28
}'
```

## API Endpoints

Method | Path | Description
---|---|---
GET | `/v1/models` | List available models
GET | `/v1/models/<name>` | Get details of a specific model
POST | `/v1/completions` | Generate text completions
POST | `/v1/chat/completions` | Generate chat completions

## LocalDocs Integration

You can use LocalDocs with the API server:

1. Open the Chats view in the GPT4All application.
2. Scroll to the bottom of the chat history sidebar.
3. Select the server chat (it has a different background color).
4. Activate LocalDocs collections in the right sidebar. (Note: LocalDocs can currently only be activated through the GPT4All UI, not via the API itself.)

Now, your API calls to your local LLM will have relevant references from your LocalDocs collection retrieved and placed in the input message for the LLM to respond to.

The references retrieved for your API call can be accessed in the API response object at `response["choices"][0]["references"]`.

The data included in the `references` are:

* `text`: the actual text content from the snippet that was extracted from the reference document
* `author`: the author of the reference document (if available)
* `date`: the date of creation of the reference document (if available)
* `page`: the page number the snippet is from (only available for PDF documents for now)
* `title`: the title of the reference document (if available)
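
As an illustration, here is a minimal sketch that reads those references from a chat completion response using the `requests` package; it assumes the server is running on the default port and that a LocalDocs collection has been activated for the server chat in the UI.

```python
# Minimal sketch: inspect LocalDocs references returned by the local API server.
import requests

resp = requests.post(
    "http://localhost:4891/v1/chat/completions",
    json={
        "model": "Phi-3 Mini Instruct",
        "messages": [{"role": "user", "content": "Summarize my notes on budgeting."}],
        "max_tokens": 100,
    },
).json()

print(resp["choices"][0]["message"]["content"])
for ref in resp["choices"][0].get("references", []):
    # Each reference carries the fields listed above (text, author, date, page, title).
    print(ref.get("title"), ref.get("page"), ref.get("text", "")[:80])
```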
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/chat_templates.html

# Chat Templates

## What are chat templates?

Natively, large language models only know how to complete plain text and do not know the difference between their input and their output. In order to support a chat with a person, LLMs are designed to use a template to convert the conversation to plain text using a specific format.

For a given model, it is important to use an appropriate chat template, as each model is designed to work best with a specific format. The chat templates included with the built-in models should be sufficient for most purposes.

There are two reasons you would want to alter the chat template:

* You are sideloading a model and there is no chat template available, or
* You would like to have greater control over the input to the LLM than a system message provides.

## What is a system message?

A system message is a message that controls the responses from the LLM in a way that affects the entire conversation. System messages can be short, such as "Speak like a pirate.", or they can be long and contain a lot of context for the LLM to keep in mind.

Not all models are designed to use a system message, so system messages work better with some models than others.

## How do I customize the chat template or system message?

To customize the chat template or system message, go to Settings > Model. Make sure to select the correct model at the top. If you clone a model, you can give it a chat template or system message that differs from the base model's, enabling you to use different settings for each conversation.

These settings take effect immediately. After changing them, you can click "Redo last response" in the chat view, and the response will take the new settings into account.

## Do I need to write a chat template?

You typically do not need to write your own chat template. The exception is models that are not in the official model list and do not come with a chat template built-in.
These will show a "Clear" option above the chat template field on the Model Settings page instead of a "Reset" option. See the section on finding or creating a chat template.

## What changed in GPT4All v3.5?

GPT4All v3.5 overhauled the chat template system. There are three crucial differences:

* The chat template now formats an entire conversation instead of a single pair of messages,
* The chat template now uses Jinja syntax instead of `%1` and `%2` placeholders,
* And the system message should no longer contain control tokens or trailing whitespace.

If you are using any chat templates or system messages that had been added or altered from the default before upgrading to GPT4All v3.5 or newer, these will no longer work. See below for how to solve common errors you may see after upgrading.

## Error/Warning: System message is not plain text.

This is easy to fix. Go to the model's settings and look at the system prompt. There are three things to look for:

* Control tokens such as `<|im_start|>`, `<|start_header_id|>`, or `<|system|>`
* A prefix such as `### System` or `SYSTEM:`
* Trailing whitespace, such as a space character or blank line.

If you see any of these things, remove them. For example, this legacy system prompt:

```
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|>
```

Should become this:

```
You are a helpful assistant.
```

If you do not see anything that needs to be changed, you can dismiss the error by making a minor modification to the message and then changing it back.

If you see a warning, your system message does not appear to be plain text. If you believe this warning is incorrect, it can be safely ignored. If in doubt, ask on the [Discord](https://discord.gg/mGZE39AS3e).

## Error: Legacy system prompt needs to be updated in Settings.

This is the same as above, but appears on the chat page.

## Error/Warning: Chat template is not in Jinja format.

This is the result of attempting to use an old-style template (possibly from a previous version) in GPT4All 3.5+.

Go to the Model Settings page and select the affected model. If you see a "Reset" button, and you have not intentionally modified the prompt template, you can click "Reset". Otherwise, this is what you can do:

1. Back up your chat template by copying it safely to a text file and saving it. In the next step, it will be removed from GPT4All.
2. Click "Reset" or "Clear".
3. If you clicked "Clear", the chat template is now gone. Follow the steps to find or create a basic chat template for your model.
4. Customize the chat template to suit your needs. For help, read the section about creating a chat template.

## Error: Legacy prompt template needs to be updated in Settings.

This is the same as above, but appears on the chat page.

## The chat template has a syntax error.

If there is a syntax error while editing the chat template, the details will be displayed in an error message above the input box. This could be because the chat template is not actually in Jinja format (see above). Otherwise, you have either typed something incorrectly, or the model comes with a template that is incompatible with GPT4All. See the below section on creating chat templates and make sure that everything is correct. When in doubt, ask on the [Discord](https://discord.gg/mGZE39AS3e).

## Error: No chat template configured.

This may appear for models that are not from the official model list and do not include a chat template. Older versions of GPT4All picked a poor default in this case.
You will get much better results if you follow the steps to find or create a chat template for your model.

## Error: The chat template cannot be blank.

If the button above the chat template on the Model Settings page says "Clear", see above. If you see "Reset", click that button to restore a reasonable default. Also see the section on syntax errors.

## How do I find a chat template?

When in doubt, you can always ask the [Discord](https://discord.gg/mGZE39AS3e) community for help. Below are the instructions to find one on your own.

The authoritative source for a model's chat template is the HuggingFace repo that the original (non-GGUF) model came from. First, you should find this page. If you just have a model file, you can try a Google search for the model's name. If you know the page you downloaded the GGUF model from, its README usually links to the original non-GGUF model.

Once you have located the original model, there are two methods you can use to extract its chat template. Pick whichever one you are most comfortable with.

### Using the CLI (all models)

1. Install `jq` using your preferred package manager - e.g. Chocolatey (Windows), Homebrew (macOS), or apt (Ubuntu).
2. Download `tokenizer_config.json` from the model's "Files and versions" tab.
3. Open a command prompt in the directory to which you downloaded the file.
4. Run `jq -r ".chat_template" tokenizer_config.json`. This shows the chat template in a human-readable form. You can copy this and paste it into the settings page.
5. (Optional) You can save the output to a text file like this: `jq -r ".chat_template" tokenizer_config.json >chat_template.txt`

If the output is "null", the model does not provide a chat template. See the below instructions on creating a chat template.

### Python (open models)

1. Install `transformers` using your preferred Python package manager, e.g. `pip install transformers`. Make sure it is at least version v4.43.0.
2. Copy the ID of the HuggingFace model, using the clipboard icon next to the name. For example, if the URL is `https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B`, the ID is `NousResearch/Hermes-2-Pro-Llama-3-8B`.
3. Open a Python interpreter (`python`) and run the following commands. Change the model ID in the example to the one you copied.

    ```python
    >>> from transformers import AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained('NousResearch/Hermes-2-Pro-Llama-3-8B')
    >>> print(tokenizer.get_chat_template())
    ```

    You can copy the output and paste it into the settings page.

4. (Optional) You can save the output to a text file like this:

    ```python
    >>> open('chat_template.txt', 'w').write(tokenizer.get_chat_template())
    ```

If you get a ValueError exception, this model does not provide a chat template. See the below instructions on creating a chat template.

### Python (gated models)

Some models, such as Llama and Mistral, do not allow public access to their chat template. You must either use the CLI method above, or follow these instructions to use Python:

1. For these steps, you must have git and git-lfs installed.
2. You must have a HuggingFace account and be logged in.
3. You must already have access to the gated model. Otherwise, request access.
4. You must have an SSH key configured for git access to HuggingFace.
5. `git clone` the model's HuggingFace repo using the SSH clone URL. There is no need to download the entire model, which is very large.
    A good way to do this on Linux is:

    ```console
    $ GIT_LFS_SKIP_SMUDGE=1 git clone hf.co:meta-llama/Llama-3.1-8B-Instruct.git
    $ cd Llama-3.1-8B-Instruct
    $ git lfs pull -I "tokenizer.*"
    ```

6. Follow the above instructions for open models, but replace the model ID with the path to the directory containing `tokenizer_config.json`:

    ```python
    >>> tokenizer = AutoTokenizer.from_pretrained('.')
    ```

## Advanced: How do chat templates work?

The chat template is applied to the entire conversation you see in the chat window. The template loops over the list of messages, each containing `role` and `content` fields. `role` is either `user`, `assistant`, or `system`. GPT4All also supports the special variables `bos_token`, `eos_token`, and `add_generation_prompt`. See the [HuggingFace docs](https://huggingface.co/docs/transformers/v4.46.3/en/chat_templating#special-variables) for what those do.
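
To make this concrete, here is an illustrative sketch (using Python's `jinja2` package, not GPT4All's own template engine) of how a ChatML-style chat template turns the messages list into the plain text the model actually sees. The template and control tokens here are only an example, not the template of any particular GPT4All model.

```python
# Illustrative sketch: render a Jinja chat template over a conversation.
from jinja2 import Template

chat_template = (
    "{%- for message in messages %}"
    "{{- '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>\\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}"
    "{{- '<|im_start|>assistant\\n' }}"
    "{%- endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
]

# add_generation_prompt appends the opening tokens of the assistant's reply.
print(Template(chat_template).render(messages=messages, add_generation_prompt=True))
```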
## Advanced: How do I make a chat template?

The best way to create a chat template is to start by using an existing one as a reference. Then, modify it to use the format documented for the given model. Its README page may explicitly give an example of its template, or it may mention the name of a well-known standard template, such as ChatML, Alpaca, or Vicuna. GPT4All does not yet include presets for these templates, so they will have to be found in other models or taken from the community.

For more information, see the very helpful [HuggingFace guide](https://huggingface.co/docs/transformers/v4.46.3/en/chat_templating#advanced-template-writing-tips). Some of this is not applicable, such as the information about tool calling and RAG - GPT4All implements those features differently.

Some models use a prompt template that does not intuitively map to a multi-turn chat, because it is intended more for single instructions. The [FastChat](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) implementation of these templates is a useful reference for the correct way to extend them to multiple messages.

## Advanced: What are GPT4All v1 templates?

GPT4All supports its own template syntax, which is nonstandard but provides complete control over the way LocalDocs sources and file attachments are inserted into the conversation. These templates begin with `{# gpt4all v1 #}` and look similar to the example below.

For standard templates, GPT4All combines the user message, sources, and attachments into the `content` field. For GPT4All v1 templates, this is not done, so they must be used directly in the template for those features to work correctly.

```jinja
{# gpt4all v1 #}
{%- for message in messages %}
    {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }}
    {%- if message['role'] == 'user' %}
        {%- for source in message['sources'] %}
            {%- if loop.first %}
                {{- '### Context:\n' }}
            {%- endif %}
            {{- 'Collection: ' + source['collection'] + '\n' + 'Path: ' + source['path'] + '\n' + 'Excerpt: ' + source['text'] + '\n\n' }}
        {%- endfor %}
    {%- endif %}
    {%- for attachment in message['prompt_attachments'] %}
        {{- attachment['processed_content'] + '\n\n' }}
    {%- endfor %}
    {{- message['content'] | trim }}
    {{- '<|eot_id|>' }}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
```

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/chats.html

# Chats

Chats are conversations with [language models](models.html) that run locally on your device.

Example chats with `Llama 3` and `Nous Hermes 2 Mistral DPO` are shown below.

Download `Llama 3` and prompt:

```
explain why the sky is blue in a way that is correct and makes sense to a child
```

With the default sampling settings, you should see text resembling the following:

What a great question! So, you know how we can see different colors like red, yellow, green, and orange? Well, when sunlight enters Earth's atmosphere, it starts to interact with tiny particles called molecules of gases like nitrogen (N2) and oxygen (O2). These molecules are really small, but they're all around us!

Now, here's the cool part: these molecules scatter, or bounce off, shorter wavelengths of light more than longer wavelengths. Think of it like a game of pool where the cue ball hits other balls with different sizes and speeds.

The shortest wavelength of visible light is violet (about 400 nanometers), followed by blue (around 450-495 nanometers). These shorter wavelengths are scattered in all directions, kind of like how a basketball player might bounce the ball around on the court. This scattering effect makes the sky appear blue to our eyes!
Here's why it doesn't look as bright or intense as a blue crayon: there are many more molecules in the atmosphere that scatter longer wavelengths (like red and orange) than shorter ones, so those colors get spread out and become less noticeable.

To sum it up:

1. Sunlight enters Earth's atmosphere.
2. Tiny gas molecule particles interact with light, scattering shorter wavelengths like blue and violet.
3. This scattered blue light reaches our eyes from all directions, making the sky appear blue!

Now, go outside and enjoy that beautiful blue sky!

Download `Nous Hermes 2 Mistral DPO` and prompt:

```
write me a react app i can run from the command line to play a quick game
```

With the default sampling settings, you should see text and code blocks resembling the following:

I'll provide you with a simple React app that allows you to play a guessing game. This app will generate a random number between 1 and 10, and ask you to guess it. You can run this app locally using Create React App. Follow the steps below:

Open your terminal or command line interface. Navigate to the directory where you want to create the project (e.g., `cd Documents/Projects`). Run the following commands:

```
npx create-react-app guessing-game
cd guessing-game
```

Open the project in your preferred code editor by running `code .` (for VS Code) or any other command to open the folder with your text editor.

Replace the content of the `src/App.js` file with the following code:

```jsx
import React, { useState } from 'react';

function App() {
  const [guess, setGuess] = useState('');
  const [message, setMessage] = useState('');
  const [numberToGuess, setNumberToGuess] = useState(Math.floor(Math.random() * 10) + 1);

  function handleChange(e) {
    setGuess(e.target.value);
  }

  function handleSubmit(e) {
    e.preventDefault();
    if (parseInt(guess) === numberToGuess) {
      setMessage('You guessed it right!');
    } else {
      setMessage(`Wrong guess, the correct number was ${numberToGuess}. Try again next time.`);
    }
  }

  return (
    <div>
      <h1>Guess a Number</h1>
      <form onSubmit={handleSubmit}>
        <input type="text" value={guess} onChange={handleChange} />
        <button type="submit">Submit</button>
      </form>
      <p>{message}</p>
    </div>
  );
}

export default App;
```

Save the file and go back to your terminal or command line interface. Run `npm start`. This will start a local development server, and you can access the app in your browser at http://localhost:3000.

Play the guessing game by entering a number between 1 and 10 into the input field and clicking "Submit". The app will tell you if your guess is correct or not. Remember that this is just a simple example, and you can expand upon it to make the game more interesting with additional features like high scores, multiple difficulty levels, etc.

## New Chat

Choose a model with the dropdown at the top of the Chats page. If you don't have any models, [download one](models.html#download-models). Once you have models, you can start chats by loading your default model, which you can configure in [settings](settings.html#application-settings).

![Choose a model](../assets/three_model_options.png)

## LocalDocs

Open the [LocalDocs](localdocs.html) panel with the button in the top-right corner to bring your files into the chat. With LocalDocs, your chats are enhanced with semantically related snippets from your files included in the model's context.

![Open LocalDocs](../assets/open_local_docs.png)

## Chat History

View your chat history with the button in the top-left corner of the Chats page.

![Close chats](../assets/closed_chat_panel.png) | ![Open chats](../assets/open_chat_panel.png)
---|---

You can change a chat name or delete it from your chat history at any time.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-Obsidian.html

# Using GPT4All to Privately Chat with your Obsidian Vault

Obsidian for Desktop is powerful note-taking and knowledge-management software for creating and organizing markdown notes. This tutorial shows you how to sync and access your Obsidian note files directly on your computer. By connecting them to LocalDocs, you can integrate these files into your LLM chats for private access and enhanced context.

## Download Obsidian for Desktop
1. **Download Obsidian for Desktop**:
    * Visit the [Obsidian website](https://obsidian.md) and create an account.
    * Click the Download button in the center of the homepage.
    * For more help with installing Obsidian, see [Getting Started with Obsidian](https://help.obsidian.md/Getting+started/Download+and+install+Obsidian).
2. **Set Up Obsidian**:
    * Launch Obsidian from your Applications folder (macOS), Start menu (Windows), or equivalent location (Linux).
    * On the welcome screen, you can either create a new vault (a collection of notes) or open an existing one.
    * To create a new vault, click Create a new vault, name your vault, choose a location on your computer, and click Create.
3. **Sign in and Sync**:
    * Once installed, you can start adding and organizing notes.
    * Choose the folders you want to sync to your computer.

## Connect Obsidian to LocalDocs

1. **Open LocalDocs**:
    * Navigate to the LocalDocs feature within GPT4All.

    ![LocalDocs interface](https://github.com/nomic-ai/gpt4all/assets/132290469/d8fb2d79-2063-45d4-bcce-7299fb75b144)

2. **Add Collection**:
    * Click on **+ Add Collection** to begin linking your Obsidian Vault.
    * Name your collection.

    ![Adding a collection](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_adding_collection.png)

3. **Create Collection**:
    * Click **Create Collection** to initiate the embedding process. Progress will be displayed within the LocalDocs interface.
4. **Access Files in Chats**:
    * Load a model to chat with your files (Llama 3 Instruct is the fastest).
    * In your chat, open 'LocalDocs' with the button in the top-right corner to provide context from your synced Obsidian notes.

    ![Accessing LocalDocs in chats](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_docs.png)

5. **Interact With Your Notes**:
    * Use the model to interact with your files.

    ![Obsidian user interaction](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/osbsidian_user_interaction.png)

    ![Obsidian GPT4All response](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_response.png)

6. **View Referenced Files**:
    * Click on **Sources** below LLM responses to see which Obsidian notes were referenced.

    ![Referenced files](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_sources.png)

## How It Works

Obsidian for Desktop syncs your Obsidian notes to your computer, while LocalDocs integrates these files into your LLM chats using embedding models. These models find semantically similar snippets from your files to enhance the context of your interactions. To learn more about embedding models and explore further, refer to the [Nomic Python SDK documentation](https://docs.nomic.ai/atlas/capabilities/embeddings).
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-One-Drive.html

# Using GPT4All to Privately Chat with your OneDrive Data

Local and Private AI Chat with your OneDrive Data

OneDrive for Desktop allows you to sync and access your OneDrive files directly on your computer. By connecting your synced directory to LocalDocs, you can start using GPT4All to privately chat with data stored in your OneDrive.

## Download OneDrive for Desktop

1. **Download OneDrive for Desktop**:
    * Visit [Microsoft OneDrive](https://www.microsoft.com/en-us/microsoft-365/onedrive/download).
    * Press 'Download' for your respective device type.
    * Download the OneDrive for Desktop application.
2. **Install OneDrive for Desktop**:
    * Run the installer file you downloaded.
    * Follow the prompts to complete the installation process.
3. **Sign in and Sync**:
    * Once installed, sign in to OneDrive for Desktop with your Microsoft account credentials.
    * Choose the folders you want to sync to your computer.

## Connect OneDrive to LocalDocs

1. **Install GPT4All and Open LocalDocs**:
    * Go to [nomic.ai/gpt4all](https://nomic.ai/gpt4all) to install GPT4All for your operating system.
    * Navigate to the LocalDocs feature within GPT4All to configure it to use your synced OneDrive directory.

    ![LocalDocs interface](https://github.com/nomic-ai/gpt4all/assets/132290469/54254bc0-d9a0-40c4-9fd1-5059abaad583)

2. **Add Collection**:
    * Click on **+ Add Collection** to begin linking your OneDrive folders.
    * Name the Collection and specify the OneDrive folder path.

    ![Add Collection](https://github.com/nomic-ai/gpt4all/assets/132290469/7f12969a-753a-4757-bb9e-9b607cf315ca)

3. **Create Collection**:
    * Click **Create Collection** to initiate the embedding process. Progress will be displayed within the LocalDocs interface.
4. **Access Files in Chats**:
    * Load a model within GPT4All to chat with your files.
    * In your chat, open 'LocalDocs' using the button in the top-right corner to provide context from your synced OneDrive files.

    ![LocalDocs in a chat](https://github.com/nomic-ai/gpt4all/assets/132290469/b5a67fe6-0d6a-42ae-b3b8-cc0f91cbf5b1)

5. **Interact With Your OneDrive**:
    * Use the model to interact with your files directly from OneDrive.

    ![OneDrive user interaction](https://github.com/nomic-ai/gpt4all/assets/132290469/2c9815b8-3d1c-4179-bf76-3ddbafb193bf)

    ![OneDrive GPT4All response](https://github.com/nomic-ai/gpt4all/assets/132290469/ce8be292-b025-415a-bd54-f11868e0cd0a)

6. **View Referenced Files**:
    * Click on **Sources** below responses to see which OneDrive files were referenced.

    ![Referenced files](https://github.com/nomic-ai/gpt4all/assets/132290469/6fe3f10d-2791-4153-88a7-2198ab3ac945)

## How It Works

OneDrive for Desktop syncs your OneDrive files to your computer, while LocalDocs maintains a database of these synced files for use by your local GPT4All model. As your OneDrive updates, LocalDocs will automatically detect file changes and stay up to date. LocalDocs leverages [Nomic Embedding](https://docs.nomic.ai/atlas/capabilities/embeddings) models to find semantically similar snippets from your files, enhancing the context of your interactions.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-google-drive.html

# Using GPT4All to Privately Chat with your Google Drive Data

Local and Private AI Chat with your Google Drive Data

Google Drive for Desktop allows you to sync and access your Google Drive files directly on your computer. By connecting your synced directory to LocalDocs, you can start using GPT4All to privately chat with data stored in your Google Drive.

## Download Google Drive for Desktop

1. **Download Google Drive for Desktop**:
    * Visit [drive.google.com](https://drive.google.com) and sign in with your Google account.
    * Navigate to the **Settings** (gear icon) and select **Settings** from the dropdown menu.
    * Scroll down to **Google Drive for desktop** and click **Download**.
2. **Install Google Drive for Desktop**:
    * Run the installer file you downloaded.
    * Follow the prompts to complete the installation process.
3. **Sign in and Sync**:
    * Once installed, sign in to Google Drive for Desktop with your Google account credentials.
    * Choose the folders you want to sync to your computer.

For advanced help, see [Setting up Google Drive for Desktop](https://support.google.com/drive/answer/10838124?hl=en).

## Connect Google Drive to LocalDocs

1. **Install GPT4All and Open LocalDocs**:
    * Go to [nomic.ai/gpt4all](https://nomic.ai/gpt4all) to install GPT4All for your operating system.
    * Navigate to the LocalDocs feature within GPT4All to configure it to use your synced directory.

    ![LocalDocs interface](https://github.com/nomic-ai/gpt4all/assets/132290469/d8fb2d79-2063-45d4-bcce-7299fb75b144)

2. **Add Collection**:
    * Click on **+ Add Collection** to begin linking your Google Drive folders.
    * Name your collection.

    ![Add Collection](https://github.com/nomic-ai/gpt4all/assets/132290469/39063615-9eb6-4c47-bde7-c9f04f9b168b)

3. **Create Collection**:
    * Click **Create Collection** to initiate the embedding process. Progress will be displayed within the LocalDocs interface.
4. **Access Files in Chats**:
    * Load a model to chat with your files (Llama 3 Instruct performs best).
    * In your chat, open 'LocalDocs' with the button in the top-right corner to provide context from your synced Google Drive files.

    ![LocalDocs in a chat](https://github.com/nomic-ai/gpt4all/assets/132290469/ce68811f-9abd-451b-ac0a-fb941e185d7a)

5. **Interact With Your Drive**:
    * Use the model to interact with your files.

    ![Google Drive user interaction](https://github.com/nomic-ai/gpt4all/assets/132290469/bc55bc36-e613-419d-a568-adb1cd993854)

    ![Google Drive GPT4All response](https://github.com/nomic-ai/gpt4all/assets/132290469/1c0fd19a-5a22-4726-a841-d26c1bea81fc)

6. **View Referenced Files**:
    * Click on **Sources** below LLM responses to see which Google Drive files were referenced.

    ![Referenced files](https://github.com/nomic-ai/gpt4all/assets/132290469/78527d30-8d24-4b4c-8311-b611a2d66fcd)

## How It Works

Google Drive for Desktop syncs your Google Drive files to your computer, while LocalDocs maintains a database of these synced files for use by your local LLM. As your Google Drive updates, LocalDocs will automatically detect file changes and stay up to date. LocalDocs is powered by [Nomic Embedding](https://docs.nomic.ai/atlas/capabilities/embeddings) models, which find semantically similar snippets from your files to enhance the context of your interactions.
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-microsoft-excel.html

# Using GPT4All to Privately Chat with your Microsoft Excel Spreadsheets

Local and Private AI Chat with your Microsoft Excel Spreadsheets

Microsoft Excel allows you to create, manage, and analyze data in spreadsheet format. By attaching your spreadsheets directly to GPT4All, you can privately chat with the AI to query and explore the data, enabling you to summarize, generate reports, and glean insights from your files, all within your conversation.

## Attach Microsoft Excel to your GPT4All Conversation

1. **Install GPT4All and Open**:
    * Go to [nomic.ai/gpt4all](https://nomic.ai/gpt4all) to install GPT4All for your operating system.
    * Navigate to the Chats view within GPT4All.

    ![Chat view](../../assets/chat_window.png)

2. **Example Spreadsheet**:

    ![Spreadsheet view](../../assets/disney_spreadsheet.png)

3. **Attach to GPT4All conversation**:

    ![Attach view](../../assets/attach_spreadsheet.png)

4. **Have GPT4All Summarize and Generate a Report**:

    ![Attach view](../../assets/spreadsheet_chat.png)

## How It Works

GPT4All parses your attached Excel spreadsheet into Markdown, a format understandable to LLMs, and adds the Markdown text to the context for your LLM chat. You can view the code that converts `.xlsx` to Markdown [here](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/src/xlsxtomd.cpp) in the GPT4All GitHub repo.

For example, the above spreadsheet titled `disney_income_stmt.xlsx` would be formatted the following way:

```
## disney_income_stmt

|Walt Disney Co.|||||||
|---|---|---|---|---|---|---|
|Consolidated Income Statement|||||||
||||||||
|US$ in millions|||||||
|12 months ended:|2023-09-30 00:00:00|2022-10-01 00:00:00|2021-10-02 00:00:00|2020-10-03 00:00:00|2019-09-28 00:00:00|2018-09-29 00:00:00|
|Services|79562|74200|61768|59265|60542|50869|
...
```
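
The conversion itself happens inside the desktop app (in C++), but the idea is easy to reproduce. Below is a rough Python sketch of the same transformation using `pandas` (with `openpyxl` and `tabulate` installed); the filename matches the example above and is otherwise an assumption.

```python
# Rough sketch (not GPT4All's actual converter): render a spreadsheet as a
# Markdown table so an LLM can read it as plain text.
import pandas as pd

df = pd.read_excel("disney_income_stmt.xlsx", sheet_name=0)  # first worksheet
markdown = df.to_markdown(index=False)  # pipe-delimited table like the output above
print(markdown)
```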
## Limitations

It is important to double-check the claims LLMs make about the spreadsheets you provide. LLMs can make mistakes about the data they are presented with, particularly for LLMs with smaller parameter counts (~8B) that fit within the memory of consumer hardware.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/localdocs.html

# LocalDocs

LocalDocs brings the information you have from files on-device into your LLM chats - **privately**.

## Create LocalDocs

1. Click `+ Add Collection`.
2. Name your collection and link it to a folder.

    ![new GOT Docs](../assets/new_docs_annotated.png) | ![new GOT Docs filled out](../assets/new_docs_annotated_filled.png)
    ---|---

3. Click `Create Collection`. Progress for the collection is displayed on the LocalDocs page.

    ![Embedding in progress](../assets/baelor.png)

    You will see a green `Ready` indicator when the entire collection is ready. Note: you can still chat with the files that are ready before the entire collection is ready.

    ![Embedding complete](../assets/got_done.png)

    Later on, if you modify your LocalDocs settings, you can rebuild your collections with your new settings.

4. In your chats, open `LocalDocs` with the button in the top-right corner to give your LLM context from those files.

    ![LocalDocs result](../assets/syrio_snippets.png)

5. See which files were referenced by clicking `Sources` below the LLM responses.

    ![Sources](../assets/open_sources.png)

## How It Works

A LocalDocs collection uses Nomic AI's free and fast on-device embedding models to index your folder into text snippets that each get an **embedding vector**. These vectors allow us to find snippets from your files that are semantically similar to the questions and prompts you enter in your chats. We then include those semantically similar snippets in the prompt to the LLM.
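
As a rough illustration of that idea, here is a minimal sketch using the GPT4All Python SDK's `Embed4All` class together with `numpy` (both assumed to be installed): embed a question and a few snippets, then rank the snippets by cosine similarity.

```python
# Minimal sketch: semantic similarity between a question and text snippets.
import numpy as np
from gpt4all import Embed4All

embedder = Embed4All()  # downloads a local embedding model on first use

snippets = [
    "Winter is coming to the North.",
    "The quarterly budget increased by 12%.",
]
question = "What happened to the budget?"

snippet_vecs = np.array([embedder.embed(s) for s in snippets])
question_vec = np.array(embedder.embed(question))

# Cosine similarity between the question and each snippet.
scores = snippet_vecs @ question_vec / (
    np.linalg.norm(snippet_vecs, axis=1) * np.linalg.norm(question_vec)
)
print(snippets[int(scores.argmax())])  # the most semantically similar snippet
```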
To try the embedding models yourself, we recommend using the [Nomic Python SDK](https://docs.nomic.ai/atlas/capabilities/embeddings).

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/models.html

# Models

GPT4All is optimized to run LLMs in the 3-13B parameter range on consumer-grade hardware. LLMs are downloaded to your device so you can run them locally and privately. With our backend, anyone can interact with LLMs efficiently and securely on their own hardware.

## Download Models

1. Click `Models` in the menu on the left (below `Chats` and above `LocalDocs`). ![Models Page Icon](../assets/models_page_icon.png)
2. Click `+ Add Model` to navigate to the `Explore Models` page. ![Add Model button](../assets/add.png)
3. Search for models available online. ![Explore Models search](../assets/explore.png)
4. Hit `Download` to save a model to your device. ![Download Models button](../assets/download.png)
5. Once the model is downloaded you will see it in `Models`. ![Installed models](../assets/installed_models.png)

## Explore Models

GPT4All connects you with LLMs from HuggingFace with a [`llama.cpp`](https://github.com/ggerganov/llama.cpp) backend so that they will run efficiently on your hardware. Many of these models can be identified by the file type `.gguf`.

![Explore models](../assets/search_mistral.png)

## Example Models

Many LLMs are available at various sizes, quantizations, and licenses.

* LLMs with more parameters tend to be better at coherently responding to instructions
* LLMs with a smaller quantization (e.g. 4-bit instead of 16-bit) are much faster and less memory intensive, and tend to have slightly worse performance
* Licenses vary in their terms for personal and commercial use

Here are a few examples:

Model | Filesize | RAM Required | Parameters | Quantization | Developer | License | MD5 Sum (Unique Hash)
---|---|---|---|---|---|---|---
Llama 3 Instruct | 4.66 GB | 8 GB | 8 billion | q4_0 | Meta | [Llama 3 License](https://llama.meta.com/llama3/license/) | c87ad09e1e4c8f9c35a5fcef52b6f1c9
Nous Hermes 2 Mistral DPO | 4.11 GB | 8 GB | 7 billion | q4_0 | Mistral & Nous Research | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | a5f6b4eabd3992da4d7fb7f020f921eb
Phi-3 Mini Instruct | 2.18 GB | 4 GB | 4 billion | q4_0 | Microsoft | [MIT](https://opensource.org/license/mit) | f8347badde9bfc2efbe89124d78ddaf5
Mini Orca (Small) | 1.98 GB | 4 GB | 3 billion | q4_0 | Microsoft | [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0) | 0e769317b90ac30d6e09486d61fefa26
GPT4All Snoozy | 7.37 GB | 16 GB | 13 billion | q4_0 | Nomic AI | [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html) | 40388eb2f8d16bb5d08c96fdfaac6b2c

### Search Results

You can click the gear icon in the search bar to sort search results by their number of likes, number of downloads, or date of upload (all from HuggingFace).

![Sort search results](../assets/search_settings.png)

## Connect Model APIs

You can add your API key for remote model providers. **Note**: this does not download a model file to your computer for private, on-device use. Instead, your prompts leave your computer, go to the API provider, and the response is returned to your computer.

![Connect APIs](../assets/add_model_gpt4.png)
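
Downloaded models can also be driven from code rather than the desktop UI. Below is a minimal sketch using the GPT4All Python SDK; the model filename is an example and must match a GGUF file you have actually downloaded (otherwise the SDK will attempt to fetch it).

```python
# Minimal sketch: run a downloaded, quantized GGUF model locally from Python.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # example q4_0 model file
with model.chat_session():
    print(model.generate("Why is the sky blue?", max_tokens=200))
```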
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/quickstart.html

# GPT4All Desktop

The GPT4All Desktop Application allows you to download and run large language models (LLMs) locally & privately on your device. With GPT4All, you can chat with models, turn your local files into information sources for models ([LocalDocs](localdocs.html)), or browse models available online to download onto your device.

[Official Video Tutorial](https://www.youtube.com/watch?v=gQcZDXRVJok)

## Quickstart

1. Install GPT4All for your operating system and open the application: [Download for Windows](https://gpt4all.io/installers/gpt4all-installer-win64.exe), [Download for Mac](https://gpt4all.io/installers/gpt4all-installer-darwin.dmg), or [Download for Linux](https://gpt4all.io/installers/gpt4all-installer-linux.run).
2. Hit `Start Chatting`. ![GPT4All home page](../assets/gpt4all_home.png)
3. Click `+ Add Model`.
4. Download a model. We recommend starting with Llama 3, but you can [browse more models](models.html). ![Download a model](../assets/download_llama.png)
5. Once downloaded, go to Chats (below Home and above Models in the menu on the left).
6. Click "Load Default Model" (will be Llama 3 or whichever model you downloaded). ![Before first chat](../assets/before_first_chat.png) | ![New first chat](../assets/new_first_chat.png)
7. Try the [example chats](chats.html) or your own prompts!

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/settings.html

# Settings

## Application Settings

General Application Settings:

Setting | Description | Default Value
---|---|---
**Theme** | Color theme for the application. Options are `Light`, `Dark`, and `LegacyDark` | `Light`
**Font Size** | Font size setting for text throughout the application. Options are Small, Medium, and Large | Small
**Language and Locale** | The language and locale you wish to use | System Locale
**Device** | Device that will run your models. Options are `Auto` (GPT4All chooses), `Metal` (Apple Silicon M1+), `CPU`, and `GPU` | `Auto`
**Default Model** | Choose your preferred LLM to load by default on startup | Auto
**Suggestion Mode** | Generate suggested follow up questions at the end of responses | When chatting with LocalDocs
**Download Path** | Select a destination on your device to save downloaded models | Windows: `C:\Users\{username}\AppData\Local\nomic.ai\GPT4All` Mac: `/Users/{username}/Library/Application Support/nomic.ai/GPT4All/` Linux: `/home/{username}/.local/share/nomic.ai/GPT4All`
**Enable Datalake** | Opt-in to sharing interactions with GPT4All community (**anonymous** and **optional**) | Off

Advanced Application Settings

Setting | Description | Default Value
---|---|---
**CPU Threads** | Number of concurrently running CPU threads (more can speed up responses) | 4
**Enable System Tray** | The application will minimize to the system tray / taskbar when the window is closed | Off
**Enable Local Server** | Allow any application on your device to use GPT4All via an OpenAI-compatible GPT4All API | Off
**API Server Port** | Local HTTP port for the local API server | 4891

## Model Settings

Model / Character Settings

Setting | Description | Default Value
---|---|---
**Name** | Unique name of this model / character | set by model uploader
**Model File** | Filename (.gguf) of the model | set by model uploader
**System Message** | General instructions for the chats this model will be used for | set by model uploader
**Chat Template** | Format of user <-> assistant interactions for the chats this model will be used for | set by model uploader
**Chat Name Prompt** | Prompt used to automatically generate chat names | Describe the above conversation in seven words or less.
**Suggested FollowUp Prompt** | Prompt used to automatically generate follow up questions after a chat response | Suggest three very short factual follow-up questions that have not been answered yet or cannot be found inspired by the previous conversation and excerpts.

### Clone

You can **clone** an existing model, which allows you to save a configuration of a model file with different prompt templates and sampling settings.

### Sampling Settings

Model Sampling Settings

Setting | Description | Default Value
---|---|---
**Context Length** | Maximum length of input sequence in tokens | 2048
**Max Length** | Maximum length of response in tokens | 4096
**Prompt Batch Size** | Token batch size for parallel processing | 128
**Temperature** | Lower temperature gives more likely generations | 0.7
**Top P** | Prevents choosing highly unlikely tokens | 0.4
**Top K** | Size of selection pool for tokens | 40
**Min P** | Minimum relative probability | 0
**Repeat Penalty Tokens** | Length to apply penalty | 64
**Repeat Penalty** | Penalize repetitiveness | 1.18
**GPU Layers** | How many model layers to load into VRAM | 32

## LocalDocs Settings

General LocalDocs Settings

Setting | Description | Default Value
---|---|---
**Allowed File Extensions** | Choose which file types will be indexed into LocalDocs collections as text snippets with embedding vectors | `.txt`, `.pdf`, `.md`, `.rst`
**Use Nomic Embed API** | Use Nomic API to create LocalDocs collections fast and off-device; [Nomic API Key](https://atlas.nomic.ai/) required | Off
**Embeddings Device** | Device that will run embedding models. Options are `Auto` (GPT4All chooses), `Metal` (Apple Silicon M1+), `CPU`, and `GPU` | `Auto`
**Show Sources** | Titles of source files retrieved by LocalDocs will be displayed directly in your chats. | On

Advanced LocalDocs Settings

Note that increasing these settings can increase the likelihood of factual responses, but may result in slower generation times.

Setting | Description | Default Value
---|---|---
**Document Snippet Size** | Number of string characters per document snippet | 512
**Maximum Document Snippets Per Prompt** | Upper limit for the number of snippets from your files LocalDocs can retrieve for LLM context | 3

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_help/faq.html

---

# Frequently Asked Questions

## Models

### Which language models are supported?

We support models with a `llama.cpp` implementation which have been uploaded to [HuggingFace](https://huggingface.co/).

### Which embedding models are supported?

We support SBert and Nomic Embed Text v1 & v1.5.

## Software

### What software do I need?

All you need is to [install GPT4All](../index.html) onto your Windows, Mac, or Linux computer.

### Which SDK languages are supported?

Our SDK is in Python for usability, but these are light bindings around [`llama.cpp`](https://github.com/ggerganov/llama.cpp) implementations that we contribute to for efficiency and accessibility on everyday computers.

### Is there an API?
Yes, you can run your model in server-mode with our [OpenAI-compatible API](https://platform.openai.com/docs/api-reference/completions), which you can configure in [settings](../gpt4all_desktop/settings.html#application-settings).

### Can I monitor a GPT4All deployment?

Yes, GPT4All [integrates](../gpt4all_python/monitoring.html) with [OpenLIT](https://github.com/openlit/openlit) so you can deploy LLMs with user interactions and hardware usage automatically monitored for full observability.

### Is there a command line interface (CLI)?

[Yes](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/cli), we have a lightweight CLI built around the Python client. We welcome further contributions!

## Hardware

### What hardware do I need?

GPT4All can run on CPU, Metal (Apple Silicon M1+), and GPU.

### What are the system requirements?

Your CPU needs to support [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and you need enough RAM to load a model into memory.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_help/troubleshooting.html

---

# Troubleshooting

## Error Loading Models

It is possible you are trying to load a model from HuggingFace whose weights are not compatible with our [backend](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings).

Try downloading one of the officially supported models listed on the main models page in the application. If the problem persists, please share your experience on our [Discord](https://discord.com/channels/1076964370942267462).

## Bad Responses

Try the [example chats](../gpt4all_desktop/chats.html) to double check that your system is implementing models correctly.
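If you are working from the Python SDK rather than the desktop application, a minimal sanity check along the same lines might look like the following sketch. It assumes the `gpt4all` package is installed and that the Llama 3 model file listed on the Models page has already been downloaded; the prompt itself is just an example.

    from gpt4all import GPT4All

    # Load a known-good, officially supported model and run a short prompt
    # in a chat session. Coherent output here indicates the backend and the
    # model file are working.
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    with model.chat_session():
        print(model.generate("Why is the sky blue?", max_tokens=200))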
### Responses Incoherent

If you are seeing something **not at all** resembling the [example chats](../gpt4all_desktop/chats.html) - for example, if the responses you are seeing look nonsensical - try [downloading a different model](../gpt4all_desktop/models.html), and please share your experience on our [Discord](https://discord.com/channels/1076964370942267462).

### Responses Incorrect

LLMs can be unreliable. It's helpful to know what their training data was - they are less likely to be correct when asked about data they were not trained on unless you give the necessary information in the prompt as **context**.

Giving LLMs additional context, like chatting using [LocalDocs](../gpt4all_desktop/localdocs.html), can help merge the language model's ability to understand text with the files that you trust to contain the information you need. Including information in a prompt is not a guarantee that it will be used correctly, but the clearer and more concise your prompts, and the more relevant your prompts are to your files, the better.

### LocalDocs Issues

Occasionally a model - particularly a smaller or overall weaker LLM - may not use the relevant text snippets from the files that were referenced via LocalDocs. If you are seeing this, it can help to use phrases like "in the docs" or "from the provided files" when prompting your model.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_python/home.html

---

# GPT4All Python SDK

## Installation

To get started, pip-install the `gpt4all` package into your Python environment.

    pip install gpt4all

We recommend installing `gpt4all` into its own virtual environment using `venv` or `conda`.

## Load LLM

Models are loaded by name via the `GPT4All` class. If it's your first time loading a model, it will be downloaded to your device and saved so it can be quickly reloaded next time you create a `GPT4All` model with the same name.
Load LLM

    from gpt4all import GPT4All
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
    with model.chat_session():
        print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))

`GPT4All` model name | Filesize | RAM Required | Parameters | Quantization | Developer | License | MD5 Sum (Unique Hash)
---|---|---|---|---|---|---|---
`Meta-Llama-3-8B-Instruct.Q4_0.gguf` | 4.66 GB | 8 GB | 8 Billion | q4_0 | Meta | [Llama 3 License](https://llama.meta.com/llama3/license/) | c87ad09e1e4c8f9c35a5fcef52b6f1c9
`Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf` | 4.11 GB | 8 GB | 7 Billion | q4_0 | Mistral & Nous Research | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | Coa5f6b4eabd3992da4d7fb7f020f921eb
`Phi-3-mini-4k-instruct.Q4_0.gguf` | 2.18 GB | 4 GB | 3.8 billion | q4_0 | Microsoft | [MIT](https://opensource.org/license/mit) | f8347badde9bfc2efbe89124d78ddaf5
`orca-mini-3b-gguf2-q4_0.gguf` | 1.98 GB | 4 GB | 3 billion | q4_0 | Microsoft | [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0) | 0e769317b90ac30d6e09486d61fefa26
`gpt4all-13b-snoozy-q4_0.gguf` | 7.37 GB | 16 GB | 13 billion | q4_0 | Nomic AI | [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html) | 40388eb2f8d16bb5d08c96fdfaac6b2c

## Chat Session Generation

Most of the language models you will be able to access from HuggingFace have been trained as assistants. This guides language models to not just answer with relevant text, but _helpful_ text. If you want your LLM's responses to be helpful in the typical sense, we recommend you apply the chat templates the models were finetuned with. Information about specific prompt templates is typically available on the official HuggingFace page for the model.

Example LLM Chat Session Generation

Load `Llama 3` and enter the following prompt in a chat session:

    from gpt4all import GPT4All
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    with model.chat_session():
        print(model.generate("quadratic formula"))

With the default sampling settings, you should see something resembling the following:

    The quadratic formula!

    The quadratic formula is a mathematical formula that provides the solutions to a quadratic equation of the form:

    ax^2 + bx + c = 0

    where a, b, and c are constants. The formula is:

    x = (-b ± √(b^2 - 4ac)) / 2a

    Let's break it down:

    * x is the variable we're trying to solve for.
    * a, b, and c are the coefficients of the quadratic equation.
    * ± means "plus or minus".
    * √ denotes the square root.

    To use the formula, simply plug in the values of a, b, and c into the expression above. The resulting value(s) will be the solutions to the original quadratic equation!

    For example, let's say we have the quadratic equation:

    x^2 + 5x + 6 = 0

    We can plug these values into the formula as follows:

    a = 1
    b = 5

## Direct Generation

Directly calling `model.generate()` prompts the model without applying any templates.

Note: this can result in responses that are less like helpful responses and more like mirroring the tone of your prompt. In general, a language model outside of a chat session is less of a helpful assistant and more of a lens into the distribution of the model's training data.
As an example, see how the model's response changes when we give the same prompt as above without applying a chat session:

Example LLM Direct Generation

Load `Llama 3` and enter the following prompt:

    from gpt4all import GPT4All
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    print(model.generate("quadratic formula"))

With the default sampling settings, you should see something resembling the following:

    . The equation is in the form of a + bx = c, where a and b are constants.
    The solution to this problem involves using the quadratic formula which states that for any quadratic equation ax^2+bx+c=0, its solutions can be found by:
    x = (-b ± √(b^2-4ac)) / 2a
    In your case, since you have a + bx = c, we need to rewrite it in the form of ax^2+bx+c=0. To do this, subtract both sides from c, so that:
    c - (a + bx) = 0
    Now, combine like terms on the left side and simplify:
    ax^2 + (-b)x + (c-a) = 0

    So now we have a quadratic equation in standard form: ax^2+bx+c=0. We can use this to find its solutions using the quadratic formula:
    x = ((-b ± √((-b)^2

Why did it respond differently? Because language models, before being fine-tuned as assistants, are trained to be more like a data mimic than a helpful assistant. Therefore our response ends up more like a typical continuation of math-style text rather than a helpful answer in dialog.

## Embeddings

Nomic trains and open-sources free embedding models that will run very fast on your hardware. The easiest way to run the text embedding model locally uses the [`nomic`](https://github.com/nomic-ai/nomic) Python library to interface with our fast [C/C++ implementations](ref.html#gpt4all.gpt4all.Embed4All).

Example Embeddings Generation

Importing `embed` from the [`nomic`](https://github.com/nomic-ai/nomic) library, you can call `embed.text()` with `inference_mode="local"`. This downloads an embedding model and saves it for later.

    from nomic import embed
    embeddings = embed.text(["String 1", "String 2"], inference_mode="local")['embeddings']
    print("Number of embeddings created:", len(embeddings))
    print("Number of dimensions per embedding:", len(embeddings[0]))

    Number of embeddings created: 2
    Number of dimensions per embedding: 768

![](../assets/local_embed.gif)Nomic embed text local inference

To learn more about making embeddings locally with `nomic`, visit our [embeddings guide](https://docs.nomic.ai/atlas/guides/embeddings#local-inference).

The following embedding models can be used within the application and with the `Embed4All` class from the `gpt4all` Python library. The default context length of these GGUF files is 2048 but can be [extended](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF#description).
Name | Using with `nomic` | `Embed4All` model name | Context Length | # Embedding Dimensions | File Size ---|---|---|---|---|--- [Nomic Embed v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1-GGUF) | `embed.text(strings, model="nomic-embed-text-v1", inference_mode="local")` | `Embed4All("nomic-embed-text-v1.f16.gguf")` | 2048 | 768 | 262 MiB [Nomic Embed v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) | `embed.text(strings, model="nomic-embed-text-v1.5", inference_mode="local")` | `Embed4All("nomic-embed-text-v1.5.f16.gguf")` | 2048 | 64-768 | 262 MiB [SBert](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | n/a | `Embed4All("all-MiniLM-L6-v2.gguf2.f16.gguf")` | 512 | 384 | 44 MiB --- # GPT4All Documentation Source: https://docs.gpt4all.io/gpt4all_python/monitoring.html --- [ ![logo](../assets/nomic.png) ](../index.html "GPT4All") GPT4All [ nomic-ai/gpt4all ](https://github.com/nomic-ai/gpt4all "Go to repository") * [ GPT4All Documentation ](../index.html) * [ Quickstart ](../gpt4all_desktop/quickstart.html) * [ Chats ](../gpt4all_desktop/chats.html) * [ Models ](../gpt4all_desktop/models.html) * [ LocalDocs ](../gpt4all_desktop/localdocs.html) * [ Settings ](../gpt4all_desktop/settings.html) * [ Chat Templates ](../gpt4all_desktop/chat_templates.html) * Cookbook Cookbook * [ Local AI Chat with Microsoft Excel ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-microsoft-excel.html) * [ Local AI Chat with your Google Drive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-google-drive.html) * [ Local AI Chat with your Obsidian Vault ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-Obsidian.html) * [ Local AI Chat with your OneDrive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-One-Drive.html) * API Server API Server * [ GPT4All API Server ](../gpt4all_api_server/home.html) * Python SDK Python SDK * [ GPT4All Python SDK ](home.html) * Monitoring [ Monitoring ](monitoring.html) Table of contents * Setup Monitoring * Visualization * OpenLIT UI * Grafana, DataDog, & Other Integrations * [ SDK Reference ](ref.html) * Help Help * [ FAQ ](../gpt4all_help/faq.html) * [ Troubleshooting ](../gpt4all_help/troubleshooting.html) Table of contents * Setup Monitoring * Visualization * OpenLIT UI * Grafana, DataDog, & Other Integrations # GPT4All Monitoring GPT4All integrates with [OpenLIT](https://github.com/openlit/openlit) OpenTelemetry auto-instrumentation to perform real-time monitoring of your LLM application and GPU hardware. Monitoring can enhance your GPT4All deployment with auto-generated traces and metrics for * **Performance Optimization:** Analyze latency, cost and token usage to ensure your LLM application runs efficiently, identifying and resolving performance bottlenecks swiftly. * **User Interaction Insights:** Capture each prompt and response to understand user behavior and usage patterns better, improving user experience and engagement. * **Detailed GPU Metrics:** Monitor essential GPU parameters such as utilization, memory consumption, temperature, and power usage to maintain optimal hardware performance and avert potential issues. 
## Setup Monitoring Setup Monitoring With [OpenLIT](https://github.com/openlit/openlit), you can automatically monitor traces and metrics for your LLM deployment: pip install openlit from gpt4all import GPT4All import openlit openlit.init() # start # openlit.init(collect_gpu_stats=True) # Optional: To configure GPU monitoring model = GPT4All(model_name='orca-mini-3b-gguf2-q4_0.gguf') # Start a chat session and send queries with model.chat_session(): response1 = model.generate(prompt='hello', temp=0) response2 = model.generate(prompt='write me a short poem', temp=0) response3 = model.generate(prompt='thank you', temp=0) print(model.current_chat_session) ## Visualization ### OpenLIT UI Connect to OpenLIT's UI to start exploring the collected LLM performance metrics and traces. Visit the OpenLIT [Quickstart Guide](https://docs.openlit.io/latest/quickstart) for step-by-step details. ### Grafana, DataDog, & Other Integrations You can also send the data collected by OpenLIT to popular monitoring tools like Grafana and DataDog. For detailed instructions on setting up these connections, please refer to the OpenLIT [Connections Guide](https://docs.openlit.io/latest/connections/intro). --- # GPT4All Documentation Source: https://docs.gpt4all.io/gpt4all_python/ref.html --- [ ![logo](../assets/nomic.png) ](../index.html "GPT4All") GPT4All [ nomic-ai/gpt4all ](https://github.com/nomic-ai/gpt4all "Go to repository") * [ GPT4All Documentation ](../index.html) * [ Quickstart ](../gpt4all_desktop/quickstart.html) * [ Chats ](../gpt4all_desktop/chats.html) * [ Models ](../gpt4all_desktop/models.html) * [ LocalDocs ](../gpt4all_desktop/localdocs.html) * [ Settings ](../gpt4all_desktop/settings.html) * [ Chat Templates ](../gpt4all_desktop/chat_templates.html) * Cookbook Cookbook * [ Local AI Chat with Microsoft Excel ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-microsoft-excel.html) * [ Local AI Chat with your Google Drive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-google-drive.html) * [ Local AI Chat with your Obsidian Vault ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-Obsidian.html) * [ Local AI Chat with your OneDrive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-One-Drive.html) * API Server API Server * [ GPT4All API Server ](../gpt4all_api_server/home.html) * Python SDK Python SDK * [ GPT4All Python SDK ](home.html) * [ Monitoring ](monitoring.html) * SDK Reference [ SDK Reference ](ref.html) Table of contents * GPT4All * backend * device * __init__ * chat_session * close * download_model * generate * list_gpus * list_models * retrieve_model * Embed4All * __init__ * close * embed * Help Help * [ FAQ ](../gpt4all_help/faq.html) * [ Troubleshooting ](../gpt4all_help/troubleshooting.html) Table of contents * GPT4All * backend * device * __init__ * chat_session * close * download_model * generate * list_gpus * list_models * retrieve_model * Embed4All * __init__ * close * embed # GPT4All Python SDK Reference #### `GPT4All` Python class that handles instantiation, downloading, generation and chat with GPT4All models. 
Source code in `gpt4all/gpt4all.py` 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 | class GPT4All: """ Python class that handles instantiation, downloading, generation and chat with GPT4All models. """ def __init__( self, model_name: str, *, model_path: str | os.PathLike[str] | None = None, model_type: str | None = None, allow_download: bool = True, n_threads: int | None = None, device: str | None = None, n_ctx: int = 2048, ngl: int = 100, verbose: bool = False, ): """ Constructor Args: model_name: Name of GPT4All or custom model. Including ".gguf" file extension is optional but encouraged. model_path: Path to directory containing model file or, if file does not exist, where to download model. Default is None, in which case models will be stored in `~/.cache/gpt4all/`. model_type: Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user. Default is None. allow_download: Allow API to download models from gpt4all.io. Default is True. n_threads: number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically. device: The processing unit on which the GPT4All model will run. It can be set to: - "cpu": Model will run on the central processing unit. - "gpu": Use Metal on ARM64 macOS, otherwise the same as "kompute". - "kompute": Use the best GPU provided by the Kompute backend. - "cuda": Use the best GPU provided by the CUDA backend. - "amd", "nvidia": Use the best GPU provided by the Kompute backend from this vendor. - A specific device name from the list returned by `GPT4All.list_gpus()`. Default is Metal on ARM64 macOS, "cpu" otherwise. 
Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown, and the GPT4All instance will be rendered invalid. It's advised to ensure the device has enough memory before initiating the model. n_ctx: Maximum size of context window ngl: Number of GPU layers to use (Vulkan) verbose: If True, print debug messages. """ self.model_type = model_type self._chat_session: ChatSession | None = None device_init = None if sys.platform == "darwin": if device is None: backend = "auto" # "auto" is effectively "metal" due to currently non-functional fallback elif device == "cpu": backend = "cpu" else: if platform.machine() != "arm64" or device != "gpu": raise ValueError(f"Unknown device for this platform: {device}") backend = "metal" else: backend = "kompute" if device is None or device == "cpu": pass # use kompute with no device elif device in ("cuda", "kompute"): backend = device device_init = "gpu" elif device.startswith("cuda:"): backend = "cuda" device_init = _remove_prefix(device, "cuda:") else: device_init = _remove_prefix(device, "kompute:") # Retrieve model and download if allowed self.config: ConfigType = self.retrieve_model(model_name, model_path=model_path, allow_download=allow_download, verbose=verbose) self.model = LLModel(self.config["path"], n_ctx, ngl, backend) if device_init is not None: self.model.init_gpu(device_init) self.model.load_model() # Set n_threads if n_threads is not None: self.model.set_thread_count(n_threads) def __enter__(self) -> Self: return self def __exit__( self, typ: type[BaseException] | None, value: BaseException | None, tb: TracebackType | None, ) -> None: self.close() def close(self) -> None: """Delete the model instance and free associated system resources.""" self.model.close() @property def backend(self) -> Literal["cpu", "kompute", "cuda", "metal"]: """The name of the llama.cpp backend currently in use. One of "cpu", "kompute", "cuda", or "metal".""" return self.model.backend @property def device(self) -> str | None: """The name of the GPU device currently in use, or None for backends other than Kompute or CUDA.""" return self.model.device @property def current_chat_session(self) -> list[MessageType] | None: return None if self._chat_session is None else self._chat_session.history @current_chat_session.setter def current_chat_session(self, history: list[MessageType]) -> None: if self._chat_session is None: raise ValueError("current_chat_session may only be set when there is an active chat session") self._chat_session.history[:] = history @staticmethod def list_models() -> list[ConfigType]: """ Fetch model list from https://gpt4all.io/models/models3.json. Returns: Model list in JSON format. """ resp = requests.get("https://gpt4all.io/models/models3.json") if resp.status_code != 200: raise ValueError(f"Request failed: HTTP {resp.status_code} {resp.reason}") return resp.json() @classmethod def retrieve_model( cls, model_name: str, model_path: str | os.PathLike[str] | None = None, allow_download: bool = True, verbose: bool = False, ) -> ConfigType: """ Find model file, and if it doesn't exist, download the model. Args: model_name: Name of model. model_path: Path to find model. Default is None in which case path is set to ~/.cache/gpt4all/. allow_download: Allow API to download model from gpt4all.io. Default is True. verbose: If True (default), print debug messages. Returns: Model config. 
""" model_filename = append_extension_if_missing(model_name) # get the config for the model config: ConfigType = {} if allow_download: models = cls.list_models() if (model := next((m for m in models if m["filename"] == model_filename), None)) is not None: config.update(model) # Validate download directory if model_path is None: try: os.makedirs(DEFAULT_MODEL_DIRECTORY, exist_ok=True) except OSError as e: raise RuntimeError("Failed to create model download directory") from e model_path = DEFAULT_MODEL_DIRECTORY else: model_path = Path(model_path) if not model_path.exists(): raise FileNotFoundError(f"Model directory does not exist: {model_path!r}") model_dest = model_path / model_filename if model_dest.exists(): config["path"] = str(model_dest) if verbose: print(f"Found model file at {str(model_dest)!r}", file=sys.stderr) elif allow_download: # If model file does not exist, download filesize = config.get("filesize") config["path"] = str(cls.download_model( model_filename, model_path, verbose=verbose, url=config.get("url"), expected_size=None if filesize is None else int(filesize), expected_md5=config.get("md5sum"), )) else: raise FileNotFoundError(f"Model file does not exist: {model_dest!r}") return config @staticmethod def download_model( model_filename: str, model_path: str | os.PathLike[str], verbose: bool = True, url: str | None = None, expected_size: int | None = None, expected_md5: str | None = None, ) -> str | os.PathLike[str]: """ Download model from gpt4all.io. Args: model_filename: Filename of model (with .gguf extension). model_path: Path to download model to. verbose: If True (default), print debug messages. url: the models remote url (e.g. may be hosted on HF) expected_size: The expected size of the download. expected_md5: The expected MD5 hash of the download. Returns: Model file destination. 
""" # Download model if url is None: url = f"https://gpt4all.io/models/gguf/{model_filename}" def make_request(offset=None): headers = {} if offset: print(f"\nDownload interrupted, resuming from byte position {offset}", file=sys.stderr) headers["Range"] = f"bytes={offset}-" # resume incomplete response headers["Accept-Encoding"] = "identity" # Content-Encoding changes meaning of ranges response = requests.get(url, stream=True, headers=headers) if response.status_code not in (200, 206): raise ValueError(f"Request failed: HTTP {response.status_code} {response.reason}") if offset and (response.status_code != 206 or str(offset) not in response.headers.get("Content-Range", "")): raise ValueError("Connection was interrupted and server does not support range requests") if (enc := response.headers.get("Content-Encoding")) is not None: raise ValueError(f"Expected identity Content-Encoding, got {enc}") return response response = make_request() total_size_in_bytes = int(response.headers.get("content-length", 0)) block_size = 2**20 # 1 MB partial_path = Path(model_path) / (model_filename + ".part") with open(partial_path, "w+b") as partf: try: with tqdm(desc="Downloading", total=total_size_in_bytes, unit="iB", unit_scale=True) as progress_bar: while True: last_progress = progress_bar.n try: for data in response.iter_content(block_size): partf.write(data) progress_bar.update(len(data)) except ChunkedEncodingError as cee: if cee.args and isinstance(pe := cee.args[0], ProtocolError): if len(pe.args) >= 2 and isinstance(ir := pe.args[1], IncompleteRead): assert progress_bar.n <= ir.partial # urllib3 may be ahead of us but never behind # the socket was closed during a read - retry response = make_request(progress_bar.n) continue raise if total_size_in_bytes != 0 and progress_bar.n < total_size_in_bytes: if progress_bar.n == last_progress: raise RuntimeError("Download not making progress, aborting.") # server closed connection prematurely - retry response = make_request(progress_bar.n) continue break # verify file integrity file_size = partf.tell() if expected_size is not None and file_size != expected_size: raise ValueError(f"Expected file size of {expected_size} bytes, got {file_size}") if expected_md5 is not None: partf.seek(0) hsh = hashlib.md5() with tqdm(desc="Verifying", total=file_size, unit="iB", unit_scale=True) as bar: while chunk := partf.read(block_size): hsh.update(chunk) bar.update(len(chunk)) if hsh.hexdigest() != expected_md5.lower(): raise ValueError(f"Expected MD5 hash of {expected_md5!r}, got {hsh.hexdigest()!r}") except: if verbose: print("Cleaning up the interrupted download...", file=sys.stderr) try: os.remove(partial_path) except OSError: pass raise # flush buffers and sync the inode partf.flush() _fsync(partf) # move to final destination download_path = Path(model_path) / model_filename try: os.rename(partial_path, download_path) except FileExistsError: try: os.remove(partial_path) except OSError: pass raise if verbose: print(f"Model downloaded to {str(download_path)!r}", file=sys.stderr) return download_path @overload def generate( self, prompt: str, *, max_tokens: int = ..., temp: float = ..., top_k: int = ..., top_p: float = ..., min_p: float = ..., repeat_penalty: float = ..., repeat_last_n: int = ..., n_batch: int = ..., n_predict: int | None = ..., streaming: Literal[False] = ..., callback: ResponseCallbackType = ..., ) -> str: ... 
@overload def generate( self, prompt: str, *, max_tokens: int = ..., temp: float = ..., top_k: int = ..., top_p: float = ..., min_p: float = ..., repeat_penalty: float = ..., repeat_last_n: int = ..., n_batch: int = ..., n_predict: int | None = ..., streaming: Literal[True], callback: ResponseCallbackType = ..., ) -> Iterable[str]: ... @overload def generate( self, prompt: str, *, max_tokens: int = ..., temp: float = ..., top_k: int = ..., top_p: float = ..., min_p: float = ..., repeat_penalty: float = ..., repeat_last_n: int = ..., n_batch: int = ..., n_predict: int | None = ..., streaming: bool, callback: ResponseCallbackType = ..., ) -> Any: ... def generate( self, prompt : str, *, max_tokens : int = 200, temp : float = 0.7, top_k : int = 40, top_p : float = 0.4, min_p : float = 0.0, repeat_penalty : float = 1.18, repeat_last_n : int = 64, n_batch : int = 8, n_predict : int | None = None, streaming : bool = False, callback : ResponseCallbackType = empty_response_callback, ) -> Any: """ Generate outputs from any GPT4All model. Args: prompt: The prompt for the model to complete. max_tokens: The maximum number of tokens to generate. temp: The model temperature. Larger values increase creativity but decrease factuality. top_k: Randomly sample from the top_k most likely tokens at each generation step. Set this to 1 for greedy decoding. top_p: Randomly sample at each generation step from the top most likely tokens whose probabilities add up to top_p. min_p: Randomly sample at each generation step from the top most likely tokens whose probabilities are at least min_p. repeat_penalty: Penalize the model for repetition. Higher values result in less repetition. repeat_last_n: How far in the models generation history to apply the repeat penalty. n_batch: Number of prompt tokens processed in parallel. Larger values decrease latency but increase resource requirements. n_predict: Equivalent to max_tokens, exists for backwards compatibility. streaming: If True, this method will instead return a generator that yields tokens as the model generates them. callback: A function with arguments token_id:int and response:str, which receives the tokens from the model as they are generated and stops the generation by returning False. Returns: Either the entire completion or a generator that yields the completion token by token. 
""" # Preparing the model request generate_kwargs: dict[str, Any] = dict( temp = temp, top_k = top_k, top_p = top_p, min_p = min_p, repeat_penalty = repeat_penalty, repeat_last_n = repeat_last_n, n_batch = n_batch, n_predict = n_predict if n_predict is not None else max_tokens, ) # Prepare the callback, process the model response full_response = "" def _callback_wrapper(token_id: int, response: str) -> bool: nonlocal full_response full_response += response return callback(token_id, response) last_msg_rendered = prompt if self._chat_session is not None: session = self._chat_session def render(messages: list[MessageType]) -> str: return session.template.render( messages=messages, add_generation_prompt=True, **self.model.special_tokens_map, ) session.history.append(MessageType(role="user", content=prompt)) prompt = render(session.history) if len(session.history) > 1: last_msg_rendered = render(session.history[-1:]) # Check request length last_msg_len = self.model.count_prompt_tokens(last_msg_rendered) if last_msg_len > (limit := self.model.n_ctx - 4): raise ValueError(f"Your message was too long and could not be processed ({last_msg_len} > {limit}).") # Send the request to the model if streaming: def stream() -> Iterator[str]: yield from self.model.prompt_model_streaming(prompt, _callback_wrapper, **generate_kwargs) if self._chat_session is not None: self._chat_session.history.append(MessageType(role="assistant", content=full_response)) return stream() self.model.prompt_model(prompt, _callback_wrapper, **generate_kwargs) if self._chat_session is not None: self._chat_session.history.append(MessageType(role="assistant", content=full_response)) return full_response @contextmanager def chat_session( self, system_message: str | Literal[False] | None = None, chat_template: str | None = None, ): """ Context manager to hold an inference optimized chat session with a GPT4All model. Args: system_message: An initial instruction for the model, None to use the model default, or False to disable. Defaults to None. chat_template: Jinja template for the conversation, or None to use the model default. Defaults to None. """ if system_message is None: system_message = self.config.get("systemMessage", False) if chat_template is None: if "name" not in self.config: raise ValueError("For sideloaded models or with allow_download=False, you must specify a chat template.") if "chatTemplate" not in self.config: raise NotImplementedError("This model appears to have a built-in chat template, but loading it is not " "currently implemented. Please pass a template to chat_session() directly.") if (tmpl := self.config["chatTemplate"]) is None: raise ValueError(f"The model {self.config['name']!r} does not support chat.") chat_template = tmpl history = [] if system_message is not False: history.append(MessageType(role="system", content=system_message)) self._chat_session = ChatSession( template=_jinja_env.from_string(chat_template), history=history, ) try: yield self finally: self._chat_session = None @staticmethod def list_gpus() -> list[str]: """ List the names of the available GPU devices. Returns: A list of strings representing the names of the available GPU devices. """ return LLModel.list_gpus() ---|--- ##### `backend: Literal['cpu', 'kompute', 'cuda', 'metal']` `property` The name of the llama.cpp backend currently in use. One of "cpu", "kompute", "cuda", or "metal". ##### `device: str | None` `property` The name of the GPU device currently in use, or None for backends other than Kompute or CUDA. 
##### `__init__(model_name, *, model_path=None, model_type=None, allow_download=True, n_threads=None, device=None, n_ctx=2048, ngl=100, verbose=False)` Constructor Parameters: * **`model_name`** (`str`) – Name of GPT4All or custom model. Including ".gguf" file extension is optional but encouraged. * **`model_path`** (`str | PathLike[str] | None`, default: `None` ) – Path to directory containing model file or, if file does not exist, where to download model. Default is None, in which case models will be stored in `~/.cache/gpt4all/`. * **`model_type`** (`str | None`, default: `None` ) – Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user. Default is None. * **`allow_download`** (`bool`, default: `True` ) – Allow API to download models from gpt4all.io. Default is True. * **`n_threads`** (`int | None`, default: `None` ) – number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically. * **`device`** (`str | None`, default: `None` ) – The processing unit on which the GPT4All model will run. It can be set to: \- "cpu": Model will run on the central processing unit. \- "gpu": Use Metal on ARM64 macOS, otherwise the same as "kompute". \- "kompute": Use the best GPU provided by the Kompute backend. \- "cuda": Use the best GPU provided by the CUDA backend. \- "amd", "nvidia": Use the best GPU provided by the Kompute backend from this vendor. \- A specific device name from the list returned by `GPT4All.list_gpus()`. Default is Metal on ARM64 macOS, "cpu" otherwise. Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown, and the GPT4All instance will be rendered invalid. It's advised to ensure the device has enough memory before initiating the model. * **`n_ctx`** (`int`, default: `2048` ) – Maximum size of context window * **`ngl`** (`int`, default: `100` ) – Number of GPU layers to use (Vulkan) * **`verbose`** (`bool`, default: `False` ) – If True, print debug messages. Source code in `gpt4all/gpt4all.py` 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 | def __init__( self, model_name: str, *, model_path: str | os.PathLike[str] | None = None, model_type: str | None = None, allow_download: bool = True, n_threads: int | None = None, device: str | None = None, n_ctx: int = 2048, ngl: int = 100, verbose: bool = False, ): """ Constructor Args: model_name: Name of GPT4All or custom model. Including ".gguf" file extension is optional but encouraged. model_path: Path to directory containing model file or, if file does not exist, where to download model. Default is None, in which case models will be stored in `~/.cache/gpt4all/`. model_type: Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user. Default is None. allow_download: Allow API to download models from gpt4all.io. Default is True. n_threads: number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically. device: The processing unit on which the GPT4All model will run. It can be set to: - "cpu": Model will run on the central processing unit. - "gpu": Use Metal on ARM64 macOS, otherwise the same as "kompute". 
- "kompute": Use the best GPU provided by the Kompute backend. - "cuda": Use the best GPU provided by the CUDA backend. - "amd", "nvidia": Use the best GPU provided by the Kompute backend from this vendor. - A specific device name from the list returned by `GPT4All.list_gpus()`. Default is Metal on ARM64 macOS, "cpu" otherwise. Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown, and the GPT4All instance will be rendered invalid. It's advised to ensure the device has enough memory before initiating the model. n_ctx: Maximum size of context window ngl: Number of GPU layers to use (Vulkan) verbose: If True, print debug messages. """ self.model_type = model_type self._chat_session: ChatSession | None = None device_init = None if sys.platform == "darwin": if device is None: backend = "auto" # "auto" is effectively "metal" due to currently non-functional fallback elif device == "cpu": backend = "cpu" else: if platform.machine() != "arm64" or device != "gpu": raise ValueError(f"Unknown device for this platform: {device}") backend = "metal" else: backend = "kompute" if device is None or device == "cpu": pass # use kompute with no device elif device in ("cuda", "kompute"): backend = device device_init = "gpu" elif device.startswith("cuda:"): backend = "cuda" device_init = _remove_prefix(device, "cuda:") else: device_init = _remove_prefix(device, "kompute:") # Retrieve model and download if allowed self.config: ConfigType = self.retrieve_model(model_name, model_path=model_path, allow_download=allow_download, verbose=verbose) self.model = LLModel(self.config["path"], n_ctx, ngl, backend) if device_init is not None: self.model.init_gpu(device_init) self.model.load_model() # Set n_threads if n_threads is not None: self.model.set_thread_count(n_threads) ---|--- ##### `chat_session(system_message=None, chat_template=None)` Context manager to hold an inference optimized chat session with a GPT4All model. Parameters: * **`system_message`** (`str | Literal[False] | None`, default: `None` ) – An initial instruction for the model, None to use the model default, or False to disable. Defaults to None. * **`chat_template`** (`str | None`, default: `None` ) – Jinja template for the conversation, or None to use the model default. Defaults to None. Source code in `gpt4all/gpt4all.py` 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 | @contextmanager def chat_session( self, system_message: str | Literal[False] | None = None, chat_template: str | None = None, ): """ Context manager to hold an inference optimized chat session with a GPT4All model. Args: system_message: An initial instruction for the model, None to use the model default, or False to disable. Defaults to None. chat_template: Jinja template for the conversation, or None to use the model default. Defaults to None. """ if system_message is None: system_message = self.config.get("systemMessage", False) if chat_template is None: if "name" not in self.config: raise ValueError("For sideloaded models or with allow_download=False, you must specify a chat template.") if "chatTemplate" not in self.config: raise NotImplementedError("This model appears to have a built-in chat template, but loading it is not " "currently implemented. 
Please pass a template to chat_session() directly.") if (tmpl := self.config["chatTemplate"]) is None: raise ValueError(f"The model {self.config['name']!r} does not support chat.") chat_template = tmpl history = [] if system_message is not False: history.append(MessageType(role="system", content=system_message)) self._chat_session = ChatSession( template=_jinja_env.from_string(chat_template), history=history, ) try: yield self finally: self._chat_session = None ---|--- ##### `close()` Delete the model instance and free associated system resources. Source code in `gpt4all/gpt4all.py` 279 280 281 | def close(self) -> None: """Delete the model instance and free associated system resources.""" self.model.close() ---|--- ##### `download_model(model_filename, model_path, verbose=True, url=None, expected_size=None, expected_md5=None)` `staticmethod` Download model from gpt4all.io. Parameters: * **`model_filename`** (`str`) – Filename of model (with .gguf extension). * **`model_path`** (`str | PathLike[str]`) – Path to download model to. * **`verbose`** (`bool`, default: `True` ) – If True (default), print debug messages. * **`url`** (`str | None`, default: `None` ) – the models remote url (e.g. may be hosted on HF) * **`expected_size`** (`int | None`, default: `None` ) – The expected size of the download. * **`expected_md5`** (`str | None`, default: `None` ) – The expected MD5 hash of the download. Returns: * `str | PathLike[str]` – Model file destination. Source code in `gpt4all/gpt4all.py` 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 | @staticmethod def download_model( model_filename: str, model_path: str | os.PathLike[str], verbose: bool = True, url: str | None = None, expected_size: int | None = None, expected_md5: str | None = None, ) -> str | os.PathLike[str]: """ Download model from gpt4all.io. Args: model_filename: Filename of model (with .gguf extension). model_path: Path to download model to. verbose: If True (default), print debug messages. url: the models remote url (e.g. may be hosted on HF) expected_size: The expected size of the download. expected_md5: The expected MD5 hash of the download. Returns: Model file destination. 
""" # Download model if url is None: url = f"https://gpt4all.io/models/gguf/{model_filename}" def make_request(offset=None): headers = {} if offset: print(f"\nDownload interrupted, resuming from byte position {offset}", file=sys.stderr) headers["Range"] = f"bytes={offset}-" # resume incomplete response headers["Accept-Encoding"] = "identity" # Content-Encoding changes meaning of ranges response = requests.get(url, stream=True, headers=headers) if response.status_code not in (200, 206): raise ValueError(f"Request failed: HTTP {response.status_code} {response.reason}") if offset and (response.status_code != 206 or str(offset) not in response.headers.get("Content-Range", "")): raise ValueError("Connection was interrupted and server does not support range requests") if (enc := response.headers.get("Content-Encoding")) is not None: raise ValueError(f"Expected identity Content-Encoding, got {enc}") return response response = make_request() total_size_in_bytes = int(response.headers.get("content-length", 0)) block_size = 2**20 # 1 MB partial_path = Path(model_path) / (model_filename + ".part") with open(partial_path, "w+b") as partf: try: with tqdm(desc="Downloading", total=total_size_in_bytes, unit="iB", unit_scale=True) as progress_bar: while True: last_progress = progress_bar.n try: for data in response.iter_content(block_size): partf.write(data) progress_bar.update(len(data)) except ChunkedEncodingError as cee: if cee.args and isinstance(pe := cee.args[0], ProtocolError): if len(pe.args) >= 2 and isinstance(ir := pe.args[1], IncompleteRead): assert progress_bar.n <= ir.partial # urllib3 may be ahead of us but never behind # the socket was closed during a read - retry response = make_request(progress_bar.n) continue raise if total_size_in_bytes != 0 and progress_bar.n < total_size_in_bytes: if progress_bar.n == last_progress: raise RuntimeError("Download not making progress, aborting.") # server closed connection prematurely - retry response = make_request(progress_bar.n) continue break # verify file integrity file_size = partf.tell() if expected_size is not None and file_size != expected_size: raise ValueError(f"Expected file size of {expected_size} bytes, got {file_size}") if expected_md5 is not None: partf.seek(0) hsh = hashlib.md5() with tqdm(desc="Verifying", total=file_size, unit="iB", unit_scale=True) as bar: while chunk := partf.read(block_size): hsh.update(chunk) bar.update(len(chunk)) if hsh.hexdigest() != expected_md5.lower(): raise ValueError(f"Expected MD5 hash of {expected_md5!r}, got {hsh.hexdigest()!r}") except: if verbose: print("Cleaning up the interrupted download...", file=sys.stderr) try: os.remove(partial_path) except OSError: pass raise # flush buffers and sync the inode partf.flush() _fsync(partf) # move to final destination download_path = Path(model_path) / model_filename try: os.rename(partial_path, download_path) except FileExistsError: try: os.remove(partial_path) except OSError: pass raise if verbose: print(f"Model downloaded to {str(download_path)!r}", file=sys.stderr) return download_path ---|--- ##### `generate(prompt, *, max_tokens=200, temp=0.7, top_k=40, top_p=0.4, min_p=0.0, repeat_penalty=1.18, repeat_last_n=64, n_batch=8, n_predict=None, streaming=False, callback=empty_response_callback)` Generate outputs from any GPT4All model. Parameters: * **`prompt`** (`str`) – The prompt for the model to complete. * **`max_tokens`** (`int`, default: `200` ) – The maximum number of tokens to generate. * **`temp`** (`float`, default: `0.7` ) – The model temperature. 
##### `generate(prompt, *, max_tokens=200, temp=0.7, top_k=40, top_p=0.4, min_p=0.0, repeat_penalty=1.18, repeat_last_n=64, n_batch=8, n_predict=None, streaming=False, callback=empty_response_callback)`

Generate outputs from any GPT4All model.

Parameters:

* **`prompt`** (`str`) – The prompt for the model to complete.
* **`max_tokens`** (`int`, default: `200` ) – The maximum number of tokens to generate.
* **`temp`** (`float`, default: `0.7` ) – The model temperature. Larger values increase creativity but decrease factuality.
* **`top_k`** (`int`, default: `40` ) – Randomly sample from the top_k most likely tokens at each generation step. Set this to 1 for greedy decoding.
* **`top_p`** (`float`, default: `0.4` ) – Randomly sample at each generation step from the top most likely tokens whose probabilities add up to top_p.
* **`min_p`** (`float`, default: `0.0` ) – Randomly sample at each generation step from the top most likely tokens whose probabilities are at least min_p.
* **`repeat_penalty`** (`float`, default: `1.18` ) – Penalize the model for repetition. Higher values result in less repetition.
* **`repeat_last_n`** (`int`, default: `64` ) – How far in the models generation history to apply the repeat penalty.
* **`n_batch`** (`int`, default: `8` ) – Number of prompt tokens processed in parallel. Larger values decrease latency but increase resource requirements.
* **`n_predict`** (`int | None`, default: `None` ) – Equivalent to max_tokens, exists for backwards compatibility.
* **`streaming`** (`bool`, default: `False` ) – If True, this method will instead return a generator that yields tokens as the model generates them.
* **`callback`** (`ResponseCallbackType`, default: `empty_response_callback` ) – A function with arguments token_id:int and response:str, which receives the tokens from the model as they are generated and stops the generation by returning False.

Returns:

* `Any` – Either the entire completion or a generator that yields the completion token by token.

Source code in `gpt4all/gpt4all.py`

```python
def generate(
    self,
    prompt: str,
    *,
    max_tokens: int = 200,
    temp: float = 0.7,
    top_k: int = 40,
    top_p: float = 0.4,
    min_p: float = 0.0,
    repeat_penalty: float = 1.18,
    repeat_last_n: int = 64,
    n_batch: int = 8,
    n_predict: int | None = None,
    streaming: bool = False,
    callback: ResponseCallbackType = empty_response_callback,
) -> Any:
    """
    Generate outputs from any GPT4All model.

    Args:
        prompt: The prompt for the model to complete.
        max_tokens: The maximum number of tokens to generate.
        temp: The model temperature. Larger values increase creativity but decrease factuality.
        top_k: Randomly sample from the top_k most likely tokens at each generation step. Set this to 1 for greedy decoding.
        top_p: Randomly sample at each generation step from the top most likely tokens whose probabilities add up to top_p.
        min_p: Randomly sample at each generation step from the top most likely tokens whose probabilities are at least min_p.
        repeat_penalty: Penalize the model for repetition. Higher values result in less repetition.
        repeat_last_n: How far in the models generation history to apply the repeat penalty.
        n_batch: Number of prompt tokens processed in parallel. Larger values decrease latency but increase resource requirements.
        n_predict: Equivalent to max_tokens, exists for backwards compatibility.
        streaming: If True, this method will instead return a generator that yields tokens as the model generates them.
        callback: A function with arguments token_id:int and response:str, which receives the tokens from the model as they are
            generated and stops the generation by returning False.

    Returns:
        Either the entire completion or a generator that yields the completion token by token.
    """

    # Preparing the model request
    generate_kwargs: dict[str, Any] = dict(
        temp = temp,
        top_k = top_k,
        top_p = top_p,
        min_p = min_p,
        repeat_penalty = repeat_penalty,
        repeat_last_n = repeat_last_n,
        n_batch = n_batch,
        n_predict = n_predict if n_predict is not None else max_tokens,
    )

    # Prepare the callback, process the model response
    full_response = ""

    def _callback_wrapper(token_id: int, response: str) -> bool:
        nonlocal full_response
        full_response += response
        return callback(token_id, response)

    last_msg_rendered = prompt
    if self._chat_session is not None:
        session = self._chat_session

        def render(messages: list[MessageType]) -> str:
            return session.template.render(
                messages=messages,
                add_generation_prompt=True,
                **self.model.special_tokens_map,
            )

        session.history.append(MessageType(role="user", content=prompt))
        prompt = render(session.history)
        if len(session.history) > 1:
            last_msg_rendered = render(session.history[-1:])

    # Check request length
    last_msg_len = self.model.count_prompt_tokens(last_msg_rendered)
    if last_msg_len > (limit := self.model.n_ctx - 4):
        raise ValueError(f"Your message was too long and could not be processed ({last_msg_len} > {limit}).")

    # Send the request to the model
    if streaming:
        def stream() -> Iterator[str]:
            yield from self.model.prompt_model_streaming(prompt, _callback_wrapper, **generate_kwargs)
            if self._chat_session is not None:
                self._chat_session.history.append(MessageType(role="assistant", content=full_response))
        return stream()

    self.model.prompt_model(prompt, _callback_wrapper, **generate_kwargs)
    if self._chat_session is not None:
        self._chat_session.history.append(MessageType(role="assistant", content=full_response))
    return full_response
```
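A brief sketch of both return modes of `generate()`; the prompts are placeholders and the model filename follows the quickstart example:

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

# Default mode: the call blocks and returns the whole completion as a string.
print(model.generate("Write a haiku about local inference.", max_tokens=60, temp=0.7))

# streaming=True: returns a generator that yields tokens as the model produces them.
for token in model.generate("List three benefits of on-device LLMs.", max_tokens=80, streaming=True):
    print(token, end="", flush=True)
print()
model.close()
```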
##### `list_gpus()` `staticmethod`

List the names of the available GPU devices.

Returns:

* `list[str]` – A list of strings representing the names of the available GPU devices.

Source code in `gpt4all/gpt4all.py`

```python
@staticmethod
def list_gpus() -> list[str]:
    """
    List the names of the available GPU devices.

    Returns:
        A list of strings representing the names of the available GPU devices.
    """
    return LLModel.list_gpus()
```

##### `list_models()` `staticmethod`

Fetch model list from https://gpt4all.io/models/models3.json.

Returns:

* `list[ConfigType]` – Model list in JSON format.

Source code in `gpt4all/gpt4all.py`

```python
@staticmethod
def list_models() -> list[ConfigType]:
    """
    Fetch model list from https://gpt4all.io/models/models3.json.

    Returns:
        Model list in JSON format.
    """
    resp = requests.get("https://gpt4all.io/models/models3.json")
    if resp.status_code != 200:
        raise ValueError(f"Request failed: HTTP {resp.status_code} {resp.reason}")
    return resp.json()
```
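A small sketch combining the two static helpers above; the `filename`, `filesize`, and `url` keys shown are the ones `retrieve_model()` itself reads from the fetched list, and other keys may vary:

```python
from gpt4all import GPT4All

# Both helpers are static methods, so no model needs to be loaded first.
for name in GPT4All.list_gpus():
    print("GPU:", name)

models = GPT4All.list_models()          # downloads models3.json
first = models[0]
print(first["filename"], first.get("filesize"), first.get("url"))
```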
##### `retrieve_model(model_name, model_path=None, allow_download=True, verbose=False)` `classmethod`

Find model file, and if it doesn't exist, download the model.

Parameters:

* **`model_name`** (`str`) – Name of model.
* **`model_path`** (`str | PathLike[str] | None`, default: `None` ) – Path to find model. Default is None in which case path is set to ~/.cache/gpt4all/.
* **`allow_download`** (`bool`, default: `True` ) – Allow API to download model from gpt4all.io. Default is True.
* **`verbose`** (`bool`, default: `False` ) – If True (default), print debug messages.

Returns:

* `ConfigType` – Model config.

Source code in `gpt4all/gpt4all.py`

```python
@classmethod
def retrieve_model(
    cls,
    model_name: str,
    model_path: str | os.PathLike[str] | None = None,
    allow_download: bool = True,
    verbose: bool = False,
) -> ConfigType:
    """
    Find model file, and if it doesn't exist, download the model.

    Args:
        model_name: Name of model.
        model_path: Path to find model. Default is None in which case path is set to ~/.cache/gpt4all/.
        allow_download: Allow API to download model from gpt4all.io. Default is True.
        verbose: If True (default), print debug messages.

    Returns:
        Model config.
    """

    model_filename = append_extension_if_missing(model_name)

    # get the config for the model
    config: ConfigType = {}
    if allow_download:
        models = cls.list_models()

        if (model := next((m for m in models if m["filename"] == model_filename), None)) is not None:
            config.update(model)

    # Validate download directory
    if model_path is None:
        try:
            os.makedirs(DEFAULT_MODEL_DIRECTORY, exist_ok=True)
        except OSError as e:
            raise RuntimeError("Failed to create model download directory") from e
        model_path = DEFAULT_MODEL_DIRECTORY
    else:
        model_path = Path(model_path)

    if not model_path.exists():
        raise FileNotFoundError(f"Model directory does not exist: {model_path!r}")

    model_dest = model_path / model_filename
    if model_dest.exists():
        config["path"] = str(model_dest)
        if verbose:
            print(f"Found model file at {str(model_dest)!r}", file=sys.stderr)
    elif allow_download:
        # If model file does not exist, download
        filesize = config.get("filesize")
        config["path"] = str(cls.download_model(
            model_filename, model_path, verbose=verbose, url=config.get("url"),
            expected_size=None if filesize is None else int(filesize), expected_md5=config.get("md5sum"),
        ))
    else:
        raise FileNotFoundError(f"Model file does not exist: {model_dest!r}")

    return config
```
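An illustrative call to `retrieve_model()`; assuming the quickstart model name, the returned config should contain at least a `path` entry once the file is present:

```python
from gpt4all import GPT4All

# Looks for the file in ~/.cache/gpt4all/ and downloads it if it is missing (allow_download=True).
config = GPT4All.retrieve_model("Meta-Llama-3-8B-Instruct.Q4_0.gguf", verbose=True)
print(config["path"])
```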
""" if model_name is None: model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf" self.gpt4all = GPT4All(model_name, n_threads=n_threads, device=device, **kwargs) def __enter__(self) -> Self: return self def __exit__( self, typ: type[BaseException] | None, value: BaseException | None, tb: TracebackType | None, ) -> None: self.close() def close(self) -> None: """Delete the model instance and free associated system resources.""" self.gpt4all.close() # return_dict=False @overload def embed( self, text: str, *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[False] = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> list[float]: ... @overload def embed( self, text: list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[False] = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> list[list[float]]: ... @overload def embed( self, text: str | list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[False] = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> list[Any]: ... # return_dict=True @overload def embed( self, text: str, *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[True], atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> EmbedResult[list[float]]: ... @overload def embed( self, text: list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[True], atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> EmbedResult[list[list[float]]]: ... @overload def embed( self, text: str | list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[True], atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> EmbedResult[list[Any]]: ... # return type unknown @overload def embed( self, text: str | list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: bool = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> Any: ... def embed( self, text: str | list[str], *, prefix: str | None = None, dimensionality: int | None = None, long_text_mode: str = "mean", return_dict: bool = False, atlas: bool = False, cancel_cb: EmbCancelCallbackType | None = None, ) -> Any: """ Generate one or more embeddings. Args: text: A text or list of texts to generate embeddings for. prefix: The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed, this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document` or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies. dimensionality: The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size. long_text_mode: How to handle texts longer than the model can accept. One of `mean` or `truncate`. return_dict: Return the result as a dict that includes the number of prompt tokens processed. atlas: Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens with long_text_mode="mean" will raise an error. Disabled by default. cancel_cb: Called with arguments (batch_sizes, backend_name). 
##### `__init__(model_name=None, *, n_threads=None, device=None, **kwargs)`

Constructor

Parameters:

* **`n_threads`** (`int | None`, default: `None` ) – number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically.
* **`device`** (`str | None`, default: `None` ) – The processing unit on which the embedding model will run. See the `GPT4All` constructor for more info.
* **`kwargs`** (`Any`, default: `{}` ) – Remaining keyword arguments are passed to the `GPT4All` constructor.

Source code in `gpt4all/gpt4all.py`

```python
def __init__(self, model_name: str | None = None, *, n_threads: int | None = None, device: str | None = None, **kwargs: Any):
    """
    Constructor

    Args:
        n_threads: number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically.
        device: The processing unit on which the embedding model will run. See the `GPT4All` constructor for more info.
        kwargs: Remaining keyword arguments are passed to the `GPT4All` constructor.
    """
    if model_name is None:
        model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
    self.gpt4all = GPT4All(model_name, n_threads=n_threads, device=device, **kwargs)
```

##### `close()`

Delete the model instance and free associated system resources.

Source code in `gpt4all/gpt4all.py`

```python
def close(self) -> None:
    """Delete the model instance and free associated system resources."""
    self.gpt4all.close()
```

##### `embed(text, *, prefix=None, dimensionality=None, long_text_mode='mean', return_dict=False, atlas=False, cancel_cb=None)`

Generate one or more embeddings.

Parameters:

* **`text`** (`str | list[str]`) – A text or list of texts to generate embeddings for.
* **`prefix`** (`str | None`, default: `None` ) – The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed, this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document` or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies.
* **`dimensionality`** (`int | None`, default: `None` ) – The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size.
* **`long_text_mode`** (`str`, default: `'mean'` ) – How to handle texts longer than the model can accept. One of `mean` or `truncate`.
* **`return_dict`** (`bool`, default: `False` ) – Return the result as a dict that includes the number of prompt tokens processed.
* **`atlas`** (`bool`, default: `False` ) – Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens with long_text_mode="mean" will raise an error. Disabled by default.
* **`cancel_cb`** (`EmbCancelCallbackType | None`, default: `None` ) – Called with arguments (batch_sizes, backend_name). Return true to cancel embedding.

Returns:

* `Any` – With return_dict=False, an embedding or list of embeddings of your text(s).
* `Any` – With return_dict=True, a dict with keys 'embeddings' and 'n_prompt_tokens'.

Raises:

* `CancellationError` – If cancel_cb returned True and embedding was canceled.

Source code in `gpt4all/gpt4all.py`

```python
def embed(
    self,
    text: str | list[str],
    *,
    prefix: str | None = None,
    dimensionality: int | None = None,
    long_text_mode: str = "mean",
    return_dict: bool = False,
    atlas: bool = False,
    cancel_cb: EmbCancelCallbackType | None = None,
) -> Any:
    """
    Generate one or more embeddings.

    Args:
        text: A text or list of texts to generate embeddings for.
        prefix: The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed,
            this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document`
            or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies.
        dimensionality: The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size.
        long_text_mode: How to handle texts longer than the model can accept. One of `mean` or `truncate`.
        return_dict: Return the result as a dict that includes the number of prompt tokens processed.
        atlas: Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens
            with long_text_mode="mean" will raise an error. Disabled by default.
        cancel_cb: Called with arguments (batch_sizes, backend_name). Return true to cancel embedding.

    Returns:
        With return_dict=False, an embedding or list of embeddings of your text(s).
        With return_dict=True, a dict with keys 'embeddings' and 'n_prompt_tokens'.

    Raises:
        CancellationError: If cancel_cb returned True and embedding was canceled.
    """
    if dimensionality is None:
        dimensionality = -1
    else:
        if dimensionality <= 0:
            raise ValueError(f"Dimensionality must be None or a positive integer, got {dimensionality}")
        if dimensionality < self.MIN_DIMENSIONALITY:
            warnings.warn(
                f"Dimensionality {dimensionality} is less than the suggested minimum of {self.MIN_DIMENSIONALITY}."
                " Performance may be degraded."
            )
    try:
        do_mean = {"mean": True, "truncate": False}[long_text_mode]
    except KeyError:
        raise ValueError(f"Long text mode must be one of 'mean' or 'truncate', got {long_text_mode!r}")
    result = self.gpt4all.model.generate_embeddings(text, prefix, dimensionality, do_mean, atlas, cancel_cb)
    return result if return_dict else result["embeddings"]
```
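A sketch of `embed()` with the optional arguments documented above; the Nomic Embed filename is an assumption (any Matryoshka-capable embedding model from the model list would do), and the chosen dimensionality is purely illustrative:

```python
from gpt4all import Embed4All

embedder = Embed4All("nomic-embed-text-v1.5.f16.gguf")  # assumed filename of a Matryoshka-capable Nomic Embed model

result = embedder.embed(
    ["first document", "second document"],  # a list of texts returns a list of embeddings
    prefix="search_document",               # task prefix, as documented above
    dimensionality=128,                     # values below MIN_DIMENSIONALITY (64) trigger a warning
    return_dict=True,                       # also report the number of prompt tokens processed
)
print(result["n_prompt_tokens"], len(result["embeddings"][0]))
embedder.close()
```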
---

# GPT4All Documentation

Source: https://docs.gpt4all.io

---

# GPT4All Documentation

GPT4All runs large language models (LLMs) privately on everyday desktops & laptops. No API calls or GPUs required - you can just download the application and [get started](gpt4all_desktop/quickstart.html#quickstart).

Desktop Application

GPT4All runs LLMs as an application on your computer. Nomic's embedding models can bring information from your local documents and files into your chats. It's fast, on-device, and completely **private**.

[Download for Windows](https://gpt4all.io/installers/gpt4all-installer-win64.exe) [Download for Mac](https://gpt4all.io/installers/gpt4all-installer-darwin.dmg) [Download for Linux](https://gpt4all.io/installers/gpt4all-installer-linux.run)

Python SDK

Use GPT4All in Python to program with LLMs implemented with the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) backend and [Nomic's C backend](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-backend). Nomic contributes to open source software like [`llama.cpp`](https://github.com/ggerganov/llama.cpp) to make LLMs accessible and efficient **for all**.

```
pip install gpt4all
```

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
```