# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_api_server/home.html

# GPT4All API Server

GPT4All provides a local API server that allows you to run LLMs over an HTTP API.

## Key Features

* **Local Execution**: Run models on your own hardware for privacy and offline use.
* **LocalDocs Integration**: Run the API with relevant text snippets provided to your LLM from a [LocalDocs collection](../gpt4all_desktop/localdocs.html).
* **OpenAI API Compatibility**: Use existing OpenAI-compatible clients and tools with your local models.

## Activating the API Server

1. Open the GPT4All Chat Desktop Application.
2. Go to `Settings` > `Application` and scroll down to `Advanced`.
3. Check the box for the `"Enable Local API Server"` setting.
4. The server listens on port 4891 by default. You can choose another port number in the `"API Server Port"` setting.

## Connecting to the API Server

The base URL for the API server is `http://localhost:4891/v1` (or `http://localhost:<port>/v1` if you are using a different port number). The server only accepts HTTP connections (not HTTPS) and only listens on the IPv4 localhost address `127.0.0.1`, not the IPv6 localhost address `::1`.
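
Because the server is OpenAI-compatible, any OpenAI-style client can point at it. Below is a minimal sketch using the `openai` Python package; it assumes the server is enabled on the default port, that the local server does not enforce an API key, and that "Phi-3 Mini Instruct" has already been downloaded in the desktop app.

```python
# Minimal sketch: talk to the GPT4All local API server with the openai package.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4891/v1",  # local GPT4All API server
    api_key="not-needed",                 # placeholder; assumed not to be checked locally
)

response = client.chat.completions.create(
    model="Phi-3 Mini Instruct",
    messages=[{"role": "user", "content": "Who is Lionel Messi?"}],
    max_tokens=50,
    temperature=0.28,
)
print(response.choices[0].message.content)
```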
## Examples

Example GPT4All API calls:

cURL:

```bash
curl -X POST http://localhost:4891/v1/chat/completions -d '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role": "user", "content": "Who is Lionel Messi?"}],
  "max_tokens": 50,
  "temperature": 0.28
}'
```

PowerShell:

```powershell
Invoke-WebRequest -URI http://localhost:4891/v1/chat/completions -Method POST -ContentType application/json -Body '{
  "model": "Phi-3 Mini Instruct",
  "messages": [{"role": "user", "content": "Who is Lionel Messi?"}],
  "max_tokens": 50,
  "temperature": 0.28
}'
```

## API Endpoints

Method | Path | Description
---|---|---
GET | `/v1/models` | List available models
GET | `/v1/models/<name>` | Get details of a specific model
POST | `/v1/completions` | Generate text completions
POST | `/v1/chat/completions` | Generate chat completions

## LocalDocs Integration

You can use LocalDocs with the API server:

1. Open the Chats view in the GPT4All application.
2. Scroll to the bottom of the chat history sidebar.
3. Select the server chat (it has a different background color).
4. Activate LocalDocs collections in the right sidebar. (Note: LocalDocs can currently only be activated through the GPT4All UI, not via the API itself.)

Now, your API calls to your local LLM will have relevant references from your LocalDocs collection retrieved and placed in the input message for the LLM to respond to.

The references retrieved for your API call can be accessed in the API response object at `response["choices"][0]["references"]`.

The data included in the `references` are:

* `text`: the actual text content from the snippet that was extracted from the reference document
* `author`: the author of the reference document (if available)
* `date`: the date of creation of the reference document (if available)
* `page`: the page number the snippet is from (only available for PDF documents for now)
* `title`: the title of the reference document (if available)
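
As an illustration, here is a minimal sketch that reads those references from a chat completion response using the `requests` package; it assumes the server is running on the default port and that a LocalDocs collection has been activated for the server chat in the UI.

```python
# Minimal sketch: inspect LocalDocs references returned by the local API server.
import requests

resp = requests.post(
    "http://localhost:4891/v1/chat/completions",
    json={
        "model": "Phi-3 Mini Instruct",
        "messages": [{"role": "user", "content": "Summarize my notes on budgeting."}],
        "max_tokens": 100,
    },
).json()

print(resp["choices"][0]["message"]["content"])
for ref in resp["choices"][0].get("references", []):
    # Each reference carries the fields listed above (text, author, date, page, title).
    print(ref.get("title"), ref.get("page"), ref.get("text", "")[:80])
```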
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/chat_templates.html

# Chat Templates

## What are chat templates?

Natively, large language models only know how to complete plain text and do not know the difference between their input and their output. In order to support a chat with a person, LLMs are designed to use a template to convert the conversation to plain text using a specific format.

For a given model, it is important to use an appropriate chat template, as each model is designed to work best with a specific format. The chat templates included with the built-in models should be sufficient for most purposes.

There are two reasons you would want to alter the chat template:

* You are sideloading a model and there is no chat template available, or
* You would like to have greater control over the input to the LLM than a system message provides.

## What is a system message?

A system message is a message that controls the responses from the LLM in a way that affects the entire conversation. System messages can be short, such as "Speak like a pirate.", or they can be long and contain a lot of context for the LLM to keep in mind.

Not all models are designed to use a system message, so system messages work better with some models than others.

## How do I customize the chat template or system message?

To customize the chat template or system message, go to Settings > Model. Make sure to select the correct model at the top. If you clone a model, you can give it a chat template or system message that differs from the base model's, enabling you to use different settings for each conversation.

These settings take effect immediately. After changing them, you can click "Redo last response" in the chat view, and the response will take the new settings into account.

## Do I need to write a chat template?

You typically do not need to write your own chat template. The exception is models that are not in the official model list and do not come with a chat template built-in.
These will show a "Clear" option above the chat template field on the Model Settings page instead of a "Reset" option. See the section on finding or creating a chat template.

## What changed in GPT4All v3.5?

GPT4All v3.5 overhauled the chat template system. There are three crucial differences:

* The chat template now formats an entire conversation instead of a single pair of messages,
* The chat template now uses Jinja syntax instead of `%1` and `%2` placeholders,
* And the system message should no longer contain control tokens or trailing whitespace.

If you are using any chat templates or system messages that had been added or altered from the default before upgrading to GPT4All v3.5 or newer, these will no longer work. See below for how to solve common errors you may see after upgrading.

## Error/Warning: System message is not plain text.

This is easy to fix. Go to the model's settings and look at the system prompt. There are three things to look for:

* Control tokens such as `<|im_start|>`, `<|start_header_id|>`, or `<|system|>`
* A prefix such as `### System` or `SYSTEM:`
* Trailing whitespace, such as a space character or blank line.

If you see any of these things, remove them. For example, this legacy system prompt:

```
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|>
```

Should become this:

```
You are a helpful assistant.
```

If you do not see anything that needs to be changed, you can dismiss the error by making a minor modification to the message and then changing it back.

If you see a warning, your system message does not appear to be plain text. If you believe this warning is incorrect, it can be safely ignored. If in doubt, ask on the [Discord](https://discord.gg/mGZE39AS3e).

## Error: Legacy system prompt needs to be updated in Settings.

This is the same as above, but appears on the chat page.

## Error/Warning: Chat template is not in Jinja format.

This is the result of attempting to use an old-style template (possibly from a previous version) in GPT4All 3.5+.

Go to the Model Settings page and select the affected model. If you see a "Reset" button, and you have not intentionally modified the prompt template, you can click "Reset". Otherwise, this is what you can do:

1. Back up your chat template by copying it safely to a text file and saving it. In the next step, it will be removed from GPT4All.
2. Click "Reset" or "Clear".
3. If you clicked "Clear", the chat template is now gone. Follow the steps to find or create a basic chat template for your model.
4. Customize the chat template to suit your needs. For help, read the section about creating a chat template.

## Error: Legacy prompt template needs to be updated in Settings.

This is the same as above, but appears on the chat page.

## The chat template has a syntax error.

If there is a syntax error while editing the chat template, the details will be displayed in an error message above the input box. This could be because the chat template is not actually in Jinja format (see above). Otherwise, you have either typed something incorrectly, or the model comes with a template that is incompatible with GPT4All. See the below section on creating chat templates and make sure that everything is correct. When in doubt, ask on the [Discord](https://discord.gg/mGZE39AS3e).

## Error: No chat template configured.

This may appear for models that are not from the official model list and do not include a chat template. Older versions of GPT4All picked a poor default in this case.
You will get much better results if you follow the steps to find or create a chat template for your model.

## Error: The chat template cannot be blank.

If the button above the chat template on the Model Settings page says "Clear", see above. If you see "Reset", click that button to restore a reasonable default. Also see the section on syntax errors.

## How do I find a chat template?

When in doubt, you can always ask the [Discord](https://discord.gg/mGZE39AS3e) community for help. Below are the instructions to find one on your own.

The authoritative source for a model's chat template is the HuggingFace repo that the original (non-GGUF) model came from. First, you should find this page. If you just have a model file, you can try a Google search for the model's name. If you know the page you downloaded the GGUF model from, its README usually links to the original non-GGUF model.

Once you have located the original model, there are two methods you can use to extract its chat template. Pick whichever one you are most comfortable with.

### Using the CLI (all models)

1. Install `jq` using your preferred package manager - e.g. Chocolatey (Windows), Homebrew (macOS), or apt (Ubuntu).
2. Download `tokenizer_config.json` from the model's "Files and versions" tab.
3. Open a command prompt in the directory to which you downloaded the file.
4. Run `jq -r ".chat_template" tokenizer_config.json`. This shows the chat template in a human-readable form. You can copy this and paste it into the settings page.
5. (Optional) You can save the output to a text file like this: `jq -r ".chat_template" tokenizer_config.json >chat_template.txt`

If the output is "null", the model does not provide a chat template. See the below instructions on creating a chat template.

### Python (open models)

1. Install `transformers` using your preferred Python package manager, e.g. `pip install transformers`. Make sure it is at least version v4.43.0.
2. Copy the ID of the HuggingFace model, using the clipboard icon next to the name. For example, if the URL is `https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B`, the ID is `NousResearch/Hermes-2-Pro-Llama-3-8B`.
3. Open a Python interpreter (`python`) and run the following commands. Change the model ID in the example to the one you copied.

    ```python
    >>> from transformers import AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained('NousResearch/Hermes-2-Pro-Llama-3-8B')
    >>> print(tokenizer.get_chat_template())
    ```

    You can copy the output and paste it into the settings page.

4. (Optional) You can save the output to a text file like this:

    ```python
    >>> open('chat_template.txt', 'w').write(tokenizer.get_chat_template())
    ```

If you get a ValueError exception, this model does not provide a chat template. See the below instructions on creating a chat template.

### Python (gated models)

Some models, such as Llama and Mistral, do not allow public access to their chat template. You must either use the CLI method above, or follow these instructions to use Python:

1. For these steps, you must have git and git-lfs installed.
2. You must have a HuggingFace account and be logged in.
3. You must already have access to the gated model. Otherwise, request access.
4. You must have an SSH key configured for git access to HuggingFace.
5. `git clone` the model's HuggingFace repo using the SSH clone URL. There is no need to download the entire model, which is very large.
    A good way to do this on Linux is:

    ```console
    $ GIT_LFS_SKIP_SMUDGE=1 git clone hf.co:meta-llama/Llama-3.1-8B-Instruct.git
    $ cd Llama-3.1-8B-Instruct
    $ git lfs pull -I "tokenizer.*"
    ```

6. Follow the above instructions for open models, but replace the model ID with the path to the directory containing `tokenizer_config.json`:

    ```python
    >>> tokenizer = AutoTokenizer.from_pretrained('.')
    ```

## Advanced: How do chat templates work?

The chat template is applied to the entire conversation you see in the chat window. The template loops over the list of messages, each containing `role` and `content` fields. `role` is either `user`, `assistant`, or `system`. GPT4All also supports the special variables `bos_token`, `eos_token`, and `add_generation_prompt`. See the [HuggingFace docs](https://huggingface.co/docs/transformers/v4.46.3/en/chat_templating#special-variables) for what those do.
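
To make this concrete, here is an illustrative sketch (using Python's `jinja2` package, not GPT4All's own template engine) of how a ChatML-style chat template turns the messages list into the plain text the model actually sees. The template and control tokens here are only an example, not the template of any particular GPT4All model.

```python
# Illustrative sketch: render a Jinja chat template over a conversation.
from jinja2 import Template

chat_template = (
    "{%- for message in messages %}"
    "{{- '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>\\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}"
    "{{- '<|im_start|>assistant\\n' }}"
    "{%- endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
]

# add_generation_prompt appends the opening tokens of the assistant's reply.
print(Template(chat_template).render(messages=messages, add_generation_prompt=True))
```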
## Advanced: How do I make a chat template?

The best way to create a chat template is to start by using an existing one as a reference. Then, modify it to use the format documented for the given model. Its README page may explicitly give an example of its template, or it may mention the name of a well-known standard template, such as ChatML, Alpaca, or Vicuna. GPT4All does not yet include presets for these templates, so they will have to be found in other models or taken from the community.

For more information, see the very helpful [HuggingFace guide](https://huggingface.co/docs/transformers/v4.46.3/en/chat_templating#advanced-template-writing-tips). Some of this is not applicable, such as the information about tool calling and RAG - GPT4All implements those features differently.

Some models use a prompt template that does not intuitively map to a multi-turn chat, because it is intended more for single instructions. The [FastChat](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) implementation of these templates is a useful reference for the correct way to extend them to multiple messages.

## Advanced: What are GPT4All v1 templates?

GPT4All supports its own template syntax, which is nonstandard but provides complete control over the way LocalDocs sources and file attachments are inserted into the conversation. These templates begin with `{# gpt4all v1 #}` and look similar to the example below.

For standard templates, GPT4All combines the user message, sources, and attachments into the `content` field. For GPT4All v1 templates, this is not done, so they must be used directly in the template for those features to work correctly.

```jinja
{# gpt4all v1 #}
{%- for message in messages %}
    {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }}
    {%- if message['role'] == 'user' %}
        {%- for source in message['sources'] %}
            {%- if loop.first %}
                {{- '### Context:\n' }}
            {%- endif %}
            {{- 'Collection: ' + source['collection'] + '\n' + 'Path: ' + source['path'] + '\n' + 'Excerpt: ' + source['text'] + '\n\n' }}
        {%- endfor %}
    {%- endif %}
    {%- for attachment in message['prompt_attachments'] %}
        {{- attachment['processed_content'] + '\n\n' }}
    {%- endfor %}
    {{- message['content'] | trim }}
    {{- '<|eot_id|>' }}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
```

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/chats.html

# Chats

Chats are conversations with [language models](models.html) that run locally on your device.

Example chats with `Llama 3` and `Nous Hermes 2 Mistral DPO` are shown below.

Download `Llama 3` and prompt:

```
explain why the sky is blue in a way that is correct and makes sense to a child
```

With the default sampling settings, you should see text resembling the following:

What a great question! So, you know how we can see different colors like red, yellow, green, and orange? Well, when sunlight enters Earth's atmosphere, it starts to interact with tiny particles called molecules of gases like nitrogen (N2) and oxygen (O2). These molecules are really small, but they're all around us!

Now, here's the cool part: these molecules scatter, or bounce off, shorter wavelengths of light more than longer wavelengths. Think of it like a game of pool where the cue ball hits other balls with different sizes and speeds.

The shortest wavelength of visible light is violet (about 400 nanometers), followed by blue (around 450-495 nanometers). These shorter wavelengths are scattered in all directions, kind of like how a basketball player might bounce the ball around on the court. This scattering effect makes the sky appear blue to our eyes!
Here's why it doesn't look as bright or intense as a blue crayon: there are many more molecules in the atmosphere that scatter longer wavelengths (like red and orange) than shorter ones, so those colors get spread out and become less noticeable.

To sum it up:

1. Sunlight enters Earth's atmosphere.
2. Tiny gas molecule particles interact with light, scattering shorter wavelengths like blue and violet.
3. This scattered blue light reaches our eyes from all directions, making the sky appear blue!

Now, go outside and enjoy that beautiful blue sky!

Download `Nous Hermes 2 Mistral DPO` and prompt:

```
write me a react app i can run from the command line to play a quick game
```

With the default sampling settings, you should see text and code blocks resembling the following:

I'll provide you with a simple React app that allows you to play a guessing game. This app will generate a random number between 1 and 10, and ask you to guess it. You can run this app locally using Create React App. Follow the steps below:

Open your terminal or command line interface. Navigate to the directory where you want to create the project (e.g., `cd Documents/Projects`). Run the following commands:

```
npx create-react-app guessing-game
cd guessing-game
```

Open the project in your preferred code editor by running `code .` (for VS Code) or any other command to open the folder with your text editor.

Replace the content of the `src/App.js` file with the following code:

```jsx
import React, { useState } from 'react';

function App() {
  const [guess, setGuess] = useState('');
  const [message, setMessage] = useState('');
  const [numberToGuess, setNumberToGuess] = useState(Math.floor(Math.random() * 10) + 1);

  function handleChange(e) {
    setGuess(e.target.value);
  }

  function handleSubmit(e) {
    e.preventDefault();
    if (parseInt(guess) === numberToGuess) {
      setMessage('You guessed it right!');
    } else {
      setMessage(`Wrong guess, the correct number was ${numberToGuess}. Try again next time.`);
    }
  }

  return (
    <div>
      <h1>Guess a Number</h1>
      <form onSubmit={handleSubmit}>
        <input type="text" value={guess} onChange={handleChange} />
        <button type="submit">Submit</button>
      </form>
      <p>{message}</p>
    </div>
  );
}

export default App;
```

Save the file and go back to your terminal or command line interface. Run `npm start`. This will start a local development server, and you can access the app in your browser at http://localhost:3000.

Play the guessing game by entering a number between 1 and 10 into the input field and clicking "Submit". The app will tell you if your guess is correct or not. Remember that this is just a simple example, and you can expand upon it to make the game more interesting with additional features like high scores, multiple difficulty levels, etc.

## New Chat

Choose a model with the dropdown at the top of the Chats page. If you don't have any models, [download one](models.html#download-models). Once you have models, you can start chats by loading your default model, which you can configure in [settings](settings.html#application-settings).

![Choose a model](../assets/three_model_options.png)

## LocalDocs

Open the [LocalDocs](localdocs.html) panel with the button in the top-right corner to bring your files into the chat. With LocalDocs, your chats are enhanced with semantically related snippets from your files included in the model's context.

![Open LocalDocs](../assets/open_local_docs.png)

## Chat History

View your chat history with the button in the top-left corner of the Chats page.

![Close chats](../assets/closed_chat_panel.png) | ![Open chats](../assets/open_chat_panel.png)
---|---

You can change a chat name or delete it from your chat history at any time.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-Obsidian.html

# Using GPT4All to Privately Chat with your Obsidian Vault

Obsidian for Desktop is powerful note-taking and knowledge-management software for creating and organizing markdown notes. This tutorial shows you how to sync and access your Obsidian note files directly on your computer. By connecting them to LocalDocs, you can integrate these files into your LLM chats for private access and enhanced context.

## Download Obsidian for Desktop
1. **Download Obsidian for Desktop**:
    * Visit the [Obsidian website](https://obsidian.md) and create an account.
    * Click the Download button in the center of the homepage.
    * For more help with installing Obsidian, see [Getting Started with Obsidian](https://help.obsidian.md/Getting+started/Download+and+install+Obsidian).
2. **Set Up Obsidian**:
    * Launch Obsidian from your Applications folder (macOS), Start menu (Windows), or equivalent location (Linux).
    * On the welcome screen, you can either create a new vault (a collection of notes) or open an existing one.
    * To create a new vault, click Create a new vault, name your vault, choose a location on your computer, and click Create.
3. **Sign in and Sync**:
    * Once installed, you can start adding and organizing notes.
    * Choose the folders you want to sync to your computer.

## Connect Obsidian to LocalDocs

1. **Open LocalDocs**:
    * Navigate to the LocalDocs feature within GPT4All.

    ![LocalDocs interface](https://github.com/nomic-ai/gpt4all/assets/132290469/d8fb2d79-2063-45d4-bcce-7299fb75b144)

2. **Add Collection**:
    * Click on **+ Add Collection** to begin linking your Obsidian Vault.
    * Name your collection.

    ![Adding a collection](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_adding_collection.png)

3. **Create Collection**:
    * Click **Create Collection** to initiate the embedding process. Progress will be displayed within the LocalDocs interface.
4. **Access Files in Chats**:
    * Load a model to chat with your files (Llama 3 Instruct is the fastest).
    * In your chat, open 'LocalDocs' with the button in the top-right corner to provide context from your synced Obsidian notes.

    ![Accessing LocalDocs in chats](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_docs.png)

5. **Interact With Your Notes**:
    * Use the model to interact with your files.

    ![Obsidian user interaction](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/osbsidian_user_interaction.png)

    ![Obsidian GPT4All response](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_response.png)

6. **View Referenced Files**:
    * Click on **Sources** below LLM responses to see which Obsidian notes were referenced.

    ![Referenced files](https://raw.githubusercontent.com/nomic-ai/gpt4all/124ef867a9d9afd9e14d3858cd77bce858f79773/gpt4all-bindings/python/docs/assets/obsidian_sources.png)

## How It Works

Obsidian for Desktop syncs your Obsidian notes to your computer, while LocalDocs integrates these files into your LLM chats using embedding models. These models find semantically similar snippets from your files to enhance the context of your interactions. To learn more about embedding models and explore further, refer to the [Nomic Python SDK documentation](https://docs.nomic.ai/atlas/capabilities/embeddings).
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-One-Drive.html

# Using GPT4All to Privately Chat with your OneDrive Data

Local and Private AI Chat with your OneDrive Data

OneDrive for Desktop allows you to sync and access your OneDrive files directly on your computer. By connecting your synced directory to LocalDocs, you can start using GPT4All to privately chat with data stored in your OneDrive.

## Download OneDrive for Desktop

1. **Download OneDrive for Desktop**:
    * Visit [Microsoft OneDrive](https://www.microsoft.com/en-us/microsoft-365/onedrive/download).
    * Press 'Download' for your respective device type.
    * Download the OneDrive for Desktop application.
2. **Install OneDrive for Desktop**:
    * Run the installer file you downloaded.
    * Follow the prompts to complete the installation process.
3. **Sign in and Sync**:
    * Once installed, sign in to OneDrive for Desktop with your Microsoft account credentials.
    * Choose the folders you want to sync to your computer.

## Connect OneDrive to LocalDocs

1. **Install GPT4All and Open LocalDocs**:
    * Go to [nomic.ai/gpt4all](https://nomic.ai/gpt4all) to install GPT4All for your operating system.
    * Navigate to the LocalDocs feature within GPT4All to configure it to use your synced OneDrive directory.

    ![LocalDocs interface](https://github.com/nomic-ai/gpt4all/assets/132290469/54254bc0-d9a0-40c4-9fd1-5059abaad583)

2. **Add Collection**:
    * Click on **+ Add Collection** to begin linking your OneDrive folders.
    * Name the Collection and specify the OneDrive folder path.

    ![Add Collection](https://github.com/nomic-ai/gpt4all/assets/132290469/7f12969a-753a-4757-bb9e-9b607cf315ca)

3. **Create Collection**:
    * Click **Create Collection** to initiate the embedding process. Progress will be displayed within the LocalDocs interface.
4. **Access Files in Chats**:
    * Load a model within GPT4All to chat with your files.
    * In your chat, open 'LocalDocs' using the button in the top-right corner to provide context from your synced OneDrive files.

    ![LocalDocs in a chat](https://github.com/nomic-ai/gpt4all/assets/132290469/b5a67fe6-0d6a-42ae-b3b8-cc0f91cbf5b1)

5. **Interact With Your OneDrive**:
    * Use the model to interact with your files directly from OneDrive.

    ![OneDrive user interaction](https://github.com/nomic-ai/gpt4all/assets/132290469/2c9815b8-3d1c-4179-bf76-3ddbafb193bf)

    ![OneDrive GPT4All response](https://github.com/nomic-ai/gpt4all/assets/132290469/ce8be292-b025-415a-bd54-f11868e0cd0a)

6. **View Referenced Files**:
    * Click on **Sources** below responses to see which OneDrive files were referenced.

    ![Referenced files](https://github.com/nomic-ai/gpt4all/assets/132290469/6fe3f10d-2791-4153-88a7-2198ab3ac945)

## How It Works

OneDrive for Desktop syncs your OneDrive files to your computer, while LocalDocs maintains a database of these synced files for use by your local GPT4All model. As your OneDrive updates, LocalDocs will automatically detect file changes and stay up to date. LocalDocs leverages [Nomic Embedding](https://docs.nomic.ai/atlas/capabilities/embeddings) models to find semantically similar snippets from your files, enhancing the context of your interactions.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-google-drive.html

# Using GPT4All to Privately Chat with your Google Drive Data

Local and Private AI Chat with your Google Drive Data

Google Drive for Desktop allows you to sync and access your Google Drive files directly on your computer. By connecting your synced directory to LocalDocs, you can start using GPT4All to privately chat with data stored in your Google Drive.

## Download Google Drive for Desktop

1. **Download Google Drive for Desktop**:
    * Visit [drive.google.com](https://drive.google.com) and sign in with your Google account.
    * Navigate to the **Settings** (gear icon) and select **Settings** from the dropdown menu.
    * Scroll down to **Google Drive for desktop** and click **Download**.
2. **Install Google Drive for Desktop**:
    * Run the installer file you downloaded.
    * Follow the prompts to complete the installation process.
3. **Sign in and Sync**:
    * Once installed, sign in to Google Drive for Desktop with your Google account credentials.
    * Choose the folders you want to sync to your computer.

For advanced help, see [Setting up Google Drive for Desktop](https://support.google.com/drive/answer/10838124?hl=en).

## Connect Google Drive to LocalDocs

1. **Install GPT4All and Open LocalDocs**:
    * Go to [nomic.ai/gpt4all](https://nomic.ai/gpt4all) to install GPT4All for your operating system.
    * Navigate to the LocalDocs feature within GPT4All to configure it to use your synced directory.

    ![LocalDocs interface](https://github.com/nomic-ai/gpt4all/assets/132290469/d8fb2d79-2063-45d4-bcce-7299fb75b144)

2. **Add Collection**:
    * Click on **+ Add Collection** to begin linking your Google Drive folders.
    * Name your collection.

    ![Add Collection](https://github.com/nomic-ai/gpt4all/assets/132290469/39063615-9eb6-4c47-bde7-c9f04f9b168b)

3. **Create Collection**:
    * Click **Create Collection** to initiate the embedding process. Progress will be displayed within the LocalDocs interface.
4. **Access Files in Chats**:
    * Load a model to chat with your files (Llama 3 Instruct performs best).
    * In your chat, open 'LocalDocs' with the button in the top-right corner to provide context from your synced Google Drive files.

    ![LocalDocs in a chat](https://github.com/nomic-ai/gpt4all/assets/132290469/ce68811f-9abd-451b-ac0a-fb941e185d7a)

5. **Interact With Your Drive**:
    * Use the model to interact with your files.

    ![Google Drive user interaction](https://github.com/nomic-ai/gpt4all/assets/132290469/bc55bc36-e613-419d-a568-adb1cd993854)

    ![Google Drive GPT4All response](https://github.com/nomic-ai/gpt4all/assets/132290469/1c0fd19a-5a22-4726-a841-d26c1bea81fc)

6. **View Referenced Files**:
    * Click on **Sources** below LLM responses to see which Google Drive files were referenced.

    ![Referenced files](https://github.com/nomic-ai/gpt4all/assets/132290469/78527d30-8d24-4b4c-8311-b611a2d66fcd)

## How It Works

Google Drive for Desktop syncs your Google Drive files to your computer, while LocalDocs maintains a database of these synced files for use by your local LLM. As your Google Drive updates, LocalDocs will automatically detect file changes and stay up to date. LocalDocs is powered by [Nomic Embedding](https://docs.nomic.ai/atlas/capabilities/embeddings) models, which find semantically similar snippets from your files to enhance the context of your interactions.
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-microsoft-excel.html

# Using GPT4All to Privately Chat with your Microsoft Excel Spreadsheets

Local and Private AI Chat with your Microsoft Excel Spreadsheets

Microsoft Excel allows you to create, manage, and analyze data in spreadsheet format. By attaching your spreadsheets directly to GPT4All, you can privately chat with the AI to query and explore the data, enabling you to summarize, generate reports, and glean insights from your files, all within your conversation.

## Attach Microsoft Excel to your GPT4All Conversation

1. **Install GPT4All and Open**:
    * Go to [nomic.ai/gpt4all](https://nomic.ai/gpt4all) to install GPT4All for your operating system.
    * Navigate to the Chats view within GPT4All.

    ![Chat view](../../assets/chat_window.png)

2. **Example Spreadsheet**:

    ![Spreadsheet view](../../assets/disney_spreadsheet.png)

3. **Attach to GPT4All conversation**:

    ![Attach view](../../assets/attach_spreadsheet.png)

4. **Have GPT4All Summarize and Generate a Report**:

    ![Attach view](../../assets/spreadsheet_chat.png)

## How It Works

GPT4All parses your attached Excel spreadsheet into Markdown, a format understandable to LLMs, and adds the Markdown text to the context for your LLM chat. You can view the code that converts `.xlsx` to Markdown [here](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/src/xlsxtomd.cpp) in the GPT4All GitHub repo.

For example, the above spreadsheet titled `disney_income_stmt.xlsx` would be formatted the following way:

```
## disney_income_stmt

|Walt Disney Co.|||||||
|---|---|---|---|---|---|---|
|Consolidated Income Statement|||||||
||||||||
|US$ in millions|||||||
|12 months ended:|2023-09-30 00:00:00|2022-10-01 00:00:00|2021-10-02 00:00:00|2020-10-03 00:00:00|2019-09-28 00:00:00|2018-09-29 00:00:00|
|Services|79562|74200|61768|59265|60542|50869|
...
```
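
The conversion itself happens inside the desktop app (in C++), but the idea is easy to reproduce. Below is a rough Python sketch of the same transformation using `pandas` (with `openpyxl` and `tabulate` installed); the filename matches the example above and is otherwise an assumption.

```python
# Rough sketch (not GPT4All's actual converter): render a spreadsheet as a
# Markdown table so an LLM can read it as plain text.
import pandas as pd

df = pd.read_excel("disney_income_stmt.xlsx", sheet_name=0)  # first worksheet
markdown = df.to_markdown(index=False)  # pipe-delimited table like the output above
print(markdown)
```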
## Limitations

It is important to double-check the claims LLMs make about the spreadsheets you provide. LLMs can make mistakes about the data they are presented with, particularly for LLMs with smaller parameter counts (~8B) that fit within the memory of consumer hardware.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/localdocs.html

# LocalDocs

LocalDocs brings the information you have from files on-device into your LLM chats - **privately**.

## Create LocalDocs

1. Click `+ Add Collection`.
2. Name your collection and link it to a folder.

    ![new GOT Docs](../assets/new_docs_annotated.png) | ![new GOT Docs filled out](../assets/new_docs_annotated_filled.png)
    ---|---

3. Click `Create Collection`. Progress for the collection is displayed on the LocalDocs page.

    ![Embedding in progress](../assets/baelor.png)

    You will see a green `Ready` indicator when the entire collection is ready. Note: you can still chat with the files that are ready before the entire collection is ready.

    ![Embedding complete](../assets/got_done.png)

    Later on, if you modify your LocalDocs settings, you can rebuild your collections with your new settings.

4. In your chats, open `LocalDocs` with the button in the top-right corner to give your LLM context from those files.

    ![LocalDocs result](../assets/syrio_snippets.png)

5. See which files were referenced by clicking `Sources` below the LLM responses.

    ![Sources](../assets/open_sources.png)

## How It Works

A LocalDocs collection uses Nomic AI's free and fast on-device embedding models to index your folder into text snippets that each get an **embedding vector**. These vectors allow us to find snippets from your files that are semantically similar to the questions and prompts you enter in your chats. We then include those semantically similar snippets in the prompt to the LLM.
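
As a rough illustration of that idea, here is a minimal sketch using the GPT4All Python SDK's `Embed4All` class together with `numpy` (both assumed to be installed): embed a question and a few snippets, then rank the snippets by cosine similarity.

```python
# Minimal sketch: semantic similarity between a question and text snippets.
import numpy as np
from gpt4all import Embed4All

embedder = Embed4All()  # downloads a local embedding model on first use

snippets = [
    "Winter is coming to the North.",
    "The quarterly budget increased by 12%.",
]
question = "What happened to the budget?"

snippet_vecs = np.array([embedder.embed(s) for s in snippets])
question_vec = np.array(embedder.embed(question))

# Cosine similarity between the question and each snippet.
scores = snippet_vecs @ question_vec / (
    np.linalg.norm(snippet_vecs, axis=1) * np.linalg.norm(question_vec)
)
print(snippets[int(scores.argmax())])  # the most semantically similar snippet
```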
To try the embedding models yourself, we recommend using the [Nomic Python SDK](https://docs.nomic.ai/atlas/capabilities/embeddings).

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/models.html

# Models

GPT4All is optimized to run LLMs in the 3-13B parameter range on consumer-grade hardware. LLMs are downloaded to your device so you can run them locally and privately. With our backend, anyone can interact with LLMs efficiently and securely on their own hardware.

## Download Models

1. Click `Models` in the menu on the left (below `Chats` and above `LocalDocs`). ![Models Page Icon](../assets/models_page_icon.png)
2. Click `+ Add Model` to navigate to the `Explore Models` page. ![Add Model button](../assets/add.png)
3. Search for models available online. ![Explore Models search](../assets/explore.png)
4. Hit `Download` to save a model to your device. ![Download Models button](../assets/download.png)
5. Once the model is downloaded you will see it in `Models`. ![Installed models](../assets/installed_models.png)

## Explore Models

GPT4All connects you with LLMs from HuggingFace with a [`llama.cpp`](https://github.com/ggerganov/llama.cpp) backend so that they will run efficiently on your hardware. Many of these models can be identified by the file type `.gguf`.

![Explore models](../assets/search_mistral.png)

## Example Models

Many LLMs are available at various sizes, quantizations, and licenses.

* LLMs with more parameters tend to be better at coherently responding to instructions
* LLMs with a smaller quantization (e.g. 4-bit instead of 16-bit) are much faster and less memory intensive, and tend to have slightly worse performance
* Licenses vary in their terms for personal and commercial use

Here are a few examples:

Model | Filesize | RAM Required | Parameters | Quantization | Developer | License | MD5 Sum (Unique Hash)
---|---|---|---|---|---|---|---
Llama 3 Instruct | 4.66 GB | 8 GB | 8 billion | q4_0 | Meta | [Llama 3 License](https://llama.meta.com/llama3/license/) | c87ad09e1e4c8f9c35a5fcef52b6f1c9
Nous Hermes 2 Mistral DPO | 4.11 GB | 8 GB | 7 billion | q4_0 | Mistral & Nous Research | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | a5f6b4eabd3992da4d7fb7f020f921eb
Phi-3 Mini Instruct | 2.18 GB | 4 GB | 4 billion | q4_0 | Microsoft | [MIT](https://opensource.org/license/mit) | f8347badde9bfc2efbe89124d78ddaf5
Mini Orca (Small) | 1.98 GB | 4 GB | 3 billion | q4_0 | Microsoft | [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0) | 0e769317b90ac30d6e09486d61fefa26
GPT4All Snoozy | 7.37 GB | 16 GB | 13 billion | q4_0 | Nomic AI | [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html) | 40388eb2f8d16bb5d08c96fdfaac6b2c

### Search Results

You can click the gear icon in the search bar to sort search results by their number of likes, number of downloads, or date of upload (all from HuggingFace).

![Sort search results](../assets/search_settings.png)

## Connect Model APIs

You can add your API key for remote model providers. **Note**: this does not download a model file to your computer for private, on-device use. Instead, your prompts leave your computer, go to the API provider, and the response is returned to your computer.

![Connect APIs](../assets/add_model_gpt4.png)
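
Downloaded models can also be driven from code rather than the desktop UI. Below is a minimal sketch using the GPT4All Python SDK; the model filename is an example and must match a GGUF file you have actually downloaded (otherwise the SDK will attempt to fetch it).

```python
# Minimal sketch: run a downloaded, quantized GGUF model locally from Python.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # example q4_0 model file
with model.chat_session():
    print(model.generate("Why is the sky blue?", max_tokens=200))
```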
---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/quickstart.html

# GPT4All Desktop

The GPT4All Desktop Application allows you to download and run large language models (LLMs) locally & privately on your device. With GPT4All, you can chat with models, turn your local files into information sources for models ([LocalDocs](localdocs.html)), or browse models available online to download onto your device.

[Official Video Tutorial](https://www.youtube.com/watch?v=gQcZDXRVJok)

## Quickstart

1. Install GPT4All for your operating system and open the application: [Download for Windows](https://gpt4all.io/installers/gpt4all-installer-win64.exe), [Download for Mac](https://gpt4all.io/installers/gpt4all-installer-darwin.dmg), or [Download for Linux](https://gpt4all.io/installers/gpt4all-installer-linux.run).
2. Hit `Start Chatting`. ![GPT4All home page](../assets/gpt4all_home.png)
3. Click `+ Add Model`.
4. Download a model. We recommend starting with Llama 3, but you can [browse more models](models.html). ![Download a model](../assets/download_llama.png)
5. Once downloaded, go to Chats (below Home and above Models in the menu on the left).
6. Click "Load Default Model" (will be Llama 3 or whichever model you downloaded). ![Before first chat](../assets/before_first_chat.png) | ![New first chat](../assets/new_first_chat.png)
7. Try the [example chats](chats.html) or your own prompts!

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_desktop/settings.html

# Settings

## Application Settings

General Application Settings:

Setting | Description | Default Value
---|---|---
**Theme** | Color theme for the application. Options are `Light`, `Dark`, and `LegacyDark` | `Light`
**Font Size** | Font size setting for text throughout the application. Options are Small, Medium, and Large | Small
**Language and Locale** | The language and locale you wish to use | System Locale
**Device** | Device that will run your models. Options are `Auto` (GPT4All chooses), `Metal` (Apple Silicon M1+), `CPU`, and `GPU` | `Auto`
**Default Model** | Choose your preferred LLM to load by default on startup | Auto
**Suggestion Mode** | Generate suggested follow up questions at the end of responses | When chatting with LocalDocs
**Download Path** | Select a destination on your device to save downloaded models | Windows: `C:\Users\{username}\AppData\Local\nomic.ai\GPT4All` Mac: `/Users/{username}/Library/Application Support/nomic.ai/GPT4All/` Linux: `/home/{username}/.local/share/nomic.ai/GPT4All`
**Enable Datalake** | Opt-in to sharing interactions with GPT4All community (**anonymous** and **optional**) | Off

Advanced Application Settings

Setting | Description | Default Value
---|---|---
**CPU Threads** | Number of concurrently running CPU threads (more can speed up responses) | 4
**Enable System Tray** | The application will minimize to the system tray / taskbar when the window is closed | Off
**Enable Local Server** | Allow any application on your device to use GPT4All via an OpenAI-compatible GPT4All API | Off
**API Server Port** | Local HTTP port for the local API server | 4891

## Model Settings

Model / Character Settings

Setting | Description | Default Value
---|---|---
**Name** | Unique name of this model / character | set by model uploader
**Model File** | Filename (.gguf) of the model | set by model uploader
**System Message** | General instructions for the chats this model will be used for | set by model uploader
**Chat Template** | Format of user <-> assistant interactions for the chats this model will be used for | set by model uploader
**Chat Name Prompt** | Prompt used to automatically generate chat names | Describe the above conversation in seven words or less.
**Suggested FollowUp Prompt** | Prompt used to automatically generate follow up questions after a chat response | Suggest three very short factual follow-up questions that have not been answered yet or cannot be found inspired by the previous conversation and excerpts.

### Clone

You can **clone** an existing model, which allows you to save a configuration of a model file with different prompt templates and sampling settings.

### Sampling Settings

Model Sampling Settings

Setting | Description | Default Value
---|---|---
**Context Length** | Maximum length of input sequence in tokens | 2048
**Max Length** | Maximum length of response in tokens | 4096
**Prompt Batch Size** | Token batch size for parallel processing | 128
**Temperature** | Lower temperature gives more likely generations | 0.7
**Top P** | Prevents choosing highly unlikely tokens | 0.4
**Top K** | Size of selection pool for tokens | 40
**Min P** | Minimum relative probability | 0
**Repeat Penalty Tokens** | Length to apply penalty | 64
**Repeat Penalty** | Penalize repetitiveness | 1.18
**GPU Layers** | How many model layers to load into VRAM | 32

## LocalDocs Settings

General LocalDocs Settings

Setting | Description | Default Value
---|---|---
**Allowed File Extensions** | Choose which file types will be indexed into LocalDocs collections as text snippets with embedding vectors | `.txt`, `.pdf`, `.md`, `.rst`
**Use Nomic Embed API** | Use Nomic API to create LocalDocs collections fast and off-device; [Nomic API Key](https://atlas.nomic.ai/) required | Off
**Embeddings Device** | Device that will run embedding models. Options are `Auto` (GPT4All chooses), `Metal` (Apple Silicon M1+), `CPU`, and `GPU` | `Auto`
**Show Sources** | Titles of source files retrieved by LocalDocs will be displayed directly in your chats. | On

Advanced LocalDocs Settings

Note that increasing these settings can increase the likelihood of factual responses, but may result in slower generation times.

Setting | Description | Default Value
---|---|---
**Document Snippet Size** | Number of string characters per document snippet | 512
**Maximum Document Snippets Per Prompt** | Upper limit for the number of snippets from your files LocalDocs can retrieve for LLM context | 3

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_help/faq.html

---

# Frequently Asked Questions

## Models

### Which language models are supported?

We support models with a `llama.cpp` implementation which have been uploaded to [HuggingFace](https://huggingface.co/).

### Which embedding models are supported?

We support SBert and Nomic Embed Text v1 & v1.5.

## Software

### What software do I need?

All you need is to [install GPT4All](../index.html) onto your Windows, Mac, or Linux computer.

### Which SDK languages are supported?

Our SDK is in Python for usability, but these are light bindings around [`llama.cpp`](https://github.com/ggerganov/llama.cpp) implementations that we contribute to for efficiency and accessibility on everyday computers.

### Is there an API?
Yes, you can run your model in server-mode with our [OpenAI-compatible API](https://platform.openai.com/docs/api-reference/completions), which you can configure in [settings](../gpt4all_desktop/settings.html#application-settings).

### Can I monitor a GPT4All deployment?

Yes, GPT4All [integrates](../gpt4all_python/monitoring.html) with [OpenLIT](https://github.com/openlit/openlit) so you can deploy LLMs with user interactions and hardware usage automatically monitored for full observability.

### Is there a command line interface (CLI)?

[Yes](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/cli), we have a lightweight CLI built around the Python client. We welcome further contributions!

## Hardware

### What hardware do I need?

GPT4All can run on CPU, Metal (Apple Silicon M1+), and GPU.

### What are the system requirements?

Your CPU needs to support [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) and you need enough RAM to load a model into memory.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_help/troubleshooting.html

---

# Troubleshooting

## Error Loading Models

It is possible you are trying to load a model from HuggingFace whose weights are not compatible with our [backend](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings).

Try downloading one of the officially supported models listed on the main models page in the application. If the problem persists, please share your experience on our [Discord](https://discord.com/channels/1076964370942267462).

## Bad Responses

Try the [example chats](../gpt4all_desktop/chats.html) to double check that your system is implementing models correctly.
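If you are working from the Python SDK rather than the desktop application, a minimal sanity check along the same lines might look like the following sketch. It assumes the `gpt4all` package is installed and that the Llama 3 model file listed on the Models page has already been downloaded; the prompt itself is just an example.

    from gpt4all import GPT4All

    # Load a known-good, officially supported model and run a short prompt
    # in a chat session. Coherent output here indicates the backend and the
    # model file are working.
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    with model.chat_session():
        print(model.generate("Why is the sky blue?", max_tokens=200))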
### Responses Incoherent

If you are seeing something **not at all** resembling the [example chats](../gpt4all_desktop/chats.html) - for example, if the responses you are seeing look nonsensical - try [downloading a different model](../gpt4all_desktop/models.html), and please share your experience on our [Discord](https://discord.com/channels/1076964370942267462).

### Responses Incorrect

LLMs can be unreliable. It's helpful to know what their training data was - they are less likely to be correct when asked about data they were not trained on unless you give the necessary information in the prompt as **context**.

Giving LLMs additional context, like chatting using [LocalDocs](../gpt4all_desktop/localdocs.html), can help merge the language model's ability to understand text with the files that you trust to contain the information you need. Including information in a prompt is not a guarantee that it will be used correctly, but the clearer and more concise your prompts, and the more relevant your prompts are to your files, the better.

### LocalDocs Issues

Occasionally a model - particularly a smaller or overall weaker LLM - may not use the relevant text snippets from the files that were referenced via LocalDocs. If you are seeing this, it can help to use phrases like "in the docs" or "from the provided files" when prompting your model.

---

# GPT4All Documentation

Source: https://docs.gpt4all.io/gpt4all_python/home.html

---

# GPT4All Python SDK

## Installation

To get started, pip-install the `gpt4all` package into your Python environment.

    pip install gpt4all

We recommend installing `gpt4all` into its own virtual environment using `venv` or `conda`.

## Load LLM

Models are loaded by name via the `GPT4All` class. If it's your first time loading a model, it will be downloaded to your device and saved so it can be quickly reloaded next time you create a `GPT4All` model with the same name.
Load LLM

    from gpt4all import GPT4All
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
    with model.chat_session():
        print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))

`GPT4All` model name | Filesize | RAM Required | Parameters | Quantization | Developer | License | MD5 Sum (Unique Hash)
---|---|---|---|---|---|---|---
`Meta-Llama-3-8B-Instruct.Q4_0.gguf` | 4.66 GB | 8 GB | 8 Billion | q4_0 | Meta | [Llama 3 License](https://llama.meta.com/llama3/license/) | c87ad09e1e4c8f9c35a5fcef52b6f1c9
`Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf` | 4.11 GB | 8 GB | 7 Billion | q4_0 | Mistral & Nous Research | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | Coa5f6b4eabd3992da4d7fb7f020f921eb
`Phi-3-mini-4k-instruct.Q4_0.gguf` | 2.18 GB | 4 GB | 3.8 billion | q4_0 | Microsoft | [MIT](https://opensource.org/license/mit) | f8347badde9bfc2efbe89124d78ddaf5
`orca-mini-3b-gguf2-q4_0.gguf` | 1.98 GB | 4 GB | 3 billion | q4_0 | Microsoft | [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0) | 0e769317b90ac30d6e09486d61fefa26
`gpt4all-13b-snoozy-q4_0.gguf` | 7.37 GB | 16 GB | 13 billion | q4_0 | Nomic AI | [GPL](https://www.gnu.org/licenses/gpl-3.0.en.html) | 40388eb2f8d16bb5d08c96fdfaac6b2c

## Chat Session Generation

Most of the language models you will be able to access from HuggingFace have been trained as assistants. This guides language models to not just answer with relevant text, but _helpful_ text. If you want your LLM's responses to be helpful in the typical sense, we recommend you apply the chat templates the models were finetuned with. Information about specific prompt templates is typically available on the official HuggingFace page for the model.

Example LLM Chat Session Generation

Load `Llama 3` and enter the following prompt in a chat session:

    from gpt4all import GPT4All
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    with model.chat_session():
        print(model.generate("quadratic formula"))

With the default sampling settings, you should see something resembling the following:

    The quadratic formula!

    The quadratic formula is a mathematical formula that provides the solutions to a quadratic equation of the form:

    ax^2 + bx + c = 0

    where a, b, and c are constants. The formula is:

    x = (-b ± √(b^2 - 4ac)) / 2a

    Let's break it down:

    * x is the variable we're trying to solve for.
    * a, b, and c are the coefficients of the quadratic equation.
    * ± means "plus or minus".
    * √ denotes the square root.

    To use the formula, simply plug in the values of a, b, and c into the expression above. The resulting value(s) will be the solutions to the original quadratic equation!

    For example, let's say we have the quadratic equation:

    x^2 + 5x + 6 = 0

    We can plug these values into the formula as follows:

    a = 1
    b = 5

## Direct Generation

Directly calling `model.generate()` prompts the model without applying any templates.

Note: this can result in responses that are less like helpful responses and more like mirroring the tone of your prompt. In general, a language model outside of a chat session is less of a helpful assistant and more of a lens into the distribution of the model's training data.
As an example, see how the model's response changes when we give the same prompt as above without applying a chat session:

Example LLM Direct Generation

Load `Llama 3` and enter the following prompt:

    from gpt4all import GPT4All
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    print(model.generate("quadratic formula"))

With the default sampling settings, you should see something resembling the following:

    . The equation is in the form of a + bx = c, where a and b are constants.
    The solution to this problem involves using the quadratic formula which states that for any quadratic equation ax^2+bx+c=0, its solutions can be found by:
    x = (-b ± √(b^2-4ac)) / 2a
    In your case, since you have a + bx = c, we need to rewrite it in the form of ax^2+bx+c=0. To do this, subtract both sides from c, so that:
    c - (a + bx) = 0
    Now, combine like terms on the left side and simplify:
    ax^2 + (-b)x + (c-a) = 0

    So now we have a quadratic equation in standard form: ax^2+bx+c=0. We can use this to find its solutions using the quadratic formula:
    x = ((-b ± √((-b)^2

Why did it respond differently? Because language models, before being fine-tuned as assistants, are trained to be more like a data mimic than a helpful assistant. Therefore our response ends up more like a typical continuation of math-style text rather than a helpful answer in dialog.

## Embeddings

Nomic trains and open-sources free embedding models that will run very fast on your hardware. The easiest way to run the text embedding model locally uses the [`nomic`](https://github.com/nomic-ai/nomic) Python library to interface with our fast [C/C++ implementations](ref.html#gpt4all.gpt4all.Embed4All).

Example Embeddings Generation

Importing `embed` from the [`nomic`](https://github.com/nomic-ai/nomic) library, you can call `embed.text()` with `inference_mode="local"`. This downloads an embedding model and saves it for later.

    from nomic import embed
    embeddings = embed.text(["String 1", "String 2"], inference_mode="local")['embeddings']
    print("Number of embeddings created:", len(embeddings))
    print("Number of dimensions per embedding:", len(embeddings[0]))

    Number of embeddings created: 2
    Number of dimensions per embedding: 768

![](../assets/local_embed.gif)Nomic embed text local inference

To learn more about making embeddings locally with `nomic`, visit our [embeddings guide](https://docs.nomic.ai/atlas/guides/embeddings#local-inference).

The following embedding models can be used within the application and with the `Embed4All` class from the `gpt4all` Python library. The default context length of these GGUF files is 2048 but can be [extended](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF#description).
Name | Using with `nomic` | `Embed4All` model name | Context Length | # Embedding Dimensions | File Size ---|---|---|---|---|--- [Nomic Embed v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1-GGUF) | `embed.text(strings, model="nomic-embed-text-v1", inference_mode="local")` | `Embed4All("nomic-embed-text-v1.f16.gguf")` | 2048 | 768 | 262 MiB [Nomic Embed v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) | `embed.text(strings, model="nomic-embed-text-v1.5", inference_mode="local")` | `Embed4All("nomic-embed-text-v1.5.f16.gguf")` | 2048 | 64-768 | 262 MiB [SBert](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | n/a | `Embed4All("all-MiniLM-L6-v2.gguf2.f16.gguf")` | 512 | 384 | 44 MiB --- # GPT4All Documentation Source: https://docs.gpt4all.io/gpt4all_python/monitoring.html --- [ ![logo](../assets/nomic.png) ](../index.html "GPT4All") GPT4All [ nomic-ai/gpt4all ](https://github.com/nomic-ai/gpt4all "Go to repository") * [ GPT4All Documentation ](../index.html) * [ Quickstart ](../gpt4all_desktop/quickstart.html) * [ Chats ](../gpt4all_desktop/chats.html) * [ Models ](../gpt4all_desktop/models.html) * [ LocalDocs ](../gpt4all_desktop/localdocs.html) * [ Settings ](../gpt4all_desktop/settings.html) * [ Chat Templates ](../gpt4all_desktop/chat_templates.html) * Cookbook Cookbook * [ Local AI Chat with Microsoft Excel ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-microsoft-excel.html) * [ Local AI Chat with your Google Drive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-google-drive.html) * [ Local AI Chat with your Obsidian Vault ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-Obsidian.html) * [ Local AI Chat with your OneDrive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-One-Drive.html) * API Server API Server * [ GPT4All API Server ](../gpt4all_api_server/home.html) * Python SDK Python SDK * [ GPT4All Python SDK ](home.html) * Monitoring [ Monitoring ](monitoring.html) Table of contents * Setup Monitoring * Visualization * OpenLIT UI * Grafana, DataDog, & Other Integrations * [ SDK Reference ](ref.html) * Help Help * [ FAQ ](../gpt4all_help/faq.html) * [ Troubleshooting ](../gpt4all_help/troubleshooting.html) Table of contents * Setup Monitoring * Visualization * OpenLIT UI * Grafana, DataDog, & Other Integrations # GPT4All Monitoring GPT4All integrates with [OpenLIT](https://github.com/openlit/openlit) OpenTelemetry auto-instrumentation to perform real-time monitoring of your LLM application and GPU hardware. Monitoring can enhance your GPT4All deployment with auto-generated traces and metrics for * **Performance Optimization:** Analyze latency, cost and token usage to ensure your LLM application runs efficiently, identifying and resolving performance bottlenecks swiftly. * **User Interaction Insights:** Capture each prompt and response to understand user behavior and usage patterns better, improving user experience and engagement. * **Detailed GPU Metrics:** Monitor essential GPU parameters such as utilization, memory consumption, temperature, and power usage to maintain optimal hardware performance and avert potential issues. 
## Setup Monitoring Setup Monitoring With [OpenLIT](https://github.com/openlit/openlit), you can automatically monitor traces and metrics for your LLM deployment: pip install openlit from gpt4all import GPT4All import openlit openlit.init() # start # openlit.init(collect_gpu_stats=True) # Optional: To configure GPU monitoring model = GPT4All(model_name='orca-mini-3b-gguf2-q4_0.gguf') # Start a chat session and send queries with model.chat_session(): response1 = model.generate(prompt='hello', temp=0) response2 = model.generate(prompt='write me a short poem', temp=0) response3 = model.generate(prompt='thank you', temp=0) print(model.current_chat_session) ## Visualization ### OpenLIT UI Connect to OpenLIT's UI to start exploring the collected LLM performance metrics and traces. Visit the OpenLIT [Quickstart Guide](https://docs.openlit.io/latest/quickstart) for step-by-step details. ### Grafana, DataDog, & Other Integrations You can also send the data collected by OpenLIT to popular monitoring tools like Grafana and DataDog. For detailed instructions on setting up these connections, please refer to the OpenLIT [Connections Guide](https://docs.openlit.io/latest/connections/intro). --- # GPT4All Documentation Source: https://docs.gpt4all.io/gpt4all_python/ref.html --- [ ![logo](../assets/nomic.png) ](../index.html "GPT4All") GPT4All [ nomic-ai/gpt4all ](https://github.com/nomic-ai/gpt4all "Go to repository") * [ GPT4All Documentation ](../index.html) * [ Quickstart ](../gpt4all_desktop/quickstart.html) * [ Chats ](../gpt4all_desktop/chats.html) * [ Models ](../gpt4all_desktop/models.html) * [ LocalDocs ](../gpt4all_desktop/localdocs.html) * [ Settings ](../gpt4all_desktop/settings.html) * [ Chat Templates ](../gpt4all_desktop/chat_templates.html) * Cookbook Cookbook * [ Local AI Chat with Microsoft Excel ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-microsoft-excel.html) * [ Local AI Chat with your Google Drive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-google-drive.html) * [ Local AI Chat with your Obsidian Vault ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-Obsidian.html) * [ Local AI Chat with your OneDrive ](../gpt4all_desktop/cookbook/use-local-ai-models-to-privately-chat-with-One-Drive.html) * API Server API Server * [ GPT4All API Server ](../gpt4all_api_server/home.html) * Python SDK Python SDK * [ GPT4All Python SDK ](home.html) * [ Monitoring ](monitoring.html) * SDK Reference [ SDK Reference ](ref.html) Table of contents * GPT4All * backend * device * __init__ * chat_session * close * download_model * generate * list_gpus * list_models * retrieve_model * Embed4All * __init__ * close * embed * Help Help * [ FAQ ](../gpt4all_help/faq.html) * [ Troubleshooting ](../gpt4all_help/troubleshooting.html) Table of contents * GPT4All * backend * device * __init__ * chat_session * close * download_model * generate * list_gpus * list_models * retrieve_model * Embed4All * __init__ * close * embed # GPT4All Python SDK Reference #### `GPT4All` Python class that handles instantiation, downloading, generation and chat with GPT4All models. 
Source code in `gpt4all/gpt4all.py` 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 | class GPT4All: """ Python class that handles instantiation, downloading, generation and chat with GPT4All models. """ def __init__( self, model_name: str, *, model_path: str | os.PathLike[str] | None = None, model_type: str | None = None, allow_download: bool = True, n_threads: int | None = None, device: str | None = None, n_ctx: int = 2048, ngl: int = 100, verbose: bool = False, ): """ Constructor Args: model_name: Name of GPT4All or custom model. Including ".gguf" file extension is optional but encouraged. model_path: Path to directory containing model file or, if file does not exist, where to download model. Default is None, in which case models will be stored in `~/.cache/gpt4all/`. model_type: Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user. Default is None. allow_download: Allow API to download models from gpt4all.io. Default is True. n_threads: number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically. device: The processing unit on which the GPT4All model will run. It can be set to: - "cpu": Model will run on the central processing unit. - "gpu": Use Metal on ARM64 macOS, otherwise the same as "kompute". - "kompute": Use the best GPU provided by the Kompute backend. - "cuda": Use the best GPU provided by the CUDA backend. - "amd", "nvidia": Use the best GPU provided by the Kompute backend from this vendor. - A specific device name from the list returned by `GPT4All.list_gpus()`. Default is Metal on ARM64 macOS, "cpu" otherwise. 
Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown, and the GPT4All instance will be rendered invalid. It's advised to ensure the device has enough memory before initiating the model. n_ctx: Maximum size of context window ngl: Number of GPU layers to use (Vulkan) verbose: If True, print debug messages. """ self.model_type = model_type self._chat_session: ChatSession | None = None device_init = None if sys.platform == "darwin": if device is None: backend = "auto" # "auto" is effectively "metal" due to currently non-functional fallback elif device == "cpu": backend = "cpu" else: if platform.machine() != "arm64" or device != "gpu": raise ValueError(f"Unknown device for this platform: {device}") backend = "metal" else: backend = "kompute" if device is None or device == "cpu": pass # use kompute with no device elif device in ("cuda", "kompute"): backend = device device_init = "gpu" elif device.startswith("cuda:"): backend = "cuda" device_init = _remove_prefix(device, "cuda:") else: device_init = _remove_prefix(device, "kompute:") # Retrieve model and download if allowed self.config: ConfigType = self.retrieve_model(model_name, model_path=model_path, allow_download=allow_download, verbose=verbose) self.model = LLModel(self.config["path"], n_ctx, ngl, backend) if device_init is not None: self.model.init_gpu(device_init) self.model.load_model() # Set n_threads if n_threads is not None: self.model.set_thread_count(n_threads) def __enter__(self) -> Self: return self def __exit__( self, typ: type[BaseException] | None, value: BaseException | None, tb: TracebackType | None, ) -> None: self.close() def close(self) -> None: """Delete the model instance and free associated system resources.""" self.model.close() @property def backend(self) -> Literal["cpu", "kompute", "cuda", "metal"]: """The name of the llama.cpp backend currently in use. One of "cpu", "kompute", "cuda", or "metal".""" return self.model.backend @property def device(self) -> str | None: """The name of the GPU device currently in use, or None for backends other than Kompute or CUDA.""" return self.model.device @property def current_chat_session(self) -> list[MessageType] | None: return None if self._chat_session is None else self._chat_session.history @current_chat_session.setter def current_chat_session(self, history: list[MessageType]) -> None: if self._chat_session is None: raise ValueError("current_chat_session may only be set when there is an active chat session") self._chat_session.history[:] = history @staticmethod def list_models() -> list[ConfigType]: """ Fetch model list from https://gpt4all.io/models/models3.json. Returns: Model list in JSON format. """ resp = requests.get("https://gpt4all.io/models/models3.json") if resp.status_code != 200: raise ValueError(f"Request failed: HTTP {resp.status_code} {resp.reason}") return resp.json() @classmethod def retrieve_model( cls, model_name: str, model_path: str | os.PathLike[str] | None = None, allow_download: bool = True, verbose: bool = False, ) -> ConfigType: """ Find model file, and if it doesn't exist, download the model. Args: model_name: Name of model. model_path: Path to find model. Default is None in which case path is set to ~/.cache/gpt4all/. allow_download: Allow API to download model from gpt4all.io. Default is True. verbose: If True (default), print debug messages. Returns: Model config. 
""" model_filename = append_extension_if_missing(model_name) # get the config for the model config: ConfigType = {} if allow_download: models = cls.list_models() if (model := next((m for m in models if m["filename"] == model_filename), None)) is not None: config.update(model) # Validate download directory if model_path is None: try: os.makedirs(DEFAULT_MODEL_DIRECTORY, exist_ok=True) except OSError as e: raise RuntimeError("Failed to create model download directory") from e model_path = DEFAULT_MODEL_DIRECTORY else: model_path = Path(model_path) if not model_path.exists(): raise FileNotFoundError(f"Model directory does not exist: {model_path!r}") model_dest = model_path / model_filename if model_dest.exists(): config["path"] = str(model_dest) if verbose: print(f"Found model file at {str(model_dest)!r}", file=sys.stderr) elif allow_download: # If model file does not exist, download filesize = config.get("filesize") config["path"] = str(cls.download_model( model_filename, model_path, verbose=verbose, url=config.get("url"), expected_size=None if filesize is None else int(filesize), expected_md5=config.get("md5sum"), )) else: raise FileNotFoundError(f"Model file does not exist: {model_dest!r}") return config @staticmethod def download_model( model_filename: str, model_path: str | os.PathLike[str], verbose: bool = True, url: str | None = None, expected_size: int | None = None, expected_md5: str | None = None, ) -> str | os.PathLike[str]: """ Download model from gpt4all.io. Args: model_filename: Filename of model (with .gguf extension). model_path: Path to download model to. verbose: If True (default), print debug messages. url: the models remote url (e.g. may be hosted on HF) expected_size: The expected size of the download. expected_md5: The expected MD5 hash of the download. Returns: Model file destination. 
""" # Download model if url is None: url = f"https://gpt4all.io/models/gguf/{model_filename}" def make_request(offset=None): headers = {} if offset: print(f"\nDownload interrupted, resuming from byte position {offset}", file=sys.stderr) headers["Range"] = f"bytes={offset}-" # resume incomplete response headers["Accept-Encoding"] = "identity" # Content-Encoding changes meaning of ranges response = requests.get(url, stream=True, headers=headers) if response.status_code not in (200, 206): raise ValueError(f"Request failed: HTTP {response.status_code} {response.reason}") if offset and (response.status_code != 206 or str(offset) not in response.headers.get("Content-Range", "")): raise ValueError("Connection was interrupted and server does not support range requests") if (enc := response.headers.get("Content-Encoding")) is not None: raise ValueError(f"Expected identity Content-Encoding, got {enc}") return response response = make_request() total_size_in_bytes = int(response.headers.get("content-length", 0)) block_size = 2**20 # 1 MB partial_path = Path(model_path) / (model_filename + ".part") with open(partial_path, "w+b") as partf: try: with tqdm(desc="Downloading", total=total_size_in_bytes, unit="iB", unit_scale=True) as progress_bar: while True: last_progress = progress_bar.n try: for data in response.iter_content(block_size): partf.write(data) progress_bar.update(len(data)) except ChunkedEncodingError as cee: if cee.args and isinstance(pe := cee.args[0], ProtocolError): if len(pe.args) >= 2 and isinstance(ir := pe.args[1], IncompleteRead): assert progress_bar.n <= ir.partial # urllib3 may be ahead of us but never behind # the socket was closed during a read - retry response = make_request(progress_bar.n) continue raise if total_size_in_bytes != 0 and progress_bar.n < total_size_in_bytes: if progress_bar.n == last_progress: raise RuntimeError("Download not making progress, aborting.") # server closed connection prematurely - retry response = make_request(progress_bar.n) continue break # verify file integrity file_size = partf.tell() if expected_size is not None and file_size != expected_size: raise ValueError(f"Expected file size of {expected_size} bytes, got {file_size}") if expected_md5 is not None: partf.seek(0) hsh = hashlib.md5() with tqdm(desc="Verifying", total=file_size, unit="iB", unit_scale=True) as bar: while chunk := partf.read(block_size): hsh.update(chunk) bar.update(len(chunk)) if hsh.hexdigest() != expected_md5.lower(): raise ValueError(f"Expected MD5 hash of {expected_md5!r}, got {hsh.hexdigest()!r}") except: if verbose: print("Cleaning up the interrupted download...", file=sys.stderr) try: os.remove(partial_path) except OSError: pass raise # flush buffers and sync the inode partf.flush() _fsync(partf) # move to final destination download_path = Path(model_path) / model_filename try: os.rename(partial_path, download_path) except FileExistsError: try: os.remove(partial_path) except OSError: pass raise if verbose: print(f"Model downloaded to {str(download_path)!r}", file=sys.stderr) return download_path @overload def generate( self, prompt: str, *, max_tokens: int = ..., temp: float = ..., top_k: int = ..., top_p: float = ..., min_p: float = ..., repeat_penalty: float = ..., repeat_last_n: int = ..., n_batch: int = ..., n_predict: int | None = ..., streaming: Literal[False] = ..., callback: ResponseCallbackType = ..., ) -> str: ... 
@overload def generate( self, prompt: str, *, max_tokens: int = ..., temp: float = ..., top_k: int = ..., top_p: float = ..., min_p: float = ..., repeat_penalty: float = ..., repeat_last_n: int = ..., n_batch: int = ..., n_predict: int | None = ..., streaming: Literal[True], callback: ResponseCallbackType = ..., ) -> Iterable[str]: ... @overload def generate( self, prompt: str, *, max_tokens: int = ..., temp: float = ..., top_k: int = ..., top_p: float = ..., min_p: float = ..., repeat_penalty: float = ..., repeat_last_n: int = ..., n_batch: int = ..., n_predict: int | None = ..., streaming: bool, callback: ResponseCallbackType = ..., ) -> Any: ... def generate( self, prompt : str, *, max_tokens : int = 200, temp : float = 0.7, top_k : int = 40, top_p : float = 0.4, min_p : float = 0.0, repeat_penalty : float = 1.18, repeat_last_n : int = 64, n_batch : int = 8, n_predict : int | None = None, streaming : bool = False, callback : ResponseCallbackType = empty_response_callback, ) -> Any: """ Generate outputs from any GPT4All model. Args: prompt: The prompt for the model to complete. max_tokens: The maximum number of tokens to generate. temp: The model temperature. Larger values increase creativity but decrease factuality. top_k: Randomly sample from the top_k most likely tokens at each generation step. Set this to 1 for greedy decoding. top_p: Randomly sample at each generation step from the top most likely tokens whose probabilities add up to top_p. min_p: Randomly sample at each generation step from the top most likely tokens whose probabilities are at least min_p. repeat_penalty: Penalize the model for repetition. Higher values result in less repetition. repeat_last_n: How far in the models generation history to apply the repeat penalty. n_batch: Number of prompt tokens processed in parallel. Larger values decrease latency but increase resource requirements. n_predict: Equivalent to max_tokens, exists for backwards compatibility. streaming: If True, this method will instead return a generator that yields tokens as the model generates them. callback: A function with arguments token_id:int and response:str, which receives the tokens from the model as they are generated and stops the generation by returning False. Returns: Either the entire completion or a generator that yields the completion token by token. 
""" # Preparing the model request generate_kwargs: dict[str, Any] = dict( temp = temp, top_k = top_k, top_p = top_p, min_p = min_p, repeat_penalty = repeat_penalty, repeat_last_n = repeat_last_n, n_batch = n_batch, n_predict = n_predict if n_predict is not None else max_tokens, ) # Prepare the callback, process the model response full_response = "" def _callback_wrapper(token_id: int, response: str) -> bool: nonlocal full_response full_response += response return callback(token_id, response) last_msg_rendered = prompt if self._chat_session is not None: session = self._chat_session def render(messages: list[MessageType]) -> str: return session.template.render( messages=messages, add_generation_prompt=True, **self.model.special_tokens_map, ) session.history.append(MessageType(role="user", content=prompt)) prompt = render(session.history) if len(session.history) > 1: last_msg_rendered = render(session.history[-1:]) # Check request length last_msg_len = self.model.count_prompt_tokens(last_msg_rendered) if last_msg_len > (limit := self.model.n_ctx - 4): raise ValueError(f"Your message was too long and could not be processed ({last_msg_len} > {limit}).") # Send the request to the model if streaming: def stream() -> Iterator[str]: yield from self.model.prompt_model_streaming(prompt, _callback_wrapper, **generate_kwargs) if self._chat_session is not None: self._chat_session.history.append(MessageType(role="assistant", content=full_response)) return stream() self.model.prompt_model(prompt, _callback_wrapper, **generate_kwargs) if self._chat_session is not None: self._chat_session.history.append(MessageType(role="assistant", content=full_response)) return full_response @contextmanager def chat_session( self, system_message: str | Literal[False] | None = None, chat_template: str | None = None, ): """ Context manager to hold an inference optimized chat session with a GPT4All model. Args: system_message: An initial instruction for the model, None to use the model default, or False to disable. Defaults to None. chat_template: Jinja template for the conversation, or None to use the model default. Defaults to None. """ if system_message is None: system_message = self.config.get("systemMessage", False) if chat_template is None: if "name" not in self.config: raise ValueError("For sideloaded models or with allow_download=False, you must specify a chat template.") if "chatTemplate" not in self.config: raise NotImplementedError("This model appears to have a built-in chat template, but loading it is not " "currently implemented. Please pass a template to chat_session() directly.") if (tmpl := self.config["chatTemplate"]) is None: raise ValueError(f"The model {self.config['name']!r} does not support chat.") chat_template = tmpl history = [] if system_message is not False: history.append(MessageType(role="system", content=system_message)) self._chat_session = ChatSession( template=_jinja_env.from_string(chat_template), history=history, ) try: yield self finally: self._chat_session = None @staticmethod def list_gpus() -> list[str]: """ List the names of the available GPU devices. Returns: A list of strings representing the names of the available GPU devices. """ return LLModel.list_gpus() ---|--- ##### `backend: Literal['cpu', 'kompute', 'cuda', 'metal']` `property` The name of the llama.cpp backend currently in use. One of "cpu", "kompute", "cuda", or "metal". ##### `device: str | None` `property` The name of the GPU device currently in use, or None for backends other than Kompute or CUDA. 
##### `__init__(model_name, *, model_path=None, model_type=None, allow_download=True, n_threads=None, device=None, n_ctx=2048, ngl=100, verbose=False)` Constructor Parameters: * **`model_name`** (`str`) – Name of GPT4All or custom model. Including ".gguf" file extension is optional but encouraged. * **`model_path`** (`str | PathLike[str] | None`, default: `None` ) – Path to directory containing model file or, if file does not exist, where to download model. Default is None, in which case models will be stored in `~/.cache/gpt4all/`. * **`model_type`** (`str | None`, default: `None` ) – Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user. Default is None. * **`allow_download`** (`bool`, default: `True` ) – Allow API to download models from gpt4all.io. Default is True. * **`n_threads`** (`int | None`, default: `None` ) – number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically. * **`device`** (`str | None`, default: `None` ) – The processing unit on which the GPT4All model will run. It can be set to: \- "cpu": Model will run on the central processing unit. \- "gpu": Use Metal on ARM64 macOS, otherwise the same as "kompute". \- "kompute": Use the best GPU provided by the Kompute backend. \- "cuda": Use the best GPU provided by the CUDA backend. \- "amd", "nvidia": Use the best GPU provided by the Kompute backend from this vendor. \- A specific device name from the list returned by `GPT4All.list_gpus()`. Default is Metal on ARM64 macOS, "cpu" otherwise. Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown, and the GPT4All instance will be rendered invalid. It's advised to ensure the device has enough memory before initiating the model. * **`n_ctx`** (`int`, default: `2048` ) – Maximum size of context window * **`ngl`** (`int`, default: `100` ) – Number of GPU layers to use (Vulkan) * **`verbose`** (`bool`, default: `False` ) – If True, print debug messages. Source code in `gpt4all/gpt4all.py` 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 | def __init__( self, model_name: str, *, model_path: str | os.PathLike[str] | None = None, model_type: str | None = None, allow_download: bool = True, n_threads: int | None = None, device: str | None = None, n_ctx: int = 2048, ngl: int = 100, verbose: bool = False, ): """ Constructor Args: model_name: Name of GPT4All or custom model. Including ".gguf" file extension is optional but encouraged. model_path: Path to directory containing model file or, if file does not exist, where to download model. Default is None, in which case models will be stored in `~/.cache/gpt4all/`. model_type: Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user. Default is None. allow_download: Allow API to download models from gpt4all.io. Default is True. n_threads: number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically. device: The processing unit on which the GPT4All model will run. It can be set to: - "cpu": Model will run on the central processing unit. - "gpu": Use Metal on ARM64 macOS, otherwise the same as "kompute". 
- "kompute": Use the best GPU provided by the Kompute backend. - "cuda": Use the best GPU provided by the CUDA backend. - "amd", "nvidia": Use the best GPU provided by the Kompute backend from this vendor. - A specific device name from the list returned by `GPT4All.list_gpus()`. Default is Metal on ARM64 macOS, "cpu" otherwise. Note: If a selected GPU device does not have sufficient RAM to accommodate the model, an error will be thrown, and the GPT4All instance will be rendered invalid. It's advised to ensure the device has enough memory before initiating the model. n_ctx: Maximum size of context window ngl: Number of GPU layers to use (Vulkan) verbose: If True, print debug messages. """ self.model_type = model_type self._chat_session: ChatSession | None = None device_init = None if sys.platform == "darwin": if device is None: backend = "auto" # "auto" is effectively "metal" due to currently non-functional fallback elif device == "cpu": backend = "cpu" else: if platform.machine() != "arm64" or device != "gpu": raise ValueError(f"Unknown device for this platform: {device}") backend = "metal" else: backend = "kompute" if device is None or device == "cpu": pass # use kompute with no device elif device in ("cuda", "kompute"): backend = device device_init = "gpu" elif device.startswith("cuda:"): backend = "cuda" device_init = _remove_prefix(device, "cuda:") else: device_init = _remove_prefix(device, "kompute:") # Retrieve model and download if allowed self.config: ConfigType = self.retrieve_model(model_name, model_path=model_path, allow_download=allow_download, verbose=verbose) self.model = LLModel(self.config["path"], n_ctx, ngl, backend) if device_init is not None: self.model.init_gpu(device_init) self.model.load_model() # Set n_threads if n_threads is not None: self.model.set_thread_count(n_threads) ---|--- ##### `chat_session(system_message=None, chat_template=None)` Context manager to hold an inference optimized chat session with a GPT4All model. Parameters: * **`system_message`** (`str | Literal[False] | None`, default: `None` ) – An initial instruction for the model, None to use the model default, or False to disable. Defaults to None. * **`chat_template`** (`str | None`, default: `None` ) – Jinja template for the conversation, or None to use the model default. Defaults to None. Source code in `gpt4all/gpt4all.py` 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 | @contextmanager def chat_session( self, system_message: str | Literal[False] | None = None, chat_template: str | None = None, ): """ Context manager to hold an inference optimized chat session with a GPT4All model. Args: system_message: An initial instruction for the model, None to use the model default, or False to disable. Defaults to None. chat_template: Jinja template for the conversation, or None to use the model default. Defaults to None. """ if system_message is None: system_message = self.config.get("systemMessage", False) if chat_template is None: if "name" not in self.config: raise ValueError("For sideloaded models or with allow_download=False, you must specify a chat template.") if "chatTemplate" not in self.config: raise NotImplementedError("This model appears to have a built-in chat template, but loading it is not " "currently implemented. 
Please pass a template to chat_session() directly.") if (tmpl := self.config["chatTemplate"]) is None: raise ValueError(f"The model {self.config['name']!r} does not support chat.") chat_template = tmpl history = [] if system_message is not False: history.append(MessageType(role="system", content=system_message)) self._chat_session = ChatSession( template=_jinja_env.from_string(chat_template), history=history, ) try: yield self finally: self._chat_session = None ---|--- ##### `close()` Delete the model instance and free associated system resources. Source code in `gpt4all/gpt4all.py` 279 280 281 | def close(self) -> None: """Delete the model instance and free associated system resources.""" self.model.close() ---|--- ##### `download_model(model_filename, model_path, verbose=True, url=None, expected_size=None, expected_md5=None)` `staticmethod` Download model from gpt4all.io. Parameters: * **`model_filename`** (`str`) – Filename of model (with .gguf extension). * **`model_path`** (`str | PathLike[str]`) – Path to download model to. * **`verbose`** (`bool`, default: `True` ) – If True (default), print debug messages. * **`url`** (`str | None`, default: `None` ) – the models remote url (e.g. may be hosted on HF) * **`expected_size`** (`int | None`, default: `None` ) – The expected size of the download. * **`expected_md5`** (`str | None`, default: `None` ) – The expected MD5 hash of the download. Returns: * `str | PathLike[str]` – Model file destination. Source code in `gpt4all/gpt4all.py` 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 | @staticmethod def download_model( model_filename: str, model_path: str | os.PathLike[str], verbose: bool = True, url: str | None = None, expected_size: int | None = None, expected_md5: str | None = None, ) -> str | os.PathLike[str]: """ Download model from gpt4all.io. Args: model_filename: Filename of model (with .gguf extension). model_path: Path to download model to. verbose: If True (default), print debug messages. url: the models remote url (e.g. may be hosted on HF) expected_size: The expected size of the download. expected_md5: The expected MD5 hash of the download. Returns: Model file destination. 
""" # Download model if url is None: url = f"https://gpt4all.io/models/gguf/{model_filename}" def make_request(offset=None): headers = {} if offset: print(f"\nDownload interrupted, resuming from byte position {offset}", file=sys.stderr) headers["Range"] = f"bytes={offset}-" # resume incomplete response headers["Accept-Encoding"] = "identity" # Content-Encoding changes meaning of ranges response = requests.get(url, stream=True, headers=headers) if response.status_code not in (200, 206): raise ValueError(f"Request failed: HTTP {response.status_code} {response.reason}") if offset and (response.status_code != 206 or str(offset) not in response.headers.get("Content-Range", "")): raise ValueError("Connection was interrupted and server does not support range requests") if (enc := response.headers.get("Content-Encoding")) is not None: raise ValueError(f"Expected identity Content-Encoding, got {enc}") return response response = make_request() total_size_in_bytes = int(response.headers.get("content-length", 0)) block_size = 2**20 # 1 MB partial_path = Path(model_path) / (model_filename + ".part") with open(partial_path, "w+b") as partf: try: with tqdm(desc="Downloading", total=total_size_in_bytes, unit="iB", unit_scale=True) as progress_bar: while True: last_progress = progress_bar.n try: for data in response.iter_content(block_size): partf.write(data) progress_bar.update(len(data)) except ChunkedEncodingError as cee: if cee.args and isinstance(pe := cee.args[0], ProtocolError): if len(pe.args) >= 2 and isinstance(ir := pe.args[1], IncompleteRead): assert progress_bar.n <= ir.partial # urllib3 may be ahead of us but never behind # the socket was closed during a read - retry response = make_request(progress_bar.n) continue raise if total_size_in_bytes != 0 and progress_bar.n < total_size_in_bytes: if progress_bar.n == last_progress: raise RuntimeError("Download not making progress, aborting.") # server closed connection prematurely - retry response = make_request(progress_bar.n) continue break # verify file integrity file_size = partf.tell() if expected_size is not None and file_size != expected_size: raise ValueError(f"Expected file size of {expected_size} bytes, got {file_size}") if expected_md5 is not None: partf.seek(0) hsh = hashlib.md5() with tqdm(desc="Verifying", total=file_size, unit="iB", unit_scale=True) as bar: while chunk := partf.read(block_size): hsh.update(chunk) bar.update(len(chunk)) if hsh.hexdigest() != expected_md5.lower(): raise ValueError(f"Expected MD5 hash of {expected_md5!r}, got {hsh.hexdigest()!r}") except: if verbose: print("Cleaning up the interrupted download...", file=sys.stderr) try: os.remove(partial_path) except OSError: pass raise # flush buffers and sync the inode partf.flush() _fsync(partf) # move to final destination download_path = Path(model_path) / model_filename try: os.rename(partial_path, download_path) except FileExistsError: try: os.remove(partial_path) except OSError: pass raise if verbose: print(f"Model downloaded to {str(download_path)!r}", file=sys.stderr) return download_path ---|--- ##### `generate(prompt, *, max_tokens=200, temp=0.7, top_k=40, top_p=0.4, min_p=0.0, repeat_penalty=1.18, repeat_last_n=64, n_batch=8, n_predict=None, streaming=False, callback=empty_response_callback)` Generate outputs from any GPT4All model. Parameters: * **`prompt`** (`str`) – The prompt for the model to complete. * **`max_tokens`** (`int`, default: `200` ) – The maximum number of tokens to generate. * **`temp`** (`float`, default: `0.7` ) – The model temperature. 
##### `generate(prompt, *, max_tokens=200, temp=0.7, top_k=40, top_p=0.4, min_p=0.0, repeat_penalty=1.18, repeat_last_n=64, n_batch=8, n_predict=None, streaming=False, callback=empty_response_callback)`

Generate outputs from any GPT4All model.

Parameters:

* **`prompt`** (`str`) – The prompt for the model to complete.
* **`max_tokens`** (`int`, default: `200` ) – The maximum number of tokens to generate.
* **`temp`** (`float`, default: `0.7` ) – The model temperature. Larger values increase creativity but decrease factuality.
* **`top_k`** (`int`, default: `40` ) – Randomly sample from the top_k most likely tokens at each generation step. Set this to 1 for greedy decoding.
* **`top_p`** (`float`, default: `0.4` ) – Randomly sample at each generation step from the top most likely tokens whose probabilities add up to top_p.
* **`min_p`** (`float`, default: `0.0` ) – Randomly sample at each generation step from the top most likely tokens whose probabilities are at least min_p.
* **`repeat_penalty`** (`float`, default: `1.18` ) – Penalize the model for repetition. Higher values result in less repetition.
* **`repeat_last_n`** (`int`, default: `64` ) – How far in the models generation history to apply the repeat penalty.
* **`n_batch`** (`int`, default: `8` ) – Number of prompt tokens processed in parallel. Larger values decrease latency but increase resource requirements.
* **`n_predict`** (`int | None`, default: `None` ) – Equivalent to max_tokens, exists for backwards compatibility.
* **`streaming`** (`bool`, default: `False` ) – If True, this method will instead return a generator that yields tokens as the model generates them.
* **`callback`** (`ResponseCallbackType`, default: `empty_response_callback` ) – A function with arguments token_id:int and response:str, which receives the tokens from the model as they are generated and stops the generation by returning False.

Returns:

* `Any` – Either the entire completion or a generator that yields the completion token by token.

Source code in `gpt4all/gpt4all.py`

```python
def generate(
    self,
    prompt: str,
    *,
    max_tokens: int = 200,
    temp: float = 0.7,
    top_k: int = 40,
    top_p: float = 0.4,
    min_p: float = 0.0,
    repeat_penalty: float = 1.18,
    repeat_last_n: int = 64,
    n_batch: int = 8,
    n_predict: int | None = None,
    streaming: bool = False,
    callback: ResponseCallbackType = empty_response_callback,
) -> Any:
    """
    Generate outputs from any GPT4All model.

    Args:
        prompt: The prompt for the model to complete.
        max_tokens: The maximum number of tokens to generate.
        temp: The model temperature. Larger values increase creativity but decrease factuality.
        top_k: Randomly sample from the top_k most likely tokens at each generation step. Set this to 1 for greedy decoding.
        top_p: Randomly sample at each generation step from the top most likely tokens whose probabilities add up to top_p.
        min_p: Randomly sample at each generation step from the top most likely tokens whose probabilities are at least min_p.
        repeat_penalty: Penalize the model for repetition. Higher values result in less repetition.
        repeat_last_n: How far in the models generation history to apply the repeat penalty.
        n_batch: Number of prompt tokens processed in parallel. Larger values decrease latency but increase resource requirements.
        n_predict: Equivalent to max_tokens, exists for backwards compatibility.
        streaming: If True, this method will instead return a generator that yields tokens as the model generates them.
        callback: A function with arguments token_id:int and response:str, which receives the tokens from the model as they are
            generated and stops the generation by returning False.

    Returns:
        Either the entire completion or a generator that yields the completion token by token.
    """

    # Preparing the model request
    generate_kwargs: dict[str, Any] = dict(
        temp = temp,
        top_k = top_k,
        top_p = top_p,
        min_p = min_p,
        repeat_penalty = repeat_penalty,
        repeat_last_n = repeat_last_n,
        n_batch = n_batch,
        n_predict = n_predict if n_predict is not None else max_tokens,
    )

    # Prepare the callback, process the model response
    full_response = ""

    def _callback_wrapper(token_id: int, response: str) -> bool:
        nonlocal full_response
        full_response += response
        return callback(token_id, response)

    last_msg_rendered = prompt
    if self._chat_session is not None:
        session = self._chat_session

        def render(messages: list[MessageType]) -> str:
            return session.template.render(
                messages=messages,
                add_generation_prompt=True,
                **self.model.special_tokens_map,
            )

        session.history.append(MessageType(role="user", content=prompt))
        prompt = render(session.history)
        if len(session.history) > 1:
            last_msg_rendered = render(session.history[-1:])

    # Check request length
    last_msg_len = self.model.count_prompt_tokens(last_msg_rendered)
    if last_msg_len > (limit := self.model.n_ctx - 4):
        raise ValueError(f"Your message was too long and could not be processed ({last_msg_len} > {limit}).")

    # Send the request to the model
    if streaming:
        def stream() -> Iterator[str]:
            yield from self.model.prompt_model_streaming(prompt, _callback_wrapper, **generate_kwargs)
            if self._chat_session is not None:
                self._chat_session.history.append(MessageType(role="assistant", content=full_response))
        return stream()

    self.model.prompt_model(prompt, _callback_wrapper, **generate_kwargs)
    if self._chat_session is not None:
        self._chat_session.history.append(MessageType(role="assistant", content=full_response))
    return full_response
```
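A brief sketch of both return modes of `generate()`; the prompts are placeholders and the model filename follows the quickstart example:

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

# Default mode: the call blocks and returns the whole completion as a string.
print(model.generate("Write a haiku about local inference.", max_tokens=60, temp=0.7))

# streaming=True: returns a generator that yields tokens as the model produces them.
for token in model.generate("List three benefits of on-device LLMs.", max_tokens=80, streaming=True):
    print(token, end="", flush=True)
print()
model.close()
```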
##### `list_gpus()` `staticmethod`

List the names of the available GPU devices.

Returns:

* `list[str]` – A list of strings representing the names of the available GPU devices.

Source code in `gpt4all/gpt4all.py`

```python
@staticmethod
def list_gpus() -> list[str]:
    """
    List the names of the available GPU devices.

    Returns:
        A list of strings representing the names of the available GPU devices.
    """
    return LLModel.list_gpus()
```

##### `list_models()` `staticmethod`

Fetch model list from https://gpt4all.io/models/models3.json.

Returns:

* `list[ConfigType]` – Model list in JSON format.

Source code in `gpt4all/gpt4all.py`

```python
@staticmethod
def list_models() -> list[ConfigType]:
    """
    Fetch model list from https://gpt4all.io/models/models3.json.

    Returns:
        Model list in JSON format.
    """
    resp = requests.get("https://gpt4all.io/models/models3.json")
    if resp.status_code != 200:
        raise ValueError(f"Request failed: HTTP {resp.status_code} {resp.reason}")
    return resp.json()
```
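A small sketch combining the two static helpers above; the `filename`, `filesize`, and `url` keys shown are the ones `retrieve_model()` itself reads from the fetched list, and other keys may vary:

```python
from gpt4all import GPT4All

# Both helpers are static methods, so no model needs to be loaded first.
for name in GPT4All.list_gpus():
    print("GPU:", name)

models = GPT4All.list_models()          # downloads models3.json
first = models[0]
print(first["filename"], first.get("filesize"), first.get("url"))
```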
##### `retrieve_model(model_name, model_path=None, allow_download=True, verbose=False)` `classmethod`

Find model file, and if it doesn't exist, download the model.

Parameters:

* **`model_name`** (`str`) – Name of model.
* **`model_path`** (`str | PathLike[str] | None`, default: `None` ) – Path to find model. Default is None in which case path is set to ~/.cache/gpt4all/.
* **`allow_download`** (`bool`, default: `True` ) – Allow API to download model from gpt4all.io. Default is True.
* **`verbose`** (`bool`, default: `False` ) – If True (default), print debug messages.

Returns:

* `ConfigType` – Model config.

Source code in `gpt4all/gpt4all.py`

```python
@classmethod
def retrieve_model(
    cls,
    model_name: str,
    model_path: str | os.PathLike[str] | None = None,
    allow_download: bool = True,
    verbose: bool = False,
) -> ConfigType:
    """
    Find model file, and if it doesn't exist, download the model.

    Args:
        model_name: Name of model.
        model_path: Path to find model. Default is None in which case path is set to ~/.cache/gpt4all/.
        allow_download: Allow API to download model from gpt4all.io. Default is True.
        verbose: If True (default), print debug messages.

    Returns:
        Model config.
    """

    model_filename = append_extension_if_missing(model_name)

    # get the config for the model
    config: ConfigType = {}
    if allow_download:
        models = cls.list_models()

        if (model := next((m for m in models if m["filename"] == model_filename), None)) is not None:
            config.update(model)

    # Validate download directory
    if model_path is None:
        try:
            os.makedirs(DEFAULT_MODEL_DIRECTORY, exist_ok=True)
        except OSError as e:
            raise RuntimeError("Failed to create model download directory") from e
        model_path = DEFAULT_MODEL_DIRECTORY
    else:
        model_path = Path(model_path)

    if not model_path.exists():
        raise FileNotFoundError(f"Model directory does not exist: {model_path!r}")

    model_dest = model_path / model_filename
    if model_dest.exists():
        config["path"] = str(model_dest)
        if verbose:
            print(f"Found model file at {str(model_dest)!r}", file=sys.stderr)
    elif allow_download:
        # If model file does not exist, download
        filesize = config.get("filesize")
        config["path"] = str(cls.download_model(
            model_filename, model_path, verbose=verbose, url=config.get("url"),
            expected_size=None if filesize is None else int(filesize), expected_md5=config.get("md5sum"),
        ))
    else:
        raise FileNotFoundError(f"Model file does not exist: {model_dest!r}")

    return config
```
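An illustrative call to `retrieve_model()`; assuming the quickstart model name, the returned config should contain at least a `path` entry once the file is present:

```python
from gpt4all import GPT4All

# Looks for the file in ~/.cache/gpt4all/ and downloads it if it is missing (allow_download=True).
config = GPT4All.retrieve_model("Meta-Llama-3-8B-Instruct.Q4_0.gguf", verbose=True)
print(config["path"])
```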
""" if model_name is None: model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf" self.gpt4all = GPT4All(model_name, n_threads=n_threads, device=device, **kwargs) def __enter__(self) -> Self: return self def __exit__( self, typ: type[BaseException] | None, value: BaseException | None, tb: TracebackType | None, ) -> None: self.close() def close(self) -> None: """Delete the model instance and free associated system resources.""" self.gpt4all.close() # return_dict=False @overload def embed( self, text: str, *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[False] = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> list[float]: ... @overload def embed( self, text: list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[False] = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> list[list[float]]: ... @overload def embed( self, text: str | list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[False] = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> list[Any]: ... # return_dict=True @overload def embed( self, text: str, *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[True], atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> EmbedResult[list[float]]: ... @overload def embed( self, text: list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[True], atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> EmbedResult[list[list[float]]]: ... @overload def embed( self, text: str | list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: Literal[True], atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> EmbedResult[list[Any]]: ... # return type unknown @overload def embed( self, text: str | list[str], *, prefix: str | None = ..., dimensionality: int | None = ..., long_text_mode: str = ..., return_dict: bool = ..., atlas: bool = ..., cancel_cb: EmbCancelCallbackType | None = ..., ) -> Any: ... def embed( self, text: str | list[str], *, prefix: str | None = None, dimensionality: int | None = None, long_text_mode: str = "mean", return_dict: bool = False, atlas: bool = False, cancel_cb: EmbCancelCallbackType | None = None, ) -> Any: """ Generate one or more embeddings. Args: text: A text or list of texts to generate embeddings for. prefix: The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed, this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document` or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies. dimensionality: The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size. long_text_mode: How to handle texts longer than the model can accept. One of `mean` or `truncate`. return_dict: Return the result as a dict that includes the number of prompt tokens processed. atlas: Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens with long_text_mode="mean" will raise an error. Disabled by default. cancel_cb: Called with arguments (batch_sizes, backend_name). 
##### `__init__(model_name=None, *, n_threads=None, device=None, **kwargs)`

Constructor

Parameters:

* **`n_threads`** (`int | None`, default: `None` ) – number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically.
* **`device`** (`str | None`, default: `None` ) – The processing unit on which the embedding model will run. See the `GPT4All` constructor for more info.
* **`kwargs`** (`Any`, default: `{}` ) – Remaining keyword arguments are passed to the `GPT4All` constructor.

Source code in `gpt4all/gpt4all.py`

```python
def __init__(self, model_name: str | None = None, *, n_threads: int | None = None, device: str | None = None, **kwargs: Any):
    """
    Constructor

    Args:
        n_threads: number of CPU threads used by GPT4All. Default is None, then the number of threads are determined automatically.
        device: The processing unit on which the embedding model will run. See the `GPT4All` constructor for more info.
        kwargs: Remaining keyword arguments are passed to the `GPT4All` constructor.
    """
    if model_name is None:
        model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
    self.gpt4all = GPT4All(model_name, n_threads=n_threads, device=device, **kwargs)
```

##### `close()`

Delete the model instance and free associated system resources.

Source code in `gpt4all/gpt4all.py`

```python
def close(self) -> None:
    """Delete the model instance and free associated system resources."""
    self.gpt4all.close()
```

##### `embed(text, *, prefix=None, dimensionality=None, long_text_mode='mean', return_dict=False, atlas=False, cancel_cb=None)`

Generate one or more embeddings.

Parameters:

* **`text`** (`str | list[str]`) – A text or list of texts to generate embeddings for.
* **`prefix`** (`str | None`, default: `None` ) – The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed, this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document` or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies.
* **`dimensionality`** (`int | None`, default: `None` ) – The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size.
* **`long_text_mode`** (`str`, default: `'mean'` ) – How to handle texts longer than the model can accept. One of `mean` or `truncate`.
* **`return_dict`** (`bool`, default: `False` ) – Return the result as a dict that includes the number of prompt tokens processed.
* **`atlas`** (`bool`, default: `False` ) – Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens with long_text_mode="mean" will raise an error. Disabled by default.
* **`cancel_cb`** (`EmbCancelCallbackType | None`, default: `None` ) – Called with arguments (batch_sizes, backend_name). Return true to cancel embedding.

Returns:

* `Any` – With return_dict=False, an embedding or list of embeddings of your text(s).
* `Any` – With return_dict=True, a dict with keys 'embeddings' and 'n_prompt_tokens'.

Raises:

* `CancellationError` – If cancel_cb returned True and embedding was canceled.

Source code in `gpt4all/gpt4all.py`

```python
def embed(
    self,
    text: str | list[str],
    *,
    prefix: str | None = None,
    dimensionality: int | None = None,
    long_text_mode: str = "mean",
    return_dict: bool = False,
    atlas: bool = False,
    cancel_cb: EmbCancelCallbackType | None = None,
) -> Any:
    """
    Generate one or more embeddings.

    Args:
        text: A text or list of texts to generate embeddings for.
        prefix: The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed,
            this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document`
            or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies.
        dimensionality: The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size.
        long_text_mode: How to handle texts longer than the model can accept. One of `mean` or `truncate`.
        return_dict: Return the result as a dict that includes the number of prompt tokens processed.
        atlas: Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens
            with long_text_mode="mean" will raise an error. Disabled by default.
        cancel_cb: Called with arguments (batch_sizes, backend_name). Return true to cancel embedding.

    Returns:
        With return_dict=False, an embedding or list of embeddings of your text(s).
        With return_dict=True, a dict with keys 'embeddings' and 'n_prompt_tokens'.

    Raises:
        CancellationError: If cancel_cb returned True and embedding was canceled.
    """
    if dimensionality is None:
        dimensionality = -1
    else:
        if dimensionality <= 0:
            raise ValueError(f"Dimensionality must be None or a positive integer, got {dimensionality}")
        if dimensionality < self.MIN_DIMENSIONALITY:
            warnings.warn(
                f"Dimensionality {dimensionality} is less than the suggested minimum of {self.MIN_DIMENSIONALITY}."
                " Performance may be degraded."
            )
    try:
        do_mean = {"mean": True, "truncate": False}[long_text_mode]
    except KeyError:
        raise ValueError(f"Long text mode must be one of 'mean' or 'truncate', got {long_text_mode!r}")
    result = self.gpt4all.model.generate_embeddings(text, prefix, dimensionality, do_mean, atlas, cancel_cb)
    return result if return_dict else result["embeddings"]
```
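A sketch of `embed()` with the optional arguments documented above; the Nomic Embed filename is an assumption (any Matryoshka-capable embedding model from the model list would do), and the chosen dimensionality is purely illustrative:

```python
from gpt4all import Embed4All

embedder = Embed4All("nomic-embed-text-v1.5.f16.gguf")  # assumed filename of a Matryoshka-capable Nomic Embed model

result = embedder.embed(
    ["first document", "second document"],  # a list of texts returns a list of embeddings
    prefix="search_document",               # task prefix, as documented above
    dimensionality=128,                     # values below MIN_DIMENSIONALITY (64) trigger a warning
    return_dict=True,                       # also report the number of prompt tokens processed
)
print(result["n_prompt_tokens"], len(result["embeddings"][0]))
embedder.close()
```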
---

# GPT4All Documentation

Source: https://docs.gpt4all.io

---

# GPT4All Documentation

GPT4All runs large language models (LLMs) privately on everyday desktops & laptops. No API calls or GPUs required - you can just download the application and [get started](gpt4all_desktop/quickstart.html#quickstart).

Desktop Application

GPT4All runs LLMs as an application on your computer. Nomic's embedding models can bring information from your local documents and files into your chats. It's fast, on-device, and completely **private**.

[Download for Windows](https://gpt4all.io/installers/gpt4all-installer-win64.exe) [Download for Mac](https://gpt4all.io/installers/gpt4all-installer-darwin.dmg) [Download for Linux](https://gpt4all.io/installers/gpt4all-installer-linux.run)

Python SDK

Use GPT4All in Python to program with LLMs implemented with the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) backend and [Nomic's C backend](https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-backend). Nomic contributes to open source software like [`llama.cpp`](https://github.com/ggerganov/llama.cpp) to make LLMs accessible and efficient **for all**.

```
pip install gpt4all
```

```python
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
```