# Deepinfra

---

# Source: https://deepinfra.com/docs/advanced/aisdk
# AI SDK

The [AI SDK](https://sdk.vercel.ai/) by Vercel is the AI toolkit for TypeScript and JavaScript from the creators of Next.js. It is a free, open-source library that gives you the tools you need to build AI-powered products, and it works with [LLM models by DeepInfra](/models/text-generation) out of the box. For more details, see the [AI SDK docs](https://sdk.vercel.ai/providers/ai-sdk-providers/deepinfra#deepinfra-provider).

# Install AI SDK

```bash
npm install ai @ai-sdk/deepinfra
```

# LLM Examples

The examples below show how to use the AI SDK with DeepInfra and large language models. Make sure to get your API key from DeepInfra: [log in](https://deepinfra.com/login?from=%2Fdash) and copy your token.

## Text Generation

```javascript
import { createDeepInfra } from "@ai-sdk/deepinfra";
import { generateText } from "ai";

const deepinfra = createDeepInfra({
  apiKey: "$DEEPINFRA_TOKEN",
});

const { text, usage, finishReason } = await generateText({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  prompt: "Write a vegetarian lasagna recipe for 4 people.",
});

console.log(text);
console.log(usage);
console.log(finishReason);
```

You can improve the answers further by providing a system message:

```javascript
import { createDeepInfra } from "@ai-sdk/deepinfra";
import { generateText } from "ai";

const deepinfra = createDeepInfra({
  apiKey: "$DEEPINFRA_TOKEN",
});

const { text, usage, finishReason } = await generateText({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  prompt: "Write a vegetarian lasagna recipe for 4 people.",
  system:
    "You are a professional writer. " +
" + "You write simple, clear, and concise content.", }); console.log(text); console.log(usage); console.log(finishReason); copy ## Streaming Generating text is nice, but your users don't want to wait when large amount of text is generated. For those use cases you can use streaming. import { createDeepInfra } from "@ai-sdk/deepinfra"; import { streamText } from "ai"; const deepinfra = createDeepInfra({ apiKey: "$DEEPINFRA_TOKEN", }); const result = streamText({ model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"), prompt: "Invent a new holiday and describe its traditions.", system: "You are a professional writer. You write simple, clear, and concise content.", }); for await (const textPart of result.textStream) { console.log(textPart); } console.log(await result.usage); console.log(await result.finishReason); copy ### Conversations To create a longer chat-like conversation you have to add each response message and each of the user's messages to every request. This way the model will have the context and will be able to provide better answers. You can tweak it even further by providing a system message. import { createDeepInfra } from "@ai-sdk/deepinfra"; import { generateText } from "ai"; const deepinfra = createDeepInfra({ apiKey: "$DEEPINFRA_TOKEN", }); const { text, usage, finishReason } = await generateText({ model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"), messages: [ { role: "system", content: "Respond like a michelin starred chef." }, { role: "user", content: "Can you name at least two different techniques to cook lamb?", }, { role: "assistant", content: 'Bonjour! Let me tell you, my friend, cooking lamb is an art form, and I\'m more than happy to share with you not two, but three of my favorite techniques to coax out the rich, unctuous flavors and tender textures of this majestic protein. First, we have the classic "Sous Vide" method. Next, we have the ancient art of "Sous le Sable". And finally, we have the more modern technique of "Hot Smoking."', }, { role: "user", content: "Tell me more about the second method." }, ], }); console.log(text); console.log(usage); console.log(finishReason); copy ### Conversations & Streaming Of course a conversation response can also be streaming and it is very simple. import { createDeepInfra } from "@ai-sdk/deepinfra"; import { streamText } from "ai"; const deepinfra = createDeepInfra({ apiKey: "$DEEPINFRA_TOKEN", }); const result = streamText({ model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"), messages: [ { role: "system", content: "Respond like a michelin starred chef." }, { role: "user", content: "Can you name at least two different techniques to cook lamb?", }, { role: "assistant", content: 'Bonjour! Let me tell you, my friend, cooking lamb is an art form, and I\'m more than happy to share with you not two, but three of my favorite techniques to coax out the rich, unctuous flavors and tender textures of this majestic protein. First, we have the classic "Sous Vide" method. Next, we have the ancient art of "Sous le Sable". And finally, we have the more modern technique of "Hot Smoking."', }, { role: "user", content: "Tell me more about the second method." }, ], }); for await (const textPart of result.textStream) { console.log(textPart); } console.log(await result.usage); console.log(await result.finishReason); copy ## Generating structured data Getting text, streaming or not, is amazing but when two systems work together a structured approach is even better. 
```javascript
import { createDeepInfra } from "@ai-sdk/deepinfra";
import { generateObject } from "ai";
import { z } from "zod";

const deepinfra = createDeepInfra({
  apiKey: "$DEEPINFRA_TOKEN",
});

const { object, usage, finishReason } = await generateObject({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a lasagna recipe.",
});

console.log(object.recipe.name);
console.log(object.recipe.ingredients);
console.log(object.recipe.steps);
console.log(usage);
console.log(finishReason);
```

You can ask for more specific things like enums, too.

```javascript
import { createDeepInfra } from "@ai-sdk/deepinfra";
import { generateObject } from "ai";
import { z } from "zod";

const deepinfra = createDeepInfra({
  apiKey: "$DEEPINFRA_TOKEN",
});

const { object, usage, finishReason } = await generateObject({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  output: "enum",
  enum: ["action", "comedy", "drama", "horror", "sci-fi"],
  prompt:
    "Classify the genre of this movie plot: " +
    '"A group of astronauts travel through a wormhole in search of a ' +
    'new habitable planet for humanity."',
});

console.log(object);
console.log(usage);
console.log(finishReason);
```

## Tool / Function calling

Tool calling allows models to call external functions provided by the user and use the results to generate a comprehensive response to the user's query. It is a very powerful feature.

```javascript
import { createDeepInfra } from "@ai-sdk/deepinfra";
import { generateText, tool } from "ai";
import { z } from "zod"; // needed for the tool parameter schema

const deepinfra = createDeepInfra({
  apiKey: "$DEEPINFRA_TOKEN",
});

const result = await generateText({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  tools: {
    weather: tool({
      description: "Get the weather in a location",
      parameters: z.object({
        location: z.string().describe("The location to get the weather for"),
      }),
      execute: async ({ location }) => ({
        location,
        temperature: 72 + Math.floor(Math.random() * 21) - 10,
      }),
    }),
  },
  prompt: "What is the weather in San Francisco?",
  maxSteps: 2, // without it a text response is not generated, only the tool response
});

console.log(result.text);
console.log(result.usage);
console.log(result.finishReason);
```

## Conversations and tool calling

Let's see how tool calling works when you are having a conversation.

```javascript
import { createDeepInfra } from "@ai-sdk/deepinfra";
import { generateText, tool } from "ai";
import { z } from "zod"; // needed for the tool parameter schema

const deepinfra = createDeepInfra({
  apiKey: "$DEEPINFRA_TOKEN",
});

const messages = [
  { role: "user", content: "What is the weather in San Francisco?" },
];

const first_result = await generateText({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  tools: {
    weather: tool({
      description: "Get the weather in a location",
      parameters: z.object({
        location: z.string().describe("The location to get the weather for"),
      }),
      execute: async ({ location }) => ({
        location,
        temperature: 72 + Math.floor(Math.random() * 21) - 10,
      }),
    }),
  },
  messages: messages,
  maxSteps: 2, // without it a text response is not generated, only the tool response
});

console.log(first_result.text);

// Let's continue our conversation
messages.push(...first_result.response.messages);
messages.push({
  role: "user",
  content: "Is this normal temperature for the summer?",
});

const second_result = await generateText({
  model: deepinfra("meta-llama/Llama-3.3-70B-Instruct-Turbo"),
  tools: {
    weather: tool({
      description: "Get the weather in a location",
      parameters: z.object({
        location: z.string().describe("The location to get the weather for"),
      }),
      execute: async ({ location }) => ({
        location,
        temperature: 72 + Math.floor(Math.random() * 21) - 10,
      }),
    }),
  },
  messages: messages,
  maxSteps: 2,
});

console.log(second_result.text);
```

---

# Source: https://deepinfra.com/docs/advanced/autogen
# AutoGen

AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. To learn more, please visit the [AutoGen repository](https://github.com/microsoft/autogen).

#### AutoGen with DeepInfra endpoints

```bash
# install autogen
pip install pyautogen
```

Here is how you can configure AutoGen to use DeepInfra endpoints. The `base_url` is `https://api.deepinfra.com/v1/openai`. You can use any model which is OpenAI compatible. For example, [meta-llama/Meta-Llama-3-70B-Instruct](/meta-llama/Meta-Llama-3-70B-Instruct) is a model that can be used to solve coding tasks.

```python
import autogen

config_list = [
    {
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "base_url": "https://api.deepinfra.com/v1/openai",
        "api_key": ""
    }
]

llm_config = {"config_list": config_list, "seed": 42}

assistant = autogen.AssistantAgent("assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})
user_proxy.initiate_chat(assistant, message="What time is it right now?")
```

In this example, two agents converse and solve the task. The assistant agent provides a Python code snippet, which then gets executed on your local machine. In AutoGen, code execution is triggered automatically by the UserProxyAgent when it detects an executable code block in a received message.

Here is the output of the above code:

````text
user_proxy (to assistant):

What time is it now?

--------------------------------------------------------------------------------
assistant (to user_proxy):

To get the current time, you can use the `datetime` module in Python. Here's an example code:

```python
import datetime
current_time = datetime.datetime.now()
print(current_time.strftime("%I:%M %p"))
```

This code will print the current time in a 12-hour format with the AM/PM designation. If you want to print
the time in a 24-hour format, you can use the `%H:%M` format specifier instead of `%I:%M %p`.

You can save this code in a file with a `.py` extension and run it in a terminal or command prompt to see
the current time.

Note: This code assumes that you have Python installed on your computer. If you don't have Python installed,
you can download it from the official Python website.

--------------------------------------------------------------------------------
Provide feedback to assistant. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
execute_code was called without specifying a value for use_docker. Since the python docker package is not
available, code will be run natively. Note: this fallback behavior is subject to change
user_proxy (to assistant):

exitcode: 0 (execution succeeded)
Code output:
02:20 PM

--------------------------------------------------------------------------------
assistant (to user_proxy):

It looks like the code you provided ran successfully and returned the current time in a 12-hour format with
the AM/PM designation. Here's the output:

02:20 PM

If you have any other questions or need further assistance, feel free to ask!

--------------------------------------------------------------------------------
Provide feedback to assistant. Press enter to skip and use auto-reply, or type 'exit' to end the conversation: exit
````
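The log above shows AutoGen falling back to native code execution because `use_docker` was not specified. If you want to make that choice explicit, you can set it in `code_execution_config`. The snippet below is a minimal sketch, assuming the same DeepInfra `config_list` as in the example above:

```python
import autogen

# Same DeepInfra configuration as in the example above.
config_list = [
    {
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "base_url": "https://api.deepinfra.com/v1/openai",
        "api_key": "",  # your DeepInfra token
    }
]
llm_config = {"config_list": config_list, "seed": 42}

assistant = autogen.AssistantAgent("assistant", llm_config=llm_config)

# Run generated code natively instead of inside a Docker container
# (set use_docker=True if Docker is available and you prefer isolation).
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

user_proxy.initiate_chat(assistant, message="What time is it right now?")
```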
---

# Source: https://deepinfra.com/docs/advanced/custom_llms
# Custom LLMs

You can now run a dedicated instance of your public or private LLM on DeepInfra infrastructure.

## Overview

There are a number of benefits to running your own custom LLM instance:

* predictable response times
* auto-scaling support
* run your own (fine-tuned or trained-from-scratch) model

There are, of course, some drawbacks:

* pricing is for GPU uptime

It's important to understand that all our publicly available models, like [Mixtral 8x7B](/mistralai/Mixtral-8x7B-Instruct-v0.1), are shared among many users, which lets us offer very competitive pricing. When you run your own model, you get full access to the GPUs and pay per GPU-hour your model is up, so you need sufficient load to justify this resource.

## Usage

### Creating a new deployment

A deployment is a particular configuration of your custom model. It has fixed:

* `model_name` -- the name you'd use when doing inference (generation)
* `gpu` type -- A100-80GB or H100-80GB are supported now; expect more in the future
* `num_gpus` -- how many GPUs to use; bigger models require more GPUs (they should at least fit the weights and leave some room for the KV cache)
* `max_batch_size` -- how many requests to run in parallel (at most); other requests are queued up
* weights -- currently Hugging Face is supported (including private repos)

It also has a few settings that can be changed dynamically:

* `min_instances` -- how many copies of the model to run at a minimum
* `max_instances` -- up to how many copies to scale during times of higher load

To create a new deployment, use [the Web UI](/dash/deployments?new=custom-llm):

[![Custom LLM Web UI](/blog/custom-llm-ui.webp)](/dash/deployments?new=custom-llm)

Or, using the HTTP API:

```bash
curl -X POST https://api.deepinfra.com/deploy/llm -d '{
    "model_name": "test-model",
    "gpu": "A100-80GB",
    "num_gpus": 2,
    "max_batch_size": 64,
    "hf": {
        "repo": "meta-llama/Llama-2-7b-chat-hf"
    },
    "settings": {
        "min_instances": 0,
        "max_instances": 1
    }
}' -H 'Content-Type: application/json' \
   -H "Authorization: Bearer $DEEPINFRA_TOKEN"
```

The deployment can be monitored via HTTP or the Web dashboard. Please note that the model's full name is _github-username/model-name_. My GitHub username is `ichernev`, so the model above will have the full name `ichernev/test-model`.

### Using a deployment

When you create a deployment, the name you specify is prefixed by your GitHub username. So if I (ichernev) create a model `test-model`, its full name will be `ichernev/test-model`. You can then use this name during inference, or check the model web page.

You can use your model via:

* the Web demo page
* the HTTP APIs (check the model page for details):
  * DeepInfra inference API
  * OpenAI ChatCompletions API
  * OpenAI Completions API
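For example, here is a minimal sketch of calling a custom deployment through the OpenAI-compatible chat completions API. It assumes the `ichernev/test-model` deployment above is running; substitute your own `<github-username>/<model-name>` and token:

```python
import openai

# OpenAI-compatible endpoint, as used elsewhere in these docs.
client = openai.OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="$DEEPINFRA_TOKEN",  # your DeepInfra API token
)

response = client.chat.completions.create(
    model="ichernev/test-model",  # your custom deployment's full name
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)

print(response.choices[0].message.content)
```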
### Updating a deployment

Once a deployment is running, its scaling parameters can be updated via the deployment details page, accessible from [Dashboard / Deployments](/dash/deployments).

Via HTTP:

```bash
curl -X PUT https://api.deepinfra.com/deploy/DEPLOY_ID -d '{
    "settings": {
        "min_instances": 2,
        "max_instances": 2
    }
}' -H 'Content-Type: application/json' \
   -H "Authorization: Bearer YOUR_API_KEY"
```

You'll need your `DEPLOY_ID`. It is returned on creation, but is also available in the Web Dashboard or via the [HTTP API `/deploy/list`](https://api.deepinfra.com/docs#/default/deploy_list_deploy_list_get).

### Deleting a deployment

When you want to permanently delete / discard a deployment, use:

* the trash icon next to a deployment in [Dashboard / Deployments](/dash/deployments)
* [DELETE /deploy/DEPLOY_ID](https://api.deepinfra.com/docs#/default/deploy_delete_deploy__deploy_id__delete)
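As a sketch of that flow (the endpoints are the ones named above, but the response shape is an assumption; check the API reference for the exact schema), you could look up the `DEPLOY_ID` via `/deploy/list` and then delete it:

```python
import os
import requests

API_BASE = "https://api.deepinfra.com"
headers = {"Authorization": f"Bearer {os.environ['DEEPINFRA_TOKEN']}"}

# List deployments to find the DEPLOY_ID; the printed structure is whatever
# /deploy/list returns (see the API reference for the exact fields).
deployments = requests.get(f"{API_BASE}/deploy/list", headers=headers).json()
print(deployments)

# Permanently delete a deployment by its DEPLOY_ID.
deploy_id = "DEPLOY_ID"  # taken from the listing above
resp = requests.delete(f"{API_BASE}/deploy/{deploy_id}", headers=headers)
print(resp.status_code)
```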
## Limitations and Caveats

* We're enforcing a limit of 4 GPUs per user maximum (4 instances x 1 GPU or 1 instance x 4 GPUs, for example). Contact us if you require more.
* We try our best to satisfy all requests, but GPUs are a limited resource and sometimes there just aren't enough of them. This means that if you try to upscale we might not be able to meet demand (say, you set `min_instances` to 3, but we can only run 2). You're only billed for what actually runs. The current number of running instances is returned in the deploy object.
* Billing for custom LLMs happens weekly, in a separate invoice.
* Leaving a custom LLM running (by mistake) can quickly rack up costs. For example, if you forget to shut down a custom LLM using 2 GPUs on Friday at 5pm and remember about it on Monday at 9am, that will cost you 256 USD (64h * 2 GPUs * 2 USD/GPU-hour). Use spending limits in [payment settings](/dash/billing).
* Quantization is currently not supported; work in progress.

---

# Source: https://deepinfra.com/docs/advanced/deprecated
# Deprecated models

In the fast-paced AI world, newer and better models are released every day. At the same time, DeepInfra wants to ensure affordable, high-quality access to the latest and best AI models. As a result, we sometimes have to make the difficult decision to deprecate a model. However, whenever this happens, you will have enough time to select another [one of our models](/models) that best fits your needs.

First, you will know at least a week in advance when a model is going to be deprecated.
Second, even after the deprecation date the API will continue to operate and your applications will continue to work: all inference requests will automatically be forwarded to another model.

In addition, when a model is deprecated we will send an email notifying all recent users, including the date when it will happen.

---

# Source: https://deepinfra.com/docs/advanced/json_mode
# JSON Mode

In addition to responding in text, the DeepInfra API has an option to request that responses be returned in JSON format. [To learn more, read our blog](/blog/json-mode).

We provide JSON mode both in our inference API and in our OpenAI-compatible API, supported by [a lot of our models](/models?q=json).

## Using JSON Mode

Activating a JSON response in any of DeepInfra's text APIs, including `/v1/inference`, `/v1/openai/completions` and `/v1/openai/chat/completions`, is done in the same way: add a `response_format` parameter and set its value to `{"type": "json_object"}`.

### Example

Let's go through a simple example of learning about scientific discoveries. This is how you set up our endpoint:

```python
import openai
import json

client = openai.OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="",
)
```

Here is an example of using the OpenAI chat API to invoke a model with JSON mode:

```python
messages = [
    {
        "role": "user",
        "content": "Provide a JSON list of 3 famous scientific breakthroughs in the past century, all of the countries which contributed, and in what year."
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
    response_format={"type": "json_object"},
)
```

The resulting `response.choices[0].message.content` will contain a string with JSON:

```json
{
  "breakthroughs": [
    {
      "name": "Penicillin",
      "country": "UK",
      "year": 1928
    },
    {
      "name": "The Double Helix Structure of DNA",
      "country": "US",
      "year": 1953
    },
    {
      "name": "Artificial Heart",
      "country": "US",
      "year": 2008
    }
  ]
}
```

## Caveats and warnings

It is highly recommended to prompt the model to produce JSON. While this is not strictly necessary, failing to do so can occasionally produce nonsensical responses, as the model may misunderstand your intent. For example, a model unaware that it is producing JSON may mismatch a quote, leading to stray `:` characters appearing inside strings; while still technically valid JSON, this may degrade the quality of the response.

Currently, the API does not guarantee that the resulting JSON object is complete at the end of a response. For example, if the model stops due to `length`, the JSON object in the response will be improperly terminated, for example in the middle of a string or object.
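Because of this, it's worth checking the finish reason before parsing. Here is a minimal sketch, reusing the `client`, `messages` and model from the example above; the error handling strategy is just an illustration:

```python
import json

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
    response_format={"type": "json_object"},
    max_tokens=512,
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The model hit the token limit, so the JSON is likely truncated;
    # retry with a higher max_tokens or ask for a more compact answer.
    raise RuntimeError("JSON response truncated, increase max_tokens")

data = json.loads(choice.message.content)
print(data)
```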
### A note about JSON and model alignment and accuracy

A big warning and caveat: JSON mode interferes with the model's alignment, or "self-control". In particular, when forced to produce a JSON response, the model is more likely to make up information rather than explain that it does not know, or to behave in ways that fall outside of its training, producing undesirable output rather than objecting.

Let's take a really simple prompt:

```python
messages = [
    {
        "role": "user",
        "content": "What is the weather in San Francisco?"
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
)
```

This prompt, using the default `"text"` `response_format`, will give a reasonable canned response:

"I don't have real-time updates or location tracking capabilities, so I can't provide current weather information for San Francisco. Please check a reliable weather website or app for this information."

However, now let's add `response_format={"type": "json_object"}`. The model now merrily produces a made-up weather forecast with no objection:

```json
{
  "location": "San Francisco",
  "weather": [
    {
      "timestamp": 163856000,
      "description": "Mostly cloudy",
      "temperature": 25,
      "feels_like": 26.2,
      "humidity": 80,
      "wind": {
        "speed": 4.7,
        "degrees": 0
      }
    }
  ]
}
```

Because this output format constrains the model so tightly that it cannot produce alignment warnings, it instead responds with the most probable tokens: a wildly inaccurate guess of today's weather.

---

# Source: https://deepinfra.com/docs/advanced/langchain
---

# Source: https://deepinfra.com/docs/advanced/langchain

# LangChain

LangChain is a framework for developing applications powered by language models. To learn more, visit the [LangChain website](https://python.langchain.com/). We offer the following modules:

* [Chat adapter](https://python.langchain.com/docs/integrations/chat/deepinfra) for most of [our LLMs](/models/text-generation)
* [LLM adapter](https://python.langchain.com/docs/integrations/llms/deepinfra) for most of [our LLMs](/models/text-generation)
* [Embeddings adapter](https://python.langchain.com/docs/integrations/text_embedding/deepinfra) for all of [our Embeddings models](/models/embeddings)

# Install LangChain

```bash
pip install langchain
pip install langchain-community
```

# LLM Examples

The examples below show how to use LangChain with DeepInfra for language models. Make sure to get your API key from DeepInfra: [log in](https://deepinfra.com/login?from=%2Fdash), create a token, and set `os.environ["DEEPINFRA_API_TOKEN"]` to it. _Read the comments in the code for a better understanding._
```python
import os

from langchain_community.llms import DeepInfra
from langchain.prompts import PromptTemplate

# Make sure to get your API key from DeepInfra. You have to log in and create a token.
os.environ["DEEPINFRA_API_TOKEN"] = ""

# Create the DeepInfra instance.
# You can view the list of available parameters on the model page.
llm = DeepInfra(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
llm.model_kwargs = {
    "temperature": 0.7,
    "repetition_penalty": 1.2,
    "max_new_tokens": 250,
    "top_p": 0.9,
}


def example1():
    # run inference
    print(llm.invoke("Who let the dogs out?"))


def example2():
    # run streaming inference
    for chunk in llm.stream("Who let the dogs out?"):
        print(chunk)


def example3():
    # create a prompt template for question answering
    template = """Question: {question}

Answer: Let's think step by step."""
    prompt = PromptTemplate(template=template, input_variables=["question"])

    # build the chain
    llm_chain = prompt | llm

    # provide a question and run the chain
    question = "Can penguins reach the North pole?"
    print(llm_chain.invoke({"question": question}))


# run an example
example1()
```

## Chat Examples

Ensure the `DEEPINFRA_API_TOKEN` environment variable is set to your API token (or pass the `deepinfra_api_token` parameter to the `ChatDeepInfra` constructor).

```python
import os

os.environ["DEEPINFRA_API_TOKEN"] = ""  # or pass deepinfra_api_token to the ChatDeepInfra constructor

from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]


def example_sync():
    chat = ChatDeepInfra(model="meta-llama/Meta-Llama-3-8B-Instruct")
    print(chat.invoke(messages))


async def example_async():
    chat = ChatDeepInfra(model="meta-llama/Meta-Llama-3-8B-Instruct")
    await chat.agenerate([messages])


def example_stream():
    chat = ChatDeepInfra(
        streaming=True,
        verbose=True,
        callbacks=[StreamingStdOutCallbackHandler()],
    )
    print(chat.invoke(messages))
```

## Embeddings

```python
import os

os.environ["DEEPINFRA_API_TOKEN"] = ""  # your API token

from langchain_community.embeddings import DeepInfraEmbeddings

embeddings = DeepInfraEmbeddings(
    model_id="sentence-transformers/clip-ViT-B-32",
    query_instruction="",
    embed_instruction="",
)

docs = ["Dog is not a cat", "Beta is the second letter of Greek alphabet"]
document_result = embeddings.embed_documents(docs)
print(document_result)
```
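A single query string can be embedded the same way through the standard LangChain embeddings interface. A minimal sketch, reusing the `embeddings` instance from above:

```python
# Embed a single query with the same DeepInfraEmbeddings instance
query_result = embeddings.embed_query("Which animal is mentioned?")
print(len(query_result))  # dimensionality of the embedding vector
```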
---

# Source: https://deepinfra.com/docs/advanced/llama-index

# LlamaIndex

[LlamaIndex](https://www.llamaindex.ai) is a popular data framework for LLM applications, and it now works with DeepInfra.
## Large Language Models (LLMs)

### Installation

First, install the necessary package:

```bash
pip install llama-index-llms-deepinfra
```

### Initialization

Set up the `DeepInfraLLM` class with your API key and desired parameters:

```python
import asyncio

from llama_index.llms.deepinfra import DeepInfraLLM

llm = DeepInfraLLM(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",  # Default model name
    api_key="$DEEPINFRA_TOKEN",  # Replace with your DeepInfra API key
    temperature=0.5,
    max_tokens=50,
    additional_kwargs={"top_p": 0.9},
)
```

### Synchronous Complete

Generate a text completion synchronously using the `complete` method:

```python
response = llm.complete("Hello World!")
print(response.text)
```

### Synchronous Stream Complete

Generate a streaming text completion synchronously using the `stream_complete` method:

```python
content = ""
for completion in llm.stream_complete("Once upon a time"):
    content += completion.delta
    print(completion.delta, end="")
```

### Synchronous Chat

Generate a chat response synchronously using the `chat` method:

```python
from llama_index.core.base.llms.types import ChatMessage

messages = [
    ChatMessage(role="user", content="Tell me a joke."),
]
chat_response = llm.chat(messages)
print(chat_response.message.content)
```

### Synchronous Stream Chat

Generate a streaming chat response synchronously using the `stream_chat` method:

```python
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Tell me a story."),
]
content = ""
for chat_response in llm.stream_chat(messages):
    content += chat_response.delta
    print(chat_response.delta, end="")
```

### Asynchronous Complete

Generate a text completion asynchronously using the `acomplete` method:

```python
async def async_complete():
    response = await llm.acomplete("Hello Async World!")
    print(response.text)

asyncio.run(async_complete())
```

### Asynchronous Stream Complete

Generate a streaming text completion asynchronously using the `astream_complete` method:

```python
async def async_stream_complete():
    content = ""
    response = await llm.astream_complete("Once upon an async time")
    async for completion in response:
        content += completion.delta
        print(completion.delta, end="")

asyncio.run(async_stream_complete())
```

### Asynchronous Chat

Generate a chat response asynchronously using the `achat` method:

```python
async def async_chat():
    messages = [
        ChatMessage(role="user", content="Tell me an async joke."),
    ]
    chat_response = await llm.achat(messages)
    print(chat_response.message.content)

asyncio.run(async_chat())
```

### Asynchronous Stream Chat

Generate a streaming chat response asynchronously using the `astream_chat` method:

```python
async def async_stream_chat():
    messages = [
        ChatMessage(role="system", content="You are a helpful assistant."),
        ChatMessage(role="user", content="Tell me an async story."),
    ]
    content = ""
    response = await llm.astream_chat(messages)
    async for chat_response in response:
        content += chat_response.delta
        print(chat_response.delta, end="")

asyncio.run(async_stream_chat())
```

## Embeddings

[LlamaIndex](https://www.llamaindex.ai) can also work with DeepInfra [embeddings models](/models/embeddings) to get embeddings for your text data.
### Installation

```bash
pip install llama-index llama-index-embeddings-deepinfra
```

### Initialization

```python
from dotenv import load_dotenv, find_dotenv
from llama_index.embeddings.deepinfra import DeepInfraEmbeddingModel

_ = load_dotenv(find_dotenv())

model = DeepInfraEmbeddingModel(
    model_id="BAAI/bge-large-en-v1.5",  # Use custom model ID
    api_token="YOUR_API_TOKEN",  # Optionally provide token here
    normalize=True,  # Optional normalization
    text_prefix="text: ",  # Optional text prefix
    query_prefix="query: ",  # Optional query prefix
)
```

### Synchronous Requests

#### Get Text Embedding

```python
response = model.get_text_embedding("hello world")
print(response)
```

#### Batch Requests

```python
texts = ["hello world", "goodbye world"]
response_batch = model.get_text_embedding_batch(texts)
print(response_batch)
```

#### Query Requests

```python
query_response = model.get_query_embedding("hello world")
print(query_response)
```

### Asynchronous Requests

#### Get Text Embedding

```python
import asyncio


async def main():
    text = "hello world"
    async_response = await model.aget_text_embedding(text)
    print(async_response)


if __name__ == "__main__":
    asyncio.run(main())
```
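To illustrate what you might do with these vectors, here is a small, self-contained sketch that ranks the texts embedded in the batch example above against the query embedding by cosine similarity (it reuses `texts`, `response_batch`, and `query_response` from the synchronous examples):

```python
import math


def cosine(a, b):
    # plain-Python cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Rank the embedded texts by similarity to the query embedding
ranked = sorted(
    zip(texts, response_batch),
    key=lambda pair: cosine(query_response, pair[1]),
    reverse=True,
)
for text, _ in ranked:
    print(text)
```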
---

# Source: https://deepinfra.com/docs/advanced/log_probs

# Returning Log probabilities

In some cases you might want to get the log probabilities for each token generated by our LLM streaming API. By default, our streaming API returns the generated tokens one by one, with the log probability attached to each token.

## Example

Here is a quick example:

```bash
curl -X POST \
    -d '{"input": "I have this dream", "stream": true}' \
    -H "Authorization: bearer YOUR_API_KEY" \
    -H 'Content-Type: application/json' \
    'https://api.deepinfra.com/v1/inference/meta-llama/Llama-2-7b-chat-hf'
```

```
data: {"token": {"id": 29892, "text": ",", "logprob": -2.65625, "special": false}, "generated_text": null, "details": null}
data: {"token": {"id": 988, "text": " where", "logprob": -0.39575195, "special": false}, "generated_text": null, "details": null}
data: {"token": {"id": 1432, "text": " every", "logprob": -3.15625, "special": false}, "generated_text": null, "details": null}
data: {"token": {"id": 931, "text": " time", "logprob": -0.1385498, "special": false}, "generated_text": null, "details": null}
```

The `logprob` field is the log probability of the generated token. Log probabilities are currently not returned by the non-streaming API or by our OpenAI-compatible API.
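If you want to consume this stream programmatically, here is a minimal Python sketch. It assumes the `requests` library and the event format shown above (a `data: ` prefix followed by a JSON object with a `token` field):

```python
import json

import requests

# Stream tokens from the DeepInfra inference endpoint and collect per-token log probabilities
resp = requests.post(
    "https://api.deepinfra.com/v1/inference/meta-llama/Llama-2-7b-chat-hf",
    headers={"Authorization": "bearer YOUR_API_KEY"},
    json={"input": "I have this dream", "stream": True},
    stream=True,
)

logprobs = []
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue  # skip empty keep-alive lines
    event = json.loads(line[len(b"data: "):])
    token = event.get("token")
    if token is not None:
        logprobs.append((token["text"], token["logprob"]))

print(logprobs)
```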
---

# Source: https://deepinfra.com/docs/advanced/lora

# Deploying a LoRA adapter model

### How to deploy a LoRA adapter model

1. Navigate to the dashboard
2. Click the 'New Deployment' button
3. Click the 'LoRA Model' tab
4. Fill in the form:
   * **LoRA model name**: the model name used to reference the deployment
   * **Hugging Face Model Name**: the Hugging Face model name
   * **Hugging Face Token**: (optional) a Hugging Face token, needed if the LoRA adapter model is private

### To use a LoRA adapter model, you need

1. A LoRA adapter model hosted on Hugging Face
2. A base model that supports LoRA adapters at DeepInfra (you can see the list of supported base models in the LoRA upload form)
3. A Hugging Face token, if the LoRA adapter model is private
4. A DeepInfra account and a DeepInfra API key

### Example flow

Prerequisites:

1. The LoRA adapter model is askardeepinfra/llama-3.1-8B-rank-32-example-lora
2. The base model is meta-llama/Meta-Llama-3.1-8B-Instruct, which is supported at DeepInfra
3. The LoRA adapter model is public, so no Hugging Face token is needed
4. A DeepInfra API key has been generated from the dashboard

Then deploy the model:

1. Navigate to the dashboard
2. Click the 'New Deployment' button
3. Click the 'LoRA Model' tab
4. Fill in the form:
   * **LoRA model name**: asdf/lora-example
   * **Hugging Face Model Name**: askardeepinfra/llama-3.1-8B-rank-32-example-lora
5. Click the 'Upload' button

The deployment should now appear in your dashboard under the name asdf/lora-example. Initially the state is "Initializing"; after a while it should become "Deploying" and then "Running". Once the state is "Running", you can use the model. Navigate to the model's page, where you can find all the information about the model, including:

1. Pricing
2. Precision
3. The demo page, where you can test the model
4. The API reference, where you can find information on how to run inference against the model using the REST API

Here is an example of inference with curl:

```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -d '{
    "model": "asdf/lora-example",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
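The same request can be made through the OpenAI Python client. A minimal sketch, assuming the asdf/lora-example deployment from the flow above is in the "Running" state:

```python
from openai import OpenAI

# Point the OpenAI client at DeepInfra's OpenAI-compatible endpoint
client = OpenAI(
    api_key="$DEEPINFRA_API_KEY",
    base_url="https://api.deepinfra.com/v1/openai",
)

# Call the LoRA deployment by the name chosen in the upload form
completion = client.chat.completions.create(
    model="asdf/lora-example",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```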
---

# Source: https://deepinfra.com/docs/advanced/max_tokens_limit

# Max Output Tokens Limit

The DeepInfra API has a maximum output limit of 16384 tokens per request. This limit is in place to ensure efficient processing and prevent excessive response sizes. However, some use cases require longer responses; this page explains how to work within the limit and how to continue responses beyond it.

### Understanding the Max Output Tokens Limit

The max tokens limit is the maximum number of tokens that can be generated in a single response. Tokens are the basic units of text, such as words or pieces of words, used to construct the response. The 16384-token limit is sufficient for most use cases, but if you need to generate longer responses, you can use a technique called "response continuation" to continue the response beyond the limit.

### Continuing Responses Beyond the Limit

To continue a response beyond the max tokens limit, send a new request with the previous response as input. This allows the model to pick up where it left off and generate the next part of the response. Here's an example using curl:

```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      },
      {
        "role": "assistant",
        "content": "**\**\****\n\n**\**\****\n\nHello"
      }
    ],
    "max_tokens": 5
  }'
```

In this example, the previous response is passed as the content of the assistant message, and the `max_tokens` parameter is set to 5. The model will then generate the next 5 tokens of the response, which can be used as the input for the next request. If you have any questions or concerns about the max tokens limit, please don't hesitate to contact us via [feedback@deepinfra.com](mailto:feedback@deepinfra.com). We're always here to help.

### Limitations of Response Continuation

The response continuation technique can't help with generating responses that exceed the total context size of the model. You'll get a 400 error once you exceed the model's total context size.
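Putting the technique together, here is a minimal sketch of a continuation loop using the OpenAI Python client against DeepInfra's OpenAI-compatible endpoint. It assumes the standard `finish_reason` field, which is `"length"` when the output token limit is hit:

```python
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

user_msg = {"role": "user", "content": "Write a very long story about space travel."}
partial = ""  # assistant output accumulated so far

while True:
    messages = [user_msg]
    if partial:
        # Pass the response generated so far back in as an assistant message
        messages.append({"role": "assistant", "content": partial})

    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=messages,
        max_tokens=16384,
    )
    choice = resp.choices[0]
    partial += choice.message.content

    # "length" means the model hit the output token limit; anything else means it finished
    if choice.finish_reason != "length":
        break

print(partial)
```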
---

# Source: https://deepinfra.com/docs/advanced/multimodal

# Using Multimodal models on DeepInfra

DeepInfra hosts multimodal models that combine vision and language capabilities. These models take both images and text as input and produce text as output. Currently, we host:

* [meta-llama/Llama-3.2-90B-Vision-Instruct](/meta-llama/Llama-3.2-90B-Vision-Instruct)
* [meta-llama/Llama-3.2-11B-Vision-Instruct](/meta-llama/Llama-3.2-11B-Vision-Instruct)
* [Qwen/QVQ-72B-Preview](/Qwen/QVQ-72B-Preview)

## Quick start

Let's consider this image:

![Example image](https://shared.deepinfra.com/models/llava-hf/llava-1.5-7b-hf/cover_image.ed4fba7a25b147e7fe6675e9f760585e11274e8ee72596e6412447260493cd4f-s600.webp)

If you ask `What’s in this image?`, the model will answer something like this:

In this image, a large, colorful animal, possibly a llama, is standing alone in a barren, red and orange landscape, close to a large volcano. The setting appears to be an artistic painting, possibly inspired by South American culture or a fantasy world with volcanoes. The llama is situated at the center of the scene, drawing attention to the contrasting colors and the fiery backdrop of the volcano. The overall atmosphere of the image suggests a sense of danger and mystery amidst the volcanic landscape.

Images can be passed to the model in two ways:

1. by passing a link to the image
2. by passing a base64-encoded image directly in the request

Here is an example of the request:

```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://shared.deepinfra.com/models/llava-hf/llava-1.5-7b-hf/cover_image.ed4fba7a25b147e7fe6675e9f760585e11274e8ee72596e6412447260493cd4f-s600.webp"
            }
          },
          {
            "type": "text",
            "text": "What’s in this image?"
          }
        ]
      }
    ]
  }'
```

## Example of uploading a base64-encoded image

Uploading images as base64 is convenient when you have images available locally.
Here is an example:

```python
import base64

import requests
from openai import OpenAI

# Create an OpenAI client with your DeepInfra token and endpoint
client = OpenAI(
    api_key="",  # your DeepInfra token
    base_url="https://api.deepinfra.com/v1/openai",
)

image_url = "https://shared.deepinfra.com/models/llava-hf/llava-1.5-7b-hf/cover_image.ed4fba7a25b147e7fe6675e9f760585e11274e8ee72596e6412447260493cd4f-s600.webp"
base64_image = base64.b64encode(requests.get(image_url).content).decode("utf-8")

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {"type": "text", "text": "What’s in this image?"},
            ],
        }
    ],
)
print(chat_completion.choices[0].message.content)
```

## Passing multiple images

The API allows passing multiple images, too:

```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://shared.deepinfra.com/models/llava-hf/llava-1.5-7b-hf/cover_image.ed4fba7a25b147e7fe6675e9f760585e11274e8ee72596e6412447260493cd4f-s600.webp"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://shared.deepinfra.com/models/meta-llama/Llama-2-7b-chat-hf/cover_image.10373e7a429dd725e0eb9e57cd20aeb815426c077217b27d9aedce37bd5c2173-s600.webp"
            }
          },
          {
            "type": "text",
            "text": "What’s in this image?"
          }
        ]
      }
    ]
  }'
```

## Calculating costs

Images are tokenized and passed to the model as input. The number of tokens consumed by an image is reported in the response under `"usage": {"prompt_tokens": ...}`. Different models work with different image resolutions. You can still pass images of other resolutions; the model will rescale them automatically. Read the model's documentation to learn the supported image resolutions.

## Limitations and Caveats

* Supported image types are: jpg, png, and webp.
* Images must be smaller than 20MB.
* Currently, we don't support setting image fidelity with the `detail` argument.
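As a final note on base64 uploads: since they are most useful when the image lives on disk, here is a variant of the earlier example that reads a local file instead of downloading one. `my_image.png` is a hypothetical local path:

```python
import base64

from openai import OpenAI

client = OpenAI(
    api_key="",  # your DeepInfra token
    base_url="https://api.deepinfra.com/v1/openai",
)

# Read a local image and base64-encode it
with open("my_image.png", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "What’s in this image?"},
            ],
        }
    ],
)
print(chat_completion.choices[0].message.content)
```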
# Okta SSO

## Contents

* Supported Features
* Configuration Steps
* SP-Initiated SSO
* Notes

## Supported Features

* Single Sign-On (OpenID Connect) initiated via Okta
* Single Sign-On (OpenID Connect) initiated via DeepInfra
* Automatically creates user accounts in DeepInfra on first sign-in

## Configuration Steps

* Install the DeepInfra application in your Okta instance.
* Fill in the configuration options:
  * **Team ID** -- your Okta subdomain is a great starting point.
    If you need multiple disjoint teams in the same Okta instance (multi-tenancy), you can use `subdomain-group` for the **Team ID**. Lowercase only, starting with the subdomain, with dashes as separators.
  * **Use Stage** -- leave this blank.
* Assign the users or groups that should be able to log into DeepInfra.
* Go to the DeepInfra App (inside Okta) → Sign On tab and take note of the **Client ID** and **Client Secret**.
* For the **Issuer** (normally your Okta domain): there should be a section with a link titled _OpenID Provider Metadata_. Click this link. In the JSON document shown, look for the key titled "issuer" and copy its URL value.
* Send an email to [feedback@deepinfra.com](mailto:feedback@deepinfra.com) saying that you'd like to set up Okta SSO, including:
  * Team ID
  * Issuer
  * Client ID
  * Client Secret
  * Admin email -- the email address of the user who will be admin of the team
* After the setup is complete, users can start signing in:
  * via Okta (from the dashboard)
  * via DeepInfra's [SSO login](/login_sso), where they need to enter the **Team ID**
* The user whose email matches the **Admin email** specified in the email becomes team admin on first login.

## SP-Initiated SSO

The sign-in process is initiated from DeepInfra.

1. From your browser, navigate to the [DeepInfra login page](https://deepinfra.com/login).
2. Click the `Corporate SSO` button.
3. Enter your **Team ID** and click `SSO Login`.
4. Enter your Okta credentials (your email and password) and click "Sign in with Okta".

If your credentials are valid, you are redirected to the DeepInfra dashboard. From there you can click on `Team` to see yourself and the other team members.

## Notes

* Admins can change team member roles (currently a toggle between member and admin).
* Admins have access to the billing dashboard.
* All team members have access to the same API tokens and models.
* If you're interested in a single-user experience -- i.e. each person having their own tokens and models -- let us know!
---

# Source: https://deepinfra.com/docs/advanced/rate-limits
# Rate Limits

## 200 concurrent requests

By default, every account has a limit of 200 concurrent requests per model. If you are querying two different models simultaneously, you can handle a total of 400 concurrent requests, 200 for each. We've observed that this is plenty even for applications and services with hundreds of thousands of daily active users.

For large batch processing jobs, like computing embeddings over a knowledge base, you can use something like the [token bucket rate-limiting algorithm](https://en.wikipedia.org/wiki/Token_bucket) to stay under 200 concurrent requests (see the sketch at the end of this page). You will still be able to finish your work in a reasonable amount of time.

If you need more, just let us know why, and depending on your case we might raise it. You can request a rate limit increase from your [dashboard](/dash/account).

## Understanding Concurrent Requests

A concurrent request limit is the maximum number of requests processed simultaneously. To illustrate how concurrent requests work, let's consider an example.

Imagine your application is making requests to our system and has reached the 200 concurrent request limit. Suddenly, 10 of those requests complete, freeing up 10 slots. Your application can now send 10 new requests, which will be processed concurrently with the remaining 190 requests. This means that even though you've reached the concurrent request limit, your application can still continue to send new requests as old ones complete.

The rate at which you can send new requests depends on how long each request takes to process. The actual number of requests per minute (RPM) varies based on the duration of each request.
Here are some examples:

| Avg Request Duration | Limit | Approximate RPM |
| --- | --- | --- |
| 1 second | 200 | 12000 RPM (200 concurrent requests × 60 seconds / 1 second per request) |
| 10 seconds | 200 | 1200 RPM (200 concurrent requests × 60 seconds / 10 seconds per request) |
| 60 seconds | 200 | 200 RPM (200 concurrent requests × 60 seconds / 60 seconds per request) |

## Purpose of rate limits

Rate limits are established protocols designed to prevent abuse or misuse of the API. They ensure fair and consistent access to the API for all users while maintaining reliable performance.

## How do you check for rate limits?

You will get an HTTP **429** response status code with a **Rate limited** message. Actions to take:

* retry in a bit,
* or slow down your requests,
* or apply for an increase by contacting us.

Note: sometimes you might get **429** errors when the model gets too busy. Typically, the auto-scaling logic will kick in, so if you retry in just a bit, it should get resolved.
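Here is a minimal sketch of one way to do this from a batch job, assuming the OpenAI-compatible endpoint and the `openai` Python client: an `asyncio.Semaphore` caps the number of in-flight requests below the 200-per-model limit, and an exponential backoff retries on HTTP 429. The concurrency value, retry count, and sleep times are illustrative, not prescribed values.

```python
import asyncio
from openai import AsyncOpenAI, RateLimitError

client = AsyncOpenAI(
    api_key="",  # your DeepInfra token
    base_url="https://api.deepinfra.com/v1/openai",
)

# Stay safely under the 200-concurrent-requests-per-model limit.
MAX_CONCURRENCY = 190
semaphore = asyncio.Semaphore(MAX_CONCURRENCY)


async def complete(prompt: str, retries: int = 5) -> str:
    async with semaphore:  # at most MAX_CONCURRENCY requests in flight
        for attempt in range(retries):
            try:
                response = await client.chat.completions.create(
                    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
                    messages=[{"role": "user", "content": prompt}],
                )
                return response.choices[0].message.content
            except RateLimitError:
                # 429: back off and retry; the limit (or a busy model) usually clears quickly.
                await asyncio.sleep(2 ** attempt)
        raise RuntimeError("request kept getting rate limited")


async def main() -> None:
    prompts = [f"Summarize document number {i}." for i in range(1000)]
    results = await asyncio.gather(*(complete(p) for p in prompts))
    print(len(results), "completions done")


if __name__ == "__main__":
    asyncio.run(main())
```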
---

# Source: https://deepinfra.com/docs/advanced/scoped_jwt
# Scoped JWT authentication

## Contents

* Overview
* Format
  * Header
  * Payload
  * Signature
  * Token
* Usage

## Overview

Scoped JWT authentication allows you to create scope-limited tokens for accessing DeepInfra inference API endpoints. For example, you can issue a scoped JWT and give it to a third party that you provide services to. That third party can then do inference directly using the JWT, but limited to your specification. You don't need to share your API key with that party or to proxy their requests.

Scoped JWT tokens are associated with an API key, and they let you specify an expiration, allowed models, and a spending limit. Inference usage done with a scoped JWT is counted towards the API key that was used for signing that token.

## Simple Usage

You can create JWT tokens with a POST to `/v1/scoped-jwt`:

```bash
curl -X POST "https://api.deepinfra.com/v1/scoped-jwt" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -d '{
    "api_key_name": "auto",
    "models": ["deepseek-ai/DeepSeek-R1"],
    "expires_delta": 3600,
    "spending_limit": 1.0
  }'
```

```json
{"token":"jwt:eyJhbGciOiJIUzI1NiIsImtpZCxxxxxxxxxxxxxxxxxx"}
```

This creates a JWT token associated with the API key `auto`, limited to DeepSeek-R1, expiring in 1 hour, with a spending limit of 1.00 USD. You can skip `models` (allow any model), `expires_delta` (no expiration -- at the moment that means 1 year), and `spending_limit` (no spending limit). You can also provide `expires_at` (a Unix timestamp) instead of `expires_delta`.

You can also check (decode) the JWT token via a GET to `/v1/scoped-jwt` (make sure the API key you authenticate with is the same one that signed the token).

```bash
curl "https://api.deepinfra.com/v1/scoped-jwt?jwtoken=XXXX" \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY"
```

```json
{
  "expires_at": 1738843515,
  "models": [
    "deepseek-ai/DeepSeek-R1"
  ],
  "spending_limit": 1
}
```

## Usage

Once issued, the scoped JWT can be used in all inference endpoints in place of an API key, but only if the restrictions are met (models allowed, before the expiration date, before the spending limit is exhausted).
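For instance, with the OpenAI-compatible endpoint you can hand the scoped JWT to the client wherever an API key would go. A minimal sketch assuming the `openai` Python client; the token value below is a placeholder for what `/v1/scoped-jwt` returned:

```python
from openai import OpenAI

# The scoped JWT (including the "jwt:" prefix) is used in place of an API key.
SCOPED_JWT = "jwt:eyJhbGciOiJIUzI1NiIsImtpZ..."  # placeholder

client = OpenAI(
    api_key=SCOPED_JWT,
    base_url="https://api.deepinfra.com/v1/openai",
)

# Only models allowed by the token will be accepted.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```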
curl "https://api.deepinfra.com/v1/openai/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $SCOPED_JWT" \ -d '{ "model": "deepseek-ai/DeepSeek-R1", "messages": [ { "role": "user", "content": "Hello!" } ] }' copy ## Format You can also create and inspect Scoped JWT tokens yourself, here is a detailed explanation on how they are formed. The generral idea is that the payload encodes the restrictions and the signature is based on the API key used. ### Header For the standard `alg` field we only accept `HS256` value (HMAC-SHA256). This is the algorithm you should use to produce the signature. The `kid` field stores your key id. It is formed from your DeepInfra id and the Base64 encoding of the name of the API key you use for signing. These two parts are concatenate with a colon separator. In the example bellow we specify a user with id `di:1000000000000` with an API key named `auto`, which when Base64 encoded becomes `YXV0bw==`. Then we concatenate the two with a colon to get the key id `di:1000000000000:YXV0bw==`. { "alg": "HS256", "kid": "di:1000000000000:YXV0bw==", "typ": "JWT" } copy ### Payload The `sub` field specifies again your user id. The `model` field specifies which model the token is will be allows to access. The `exp` specifies an expiration UTC timestamp in seconds, that can point to no later than week from the moment of issuing the token. { "sub": "di:1000000000000", "model": "deepseek-ai/DeepSeek-R1", "exp": 1734616903 } copy ### Signature Employ the standard way of calculating the JWT signature, using your chosen API key as a secret. We support only the HMAC-SHA256 algorithm. HMAC_SHA256( api_key, base64urlEncoding(header) + '.' + base64urlEncoding(payload) ) ### Token Finally, encode the the three parts and concatenate them with the period separator to form the token. scoped_jwt = 'jwt:' + base64urlEncoding(header) + '.' + base64urlEncoding(payload) + '.' + base64urlEncoding(signature) [Webhooks](/docs/advanced/webhooks)[Model Features](/docs/model-features) ![Footer Logo](/_next/static/media/footer_logo.b3e9d8d3.svg) ![SOC 2 Certified](https://static.sprinto.com/_next/static/images/framework/soc2.png)![ISO 27001 Certified](https://static.sprinto.com/_next/static/images/framework/iso-27001.png) Have questions or need a custom solution? [Contact Sales](/contact-sales) Company [Pricing](/pricing) [Docs](/docs) [Compare](/compare) [DeepStart](/deepstart) [About](/about_us) [Careers](https://jobs.gem.com/deep-infra) [Contact us](/contact-sales) [Trust Center](https://trust.deepinfra.com) [DeepGPT](https://deepgpt.com) Latest Models [anthropic/claude-3-7-sonnet-latest](/anthropic/claude-3-7-sonnet- latest)[moonshotai/Kimi-K2-Instruct-0905](/moonshotai/Kimi-K2-Instruct-0905)[zai- org/GLM-4.6](/zai-org/GLM-4.6)[deepseek-ai/DeepSeek-V3.1](/deepseek- ai/DeepSeek-V3.1)[deepseek-ai/DeepSeek-V3.2-Exp](/deepseek- ai/DeepSeek-V3.2-Exp) Featured Models [MiniMaxAI/MiniMax-M2](/MiniMaxAI/MiniMax-M2)[meta-llama/Llama- Guard-4-12B](/meta-llama/Llama-Guard-4-12B)[openai/whisper- large-v3-turbo](/openai/whisper- large-v3-turbo)[moonshotai/Kimi-K2-Instruct-0905](/moonshotai/Kimi-K2-Instruct-0905)[deepseek- ai/DeepSeek-V3-0324](/deepseek-ai/DeepSeek-V3-0324) ![Built With Love in Palo Alto](/_next/static/media/love.ce60156e.svg) [](https://linkedin.com/company/deep- infra)[](https://x.com/DeepInfra)[](https://github.com/DeepInfra)[](https://discord.gg/x88dCvhqYq) © 2026 Deep Infra. All rights reserved. 
---

# Source: https://deepinfra.com/docs/advanced/webhooks
# Webhooks

Webhooks are an exclusive feature of the DeepInfra API. They don't work with the OpenAI-compatible API.

Webhooks deliver inference results and notify you about inference errors. Using them is simple: you just supply the optional `webhook` param as in the following examples.

Here is an example with text generation.

```javascript
import { TextGeneration } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL_URL = "https://api.deepinfra.com/v1/inference/meta-llama/Meta-Llama-3-8B-Instruct";

async function main() {
  const client = new TextGeneration(MODEL_URL, DEEPINFRA_API_KEY);
  const res = await client.generate({
    "input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "stop": ["<|eot_id|>"],
    "webhook": "https://your-app.com/deepinfra-webhook"
  });
  console.log(res.inference_status.status); // queued
}

main();
```

Here is another example with embeddings.

```javascript
import { Embeddings } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "BAAI/bge-large-en-v1.5";

const main = async () => {
  const client = new Embeddings(MODEL, DEEPINFRA_API_KEY);
  const body = {
    inputs: ["I like chocolate"],
    webhook: "https://your-app.com/deepinfra-webhook",
  };
  const output = await client.generate(body);
  console.log(output.inference_status.status); // queued
};

main();
```

When you provide a webhook, the API server will respond with a **queued** status and will call the webhook with the actual result. The delivered response will contain the inference result, cost estimate, and runtime, and/or an error, in a JSON body. It is the same JSON response that you get from regular inference calls.
{ "request_id": "R7X9fdlIaF5GlVisBAi5xR3E", "inference_status": { "status": "succeeded", "runtime_ms": 228, "cost": 0.0001140000022132881 }, "results": {...} } copy Errors will have the following format { "request_id": "RHNShFanUP5ExA8rzgyDWH88", "inference_status": { "status": "failed", "runtime_ms": 0, "cost": 0.0 } } copy We will make a few attempts if your webhook endpoint returns 400+ status. [Rate Limits](/docs/advanced/rate-limits)[Authentication & Tokens](/docs/advanced/scoped_jwt) ![Footer Logo](/_next/static/media/footer_logo.b3e9d8d3.svg) ![SOC 2 Certified](https://static.sprinto.com/_next/static/images/framework/soc2.png)![ISO 27001 Certified](https://static.sprinto.com/_next/static/images/framework/iso-27001.png) Have questions or need a custom solution? [Contact Sales](/contact-sales) Company [Pricing](/pricing) [Docs](/docs) [Compare](/compare) [DeepStart](/deepstart) [About](/about_us) [Careers](https://jobs.gem.com/deep-infra) [Contact us](/contact-sales) [Trust Center](https://trust.deepinfra.com) [DeepGPT](https://deepgpt.com) Latest Models [anthropic/claude-3-7-sonnet-latest](/anthropic/claude-3-7-sonnet-latest)[zai- org/GLM-4.6](/zai-org/GLM-4.6)[deepseek-ai/DeepSeek-V3.1](/deepseek- ai/DeepSeek-V3.1)[deepseek-ai/DeepSeek-V3.2-Exp](/deepseek- ai/DeepSeek-V3.2-Exp)[moonshotai/Kimi-K2-Instruct-0905](/moonshotai/Kimi-K2-Instruct-0905) Featured Models [google/gemini-2.5-flash](/google/gemini-2.5-flash)[anthropic/claude-4-sonnet](/anthropic/claude-4-sonnet)[deepseek- ai/DeepSeek-OCR](/deepseek-ai/DeepSeek- OCR)[nvidia/Nemotron-3-Nano-30B-A3B](/nvidia/Nemotron-3-Nano-30B-A3B)[meta- llama/Llama-3.3-70B-Instruct-Turbo](/meta-llama/Llama-3.3-70B-Instruct-Turbo) ![Built With Love in Palo Alto](/_next/static/media/love.ce60156e.svg) [](https://linkedin.com/company/deep- infra)[](https://x.com/DeepInfra)[](https://github.com/DeepInfra)[](https://discord.gg/x88dCvhqYq) © 2026 Deep Infra. All rights reserved. [Privacy Policy](/privacy)[Terms of Service](/terms) --- # Source: https://deepinfra.com/docs/api-reference We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic… AcceptReject [FLUX.2 is live!](https://deepinfra.com/models?q=flux-2) High-fidelity image generation made simple. 
---

# Source: https://deepinfra.com/docs/api-reference
# API Reference

DeepInfra provides multiple API options to interact with our models:

* [OpenAI-Compatible API](/docs/openai_api) - Use the familiar OpenAI API format with DeepInfra models
* [DeepInfra Native API](/docs/deep_infra_api) - Our native API for maximum flexibility
* [Rate Limits](/docs/advanced/rate-limits) - Understanding API usage limits
* [Webhooks](/docs/advanced/webhooks) - Event notifications for your applications
* [Authentication & Tokens](/docs/advanced/scoped_jwt) - Secure your API requests

---

# Source: https://deepinfra.com/docs/custom-deployments
# Custom Deployments

DeepInfra allows you to deploy and customize your own models:

* [Custom LLMs](/docs/advanced/custom_llms) - Deploy your own language models
* [LoRA Adapter Models](/docs/advanced/lora) - Fine-tune models with LoRA adapters
* [LoRA Image Adapters](/docs/advanced/lora_text_to_image) - Customize image generation models
---

# Source: https://deepinfra.com/docs/data
# Data Privacy During Inference

DeepInfra offers simple, scalable, and cost-effective inference APIs. The goal of this document is to explain how we handle data during inference when you use the DeepInfra APIs. When we mention a third-party model, we mean accessing that model through the DeepInfra APIs.

### Data Privacy

When using DeepInfra inference APIs, you can be sure that your data is safe. We do not store the data you submit to our APIs on disk; we only keep it in memory during the inference process. Once the inference is done, the data is deleted from memory.

We also don't store the output of the inference process. Once the inference is done, the output is sent back to you and then deleted from memory. Exceptions to these rules are the outputs of image generation models, which are stored for easy access for a short period of time. If you opt to use a Google model, Google will store the output as outlined in their [Privacy Notice](https://cloud.google.com/terms/cloud-privacy-notice). If you opt to use an Anthropic model, Anthropic will store the output as outlined in their [Trust Center](https://trust.anthropic.com/).

### Bulk Inference APIs

When using our bulk inference APIs, you can submit multiple requests in a single API call. This is useful when you have a large number of requests to make. In this case we need to store the data for a longer period of time, and we might store it on disk in encrypted form. Once the inference is done and the output is returned to you, the data is deleted from disk and memory after a short period of time.

### No Training

Except for when you use the Google or Anthropic models, we do not use your data for training our models. We do not store it on disk or use it for any purpose other than the inference process.
When using the Google or Anthropic models, the data you submit is subject to the receiving company's training policy.

### No Sharing

Except when you use the Google or Anthropic models, we do not share the data you submit to our APIs with any third party. When using the Google or Anthropic models, we are required to transfer the data you submit to the company's endpoints to facilitate the request.

### Logs

We generally don't log the data you submit to our APIs. We only log metadata that might be useful for debugging purposes, like the request ID, the cost of the inference, and the sampling parameters. We reserve the right to look at and log a small portion of requests when necessary for debugging or security purposes.

When using the Google model, Google logs prompts and responses for a limited period of time, solely for the purpose of detecting violations of their [Prohibited Use Policy](https://policies.google.com/terms/generative-ai/use-policy).

Personal information and data you provide when using certain models through our API may be shared with the relevant API endpoints, as specified at the time of use.

---

# Source: https://deepinfra.com/docs/deep_infra_api
# DeepInfra API

DeepInfra's native API is more advanced, but it gives you access to every model we provide, unlike the OpenAI-compatible API, which only covers LLMs and embeddings. You can also do Image Generation, Speech Recognition, Object Detection, Token Classification, Fill Mask, Image Classification, Zero-Shot Image Classification and Text Classification.

### JavaScript

JavaScript is a first-class citizen at DeepInfra. You can install our official client with

```bash
npm install deepinfra
```

### HTTP/Curl

Don't want another dependency? You prefer Go, C#, Java, PHP, Swift, Ruby, C++ or something exotic? No problem. You can always use HTTP and have full access to all features offered by DeepInfra.

### Completions/Text Generation

[List of text generation models](/models/text-generation)

You should know how to format the input to make completions work. Different models might have a different input format. The example below is for [meta-llama/Meta-Llama-3-8B-Instruct](/meta-llama/Meta-Llama-3-8B-Instruct).

```javascript
import { TextGeneration } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL_URL = "https://api.deepinfra.com/v1/inference/meta-llama/Meta-Llama-3-8B-Instruct";

async function main() {
  const client = new TextGeneration(MODEL_URL, DEEPINFRA_API_KEY);
  const res = await client.generate({
    input: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    stop: ["<|eot_id|>"],
  });
  console.log(res.results[0].generated_text);
  console.log(res.inference_status.tokens_input, res.inference_status.tokens_generated);
}

main();
```

For every model you can check its input format in its API section.
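If you prefer the plain HTTP route mentioned above, the same text-generation request can be sent with curl. This is a minimal sketch that mirrors the request body from the JavaScript example:

```bash
# Same request as the JavaScript example above, sent to the model's inference endpoint.
curl -X POST \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "stop": ["<|eot_id|>"]
  }' \
  'https://api.deepinfra.com/v1/inference/meta-llama/Meta-Llama-3-8B-Instruct'
```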
### Embeddings

[List of embeddings models](/models/embeddings)

The following creates an embedding vector representing the input text:

```javascript
import { Embeddings } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "BAAI/bge-large-en-v1.5";

const main = async () => {
  const client = new Embeddings(MODEL, DEEPINFRA_API_KEY);
  const body = {
    inputs: [
      "What is the capital of France?",
      "What is the capital of Germany?",
      "What is the capital of Italy?",
    ],
  };
  const output = await client.generate(body);
  console.log(output.embeddings[0]);
};

main();
```

### Image Generation

[List of image generation models](/models/text-to-image)

```javascript
import { TextToImage } from "deepinfra";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "stabilityai/stable-diffusion-2-1";

const main = async () => {
  const model = new TextToImage(MODEL, DEEPINFRA_API_KEY);
  const response = await model.generate({
    prompt: "a burger with a funny hat on the beach",
  });
  const result = await fetch(response.images[0]);
  if (result.ok && result.body) {
    let writer = createWriteStream("image.png");
    Readable.fromWeb(result.body).pipe(writer);
  }
};

main();
```

### Speech Recognition

[List of speech recognition models](/models/automatic-speech-recognition)

Speech to text for a locally stored `audio.mp3` file:

```javascript
import { AutomaticSpeechRecognition } from "deepinfra";
import path from "path";
import { fileURLToPath } from "url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "openai/whisper-large";

const main = async () => {
  const client = new AutomaticSpeechRecognition(MODEL, DEEPINFRA_API_KEY);
  const input = {
    audio: path.join(__dirname, "audio.mp3"),
  };
  const response = await client.generate(input);
  console.log(response.text);
};

main();
```
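The same transcription can also be done over plain HTTP by uploading the file as `multipart/form-data`. A sketch, assuming the endpoint accepts the audio file as a form field named `audio`, matching the input key used by the JavaScript client above:

```bash
# Upload a local audio file to the whisper-large inference endpoint.
# The form field name `audio` mirrors the JavaScript example above.
curl -X POST \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -F audio=@audio.mp3 \
  'https://api.deepinfra.com/v1/inference/openai/whisper-large'
```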
### Object Detection

[List of object detection models](/models/object-detection)

Send an image for detection:

```javascript
import { ObjectDetection } from "deepinfra";
import path from "path";
import { fileURLToPath } from "url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "hustvl/yolos-small";

const main = async () => {
  const model = new ObjectDetection(MODEL, DEEPINFRA_API_KEY);
  const input = {
    image: path.join(__dirname, "image.jpg"),
  };
  const response = await model.generate(input);
  for (const result of response.results) {
    console.log(result.label, result.score, result.box);
  }
};

main();
```

### Token Classification

[List of token classification models](/models/token-classification)

```javascript
import { TokenClassification } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "Davlan/bert-base-multilingual-cased-ner-hrl";

const main = async () => {
  const model = new TokenClassification(MODEL, DEEPINFRA_API_KEY);
  const input = {
    input: "My name is John Doe and I live in San Francisco.",
  };
  const response = await model.generate(input);
  console.log(response.results);
};

main();
```

### Fill Mask

[List of fill mask models](/models/fill-mask)

```javascript
import { FillMask } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "bert-base-cased";

const main = async () => {
  const model = new FillMask(MODEL, DEEPINFRA_API_KEY);
  const body = {
    input: "I need my [MASK] right now!",
  };
  const response = await model.generate(body);
  console.log(response.results);
};

main();
```

### Image Classification

[List of image classification models](/models/image-classification)

```javascript
import { ImageClassification } from "deepinfra";
import path from "path";
import { fileURLToPath } from "url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "google/vit-base-patch16-224";

const main = async () => {
  const model = new ImageClassification(MODEL, DEEPINFRA_API_KEY);
  const input = {
    image: path.join(__dirname, "image.jpg"),
  };
  const response = await model.generate(input);
  console.log(response.results);
};

main();
```

### Zero-Shot Image Classification

[List of zero-shot image classification models](/models/zero-shot-image-classification)

```javascript
import { ZeroShotImageClassification } from "deepinfra";
import path from "path";
import { fileURLToPath } from "url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "openai/clip-vit-base-patch32";

const main = async () => {
  const model = new ZeroShotImageClassification(MODEL, DEEPINFRA_API_KEY);
  const body = {
    image: path.join(__dirname, "image.jpg"),
    candidate_labels: ["dog", "cat", "car", "horse", "person"],
  };
  const response = await model.generate(body);
  console.log(response.results);
};

main();
```

### Text Classification

[List of text classification models](/models/text-classification)

```javascript
import { TextClassification } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "ProsusAI/finbert";

const main = async () => {
  const model = new TextClassification(MODEL, DEEPINFRA_API_KEY);
  const body = {
    input: "Nvidia announces new AI chips months after latest launch as market competition heats up",
  };
  const response = await model.generate(body);
  console.log(response.results);
};

main();
```

```bash
curl -X POST \
  -d '{"input": "Nvidia announces new AI chips months after latest launch as market competition heats up"}' \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -H 'Content-Type: application/json' \
  'https://api.deepinfra.com/v1/inference/ProsusAI/finbert'
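As with the Text Classification curl above, any of these models can be called directly over HTTP on its inference endpoint. For example, a sketch of the Embeddings request from earlier, assuming the endpoint accepts the same `inputs` body that the JavaScript client sends:

```bash
# Embeddings over plain HTTP; the body matches the JavaScript example above.
curl -X POST \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": ["What is the capital of France?", "What is the capital of Germany?", "What is the capital of Italy?"]}' \
  'https://api.deepinfra.com/v1/inference/BAAI/bge-large-en-v1.5'
```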
---

# Source: https://deepinfra.com/docs/getting-started
# Getting Started

You don't need to install anything to do your first inference. You only need [your access token](/dash/api_keys). Go to the API section on any model's page and grab one of the examples. If you are logged in, your access token will be prefilled for you. You can try one of the examples from [meta-llama/Meta-Llama-3-8B-Instruct](/meta-llama/Meta-Llama-3-8B-Instruct/api):
```bash
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```

and it will respond with something like

```json
{
  "id": "chatcmpl-guMTxWgpFf",
  "object": "chat.completion",
  "created": 1694623155,
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 16,
    "total_tokens": 31,
    "estimated_cost": 0.0000268
  }
}
```

This example uses the [OpenAI Chat Completions API](/docs/openai_api), which we strongly recommend because it is the most convenient to use when dealing with LLMs. You can also use it with the official JavaScript/Node.js and Python libraries and they will work out of the box.

If you want to dip your toes a little more in the AI world you can try the following example:

```bash
curl "https://api.deepinfra.com/v1/inference/meta-llama/Meta-Llama-3-8B-Instruct" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "stop": [
      "<|eot_id|>"
    ]
  }'
```

It uses DeepInfra's native API and requires more advanced knowledge of how the model works, which in turn gives you more flexibility. You can read specifics about each model in its API section, including stop words, streaming and more. You will get a response similar to the previous example:

```json
{
  "request_id": "RWZDRhS5kdoM1XWwXLEshynO",
  "inference_status": {
    "status": "succeeded",
    "runtime_ms": 243,
    "cost": 0.0000436,
    "tokens_input": 12,
    "tokens_generated": 25
  },
  "results": [
    {
      "generated_text": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
    }
  ],
  "num_tokens": 25,
  "num_input_tokens": 12
}
```
---

# Source: https://deepinfra.com/docs/gpu-instances/containers
# Containers

## Overview

GPU Containers provide on-demand access to high-performance GPU compute resources in the cloud. With GPU Containers, you can quickly spin up containers with dedicated GPU access for machine learning training, inference, data processing, and other compute-intensive workloads.

Key features:

* **On-demand GPU access**: Launch containers with dedicated GPU resources when you need them
* **Flexible configurations**: Choose from various GPU configurations based on your performance and budget requirements
* **SSH access**: Connect directly to your containers via SSH for full control over your environment
* **Pay-per-use**: Only pay for the time your containers are running
* **Quick setup**: Get started in minutes with our streamlined creation process

GPU Containers are ideal for:

* Machine learning model training and fine-tuning
* Running inference workloads that require GPU acceleration
* Data processing and analysis tasks
* Development and testing of GPU-accelerated applications
* Prototyping and experimentation with different GPU configurations

## Usage

### Web UI

#### Starting a New Container

1. **Navigate to GPU Instances**
   * Go to your [Dashboard](/dash) and select "Instances" from the sidebar
   * Click the "New Container" button

   [![GPU Instances Web UI](/docs/instances.webp)](/dash/instances)

2. **Select GPU Configuration**
   * Choose from available GPU configurations based on your needs
   * Each configuration shows:
     * GPU type, quantity and memory (e.g., "1xB100-180GB", "2xB200-180GB")
     * Hourly pricing
     * Current availability status
   * Configurations marked "Out of capacity" are temporarily unavailable

   ![Select GPU config](/docs/new-container-1.webp)
3. **Enter Container Details**
   * **Container Name**: Provide a descriptive name for your container
   * **SSH Key**: Paste your public SSH key for secure access
     * Use the format: `ssh-rsa AAAAB3NzaC1yc2E...`
     * This key will be added to the `ubuntu` user account

   ![Enter container name and SSH key](/docs/new-container-2.webp)

4. **Accept License Agreements**
   * Review and accept the NVIDIA software license agreements
   * Acknowledge the cryptocurrency mining prohibition policy
   * Click "I agree to the above" to create your container

#### Connecting to a Running Container

**Access and Connect**

* Wait for your container status to show "running" in the GPU Instances list
* Click on the SSH login field to copy it
* Open your terminal and run: `ssh ubuntu@<container-ip>`
* Your container is ready to use with GPU access configured

![Copy SSH Login](/docs/instance-copy-ssh-login.webp)

#### Stopping a Container

**Terminate Container**

* Click on the container you want to stop from the instances list
* Click the "Terminate" button
* Type "confirm" in the dialog and click "Terminate"
* Warning: All container data will be permanently lost

### HTTP API

#### Starting a New Container

**Create Container**

```bash
curl -X POST https://api.deepinfra.com/v1/containers \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-container",
    "gpu_config": "8xB200-180GB",
    "container_image": "di-cont-ubuntu-torch:latest",
    "cloud_init_user_data": "#cloud-config\nusers:\n- name: ubuntu\n shell: /bin/bash\n sudo: '\''ALL=(ALL) NOPASSWD:ALL'\''\n ssh_authorized_keys:\n - ssh-rsa AAAAB3NzaC1yc2E..."
  }'
```

#### Connecting to a Running Container

**Get Container Details**

```bash
curl -X GET https://api.deepinfra.com/v1/containers/{container_id} \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN"
```

Once the container state is "running" and an IP address is assigned, connect via SSH:

```bash
ssh ubuntu@<container-ip>
```

#### Listing Containers

```bash
curl -X GET https://api.deepinfra.com/v1/containers \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN"
```

#### Terminating a Container

```bash
curl -X DELETE https://api.deepinfra.com/v1/containers/{container_id} \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN"
```

### Container States

Containers progress through several states during their lifecycle:

* **creating**: Container is being initialized
* **starting**: Container is booting up
* **running**: Container is active and accessible
* **shutting_down**: Container is being terminated
* **failed**: Container failed to start or encountered an error
* **deleted**: Container has been permanently removed
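For scripted workflows, you can poll the container details endpoint until it reaches the "running" state before connecting. A minimal sketch; the container ID is a placeholder, and since the response schema isn't documented here the script simply greps the raw JSON for the state string:

```bash
# Poll the container details endpoint until it reports the "running" state.
# CONTAINER_ID is a placeholder; check the GET response for the actual schema.
CONTAINER_ID="your-container-id"

until curl -s "https://api.deepinfra.com/v1/containers/$CONTAINER_ID" \
      -H "Authorization: Bearer $DEEPINFRA_TOKEN" | grep -q '"running"'; do
  echo "Waiting for container to start..."
  sleep 10
done

# Replace <container-ip> with the IP address from the container details.
ssh ubuntu@<container-ip>
```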
---

# Source: https://deepinfra.com/docs/gpu-instances
# GPU Instances

GPU Instances provide on-demand access to high-performance GPU compute resources in the cloud:

* [Containers](/docs/gpu-instances/containers) - Launch and manage GPU containers with dedicated resources
---

# Source: https://deepinfra.com/docs/inference
# Inference

A simple, scalable and cost-effective inference API is the main feature of DeepInfra. We package state-of-the-art models into a simple REST API that you can use to build your applications. There are multiple ways to access the API with different endpoints; you can choose the one that suits you best.

### OpenAI APIs

For LLMs there is the convenient OpenAI Chat Completions API, and the legacy OpenAI Completions API. Embedding models also support the OpenAI APIs. These can be accessed at the following endpoint:

`https://api.deepinfra.com/v1/openai`

This endpoint works with HTTP/Curl requests as well as with the official OpenAI libraries for Python & Node.js.
You can [learn more here](/docs/openai_api).

### Inference Endpoints

Every model also has a dedicated inference endpoint:

`https://api.deepinfra.com/v1/inference/{model_name}`

For example, for `meta-llama/Meta-Llama-3-8B-Instruct` the endpoint is

`https://api.deepinfra.com/v1/inference/meta-llama/Meta-Llama-3-8B-Instruct`

These endpoints can be accessed with REST requests as well as with the [official DeepInfra Node.js library](https://github.com/deepinfra/deepinfra-node). However, bear in mind that for certain cases, like LLMs, this API is more advanced and harder to use than the message-based OpenAI Chat Completions API.

### Streaming

All LLM models support streaming with all APIs and libraries; you just have to pass the `stream` option. You can see many examples in the API section of every model, and a curl sketch is included at the end of this page.

### Authentication

DeepInfra requires an API token to access any of its APIs. You can find yours in the [dashboard](/dash/api_keys).

To authenticate your requests, you need to pass your API token in the `Authorization` header with type `Bearer`:

`Authorization: bearer $AUTH_TOKEN`

or pass it as a parameter to the appropriate library.

### Content types

Our inference API supports `multipart/form-data` and `application/json` content types. We strongly suggest using the latter whenever possible.

#### multipart/form-data

Using `multipart/form-data` makes sense when you want to send binary data such as media files. Using this content type requires less bandwidth and is more efficient for large files.

#### application/json

Using `application/json` makes sense when you want to send text data. You can also use this content type for binary data, using data URLs. For example:

```json
{
  "image": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBD..."
}
```

### HTTP Status Codes

We use standard HTTP status codes to indicate the status of the request.

* `200` - OK. The request was successful.
* `4xx` - Bad Request. The request was invalid or cannot be served.
* `5xx` - Internal Server Error. Something went wrong on our side.

### Response Body

The response body is always a JSON object containing the model output. It also contains metadata about the inference request like `request_id`, `cost`, `runtime_ms` (except for LLMs), `tokens_input` and `tokens_generated` (LLMs only).

Example response:

```json
{
  "request_id": "RfMWDr1NXCd7cnaegcm3A8q0",
  "inference_status": {
    "cost": 0.004639499820768833,
    "runtime_ms": 1285,
    "status": "succeeded"
  },
  "text": "Hello World"
}
```
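To illustrate the Streaming section above: with the OpenAI-compatible Chat Completions endpoint you can request a streamed response by adding `stream: true` to the request body. A minimal sketch; the chunks follow the standard OpenAI streaming convention, so check a model's API section for the exact format:

```bash
# Streamed chat completion: the response arrives as a series of
# "data: {...}" server-sent events instead of a single JSON object.
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
```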
---

# Source: https://deepinfra.com/docs/integrations
# Integrations

DeepInfra integrates with popular AI frameworks and tools:

* [LangChain](/docs/advanced/langchain) - Build applications with LangChain
* [LlamaIndex](/docs/advanced/llama-index) - Use LlamaIndex for data indexing and retrieval
* [AI SDK](/docs/advanced/aisdk) - Integrate with the Vercel AI SDK
* [AutoGen](/docs/advanced/autogen) - Build multi-agent systems with AutoGen
* [Okta SSO](/docs/advanced/okta) - Enterprise single sign-on with Okta

---

# Source: https://deepinfra.com/docs/misc/subprocessors
# Deep Infra Subprocessors

DeepInfra uses a number of subprocessors to provide its services.

| Name | Nature of processing |
| --- | --- |
| Stripe | Payment processor |
| Amazon Web Services | Infrastructure |
| Google Cloud Platform | Infrastructure |

Last updated: Sept 6, 2024

---

# Source: https://deepinfra.com/docs/misc
# Miscellaneous

* [Deep Infra Data Subprocessors](/docs/misc/subprocessors)

---

# Source: https://deepinfra.com/docs/model-features
# Model Features

DeepInfra models support various advanced features to enhance your AI applications:

* [Function Calling](/docs/advanced/function_calling) - Call functions from within model responses
* [JSON Mode](/docs/advanced/json_mode) - Get structured JSON outputs from models (a minimal sketch follows this list)
* [Multimodal Models](/docs/advanced/multimodal) - Work with text, images, and other modalities
* [Log Probabilities](/docs/advanced/log_probs) - Access token probability information
* [Max Output Tokens](/docs/advanced/max_tokens_limit) - Control response length
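As a taste of how these features are exposed, here is a minimal JSON Mode sketch. It assumes the OpenAI-compatible endpoint described in the OpenAI API page of these docs, whose Caveats note that `response_format` currently accepts `{"type": "json"}` only; see the [JSON Mode](/docs/advanced/json_mode) page for authoritative details.

```python
# Minimal JSON Mode sketch (assumption: the OpenAI-compatible endpoint and the
# response_format support described in the OpenAI API section of these docs).
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "List three pasta shapes as a JSON array."}],
    response_format={"type": "json"},  # ask the model for JSON output
)

# The response content should be a JSON string if the model complies.
print(chat_completion.choices[0].message.content)
```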
---

# Source: https://deepinfra.com/docs/models
# Models

DeepInfra hosts a large number of the most popular machine learning models. You can find [the full list here](/models), conveniently split into categories based on their functionality.

We are constantly adding more, and DeepInfra is usually among the first to add a new model once it becomes available. Each model also has a dedicated page where you can quickly try it out, read its API documentation, or grab a ready-made example.

We also support deploying [custom models](/docs/advanced/custom_llms) on DeepInfra. If you think there is a model we should run, just let us know at [info@deepinfra.com](mailto:info@deepinfra.com). We read every email.
---

# Source: https://deepinfra.com/docs/openai_api
# OpenAI API

We offer an OpenAI-compatible API for all [LLM models](/models/text-generation) and all [Embeddings models](/models/embeddings). The APIs we support are:

* [chat completion](https://platform.openai.com/docs/guides/gpt/chat-completions-api) — both streaming and regular
* [completion](https://platform.openai.com/docs/guides/gpt/completions-api) — both streaming and regular
* [embeddings](https://platform.openai.com/docs/guides/embeddings) — supported for all embeddings models

The endpoint for the OpenAI APIs is `https://api.deepinfra.com/v1/openai`. You can call it with plain HTTP requests, or use the official Python and Node.js libraries. In all cases streaming is also supported.
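If you prefer raw HTTP, a minimal sketch along these lines should work; it assumes the standard OpenAI-style `chat/completions` path under the endpoint above and uses the third-party `requests` package rather than the official clients shown below.

```python
# Minimal raw-HTTP sketch (assumption: the standard OpenAI chat/completions
# path under the DeepInfra endpoint above). The official clients below do the
# same thing with less plumbing.
import requests

API_KEY = "$DEEPINFRA_TOKEN"

response = requests.post(
    "https://api.deepinfra.com/v1/openai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```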
### Official libraries

For Python you should run

```bash
pip install openai
```

For JavaScript/Node.js you should run

```bash
npm install openai
```

### Chat Completions

The Chat Completions API is the easiest to use. You exchange messages and it just works. You can change the model to another LLM and it will keep working.

```python
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True  # or False

chat_completion = openai.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=stream,
)

if stream:
    for event in chat_completion:
        if event.choices[0].finish_reason:
            # the final chunk carries the finish reason and token usage
            print(event.choices[0].finish_reason,
                  event.usage.prompt_tokens,
                  event.usage.completion_tokens)
        else:
            print(event.choices[0].delta.content)
else:
    print(chat_completion.choices[0].message.content)
    print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
```

You can see more complete examples on the documentation page of each model.

### Conversations with Chat Completions

To create a longer chat-like conversation you have to add each response message and each of the user's messages to every request. This way the model has the full context and can provide better answers. You can tweak it even further by providing a system message.

```python
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True  # or False

chat_completion = openai.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "Respond like a michelin starred chef."},
        {"role": "user", "content": "Can you name at least two different techniques to cook lamb?"},
        {"role": "assistant", "content": "Bonjour! Let me tell you, my friend, cooking lamb is an art form, and I'm more than happy to share with you not two, but three of my favorite techniques to coax out the rich, unctuous flavors and tender textures of this majestic protein. First, we have the classic \"Sous Vide\" method. Next, we have the ancient art of \"Sous le Sable\". And finally, we have the more modern technique of \"Hot Smoking.\""},
        {"role": "user", "content": "Tell me more about the second method."},
    ],
    stream=stream,
)

if stream:
    for event in chat_completion:
        if event.choices[0].finish_reason:
            print(event.choices[0].finish_reason,
                  event.usage.prompt_tokens,
                  event.usage.completion_tokens)
        else:
            print(event.choices[0].delta.content)
else:
    print(chat_completion.choices[0].message.content)
    print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
```

The longer the conversation gets, the more time it takes the model to generate the response. The number of messages you can have in a conversation is limited by the context size of the model. Larger models also usually take more time to respond and are more expensive.

### Completions

This is an advanced API. You should know how to format the input to make it work, and different models may have different input formats. The example below is for [meta-llama/Meta-Llama-3-8B-Instruct](/meta-llama/Meta-Llama-3-8B-Instruct). You can see each model's input format in the API section on its page.
```python
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True  # or False

completion = openai.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    stop=["<|eot_id|>"],
    stream=stream,
)

if stream:
    for event in completion:
        if event.choices[0].finish_reason:
            print(event.choices[0].finish_reason,
                  event.usage.prompt_tokens,
                  event.usage.completion_tokens)
        else:
            print(event.choices[0].text)
else:
    print(completion.choices[0].text)
    print(completion.usage.prompt_tokens, completion.usage.completion_tokens)
```

### Embeddings

DeepInfra supports the OpenAI embeddings API. The following creates an embedding vector representing the input text:

```python
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

input = "The food was delicious and the waiter..."  # or an array ["hello", "world"]

embeddings = openai.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=input,
    encoding_format="float",
)

if isinstance(input, str):
    print(embeddings.data[0].embedding)
else:
    for i in range(len(input)):
        print(embeddings.data[i].embedding)

print(embeddings.usage.prompt_tokens)
```

### Image Generation

You can use the OpenAI compatible API to generate images. Here's an example using Python:

```python
import io
import base64

from PIL import Image
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

if __name__ == "__main__":
    response = client.images.generate(
        prompt="A photo of an astronaut riding a horse on Mars.",
        size="1024x1024",
        quality="standard",
        n=1,
    )
    b64_json = response.data[0].b64_json
    image_bytes = base64.b64decode(b64_json)
    image = Image.open(io.BytesIO(image_bytes))
    image.save("output.png")
```

## Model parameter

Some models have more than one version available; you can infer against a particular version by using the `{"model": "MODEL_NAME:VERSION", ...}` format. You can also infer against a `deploy_id` by using `{"model": "deploy_id:DEPLOY_ID", ...}`. This is especially useful for [Custom LLMs](/docs/advanced/custom_llms): you can infer before the deployment is running (and before you have the model-name+version pair). A short sketch of both forms follows the Caveats below.

## Caveats

Please note that we might not be 100% compatible yet; let us know on Discord or by email if something you require is missing.

Supported request attributes:

ChatCompletions and Completions:

* `model`, including `version`/`deploy_id` support
* `messages` (roles `system`, `user`, `assistant`)
* `max_tokens`
* `stream`
* `temperature`
* `top_p`
* `stop`
* `n`
* `presence_penalty`
* `frequency_penalty`
* `response_format` (`{"type": "json"}` only; the default format is returned when omitted)
* `tools`, `tool_choice`
* `echo`, `logprobs` -- only for (non-chat) completions

`deploy_id` might not be immediately available if the model is currently deploying.

Embeddings:

* `model`
* `input`
* `encoding_format` -- `float` only

Images:

* `model` -- defaults to FLUX Schnell
* `quality` and `style` -- only available for compatibility
* `response_format` -- only `b64_json` supported for now

You can see even more details on each model's page.
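To make the model parameter addressing described above concrete, here is a minimal sketch; the version tag and deploy id are placeholders, not real identifiers, so substitute the values from your own model page or deployment.

```python
# Minimal sketch of the model-parameter addressing described above.
# "VERSION" and "DEPLOY_ID" are hypothetical placeholders.
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

# Pin a specific published version of a model...
pinned = openai.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct:VERSION",  # placeholder version tag
    messages=[{"role": "user", "content": "Hello"}],
)

# ...or target a custom deployment directly by its deploy id.
custom = openai.chat.completions.create(
    model="deploy_id:DEPLOY_ID",  # placeholder deploy id
    messages=[{"role": "user", "content": "Hello"}],
)

print(pinned.choices[0].message.content)
print(custom.choices[0].message.content)
```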
---

# Source: https://deepinfra.com/docs/tutorials/stable-diffusion
# Running Stable Diffusion on DeepInfra

## Pick a model

We support a variety of text-to-image models, including Stable Diffusion versions 1.4, 1.5 and 2.1 and many derivatives. Pick a model from [the list of text-to-image models](/models/text-to-image). For this example, we'll use `stability-ai/sdxl`.

```javascript
import { Sdxl } from "deepinfra";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";

const main = async () => {
  const model = new Sdxl(DEEPINFRA_API_KEY);
  const response = await model.generate({
    input: {
      prompt: "a burger with a funny hat on the beach",
    },
  });
  // download the generated image and save it to disk
  const result = await fetch(response.output[0]);
  if (result.ok && result.body) {
    let writer = createWriteStream("image.png");
    Readable.fromWeb(result.body).pipe(writer);
  }
};

main();
```

## Advanced options

Check [stability-ai/sdxl](/stability-ai/sdxl) for more options.
---

# Source: https://deepinfra.com/docs/tutorials/whisper
# Running Whisper using DeepInfra

## Speech recognition made easy

[Whisper](https://github.com/openai/whisper) is a speech-to-text model from OpenAI. Given an audio file with voice data, it produces a human speech recognition transcript with per-sentence timestamps. There are different model sizes (small, base, large, etc.) and variants for English. You can see all [speech recognition models](/models?type=automatic-speech-recognition) that we currently provide.

By default, Whisper produces per-sentence timestamp segmentation. We also host [whisper-timestamped](/openai/whisper-timestamped-medium), which can provide per-word timestamp segmentation.

Whisper is fully supported by our REST API and our Node.js client.
Using the Node.js client:

```javascript
import { AutomaticSpeechRecognition } from "deepinfra";
import path from "path";
import { fileURLToPath } from "url";

// Resolve the directory of the current module (ESM has no built-in __dirname).
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
const MODEL = "openai/whisper-large";

const main = async () => {
  const client = new AutomaticSpeechRecognition(MODEL, DEEPINFRA_API_KEY);
  const input = {
    // Path to the audio file to transcribe, relative to this script.
    audio: path.join(__dirname, "audio.mp3"),
  };
  const response = await client.generate(input);
  console.log(response.text);
};

main();
```

You can pass common audio formats such as mp3 and wav. For additional parameters and details on how to call this model, check out its [documentation page](/openai/whisper-large).
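If you need per-word timestamps, you can point the same client at the whisper-timestamped model mentioned above. The sketch below is only illustrative: the field names used when iterating over the result (`segments`, `words`, `start`, `end`, `text`) are assumptions, so check the [whisper-timestamped documentation page](/openai/whisper-timestamped-medium) for the actual response schema.

```javascript
import { AutomaticSpeechRecognition } from "deepinfra";

const DEEPINFRA_API_KEY = "$DEEPINFRA_TOKEN";
// Word-level timestamp variant hosted on DeepInfra.
const MODEL = "openai/whisper-timestamped-medium";

const main = async () => {
  const client = new AutomaticSpeechRecognition(MODEL, DEEPINFRA_API_KEY);
  const response = await client.generate({ audio: "audio.mp3" });

  // Assumed response shape: segments, each carrying word-level timings.
  // Verify the real schema on the model's documentation page.
  for (const segment of response.segments ?? []) {
    for (const word of segment.words ?? []) {
      console.log(`[${word.start}s - ${word.end}s] ${word.text}`);
    }
  }
};

main();
```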
---

# Source: https://deepinfra.com/docs/tutorials
# Tutorials

In this section we will look at some tutorials to get you started with DeepInfra. We will cover the following topics:

* [Stable Diffusion](/docs/tutorials/stable-diffusion)
* [Whisper](/docs/tutorials/whisper)