# Fireworks AI

> Build production-ready AI agents with Fireworks and leading open-source frameworks

---

# Source: https://docs.fireworks.ai/ecosystem/integrations/agent-frameworks.md

# Agent Frameworks

> Build production-ready AI agents with Fireworks and leading open-source frameworks

Fireworks AI integrates seamlessly with the best open-source agent frameworks, enabling you to build magical, production-ready applications powered by state-of-the-art language models.

## Supported Frameworks

* Build LLM applications with powerful orchestration and tool integration
* Efficient data retrieval and document indexing for LLM-based agents
* Orchestrate collaborative multi-agent systems for complex tasks
* Type-safe AI agent development with Pydantic validation
* Modern agent orchestration with seamless OpenAI-compatible integration

## Need Help?

For assistance with agent framework integrations, [contact our team](https://fireworks.ai/contact) or join our [Discord community](https://discord.gg/fireworks-ai).

---

# Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/alias-evaluator-revision.md

# firectl alias evaluator-revision

> Alias an evaluator revision

```
firectl alias evaluator-revision [flags]
```

### Examples

```
firectl alias evaluator-revision accounts/my-account/evaluators/my-evaluator/versions/abc123 --alias-id current
```

### Flags

```
      --alias-id string   Alias ID to assign (e.g. current)
  -h, --help              help for evaluator-revision
```

### Global flags

```
  -a, --account-id string   The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini.
      --api-key string      An API key used to authenticate with Fireworks.
  -p, --profile string      fireworks auth and settings profile to use.
```

---

> To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt

---

# Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/are-there-any-quotas-for-serverless.md

# Are there any quotas for serverless?

Yes, serverless deployments have rate limits and quotas. For detailed information about serverless quotas, rate limits, and daily token limits, see our [Rate Limits & Quotas guide](/guides/quotas_usage/rate-limits#rate-limits-on-serverless).

---

# Source: https://docs.fireworks.ai/faq-new/billing-pricing/are-there-discounts-for-bulk-usage.md

# Are there discounts for bulk usage?

We offer discounts for bulk or pre-paid purchases. Contact [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai) to discuss volume pricing.

---

# Source: https://docs.fireworks.ai/faq-new/billing-pricing/are-there-extra-fees-for-serving-fine-tuned-models.md

# Are there extra fees for serving fine-tuned models?

No, deploying fine-tuned models to serverless infrastructure is free. Here's what you need to know:

**What's free**:

* Deploying fine-tuned models to serverless infrastructure
* Hosting the models on serverless infrastructure
* Deploying up to 100 fine-tuned models

**What you pay for**:

* **Usage costs** on a per-token basis when the model is actually used
* The **fine-tuning process** itself, if applicable

Only a limited set of models is supported for serverless hosting of fine-tuned models. Check out the [Fireworks Model Library](https://app.fireworks.ai/models?filter=LLM\&serverlessWithLoRA=true) to see models with serverless support for fine-tuning.

*Note*: This differs from on-demand deployments, which include hourly hosting costs.
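To make the per-token usage model concrete, here is a minimal sketch of querying a fine-tuned model hosted on serverless through the OpenAI-compatible chat completions endpoint. The model path `accounts/my-account/models/my-finetuned-model` is a placeholder for your own fine-tuned model; hosting it on serverless is free, and only the tokens of calls like this one are billed.

```python theme={null}
# Minimal sketch: call a serverless-hosted fine-tuned model via the
# OpenAI-compatible API. The model path below is a placeholder.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

response = client.chat.completions.create(
    model="accounts/my-account/models/my-finetuned-model",  # hypothetical fine-tuned model ID
    messages=[{"role": "user", "content": "Summarize our refund policy in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
# Billing for this call is per token used; serverless hosting itself carries no hourly fee.
```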
---

# Source: https://docs.fireworks.ai/api-reference/audio-streaming-transcriptions.md

# Streaming Transcription

Streaming transcription is performed over a WebSocket. Provide the transcription parameters and establish a WebSocket connection to the endpoint. Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). In parallel, receive transcription from the WebSocket.

Stream audio to get transcription continuously in real-time.

### URLs

Fireworks provides serverless, real-time ASR via WebSocket endpoints. Please select the appropriate version:

#### Streaming ASR v1 (default)

Production-ready and generally recommended for all use cases.

```
wss://audio-streaming.api.fireworks.ai/v1/audio/transcriptions/streaming
```

#### Streaming ASR v2 (preview)

An early-access version of our next-generation streaming transcription service. V2 is a good fit for use cases that require lower latency and higher accuracy in noisy conditions.

```
wss://audio-streaming-v2.api.fireworks.ai/v1/audio/transcriptions/streaming
```

### Headers

Your Fireworks API key, e.g. `Authorization=API_KEY`. Alternatively, it can be provided as a query param.

### Query Parameters

Your Fireworks API key. Required when headers cannot be set (e.g., browser WebSocket connections). Can alternatively be provided via the Authorization header.

The format in which to return the response. Currently only `verbose_json` is recommended for streaming.

The target language for transcription. See the [Supported Languages](#supported-languages) section below for a complete list of available languages.

The input prompt that the model will use when generating the transcription. Can be used to specify custom words or the style of the transcription. E.g. `Um, here's, uh, what was recorded.` will make the model include the filler words in the transcription.

Sampling temperature to use when decoding text tokens during transcription.

The timestamp granularities to populate for this streaming transcription. Defaults to null. Set to `word,segment` to enable timestamp granularities. Use a list for `timestamp_granularities` in all client libraries. A comma-separated string like `word,segment` only works when manually included in the URL (e.g. in curl).

### Client messages

This field is for the client to send audio chunks to the server. Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono).

This field is for the client event that initiates context cleanup. A unique identifier for the event. A constant string that identifies the type of event as "stt.state.clear". The ID of the context or session to be cleared.

This field is for the client event that initiates tracing. A unique identifier for the event. A constant string indicating the event type is "stt.input.trace". The ID used to correlate this trace event across systems.

### Server messages

The task that was performed — either `transcribe` or `translate`. The language of the transcribed/translated text. The transcribed/translated text.

Extracted words and their corresponding timestamps. The text content of the word. The language of the word. The probability of the word. The hallucination score of the word. Start time of the word in seconds. Appears only when `timestamp_granularities` is set to `word,segment`. End time of the word in seconds.
Appears only when `timestamp_granularities` is set to `word,segment`. Indicates whether this word has been finalized.

Segments of the transcribed/translated text and their corresponding details. The ID of the segment. The text content of the segment. Extracted words in the segment. Start time of the segment in seconds. Appears only when `timestamp_granularities` is set to `word,segment`. End time of the segment in seconds. Appears only when `timestamp_granularities` is set to `word,segment`.

This field is for the server to communicate that it successfully cleared the context. A unique identifier for the event. A constant string indicating the event type is "stt.state.cleared". The ID of the context or session that has been successfully cleared.

This field is for the server to complete tracing. A unique identifier for the event. A constant string indicating the event type is "stt.output.trace". The ID used to correlate this output trace with the corresponding input trace.

### Streaming Audio

Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). Typically, you will:

1. Resample your audio to 16 kHz if it is not already.
2. Convert it to mono.
3. Send 50ms chunks (16,000 Hz \* 0.05s = 800 samples) of audio in 16-bit PCM (signed, little-endian) format.

### Handling Responses

The client maintains a state dictionary, starting with an empty dictionary `{}`. When the server sends the first transcription message, it contains a list of segments. Each segment has an `id` and `text`:

```python theme={null}
# Server initial message:
{
    "segments": [
        {"id": "0", "text": "This is the first sentence"},
        {"id": "1", "text": "This is the second sentence"}
    ]
}

# Client initial state:
{
    "0": "This is the first sentence",
    "1": "This is the second sentence",
}
```

When the server sends the next updates to the transcription, the client updates the state dictionary based on the segment `id`:

```python theme={null}
# Server continuous message:
{
    "segments": [
        {"id": "1", "text": "This is the second sentence modified"},
        {"id": "2", "text": "This is the third sentence"}
    ]
}

# Client updated state:
{
    "0": "This is the first sentence",
    "1": "This is the second sentence modified",  # overwritten
    "2": "This is the third sentence",  # new
}
```

### Handling Connection Interruptions & Timeouts

Real-time streaming transcription over WebSockets can run for a long time, and the longer a session runs, the more likely it is to experience interruptions, from network glitches to service hiccups. Build your client to recover gracefully so the stream keeps going without user impact. The following sections outline recommended practices for handling connection interruptions and timeouts effectively.

#### When a connection drops

Although Fireworks is designed to keep streams running smoothly, occasional interruptions can still occur. If the WebSocket is disrupted (e.g., by bandwidth limitations or network failures), your application must open a new WebSocket connection, start a fresh streaming session, and begin sending audio as soon as the server confirms the connection is open.

#### Avoid losing audio during reconnects

While you are reconnecting, audio is likely still being produced, and you could lose that segment if it is not sent to our API during that time.
To minimize the risk of dropping audio during a reconnect, one effective approach is to store the audio in a buffer until the connection to our API is re-established, and then send the buffered data for transcription.

### Keep timestamps continuous across sessions

When timestamps are enabled, the result includes the start and end time of each segment in seconds, and each new WebSocket session resets timestamps to start from 00:00:00. To keep a continuous timeline, maintain a running "stream start offset" in your app and add that offset to timestamps from each new session so they align with the overall audio timeline. A minimal reconnection sketch illustrating both practices appears after the example below.

### Example Usage

Check out the brief Python example below or the example sources:

* [Python notebook](https://colab.research.google.com/github/fw-ai/cookbook/blob/main/learn/audio/audio_streaming_speech_to_text/audio_streaming_speech_to_text.ipynb)
* [Python sources](https://github.com/fw-ai/cookbook/tree/main/learn/audio/audio_streaming_speech_to_text/python)
* [Node.js sources](https://github.com/fw-ai/cookbook/tree/main/learn/audio/audio_streaming_speech_to_text/nodejs)

```python theme={null}
!pip3 install requests torch torchaudio websocket-client

import io
import time
import json
import torch
import requests
import torchaudio
import threading
import websocket
import urllib.parse

# Prepare the audio to stream: download a sample clip, resample it to 16 kHz mono,
# and split it into 50ms chunks of 16-bit little-endian PCM.
# (This preparation step is illustrative; any 16 kHz mono PCM16 source works.)
chunk_size_ms = 50
audio_bytes = requests.get("https://tinyurl.com/4cb74vas").content
waveform, sample_rate = torchaudio.load(io.BytesIO(audio_bytes))
waveform = torchaudio.functional.resample(waveform.mean(dim=0, keepdim=True), sample_rate, 16_000)
pcm16 = (waveform.clamp(-1.0, 1.0) * 32767).to(torch.int16).squeeze(0)
samples_per_chunk = 16_000 * chunk_size_ms // 1000  # 800 samples per 50ms chunk
audio_chunk_bytes = [
    pcm16[i : i + samples_per_chunk].numpy().tobytes()
    for i in range(0, pcm16.numel(), samples_per_chunk)
]

lock = threading.Lock()
state = {}

def on_open(ws):
    # Stream chunks in real time, then send a final checkpoint; the client
    # closes the socket when the server echoes it back.
    def send_audio_chunks():
        for chunk in audio_chunk_bytes:
            ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)
            time.sleep(chunk_size_ms / 1000)
        final_checkpoint = json.dumps({"checkpoint_id": "final"})
        ws.send(final_checkpoint, opcode=websocket.ABNF.OPCODE_TEXT)

    threading.Thread(target=send_audio_chunks).start()

def on_message(ws, message):
    message = json.loads(message)
    if message.get("checkpoint_id") == "final":
        ws.close()
        return
    # Merge segment updates into the client-side state dictionary by segment id.
    update = {s["id"]: s["text"] for s in message["segments"]}
    with lock:
        state.update(update)
        print("\n".join(f" - {k}: {v}" for k, v in state.items()))

def on_error(ws, error):
    print(f"WebSocket error: {error}")

# Open a connection URL with query params
url = "wss://audio-streaming.api.fireworks.ai/v1/audio/transcriptions/streaming"
params = urllib.parse.urlencode({
    "language": "en",
})

ws = websocket.WebSocketApp(
    f"{url}?{params}",
    header={"Authorization": ""},
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
)
ws.run_forever()
```

### Dedicated endpoint

For fixed throughput and predictable SLAs, you may request a dedicated endpoint for streaming transcription at [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai) or on [Discord](https://discord.gg/fireworks-ai).
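The sketch below ties together the reconnection, buffering, and timestamp-offset guidance above. It is illustrative rather than part of the official example sources: `pcm_chunks` is a hypothetical iterable of 50ms PCM16 chunks (prepared as in the example above), and the `start`/`end` segment keys are assumed to follow the verbose_json-style fields described earlier.

```python theme={null}
# Illustrative reconnection loop with an audio buffer and a running timestamp offset.
import json
import time
from collections import deque

import websocket  # websocket-client

URL = "wss://audio-streaming.api.fireworks.ai/v1/audio/transcriptions/streaming?language=en"
CHUNK_SECONDS = 0.05  # 50ms chunks

def stream_with_reconnect(pcm_chunks, api_key):
    pending = deque(pcm_chunks)   # unsent chunks survive a dropped connection
    stream_start_offset = 0.0     # seconds already streamed in previous sessions
    transcript = {}

    while pending:                # simple retry loop; add backoff in production
        chunks_sent = 0
        try:
            ws = websocket.create_connection(URL, header={"Authorization": api_key})
            ws.settimeout(0.01)
            while pending:
                ws.send_binary(pending[0])
                pending.popleft()          # drop a chunk only after it was sent
                chunks_sent += 1
                time.sleep(CHUNK_SECONDS)
                try:
                    msg = json.loads(ws.recv())
                    for seg in msg.get("segments", []):
                        entry = {"text": seg["text"]}
                        if "start" in seg and "end" in seg:  # assumed verbose_json-style keys
                            # Re-align session-local timestamps to the overall timeline.
                            entry["start"] = seg["start"] + stream_start_offset
                            entry["end"] = seg["end"] + stream_start_offset
                        transcript[seg["id"]] = entry
                except websocket.WebSocketTimeoutException:
                    pass  # no transcription update yet; keep streaming
            ws.close()
        except (websocket.WebSocketException, OSError):
            # Connection dropped: remember how much audio was already sent, then
            # reconnect and resume from the first unsent chunk in the buffer.
            stream_start_offset += chunks_sent * CHUNK_SECONDS
    return transcript
```

In production you would typically split sending and receiving across threads, as the official example above does; this sketch keeps them in one loop to highlight the buffering and offset arithmetic.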
### Supported Languages The following languages are supported for transcription: | Language Code | Language Name | | ------------- | ------------------- | | en | English | | zh | Chinese | | de | German | | es | Spanish | | ru | Russian | | ko | Korean | | fr | French | | ja | Japanese | | pt | Portuguese | | tr | Turkish | | pl | Polish | | ca | Catalan | | nl | Dutch | | ar | Arabic | | sv | Swedish | | it | Italian | | id | Indonesian | | hi | Hindi | | fi | Finnish | | vi | Vietnamese | | he | Hebrew | | uk | Ukrainian | | el | Greek | | ms | Malay | | cs | Czech | | ro | Romanian | | da | Danish | | hu | Hungarian | | ta | Tamil | | no | Norwegian | | th | Thai | | ur | Urdu | | hr | Croatian | | bg | Bulgarian | | lt | Lithuanian | | la | Latin | | mi | Maori | | ml | Malayalam | | cy | Welsh | | sk | Slovak | | te | Telugu | | fa | Persian | | lv | Latvian | | bn | Bengali | | sr | Serbian | | az | Azerbaijani | | sl | Slovenian | | kn | Kannada | | et | Estonian | | mk | Macedonian | | br | Breton | | eu | Basque | | is | Icelandic | | hy | Armenian | | ne | Nepali | | mn | Mongolian | | bs | Bosnian | | kk | Kazakh | | sq | Albanian | | sw | Swahili | | gl | Galician | | mr | Marathi | | pa | Punjabi | | si | Sinhala | | km | Khmer | | sn | Shona | | yo | Yoruba | | so | Somali | | af | Afrikaans | | oc | Occitan | | ka | Georgian | | be | Belarusian | | tg | Tajik | | sd | Sindhi | | gu | Gujarati | | am | Amharic | | yi | Yiddish | | lo | Lao | | uz | Uzbek | | fo | Faroese | | ht | Haitian Creole | | ps | Pashto | | tk | Turkmen | | nn | Nynorsk | | mt | Maltese | | sa | Sanskrit | | lb | Luxembourgish | | my | Myanmar | | bo | Tibetan | | tl | Tagalog | | mg | Malagasy | | as | Assamese | | tt | Tatar | | haw | Hawaiian | | ln | Lingala | | ha | Hausa | | ba | Bashkir | | jw | Javanese | | su | Sundanese | | yue | Cantonese | | zh-hant | Traditional Chinese | | zh-hans | Simplified Chinese | --- # Source: https://docs.fireworks.ai/api-reference/audio-transcriptions.md # Transcribe audio Send a sample audio to get a transcription. ### Headers Your Fireworks API key, e.g. `Authorization=API_KEY`. ### Request ##### (multi-part form) The input audio file to transcribe or an URL to the public audio file. Max audio file size is 1 GB, there is no limit for audio duration. Common file formats such as mp3, flac, and wav are supported. Note that the audio will be resampled to 16kHz, downmixed to mono, and reformatted to 16-bit signed little-endian format before transcription. Pre-converting the file before sending it to the API can improve runtime performance. String name of the ASR model to use. Can be one of `whisper-v3` or `whisper-v3-turbo`. Please use the following serverless endpoints: * [https://audio-prod.api.fireworks.ai](https://audio-prod.api.fireworks.ai) (for `whisper-v3`); * [https://audio-turbo.api.fireworks.ai](https://audio-turbo.api.fireworks.ai) (for `whisper-v3-turbo`); String name of the voice activity detection (VAD) model to use. Can be one of `silero`, or `whisperx-pyannet`. String name of the alignment model to use. Currently supported: * `mms_fa` optimal accuracy for multilingual speech. * `tdnn_ffn` optimal accuracy for English-only speech. * `gentle` best accuracy for English-only speech (requires a dedicated endpoint, contact us at [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai)). The target language for transcription. See the [Supported Languages](#supported-languages) section below for a complete list of available languages. 
The input prompt that the model will use when generating the transcription. Can be used to specify custom words or specify the style of the transcription. E.g. `Um, here's, uh, what was recorded.` will make the model to include the filler words into the transcription. Sampling temperature to use when decoding text tokens during transcription. Alternatively, fallback decoding can be enabled by passing a list of temperatures like `0.0,0.2,0.4,0.6,0.8,1.0`. This can help to improve performance. The format in which to return the response. Can be one of `json`, `text`, `srt`, `verbose_json`, or `vtt`. The timestamp granularities to populate for this transcription. `response_format` must be set `verbose_json` to use timestamp granularities. Either or both of these options are supported. Can be one of `word`, `segment`, or `word,segment`. If not present, defaults to `segment`. Whether to get speaker diarization for the transcription. Can be one of `true`, or `false`. If not present, defaults to `false`. Enabling diarization also requires other fields to hold specific values: 1. `response_format` must be set `verbose_json`. 2. `timestamp_granularities` must include `word` to use diarization. The minimum number of speakers to detect for diarization. `diarize` must be set `true` to use `min_speakers`. If not present, defaults to `1`. The maximum number of speakers to detect for diarization. `diarize` must be set `true` to use `max_speakers`. If not present, defaults to `inf`. Audio preprocessing mode. Currently supported: * `none` to skip audio preprocessing. * `dynamic` for arbitrary audio content with variable loudness. * `soft_dynamic` for speech intense recording such as podcasts and voice-overs. * `bass_dynamic` for boosting lower frequencies; ### Response The task which was performed. Either `transcribe` or `translate`. The language of the transcribed/translated text. The duration of the transcribed/translated audio, in seconds. The transcribed/translated text. Extracted words and their corresponding timestamps. The text content of the word. The language of the word. The probability of the word. The hallucination score of the word. Start time of the word in seconds. End time of the word in seconds. Speaker label for the word. Segments of the transcribed/translated text and their corresponding details. The id of the segment. The text content of the segment. Start time of the segment in seconds. End time of the segment in seconds. Speaker label for the segment. Extracted words in the segment. ```curl curl theme={null} # Download audio file curl -L -o "audio.flac" "https://tinyurl.com/4997djsh" # Make request curl -X POST "https://audio-prod.api.fireworks.ai/v1/audio/transcriptions" \ -H "Authorization: " \ -F "file=@audio.flac" ``` ```python fireworks sdk theme={null} !pip install fireworks-ai requests python-dotenv from fireworks.client.audio import AudioInference import requests import os from dotenv import load_dotenv import time # Create a .env file with your API key load_dotenv() # Download audio sample audio = requests.get("https://tinyurl.com/4cb74vas").content # Prepare client client = AudioInference( model="whisper-v3", base_url="https://audio-prod.api.fireworks.ai", # Or for the turbo version # model="whisper-v3-turbo", # base_url="https://audio-turbo.api.fireworks.ai", api_key=os.getenv("FIREWORKS_API_KEY"), ) # Make request start = time.time() r = await client.transcribe_async(audio=audio) print(f"Took: {(time.time() - start):.3f}s. 
Text: '{r.text}'") ``` ```python Python (openai sdk) theme={null} !pip install openai requests python-dotenv from openai import OpenAI import os import requests from dotenv import load_dotenv load_dotenv() client = OpenAI( base_url="https://audio-prod.api.fireworks.ai/v1", api_key=os.getenv("FIREWORKS_API_KEY") ) audio_file= requests.get("https://tinyurl.com/4cb74vas").content transcription = client.audio.transcriptions.create( model="whisper-v3", file=audio_file ) print(transcription.text) ``` ### Supported Languages The following languages are supported for transcription: | Language Code | Language Name | | ------------- | ------------------- | | en | English | | zh | Chinese | | de | German | | es | Spanish | | ru | Russian | | ko | Korean | | fr | French | | ja | Japanese | | pt | Portuguese | | tr | Turkish | | pl | Polish | | ca | Catalan | | nl | Dutch | | ar | Arabic | | sv | Swedish | | it | Italian | | id | Indonesian | | hi | Hindi | | fi | Finnish | | vi | Vietnamese | | he | Hebrew | | uk | Ukrainian | | el | Greek | | ms | Malay | | cs | Czech | | ro | Romanian | | da | Danish | | hu | Hungarian | | ta | Tamil | | no | Norwegian | | th | Thai | | ur | Urdu | | hr | Croatian | | bg | Bulgarian | | lt | Lithuanian | | la | Latin | | mi | Maori | | ml | Malayalam | | cy | Welsh | | sk | Slovak | | te | Telugu | | fa | Persian | | lv | Latvian | | bn | Bengali | | sr | Serbian | | az | Azerbaijani | | sl | Slovenian | | kn | Kannada | | et | Estonian | | mk | Macedonian | | br | Breton | | eu | Basque | | is | Icelandic | | hy | Armenian | | ne | Nepali | | mn | Mongolian | | bs | Bosnian | | kk | Kazakh | | sq | Albanian | | sw | Swahili | | gl | Galician | | mr | Marathi | | pa | Punjabi | | si | Sinhala | | km | Khmer | | sn | Shona | | yo | Yoruba | | so | Somali | | af | Afrikaans | | oc | Occitan | | ka | Georgian | | be | Belarusian | | tg | Tajik | | sd | Sindhi | | gu | Gujarati | | am | Amharic | | yi | Yiddish | | lo | Lao | | uz | Uzbek | | fo | Faroese | | ht | Haitian Creole | | ps | Pashto | | tk | Turkmen | | nn | Nynorsk | | mt | Maltese | | sa | Sanskrit | | lb | Luxembourgish | | my | Myanmar | | bo | Tibetan | | tl | Tagalog | | mg | Malagasy | | as | Assamese | | tt | Tatar | | haw | Hawaiian | | ln | Lingala | | ha | Hausa | | ba | Bashkir | | jw | Javanese | | su | Sundanese | | yue | Cantonese | | zh-hant | Traditional Chinese | | zh-hans | Simplified Chinese | --- # Source: https://docs.fireworks.ai/api-reference/audio-translations.md # Translate audio ### Headers Your Fireworks API key, e.g. `Authorization=API_KEY`. ### Request ##### (multi-part form) The input audio file to translate or an URL to the public audio file. Max audio file size is 1 GB, there is no limit for audio duration. Common file formats such as mp3, flac, and wav are supported. Note that the audio will be resampled to 16kHz, downmixed to mono, and reformatted to 16-bit signed little-endian format before transcription. Pre-converting the file before sending it to the API can improve runtime performance. String name of the ASR model to use. Can be one of `whisper-v3` or `whisper-v3-turbo`. Please use the following serverless endpoints: * [https://audio-prod.api.fireworks.ai](https://audio-prod.api.fireworks.ai) (for `whisper-v3`); * [https://audio-turbo.api.fireworks.ai](https://audio-turbo.api.fireworks.ai) (for `whisper-v3-turbo`); String name of the voice activity detection (VAD) model to use. Can be one of `silero`, or `whisperx-pyannet`. String name of the alignment model to use. 
Currently supported: * `mms_fa` optimal accuracy for multilingual speech. * `tdnn_ffn` optimal accuracy for English-only speech. * `gentle` best accuracy for English-only speech (requires a dedicated endpoint, contact us at [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai)). The source language for transcription. See the [Supported Languages](#supported-languages) section below for a complete list of available languages. The input prompt that the model will use when generating the transcription. Can be used to specify custom words or specify the style of the transcription. E.g. `Um, here's, uh, what was recorded.` will make the model to include the filler words into the transcription. Sampling temperature to use when decoding text tokens during transcription. Alternatively, fallback decoding can be enabled by passing a list of temperatures like `0.0,0.2,0.4,0.6,0.8,1.0`. This can help to improve performance. The format in which to return the response. Can be one of `json`, `text`, `srt`, `verbose_json`, or `vtt`. The timestamp granularities to populate for this transcription. response\_format must be set `verbose_json` to use timestamp granularities. Either or both of these options are supported. Can be one of `word`, `segment`, or `word,segment`. If not present, defaults to `segment`. Audio preprocessing mode. Currently supported: * `none` to skip audio preprocessing. * `dynamic` for arbitrary audio content with variable loudness. * `soft_dynamic` for speech intense recording such as podcasts and voice-overs. * `bass_dynamic` for boosting lower frequencies; ### Response The task which was performed. Either `transcribe` or `translate`. The language of the transcribed/translated text. The duration of the transcribed/translated audio, in seconds. The transcribed/translated text. Extracted words and their corresponding timestamps. The text content of the word. Start time of the word in seconds. End time of the word in seconds. Segments of the transcribed/translated text and their corresponding details. ```curl curl theme={null} # Download audio file curl -L -o "audio.flac" "https://tinyurl.com/4997djsh" # Make request curl -X POST "https://audio-prod.api.fireworks.ai/v1/audio/translations" \ -H "Authorization: " \ -F "file=@audio.flac" ``` ```python Python (fireworks sdk) theme={null} !pip install fireworks-ai requests from fireworks.client.audio import AudioInference import requests import time from dotenv import load_dotenv import os load_dotenv() # Prepare client audio = requests.get("https://tinyurl.com/3cy7x44v").content client = AudioInference( model="whisper-v3", base_url="https://audio-prod.api.fireworks.ai", # # Or for the turbo version # model="whisper-v3-turbo", # base_url="https://audio-turbo.api.fireworks.ai", api_key=os.getenv("FIREWORKS_API_KEY") ) # Make request start = time.time() r = await client.translate_async(audio=audio) print(f"Took: {(time.time() - start):.3f}s. 
Text: '{r.text}'") ``` ```python Python (openai sdk) theme={null} !pip install openai requests from openai import OpenAI import requests from dotenv import load_dotenv import os load_dotenv() client = OpenAI( base_url="https://audio-prod.api.fireworks.ai/v1", api_key=os.getenv("FIREWORKS_API_KEY"), ) audio_file= requests.get("https://tinyurl.com/3cy7x44v").content translation = client.audio.translations.create( model="whisper-v3", file=audio_file, ) print(translation.text) ``` ### Supported Languages Translation is from one of the supported languages to English, the following languages are supported for translation: | Language Code | Language Name | | ------------- | -------------- | | en | English | | zh | Chinese | | de | German | | es | Spanish | | ru | Russian | | ko | Korean | | fr | French | | ja | Japanese | | pt | Portuguese | | tr | Turkish | | pl | Polish | | ca | Catalan | | nl | Dutch | | ar | Arabic | | sv | Swedish | | it | Italian | | id | Indonesian | | hi | Hindi | | fi | Finnish | | vi | Vietnamese | | he | Hebrew | | uk | Ukrainian | | el | Greek | | ms | Malay | | cs | Czech | | ro | Romanian | | da | Danish | | hu | Hungarian | | ta | Tamil | | no | Norwegian | | th | Thai | | ur | Urdu | | hr | Croatian | | bg | Bulgarian | | lt | Lithuanian | | la | Latin | | mi | Maori | | ml | Malayalam | | cy | Welsh | | sk | Slovak | | te | Telugu | | fa | Persian | | lv | Latvian | | bn | Bengali | | sr | Serbian | | az | Azerbaijani | | sl | Slovenian | | kn | Kannada | | et | Estonian | | mk | Macedonian | | br | Breton | | eu | Basque | | is | Icelandic | | hy | Armenian | | ne | Nepali | | mn | Mongolian | | bs | Bosnian | | kk | Kazakh | | sq | Albanian | | sw | Swahili | | gl | Galician | | mr | Marathi | | pa | Punjabi | | si | Sinhala | | km | Khmer | | sn | Shona | | yo | Yoruba | | so | Somali | | af | Afrikaans | | oc | Occitan | | ka | Georgian | | be | Belarusian | | tg | Tajik | | sd | Sindhi | | gu | Gujarati | | am | Amharic | | yi | Yiddish | | lo | Lao | | uz | Uzbek | | fo | Faroese | | ht | Haitian Creole | | ps | Pashto | | tk | Turkmen | | nn | Nynorsk | | mt | Maltese | | sa | Sanskrit | | lb | Luxembourgish | | my | Myanmar | | bo | Tibetan | | tl | Tagalog | | mg | Malagasy | | as | Assamese | | tt | Tatar | | haw | Hawaiian | | ln | Lingala | | ha | Hausa | | ba | Bashkir | | jw | Javanese | | su | Sundanese | | yue | Cantonese | --- # Source: https://docs.fireworks.ai/guides/security_compliance/audit_logs.md # Audit & Access Logs > Monitor and track account activities with audit logging for Enterprise accounts Audit logs are available for Enterprise accounts. This feature enhances security visibility, incident investigation, and compliance reporting. Audit logs include data access logs. All read, write, and delete operations on storage are logged, normalized, and enriched with account context for complete visibility. 
## View audit logs You can view audit logs, including data access logs, using the Fireworks CLI: ```bash theme={null} firectl ls audit-logs ``` Audit logs table showing data access activities with columns for timestamp, principal, response code, resource path, and message --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/authentication.md # Authentication > Authentication for access to your account ### Signing in Users using Google SSO can run: ``` firectl signin ``` If you are using [custom SSO](/accounts/sso), also specify the account ID: ``` firectl signin my-enterprise-account ``` ### Authenticate with API Key To authenticate with an API key, append `--api-key` to any firectl command. ``` firectl --api-key API_KEY ``` To persist the API key for all subsequent commands, run: ``` firectl set-api-key API_KEY ``` --- # Source: https://docs.fireworks.ai/deployments/autoscaling.md # Autoscaling > Configure how your deployment scales based on traffic Control how your deployment scales based on traffic and load. ## Configuration options | Flag | Type | Default | Description | | ------------------------ | --------- | ------------- | ------------------------------------------------------ | | `--min-replica-count` | Integer | 0 | Minimum number of replicas. Set to 0 for scale-to-zero | | `--max-replica-count` | Integer | 1 | Maximum number of replicas | | `--scale-up-window` | Duration | 30s | Wait time before scaling up | | `--scale-down-window` | Duration | 10m | Wait time before scaling down | | `--scale-to-zero-window` | Duration | 1h | Idle time before scaling to zero (min: 5m) | | `--load-targets` | Key-value | `default=0.8` | Scaling thresholds. See options below | **Load target options** (use as `--load-targets =[,=...]`): * `default=` - General load target from 0 to 1 * `tokens_generated_per_second=` - Desired tokens per second per replica * `requests_per_second=` - Desired requests per second per replica * `concurrent_requests=` - Desired concurrent requests per replica When multiple targets are specified, the maximum replica count across all is used. ## Common patterns Scale to zero when idle to minimize costs: ```bash theme={null} firectl create deployment \ --min-replica-count 0 \ --max-replica-count 3 \ --scale-to-zero-window 1h ``` Best for: Development, testing, or intermittent production workloads. Keep replicas running for instant response: ```bash theme={null} firectl create deployment \ --min-replica-count 2 \ --max-replica-count 10 \ --scale-up-window 15s \ --load-targets concurrent_requests=5 ``` Best for: Low-latency requirements, avoiding cold starts, high-traffic applications. Match known traffic patterns: ```bash theme={null} firectl create deployment \ --min-replica-count 3 \ --max-replica-count 5 \ --scale-down-window 30m \ --load-targets tokens_generated_per_second=150 ``` Best for: Steady workloads where you know typical load ranges. Cold starts take up to a few minutes when scaling from 0→1. Deployments with min replicas = 0 are auto-deleted after 7 days of no traffic. [Reserved capacity](/deployments/reservations) guarantees availability during scale-up. 
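To make the load-target behavior concrete, here is an illustrative calculation of how observed load and per-replica targets map to a desired replica count. This is a simplified model, not the actual autoscaler, which also applies the scale-up, scale-down, and scale-to-zero windows described above.

```python theme={null}
# Illustrative only: back-of-the-envelope mapping from load targets to replicas.
import math

def desired_replicas(observed, targets, min_replicas=0, max_replicas=10):
    """observed/targets are dicts keyed by metric, e.g. {"concurrent_requests": 42}."""
    wanted = [
        math.ceil(observed[metric] / per_replica_target)
        for metric, per_replica_target in targets.items()
        if metric in observed
    ]
    # When multiple targets are specified, the maximum replica count across all is used,
    # clamped to the configured min/max replica counts.
    return max(min_replicas, min(max_replicas, max(wanted, default=min_replicas)))

# Example: 42 concurrent requests at a target of 5 per replica (-> 9 replicas) and
# 900 tokens/s at a target of 150 per replica (-> 6 replicas) yields max(9, 6) = 9.
print(desired_replicas(
    {"concurrent_requests": 42, "tokens_generated_per_second": 900},
    {"concurrent_requests": 5, "tokens_generated_per_second": 150},
    min_replicas=2,
    max_replicas=10,
))
```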
--- # Source: https://docs.fireworks.ai/api-reference-dlde/batch-delete-batch-jobs.md # Batch Delete Batch Jobs ## OpenAPI ````yaml post /v1/accounts/{account_id}/batchJobs:batchDelete paths: path: /v1/accounts/{account_id}/batchJobs:batchDelete method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: names: allOf: - type: array items: type: string description: The resource names of the batch jobs to delete. required: true refIdentifier: '#/components/schemas/GatewayBatchDeleteBatchJobsBody' requiredProperties: - names examples: example: value: names: - response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/batch-delete-environments.md # Batch Delete Environments ## OpenAPI ````yaml post /v1/accounts/{account_id}/environments:batchDelete paths: path: /v1/accounts/{account_id}/environments:batchDelete method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: names: allOf: - type: array items: type: string description: The resource names of the environments to delete. required: true refIdentifier: '#/components/schemas/GatewayBatchDeleteEnvironmentsBody' requiredProperties: - names examples: example: value: names: - response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/batch-delete-node-pools.md # Batch Delete Node Pools ## OpenAPI ````yaml post /v1/accounts/{account_id}/nodePools:batchDelete paths: path: /v1/accounts/{account_id}/nodePools:batchDelete method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: names: allOf: - type: array items: type: string description: The resource names of the node pools to delete. required: true refIdentifier: '#/components/schemas/GatewayBatchDeleteNodePoolsBody' requiredProperties: - names examples: example: value: names: - response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. 
deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/guides/batch-inference.md # Batch API > Process large-scale async workloads Process large volumes of requests asynchronously at 50% lower cost. Batch API is ideal for: * Production-scale inference workloads * Large-scale testing and benchmarking * Training smaller models with larger ones ([distillation guide](https://fireworks.ai/blog/deepseek-r1-distillation-reasoning)) Batch jobs automatically use [prompt caching](/guides/prompt-caching) for additional 50% cost savings on cached tokens. Maximize cache hits by placing static content first in your prompts. ## Getting Started Datasets must be in JSONL format (one JSON object per line): **Requirements:** * **File format:** JSONL (each line is a valid JSON object) * **Size limit:** Under 500MB * **Required fields:** `custom_id` (unique) and `body` (request parameters) **Example dataset:** ```json theme={null} {"custom_id": "request-1", "body": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}} {"custom_id": "request-2", "body": {"messages": [{"role": "user", "content": "Explain quantum computing"}], "temperature": 0.7}} {"custom_id": "request-3", "body": {"messages": [{"role": "user", "content": "Tell me a joke"}]}} ``` Save as `batch_input_data.jsonl` locally. You can simply navigate to the dataset tab, click `Create Dataset` and follow the wizard. Dataset Upload ```bash theme={null} firectl create dataset batch-input-dataset ./batch_input_data.jsonl ``` You need to make two separate HTTP requests. One for creating the dataset entry and one for uploading the dataset. Full reference here: [Create dataset](/api-reference/create-dataset). ```bash theme={null} # Create Dataset Entry curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets" \ -H "Authorization: Bearer ${API_KEY}" \ -H "Content-Type: application/json" \ -d '{ "datasetId": "batch-input-dataset", "dataset": { "userUploaded": {} } }' # Upload JSONL file curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets/batch-input-dataset:upload" \ -H "Authorization: Bearer ${API_KEY}" \ -F "file=@./batch_input_data.jsonl" ``` Navigate to the Batch Inference tab and click "Create Batch Inference Job". 
Select your input dataset: BIJ Dataset Select Choose your model: BIJ Model Select Configure optional settings: BIJ Optional Settings ```bash theme={null} firectl create batch-inference-job \ --model accounts/fireworks/models/llama-v3p1-8b-instruct \ --input-dataset-id batch-input-dataset ``` With additional parameters: ```bash theme={null} firectl create batch-inference-job \ --job-id my-batch-job \ --model accounts/fireworks/models/llama-v3p1-8b-instruct \ --input-dataset-id batch-input-dataset \ --output-dataset-id batch-output-dataset \ --max-tokens 1024 \ --temperature 0.7 \ --top-p 0.9 ``` ```bash theme={null} curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs?batchInferenceJobId=my-batch-job" \ -H "Authorization: Bearer ${API_KEY}" \ -H "Content-Type: application/json" \ -d '{ "model": "accounts/fireworks/models/llama-v3p1-8b-instruct", "inputDatasetId": "accounts/'${ACCOUNT_ID}'/datasets/batch-input-dataset", "outputDatasetId": "accounts/'${ACCOUNT_ID}'/datasets/batch-output-dataset", "inferenceParameters": { "maxTokens": 1024, "temperature": 0.7, "topP": 0.9 } }' ``` View all your batch inference jobs in the dashboard: BIJ List ```bash theme={null} # Get job status firectl get batch-inference-job my-batch-job # List all batch jobs firectl list batch-inference-jobs ``` ```bash theme={null} # Get specific job curl -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs/my-batch-job" \ -H "Authorization: Bearer ${API_KEY}" # List all jobs curl -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs" \ -H "Authorization: Bearer ${API_KEY}" ``` Navigate to the output dataset and download the results: BIJ Dataset Download ```bash theme={null} firectl download dataset batch-output-dataset ``` ```bash theme={null} # Get download endpoint and save response curl -s -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets/batch-output-dataset:getDownloadEndpoint" \ -H "Authorization: Bearer ${API_KEY}" \ -d '{}' > download.json # Extract and download all files jq -r '.filenameToSignedUrls | to_entries[] | "\(.key) \(.value)"' download.json | \ while read -r object_path signed_url; do fname=$(basename "$object_path") echo "Downloading → $fname" curl -L -o "$fname" "$signed_url" done ``` The output dataset contains two files: a **results file** (successful responses in JSONL format) and an **error file** (failed requests with debugging info). ## Reference Batch jobs progress through several states: | State | Description | | -------------- | ----------------------------------------------------- | | **VALIDATING** | Dataset is being validated for format requirements | | **PENDING** | Job is queued and waiting for resources | | **RUNNING** | Actively processing requests | | **COMPLETED** | All requests successfully processed | | **FAILED** | Unrecoverable error occurred (check status message) | | **EXPIRED** | Exceeded 24-hour limit (completed requests are saved) | * **Base Models** – Any model in the [Model Library](https://fireworks.ai/models) * **Custom Models** – Your uploaded or fine-tuned models *Note: Newly added models may have a delay before being supported. See [Quantization](/models/quantization) for precision info.* * **Per-request limits:** Same as [Chat Completion API limits](/api-reference/post-chatcompletions) * **Input dataset:** Max 500MB * **Output dataset:** Max 8GB (job may expire early if reached) * **Job timeout:** 24 hours maximum Jobs expire after 24 hours. 
Completed rows are billed and saved to the output dataset. **Resume processing:** ```bash theme={null} firectl create batch-inference-job \ --continue-from original-job-id \ --model accounts/fireworks/models/llama-v3p1-8b-instruct \ --output-dataset-id new-output-dataset ``` This processes only unfinished/failed requests from the original job. **Download complete lineage:** ```bash theme={null} firectl download dataset output-dataset-id --download-lineage ``` Downloads all datasets in the continuation chain. * **Validate thoroughly:** Check dataset format before uploading * **Descriptive IDs:** Use meaningful `custom_id` values for tracking * **Optimize tokens:** Set reasonable `max_tokens` limits * **Monitor progress:** Track long-running jobs regularly * **Cache optimization:** Place static content first in prompts ## Next Steps Maximize cost savings with automatic prompt caching Create custom models for your batch workloads Full API documentation for Batch API --- # Source: https://docs.fireworks.ai/deployments/benchmarking.md # Performance benchmarking > Measure and optimize your deployment's performance with load testing Understanding your deployment's performance under various load conditions is essential for production readiness. Fireworks provides tools and best practices for benchmarking throughput, latency, and identifying bottlenecks. ## Fireworks Benchmark Tool Use our open-source benchmarking tool to measure and optimize your deployment's performance: **[Fireworks Benchmark Tool](https://github.com/fw-ai/benchmark)** This tool allows you to: * Test throughput and latency under various load conditions * Simulate production traffic patterns * Identify performance bottlenecks * Compare different deployment configurations ### Installation ```bash theme={null} git clone https://github.com/fw-ai/benchmark.git cd benchmark pip install -r requirements.txt ``` ### Basic usage Run a basic benchmark test: ```bash theme={null} python benchmark.py \ --model "accounts/fireworks/models/llama-v3p1-8b-instruct" \ --deployment "your-deployment-id" \ --num-requests 1000 \ --concurrency 10 ``` ### Key metrics to monitor When benchmarking your deployment, focus on these key metrics: * **Throughput**: Requests per second (RPS) your deployment can handle * **Latency**: Time to first token (TTFT) and end-to-end response time * **Token generation rate**: Tokens per second during generation * **Error rate**: Failed requests under load ## Custom benchmarking You can also develop custom performance testing scripts or integrate with monitoring tools to track metrics over time. Consider: * Using production-like request patterns and payloads * Testing with various concurrency levels * Monitoring resource utilization (GPU, memory, network) * Testing autoscaling behavior under load ## Best practices 1. **Warm up your deployment**: Run a few requests before benchmarking to ensure models are loaded 2. **Test realistic scenarios**: Use request patterns and payloads similar to your production workload 3. **Gradually increase load**: Start with low concurrency and gradually increase to find your deployment's limits 4. **Monitor for errors**: Track error rates and response codes to identify issues under load 5. 
**Compare configurations**: Test different deployment shapes, quantization levels, and hardware to optimize cost and performance ## Next steps Configure autoscaling to handle variable load Optimize your client code for maximum throughput --- # Source: https://docs.fireworks.ai/api-reference-dlde/cancel-batch-job.md # Cancel Batch Job > Cancels an existing batch job if it is queued, pending, or running. ## OpenAPI ````yaml post /v1/accounts/{account_id}/batchJobs/{batch_job_id}:cancel paths: path: /v1/accounts/{account_id}/batchJobs/{batch_job_id}:cancel method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id batch_job_id: schema: - type: string required: true description: The Batch Job Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: {} required: true refIdentifier: '#/components/schemas/GatewayCancelBatchJobBody' examples: example: value: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/cancel-dpo-job.md # firectl cancel dpo-job > Cancels a running dpo job. ``` firectl cancel dpo-job [flags] ``` ### Examples ``` firectl cancel dpo-job my-dpo-job firectl cancel dpo-job accounts/my-account/dpo-jobs/my-dpo-job ``` ### Flags ``` -h, --help help for dpo-job --wait Wait until the dpo job is cancelled. --wait-timeout duration Maximum time to wait when using --wait flag. (default 10m0s) ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/cancel-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/cancel-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/cancel-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/cancel-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/cancel-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/cancel-reinforcement-fine-tuning-job.md # Cancel Reinforcement Fine-tuning Job ## OpenAPI ````yaml post /v1/accounts/{account_id}/reinforcementFineTuningJobs/{reinforcement_fine_tuning_job_id}:cancel paths: path: >- /v1/accounts/{account_id}/reinforcementFineTuningJobs/{reinforcement_fine_tuning_job_id}:cancel method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id reinforcement_fine_tuning_job_id: schema: - type: string required: true description: The Reinforcement Fine-tuning Job Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: {} required: true refIdentifier: '#/components/schemas/GatewayCancelReinforcementFineTuningJobBody' examples: example: value: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/cancel-supervised-fine-tuning-job.md # firectl cancel supervised-fine-tuning-job > Cancels a running supervised fine-tuning job. ``` firectl cancel supervised-fine-tuning-job [flags] ``` ### Examples ``` firectl cancel supervised-fine-tuning-job my-sft-job firectl cancel supervised-fine-tuning-job accounts/my-account/supervisedFineTuningJobs/my-sft-job ``` ### Flags ``` -h, --help help for supervised-fine-tuning-job ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/updates/changelog.md # Changelog # Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints ## **Improved Evaluator Creation Experience** The evaluator creation workflow has been significantly enhanced with GitHub template integration. 
You can now: * Fork evaluator templates directly from GitHub repositories * Browse and preview templates before using them * Create evaluators with a streamlined save dialog * View evaluators in a new sortable and paginated table ## **MLOps & Observability Integrations** New documentation for integrating Fireworks with MLOps and observability tools: * [Weights & Biases (W\&B)](/ecosystem/integrations/wandb) integration for experiment tracking during fine-tuning * MLflow integration for model management and experiment logging ## ✨ New Models * **[Kimi K2 Thinking](https://app.fireworks.ai/models/fireworks/kimi-k2-thinking)** is now available in the Model Library * **[KAT Dev 32B](https://app.fireworks.ai/models/fireworks/kat-dev-32b)** is now available in the Model Library * **[KAT Dev 72B Exp](https://app.fireworks.ai/models/fireworks/kat-dev-72b-exp)** is now available in the Model Library ## ☁️ Serverless * **[Kimi K2 Thinking](https://app.fireworks.ai/models/fireworks/kimi-k2-thinking)** is now available on serverless ## 📚 New REST API Endpoints New REST API endpoints are now available for managing Reinforcement Fine-Tuning Steps and deployments: * [Create Reinforcement Fine-Tuning Step](/api-reference/create-reinforcement-fine-tuning-step) * [List Reinforcement Fine-Tuning Steps](/api-reference/list-reinforcement-fine-tuning-steps) * [Get Reinforcement Fine-Tuning Step](/api-reference/get-reinforcement-fine-tuning-step) * [Delete Reinforcement Fine-Tuning Step](/api-reference/delete-reinforcement-fine-tuning-step) * [Scale Deployment](/api-reference/scale-deployment) * [List Deployment Shape Versions](/api-reference/list-deployment-shape-versions) * [Get Deployment Shape Version](/api-reference/get-deployment-shape-version) * [Get Dataset Download Endpoint](/api-reference/get-dataset-download-endpoint) - **Deployment Region Selector:** Added GPU accelerator hints to the region selector, with Global set as default for optimal availability (Web App) - **Preference Fine-Tuning (DPO):** Added to the Fine-Tuning page for training models with human preference data (Web App) - **Redeem Credits:** Credit code redemption is now available to all users from the Billing page (Web App) - **Model Library Search:** Improved fuzzy search with hybrid matching for better model discovery (Web App) - **Cogito Models:** Added Cogito namespace to the Model Library for easier discovery (Web App) - **Custom Model Editing:** You can now edit display name and description inline on custom model detail pages (Web App) - **Loss Curve Charts:** Fixed an issue where loss curves were not updating in real-time during fine-tuning jobs (Web App) - **Deployment Shapes:** Fixed deployment shape selection for fine-tuned models (PEFT and live-merge) (Web App) - **Usage Charts:** Fixed replica calculation in multi-series usage charts (Web App) - **Session Management:** Removed auto-logout on inactivity for improved user experience (Web App) - **Onboarding:** Updated onboarding survey with improved profile and questionnaire flow (Web App) - **Fine-Tuning Form:** Max context length now defaults to and is capped by the selected base model's context length (Web App) - **Secrets for Evaluators:** Added documentation for using secrets in evaluators to securely call external services (Docs) - **Region Selection:** Deprecated regions are now filtered from deployment options (Web App) - **Playground:** Embedding and reranker models are now filtered from playground model selection (Web App) - **LoRA Rank:** Updated valid LoRA rank 
range to 4-32 in documentation (Docs) - **SFT Documentation:** Added documentation for batch size, learning rate warmup, and gradient accumulation settings (Docs) - **Direct Routing:** Added OpenAI SDK code examples for direct routing (Docs) - **Recommended Models:** Updated model recommendations with migration guidance from Claude, GPT, and Gemini (Docs) ## ☀️ Sunsetting Build SDK The Build SDK is being deprecated in favor of a new Python SDK generated directly from our REST API. The new SDK is more up-to-date, flexible, and continuously synchronized with our REST API. Please note that the last version of the Build SDK will be `0.19.20`, and the new SDK will start at `1.0.0`. Python package managers will not automatically update to the new SDK, so you will need to manually update your dependencies and refactor your code. Existing codebases using the Build SDK will continue to function as before and will not be affected unless you choose to upgrade to the new SDK version. The new SDK replaces the Build SDK's `LLM` and `Dataset` classes with REST API-aligned methods. If you upgrade to version `1.0.0` or later, you will need to migrate your code. ## 🚀 Improved RFT Experience We've drastically improved the RFT experience with better reliability, developer-friendly SDK for hooking up your existing agents, support for multi-turn training, better observability in our Web App, and better overall developer experience. See [Reinforcement Fine-Tuning](/fine-tuning/reinforcement-fine-tuning-models) for more details. ## Supervised Fine-Tuning We now support supervised fine tuning with separate thinking traces for reasoning models (e.g. DeepSeek R1, GPT OSS, Qwen3 Thinking etc) that ensures training-inference consistency. An example including thinking traces would look like: ```json theme={null} { "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris.", "reasoning_content": "The user is asking about the capital city of France, it should be Paris."} ] } { "messages": [ {"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2", "weight": 0, "reasoning_content": "The user is asking about the result of 1+1, the answer is 2."}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4", "reasoning_content": "The user is asking about the result of 2+2, the answer should be 4."} ] } ``` We are also properly supporting multi-turn fine tuning (with or without thinking traces) for GPT OSS model family that ensures training-inference consistency. ## Supervised Fine-Tuning We now support Qwen3 MoE model (Qwen3 dense models are already supported) and GPT OSS models for supervised fine-tuning. GPT OSS model fine tunning support is single-turn without thinking traces at the moment. ## 🎨 Vision-Language Model Fine-Tuning You can now fine-tune Vision-Language Models (VLMs) on Fireworks AI using the Qwen 2.5 VL model family. This extends our Supervised Fine-tuning V2 platform to support multimodal training with both images and text data. 
**Supported models:**

* Qwen 2.5 VL 3B Instruct
* Qwen 2.5 VL 7B Instruct
* Qwen 2.5 VL 32B Instruct
* Qwen 2.5 VL 72B Instruct

**Features:**

* Fine-tune on datasets containing both images and text in JSONL format with base64-encoded images
* Support for up to 64K context length during training
* Built on the same Supervised Fine-tuning V2 infrastructure as text models

See the [VLM fine-tuning documentation](/fine-tuning/fine-tuning-vlm) for setup instructions and dataset formatting requirements.

## 🔧 Build SDK: Deployment Configuration Application Requirement

The Build SDK now requires you to call `.apply()` to apply any deployment configurations to Fireworks when using `deployment_type="on-demand"` or `deployment_type="on-demand-lora"`. This change ensures explicit control over when deployments are created and helps prevent accidental deployment creation.

**Key changes:**

* `.apply()` is now required for on-demand and on-demand-lora deployments
* Serverless deployments do not require `.apply()` calls
* If you do not call `.apply()`, you are expected to set up the deployment through the deployment page at [https://app.fireworks.ai/dashboard/deployments](https://app.fireworks.ai/dashboard/deployments)

**Migration guide:**

* Add `llm.apply()` after creating LLM instances with `deployment_type="on-demand"` or `deployment_type="on-demand-lora"`
* No changes needed for serverless deployments
* See updated documentation for examples and best practices

This change improves deployment management and provides better control over resource creation. It applies to Python SDK version `>=0.19.14`.

## 🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning

You can now develop your own custom rollout and reward functionality while using Fireworks to manage the training and deployment of your models. This gives you full control over your reinforcement learning workflows while leveraging Fireworks' infrastructure for model training and deployment.

See the new [LLM.reinforcement\_step()](/tools-sdks/python-client/sdk-reference#reinforcement-step) method and [ReinforcementStep](/tools-sdks/python-client/sdk-reference#reinforcementstep) class for usage examples and details.

## Supervised Fine-Tuning V2

We now support supervised fine-tuning for Llama 4 MoE models (Llama 4 Scout and Llama 4 Maverick, text only).

## 🏗️ Build SDK `LLM` Deployment Logic Refactor

Based on early feedback from users and internal testing, we've refactored the `LLM` class deployment logic in the Build SDK to make it easier to understand.

**Key changes:**

* The `id` parameter is now required when `deployment_type` is `"on-demand"`
* The `base_id` parameter is now required when `deployment_type` is `"on-demand-lora"`
* The `deployment_display_name` parameter is now optional and defaults to the filename where the LLM was instantiated

A new deployment will be created if a deployment with the same `id` does not exist. Otherwise, the existing deployment will be reused.

## 🚀 Support for Responses API in Python SDK

You can now use the Responses API in the Python SDK, so you can build on the Responses API directly in your own applications. See the [Responses API guide](/guides/response-api) for usage examples and details.

## Support for LinkedIn authentication

You can now log in to Fireworks using your existing LinkedIn account.
To log in with LinkedIn, go to the [Fireworks login page](https://fireworks.ai/login) and click the "Continue with LinkedIn" button. You can also log in with LinkedIn from the CLI using the `firectl login` command.

**How it works:**

* Fireworks uses your LinkedIn primary email address for account identification
* You can switch between different Fireworks accounts by changing your LinkedIn primary email
* See our [LinkedIn authentication FAQ](/faq-new/account-access/what-email-does-linkedin-authentication-use) for detailed instructions on managing email addresses

## Support for GitHub authentication

You can now log in to Fireworks using your existing GitHub account. To log in with GitHub, go to the [Fireworks login page](https://fireworks.ai/login) and click the "Continue with GitHub" button. You can also log in with GitHub from the CLI using the `firectl login` command.

## 🚨 Document Inlining Deprecation

Document Inlining has been deprecated and is no longer available on the Fireworks platform. This feature allowed LLMs to process images and PDFs through the chat completions API by appending `#transform=inline` to document URLs.

**Migration recommendations:**

* For image processing: Use Vision Language Models (VLMs) like [Qwen2.5-VL 32B Instruct](https://app.fireworks.ai/models/fireworks/qwen2p5-vl-32b-instruct)
* For PDF processing: Use dedicated PDF processing libraries combined with text-based LLMs
* For structured extraction: Leverage our [structured responses](/structured-responses/structured-response-formatting) capabilities

For assistance with migration, please contact our support team or visit our [Discord community](https://discord.gg/fireworks-ai).

## 🎯 Build SDK: Reward-kit integration for evaluator development

The Build SDK now natively integrates with [reward-kit](https://github.com/fw-ai-external/reward-kit) to simplify evaluator development for [Reinforcement Fine-Tuning (RFT)](/fine-tuning/reinforcement-fine-tuning-models). You can now create custom evaluators in Python with automatic dependency management and seamless deployment to Fireworks infrastructure.

**Key features:**

* Native reward-kit integration for evaluator development
* Automatic packaging of dependencies from `pyproject.toml` or `requirements.txt`
* Local testing capabilities before deployment
* Direct integration with Fireworks datasets and evaluation jobs
* Support for third-party libraries and complex evaluation logic

See our [Developing Evaluators](/tools-sdks/python-client/developing-evaluators) guide to get started with your first evaluator in minutes.

## Added new Responses API for advanced conversational workflows and integrations

* Continue conversations across multiple turns using the `previous_response_id` parameter to maintain context without resending full history
* Stream responses in real time as they are generated for responsive applications
* Control response storage with the `store` parameter to choose whether responses are retrievable by ID or ephemeral

See the [Responses API guide](/guides/response-api) for usage examples and details.

## Supervised Fine-Tuning V2

Supervised Fine-Tuning V2 has been released.
**Key features:**

* Supports the Qwen 2/2.5/3 series, Phi 4, Gemma 3, the Llama 3 family, and DeepSeek V2, V3, and R1
* Longer context window, up to the full context length of the supported models
* Multi-turn function calling fine-tuning
* Quantization-aware training

More details in the [blog post](https://fireworks.ai/blog/supervised-finetuning-v2).

## Reinforcement Fine-Tuning (RFT)

Reinforcement Fine-Tuning has been released. Train expert models to surpass closed-source frontier models through verifiable rewards. More details in the [blog post](https://fireworks.ai/blog/reinforcement-fine-tuning-models).

## Diarization and batch processing support added to audio inference

See our [blog post](https://fireworks.ai/blog/audio-summer-updates-and-new-features) for details.

## 🚀 Easier & faster LoRA fine-tune deployments on Fireworks

You can now deploy a LoRA fine-tune with a single command and get speeds that approximately match the base model:

```bash theme={null}
firectl create deployment "accounts/fireworks/models/"
```

Previously, this involved two distinct steps, and the resulting deployment was slower than the base model:

1. Create a deployment using `firectl create deployment "accounts/fireworks/models/" --enable-addons`
2. Then deploy the addon to the deployment: `firectl load-lora --deployment `

For more information, see our [deployment documentation](https://docs.fireworks.ai/models/deploying#deploying-to-on-demand). This change is for dedicated deployments with a single LoRA. You can still deploy multiple LoRAs on a deployment or deploy LoRA(s) on some serverless models as described in the documentation.

---

# Source: https://docs.fireworks.ai/fine-tuning/cli-reference.md

# Training Guide: CLI

> Launch RFT jobs using the eval-protocol CLI

The Eval Protocol CLI provides the fastest, most reproducible way to launch RFT jobs. This page covers everything you need to know about using `eval-protocol create rft`.

Before launching, review [Training Prerequisites & Validation](/fine-tuning/training-prerequisites) for requirements, validation checks, and common errors.

Already familiar with [firectl](/fine-tuning/cli-reference#using-firectl-cli-alternative)? Use it as an alternative to eval-protocol.

## Installation and setup

The following guide will help you:

* Upload your evaluator to Fireworks. If you don't have one yet, see [Concepts > Evaluators](/fine-tuning/evaluators)
* Upload your dataset to Fireworks
* Create and launch the RFT job

```bash theme={null}
pip install eval-protocol
```

Verify installation:

```bash theme={null}
eval-protocol --version
```

Configure your Fireworks API key:

```bash theme={null}
export FIREWORKS_API_KEY="fw_your_api_key_here"
```

Or create a `.env` file:

```bash theme={null}
FIREWORKS_API_KEY=fw_your_api_key_here
```

Before training, verify your evaluator works. This command discovers and runs your `@evaluation_test` with pytest. If a Dockerfile is present, it builds an image and runs the test in Docker; otherwise it runs on your host.
```bash theme={null} cd evaluator_directory ep local-test ``` From the directory where your evaluator and dataset (dataset.jsonl) are located, ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --output-model my-model-name ``` The CLI will: * Upload evaluator code (if changed) * Upload dataset (if changed) * Create the RFT job * Display dashboard links for monitoring Expected output: ``` Created Reinforcement Fine-tuning Job name: accounts/your-account/reinforcementFineTuningJobs/abc123 Dashboard Links: Evaluator: https://app.fireworks.ai/dashboard/evaluators/your-evaluator Dataset: https://app.fireworks.ai/dashboard/datasets/your-dataset RFT Job: https://app.fireworks.ai/dashboard/fine-tuning/reinforcement/abc123 ``` Click the RFT Job link to watch training progress in real-time. See [Monitor Training](/fine-tuning/monitor-training) for details. ## Common CLI options Customize your RFT job with these flags: **Model and output**: ```bash theme={null} --base-model accounts/fireworks/models/llama-v3p1-8b-instruct # Base model to fine-tune --output-model my-custom-name # Name for fine-tuned model ``` **Training parameters**: ```bash theme={null} --epochs 2 # Number of training epochs (default: 1) --learning-rate 5e-5 # Learning rate (default: 1e-4) --lora-rank 16 # LoRA rank (default: 8) --batch-size 65536 # Batch size in tokens (default: 32768) ``` **Rollout (sampling) parameters**: ```bash theme={null} --inference-temperature 0.8 # Sampling temperature (default: 0.7) --inference-n 8 # Number of rollouts per prompt (default: 4) --inference-max-tokens 4096 # Max tokens per response (default: 2048) --inference-top-p 0.95 # Top-p sampling (default: 1.0) --inference-top-k 50 # Top-k sampling (default: 40) ``` **Remote environments**: ```bash theme={null} --remote-server-url https://your-evaluator.example.com # For remote rollout processing ``` **Force re-upload**: ```bash theme={null} --force # Re-upload evaluator even if unchanged ``` See all options: ```bash theme={null} eval-protocol create rft --help ``` ## Advanced options Track training metrics in W\&B for deeper analysis: ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --wandb-project my-rft-experiments \ --wandb-entity my-org ``` Set `WANDB_API_KEY` in your environment first. Save intermediate checkpoints during training: ```bash theme={null} firectl create rftj \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --checkpoint-frequency 500 # Save every 500 steps ... ``` Available in `firectl` only. Speed up training with multiple GPUs: ```bash theme={null} firectl create rftj \ --base-model accounts/fireworks/models/llama-v3p1-70b-instruct \ --accelerator-count 4 # Use 4 GPUs ... ``` Recommended for large models (70B+). For evaluators that need more time: ```bash theme={null} firectl create rftj \ --rollout-timeout 300 # 5 minutes per rollout ... ``` Default is 60 seconds. Increase for complex evaluations. 
## Examples **Fast experimentation** (small model, 1 epoch): ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/qwen3-0p6b \ --output-model quick-test ``` **High-quality training** (more rollouts, higher temperature): ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --output-model high-quality-model \ --inference-n 8 \ --inference-temperature 1.0 ``` **Remote environment** (for multi-turn agents): ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --remote-server-url https://your-agent.example.com \ --output-model remote-agent ``` **Multiple epochs with custom learning rate**: ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --epochs 3 \ --learning-rate 5e-5 \ --output-model multi-epoch-model ``` ## Using `firectl` CLI (Alternative) For users already familiar with Fireworks `firectl`, you can create RFT jobs directly: ```bash theme={null} firectl create rftj \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --dataset accounts/your-account/datasets/my-dataset \ --evaluator accounts/your-account/evaluators/my-evaluator \ --output-model my-finetuned-model ``` **Differences from `eval-protocol`**: * Requires fully qualified resource names (accounts/...) * Must manually upload evaluators and datasets first * More verbose but offers finer control * Same underlying API as `eval-protocol` See [firectl documentation](/tools-sdks/firectl/commands/create-reinforcement-fine-tuning-job) for all options. ## Next steps Review requirements, validation, and common errors Track job progress, inspect rollouts, and debug issues Learn how to adjust parameters for better results --- # Source: https://docs.fireworks.ai/deployments/client-side-performance-optimization.md # Client-side performance optimization > Optimize your client code for maximum performance with dedicated deployments When using a dedicated deployment, it is important to optimize the client-side HTTP connection pooling for maximum performance. We recommend using our [Python SDK](/tools-sdks/python-client/sdk-introduction) as it has good defaults for connection pooling and utilizes [aiohttp](https://docs.aiohttp.org/en/stable/index.html) for optimal performance with Python's `asyncio` library. It also includes retry logic for handling `429` errors that Fireworks returns when the server is overloaded. We have run benchmarks that demonstrate the performance benefits. ## General optimization recommendations Based on our benchmarks, we recommend the following: 1. Use a client library optimized for high concurrency, such as [aiohttp](https://docs.aiohttp.org/en/stable/index.html) in Python or [http.Agent](https://nodejs.org/api/http.html#class-httpagent) in Node.js. 2. Keep the [`connection pool size`](https://docs.aiohttp.org/en/stable/client_advanced.html#limiting-connection-pool-size) high (1000+). 3. Increase concurrency until performance stops improving or you observe too many `429` errors. 4. Use [direct routing](/deployments/direct-routing) to avoid the global API load balancer and route requests directly to your deployment. 
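If you are building your own client instead of using the Fireworks Python SDK example in the next section, the same recommendations apply when using `aiohttp` directly. The sketch below is a minimal illustration under a few assumptions (the OpenAI-compatible chat completions endpoint, a `FIREWORKS_API_KEY` environment variable, and a placeholder model name), not an official client.

```python theme={null}
import asyncio
import os

import aiohttp

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL = "accounts/fireworks/models/your-model"  # placeholder; use your deployed model

async def run(prompts: list[str], concurrency: int = 256) -> list[dict]:
    headers = {"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"}
    # Recommendation 2: keep the connection pool large (1000+)
    connector = aiohttp.TCPConnector(limit=1000)
    # Recommendation 3: tune concurrency until throughput stops improving or 429s appear
    semaphore = asyncio.Semaphore(concurrency)

    async with aiohttp.ClientSession(connector=connector, headers=headers) as session:
        async def one(prompt: str) -> dict:
            async with semaphore:
                payload = {
                    "model": MODEL,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 100,
                }
                async with session.post(API_URL, json=payload) as resp:
                    # A production client should also retry 429s with backoff
                    resp.raise_for_status()
                    return await resp.json()

        return await asyncio.gather(*(one(p) for p in prompts))

if __name__ == "__main__":
    results = asyncio.run(run(["Hello!"] * 100))
    print(f"Completed {len(results)} requests")
```

The Python SDK example below handles connection pooling and `429` retries for you, so prefer it unless you need a custom HTTP stack.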
## Code example: Optimal concurrent requests (Python) Here's how to implement optimal concurrent requests using `asyncio` and the `LLM` class: ```python main.py theme={null} import asyncio from fireworks import LLM async def make_concurrent_requests( messages: list[str], max_workers: int = 1000, max_connections: int = 1000, # this is the default value in the SDK ): """Make concurrent requests with optimized connection pooling""" llm = LLM( model="your-model-name", deployment_type="on-demand", id="your-deployment-id", max_connections=max_connections ) # Apply deployment configuration to Fireworks llm.apply() # Semaphore to limit concurrent requests semaphore = asyncio.Semaphore(max_workers) async def single_request(message: str): """Make a single request with semaphore control""" async with semaphore: response = await llm.chat.completions.acreate( messages=[{"role": "user", "content": message}], max_tokens=100 ) return response.choices[0].message.content # Create all request tasks tasks = [ single_request(message) for message in messages ] # Execute all requests concurrently results = await asyncio.gather(*tasks) return results # Usage example async def main(): messages = ["Hello!"] * 1000 # 1000 requests results = await make_concurrent_requests( messages=messages, ) print(f"Completed {len(results)} requests") if __name__ == "__main__": asyncio.run(main()) ``` This implementation: * Uses `asyncio.Semaphore` to control concurrency to avoid overwhelming the server * Allows configuration of the maximum number of concurrent connections to the Fireworks API --- # Source: https://docs.fireworks.ai/guides/completions-api.md # Completions API > Use the completions API for raw text generation with custom prompt templates The completions API provides raw text generation without automatic message formatting. Use this when you need full control over prompt formatting or when working with base models. ## When to use completions **Use the completions API for:** * Custom prompt templates with specific formatting requirements * Base models (non-instruct/non-chat variants) * Fine-grained control over token-level formatting * Legacy applications that depend on raw completion format **For most use cases, use [chat completions](/guides/querying-text-models) instead.** Chat completions handles message formatting automatically and works better with instruct-tuned models. ## Basic usage ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.completions.create( model="accounts/fireworks/models/deepseek-v3p1", prompt="Once upon a time" ) print(response.choices[0].text) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const response = await client.completions.create({ model: "accounts/fireworks/models/deepseek-v3p1", prompt: "Once upon a time", }); console.log(response.choices[0].text); ``` ```bash theme={null} curl https://api.fireworks.ai/inference/v1/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FIREWORKS_API_KEY" \ -d '{ "model": "accounts/fireworks/models/deepseek-v3p1", "prompt": "Once upon a time" }' ``` Most models automatically prepend the beginning-of-sequence (BOS) token (e.g., ``) to your prompt. Verify this with the `raw_output` parameter if needed. 
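For example, to check how the prompt is handled, you can pass the `raw_output` parameter through the OpenAI client's `extra_body` passthrough and inspect the returned payload. This is a minimal sketch, assuming `raw_output` is accepted as a boolean request field and reusing the `client` configured above; the exact shape of the returned raw-output details may differ, so inspect the dumped response.

```python theme={null}
# Sketch: verify BOS/prompt handling by requesting raw output details.
# Assumes `client` is the OpenAI client configured for Fireworks above and
# that `raw_output` is accepted as a boolean request field.
response = client.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    prompt="Once upon a time",
    max_tokens=1,
    extra_body={"raw_output": True},  # passthrough for Fireworks-specific fields
)

# Dump the full payload and look for the raw prompt / token details,
# including any automatically prepended BOS token.
print(response.model_dump_json(indent=2))
```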
## Custom prompt templates

The completions API is useful when you need to implement custom prompt formats:

```python theme={null}
# Custom few-shot prompt template
prompt = """Task: Classify the sentiment of the following text.

Text: I love this product!
Sentiment: Positive

Text: This is terrible.
Sentiment: Negative

Text: The weather is nice today.
Sentiment:"""

response = client.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    prompt=prompt,
    max_tokens=10,
    temperature=0
)

print(response.choices[0].text)  # Output: " Positive"
```

## Common parameters

All [chat completions parameters](/guides/querying-text-models#configuration--debugging) work with completions:

* `temperature` - Control randomness (0-2)
* `max_tokens` - Limit output length
* `top_p`, `top_k`, `min_p` - Sampling parameters
* `stream` - Stream responses token-by-token
* `frequency_penalty`, `presence_penalty` - Reduce repetition

See the [API reference](/api-reference/post-completions) for complete parameter documentation.

## Querying deployments

Use completions with [on-demand deployments](/guides/ondemand-deployments) by specifying the deployment identifier:

```python theme={null}
response = client.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1#accounts//deployments/",
    prompt="Your prompt here"
)
```

## Next steps

Use chat completions for most use cases

Stream responses for real-time UX

Complete API documentation

---

# Source: https://docs.fireworks.ai/getting-started/concepts.md

# Concepts

> This document outlines basic Fireworks AI concepts.

## Resources

### Account

Your account is the top-level resource under which other resources are located. Quotas and billing are enforced at the account level, so usage for all users in an account contributes to the same quotas and bill.

* For developer accounts, the account ID is auto-generated from the email address used to sign up.
* Enterprise accounts can optionally choose a custom, unique account ID.

### User

A user is an email address associated with an account. Users added to an account have full access to delete, edit, and create resources within the account, such as deployments and models.

### Models and model types

A model is a set of model weights and metadata associated with the model. Each model has a [**globally unique name**](/getting-started/concepts#resource-names-and-ids) of the form `accounts//models/`.

There are two types of models:

**Base models:** A base model consists of the full set of model weights, including models pre-trained from scratch and full fine-tunes.

* Fireworks has a library of common base models that can be used for [**serverless inference**](/models/overview#serverless-inference) as well as [**dedicated deployments**](/models/overview#dedicated-deployments). Model IDs for these models are pre-populated. For example, `llama-v3p1-70b-instruct` is the model ID for the Llama 3.1 70B model that Fireworks provides. The ID for each model can be found on its page ([**example**](https://app.fireworks.ai/models/fireworks/qwen3-coder-480b-a35b-instruct)).
* Users can also [upload their own](/models/uploading-custom-models) custom base models and specify model IDs.

**LoRA (low-rank adaptation) addons:** A LoRA addon is a small, fine-tuned model that requires significantly less memory to deploy than a fully fine-tuned model.
Fireworks supports [**training**](/fine-tuning/finetuning-intro), [**uploading**](/models/uploading-custom-models#importing-fine-tuned-models), and [**serving**](/fine-tuning/fine-tuning-models#deploying-a-fine-tuned-model) LoRA addons. LoRA addons must be deployed on a serverless or dedicated deployment for their corresponding base model. Model IDs for LoRAs can be either auto-generated or user-specified.

### Deployments and deployment types

A model must be deployed before it can be used for inference. A deployment is a collection of one or more model servers that host one base model and optionally one or more LoRA addons.

Fireworks supports two types of deployments:

* **Serverless deployments:** Fireworks hosts popular base models on shared "serverless" deployments. Users pay per token to query these models and do not need to configure GPUs. The most popular serverless deployments also support serverless LoRA addons. See our [Quickstart - Serverless](/getting-started/quickstart) guide to get started.
* **Dedicated deployments:** Dedicated deployments enable users to configure private deployments with a wide array of hardware (see the [on-demand deployments guide](/guides/ondemand-deployments)). Dedicated deployments give users performance guarantees and the most flexibility and control over what models can be deployed. Both LoRA addons and base models can be deployed to dedicated deployments. Dedicated deployments are billed on a GPU-second basis (see the [**pricing**](https://fireworks.ai/pricing#ondemand) page).

See the [**Querying text models guide**](/guides/querying-text-models) for a comprehensive overview of making LLM inference requests.

### Deployed model

Users can specify a model to query for inference using the model name and deployment name. Alternatively, users can refer to a "deployed model" name, which identifies a unique instance of a base model or LoRA addon that is loaded into a deployment. See the [On-demand deployments](/guides/ondemand-deployments) guide for more.

### Dataset

A dataset is an immutable set of training examples that can be used to fine-tune a model.

### Fine-tuning job

A fine-tuning job is an offline training job that uses a dataset to train a LoRA addon model.

## Resource names and IDs

A resource name is a globally unique identifier of a resource. The format of a name also identifies the type and hierarchy of the resource.

Resource IDs must satisfy the following constraints:

* Between 1 and 63 characters (inclusive)
* Consists of a-z, 0-9, and hyphen (-)
* Does not begin or end with a hyphen (-)
* Does not begin with a digit

## Control plane and data plane

The Fireworks API can be split into a control plane and a data plane.

* The **control plane** consists of APIs used for managing the lifecycle of resources. This includes your account, models, and deployments.
* The **data plane** consists of the APIs used for inference and the backend services that power them.

## Interfaces

Users can interact with Fireworks through one of many interfaces:

* The **web app** at [https://app.fireworks.ai](https://app.fireworks.ai)
* The [`firectl`](/tools-sdks/firectl/firectl) CLI
* [OpenAI-compatible API](/tools-sdks/openai-compatibility)
* [Python SDK](/tools-sdks/python-client/sdk-introduction)

---

# Source: https://docs.fireworks.ai/api-reference-dlde/connect-environment.md

# Connect Environment

> Connects the environment to a node pool. Returns an error if there is an existing pending connection.
## OpenAPI

````yaml
post /v1/accounts/{account_id}/environments/{environment_id}:connect paths: path: /v1/accounts/{account_id}/environments/{environment_id}:connect method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id environment_id: schema: - type: string required: true description: The Environment Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: connection: allOf: - $ref: '#/components/schemas/gatewayEnvironmentConnection' vscodeVersion: allOf: - type: string title: >- VSCode version on the client side that initiated the connect request required: true refIdentifier: '#/components/schemas/GatewayConnectEnvironmentBody' requiredProperties: - connection examples: example: value: connection: nodePoolId: numRanks: 123 role: useLocalStorage: true vscodeVersion: response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: gatewayEnvironmentConnection: type: object properties: nodePoolId: type: string description: The resource id of the node pool the environment is connected to. numRanks: type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank. If not specified, the default is 1. role: type: string description: |- The ARN of the AWS IAM role that the connection should assume. If not specified, the connection will fall back to the node pool's node_role. zone: type: string description: >- Current for the last zone that this environment is connected to. We want to warn the users about cross zone migration latency when they are connecting to node pool in a different zone as their persistent volume. readOnly: true useLocalStorage: type: boolean description: >- If true, the node's local storage will be mounted on /tmp. This flag has no effect if the node does not have local storage. title: 'Next ID: 8' required: - nodePoolId
````

---

# Source: https://docs.fireworks.ai/fine-tuning/connect-environments.md

# Remote Environment Setup

> Implement the /init endpoint to run evaluations in your infrastructure

If you already have an agent running in your product, or need to run rollouts on your own infrastructure, you can integrate it with RFT using the `RemoteRolloutProcessor`. This delegates rollout execution to an HTTP service you control.

Remote agents are ideal for:

* Multi-turn agentic workflows with tool use
* Access to private databases, APIs, or internal services
* Integration with existing agent codebases
* Complex simulations that require your infrastructure

New to RFT? Start with a [local agent](/fine-tuning/quickstart-math) instead. Local agents are simpler and cover most use cases. Only use remote agent environments when you need access to private infrastructure or have an existing agent to integrate.

## How remote rollouts work

*Remote rollout processor flow diagram showing the interaction between Eval Protocol, your remote server, and Fireworks Tracing*

During training, Fireworks calls your service's `POST /init` endpoint with the dataset row and correlation metadata.
Your agent executes the task (e.g., multi-turn conversation, tool calls, simulation steps), logging progress via Fireworks tracing. Your service sends structured logs tagged with rollout metadata to Fireworks so the system can track completion. Once Fireworks detects completion, it pulls the full trace and evaluates it using your scoring logic.

Everything except implementing your remote server is handled automatically by Eval Protocol. You only need to implement the `/init` endpoint and add Fireworks tracing.

## Implementing the /init endpoint

Your remote service must implement a single `/init` endpoint that accepts rollout requests.

### Request schema

* `completion_params` - Model configuration including model name and inference parameters like temperature, max\_tokens, etc.
* `messages` - Array of conversation messages to send to the model
* `tools` - Array of available tools for the model (for function calling)
* `model_base_url` - Base URL for making LLM calls through Fireworks tracing (includes correlation metadata)
* `metadata` - Rollout execution metadata for correlation (rollout\_id, run\_id, row\_id, etc.)
* `api_key` - Fireworks API key to use for model calls

### Example request

```json theme={null}
{
  "completion_params": {
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "temperature": 0.7,
    "max_tokens": 2048
  },
  "messages": [
    {
      "role": "user",
      "content": "What is the weather in San Francisco?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          }
        }
      }
    }
  ],
  "model_base_url": "https://tracing.fireworks.ai/rollout_id/brave-night-42/invocation_id/wise-ocean-15/experiment_id/calm-forest-28/run_id/quick-river-07/row_id/bright-star-91",
  "metadata": {
    "invocation_id": "wise-ocean-15",
    "experiment_id": "calm-forest-28",
    "rollout_id": "brave-night-42",
    "run_id": "quick-river-07",
    "row_id": "bright-star-91"
  },
  "api_key": "fw_your_api_key"
}
```

## Metadata correlation

The `metadata` object contains correlation IDs that you must include when logging to Fireworks tracing. This allows Eval Protocol to match logs and traces back to specific evaluation rows.

Required metadata fields:

* `invocation_id` - Identifies the evaluation invocation
* `experiment_id` - Groups related experiments
* `rollout_id` - Unique ID for this specific rollout (most important)
* `run_id` - Identifies the evaluation run
* `row_id` - Links to the dataset row

`RemoteRolloutProcessor` automatically generates these IDs and sends them to your server. You don't need to create them yourself—just pass them through to your logging.

## Fireworks tracing integration

Your remote server must use Fireworks tracing to report rollout status. Eval Protocol polls these logs to detect when rollouts complete.
### Basic setup

```python theme={null}
import logging

from eval_protocol import Status, InitRequest, FireworksTracingHttpHandler, RolloutIdFilter

# Configure Fireworks tracing handler globally
fireworks_handler = FireworksTracingHttpHandler()
logging.getLogger().addHandler(fireworks_handler)

@app.post("/init")
def init(request: InitRequest):
    # Create rollout-specific logger with filter
    rollout_logger = logging.getLogger(f"eval_server.{request.metadata.rollout_id}")
    rollout_logger.addFilter(RolloutIdFilter(request.metadata.rollout_id))

    try:
        # Execute your agent logic here
        result = execute_agent(request)

        # Log successful completion with structured status
        rollout_logger.info(
            f"Rollout {request.metadata.rollout_id} completed",
            extra={"status": Status.rollout_finished()}
        )
        return {"status": "success"}
    except Exception as e:
        # Log errors with structured status
        rollout_logger.error(
            f"Rollout {request.metadata.rollout_id} failed: {e}",
            extra={"status": Status.rollout_error(str(e))}
        )
        raise
```

### Key components

1. **FireworksTracingHttpHandler**: Sends logs to the Fireworks tracing service
2. **RolloutIdFilter**: Tags logs with the rollout ID for correlation
3. **Status objects**: Structured status reporting that Eval Protocol can parse
   * `Status.rollout_finished()` - Signals successful completion
   * `Status.rollout_error(message)` - Signals failure with error details

### Alternative: Environment variable approach

For simpler setups, you can use the `EP_ROLLOUT_ID` environment variable instead of manual filters.

If your server processes one rollout at a time (e.g., serverless functions, container per request), set `EP_ROLLOUT_ID` inside the handler before configuring the tracing handler:

```python theme={null}
import os
import logging

from eval_protocol import Status, InitRequest, FireworksTracingHttpHandler

logger = logging.getLogger(__name__)

@app.post("/init")
def init(request: InitRequest):
    # Set the rollout ID in the environment for this single-rollout process
    os.environ["EP_ROLLOUT_ID"] = request.metadata.rollout_id

    # Configure handler (automatically picks up EP_ROLLOUT_ID)
    fireworks_handler = FireworksTracingHttpHandler()
    logging.getLogger().addHandler(fireworks_handler)

    # Logs are automatically tagged with rollout_id
    logger.info("Processing rollout...")
    # ... execute agent logic ...
```

If your `/init` handler spawns separate Python processes for each rollout:

```python theme={null}
import os
import logging
import multiprocessing

from eval_protocol import FireworksTracingHttpHandler, InitRequest

def execute_rollout_step_sync(request):
    # Set EP_ROLLOUT_ID in the child process
    os.environ["EP_ROLLOUT_ID"] = request.metadata.rollout_id
    logging.getLogger().addHandler(FireworksTracingHttpHandler())

    # Execute your rollout logic here
    # Logs are automatically tagged

@app.post("/init")
async def init(request: InitRequest):
    # Do NOT set EP_ROLLOUT_ID in parent process
    p = multiprocessing.Process(
        target=execute_rollout_step_sync,
        args=(request,)
    )
    p.start()
    return {"status": "started"}
```

### How Eval Protocol uses tracing

1. **Your server logs completion**: Uses `Status.rollout_finished()` or `Status.rollout_error()`
2. **Eval Protocol polls**: Searches Fireworks logs by `rollout_id` tag until a completion signal is found
3. **Status extraction**: Reads structured status fields (`code`, `message`, `details`) to determine outcome
4.
**Trace retrieval**: Fetches full trace of model calls and tool use for evaluation ## Complete example Here's a minimal but complete remote server implementation: ```python theme={null} from fastapi import FastAPI from fastapi.responses import JSONResponse from eval_protocol import InitRequest, FireworksTracingHttpHandler, RolloutIdFilter, Status import logging app = FastAPI() # Setup Fireworks tracing fireworks_handler = FireworksTracingHttpHandler() logging.getLogger().addHandler(fireworks_handler) @app.post("/init") async def init(request: InitRequest): # Create rollout-specific logger rollout_logger = logging.getLogger(f"eval_server.{request.metadata.rollout_id}") rollout_logger.addFilter(RolloutIdFilter(request.metadata.rollout_id)) rollout_logger.info(f"Starting rollout {request.metadata.rollout_id}") try: # Your agent logic here # 1. Make model calls using request.model_base_url # 2. Call tools, interact with environment # 3. Collect results result = run_your_agent( messages=request.messages, tools=request.tools, model_config=request.completion_params, api_key=request.api_key ) # Signal completion rollout_logger.info( f"Rollout {request.metadata.rollout_id} completed successfully", extra={"status": Status.rollout_finished()} ) return {"status": "success", "result": result} except Exception as e: # Signal error rollout_logger.error( f"Rollout {request.metadata.rollout_id} failed: {str(e)}", extra={"status": Status.rollout_error(str(e))} ) return JSONResponse( status_code=500, content={"status": "error", "message": str(e)} ) def run_your_agent(messages, tools, model_config, api_key): # Implement your agent logic here # Make model calls, use tools, etc. pass ``` ## Testing locally Before deploying, test your remote server locally: ```bash theme={null} uvicorn main:app --reload --port 8080 ``` In your evaluator test, point to your local server: ```python theme={null} from eval_protocol.pytest import RemoteRolloutProcessor rollout_processor = RemoteRolloutProcessor( remote_base_url="http://localhost:8080" ) ``` ```bash theme={null} pytest my-evaluator-name.py -vs ``` This sends test rollouts to your local server and verifies the integration works. ## Deploying your service Once tested locally, deploy to production: * ✅ Service is publicly accessible (or accessible via VPN/private network) * ✅ HTTPS endpoint with valid SSL certificate (recommended) * ✅ Authentication/authorization configured * ✅ Monitoring and logging set up * ✅ Auto-scaling configured for concurrent rollouts * ✅ Error handling and retry logic implemented * ✅ Service availability SLA meets training requirements **Vercel/Serverless**: * One rollout per function invocation * Use environment variable approach * Configure timeout for long-running evaluations **AWS ECS/Kubernetes**: * Handle concurrent requests with proper worker configuration * Use RolloutIdFilter approach * Set up load balancing **On-premise**: * Ensure network connectivity from Fireworks * Configure firewall rules * Set up VPN if needed for security ## Connecting to RFT Once your remote server is deployed, create an RFT job that uses it: ```bash theme={null} eval-protocol create rft \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --remote-server-url https://your-evaluator.example.com \ --dataset my-dataset ``` The RFT job will send all rollouts to your remote server for evaluation during training. 
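For reference, here is a minimal, hypothetical sketch of what the `run_your_agent` placeholder above might do for a single-turn rollout. It assumes the `model_base_url` from the `/init` request fronts an OpenAI-compatible chat completions interface (it is the URL intended for traced model calls) and that `completion_params` is a plain dict shaped like the example request; it also passes `model_base_url` in explicitly, which the placeholder signature above does not. Verify the exact contract in the Remote Rollout Processor tutorial before relying on it.

```python theme={null}
from openai import OpenAI

def run_single_turn_agent(messages, tools, completion_params, api_key, model_base_url):
    # Route model calls through the tracing base URL from the /init request so
    # Eval Protocol can correlate them with this rollout.
    # (Assumption: model_base_url speaks the OpenAI-compatible chat API.)
    client = OpenAI(base_url=model_base_url, api_key=api_key)

    kwargs = {"tools": tools} if tools else {}
    response = client.chat.completions.create(
        model=completion_params["model"],
        messages=messages,
        temperature=completion_params.get("temperature", 0.7),
        max_tokens=completion_params.get("max_tokens", 2048),
        **kwargs,
    )

    # A real agent would loop here: execute any tool calls, append the results
    # to `messages`, and call the model again until the task is complete.
    return response.choices[0].message.content
```

A multi-turn agent follows the same pattern; the key point is that every model call goes through `model_base_url` so the resulting trace can be pulled and scored once the rollout is marked finished.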
## Troubleshooting **Symptoms**: Rollouts show as timed out or never complete **Solutions**: * Check that your service is logging `Status.rollout_finished()` correctly * Verify Fireworks tracing handler is configured * Ensure rollout\_id is included in log tags * Check for exceptions being swallowed without logging **Symptoms**: Eval Protocol can't match logs to rollouts **Solutions**: * Verify you're using the exact `rollout_id` from request metadata * Check that RolloutIdFilter or EP\_ROLLOUT\_ID is set correctly * Ensure logs are being sent to Fireworks (check tracing dashboard) **Symptoms**: Training is slow, high rollout latency **Solutions**: * Scale your service to handle concurrent requests * Optimize your agent logic (caching, async operations) * Add more workers or instances * Profile your code to find bottlenecks **Symptoms**: Model calls fail, API errors **Solutions**: * Verify API key is passed correctly from request * Check that your service has network access to Fireworks * Ensure model\_base\_url is used for traced calls ## Example implementations Learn by example: Complete walkthrough using a Vercel TypeScript server for SVG generation Minimal Python implementation showing the basics ## Next steps Launch your RFT job using the CLI Track rollout progress and debug issues Full Remote Rollout Processor tutorial Design effective reward functions --- # Source: https://docs.fireworks.ai/examples/cookbooks.md # Cookbooks > Interactive Jupyter notebooks demonstrating advanced use cases and best practices with Fireworks AI Explore our collection of notebooks that showcase real-world applications, best practices, and advanced techniques for building with Fireworks AI. ## Fine-Tuning & Training Transfer large model capabilities to efficient models using a two-stage SFT + RFT approach. **Techniques:** Supervised Fine-Tuning (SFT) + Reinforcement Fine-Tuning (RFT) **Results:** 52% → 70% accuracy on GSM8K mathematical reasoning Beat frontier closed-source models for product catalog cleansing with vision-language model fine-tuning. **Techniques:** Supervised Fine-Tuning (SFT) **Results:** 48% increase in quality from base model ## Multimodal AI Extract structured data from invoices, forms, and financial documents using state-of-the-art OCR and document understanding. **Use Cases:** Forms, invoices, financial documents, product catalogs **Results:** 90.8% accuracy on invoice extraction (100% on invoice numbers and dates) Real-time audio transcription with streaming support and low latency. **Features:** Streaming support, low-latency transcription, production-ready ## API Features Leverage Model Context Protocol (MCP) for GitHub repository analysis, code search, and documentation Q\&A. 
**Features:** Repository analysis, code search, documentation Q\&A, GitMCP integration **Models:** Qwen 3 235B with external tool support --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-api-key.md # Source: https://docs.fireworks.ai/api-reference/create-api-key.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-api-key.md # Source: https://docs.fireworks.ai/api-reference/create-api-key.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-api-key.md # Source: https://docs.fireworks.ai/api-reference/create-api-key.md # Create API Key ## OpenAPI ````yaml post /v1/accounts/{account_id}/users/{user_id}/apiKeys paths: path: /v1/accounts/{account_id}/users/{user_id}/apiKeys method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id user_id: schema: - type: string required: true description: The User Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: apiKey: allOf: - $ref: '#/components/schemas/gatewayApiKey' description: The API key to be created. required: true refIdentifier: '#/components/schemas/GatewayCreateApiKeyBody' requiredProperties: - apiKey examples: example: value: apiKey: displayName: expireTime: '2023-11-07T05:31:56Z' response: '200': application/json: schemaArray: - type: object properties: keyId: allOf: - &ref_0 type: string description: >- Unique identifier (Key ID) for the API key, used primarily for deletion. readOnly: true displayName: allOf: - &ref_1 type: string description: >- Display name for the API key, defaults to "default" if not specified. key: allOf: - &ref_2 type: string description: >- The actual API key value, only available upon creation and not stored thereafter. readOnly: true createTime: allOf: - &ref_3 type: string format: date-time description: Timestamp indicating when the API key was created. readOnly: true secure: allOf: - &ref_4 type: boolean description: >- Indicates whether the plaintext value of the API key is unknown to Fireworks. If true, Fireworks does not know this API key's plaintext value. If false, Fireworks does know the plaintext value. readOnly: true email: allOf: - &ref_5 type: string description: Email of the user who owns this API key. readOnly: true prefix: allOf: - &ref_6 type: string title: >- The first few characters of the API key to visually identify it readOnly: true expireTime: allOf: - &ref_7 type: string format: date-time description: >- Timestamp indicating when the API key will expire. If not set, the key never expires. refIdentifier: '#/components/schemas/gatewayApiKey' examples: example: value: keyId: displayName: key: createTime: '2023-11-07T05:31:56Z' secure: true email: prefix: expireTime: '2023-11-07T05:31:56Z' description: A successful response. 
deprecated: false type: path components: schemas: gatewayApiKey: type: object properties: keyId: *ref_0 displayName: *ref_1 key: *ref_2 createTime: *ref_3 secure: *ref_4 email: *ref_5 prefix: *ref_6 expireTime: *ref_7 ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/create-aws-iam-role-binding.md # Create Aws Iam Role Binding ## OpenAPI ````yaml post /v1/accounts/{account_id}/awsIamRoleBindings paths: path: /v1/accounts/{account_id}/awsIamRoleBindings method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: principal: allOf: - &ref_0 type: string description: >- The principal that is allowed to assume the AWS IAM role. This must be the email address of the user. role: allOf: - &ref_1 type: string description: >- The AWS IAM role ARN that is allowed to be assumed by the principal. required: true refIdentifier: '#/components/schemas/gatewayAwsIamRoleBinding' requiredProperties: &ref_2 - principal - role examples: example: value: principal: role: description: The properties of the AWS IAM role binding being created. response: '200': application/json: schemaArray: - type: object properties: accountId: allOf: - type: string description: The account ID that this binding is associated with. readOnly: true createTime: allOf: - type: string format: date-time description: The creation time of the AWS IAM role binding. readOnly: true principal: allOf: - *ref_0 role: allOf: - *ref_1 refIdentifier: '#/components/schemas/gatewayAwsIamRoleBinding' requiredProperties: *ref_2 examples: example: value: accountId: createTime: '2023-11-07T05:31:56Z' principal: role: description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/create-batch-inference-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/create-batch-inference-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/create-batch-inference-job.md # Create Batch Inference Job ## OpenAPI ````yaml post /v1/accounts/{account_id}/batchInferenceJobs paths: path: /v1/accounts/{account_id}/batchInferenceJobs method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: batchInferenceJobId: schema: - type: string required: false description: ID of the batch inference job. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string title: >- Human-readable display name of the batch inference job. e.g. 
"My Batch Inference Job" state: allOf: - &ref_1 $ref: '#/components/schemas/gatewayJobState' description: >- JobState represents the state an asynchronous job can be in. readOnly: true status: allOf: - &ref_2 $ref: '#/components/schemas/gatewayStatus' readOnly: true model: allOf: - &ref_3 type: string description: >- The name of the model to use for inference. This is required, except when continued_from_job_name is specified. inputDatasetId: allOf: - &ref_4 type: string description: >- The name of the dataset used for inference. This is required, except when continued_from_job_name is specified. outputDatasetId: allOf: - &ref_5 type: string description: >- The name of the dataset used for storing the results. This will also contain the error file. inferenceParameters: allOf: - &ref_6 $ref: '#/components/schemas/gatewayInferenceParameters' description: Parameters controlling the inference process. precision: allOf: - &ref_7 $ref: '#/components/schemas/DeploymentPrecision' description: >- The precision with which the model should be served. If PRECISION_UNSPECIFIED, a default will be chosen based on the model. jobProgress: allOf: - &ref_8 $ref: '#/components/schemas/gatewayJobProgress' description: Job progress. readOnly: true continuedFromJobName: allOf: - &ref_9 type: string description: >- The resource name of the batch inference job that this job continues from. Used for lineage tracking to understand job continuation chains. required: true title: 'Next ID: 31' refIdentifier: '#/components/schemas/gatewayBatchInferenceJob' examples: example: value: displayName: model: inputDatasetId: outputDatasetId: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 precision: PRECISION_UNSPECIFIED continuedFromJobName: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the batch inference job. e.g. accounts/my-account/batchInferenceJobs/my-batch-inference-job readOnly: true displayName: allOf: - *ref_0 createTime: allOf: - type: string format: date-time description: The creation time of the batch inference job. readOnly: true createdBy: allOf: - type: string description: >- The email address of the user who initiated this batch inference job. readOnly: true state: allOf: - *ref_1 status: allOf: - *ref_2 model: allOf: - *ref_3 inputDatasetId: allOf: - *ref_4 outputDatasetId: allOf: - *ref_5 inferenceParameters: allOf: - *ref_6 updateTime: allOf: - type: string format: date-time description: The update time for the batch inference job. readOnly: true precision: allOf: - *ref_7 jobProgress: allOf: - *ref_8 continuedFromJobName: allOf: - *ref_9 title: 'Next ID: 31' refIdentifier: '#/components/schemas/gatewayBatchInferenceJob' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' createdBy: state: JOB_STATE_UNSPECIFIED status: code: OK message: model: inputDatasetId: outputDatasetId: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 updateTime: '2023-11-07T05:31:56Z' precision: PRECISION_UNSPECIFIED jobProgress: percent: 123 epoch: 123 totalInputRequests: 123 totalProcessedRequests: 123 successfullyProcessedRequests: 123 failedRequests: 123 outputRows: 123 inputTokens: 123 outputTokens: 123 cachedInputTokenCount: 123 continuedFromJobName: description: A successful response. 
deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. 
Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayInferenceParameters: type: object properties: maxTokens: type: integer format: int32 description: Maximum number of tokens to generate per response. temperature: type: number format: float description: Sampling temperature, typically between 0 and 2. topP: type: number format: float description: Top-p sampling parameter, typically between 0 and 1. 'n': type: integer format: int32 description: Number of response candidates to generate per input. extraBody: type: string description: |- Additional parameters for the inference request as a JSON string. For example: "{\"stop\": [\"\\n\"]}". topK: type: integer format: int32 description: >- Top-k sampling parameter, limits the token selection to the top k tokens. description: Parameters for the inference requests. gatewayJobProgress: type: object properties: percent: type: integer format: int32 description: Progress percent, within the range from 0 to 100. 
epoch: type: integer format: int32 description: >- The epoch for which the progress percent is reported, usually starting from 0. This is optional for jobs that don't run in an epoch fashion, e.g. BIJ, EVJ. totalInputRequests: type: integer format: int32 description: Total number of input requests/rows in the job. totalProcessedRequests: type: integer format: int32 description: >- Total number of requests that have been processed (successfully or failed). successfullyProcessedRequests: type: integer format: int32 description: Number of requests that were processed successfully. failedRequests: type: integer format: int32 description: Number of requests that failed to process. outputRows: type: integer format: int32 description: Number of output rows generated. inputTokens: type: integer format: int32 description: Total number of input tokens processed. outputTokens: type: integer format: int32 description: Total number of output tokens generated. cachedInputTokenCount: type: integer format: int32 description: The number of input tokens that hit the prompt cache. description: Progress of a job, e.g. RLOR, EVJ, BIJ etc. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto]
````

---

# Source: https://docs.fireworks.ai/api-reference-dlde/create-batch-job.md

# Create Batch Job

## OpenAPI

````yaml post /v1/accounts/{account_id}/batchJobs
paths: path: /v1/accounts/{account_id}/batchJobs method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string description: >- Human-readable display name of the batch job. e.g. "My Batch Job" Must be fewer than 64 characters long. nodePoolId: allOf: - &ref_1 type: string title: >- The ID of the node pool that this batch job should use. e.g. my-node-pool environmentId: allOf: - &ref_2 type: string description: >- The ID of the environment that this batch job should use. e.g. my-env If specified, image_ref must not be specified. snapshotId: allOf: - &ref_3 type: string description: >- The ID of the snapshot used by this batch job. If specified, environment_id must be specified and image_ref must not be specified. numRanks: allOf: - &ref_4 type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank.
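# Illustrative note (not part of the generated schema): based on the description
# above, a request with numRanks: 8 on a GPU node pool would be allotted 8 GPUs
# packed onto as few hosts as possible, while on a CPU node pool it would be
# allotted 8 hosts, one per rank.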
envVars: allOf: - &ref_5 type: object additionalProperties: type: string description: >- Environment variables to be passed during this job's execution. role: allOf: - &ref_6 type: string description: >- The ARN of the AWS IAM role that the batch job should assume. If not specified, the connection will fall back to the node pool's node_role. pythonExecutor: allOf: - &ref_7 $ref: '#/components/schemas/gatewayPythonExecutor' notebookExecutor: allOf: - &ref_8 $ref: '#/components/schemas/gatewayNotebookExecutor' shellExecutor: allOf: - &ref_9 $ref: '#/components/schemas/gatewayShellExecutor' imageRef: allOf: - &ref_10 type: string description: >- The container image used by this job. If specified, environment_id and snapshot_id must not be specified. annotations: allOf: - &ref_11 type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. state: allOf: - &ref_12 $ref: '#/components/schemas/gatewayBatchJobState' description: The current state of the batch job. readOnly: true shared: allOf: - &ref_13 type: boolean description: >- Whether the batch job is shared with all users in the account. This allows all users to update, delete, clone, and create environments using the batch job. required: true title: 'Next ID: 22' refIdentifier: '#/components/schemas/gatewayBatchJob' requiredProperties: &ref_14 - nodePoolId examples: example: value: displayName: nodePoolId: environmentId: snapshotId: numRanks: 123 envVars: {} role: pythonExecutor: targetType: TARGET_TYPE_UNSPECIFIED target: args: - notebookExecutor: notebookFilename: shellExecutor: command: imageRef: annotations: {} shared: true description: The properties of the batch job being created. response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the batch job. e.g. accounts/my-account/clusters/my-cluster/batchJobs/123456789 readOnly: true displayName: allOf: - *ref_0 createTime: allOf: - type: string format: date-time description: The creation time of the batch job. readOnly: true startTime: allOf: - type: string format: date-time description: The time when the batch job started running. readOnly: true endTime: allOf: - type: string format: date-time description: >- The time when the batch job completed, failed, or was cancelled. readOnly: true createdBy: allOf: - type: string description: The email address of the user who created this batch job. readOnly: true nodePoolId: allOf: - *ref_1 environmentId: allOf: - *ref_2 snapshotId: allOf: - *ref_3 numRanks: allOf: - *ref_4 envVars: allOf: - *ref_5 role: allOf: - *ref_6 pythonExecutor: allOf: - *ref_7 notebookExecutor: allOf: - *ref_8 shellExecutor: allOf: - *ref_9 imageRef: allOf: - *ref_10 annotations: allOf: - *ref_11 state: allOf: - *ref_12 status: allOf: - type: string description: >- Detailed information about the current status of the batch job. readOnly: true shared: allOf: - *ref_13 updateTime: allOf: - type: string format: date-time description: The update time for the batch job. 
readOnly: true title: 'Next ID: 22' refIdentifier: '#/components/schemas/gatewayBatchJob' requiredProperties: *ref_14 examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' startTime: '2023-11-07T05:31:56Z' endTime: '2023-11-07T05:31:56Z' createdBy: nodePoolId: environmentId: snapshotId: numRanks: 123 envVars: {} role: pythonExecutor: targetType: TARGET_TYPE_UNSPECIFIED target: args: - notebookExecutor: notebookFilename: shellExecutor: command: imageRef: annotations: {} state: STATE_UNSPECIFIED status: shared: true updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: PythonExecutorTargetType: type: string enum: - TARGET_TYPE_UNSPECIFIED - MODULE - FILENAME default: TARGET_TYPE_UNSPECIFIED description: |2- - MODULE: Runs a python module, i.e. passed as -m argument. - FILENAME: Runs a python file. gatewayBatchJobState: type: string enum: - STATE_UNSPECIFIED - CREATING - QUEUED - PENDING - RUNNING - COMPLETED - FAILED - CANCELLING - CANCELLED - DELETING default: STATE_UNSPECIFIED description: |- - CREATING: The batch job is being created. - QUEUED: The batch job is in the queue and waiting to be scheduled. Currently unused. - PENDING: The batch job is scheduled and is waiting for resource allocation. - RUNNING: The batch job is running. - COMPLETED: The batch job has finished successfully. - FAILED: The batch job has failed. - CANCELLING: The batch job is being cancelled. - CANCELLED: The batch job was cancelled. - DELETING: The batch job is being deleted. title: 'Next ID: 10' gatewayNotebookExecutor: type: object properties: notebookFilename: type: string description: Path to a notebook file to be executed. description: Execute a notebook file. required: - notebookFilename gatewayPythonExecutor: type: object properties: targetType: $ref: '#/components/schemas/PythonExecutorTargetType' description: The type of Python target to run. target: type: string description: A Python module or filename depending on TargetType. args: type: array items: type: string description: Command line arguments to pass to the Python process. description: Execute a Python process. required: - targetType - target gatewayShellExecutor: type: object properties: command: type: string title: Command we want to run for the shell script description: Execute a shell script. required: - command
````

---

# Source: https://docs.fireworks.ai/api-reference/create-batch-request.md

# Create Batch Request

Create a batch request for our audio transcription service

### Headers

Your Fireworks API key, e.g. `Authorization=FIREWORKS_API_KEY`. Alternatively, can be provided as a query param.

### Path Parameters

The relative route of the target API operation (e.g. `"v1/audio/transcriptions"`, `"v1/audio/translations"`). This should correspond to a valid route supported by the backend service.

### Query Parameters

Identifies the target backend service or model to handle the request. Currently supported:

* `audio-prod`: [https://audio-prod.api.fireworks.ai](https://audio-prod.api.fireworks.ai)
* `audio-turbo`: [https://audio-turbo.api.fireworks.ai](https://audio-turbo.api.fireworks.ai)

### Body

Request body fields vary depending on the selected `endpoint_id` and `path`.
The request body must conform to the schema defined by the corresponding synchronous API. For example, transcription requests typically accept fields such as `model`, `diarize`, and `response_format`. Refer to the relevant synchronous API for required fields:

* [Transcribe audio](https://docs.fireworks.ai/api-reference/audio-transcriptions)
* [Translate audio](https://docs.fireworks.ai/api-reference/audio-translations)

### Response

* The status of the batch request submission. A value of `"submitted"` indicates the batch request was accepted and queued for processing.
* A unique identifier assigned to the batch job. This ID can be used to check job status or retrieve results later.
* The unique identifier of the account associated with the batch job.
* The backend service selected to process the request. This typically matches the `endpoint_id` used during submission.
* A human-readable message describing the result of the submission. Typically `"Request submitted successfully"` if accepted.

```bash
# Download audio file
curl -L -o "audio.flac" "https://tinyurl.com/4997djsh"

# Make request
curl -X POST "https://audio-batch.api.fireworks.ai/v1/audio/transcriptions?endpoint_id=audio-prod" \
  -H "Authorization: " \
  -F "file=@audio.flac"
```

```python
# Requires the requests package: pip install requests
import requests

# input API key and download audio
api_key = ""
audio = requests.get("https://tinyurl.com/4cb74vas").content

# Prepare request data
url = "https://audio-batch.api.fireworks.ai/v1/audio/transcriptions?endpoint_id=audio-prod"
headers = {"Authorization": api_key}
payload = {
    "model": "whisper-v3",
    "response_format": "json"
}
files = {"file": ("audio.flac", audio, "audio/flac")}

# Send request
response = requests.post(url, headers=headers, data=payload, files=files)
print(response.text)
```

To check the status of your batch request, use the [Check Batch Status](https://docs.fireworks.ai/api-reference/get-batch-status) endpoint with the returned `batch_id`.

---

# Source: https://docs.fireworks.ai/api-reference-dlde/create-cluster.md

# Create Cluster

## OpenAPI

````yaml post /v1/accounts/{account_id}/clusters
paths: path: /v1/accounts/{account_id}/clusters method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: cluster: allOf: - $ref: '#/components/schemas/gatewayCluster' description: The properties of the cluster being created. clusterId: allOf: - type: string title: The cluster ID to use in the cluster name. e.g. my-cluster required: true refIdentifier: '#/components/schemas/GatewayCreateClusterBody' requiredProperties: - cluster - clusterId examples: example: value: cluster: displayName: eksCluster: awsAccountId: fireworksManagerRole: region: clusterName: storageBucketName: metricWriterRole: loadBalancerControllerRole: workloadIdentityPoolProviderId: inferenceRole: fakeCluster: projectId: location: clusterName: clusterId: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - &ref_0 type: string title: >- The resource name of the cluster. e.g.
accounts/my-account/clusters/my-cluster readOnly: true displayName: allOf: - &ref_1 type: string description: >- Human-readable display name of the cluster. e.g. "My Cluster" Must be fewer than 64 characters long. createTime: allOf: - &ref_2 type: string format: date-time description: The creation time of the cluster. readOnly: true eksCluster: allOf: - &ref_3 $ref: '#/components/schemas/gatewayEksCluster' fakeCluster: allOf: - &ref_4 $ref: '#/components/schemas/gatewayFakeCluster' state: allOf: - &ref_5 $ref: '#/components/schemas/gatewayClusterState' description: The current state of the cluster. readOnly: true status: allOf: - &ref_6 $ref: '#/components/schemas/gatewayStatus' description: >- Detailed information about the current status of the cluster. readOnly: true updateTime: allOf: - &ref_7 type: string format: date-time description: The update time for the cluster. readOnly: true title: 'Next ID: 15' refIdentifier: '#/components/schemas/gatewayCluster' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' eksCluster: awsAccountId: fireworksManagerRole: region: clusterName: storageBucketName: metricWriterRole: loadBalancerControllerRole: workloadIdentityPoolProviderId: inferenceRole: fakeCluster: projectId: location: clusterName: state: STATE_UNSPECIFIED status: code: OK message: updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCluster: type: object properties: name: *ref_0 displayName: *ref_1 createTime: *ref_2 eksCluster: *ref_3 fakeCluster: *ref_4 state: *ref_5 status: *ref_6 updateTime: *ref_7 title: 'Next ID: 15' gatewayClusterState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED default: STATE_UNSPECIFIED description: |2- - CREATING: The cluster is still being created. - READY: The cluster is ready to be used. - DELETING: The cluster is being deleted. - FAILED: Cluster is not operational. Consult 'status' for detailed messaging. Cluster needs to be deleted and re-created. gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. 
HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. 
This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEksCluster: type: object properties: awsAccountId: type: string description: The 12-digit AWS account ID where this cluster lives. fireworksManagerRole: type: string title: >- The IAM role ARN used to manage Fireworks resources on AWS. If not specified, the default is arn:aws:iam:::role/FireworksManagerRole region: type: string description: >- The AWS region where this cluster lives. See https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html for a list of available regions. clusterName: type: string description: The EKS cluster name. storageBucketName: type: string description: The S3 bucket name. metricWriterRole: type: string description: >- The IAM role ARN used by Google Managed Prometheus role that will write metrics to Fireworks managed Prometheus. The role must be assumable by the `system:serviceaccount:gmp-system:collector` service account on the EKS cluster. If not specified, no metrics will be written to GCP. loadBalancerControllerRole: type: string description: >- The IAM role ARN used by the EKS load balancer controller (i.e. the load balancer automatically created for the k8s gateway resource). If not specified, no gateway will be created. workloadIdentityPoolProviderId: type: string title: |- The ID of the GCP workload identity pool provider in the Fireworks project for this cluster. The pool ID is assumed to be "byoc-pool" inferenceRole: type: string description: The IAM role ARN used by the inference pods on the cluster. title: |- An Amazon Elastic Kubernetes Service cluster. Next ID: 16 required: - awsAccountId - region gatewayFakeCluster: type: object properties: projectId: type: string location: type: string clusterName: type: string title: A fake cluster using https://pkg.go.dev/k8s.io/client-go/kubernetes/fake gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto]
````

---

# Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-dataset.md
# Source: https://docs.fireworks.ai/api-reference/create-dataset.md

# Create Dataset

## OpenAPI

````yaml post /v1/accounts/{account_id}/datasets
paths: path: /v1/accounts/{account_id}/datasets method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: dataset: allOf: - $ref: '#/components/schemas/gatewayDataset' datasetId: allOf: - type: string sourceDatasetId: allOf: - type: string title: >- If set, indicates we are creating a new dataset by filtering this existing dataset ID filter: allOf: - type: string title: >- Filter condition (SQL-like WHERE clause) to apply to the source dataset required: true refIdentifier: '#/components/schemas/GatewayCreateDatasetBody' requiredProperties: - dataset - datasetId examples: example: value: dataset: displayName: exampleCount: userUploaded: {} evaluationResult: evaluationJobId: transformed: sourceDatasetId: filter: originalFormat: FORMAT_UNSPECIFIED splitted: sourceDatasetId: evalProtocol: {} externalUrl: format: FORMAT_UNSPECIFIED sourceJobName: datasetId: sourceDatasetId: filter: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - &ref_0 type: string readOnly: true displayName: allOf: - &ref_1 type: string createTime: allOf: - &ref_2 type: string format: date-time readOnly: true state: allOf: - &ref_3 $ref: '#/components/schemas/gatewayDatasetState' readOnly: true status: allOf: - &ref_4 $ref: '#/components/schemas/gatewayStatus' readOnly: true exampleCount: allOf: - &ref_5 type: string format: int64 userUploaded: allOf: - &ref_6 $ref: '#/components/schemas/gatewayUserUploaded' evaluationResult: allOf: - &ref_7 $ref: '#/components/schemas/gatewayEvaluationResult' transformed: allOf: - &ref_8 $ref: '#/components/schemas/gatewayTransformed' splitted: allOf: - &ref_9 $ref: '#/components/schemas/gatewaySplitted' evalProtocol: allOf: - &ref_10 $ref: '#/components/schemas/gatewayEvalProtocol' externalUrl: allOf: - &ref_11 type: string title: >- The external URI of the dataset. e.g. gs://foo/bar/baz.jsonl format: allOf: - &ref_12 $ref: '#/components/schemas/DatasetFormat' createdBy: allOf: - &ref_13 type: string description: >- The email address of the user who created this dataset. readOnly: true updateTime: allOf: - &ref_14 type: string format: date-time description: The update time for the dataset. readOnly: true sourceJobName: allOf: - &ref_15 type: string description: >- The resource name of the job that created this dataset (e.g., batch inference job). Used for lineage tracking to understand dataset provenance. estimatedTokenCount: allOf: - &ref_16 type: string format: int64 description: The estimated number of tokens in the dataset.
readOnly: true title: 'Next ID: 23' refIdentifier: '#/components/schemas/gatewayDataset' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: exampleCount: userUploaded: {} evaluationResult: evaluationJobId: transformed: sourceDatasetId: filter: originalFormat: FORMAT_UNSPECIFIED splitted: sourceDatasetId: evalProtocol: {} externalUrl: format: FORMAT_UNSPECIFIED createdBy: updateTime: '2023-11-07T05:31:56Z' sourceJobName: estimatedTokenCount: description: A successful response. deprecated: false type: path components: schemas: DatasetFormat: type: string enum: - FORMAT_UNSPECIFIED - CHAT - COMPLETION - RL default: FORMAT_UNSPECIFIED gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. 
HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDataset: type: object properties: name: *ref_0 displayName: *ref_1 createTime: *ref_2 state: *ref_3 status: *ref_4 exampleCount: *ref_5 userUploaded: *ref_6 evaluationResult: *ref_7 transformed: *ref_8 splitted: *ref_9 evalProtocol: *ref_10 externalUrl: *ref_11 format: *ref_12 createdBy: *ref_13 updateTime: *ref_14 sourceJobName: *ref_15 estimatedTokenCount: *ref_16 title: 'Next ID: 23' gatewayDatasetState: type: string enum: - STATE_UNSPECIFIED - UPLOADING - READY default: STATE_UNSPECIFIED gatewayEvalProtocol: type: object gatewayEvaluationResult: type: object properties: evaluationJobId: type: string required: - evaluationJobId gatewaySplitted: type: object properties: sourceDatasetId: type: string required: - sourceDatasetId gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayTransformed: type: object properties: sourceDatasetId: type: string filter: type: string originalFormat: $ref: '#/components/schemas/DatasetFormat' required: - sourceDatasetId gatewayUserUploaded: type: object ```` --- # Source: https://docs.fireworks.ai/api-reference/create-deployed-model.md # Load LoRA ## OpenAPI ````yaml post /v1/accounts/{account_id}/deployedModels paths: path: /v1/accounts/{account_id}/deployedModels method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: replaceMergedAddon: schema: - type: boolean required: false description: >- Merges new addon to the base model, while unmerging/deleting any existing addon in the deployment. Must be specified for hot reload deployments header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string description: allOf: - &ref_1 type: string description: Description of the resource. model: allOf: - &ref_2 type: string title: |- The resource name of the model to be deployed. e.g. accounts/my-account/models/my-model deployment: allOf: - &ref_3 type: string description: >- The resource name of the base deployment the model is deployed to. default: allOf: - &ref_4 type: boolean description: >- If true, this is the default target when querying this model without the `#` suffix. The first deployment a model is deployed to will have this field set to true. state: allOf: - &ref_5 $ref: '#/components/schemas/gatewayDeployedModelState' description: The state of the deployed model. readOnly: true serverless: allOf: - &ref_6 type: boolean title: True if the underlying deployment is managed by Fireworks status: allOf: - &ref_7 $ref: '#/components/schemas/gatewayStatus' description: Contains model deploy/undeploy details. readOnly: true public: allOf: - &ref_8 type: boolean description: If true, the deployed model will be publicly reachable. 
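# Illustrative request body for loading a LoRA addon via POST
# /v1/accounts/{account_id}/deployedModels (a sketch using only the fields defined
# above; the resource names are placeholders, not real resources):
#
#   {
#     "model": "accounts/my-account/models/my-lora-addon",
#     "deployment": "accounts/my-account/deployments/my-deployment",
#     "displayName": "My LoRA addon"
#   }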
required: true title: 'Next ID: 20' refIdentifier: '#/components/schemas/gatewayDeployedModel' examples: example: value: displayName: description: model: deployment: default: true serverless: true public: true response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name. e.g. accounts/my-account/deployedModels/my-deployed-model readOnly: true displayName: allOf: - *ref_0 description: allOf: - *ref_1 createTime: allOf: - type: string format: date-time description: The creation time of the resource. readOnly: true model: allOf: - *ref_2 deployment: allOf: - *ref_3 default: allOf: - *ref_4 state: allOf: - *ref_5 serverless: allOf: - *ref_6 status: allOf: - *ref_7 public: allOf: - *ref_8 updateTime: allOf: - type: string format: date-time description: The update time for the deployed model. readOnly: true title: 'Next ID: 20' refIdentifier: '#/components/schemas/gatewayDeployedModel' examples: example: value: name: displayName: description: createTime: '2023-11-07T05:31:56Z' model: deployment: default: true state: STATE_UNSPECIFIED serverless: true status: code: OK message: public: true updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). 
`PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDeployedModelState: type: string enum: - STATE_UNSPECIFIED - UNDEPLOYING - DEPLOYING - DEPLOYED - UPDATING default: STATE_UNSPECIFIED description: |- - UNDEPLOYING: The model is being undeployed. - DEPLOYING: The model is being deployed. - DEPLOYED: The model is deployed and ready for inference. - UPDATING: There are updates happening with the deployed model. title: 'Next ID: 6' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto]
````

---

# Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-deployment.md
# Source: https://docs.fireworks.ai/api-reference/create-deployment.md

# Create Deployment

## OpenAPI

````yaml post /v1/accounts/{account_id}/deployments
paths: path: /v1/accounts/{account_id}/deployments method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: disableAutoDeploy: schema: - type: boolean required: false description: >- By default, when a deployment is created with a base model that is not currently deployed, the base model will be deployed to this deployment. If true, this auto-deploy behavior is disabled. disableSpeculativeDecoding: schema: - type: boolean required: false description: >- By default, a deployment will use the speculative decoding settings from the base model. If true, this will disable speculative decoding. deploymentId: schema: - type: string required: false description: >- The ID of the deployment. If not specified, a random ID will be generated. validateOnly: schema: - type: boolean required: false description: >- If true, this will not create the deployment, but will return the deployment that would be created. skipShapeValidation: schema: - type: boolean required: false description: >- By default, a deployment will ensure the deployment shape provided is validated. If true, we will not require the deployment shape to be validated. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string description: >- Human-readable display name of the deployment. e.g. "My Deployment" Must be fewer than 64 characters long. description: allOf: - &ref_1 type: string description: Description of the deployment. expireTime: allOf: - &ref_2 type: string format: date-time description: >- The time at which this deployment will automatically be deleted. state: allOf: - &ref_3 $ref: '#/components/schemas/gatewayDeploymentState' description: The state of the deployment. readOnly: true status: allOf: - &ref_4 $ref: '#/components/schemas/gatewayStatus' description: >- Detailed status information regarding the most recent operation.
readOnly: true minReplicaCount: allOf: - &ref_5 type: integer format: int32 description: |- The minimum number of replicas. If not specified, the default is 0. maxReplicaCount: allOf: - &ref_6 type: integer format: int32 description: >- The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0. autoscalingPolicy: allOf: - &ref_7 $ref: '#/components/schemas/gatewayAutoscalingPolicy' baseModel: allOf: - &ref_8 type: string title: >- The base model name. e.g. accounts/fireworks/models/falcon-7b acceleratorCount: allOf: - &ref_9 type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: allOf: - &ref_10 $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: allOf: - &ref_11 $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. enableAddons: allOf: - &ref_12 type: boolean description: If true, PEFT addons are enabled for this deployment. draftTokenCount: allOf: - &ref_13 type: integer format: int32 description: >- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior. draftModel: allOf: - &ref_14 type: string description: >- The draft model name for speculative decoding. e.g. accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior. ngramSpeculationLength: allOf: - &ref_15 type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: allOf: - &ref_16 type: boolean description: Whether to apply sticky routing based on `user` field. directRouteApiKeys: allOf: - &ref_17 type: array items: type: string description: >- The set of API keys used to access the direct route deployment. If direct routing is not enabled, this field is unused. directRouteType: allOf: - &ref_18 $ref: '#/components/schemas/gatewayDirectRouteType' description: >- If set, this deployment will expose an endpoint that bypasses the Fireworks API gateway. deploymentTemplate: allOf: - &ref_19 type: string description: >- The name of the deployment template to use for this deployment. Only available to enterprise accounts. autoTune: allOf: - &ref_20 $ref: '#/components/schemas/gatewayAutoTune' description: The performance profile to use for this deployment. placement: allOf: - &ref_21 $ref: '#/components/schemas/gatewayPlacement' description: >- The desired geographic region where the deployment must be placed. If unspecified, the default is the GLOBAL multi-region. region: allOf: - &ref_22 $ref: '#/components/schemas/gatewayRegion' description: >- The geographic region where the deployment is presently located. This region may change over time, but within the `placement` constraint. readOnly: true disableDeploymentSizeValidation: allOf: - &ref_23 type: boolean description: Whether the deployment size validation is disabled. enableMtp: allOf: - &ref_24 type: boolean description: If true, MTP is enabled for this deployment. 
enableHotReloadLatestAddon: allOf: - &ref_25 type: boolean description: >- Allows up to 1 addon at a time to be loaded, and will merge it into the base model. deploymentShape: allOf: - &ref_26 type: string description: >- The name of the deployment shape that this deployment is using. On the server side, this will be replaced with the deployment shape version name. activeModelVersion: allOf: - &ref_27 type: string description: >- The model version that is currently active and applied to running replicas of a deployment. targetModelVersion: allOf: - &ref_28 type: string description: >- The target model version that is being rolled out to the deployment. In a ready steady state, the target model version is the same as the active model version. required: true title: 'Next ID: 82' refIdentifier: '#/components/schemas/gatewayDeployment' requiredProperties: &ref_29 - baseModel examples: example: value: displayName: description: expireTime: '2023-11-07T05:31:56Z' minReplicaCount: 123 maxReplicaCount: 123 autoscalingPolicy: scaleUpWindow: scaleDownWindow: scaleToZeroWindow: loadTargets: {} baseModel: acceleratorCount: 123 acceleratorType: ACCELERATOR_TYPE_UNSPECIFIED precision: PRECISION_UNSPECIFIED enableAddons: true draftTokenCount: 123 draftModel: ngramSpeculationLength: 123 enableSessionAffinity: true directRouteApiKeys: - directRouteType: DIRECT_ROUTE_TYPE_UNSPECIFIED deploymentTemplate: autoTune: longPrompt: true placement: region: REGION_UNSPECIFIED multiRegion: MULTI_REGION_UNSPECIFIED regions: - REGION_UNSPECIFIED disableDeploymentSizeValidation: true enableMtp: true enableHotReloadLatestAddon: true deploymentShape: activeModelVersion: targetModelVersion: description: The properties of the deployment being created. response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the deployment. e.g. accounts/my-account/deployments/my-deployment readOnly: true displayName: allOf: - *ref_0 description: allOf: - *ref_1 createTime: allOf: - type: string format: date-time description: The creation time of the deployment. readOnly: true expireTime: allOf: - *ref_2 purgeTime: allOf: - type: string format: date-time description: The time at which the resource will be hard deleted. readOnly: true deleteTime: allOf: - type: string format: date-time description: The time at which the resource will be soft deleted. readOnly: true state: allOf: - *ref_3 status: allOf: - *ref_4 minReplicaCount: allOf: - *ref_5 maxReplicaCount: allOf: - *ref_6 desiredReplicaCount: allOf: - type: integer format: int32 description: >- The desired number of replicas for this deployment. This represents the target replica count that the system is trying to achieve. readOnly: true replicaCount: allOf: - type: integer format: int32 readOnly: true autoscalingPolicy: allOf: - *ref_7 baseModel: allOf: - *ref_8 acceleratorCount: allOf: - *ref_9 acceleratorType: allOf: - *ref_10 precision: allOf: - *ref_11 cluster: allOf: - type: string description: >- If set, this deployment is deployed to a cloud-premise cluster. 
readOnly: true enableAddons: allOf: - *ref_12 draftTokenCount: allOf: - *ref_13 draftModel: allOf: - *ref_14 ngramSpeculationLength: allOf: - *ref_15 enableSessionAffinity: allOf: - *ref_16 directRouteApiKeys: allOf: - *ref_17 numPeftDeviceCached: allOf: - type: integer format: int32 title: How many peft adapters to keep on gpu side for caching readOnly: true directRouteType: allOf: - *ref_18 directRouteHandle: allOf: - type: string description: >- The handle for calling a direct route. The meaning of the handle depends on the direct route type of the deployment: INTERNET -> The host name for accessing the deployment GCP_PRIVATE_SERVICE_CONNECT -> The service attachment name used to create the PSC endpoint. AWS_PRIVATELINK -> The service name used to create the VPC endpoint. readOnly: true deploymentTemplate: allOf: - *ref_19 autoTune: allOf: - *ref_20 placement: allOf: - *ref_21 region: allOf: - *ref_22 updateTime: allOf: - type: string format: date-time description: The update time for the deployment. readOnly: true disableDeploymentSizeValidation: allOf: - *ref_23 enableMtp: allOf: - *ref_24 enableHotReloadLatestAddon: allOf: - *ref_25 deploymentShape: allOf: - *ref_26 activeModelVersion: allOf: - *ref_27 targetModelVersion: allOf: - *ref_28 title: 'Next ID: 82' refIdentifier: '#/components/schemas/gatewayDeployment' requiredProperties: *ref_29 examples: example: value: name: displayName: description: createTime: '2023-11-07T05:31:56Z' expireTime: '2023-11-07T05:31:56Z' purgeTime: '2023-11-07T05:31:56Z' deleteTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: minReplicaCount: 123 maxReplicaCount: 123 desiredReplicaCount: 123 replicaCount: 123 autoscalingPolicy: scaleUpWindow: scaleDownWindow: scaleToZeroWindow: loadTargets: {} baseModel: acceleratorCount: 123 acceleratorType: ACCELERATOR_TYPE_UNSPECIFIED precision: PRECISION_UNSPECIFIED cluster: enableAddons: true draftTokenCount: 123 draftModel: ngramSpeculationLength: 123 enableSessionAffinity: true directRouteApiKeys: - numPeftDeviceCached: 123 directRouteType: DIRECT_ROUTE_TYPE_UNSPECIFIED directRouteHandle: deploymentTemplate: autoTune: longPrompt: true placement: region: REGION_UNSPECIFIED multiRegion: MULTI_REGION_UNSPECIFIED regions: - REGION_UNSPECIFIED region: REGION_UNSPECIFIED updateTime: '2023-11-07T05:31:56Z' disableDeploymentSizeValidation: true enableMtp: true enableHotReloadLatestAddon: true deploymentShape: activeModelVersion: targetModelVersion: description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB default: ACCELERATOR_TYPE_UNSPECIFIED gatewayAutoTune: type: object properties: longPrompt: type: boolean description: If true, this deployment is optimized for long prompt lengths. 
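# Illustrative request body for POST /v1/accounts/{account_id}/deployments
# (a sketch assembled from the fields, enums, and documented defaults in this spec;
#  the model and the specific values are assumptions, not recommendations):
#
#   {
#     "baseModel": "accounts/fireworks/models/falcon-7b",
#     "acceleratorType": "NVIDIA_H100_80GB",
#     "acceleratorCount": 1,
#     "minReplicaCount": 0,
#     "maxReplicaCount": 1,
#     "autoscalingPolicy": {
#       "scaleUpWindow": "30s",
#       "scaleDownWindow": "10m",
#       "scaleToZeroWindow": "1h",
#       "loadTargets": { "default": 0.8 }
#     }
#   }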
gatewayAutoscalingPolicy: type: object properties: scaleUpWindow: type: string description: >- The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s. scaleDownWindow: type: string description: >- The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m. scaleToZeroWindow: type: string description: >- The duration after which there are no requests that the deployment will be scaled down to zero replicas, if min_replica_count==0. Default is 1h. This must be at least 5 minutes. loadTargets: type: object additionalProperties: type: number format: float title: >- Map of load metric names to their target utilization factors. Currently only the "default" key is supported, which specifies the default target for all metrics. If not specified, the default target is 0.8 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. 
HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDeploymentState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED - UPDATING - DELETED default: STATE_UNSPECIFIED description: |2- - CREATING: The deployment is still being created. - READY: The deployment is ready to be used. - DELETING: The deployment is being deleted. - FAILED: The deployment failed to be created. See the `status` field for additional details on why it failed. - UPDATING: There are in-progress updates happening with the deployment. 
- DELETED: The deployment is soft-deleted. gatewayDirectRouteType: type: string enum: - DIRECT_ROUTE_TYPE_UNSPECIFIED - INTERNET - GCP_PRIVATE_SERVICE_CONNECT - AWS_PRIVATELINK default: DIRECT_ROUTE_TYPE_UNSPECIFIED title: |- - DIRECT_ROUTE_TYPE_UNSPECIFIED: No direct routing - INTERNET: The direct route is exposed via the public internet - GCP_PRIVATE_SERVICE_CONNECT: The direct route is exposed via GCP Private Service Connect - AWS_PRIVATELINK: The direct route is exposed via AWS PrivateLink gatewayMultiRegion: type: string enum: - MULTI_REGION_UNSPECIFIED - GLOBAL - US default: MULTI_REGION_UNSPECIFIED gatewayPlacement: type: object properties: region: $ref: '#/components/schemas/gatewayRegion' description: The region where the deployment must be placed. multiRegion: $ref: '#/components/schemas/gatewayMultiRegion' description: The multi-region where the deployment must be placed. regions: type: array items: $ref: '#/components/schemas/gatewayRegion' title: The list of regions where the deployment must be placed description: >- The desired geographic region where the deployment must be placed. Exactly one field will be specified. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/create-dpo-job.md # Create DPO Job ## OpenAPI ````yaml post /v1/accounts/{account_id}/dpoJobs paths: path: /v1/accounts/{account_id}/dpoJobs method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: dpoJobId: schema: - type: string required: false description: >- ID of the DPO job; a random ID will be generated if not specified. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string dataset: allOf: - &ref_1 type: string description: The name of the dataset used for training. state: allOf: - &ref_2 $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - &ref_3 $ref: '#/components/schemas/gatewayStatus' readOnly: true trainingConfig: allOf: - &ref_4 $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. wandbConfig: allOf: - &ref_5 $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging job progress. required: true title: 'Next ID: 13' refIdentifier: '#/components/schemas/gatewayDpoJob' requiredProperties: &ref_6 - dataset examples: example: value: displayName: dataset: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 wandbConfig: enabled: true apiKey: project: entity: runId: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - *ref_0 createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time readOnly: true dataset: allOf: - *ref_1 state: allOf: - *ref_2 status: allOf: - *ref_3 createdBy: allOf: - type: string description: The email address of the user who initiated this DPO job. readOnly: true trainingConfig: allOf: - *ref_4 wandbConfig: allOf: - *ref_5 title: 'Next ID: 13' refIdentifier: '#/components/schemas/gatewayDpoJob' requiredProperties: *ref_6 examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 wandbConfig: enabled: true apiKey: project: entity: runId: url: description: A successful response.
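# Illustrative sketch (not part of the generated spec): creating a DPO job with curl.
# The account ID, dataset name, and base model below are placeholders; "dataset" is the
# only required body field per the request schema above.
#
#   curl -X POST "https://api.fireworks.ai/v1/accounts/my-account/dpoJobs" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{
#           "displayName": "my-dpo-job",
#           "dataset": "accounts/my-account/datasets/my-preference-dataset",
#           "trainingConfig": {
#             "baseModel": "accounts/fireworks/models/<base-model-id>",
#             "epochs": 1
#           }
#         }'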
deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. 
HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. 
See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/create-environment.md # Create Environment ## OpenAPI ````yaml post /v1/accounts/{account_id}/environments paths: path: /v1/accounts/{account_id}/environments method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: environment: allOf: - $ref: '#/components/schemas/gatewayEnvironment' description: The properties of the Environment being created. environmentId: allOf: - type: string title: >- The environment ID to use in the environment name. e.g. my-env required: true refIdentifier: '#/components/schemas/GatewayCreateEnvironmentBody' requiredProperties: - environment - environmentId examples: example: value: environment: displayName: baseImageRef: shared: true annotations: {} environmentId: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - &ref_0 type: string title: >- The resource name of the environment. e.g. accounts/my-account/clusters/my-cluster/environments/my-env readOnly: true displayName: allOf: - &ref_1 type: string title: >- Human-readable display name of the environment. e.g. "My Environment" createTime: allOf: - &ref_2 type: string format: date-time description: The creation time of the environment. readOnly: true createdBy: allOf: - &ref_3 type: string description: >- The email address of the user who created this environment. readOnly: true state: allOf: - &ref_4 $ref: '#/components/schemas/gatewayEnvironmentState' description: The current state of the environment. readOnly: true status: allOf: - &ref_5 $ref: '#/components/schemas/gatewayStatus' description: The current error status of the environment. readOnly: true connection: allOf: - &ref_6 $ref: '#/components/schemas/gatewayEnvironmentConnection' description: Information about the current environment connection. readOnly: true baseImageRef: allOf: - &ref_7 type: string description: >- The URI of the base container image used for this environment. imageRef: allOf: - &ref_8 type: string description: >- The URI of the container image used for this environment. This is an immutable snapshot of the base_image_ref taken when the environment was created. readOnly: true snapshotImageRef: allOf: - &ref_9 type: string description: >- The URI of the latest container image snapshot for this environment. readOnly: true shared: allOf: - &ref_10 type: boolean description: >- Whether the environment is shared with all users in the account. This allows all users to connect, disconnect, update, delete, clone, and create batch jobs using the environment. annotations: allOf: - &ref_11 type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. updateTime: allOf: - &ref_12 type: string format: date-time description: The update time for the environment. readOnly: true title: 'Next ID: 14' refIdentifier: '#/components/schemas/gatewayEnvironment' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' createdBy: state: STATE_UNSPECIFIED status: code: OK message: connection: nodePoolId: numRanks: 123 role: zone: useLocalStorage: true baseImageRef: imageRef: snapshotImageRef: shared: true annotations: {} updateTime: '2023-11-07T05:31:56Z' description: A successful response.
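# Illustrative sketch (not part of the generated spec): creating an environment with curl.
# The account ID, environment ID, and image URI are placeholders; "environment" and
# "environmentId" are the required body fields per the request schema above.
#
#   curl -X POST "https://api.fireworks.ai/v1/accounts/my-account/environments" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{
#           "environmentId": "my-env",
#           "environment": {
#             "displayName": "My Environment",
#             "baseImageRef": "<container-image-uri>",
#             "shared": false
#           }
#         }'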
deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. 
For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEnvironment: type: object properties: name: *ref_0 displayName: *ref_1 createTime: *ref_2 createdBy: *ref_3 state: *ref_4 status: *ref_5 connection: *ref_6 baseImageRef: *ref_7 imageRef: *ref_8 snapshotImageRef: *ref_9 shared: *ref_10 annotations: *ref_11 updateTime: *ref_12 title: 'Next ID: 14' gatewayEnvironmentConnection: type: object properties: nodePoolId: type: string description: The resource id of the node pool the environment is connected to. numRanks: type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank. If not specified, the default is 1. role: type: string description: |- The ARN of the AWS IAM role that the connection should assume. If not specified, the connection will fall back to the node pool's node_role. zone: type: string description: >- The last zone that this environment was connected to. Used to warn users about cross-zone migration latency when they connect to a node pool in a different zone than their persistent volume. readOnly: true useLocalStorage: type: boolean description: >- If true, the node's local storage will be mounted on /tmp. This flag has no effect if the node does not have local storage.
title: 'Next ID: 8' required: - nodePoolId gatewayEnvironmentState: type: string enum: - STATE_UNSPECIFIED - CREATING - DISCONNECTED - CONNECTING - CONNECTED - DISCONNECTING - RECONNECTING - DELETING default: STATE_UNSPECIFIED description: |- - CREATING: The environment is being created. - DISCONNECTED: The environment is not connected. - CONNECTING: The environment is being connected to a node. - CONNECTED: The environment is connected to a node. - DISCONNECTING: The environment is being disconnected from a node. - RECONNECTING: The environment is reconnecting with new connection parameters. - DELETING: The environment is being deleted. title: 'Next ID: 8' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference/create-evaluation-job.md # Create Evaluation Job ## OpenAPI ````yaml post /v1/accounts/{account_id}/evaluationJobs openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluationJobs: post: tags: - Gateway summary: Create Evaluation Job operationId: Gateway_CreateEvaluationJob parameters: - name: account_id in: path required: true description: The Account Id schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/GatewayCreateEvaluationJobBody' required: true responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayEvaluationJob' components: schemas: GatewayCreateEvaluationJobBody: type: object properties: evaluationJob: $ref: '#/components/schemas/gatewayEvaluationJob' evaluationJobId: type: string leaderboardIds: type: array items: type: string description: Optional leaderboards to attach this job to upon creation. required: - evaluationJob gatewayEvaluationJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true createdBy: type: string readOnly: true state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true evaluator: type: string description: >- The fully-qualified resource name of the Evaluation used by this job. Format: accounts/{account_id}/evaluators/{evaluator_id} inputDataset: type: string description: >- The fully-qualified resource name of the input Dataset used by this job. Format: accounts/{account_id}/datasets/{dataset_id} outputDataset: type: string description: >- The fully-qualified resource name of the output Dataset created by this job. Format: accounts/{account_id}/datasets/{output_dataset_id} metrics: type: object additionalProperties: type: number format: double readOnly: true outputStats: type: string description: The output dataset's aggregated stats for the evaluation job. updateTime: type: string format: date-time description: The update time for the evaluation job. 
readOnly: true title: 'Next ID: 18' required: - evaluator - inputDataset - outputDataset gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED - JOB_STATE_PAUSED default: JOB_STATE_UNSPECIFIED description: |- JobState represents the state an asynchronous job can be in. - JOB_STATE_PAUSED: Job is paused, typically due to account suspension or manual intervention. gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/create-evaluator.md # Create Evaluator > Creates a custom evaluator for scoring model outputs. Evaluators use the [Eval Protocol](https://evalprotocol.io) to define test cases, run model inference, and score responses. They are used with evaluation jobs and Reinforcement Fine-Tuning (RFT). ## Source Code Requirements Your project should contain: - `requirements.txt` - Python dependencies for your evaluator - `test_*.py` - Pytest test file(s) with [`@evaluation_test`](https://evalprotocol.io/reference/evaluation-test) decorated functions - Any additional code/modules your evaluator needs ## Workflow **Recommended:** Use the [`ep upload`](https://evalprotocol.io/reference/cli#ep-upload) CLI command to handle all these steps automatically. If using the API directly: 1. Call this endpoint to create the evaluator resource 2. Package your source directory as a `.tar.gz` (respecting `.gitignore`) 3. Call [Get Evaluator Upload Endpoint](/api-reference/get-evaluator-upload-endpoint) to get a signed upload URL 4. `PUT` the tar.gz file to the signed URL 5. Call [Validate Evaluator Upload](/api-reference/validate-evaluator-upload) to trigger server-side validation 6. Poll [Get Evaluator](/api-reference/get-evaluator) until ready Once active, reference the evaluator in [Create Evaluation Job](/api-reference/create-evaluation-job) or [Create Reinforcement Fine-tuning Job](/api-reference/create-reinforcement-fine-tuning-job). ## OpenAPI ````yaml post /v1/accounts/{account_id}/evaluatorsV2 openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluatorsV2: post: tags: - Gateway summary: Create Evaluator description: >- Creates a custom evaluator for scoring model outputs. Evaluators use the [Eval Protocol](https://evalprotocol.io) to define test cases, run model inference, and score responses. They are used with evaluation jobs and Reinforcement Fine-Tuning (RFT). ## Source Code Requirements Your project should contain: - `requirements.txt` - Python dependencies for your evaluator - `test_*.py` - Pytest test file(s) with [`@evaluation_test`](https://evalprotocol.io/reference/evaluation-test) decorated functions - Any additional code/modules your evaluator needs ## Workflow **Recommended:** Use the [`ep upload`](https://evalprotocol.io/reference/cli#ep-upload) CLI command to handle all these steps automatically. If using the API directly: 1. Call this endpoint to create the evaluator resource 2. Package your source directory as a `.tar.gz` (respecting `.gitignore`) 3. Call [Get Evaluator Upload Endpoint](/api-reference/get-evaluator-upload-endpoint) to get a signed upload URL 4. `PUT` the tar.gz file to the signed URL 5. Call [Validate Evaluator Upload](/api-reference/validate-evaluator-upload) to trigger server-side validation 6. Poll [Get Evaluator](/api-reference/get-evaluator) until ready Once active, reference the evaluator in [Create Evaluation Job](/api-reference/create-evaluation-job) or [Create Reinforcement Fine-tuning Job](/api-reference/create-reinforcement-fine-tuning-job). 
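# Illustrative sketch (not part of the generated spec): the first two steps of the manual
# workflow described above, using curl and tar. The account ID, evaluator ID, and paths are
# placeholders; the remaining steps (signed upload URL, PUT of the archive, validation, and
# polling) use the endpoints linked in the description and are omitted here.
#
#   # 1. Create the evaluator resource ("evaluator" is the required body field)
#   curl -X POST "https://api.fireworks.ai/v1/accounts/my-account/evaluatorsV2" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{"evaluatorId": "my-evaluator", "evaluator": {"displayName": "My Evaluator"}}'
#
#   # 2. Package the evaluator source directory as a .tar.gz (exclude anything covered by
#   #    .gitignore yourself, or use the recommended `ep upload` CLI which handles this)
#   tar -czf my-evaluator.tar.gz -C ./my-evaluator-project .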
operationId: Gateway_CreateEvaluatorV2 parameters: - name: account_id in: path required: true description: The Account Id schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/GatewayCreateEvaluatorV2Body' required: true responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayEvaluator' components: schemas: GatewayCreateEvaluatorV2Body: type: object properties: evaluator: $ref: '#/components/schemas/gatewayEvaluator' evaluatorId: type: string required: - evaluator gatewayEvaluator: type: object properties: name: type: string readOnly: true displayName: type: string description: type: string createTime: type: string format: date-time readOnly: true createdBy: type: string readOnly: true updateTime: type: string format: date-time readOnly: true state: $ref: '#/components/schemas/gatewayEvaluatorState' readOnly: true requirements: type: string title: Content for the requirements.txt for package installation entryPoint: type: string title: >- entry point of the evaluator inside the codebase. In module::function or path::function format status: $ref: '#/components/schemas/gatewayStatus' title: Status of the evaluator, used to expose build status to the user readOnly: true commitHash: type: string title: Commit hash of this evaluator from the user's original codebase source: $ref: '#/components/schemas/gatewayEvaluatorSource' description: Source information for the evaluator codebase. defaultDataset: type: string title: Default dataset that is associated with the evaluator title: 'Next ID: 17' gatewayEvaluatorState: type: string enum: - STATE_UNSPECIFIED - ACTIVE - BUILDING - BUILD_FAILED default: STATE_UNSPECIFIED title: |- - ACTIVE: The evaluator is ready to use for evaluation - BUILDING: The evaluator is being built, i.e. building the e2b template - BUILD_FAILED: The evaluator build failed, and it cannot be used for evaluation gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayEvaluatorSource: type: object properties: type: $ref: '#/components/schemas/EvaluatorSourceType' description: Identifies how the evaluator source code is provided. githubRepositoryName: type: string description: >- Normalized GitHub repository name (e.g. owner/repository) when the source is GitHub. gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. 
`INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. 
There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] EvaluatorSourceType: type: string enum: - TYPE_UNSPECIFIED - TYPE_UPLOAD - TYPE_GITHUB - TYPE_TEMPORARY default: TYPE_UNSPECIFIED title: |- - TYPE_UPLOAD: Source code is uploaded by the user - TYPE_GITHUB: Source code is from a GitHub repository - TYPE_TEMPORARY: Source code is a temporary UI uploaded code securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-identity-provider.md # firectl create identity-provider > Creates a new identity provider. ``` firectl create identity-provider [flags] ``` ### Examples ``` # Create SAML identity provider firectl create identity-provider --display-name="Company SAML" \ --saml-metadata-url="https://company.okta.com/app/xyz/sso/saml/metadata" # Create OIDC identity provider firectl create identity-provider --display-name="Company OIDC" \ --oidc-issuer="https://auth.company.com" \ --oidc-client-id="abc123" \ --oidc-client-secret="secret456" # Create OIDC identity provider with multiple domains firectl create identity-provider --display-name="Example OIDC" \ --oidc-issuer="https://accounts.google.com" \ --oidc-client-id="client123" \ --oidc-client-secret="secret456" \ --tenant-domains="example.com,example.co.uk" ``` ### Flags ``` --display-name string The display name of the identity provider (required) -h, --help help for identity-provider --oidc-client-id string The OIDC client ID for OIDC providers --oidc-client-secret string The OIDC client secret for OIDC providers --oidc-issuer string The OIDC issuer URL for OIDC providers --saml-metadata-url string The SAML metadata URL for SAML providers --tenant-domains string Comma-separated list of allowed domains for the organization (e.g., 'example.com,example.co.uk'). If not provided, domain will be derived from account email. ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --dry-run Print the request proto without running it. 
-o, --output Output Set the output format to "text", "json", or "flag". (default text) -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-model.md # Source: https://docs.fireworks.ai/api-reference/create-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-model.md # Source: https://docs.fireworks.ai/api-reference/create-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-model.md # Source: https://docs.fireworks.ai/api-reference/create-model.md # Create Model ## OpenAPI ````yaml post /v1/accounts/{account_id}/models paths: path: /v1/accounts/{account_id}/models method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: model: allOf: - $ref: '#/components/schemas/gatewayModel' description: The properties of the Model being created. modelId: allOf: - type: string description: ID of the model. cluster: allOf: - type: string description: >- The resource name of the BYOC cluster to which this model belongs. e.g. accounts/my-account/clusters/my-cluster. Empty if it belongs to a Fireworks cluster. required: true refIdentifier: '#/components/schemas/GatewayCreateModelBody' requiredProperties: - modelId examples: example: value: model: displayName: description: kind: KIND_UNSPECIFIED githubUrl: huggingFaceUrl: baseModelDetails: worldSize: 123 checkpointFormat: CHECKPOINT_FORMAT_UNSPECIFIED parameterCount: moe: true tunable: true modelType: supportsFireattention: true supportsMtp: true peftDetails: baseModel: r: 123 targetModules: - mergeAddonModelName: teftDetails: {} public: true conversationConfig: style: system: template: contextLength: 123 supportsImageInput: true supportsTools: true defaultDraftModel: defaultDraftTokenCount: 123 deprecationDate: year: 123 month: 123 day: 123 supportsLora: true useHfApplyChatTemplate: true trainingContextLength: 123 snapshotType: FULL_SNAPSHOT modelId: cluster: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - &ref_0 type: string title: >- The resource name of the model. e.g. accounts/my-account/models/my-model readOnly: true displayName: allOf: - &ref_1 type: string description: |- Human-readable display name of the model. e.g. "My Model" Must be fewer than 64 characters long. description: allOf: - &ref_2 type: string description: >- The description of the model. Must be fewer than 1000 characters long. createTime: allOf: - &ref_3 type: string format: date-time description: The creation time of the model. readOnly: true state: allOf: - &ref_4 $ref: '#/components/schemas/gatewayModelState' description: The state of the model. readOnly: true status: allOf: - &ref_5 $ref: '#/components/schemas/gatewayStatus' description: >- Contains detailed message when the last model operation fails. readOnly: true kind: allOf: - &ref_6 $ref: '#/components/schemas/gatewayModelKind' description: |- The kind of model. If not specified, the default is HF_PEFT_ADDON. githubUrl: allOf: - &ref_7 type: string description: The URL to GitHub repository of the model. 
huggingFaceUrl: allOf: - &ref_8 type: string description: The URL to the Hugging Face model. baseModelDetails: allOf: - &ref_9 $ref: '#/components/schemas/gatewayBaseModelDetails' description: >- Base model details. Required if kind is HF_BASE_MODEL. Must not be set otherwise. peftDetails: allOf: - &ref_10 $ref: '#/components/schemas/gatewayPEFTDetails' description: |- PEFT addon details. Required if kind is HF_PEFT_ADDON or HF_TEFT_ADDON. teftDetails: allOf: - &ref_11 $ref: '#/components/schemas/gatewayTEFTDetails' description: >- TEFT addon details. Required if kind is HF_TEFT_ADDON. Must not be set otherwise. public: allOf: - &ref_12 type: boolean description: If true, the model will be publicly readable. conversationConfig: allOf: - &ref_13 $ref: '#/components/schemas/gatewayConversationConfig' description: >- If set, the Chat Completions API will be enabled for this model. contextLength: allOf: - &ref_14 type: integer format: int32 description: The maximum context length supported by the model. supportsImageInput: allOf: - &ref_15 type: boolean description: If set, images can be provided as input to the model. supportsTools: allOf: - &ref_16 type: boolean description: >- If set, tools (i.e. functions) can be provided as input to the model, and the model may respond with one or more tool calls. importedFrom: allOf: - &ref_17 type: string description: >- The name of the model from which this was imported. This field is empty if the model was not imported. readOnly: true fineTuningJob: allOf: - &ref_18 type: string description: >- If the model was created from a fine-tuning job, this is the fine-tuning job name. readOnly: true defaultDraftModel: allOf: - &ref_19 type: string description: >- The default draft model to use when creating a deployment. If empty, speculative decoding is disabled by default. defaultDraftTokenCount: allOf: - &ref_20 type: integer format: int32 description: >- The default draft token count to use when creating a deployment. Must be specified if default_draft_model is specified. deployedModelRefs: allOf: - &ref_21 type: array items: type: object $ref: '#/components/schemas/gatewayDeployedModelRef' description: Populated from GetModel API call only. readOnly: true cluster: allOf: - &ref_22 type: string description: >- The resource name of the BYOC cluster to which this model belongs. e.g. accounts/my-account/clusters/my-cluster. Empty if it belongs to a Fireworks cluster. readOnly: true deprecationDate: allOf: - &ref_23 $ref: '#/components/schemas/typeDate' description: >- If specified, this is the date when the serverless deployment of the model will be taken down. calibrated: allOf: - &ref_24 type: boolean description: >- If true, the model is calibrated and can be deployed to non-FP16 precisions. readOnly: true tunable: allOf: - &ref_25 type: boolean description: >- If true, the model can be fine-tuned. The value will be true if the tunable field is true, and the model is validated against the model_type field. readOnly: true supportsLora: allOf: - &ref_26 type: boolean description: Whether this model supports LoRA. useHfApplyChatTemplate: allOf: - &ref_27 type: boolean description: >- If true, the model will use the Hugging Face apply_chat_template API to apply the chat template. updateTime: allOf: - &ref_28 type: string format: date-time description: The update time for the model.
readOnly: true defaultSamplingParams: allOf: - &ref_29 type: object additionalProperties: type: number format: float description: >- A json object that contains the default sampling parameters for the model. readOnly: true rlTunable: allOf: - &ref_30 type: boolean description: If true, the model is RL tunable. readOnly: true supportedPrecisions: allOf: - &ref_31 type: array items: $ref: '#/components/schemas/DeploymentPrecision' title: Supported precisions readOnly: true supportedPrecisionsWithCalibration: allOf: - &ref_32 type: array items: $ref: '#/components/schemas/DeploymentPrecision' title: Supported precisions if calibrated readOnly: true trainingContextLength: allOf: - &ref_33 type: integer format: int32 description: The maximum context length supported by the model. snapshotType: allOf: - &ref_34 $ref: '#/components/schemas/ModelSnapshotType' title: 'Next ID: 56' refIdentifier: '#/components/schemas/gatewayModel' examples: example: value: name: displayName: description: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: kind: KIND_UNSPECIFIED githubUrl: huggingFaceUrl: baseModelDetails: worldSize: 123 checkpointFormat: CHECKPOINT_FORMAT_UNSPECIFIED parameterCount: moe: true tunable: true modelType: supportsFireattention: true defaultPrecision: PRECISION_UNSPECIFIED supportsMtp: true peftDetails: baseModel: r: 123 targetModules: - baseModelType: mergeAddonModelName: teftDetails: {} public: true conversationConfig: style: system: template: contextLength: 123 supportsImageInput: true supportsTools: true importedFrom: fineTuningJob: defaultDraftModel: defaultDraftTokenCount: 123 deployedModelRefs: - name: deployment: state: STATE_UNSPECIFIED default: true public: true cluster: deprecationDate: year: 123 month: 123 day: 123 calibrated: true tunable: true supportsLora: true useHfApplyChatTemplate: true updateTime: '2023-11-07T05:31:56Z' defaultSamplingParams: {} rlTunable: true supportedPrecisions: - PRECISION_UNSPECIFIED supportedPrecisionsWithCalibration: - PRECISION_UNSPECIFIED trainingContextLength: 123 snapshotType: FULL_SNAPSHOT description: A successful response. deprecated: false type: path components: schemas: BaseModelDetailsCheckpointFormat: type: string enum: - CHECKPOINT_FORMAT_UNSPECIFIED - NATIVE - HUGGINGFACE default: CHECKPOINT_FORMAT_UNSPECIFIED DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve ModelSnapshotType: type: string enum: - FULL_SNAPSHOT - INCREMENTAL_SNAPSHOT default: FULL_SNAPSHOT gatewayBaseModelDetails: type: object properties: worldSize: type: integer format: int32 description: |- The default number of GPUs the model is served with. If not specified, the default is 1. checkpointFormat: $ref: '#/components/schemas/BaseModelDetailsCheckpointFormat' parameterCount: type: string format: int64 description: >- The number of model parameters. For serverless models, this determines the price per token. moe: type: boolean description: >- If true, this is a Mixture of Experts (MoE) model. For serverless models, this affects the price per token. tunable: type: boolean description: If true, this model is available for fine-tuning. modelType: type: string description: The type of the model. 
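# ---------------------------------------------------------------------------
# Illustrative usage sketch (comment only, not part of the generated schema
# above): a minimal curl call against this Create Model endpoint, assuming an
# account ID of "my-account" and a PEFT addon. The model ID, display name,
# rank, and target modules below are placeholder values; peftDetails is only
# required when kind is HF_PEFT_ADDON or HF_TEFT_ADDON (see the schema).
#
#   curl -X POST "https://api.fireworks.ai/v1/accounts/my-account/models" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{
#           "modelId": "my-model",
#           "model": {
#             "kind": "HF_PEFT_ADDON",
#             "displayName": "My Model",
#             "peftDetails": {
#               "baseModel": "accounts/fireworks/models/falcon-7b",
#               "r": 8,
#               "targetModules": ["q_proj", "v_proj"]
#             }
#           }
#         }'
# ---------------------------------------------------------------------------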
supportsFireattention: type: boolean description: Whether this model supports fireattention. defaultPrecision: $ref: '#/components/schemas/DeploymentPrecision' description: Default precision of the model. readOnly: true supportsMtp: type: boolean description: If true, this model supports MTP. title: 'Next ID: 11' gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. 
(b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayConversationConfig: type: object properties: style: type: string description: The chat template to use. system: type: string description: The system prompt (if the chat style supports it). template: type: string description: The Jinja template (if style is "jinja"). required: - style gatewayDeployedModelRef: type: object properties: name: type: string title: >- The resource name. e.g. accounts/my-account/deployedModels/my-deployed-model readOnly: true deployment: type: string description: The resource name of the base deployment the model is deployed to. readOnly: true state: $ref: '#/components/schemas/gatewayDeployedModelState' description: The state of the deployed model. readOnly: true default: type: boolean description: >- If true, this is the default target when querying this model without the `#` suffix. The first deployment a model is deployed to will have this field set to true automatically. readOnly: true public: type: boolean description: If true, the deployed model will be publicly reachable. 
readOnly: true title: 'Next ID: 6' gatewayDeployedModelState: type: string enum: - STATE_UNSPECIFIED - UNDEPLOYING - DEPLOYING - DEPLOYED - UPDATING default: STATE_UNSPECIFIED description: |- - UNDEPLOYING: The model is being undeployed. - DEPLOYING: The model is being deployed. - DEPLOYED: The model is deployed and ready for inference. - UPDATING: There are updates happening with the deployed model. title: 'Next ID: 6' gatewayModel: type: object properties: name: *ref_0 displayName: *ref_1 description: *ref_2 createTime: *ref_3 state: *ref_4 status: *ref_5 kind: *ref_6 githubUrl: *ref_7 huggingFaceUrl: *ref_8 baseModelDetails: *ref_9 peftDetails: *ref_10 teftDetails: *ref_11 public: *ref_12 conversationConfig: *ref_13 contextLength: *ref_14 supportsImageInput: *ref_15 supportsTools: *ref_16 importedFrom: *ref_17 fineTuningJob: *ref_18 defaultDraftModel: *ref_19 defaultDraftTokenCount: *ref_20 deployedModelRefs: *ref_21 cluster: *ref_22 deprecationDate: *ref_23 calibrated: *ref_24 tunable: *ref_25 supportsLora: *ref_26 useHfApplyChatTemplate: *ref_27 updateTime: *ref_28 defaultSamplingParams: *ref_29 rlTunable: *ref_30 supportedPrecisions: *ref_31 supportedPrecisionsWithCalibration: *ref_32 trainingContextLength: *ref_33 snapshotType: *ref_34 title: 'Next ID: 56' gatewayModelKind: type: string enum: - KIND_UNSPECIFIED - HF_BASE_MODEL - HF_PEFT_ADDON - HF_TEFT_ADDON - FLUMINA_BASE_MODEL - FLUMINA_ADDON - DRAFT_ADDON - FIRE_AGENT - LIVE_MERGE - CUSTOM_MODEL - EMBEDDING_MODEL - SNAPSHOT_MODEL default: KIND_UNSPECIFIED description: |2- - HF_BASE_MODEL: An LLM base model. - HF_PEFT_ADDON: A parameter-efficient fine-tuned addon. - HF_TEFT_ADDON: A token-efficient fine-tuned addon. - FLUMINA_BASE_MODEL: A Flumina base model. - FLUMINA_ADDON: A Flumina addon. - DRAFT_ADDON: A draft model used for speculative decoding in a deployment. - FIRE_AGENT: A FireAgent model. - LIVE_MERGE: A live-merge model. - CUSTOM_MODEL: A customized model. - EMBEDDING_MODEL: An Embedding model. - SNAPSHOT_MODEL: A snapshot model. gatewayModelState: type: string enum: - STATE_UNSPECIFIED - UPLOADING - READY default: STATE_UNSPECIFIED description: |- - UPLOADING: The model is still being uploaded (upload is asynchronous). - READY: The model is ready to be used. title: 'Next ID: 7' gatewayPEFTDetails: type: object properties: baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b r: type: integer format: int32 description: |- The rank of the update matrices. Must be between 4 and 64, inclusive. targetModules: type: array items: type: string title: >- This is the target modules for an adapter that we extract from. For more information on what target modules mean, check out https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft baseModelType: type: string description: The type of the model. readOnly: true mergeAddonModelName: type: string title: >- The resource name of the model to merge with base model, e.g. accounts/fireworks/models/falcon-7b-lora title: |- PEFT addon details. Next ID: 6 required: - baseModel - r - targetModules gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayTEFTDetails: type: object typeDate: type: object properties: year: type: integer format: int32 description: >- Year of the date.
Must be from 1 to 9999, or 0 to specify a date without a year. month: type: integer format: int32 description: >- Month of a year. Must be from 1 to 12, or 0 to specify a year without a month and day. day: type: integer format: int32 description: >- Day of a month. Must be from 1 to 31 and valid for the year and month, or 0 to specify a year by itself or a year and month where the day isn't significant. description: >- * A full date, with non-zero year, month, and day values * A month and day value, with a zero year, such as an anniversary * A year on its own, with zero month and day values * A year and month value, with a zero day, such as a credit card expiration date Related types are [google.type.TimeOfDay][google.type.TimeOfDay] and `google.protobuf.Timestamp`. title: >- Represents a whole or partial calendar date, such as a birthday. The time of day and time zone are either specified elsewhere or are insignificant. The date is relative to the Gregorian Calendar. This can represent one of the following: ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/create-node-pool-binding.md # Create Node Pool Binding ## OpenAPI ````yaml post /v1/accounts/{account_id}/nodePoolBindings paths: path: /v1/accounts/{account_id}/nodePoolBindings method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: principal: allOf: - &ref_0 type: string description: >- The principal that is allowed to use the node pool. This must be the email address of the user. required: true refIdentifier: '#/components/schemas/gatewayNodePoolBinding' requiredProperties: &ref_1 - principal examples: example: value: principal: description: The properties of the node pool binding being created. response: '200': application/json: schemaArray: - type: object properties: accountId: allOf: - type: string description: The account ID that this binding is associated with. readOnly: true clusterId: allOf: - type: string description: The cluster ID that this binding is associated with. readOnly: true nodePoolId: allOf: - type: string description: The node pool ID that this binding is associated with. readOnly: true createTime: allOf: - type: string format: date-time description: The creation time of the node pool binding. readOnly: true principal: allOf: - *ref_0 refIdentifier: '#/components/schemas/gatewayNodePoolBinding' requiredProperties: *ref_1 examples: example: value: accountId: clusterId: nodePoolId: createTime: '2023-11-07T05:31:56Z' principal: description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/create-node-pool.md # Create Node Pool ## OpenAPI ````yaml post /v1/accounts/{account_id}/nodePools paths: path: /v1/accounts/{account_id}/nodePools method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key.
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: nodePool: allOf: - $ref: '#/components/schemas/gatewayNodePool' description: The properties of the NodePool being created. nodePoolId: allOf: - type: string title: >- The node pool ID to use in the node pool name. e.g. my-pool required: true refIdentifier: '#/components/schemas/GatewayCreateNodePoolBody' requiredProperties: - nodePool - nodePoolId examples: example: value: nodePool: displayName: minNodeCount: 123 maxNodeCount: 123 overprovisionNodeCount: 123 eksNodePool: nodeRole: instanceType: spot: true nodeGroupName: subnetIds: - zone: placementGroup: launchTemplate: fakeNodePool: machineType: numNodes: 123 serviceAccount: annotations: {} nodePoolId: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - &ref_0 type: string title: >- The resource name of the node pool. e.g. accounts/my-account/clusters/my-cluster/nodePools/my-pool readOnly: true displayName: allOf: - &ref_1 type: string description: >- Human-readable display name of the node pool. e.g. "My Node Pool" Must be fewer than 64 characters long. createTime: allOf: - &ref_2 type: string format: date-time description: The creation time of the node pool. readOnly: true minNodeCount: allOf: - &ref_3 type: integer format: int32 description: >- https://cloud.google.com/kubernetes-engine/quotas Minimum number of nodes in this node pool. Must be a non-negative integer less than or equal to max_node_count. If not specified, the default is 0. maxNodeCount: allOf: - &ref_4 type: integer format: int32 description: >- https://cloud.google.com/kubernetes-engine/quotas Maximum number of nodes in this node pool. Must be a positive integer greater than or equal to min_node_count. If not specified, the default is 1. overprovisionNodeCount: allOf: - &ref_5 type: integer format: int32 description: >- The number of nodes to overprovision by the autoscaler. Must be a non-negative integer and less than or equal to min_node_count and max_node_count-min_node_count. If not specified, the default is 0. eksNodePool: allOf: - &ref_6 $ref: '#/components/schemas/gatewayEksNodePool' fakeNodePool: allOf: - &ref_7 $ref: '#/components/schemas/gatewayFakeNodePool' annotations: allOf: - &ref_8 type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. state: allOf: - &ref_9 $ref: '#/components/schemas/gatewayNodePoolState' description: The current state of the node pool. readOnly: true status: allOf: - &ref_10 $ref: '#/components/schemas/gatewayStatus' description: >- Contains detailed message when the last node pool operation fails, e.g. when node pool is in FAILED state or when last node pool update fails. readOnly: true nodePoolStats: allOf: - &ref_11 $ref: '#/components/schemas/gatewayNodePoolStats' description: Live statistics of the node pool. readOnly: true updateTime: allOf: - &ref_12 type: string format: date-time description: The update time for the node pool. 
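# ---------------------------------------------------------------------------
# Illustrative usage sketch (comment only, not part of the generated schema
# above): a minimal curl call against this Create Node Pool endpoint, assuming
# an account ID of "my-account". The pool ID, node counts, and EKS instance
# type are placeholder values; the schema above requires nodePoolId and
# nodePool, and instanceType is required within eksNodePool when it is set.
#
#   curl -X POST "https://api.fireworks.ai/v1/accounts/my-account/nodePools" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{
#           "nodePoolId": "my-pool",
#           "nodePool": {
#             "displayName": "My Node Pool",
#             "minNodeCount": 0,
#             "maxNodeCount": 2,
#             "eksNodePool": { "instanceType": "p4d.24xlarge" }
#           }
#         }'
# ---------------------------------------------------------------------------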
readOnly: true title: 'Next ID: 16' refIdentifier: '#/components/schemas/gatewayNodePool' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' minNodeCount: 123 maxNodeCount: 123 overprovisionNodeCount: 123 eksNodePool: nodeRole: instanceType: spot: true nodeGroupName: subnetIds: - zone: placementGroup: launchTemplate: fakeNodePool: machineType: numNodes: 123 serviceAccount: annotations: {} state: STATE_UNSPECIFIED status: code: OK message: nodePoolStats: nodeCount: 123 ranksPerNode: 123 environmentCount: 123 environmentRanks: 123 batchJobCount: {} batchJobRanks: {} updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. 
HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEksNodePool: type: object properties: nodeRole: type: string description: |- If not specified, the parent cluster's system_node_group_role will be used. title: |- The IAM role ARN to associate with nodes. The role must have the following IAM policies attached: - AmazonEKSWorkerNodePolicy - AmazonEC2ContainerRegistryReadOnly - AmazonEKS_CNI_Policy instanceType: type: string description: >- The type of instance used in this node pool. See https://aws.amazon.com/ec2/instance-types/ for a list of valid instance types. spot: type: boolean title: >- If true, nodes are created as preemptible VM instances. 
See https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html#managed-node-group-capacity-types nodeGroupName: type: string description: |- The name of the node group. If not specified, the default is the node pool ID. subnetIds: type: array items: type: string description: >- A list of subnet IDs for nodes in this node pool. If not specified, the parent cluster's default subnet IDs that matches the zone will be used. Note that all the subnets will need to be in the same zone. zone: type: string description: >- Zone for the node pool. If not specified, a random zone in the cluster's region will be selected. placementGroup: type: string description: Cluster placement group to colocate hosts in this pool. launchTemplate: type: string description: Launch template to create for this node group. title: |- An Amazon Elastic Kubernetes Service node pool. Next ID: 10 required: - instanceType gatewayFakeNodePool: type: object properties: machineType: type: string numNodes: type: integer format: int32 serviceAccount: type: string description: A fake node pool to be used with FakeCluster. gatewayNodePool: type: object properties: name: *ref_0 displayName: *ref_1 createTime: *ref_2 minNodeCount: *ref_3 maxNodeCount: *ref_4 overprovisionNodeCount: *ref_5 eksNodePool: *ref_6 fakeNodePool: *ref_7 annotations: *ref_8 state: *ref_9 status: *ref_10 nodePoolStats: *ref_11 updateTime: *ref_12 title: 'Next ID: 16' gatewayNodePoolState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED default: STATE_UNSPECIFIED description: |2- - CREATING: The cluster is still being created. - READY: The node pool is ready to be used. - DELETING: The node pool is being deleted. - FAILED: Node pool is not operational. Consult 'status' for detailed messaging. Node pool needs to be deleted and re-created. gatewayNodePoolStats: type: object properties: nodeCount: type: integer format: int32 description: The number of nodes currently available in this pool. ranksPerNode: type: integer format: int32 description: >- The number of ranks available per node. This is determined by the machine type of the nodes in this node pool. environmentCount: type: integer format: int32 description: The number of environments connected to this node pool. environmentRanks: type: integer format: int32 description: |- The number of ranks in this node pool that are currently allocated to environment connections. batchJobCount: type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of batch jobs in that state allocated to this node pool. batchJobRanks: type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of ranks allocated to batch jobs in that state in this node pool. title: 'Next ID: 7' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
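# ---------------------------------------------------------------------------
# Illustrative note (comment only, not part of the generated schema above):
# per the gatewayCode guidance earlier in this document, UNAVAILABLE (HTTP
# 503) and RESOURCE_EXHAUSTED (HTTP 429) are typically transient and can be
# retried with backoff, while FAILED_PRECONDITION (HTTP 400) should not be
# retried until the underlying state is fixed. One rough client-side sketch,
# shown against a hypothetical read of this collection (a list endpoint is
# assumed here, not documented above), relies on curl's built-in retry
# handling, which retries transient failures such as 429 and 503 with an
# increasing delay:
#
#   curl --retry 5 \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/nodePools"
#
# Retrying non-idempotent calls (such as the POST documented here) is not
# always safe; see the guidance above.
# ---------------------------------------------------------------------------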
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/create-reinforcement-fine-tuning-job.md # Create Reinforcement Fine-tuning Job ## OpenAPI ````yaml post /v1/accounts/{account_id}/reinforcementFineTuningJobs paths: path: /v1/accounts/{account_id}/reinforcementFineTuningJobs method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: reinforcementFineTuningJobId: schema: - type: string required: false description: >- ID of the reinforcement fine-tuning job, a random UUID will be generated if not specified. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string dataset: allOf: - &ref_1 type: string description: The name of the dataset used for training. evaluationDataset: allOf: - &ref_2 type: string description: The name of a separate dataset to use for evaluation. evalAutoCarveout: allOf: - &ref_3 type: boolean description: Whether to auto-carve the dataset for eval. state: allOf: - &ref_4 $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - &ref_5 $ref: '#/components/schemas/gatewayStatus' readOnly: true trainingConfig: allOf: - &ref_6 $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. evaluator: allOf: - &ref_7 type: string description: >- The evaluator resource name to use for RLOR fine-tuning job. wandbConfig: allOf: - &ref_8 $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. inferenceParameters: allOf: - &ref_9 $ref: '#/components/schemas/gatewayInferenceParameters' description: BIJ parameters.
mcpServer: allOf: - &ref_10 type: string required: true title: 'Next ID: 29' refIdentifier: '#/components/schemas/gatewayReinforcementFineTuningJob' requiredProperties: &ref_11 - dataset - evaluator examples: example: value: displayName: dataset: evaluationDataset: evalAutoCarveout: true trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 evaluator: wandbConfig: enabled: true apiKey: project: entity: runId: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 mcpServer: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - *ref_0 createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time description: The completed time for the reinforcement fine-tuning job. readOnly: true dataset: allOf: - *ref_1 evaluationDataset: allOf: - *ref_2 evalAutoCarveout: allOf: - *ref_3 state: allOf: - *ref_4 status: allOf: - *ref_5 createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true trainingConfig: allOf: - *ref_6 evaluator: allOf: - *ref_7 wandbConfig: allOf: - *ref_8 outputStats: allOf: - type: string description: >- The output dataset's aggregated stats for the evaluation job. readOnly: true inferenceParameters: allOf: - *ref_9 outputMetrics: allOf: - type: string readOnly: true mcpServer: allOf: - *ref_10 title: 'Next ID: 29' refIdentifier: '#/components/schemas/gatewayReinforcementFineTuningJob' requiredProperties: *ref_11 examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: evaluationDataset: evalAutoCarveout: true state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 evaluator: wandbConfig: enabled: true apiKey: project: entity: runId: url: outputStats: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 outputMetrics: mcpServer: description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. 
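# ---------------------------------------------------------------------------
# Illustrative usage sketch (comment only, not part of the generated schema
# above): a minimal curl call against this Create Reinforcement Fine-tuning
# Job endpoint, assuming an account ID of "my-account". The dataset,
# evaluator, and base model resource names are placeholders; per the schema
# above, dataset and evaluator are required, and trainingConfig is shown
# with a few optional fields.
#
#   curl -X POST "https://api.fireworks.ai/v1/accounts/my-account/reinforcementFineTuningJobs" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{
#           "dataset": "accounts/my-account/datasets/my-dataset",
#           "evaluator": "accounts/my-account/evaluators/my-evaluator",
#           "trainingConfig": {
#             "baseModel": "accounts/fireworks/models/llama-v3p1-8b-instruct",
#             "epochs": 1,
#             "loraRank": 8
#           }
#         }'
# ---------------------------------------------------------------------------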
region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. 
HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayInferenceParameters: type: object properties: maxTokens: type: integer format: int32 description: Maximum number of tokens to generate per response. temperature: type: number format: float description: Sampling temperature, typically between 0 and 2. topP: type: number format: float description: Top-p sampling parameter, typically between 0 and 1. 'n': type: integer format: int32 description: Number of response candidates to generate per input. extraBody: type: string description: |- Additional parameters for the inference request as a JSON string. For example: "{\"stop\": [\"\\n\"]}". 
topK: type: integer format: int32 description: >- Top-k sampling parameter, limits the token selection to the top k tokens. description: Parameters for the inference requests. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/api-reference/create-reinforcement-fine-tuning-step.md # Create Reinforcement Fine-tuning Step ## OpenAPI ````yaml post /v1/accounts/{account_id}/rlorTrainerJobs paths: path: /v1/accounts/{account_id}/rlorTrainerJobs method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: rlorTrainerJobId: schema: - type: string required: false description: >- ID of the RLOR trainer job, a random UUID will be generated if not specified. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string dataset: allOf: - &ref_1 type: string description: The name of the dataset used for training. evaluationDataset: allOf: - &ref_2 type: string description: The name of a separate dataset to use for evaluation. evalAutoCarveout: allOf: - &ref_3 type: boolean description: Whether to auto-carve the dataset for eval. state: allOf: - &ref_4 $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - &ref_5 $ref: '#/components/schemas/gatewayStatus' readOnly: true trainingConfig: allOf: - &ref_6 $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. rewardWeights: allOf: - &ref_7 type: array items: type: string description: >- A list of reward metrics to use for training in format of "=". wandbConfig: allOf: - &ref_8 $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. required: true title: 'Next ID: 18' refIdentifier: '#/components/schemas/gatewayRlorTrainerJob' examples: example: value: displayName: dataset: evaluationDataset: evalAutoCarveout: true trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 rewardWeights: - wandbConfig: enabled: true apiKey: project: entity: runId: response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - *ref_0 createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time readOnly: true dataset: allOf: - *ref_1 evaluationDataset: allOf: - *ref_2 evalAutoCarveout: allOf: - *ref_3 state: allOf: - *ref_4 status: allOf: - *ref_5 createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true trainingConfig: allOf: - *ref_6 rewardWeights: allOf: - *ref_7 wandbConfig: allOf: - *ref_8 title: 'Next ID: 18' refIdentifier: '#/components/schemas/gatewayRlorTrainerJob' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: evaluationDataset: evalAutoCarveout: true state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 rewardWeights: - wandbConfig: enabled: true apiKey: project: entity: runId: url: description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. 
warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). 
`PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. 
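# --- Illustrative request body for the RLOR trainer job endpoint defined above ---
# Sketch only, not part of the schema. Field names come from gatewayRlorTrainerJob;
# every ID and value below is a placeholder, and the "<metric>=<weight>" syntax for
# rewardWeights is an assumption based on the (truncated) field description.
# {
#   "displayName": "my-rlor-run",
#   "dataset": "accounts/<ACCOUNT_ID>/datasets/my-dataset",
#   "evalAutoCarveout": true,
#   "trainingConfig": {
#     "baseModel": "accounts/fireworks/models/<BASE_MODEL_ID>",
#     "outputModel": "my-rlor-tuned-model",
#     "epochs": 1,
#     "learningRate": 0.0001,
#     "loraRank": 8
#   },
#   "rewardWeights": ["accuracy=1.0"],
#   "wandbConfig": { "enabled": false }
# }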
```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-secret.md # Source: https://docs.fireworks.ai/api-reference/create-secret.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-secret.md # Source: https://docs.fireworks.ai/api-reference/create-secret.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-secret.md # Source: https://docs.fireworks.ai/api-reference/create-secret.md # null ## OpenAPI ````yaml post /v1/accounts/{account_id}/secrets paths: path: /v1/accounts/{account_id}/secrets method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: name: allOf: - &ref_0 type: string title: |- name follows the convention accounts/account-id/secrets/unkey-key-id keyName: allOf: - &ref_1 type: string title: >- name of the key. In this case, it can be WOLFRAM_ALPHA_API_KEY value: allOf: - &ref_2 type: string example: sk-1234567890abcdef description: >- The secret value. This field is INPUT_ONLY and will not be returned in GET or LIST responses for security reasons. The value is only accepted when creating or updating secrets. required: true refIdentifier: '#/components/schemas/gatewaySecret' requiredProperties: &ref_3 - name - keyName examples: example: value: name: keyName: value: sk-1234567890abcdef response: '200': application/json: schemaArray: - type: object properties: name: allOf: - *ref_0 keyName: allOf: - *ref_1 value: allOf: - *ref_2 refIdentifier: '#/components/schemas/gatewaySecret' requiredProperties: *ref_3 examples: example: value: name: keyName: value: sk-1234567890abcdef description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/create-snapshot.md # Create Snapshot ## OpenAPI ````yaml post /v1/accounts/{account_id}/snapshots paths: path: /v1/accounts/{account_id}/snapshots method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: state: allOf: - &ref_0 $ref: '#/components/schemas/gatewaySnapshotState' description: The state of the snapshot. readOnly: true status: allOf: - &ref_1 $ref: '#/components/schemas/gatewayStatus' description: The status code and message of the snapshot. readOnly: true required: true title: 'Next ID: 7' refIdentifier: '#/components/schemas/gatewaySnapshot' examples: example: value: {} description: The properties of the snapshot being created. response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the snapshot. e.g. accounts/my-account/clusters/my-cluster/environments/my-env/snapshots/1 readOnly: true createTime: allOf: - type: string format: date-time description: The creation time of the snapshot. 
readOnly: true state: allOf: - *ref_0 status: allOf: - *ref_1 imageRef: allOf: - type: string description: The URI of the container image for this snapshot. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the snapshot. readOnly: true title: 'Next ID: 7' refIdentifier: '#/components/schemas/gatewaySnapshot' examples: example: value: name: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: imageRef: updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. 
For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewaySnapshotState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - FAILED - DELETING default: STATE_UNSPECIFIED gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/create-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/create-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/create-supervised-fine-tuning-job.md # Create Supervised Fine-tuning Job ## OpenAPI ````yaml post /v1/accounts/{account_id}/supervisedFineTuningJobs paths: path: /v1/accounts/{account_id}/supervisedFineTuningJobs method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: supervisedFineTuningJobId: schema: - type: string required: false description: >- ID of the supervised fine-tuning job, a random UUID will be generated if not specified. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string dataset: allOf: - &ref_1 type: string description: The name of the dataset used for training. state: allOf: - &ref_2 $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - &ref_3 $ref: '#/components/schemas/gatewayStatus' readOnly: true outputModel: allOf: - &ref_4 type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: allOf: - &ref_5 type: string description: >- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: allOf: - &ref_6 type: string description: >- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: allOf: - &ref_7 type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration earlyStop: allOf: - &ref_8 type: boolean description: >- Whether to stop training early if the validation loss does not improve. epochs: allOf: - &ref_9 type: integer format: int32 description: The number of epochs to train for. learningRate: allOf: - &ref_10 type: number format: float description: The learning rate used for training. maxContextLength: allOf: - &ref_11 type: integer format: int32 description: The maximum context length to use with the model. loraRank: allOf: - &ref_12 type: integer format: int32 description: The rank of the LoRA layers. wandbConfig: allOf: - &ref_13 $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. evaluationDataset: allOf: - &ref_14 type: string description: The name of a separate dataset to use for evaluation. isTurbo: allOf: - &ref_15 type: boolean description: Whether to run the fine-tuning job in turbo mode. evalAutoCarveout: allOf: - &ref_16 type: boolean description: Whether to auto-carve the dataset for eval. 
region: allOf: - &ref_17 $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. nodes: allOf: - &ref_18 type: integer format: int32 description: The number of nodes to use for the fine-tuning job. batchSize: allOf: - &ref_19 type: integer format: int32 title: The batch size for sequence packing in training mtpEnabled: allOf: - &ref_20 type: boolean title: Whether to enable MTP (Model-Token-Prediction) mode mtpNumDraftTokens: allOf: - &ref_21 type: integer format: int32 title: Number of draft tokens to use in MTP mode mtpFreezeBaseModel: allOf: - &ref_22 type: boolean title: >- Whether to freeze the base model parameters during MTP training hiddenStatesGenConfig: allOf: - &ref_23 $ref: '#/components/schemas/gatewayHiddenStatesGenConfig' description: >- Config for generating dataset with hidden states for training. metricsFileSignedUrl: allOf: - &ref_24 type: string title: The signed URL for the metrics file gradientAccumulationSteps: allOf: - &ref_25 type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: allOf: - &ref_26 type: integer format: int32 title: Number of steps for learning rate warm up required: true title: 'Next ID: 42' refIdentifier: '#/components/schemas/gatewaySupervisedFineTuningJob' requiredProperties: &ref_27 - dataset examples: example: value: displayName: dataset: outputModel: baseModel: warmStartFrom: jinjaTemplate: earlyStop: true epochs: 123 learningRate: 123 maxContextLength: 123 loraRank: 123 wandbConfig: enabled: true apiKey: project: entity: runId: evaluationDataset: isTurbo: true evalAutoCarveout: true region: REGION_UNSPECIFIED nodes: 123 batchSize: 123 mtpEnabled: true mtpNumDraftTokens: 123 mtpFreezeBaseModel: true hiddenStatesGenConfig: deployedModel: maxWorkers: 123 maxTokens: 123 inputOffset: 123 inputLimit: 123 maxContextLen: 123 regenerateAssistant: true outputActivations: true apiKey: metricsFileSignedUrl: gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - *ref_0 createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time readOnly: true dataset: allOf: - *ref_1 state: allOf: - *ref_2 status: allOf: - *ref_3 createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true outputModel: allOf: - *ref_4 baseModel: allOf: - *ref_5 warmStartFrom: allOf: - *ref_6 jinjaTemplate: allOf: - *ref_7 earlyStop: allOf: - *ref_8 epochs: allOf: - *ref_9 learningRate: allOf: - *ref_10 maxContextLength: allOf: - *ref_11 loraRank: allOf: - *ref_12 wandbConfig: allOf: - *ref_13 evaluationDataset: allOf: - *ref_14 isTurbo: allOf: - *ref_15 evalAutoCarveout: allOf: - *ref_16 region: allOf: - *ref_17 updateTime: allOf: - type: string format: date-time description: The update time for the supervised fine-tuning job. 
readOnly: true nodes: allOf: - *ref_18 batchSize: allOf: - *ref_19 mtpEnabled: allOf: - *ref_20 mtpNumDraftTokens: allOf: - *ref_21 mtpFreezeBaseModel: allOf: - *ref_22 hiddenStatesGenConfig: allOf: - *ref_23 metricsFileSignedUrl: allOf: - *ref_24 gradientAccumulationSteps: allOf: - *ref_25 learningRateWarmupSteps: allOf: - *ref_26 title: 'Next ID: 42' refIdentifier: '#/components/schemas/gatewaySupervisedFineTuningJob' requiredProperties: *ref_27 examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: outputModel: baseModel: warmStartFrom: jinjaTemplate: earlyStop: true epochs: 123 learningRate: 123 maxContextLength: 123 loraRank: 123 wandbConfig: enabled: true apiKey: project: entity: runId: url: evaluationDataset: isTurbo: true evalAutoCarveout: true region: REGION_UNSPECIFIED updateTime: '2023-11-07T05:31:56Z' nodes: 123 batchSize: 123 mtpEnabled: true mtpNumDraftTokens: 123 mtpFreezeBaseModel: true hiddenStatesGenConfig: deployedModel: maxWorkers: 123 maxTokens: 123 inputOffset: 123 inputLimit: 123 maxContextLen: 123 regenerateAssistant: true outputActivations: true apiKey: metricsFileSignedUrl: gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. 
`PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayHiddenStatesGenConfig: type: object properties: deployedModel: type: string maxWorkers: type: integer format: int32 maxTokens: type: integer format: int32 inputOffset: type: integer format: int32 inputLimit: type: integer format: int32 maxContextLen: type: integer format: int32 regenerateAssistant: type: boolean outputActivations: type: boolean apiKey: type: string description: >- Config for generating dataset with hidden states for SFTJ or eagle training. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. 
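# --- Illustrative request for this endpoint (sketch only, not part of the schema) ---
# POST https://api.fireworks.ai/v1/accounts/<ACCOUNT_ID>/supervisedFineTuningJobs
# Only "dataset" is required; set exactly one of "baseModel" or "warmStartFrom".
# All IDs and values below are placeholders.
# {
#   "displayName": "my-sft-run",
#   "dataset": "accounts/<ACCOUNT_ID>/datasets/my-dataset",
#   "baseModel": "accounts/fireworks/models/<BASE_MODEL_ID>",
#   "outputModel": "my-finetuned-model",
#   "epochs": 1,
#   "learningRate": 0.0001,
#   "loraRank": 8,
#   "evaluationDataset": "accounts/<ACCOUNT_ID>/datasets/my-eval-dataset"
# }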
```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-user.md # Source: https://docs.fireworks.ai/api-reference/create-user.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-user.md # Source: https://docs.fireworks.ai/api-reference/create-user.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/create-user.md # Source: https://docs.fireworks.ai/api-reference/create-user.md # Create User ## OpenAPI ````yaml post /v1/accounts/{account_id}/users paths: path: /v1/accounts/{account_id}/users method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: userId: schema: - type: string required: false description: |- The user ID to use in the user name. e.g. my-user If not specified, a default ID is generated from user.email. header: {} cookie: {} body: application/json: schemaArray: - type: object properties: displayName: allOf: - &ref_0 type: string description: |- Human-readable display name of the user. e.g. "Alice" Must be fewer than 64 characters long. serviceAccount: allOf: - &ref_1 type: boolean title: >- Whether this user is a service account (can only be set by admins) role: allOf: - &ref_2 type: string description: The user's role, e.g. admin or user. email: allOf: - &ref_3 type: string description: The user's email address. state: allOf: - &ref_4 $ref: '#/components/schemas/gatewayUserState' description: The state of the user. readOnly: true status: allOf: - &ref_5 $ref: '#/components/schemas/gatewayStatus' description: Contains information about the user status. readOnly: true required: true title: 'Next ID: 13' refIdentifier: '#/components/schemas/gatewayUser' requiredProperties: &ref_6 - role examples: example: value: displayName: serviceAccount: true role: email: description: The properties of the user being created. response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the user. e.g. accounts/my-account/users/my-user readOnly: true displayName: allOf: - *ref_0 serviceAccount: allOf: - *ref_1 createTime: allOf: - type: string format: date-time description: The creation time of the user. readOnly: true role: allOf: - *ref_2 email: allOf: - *ref_3 state: allOf: - *ref_4 status: allOf: - *ref_5 updateTime: allOf: - type: string format: date-time description: The update time for the user. readOnly: true title: 'Next ID: 13' refIdentifier: '#/components/schemas/gatewayUser' requiredProperties: *ref_6 examples: example: value: name: displayName: serviceAccount: true createTime: '2023-11-07T05:31:56Z' role: email: state: STATE_UNSPECIFIED status: code: OK message: updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. 
HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. 
HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayUserState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - UPDATING - DELETING default: STATE_UNSPECIFIED ```` --- # Source: https://docs.fireworks.ai/api-reference/creates-an-embedding-vector-representing-the-input-text.md # Create embeddings ## OpenAPI ````yaml post /embeddings paths: path: /embeddings method: post servers: - url: https://api.fireworks.ai/inference/v1/ request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer cookie: {} parameters: path: {} query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: input: allOf: - description: > Input text to embed, encoded as a string. To embed multiple inputs in a single request, pass an array of strings. You can pass structured object(s) to use along with the prompt_template. The input must not exceed the max input tokens for the model (8192 tokens for `nomic-ai/nomic-embed-text-v1.5`), cannot be an empty string, and any array must be 2048 dimensions or less. example: The quick brown fox jumped over the lazy dog oneOf: - type: string title: string description: The string that will be turned into an embedding. default: '' example: This is a test. - type: array title: array of strings description: >- The array of strings that will be turned into an embedding. minItems: 1 maxItems: 2048 items: type: string default: '' example: '[''This is a test.'', ''This is another test.'']' - type: object title: structured data description: >- Structured data to use while forming the input string using the prompt template. 
example: text: Hello world metadata: id: 1 source: user_input - type: array title: array of objects description: >- Array of structured data to use while forming the input strings using the prompt template. items: type: object example: - text: First document metadata: id: 1 source: user_input - text: Second document metadata: id: 2 source: user_input x-oaiExpandable: true model: allOf: - description: The model to use for generating embeddings. example: nomic-ai/nomic-embed-text-v1.5 type: string x-oaiTypeLabel: string prompt_template: allOf: - description: > Template string for processing input data before embedding. When provided, fields from the input object are substituted using [Jinja2](https://jinja.palletsprojects.com/en/stable/). For example, simple substitution is done using `{field_name}` syntax. The resulting string(s) are then embedded. For array inputs, each object generates a separate string. Additionally, we expose `truncate_tokens(string)` function to the template that allows to truncate the string based on token lengths instead of characters type: string example: 'Embed this text: {text}' dimensions: allOf: - description: > The number of dimensions the resulting output embeddings should have. Only supported in `nomic-ai/nomic-embed-text-v1.5` and later models. type: integer minimum: 1 example: 768 return_logits: allOf: - description: > If provided, returns raw model logits (pre-softmax scores) for specified token or class indices. If an empty list is provided, returns logits for all available tokens/classes. Otherwise, only the specified indices are returned. When used with normalize=true, softmax is applied to create probability distributions. Softmax is applied only to the selected tokens, so output probabilities will always add up to 1. type: array items: type: integer example: - 0 - 1 - 2 normalize: allOf: - description: > Controls normalization of the output. When return_logits is not provided, embeddings are L2 normalized (unit vectors). When return_logits is provided, softmax is applied to the selected logits to create probability distributions. type: boolean default: false example: false required: true refIdentifier: '#/components/schemas/CreateEmbeddingRequest' requiredProperties: - model - input additionalProperties: false examples: example: value: input: The quick brown fox jumped over the lazy dog model: nomic-ai/nomic-embed-text-v1.5 prompt_template: 'Embed this text: {text}' dimensions: 768 return_logits: - 0 - 1 - 2 normalize: false response: '200': application/json: schemaArray: - type: object properties: data: allOf: - type: array description: The list of embeddings generated by the model. items: $ref: '#/components/schemas/Embedding' model: allOf: - type: string description: The name of the model used to generate the embedding. object: allOf: - type: string description: The object type, which is always "list". enum: - list usage: allOf: - type: object description: The usage information for the request. properties: prompt_tokens: type: integer description: The number of tokens used by the prompt. total_tokens: type: integer description: The total number of tokens used by the request. 
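# --- Illustrative request for this endpoint (sketch only, not part of the schema) ---
# POST https://api.fireworks.ai/inference/v1/embeddings
# A plain-string input with the documented model; "dimensions" and "prompt_template"
# are optional. The model name, input string, and dimensions value are taken from the
# field examples above.
# {
#   "model": "nomic-ai/nomic-embed-text-v1.5",
#   "input": "The quick brown fox jumped over the lazy dog",
#   "dimensions": 768
# }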
required: - prompt_tokens - total_tokens refIdentifier: '#/components/schemas/CreateEmbeddingResponse' requiredProperties: - object - model - data - usage examples: example: value: data: - index: 123 embedding: - 123 object: embedding model: object: list usage: prompt_tokens: 123 total_tokens: 123 description: OK deprecated: false type: path components: schemas: Embedding: type: object description: | Represents an embedding vector returned by embedding endpoint. properties: index: type: integer description: The index of the embedding in the list of embeddings. embedding: type: array description: > The embedding vector, which is a list of floats. The length of vector depends on the model as listed in the [embedding guide](/guides/querying-embedding-models). items: type: number object: type: string description: The object type, which is always "embedding". enum: - embedding required: - index - object - embedding x-oaiMeta: name: The embedding object example: | { "object": "embedding", "embedding": [ 0.0023064255, -0.009327292, .... (1536 floats total for ada-002) -0.0028842222, ], "index": 0 } ```` --- # Source: https://docs.fireworks.ai/guides/security_compliance/data_handling.md # Zero Data Retention > Data retention policies at Fireworks ## Zero data retention Fireworks has Zero Data Retention by default. Specifically, this means * Fireworks does not log or store prompt or generation data for any open models, without explicit user opt-in. * More technically: prompt and generation data exist only in volatile memory for the duration of the request. If [prompt caching](https://docs.fireworks.ai/guides/prompt-caching#data-privacy) is active, some prompt data (and associated KV caches) can be stored in volatile memory for several minutes. In either case, prompt and generation data are not logged into any persistent storage. * Fireworks logs metadata (e.g. number of tokens in a request) as required to deliver the service. * Users can explicitly opt-in to log prompt and generation data for certain advanced features (e.g. FireOptimizer). * For proprietary Fireworks models (e.g. f1, FireFunction), prompt and generation data may be logged to enable bulk analytics to improve the model. * In this case, the model description will contain an explicit message about logging. ## Response API data retention For the Response API specifically, Fireworks retains conversation data with the following policy when the API request has `store=True` (the default): * **What is stored**: Conversation messages include the complete conversation data: * User prompts * Model responses * Tools called by the model * **Opt-out option**: You can disable data storage by setting `store=False` in your API requests to prevent any conversation data from being retained. * **Retention period**: All stored conversation data is automatically deleted after 30 days. * **Immediate deletion**: You can immediately delete stored conversation data using the DELETE API endpoint by providing the `response_id`. This will permanently remove the record. This retention policy is designed to be consistent with the OpenAI API while providing users control over their data storage preferences. The Response API retention policy only applies to conversation data when using the Response API endpoints. All other Fireworks services follow the zero data retention policy described above. 
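The storage opt-out and immediate deletion described above map to two simple API calls. The sketch below is illustrative only: it assumes the Response API is served from Fireworks' OpenAI-compatible base URL and exposes `POST /responses` and `DELETE /responses/{response_id}`, and it uses placeholder model and response IDs; consult the Response API reference for the exact endpoints.

```python
# Minimal sketch of the two retention controls described above. Assumptions
# (not confirmed on this page): the Response API is served from the
# OpenAI-compatible base URL below, responses are created with POST /responses,
# and stored responses are deleted with DELETE /responses/{response_id}.
import os

import requests

BASE_URL = "https://api.fireworks.ai/inference/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"}

# 1) Opt out of storage up front: with store=False no conversation data is retained.
create = requests.post(
    f"{BASE_URL}/responses",
    headers=HEADERS,
    json={
        "model": "accounts/fireworks/models/<MODEL_ID>",  # placeholder model name
        "input": "Summarize our retention policy in one sentence.",
        "store": False,  # opt out of the default 30-day conversation storage
    },
)
create.raise_for_status()

# 2) For a conversation that *was* stored (store=True is the default), delete it
#    immediately by its response ID instead of waiting for the 30-day expiry.
stored_response_id = "<RESPONSE_ID>"  # placeholder ID of a previously stored response
delete = requests.delete(f"{BASE_URL}/responses/{stored_response_id}", headers=HEADERS)
delete.raise_for_status()
```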
--- # Source: https://docs.fireworks.ai/guides/security_compliance/data_security.md # Data Security > How we secure and handle your data for inference and training At Fireworks, protecting customer data is at the core of our platform. We design all of our systems, infrastructure, and business processes to ensure customer trust through verifiable security & compliance. This page provides an overview of our key security measures. For documentation and audit reports, see our [Trust Center](https://trust.fireworks.ai/). ## Zero Data Retention Fireworks does not log or store prompt or generation data for open models, without explicit user opt-in. See our [Zero Data Retention Policy](https://docs.fireworks.ai/guides/security_compliance/data_handling). ## Secure Data Handling **Data Ownership & Control:** Customers maintain ownership of their data. Customer data stored as part of an active workflow can be permanently deleted with auditable confirmation, and secure wipe processes ensure deleted assets cannot be reconstructed. **Encryption**: Data is encrypted in transit (TLS 1.2+) and at rest (AES-256). **Bring Your Own Bucket:** Customers may integrate their own cloud storage to retain governance and apply their own compliance frameworks. * Datasets: [GCS Bucket Integration](/fine-tuning/secure-fine-tuning#gcs-bucket-integration) (AWS S3 coming soon) * Models: [External AWS S3 Bucket Integration](/models/uploading-custom-models#uploading-your-model) * (Coming soon) Encryption Keys: Customers may choose to use their own encryption keys and policies for end-to-end control. **Access Logging:** All customer data access is logged, monitored, and protected against tampering. See [Audit & Access Logs](https://docs.fireworks.ai/guides/security_compliance/audit_logs). ## Workload Isolation Dedicated workloads run in logically isolated environments, preventing cross-customer access or data leakage. ## Secure Training Fireworks enables secure model training, including fine-tuning and reinforcement learning, while maintaining customer control over sensitive components and data. This approach builds on our [Zero Data Retention](#zero-data-retention) policy to ensure sensitive training data never persists on our platform. **Customer-Controlled Architecture:** For advanced training workflows like reinforcement learning, critical components remain under customer control: * Reward models and reward functions are kept proprietary and not shared * Rollout servers and training metrics are built and managed by customers * Model checkpoints are managed through secure cloud storage registries **Minimal Data Sharing:** Training data is shared via controlled bucket access with minimal sharing and step-wise retention, limiting data exposure while enabling effective training workflows. **API-Based Integration:** Customers leverage Fireworks' training APIs while maintaining full control over sensitive components, ensuring no cross-component data leakage. For detailed guidance on secure reinforcement fine-tuning and using your own cloud storage, see [Secure Fine Tuning](/fine-tuning/secure-fine-tuning). ## Technical Safeguards * **Device Trust**: Only approved, secured devices with strong authentication can access sensitive Fireworks systems. * **Identity & Access Management**: Fine-grained access controls are enforced across all Fireworks environments, following the principle of least privilege. * **Network Security** * Private network isolation for customer workloads. 
* Firewalls and security groups prevent unauthorized inbound/outbound traffic. * DDoS protection is in place across core services. * **Monitoring & Detection**: Real-time monitoring and anomaly detection systems alert on suspicious activity * **Vulnerability Management**: Continuous scanning and patching processes keep infrastructure up to date against known threats. ## Operational Security * **Security Reviews & Testing**: Regular penetration testing validates controls. * **Incident Response**: A formal incident response plan ensures swift containment, customer notification, and remediation if an issue arises. * **Employee Access**: Only a minimal subset of Fireworks personnel have access to production systems, and all access is logged and periodically reviewed. * **Third-Party Risk Management**: Vendors and subprocessors undergo rigorous due diligence and contractual security obligations. ## Compliance & Certifications Fireworks aligns with leading industry standards to support customer compliance obligations: * **SOC 2 Type II** (certified) * **ISO 27001 / ISO 27701 / ISO 42001** (in progress) * **HIPAA Support**: Firework is HIPAA compliant and supports healthcare and life sciences organizations in leveraging our rapid inference capabilities with confidence. * **Regulatory Alignment**: Controls are mapped to GDPR, CCPA, and other international data protection frameworks Documentation and audit reports are available in our [Trust Center](https://trust.fireworks.ai/). --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-api-key.md # Source: https://docs.fireworks.ai/api-reference/delete-api-key.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-api-key.md # Source: https://docs.fireworks.ai/api-reference/delete-api-key.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-api-key.md # Source: https://docs.fireworks.ai/api-reference/delete-api-key.md # Delete API Key ## OpenAPI ````yaml post /v1/accounts/{account_id}/users/{user_id}/apiKeys:delete paths: path: /v1/accounts/{account_id}/users/{user_id}/apiKeys:delete method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id user_id: schema: - type: string required: true description: The User Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: keyId: allOf: - type: string description: The key ID for the API key. required: true refIdentifier: '#/components/schemas/GatewayDeleteApiKeyBody' requiredProperties: - keyId examples: example: value: keyId: response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-aws-iam-role-binding.md # Delete Aws Iam Role Binding ## OpenAPI ````yaml post /v1/accounts/{account_id}/awsIamRoleBindings:delete paths: path: /v1/accounts/{account_id}/awsIamRoleBindings:delete method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: principal: allOf: - type: string description: >- The principal that is allowed to assume the AWS IAM role. This must be the email address of the user. role: allOf: - type: string description: >- The AWS IAM role ARN that is allowed to be assumed by the principal. required: true title: |- The AWS IAM role binding being deleted. Must specify account_id, principal, and role. examples: example: value: principal: role: description: |- The AWS IAM role binding being deleted. Must specify account_id, principal, and role. response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/delete-batch-inference-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/delete-batch-inference-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/delete-batch-inference-job.md # Delete Batch Inference Job ## OpenAPI ````yaml delete /v1/accounts/{account_id}/batchInferenceJobs/{batch_inference_job_id} paths: path: /v1/accounts/{account_id}/batchInferenceJobs/{batch_inference_job_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id batch_inference_job_id: schema: - type: string required: true description: The Batch Inference Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-batch-job.md # Delete Batch Job ## OpenAPI ````yaml delete /v1/accounts/{account_id}/batchJobs/{batch_job_id} paths: path: /v1/accounts/{account_id}/batchJobs/{batch_job_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id batch_job_id: schema: - type: string required: true description: The Batch Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. 
deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-cluster.md # Delete Cluster ## OpenAPI ````yaml delete /v1/accounts/{account_id}/clusters/{cluster_id} paths: path: /v1/accounts/{account_id}/clusters/{cluster_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id cluster_id: schema: - type: string required: true description: The Cluster Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-dataset.md # Source: https://docs.fireworks.ai/api-reference/delete-dataset.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-dataset.md # Source: https://docs.fireworks.ai/api-reference/delete-dataset.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-dataset.md # Source: https://docs.fireworks.ai/api-reference/delete-dataset.md # Delete Dataset ## OpenAPI ````yaml delete /v1/accounts/{account_id}/datasets/{dataset_id} paths: path: /v1/accounts/{account_id}/datasets/{dataset_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dataset_id: schema: - type: string required: true description: The Dataset Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference/delete-deployed-model.md # Unload LoRA ## OpenAPI ````yaml delete /v1/accounts/{account_id}/deployedModels/{deployed_model_id} paths: path: /v1/accounts/{account_id}/deployedModels/{deployed_model_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id deployed_model_id: schema: - type: string required: true description: The Deployed Model Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. 
deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-deployment.md # Source: https://docs.fireworks.ai/api-reference/delete-deployment.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-deployment.md # Source: https://docs.fireworks.ai/api-reference/delete-deployment.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-deployment.md # Source: https://docs.fireworks.ai/api-reference/delete-deployment.md # Delete Deployment ## OpenAPI ````yaml delete /v1/accounts/{account_id}/deployments/{deployment_id} paths: path: /v1/accounts/{account_id}/deployments/{deployment_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id deployment_id: schema: - type: string required: true description: The Deployment Id query: hard: schema: - type: boolean required: false description: If true, this will perform a hard deletion. ignoreChecks: schema: - type: boolean required: false description: >- If true, this will ignore checks and force the deletion of a deployment that is currently deployed and is in use. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/delete-dpo-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/delete-dpo-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/delete-dpo-job.md # null ## OpenAPI ````yaml delete /v1/accounts/{account_id}/dpoJobs/{dpo_job_id} paths: path: /v1/accounts/{account_id}/dpoJobs/{dpo_job_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dpo_job_id: schema: - type: string required: true description: The Dpo Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-environment.md # Delete Environment ## OpenAPI ````yaml delete /v1/accounts/{account_id}/environments/{environment_id} paths: path: /v1/accounts/{account_id}/environments/{environment_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id environment_id: schema: - type: string required: true description: The Environment Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference/delete-evaluation-job.md # Delete Evaluation Job ## OpenAPI ````yaml delete /v1/accounts/{account_id}/evaluationJobs/{evaluation_job_id} openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluationJobs/{evaluation_job_id}: delete: tags: - Gateway summary: Delete Evaluation Job operationId: Gateway_DeleteEvaluationJob parameters: - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluation_job_id in: path required: true description: The Evaluation Job Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: type: object properties: {} components: securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-evaluator-revision.md # firectl delete evaluator-revision > Delete an evaluator revision ``` firectl delete evaluator-revision [flags] ``` ### Examples ``` firectl delete evaluator-revision accounts/my-account/evaluators/my-evaluator/versions/abc123 ``` ### Flags ``` -h, --help help for evaluator-revision ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/delete-evaluator.md # Delete Evaluator > Deletes an evaluator and its associated versions and build artifacts. ## OpenAPI ````yaml delete /v1/accounts/{account_id}/evaluators/{evaluator_id} openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluators/{evaluator_id}: delete: tags: - Gateway summary: Delete Evaluator description: Deletes an evaluator and its associated versions and build artifacts. operationId: Gateway_DeleteEvaluator parameters: - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluator_id in: path required: true description: The Evaluator Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: type: object properties: {} components: securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-model.md # Source: https://docs.fireworks.ai/api-reference/delete-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-model.md # Source: https://docs.fireworks.ai/api-reference/delete-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-model.md # Source: https://docs.fireworks.ai/api-reference/delete-model.md # Delete Model ## OpenAPI ````yaml delete /v1/accounts/{account_id}/models/{model_id} paths: path: /v1/accounts/{account_id}/models/{model_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id model_id: schema: - type: string required: true description: The Model Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-node-pool-binding.md # Delete Node Pool Binding ## OpenAPI ````yaml post /v1/accounts/{account_id}/nodePoolBindings:delete paths: path: /v1/accounts/{account_id}/nodePoolBindings:delete method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: principal: allOf: - type: string description: >- The principal that is allowed use the node pool. This must be the email address of the user. required: true title: |- The node pool binding being deleted. Must specify account_id, cluster_id, node_pool_id, and principal. examples: example: value: principal: description: |- The node pool binding being deleted. Must specify account_id, cluster_id, node_pool_id, and principal. response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-node-pool.md # Delete Node Pool ## OpenAPI ````yaml delete /v1/accounts/{account_id}/nodePools/{node_pool_id} paths: path: /v1/accounts/{account_id}/nodePools/{node_pool_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id node_pool_id: schema: - type: string required: true description: The Node Pool Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/delete-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/delete-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/delete-reinforcement-fine-tuning-job.md # Delete Reinforcement Fine-tuning Job ## OpenAPI ````yaml delete /v1/accounts/{account_id}/reinforcementFineTuningJobs/{reinforcement_fine_tuning_job_id} paths: path: >- /v1/accounts/{account_id}/reinforcementFineTuningJobs/{reinforcement_fine_tuning_job_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id reinforcement_fine_tuning_job_id: schema: - type: string required: true description: The Reinforcement Fine-tuning Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference/delete-reinforcement-fine-tuning-step.md # Delete Reinforcement Fine-tuning Step ## OpenAPI ````yaml delete /v1/accounts/{account_id}/rlorTrainerJobs/{rlor_trainer_job_id} paths: path: /v1/accounts/{account_id}/rlorTrainerJobs/{rlor_trainer_job_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id rlor_trainer_job_id: schema: - type: string required: true description: The Rlor Trainer Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference/delete-response.md # Delete Response > Deletes a model response by its ID. Once deleted, the response data will be gone immediately and permanently. The response cannot be recovered and any conversations that reference this response ID will no longer be able to access it. 
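For a quick illustration, here is a minimal sketch of calling this endpoint from Python with the `requests` library; the response ID is a hypothetical placeholder and the API key is read from an environment variable. The full OpenAPI spec follows below.

```python theme={null}
import os
import requests

# Placeholder values for illustration only
response_id = "resp_abc123"  # hypothetical ID of the response to delete
api_key = os.environ["FIREWORKS_API_KEY"]

resp = requests.delete(
    f"https://api.fireworks.ai/inference/v1/responses/{response_id}",
    headers={"Authorization": f"Bearer {api_key}"},
)
resp.raise_for_status()
print(resp.json()["message"])  # e.g. "Response deleted successfully"
```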
## OpenAPI ````yaml delete /v1/responses/{response_id} paths: path: /v1/responses/{response_id} method: delete servers: - url: https://api.fireworks.ai/inference request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: response_id: schema: - type: string required: true title: Response Id description: The ID of the response to delete query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: message: allOf: - type: string title: Message description: Confirmation message example: Response deleted successfully title: DeleteResponse description: Response model for deleting a response. refIdentifier: '#/components/schemas/DeleteResponse' requiredProperties: - message examples: example: value: message: Response deleted successfully description: Successful Response '422': application/json: schemaArray: - type: object properties: detail: allOf: - items: $ref: '#/components/schemas/ValidationError' type: array title: Detail title: HTTPValidationError refIdentifier: '#/components/schemas/HTTPValidationError' examples: example: value: detail: - loc: - msg: type: description: Validation Error deprecated: false type: path components: schemas: ValidationError: properties: loc: items: anyOf: - type: string - type: integer type: array title: Location msg: type: string title: Message type: type: string title: Error Type type: object required: - loc - msg - type title: ValidationError ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-secret.md # Source: https://docs.fireworks.ai/api-reference/delete-secret.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-secret.md # Source: https://docs.fireworks.ai/api-reference/delete-secret.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-secret.md # Source: https://docs.fireworks.ai/api-reference/delete-secret.md # null ## OpenAPI ````yaml delete /v1/accounts/{account_id}/secrets/{secret_id} paths: path: /v1/accounts/{account_id}/secrets/{secret_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id secret_id: schema: - type: string required: true description: The Secret Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/delete-snapshot.md # Delete Snapshot ## OpenAPI ````yaml delete /v1/accounts/{account_id}/snapshots/{snapshot_id} paths: path: /v1/accounts/{account_id}/snapshots/{snapshot_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id snapshot_id: schema: - type: string required: true description: The Snapshot Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/delete-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/delete-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/delete-supervised-fine-tuning-job.md # Delete Supervised Fine-tuning Job ## OpenAPI ````yaml delete /v1/accounts/{account_id}/supervisedFineTuningJobs/{supervised_fine_tuning_job_id} paths: path: >- /v1/accounts/{account_id}/supervisedFineTuningJobs/{supervised_fine_tuning_job_id} method: delete servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id supervised_fine_tuning_job_id: schema: - type: string required: true description: The Supervised Fine-tuning Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/delete-user.md # firectl delete user > Deletes a user. ``` firectl delete user [flags] ``` ### Examples ``` firectl delete user my-user firectl delete user accounts/my-account/users/my-user ``` ### Flags ``` -h, --help help for user ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/fine-tuning/deploying-loras.md # Deploying Fine Tuned Models > Deploy one or multiple LoRA models fine tuned on Fireworks After fine-tuning your model on Fireworks, deploy it to make it available for inference. You can also upload and deploy LoRA models fine-tuned outside of Fireworks. See [importing fine-tuned models](/models/uploading-custom-models#importing-fine-tuned-models) for details. ## Single-LoRA deployment Deploy your LoRA fine-tuned model with a single command that delivers performance matching the base model. This streamlined approach, called live merge, eliminates the previous two-step process and provides better performance compared to multi-LoRA deployments. ### Quick deployment Deploy your LoRA fine-tuned model with one simple command: ```bash theme={null} firectl create deployment "accounts/fireworks/models/" ``` Your deployment will be ready to use once it completes, with performance that matches the base model. 
### Deployment with the Build SDK You can also deploy your LoRA fine-tuned model using the Build SDK: ```python theme={null} from fireworks import LLM # Deploy a fine-tuned model with on-demand deployment (live merge) fine_tuned_llm = LLM( model="accounts/your-account/models/your-fine-tuned-model-id", deployment_type="on-demand", id="my-fine-tuned-deployment" # Simple string identifier ) # Apply the deployment to ensure it's ready fine_tuned_llm.apply() # Use the deployed model response = fine_tuned_llm.chat.completions.create( messages=[{"role": "user", "content": "Hello!"}] ) # Track deployment in web dashboard print(f"Track at: {fine_tuned_llm.deployment_url}") ``` The `id` parameter can be any simple string - it does not need to follow the format `"accounts/account_id/deployments/model_id"`. ## Multi-LoRA deployment If you have multiple fine-tuned versions of the same base model (e.g., you've fine-tuned the same model for different use cases, applications, or prototyping), you can share a single base model deployment across these LoRA models to achieve higher utilization. Multi-LoRA deployment comes with performance tradeoffs. We recommend using it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization. ### Deploy with CLI Deploy the base model with addons enabled: ```bash theme={null} firectl create deployment "accounts/fireworks/models/" --enable-addons ``` Once the deployment is ready, load your LoRA models onto the deployment: ```bash theme={null} firectl load-lora --deployment ``` You can load multiple LoRA models onto the same deployment by repeating this command with different model IDs. ### Deploy with the Build SDK You can also use multi-LoRA deployment with the Build SDK: ```python theme={null} from fireworks import LLM # Create a base model deployment with addons enabled base_model = LLM( model="accounts/fireworks/models/base-model-id", deployment_type="on-demand", id="shared-base-deployment", # Simple string identifier enable_addons=True ) base_model.apply() # Deploy multiple fine-tuned models using the same base deployment fine_tuned_model_1 = LLM( model="accounts/your-account/models/fine-tuned-model-1", deployment_type="on-demand-lora", base_id=base_model.deployment_id ) fine_tuned_model_2 = LLM( model="accounts/your-account/models/fine-tuned-model-2", deployment_type="on-demand-lora", base_id=base_model.deployment_id ) # Apply deployments fine_tuned_model_1.apply() fine_tuned_model_2.apply() # Use the deployed models response_1 = fine_tuned_model_1.chat.completions.create( messages=[{"role": "user", "content": "Hello from model 1!"}] ) response_2 = fine_tuned_model_2.chat.completions.create( messages=[{"role": "user", "content": "Hello from model 2!"}] ) ``` When using `deployment_type="on-demand-lora"`, you need to provide the `base_id` parameter that references the deployment ID of your base model deployment. ### When to use multi-LoRA deployment Use multi-LoRA deployment when you: * Need to serve multiple fine-tuned models based on the same base model * Want to maximize deployment utilization * Can accept some performance tradeoff compared to single-LoRA deployment * Are managing multiple variants or experiments of the same model ## Serverless deployment For quick experimentation and prototyping, you can deploy your fine-tuned model to shared serverless infrastructure without managing GPUs. Not all base models support serverless addons. 
Check the [list of models that support serverless with LoRA](https://app.fireworks.ai/models?filter=LLM\&serverlessWithLoRA=true) to confirm your base model is supported. ### Deploy to serverless Load your fine-tuned model into a serverless deployment: ```bash theme={null} firectl load-lora ``` ### Key considerations * **No hosting costs**: Deploying to serverless is free—you only pay per-token usage costs * **Rate limits**: Same rate limits apply as serverless base models * **Performance**: Lower performance than on-demand deployments and the base model * **Automatic unloading**: Unused addons may be automatically unloaded after a week * **Limit**: Deploy up to 100 fine-tuned models to serverless For production workloads requiring consistent performance, use [on-demand deployments](#single-lora-deployment) instead. ## Next steps Learn about deployment configuration and optimization Upload LoRA models fine-tuned outside of Fireworks --- # Source: https://docs.fireworks.ai/deployments/direct-routing.md # Direct routing > Direct routing enables enterprise users to reduce latency to their deployments. ## Internet direct routing Internet direct routing bypasses our global API load balancer and directly routes your request to the machines where your deployment is running. This can save several tens or even hundreds of milliseconds of time-to-first-token (TTFT) latency. To create a deployment using Internet direct routing: When creating a deployment with direct routing, the `--region` parameter is required to specify the deployment region. ```bash theme={null} $ firectl create deployment accounts/fireworks/models/llama-v3p1-8b-instruct \ --direct-route-type INTERNET \ --direct-route-api-keys \ --region Name: accounts/my-account/deployments/abcd1234 ... Direct Route Handle: my-account-abcd1234.us-arizona-1.direct.fireworks.ai Region: US_ARIZONA_1 ``` If you have multiple API keys, use repeated fields, such as: `--direct-route-api-keys= --direct-route-api-keys=`. These keys can be any alphanumeric string and are a distinct concept from the API keys provisioned via the Fireworks console. A key provisioned in the console but not specified in the list here will not be allowed when querying the model via direct routing. Take note of the `Direct Route Handle` to get the inference endpoint. This is what you will use to access the deployment instead of the global `https://api.fireworks.ai/inference/` endpoint. For example: ```bash theme={null} curl \ --header 'Authorization: Bearer ' \ --header 'Content-Type: application/json' \ --data '{ "model": "accounts/fireworks/models/llama-v3-8b-instruct", "prompt": "The sky is" }' \ --url https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai/v1/completions ``` ### Use the OpenAI SDK with direct routing Set the direct route handle (with the `/v1` suffix) as the `base_url` when you initialize the OpenAI SDK so your calls go straight to the regional deployment endpoint. ```python theme={null} from openai import OpenAI client = OpenAI( api_key="", base_url="https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai/v1" ) response = client.chat.completions.create( model="accounts/fireworks/models/llama-v3-8b-instruct", messages=[{"role": "user", "content": "Hello!"}] ) ``` The direct route handle replaces the standard [https://api.fireworks.ai/inference/v1](https://api.fireworks.ai/inference/v1) endpoint—keep the `/v1` suffix so the OpenAI SDK routes requests correctly while bypassing the global load balancer to reduce latency.
## Supported Regions for Direct Routing Direct routing is currently supported in the following regions: * `US_IOWA_1` * `US_VIRGINIA_1` * `US_ARIZONA_1` * `US_ILLINOIS_1` * `US_TEXAS_1` * `US_ILLINOIS_2` * `EU_FRANKFURT_1` * `US_WASHINGTON_3` * `US_WASHINGTON_1` * `AP_TOKYO_1` ## Private Service Connect (PSC) Contact your Fireworks representative to set up [GCP Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect) to your deployment. ## AWS PrivateLink Contact your Fireworks representative to set up [AWS PrivateLink](https://aws.amazon.com/privatelink/) to your deployment. --- # Source: https://docs.fireworks.ai/api-reference-dlde/disconnect-environment.md # Disconnect Environment > Disconnects the environment from the node pool. Returns an error if the environment is not connected to a node pool. ## OpenAPI ````yaml post /v1/accounts/{account_id}/environments/{environment_id}:disconnect paths: path: /v1/accounts/{account_id}/environments/{environment_id}:disconnect method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id environment_id: schema: - type: string required: true description: The Environment Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: force: allOf: - type: boolean description: >- Disconnect the environment even if snapshotting fails (e.g. due to pod failure). This flag should only be used if you are certain that the pod is gone. resetSnapshots: allOf: - type: boolean description: >- Forces snapshots to be rebuilt. This can be used when there are too many snapshot layers or when an unforeseen snapshotting logic error has occurred. required: true refIdentifier: '#/components/schemas/GatewayDisconnectEnvironmentBody' examples: example: value: force: true resetSnapshots: true response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/do-you-provide-notice-before-removing-model-availability.md # Do you provide notice before removing model availability? Yes, we provide advance notice before removing models from the serverless infrastructure: * **Minimum 2 weeks’ notice** before model removal * Longer notice periods may be provided for **popular models**, depending on usage * Higher-usage models may have extended deprecation timelines **Best Practices**: 1. Monitor announcements regularly. 2. Prepare a migration plan in advance. 3. Test alternative models to ensure continuity. 4. Keep your contact information updated for timely notifications. --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/do-you-support-auto-scaling.md # Do you support Auto Scaling? Yes, our system supports **auto scaling** with the following features: * **Scaling down to zero** capability for resource efficiency * Controllable **scale-up and scale-down velocity** * **Custom scaling rules and thresholds** to match your specific needs --- # Source: https://docs.fireworks.ai/faq-new/models-inference/does-fireworks-support-custom-base-models.md # Does Fireworks support custom base models? 
Yes, custom base models can be deployed via **firectl**. You can learn more about custom model deployment in our [guide on uploading custom models](https://docs.fireworks.ai/models/uploading-custom-models). --- # Source: https://docs.fireworks.ai/faq-new/models-inference/does-the-api-support-batching-and-load-balancing.md # Does the API support batching and load balancing? Current capabilities include: * **Load balancing**: Yes, supported out of the box * **Continuous batching**: Yes, supported * **Batch inference**: Yes, supported via the [Batch API](/guides/batch-inference) * **Streaming**: Yes, supported For asynchronous batch processing of large volumes of requests, see our [Batch API documentation](/guides/batch-inference). --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/download-billing-metrics.md # firectl download billing-metrics > Exports billing metrics ``` firectl download billing-metrics [flags] ``` ### Examples ``` firectl export billing-metrics ``` ### Flags ``` --end-time string The end time (exclusive). --filename string The file name to export to. (default "billing_metrics.csv") -h, --help help for billing-metrics --start-time string The start time (inclusive). ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/download-dataset.md # firectl download dataset > Downloads a dataset to a local directory. ``` firectl download dataset [flags] ``` ### Examples ``` # Download a single dataset firectl download dataset my-dataset --output-dir /path/to/download # Download entire lineage chain firectl download dataset my-dataset --download-lineage --output-dir /path/to/download ``` ### Flags ``` --download-lineage If true, downloads entire lineage chain (all related datasets) -h, --help help for dataset --output-dir string Directory to download dataset files to (default ".") --quiet If true, does not show download progress ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/download-dpo-job-metrics.md # firectl download dpo-job-metrics > Retrieves metrics for a dpo job. ``` firectl download dpo-job-metrics [flags] ``` ### Examples ``` firectl download dpoj-metrics my-dpo-job firectl download dpoj-metrics accounts/my-account/dpo-jobs/my-dpo-job ``` ### Flags ``` --filename string The file name to export to. (default "metrics.jsonl") -h, --help help for dpo-job-metrics ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/download-model.md # firectl download model > Download a model. ``` firectl download model [flags] ``` ### Examples ``` firectl download model my-model /path/to/checkpoint/ ``` ### Flags ``` -h, --help help for model --quiet If true, does not print the upload progress bar. 
``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/fine-tuning/dpo-fine-tuning.md # Direct Preference Optimization Direct Preference Optimization (DPO) fine-tunes models by training them on pairs of preferred and non-preferred responses to the same prompt. This teaches the model to generate more desirable outputs while reducing unwanted behaviors. **Use DPO when:** * Aligning model outputs with brand voice, tone, or style guidelines * Reducing hallucinations or incorrect reasoning patterns * Improving response quality where there's no single "correct" answer * Teaching models to follow specific formatting or structural preferences ## Fine-tuning with DPO Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example. **Minimum Requirements:** * **Minimum examples needed:** 3 * **Maximum examples:** Up to 3 million examples per dataset * **File format:** JSONL (each line is a valid JSON object) * **Dataset Schema:** Each training sample must include the following fields: * An `input` field containing a `messages` array, where each message is an object with two fields: * `role`: one of `system`, `user`, or `assistant` * `content`: a string representing the message content * A `preferred_output` field containing an assistant message with an ideal response * A `non_preferred_output` field containing an assistant message with a suboptimal response Here’s an example conversation dataset (one training example): ```json einstein_dpo.jsonl theme={null} { "input": { "messages": [ { "role": "user", "content": "What is Einstein famous for?" } ], "tools": [] }, "preferred_output": [ { "role": "assistant", "content": "Einstein is renowned for his theory of relativity, especially the equation E=mc²." } ], "non_preferred_output": [ { "role": "assistant", "content": "He was a famous scientist." } ] } ``` We currently only support one-turn conversations for each example, where the preferred and non-preferred messages need to be the last assistant message. Save this dataset as a JSONL file locally, for example `einstein_dpo.jsonl`. There are a couple of ways to upload the dataset to the Fireworks platform for fine-tuning: `firectl`, `Restful API`, `builder SDK`, or `UI`. * You can simply navigate to the dataset tab, click `Create Dataset` and follow the wizard. * Upload dataset using `firectl` ```bash theme={null} firectl create dataset /path/to/file.jsonl ``` You need to make two separate HTTP requests. One for creating the dataset entry and one for uploading the dataset. Full reference here: [Create dataset](/api-reference/create-dataset). Note that the `exampleCount` parameter needs to be provided by the client.
```jsx theme={null} // Create Dataset Entry const createDatasetPayload = { datasetId: "trader-poe-sample-data", dataset: { userUploaded: {} } // Additional params such as exampleCount }; const urlCreateDataset = `${BASE_URL}/datasets`; const response = await fetch(urlCreateDataset, { method: "POST", headers: HEADERS_WITH_CONTENT_TYPE, body: JSON.stringify(createDatasetPayload) }); ``` ```jsx theme={null} // Upload JSONL file const urlUpload = `${BASE_URL}/datasets/${DATASET_ID}:upload`; const files = new FormData(); files.append("file", localFileInput.files[0]); const uploadResponse = await fetch(urlUpload, { method: "POST", headers: HEADERS, body: files }); ``` While all of the above approaches should work, `UI` is more suitable for smaller datasets `< 500MB`, while `firectl` might work better for bigger datasets. Ensure the dataset ID conforms to the [resource id restrictions](/getting-started/concepts#resource-names-and-ids). Simply use `firectl` to create a new DPO job: ```bash theme={null} firectl create dpoj \ --base-model accounts/account-id/models/base-model-id \ --dataset accounts/my-account-id/datasets/my-dataset-id \ --output-model new-model-id ``` For our example, we might run the following command: ```bash theme={null} firectl create dpoj \ --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \ --dataset accounts/pyroworks/datasets/einstein-dpo \ --output-model einstein-dpo-model ``` to fine-tune a [Llama 3.1 8b Instruct](https://fireworks.ai/models/fireworks/llama-v3p1-8b-instruct) model with our Einstein dataset. Use `firectl` to monitor progress updates for the DPO fine-tuning job. ```bash theme={null} firectl get dpoj dpo-job-id ``` Once the job is complete, the `STATE` will be set to `JOB_STATE_COMPLETED`, and the fine-tuned model can be deployed. Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to [deploying a fine-tuned model](/fine-tuning/fine-tuning-models#deploying-a-fine-tuned-model) for more details. ## Next Steps Explore other fine-tuning methods to improve model output for different use cases. Train models on input-output examples to improve task-specific performance. Optimize models using AI feedback for complex reasoning and decision-making. Fine-tune vision-language models to understand both images and text. --- # Source: https://docs.fireworks.ai/fine-tuning/environments.md # Agent Tracing > Understand where your agent runs and how tracing enables reinforcement fine-tuning ## Why agent tracing is critical to doing RL Reinforcement learning for agents depends on the entire chain of actions, tool calls, state transitions, and intermediate decisions—not just the final answer. Tracing captures this full trajectory so you can compute reliable rewards, reproduce behavior, and iterate quickly. **Why it matters** * **Credit assignment**: You need a complete record of each step to attribute reward to the decisions that caused success or failure. * **Reproducibility**: Deterministic replays require the exact prompts, model parameters, tool I/O, and environment state. * **Debuggability**: You can pinpoint where an episode fails (model output, tool error, data mismatch, timeout). Use Fireworks Tracing to drive the RL loop: emit structured logs with `FireworksTracingHttpHandler`, tag them with rollout correlation metadata, and signal completion using `Status.rollout_finished()` or `Status.rollout_error()`.
When you make model calls, use the `model_base_url` issued by the trainer (it points to `https://tracing.fireworks.ai`) so chat completions are recorded as traces via an OpenAI-compatible endpoint. ## How Fireworks tracing works for RFT * **Traced completions**: The trainer provides a `model_base_url` on `https://tracing.fireworks.ai` that encodes correlation metadata. Your agent uses this OpenAI-compatible URL for LLM calls; tracing.fireworks.ai records the calls as traces automatically. * **Structured logging sink**: Your agent logs to Fireworks via `FireworksTracingHttpHandler`, including a structured `Status` when a rollout finishes or errors. * **Join traces and logs**: The trainer polls the logging sink by `rollout_id` to detect completion, then loads the full trace. Logs and traces are deterministically joined using the same correlation tags. ### Correlation metadata * **Correlate every log and trace** with these metadata fields provided in `/init`: `invocation_id`, `experiment_id`, `rollout_id`, `run_id`, `row_id`. * **Emit structured completion** from your server logs: * Add `FireworksTracingHttpHandler` and `RolloutIdFilter` to attach the `rollout_id` * Log `Status.rollout_finished()` on success, or `Status.rollout_error(message)` on failure * **Alternative**: If you run one rollout per process, set `EP_ROLLOUT_ID` in the child process instead of adding a filter. * **Record model calls as traces** by using the `model_base_url` from the trainer. It encodes the correlation IDs so your completions are automatically captured. ### tracing.fireworks.ai base URL * **Purpose-built for RL**: tracing.fireworks.ai is the Fireworks gateway used during RFT to capture traces and correlate them with rollout status. * **OpenAI-compatible**: It exposes Chat Completions-compatible endpoints, so you set it as your client's `base_url`. * **Correlation-aware**: The trainer embeds `rollout_id`, `run_id`, and related IDs into the `model_base_url` path so your completions are automatically tagged and joinable with logs. * **Drop-in usage**: Always use the `model_base_url` provided in `/init`—do not override it—so traces and logs are correctly linked. ## End-to-end tracing setup with tracing.fireworks.ai Your server implements `/init` and receives `metadata` and `model_base_url`. Attach `RolloutIdFilter` or set `EP_ROLLOUT_ID` for the current rollout. Call the model using `model_base_url` so chat completions are persisted as traces with correlation tags. Attach `FireworksTracingHttpHandler` to your logger and log `Status.rollout_finished()` or `Status.rollout_error()` when the rollout concludes. The trainer polls Fireworks logs by `rollout_id`, then loads the full traces; logs and traces share the same tags and are joined to finalize results and compute rewards. 
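As a concrete sketch of the model-call step, the trainer-issued `model_base_url` can be set as the base URL of any OpenAI-compatible client. The API key and model name below are placeholders, not values defined by this page; substitute whatever your `/init` payload and trainer configuration provide.

```python theme={null}
from openai import OpenAI

# Assumed placeholders: take the real values from the /init request and your job config.
model_base_url = "<model_base_url from /init>"  # issued by the trainer; points at tracing.fireworks.ai
client = OpenAI(base_url=model_base_url, api_key="<fireworks-api-key>")

# Because the base URL path encodes rollout/run correlation IDs, this completion
# is recorded as a trace and joined with your rollout logs automatically.
completion = client.chat.completions.create(
    model="<model-name-for-this-rollout>",
    messages=[{"role": "user", "content": "Plan the next action."}],
)
print(completion.choices[0].message.content)
```

The remote server example below shows where this call sits inside the `/init` handler.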
### Remote server minimal example ```python remote_server.py theme={null} import logging import os from eval_protocol import InitRequest, Status, FireworksTracingHttpHandler, RolloutIdFilter # Configure Fireworks logging sink once at startup logging.getLogger().addHandler(FireworksTracingHttpHandler()) @app.post("/init") def init(request: InitRequest): # Option A: add filter that injects rollout_id on every log record logger = logging.getLogger(f"eval.{request.metadata.rollout_id}") logger.addFilter(RolloutIdFilter(request.metadata.rollout_id)) # Option B: per-process correlation (use when spawning one rollout per process) # os.environ["EP_ROLLOUT_ID"] = request.metadata.rollout_id # Make model calls via the correlated base URL so completions are traced # client = YourLLMClient(base_url=request.model_base_url, api_key=request.api_key) try: # ... execute rollout steps, tool calls, etc. ... logger.info("rollout finished", extra={"status": Status.rollout_finished()}) except Exception as e: logger.error("rollout error", extra={"status": Status.rollout_error(str(e))}) ``` Under the hood, the trainer polls the logging sink for `Status` and then loads the full trace for scoring. Because both logs and traces share the same correlation tags, Fireworks can deterministically join them to finalize results and compute rewards. ### What to capture in a trace * **Inputs and context**: Task ID, dataset split, initial state, seeds, and any retrieval results provided to the agent. * **Model calls**: System/user messages, tool messages, model/version, parameters (e.g., temperature, top\_p, seed), token counts, and optional logprobs. * **Tool and API calls**: Request/response summaries, status codes, durations, retries, and sanitized payload snippets. * **Environment state transitions**: Key state before/after each action that affects reward or next-step choices. * **Rewards**: Per-step shaping rewards, terminal reward, and component breakdowns with weights and units. * **Errors and timeouts**: Exceptions, stack traces, and where they occurred in the trajectory. * **Artifacts**: Files, code, unit test results, or other outputs needed to verify correctness. Never record secrets or raw sensitive data in traces. Redact tokens, credentials, and PII. Store references (IDs, hashes) instead of full payloads whenever possible. ### How tracing powers the training loop 1. **Rollout begins**: Trainer creates a rollout and sends it to your environment (local or remote) with a unique identifier. 2. **Agent executes**: Your agent emits spans for model calls, tool calls, and state changes; your evaluator computes step and terminal rewards. 3. **Rewards aggregate**: The trainer consumes your rewards and updates the policy; traces are stored for replay and analysis. 4. **Analyze and iterate**: You filter traces by reward, failure type, latency, or cost to refine prompts, tools, or reward shaping. ### How RemoteRolloutProcessor uses Fireworks Tracing 1. **Remote server logs completion** with structured status: `Status.rollout_finished()` or `Status.rollout_error()`. 2. **Trainer polls Fireworks Tracing** by `rollout_id` until completion status is found. 3. **Status extracted** from structured fields (`code`, `message`, `details`) to finalize the rollout result. ### Best practices * **Make it deterministic**: Record seeds, versions, and any non-deterministic knobs; prefer idempotent tool calls or cached fixtures in test runs. 
* **Keep signals bounded**: Normalize rewards to a consistent range (e.g., \[0, 1]) and document your components and weights. * **Summarize, don’t dump**: Log compact summaries and references for large payloads to keep traces fast and cheap. * **Emit heartbeats**: Send periodic status updates so long-running rollouts are observable; always finalize with success or failure. * **Use consistent schemas**: Keep field names and structures stable to enable dashboards, filters, and automated diagnostics. ## Next steps Implement `/init`, tracing, and structured status for remote agents Build and deploy a local evaluator in under 10 minutes Launch your RFT job Design effective reward functions for your task --- # Source: https://docs.fireworks.ai/fine-tuning/evaluators.md # Evaluators > Understand the fundamentals of evaluators and reward functions in reinforcement fine-tuning An evaluator (also called a reward function) is code that scores model outputs from 0.0 (worst) to 1.0 (best). During reinforcement fine-tuning, your evaluator guides the model toward better responses by providing feedback on its generated outputs. ## Why evaluators matter Unlike supervised fine-tuning where you provide perfect examples, RFT uses evaluators to define what "good" means. This is powerful because: * **No perfect data required** - Just prompts and a way to score outputs * **Encourages exploration** - Models learn strategies, not just patterns * **Noise tolerant** - Even noisy signals can improve model performance * **Encodes domain expertise** - Complex rules and logic that are hard to demonstrate with examples ## Anatomy of an evaluator Every evaluator has three core components: ### 1. Input data The prompt and any ground truth data needed for evaluation: ```python theme={null} { "messages": [ {"role": "system", "content": "You are a math tutor."}, {"role": "user", "content": "What is 15 * 23?"} ], "ground_truth": "345" # Optional additional data } ``` ### 2. Model output The assistant's response to evaluate: ```python theme={null} { "role": "assistant", "content": "Let me calculate that step by step:\n15 * 23 = 345" } ``` ### 3. Scoring logic Code that compares the output to your criteria: ```python theme={null} def evaluate(model_output: str, ground_truth: str) -> float: # Extract answer from model's response predicted = extract_number(model_output) # Score it if predicted == int(ground_truth): return 1.0 # Perfect else: return 0.0 # Wrong ``` ## Types of evaluators ### Rule-based evaluators Check if outputs match specific patterns or rules: * **Exact match** - Output exactly equals expected value * **Contains** - Output includes required text * **Regex** - Output matches a pattern * **Format validation** - Output follows required structure (e.g., valid JSON) Start with rule-based evaluators. They're simple, fast, and surprisingly effective. 
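As a minimal sketch of these rule-based checks (reusing the `evaluate` signature shown earlier; the partial-credit value is illustrative), an evaluator might combine exact match, JSON format validation, and a regex fallback:

```python theme={null}
import json
import re


def evaluate(model_output: str, ground_truth: str) -> float:
    """Illustrative rule-based scorer: exact match, JSON format check, regex fallback."""
    text = model_output.strip()

    # Exact match: output exactly equals the expected value
    if text == ground_truth:
        return 1.0

    # Format check: if both sides parse as JSON, compare the parsed values
    # rather than raw strings (tolerates whitespace and key ordering)
    try:
        expected = json.loads(ground_truth)
        return 1.0 if json.loads(text) == expected else 0.0
    except ValueError:
        pass  # not JSON on one side; fall through to text checks

    # Contains / regex: accept the expected value embedded in surrounding text,
    # e.g. "The answer is 345." when ground_truth is "345"
    if re.search(rf"(?<!\w){re.escape(ground_truth)}(?!\w)", text):
        return 0.8  # right answer, but wrapped in extra text

    return 0.0
```

Keeping the fallback score below 1.0 preserves a preference for clean, exact answers while still rewarding correct content, in line with the scoring guidelines later on this page.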
### Execution-based evaluators Run code or commands to verify correctness: * **Code execution** - Run generated code and check results * **Test suites** - Pass generated code through unit tests * **API calls** - Execute commands and verify outcomes * **Simulations** - Run agents in environments and measure success ### LLM-as-judge evaluators Use another model to evaluate quality: * **Rubric scoring** - Judge outputs against criteria * **Comparative ranking** - Compare multiple outputs * **Natural language assessment** - Evaluate subjective qualities like helpfulness ## Scoring guidelines Your evaluator should return a score between 0.0 and 1.0: | Score range | Meaning | Example | | ----------- | ------- | --------------------------- | | 1.0 | Perfect | Exact correct answer | | 0.7-0.9 | Good | Right approach, minor error | | 0.4-0.6 | Partial | Some correct elements | | 0.1-0.3 | Poor | Wrong but attempted | | 0.0 | Failure | Completely wrong | Binary scoring (0.0 or 1.0) works well for many tasks. Use gradual scoring when you can meaningfully distinguish between partial successes. ## Best practices Begin with basic evaluation logic and refine over time: ```python theme={null} # Start here score = 1.0 if predicted == expected else 0.0 # Then refine if needed score = calculate_similarity(predicted, expected) ``` Start with the simplest scoring approach that captures your core requirements. You can always add sophistication later based on training results. Training generates many outputs to evaluate, so performance matters: * **Cache expensive computations**: Store results of repeated calculations * **Use timeouts for code execution**: Prevent hanging on infinite loops * **Batch API calls when possible**: Reduce network overhead * **Profile slow evaluators and optimize**: Identify and fix bottlenecks Aim for evaluations that complete in seconds, not minutes. Slow evaluators directly increase training time and cost. Models will generate unexpected outputs, so build robust error handling: ```python theme={null} try: result = execute_code(model_output) score = check_result(result) except TimeoutError: score = 0.0 # Code ran too long except SyntaxError: score = 0.0 # Invalid code except Exception as e: score = 0.0 # Any other error ``` Anticipate and gracefully handle malformed outputs, syntax errors, timeouts, and edge cases specific to your domain. Models will exploit evaluation weaknesses, so design defensively: **Example: Length exploitation** If you score outputs by length, the model might generate verbose nonsense. Add constraints: ```python theme={null} # Bad: Model learns to write long outputs score = min(len(output) / 1000, 1.0) # Better: Require correctness AND reasonable length if is_correct(output): score = 1.0 if len(output) < 500 else 0.8 else: score = 0.0 ``` **Example: Format over substance** If you only check JSON validity, the model might return valid but wrong JSON. Check content too: ```python theme={null} # Bad: Only checks format score = 1.0 if is_valid_json(output) else 0.0 # Better: Check format AND content if is_valid_json(output): data = json.loads(output) score = evaluate_content(data) else: score = 0.0 ``` Always combine format checks with content validation to prevent models from gaming the system. ## Debugging evaluators Test your evaluator before training. 
Look for: * **Correct scoring** - Good outputs score high, bad outputs score low * **Reasonable runtime** - Each evaluation completes in reasonable time * **Clear feedback** - Evaluation reasons explain scores Run your evaluator on manually created good and bad examples first. If it doesn't score them correctly, fix the evaluator before training. ## Next steps Connect to your environment for single and multi-turn agents Follow a complete example building and using an evaluator --- # Source: https://docs.fireworks.ai/api-reference/execute-reinforcement-fine-tuning-step.md # Execute one training step for keep-alive Reinforcement Fine-tuning Step ## OpenAPI ````yaml post /v1/accounts/{account_id}/rlorTrainerJobs/{rlor_trainer_job_id}:executeTrainStep openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/rlorTrainerJobs/{rlor_trainer_job_id}:executeTrainStep: post: tags: - Gateway summary: Execute one training step for keep-alive Reinforcement Fine-tuning Step operationId: Gateway_ExecuteRlorTrainStep parameters: - name: account_id in: path required: true description: The Account Id schema: type: string - name: rlor_trainer_job_id in: path required: true description: The Rlor Trainer Job Id schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/GatewayExecuteRlorTrainStepBody' required: true responses: '200': description: A successful response. content: application/json: schema: type: object properties: {} components: schemas: GatewayExecuteRlorTrainStepBody: type: object properties: dataset: type: string description: Dataset to process for this iteration. outputModel: type: string description: Output model to materialize when training completes. required: - dataset - outputModel securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/accounts/exporting-billing-metrics.md # Exporting Billing Metrics > Export billing and usage metrics for all Fireworks services ## Overview Fireworks provides a CLI tool to export comprehensive billing metrics for all usage types including serverless inference, on-demand deployments, and fine-tuning jobs. The exported data can be used for cost analysis, internal billing, and usage tracking. 
## Exporting billing metrics Use the Fireworks CLI to export a billing CSV that includes all usage: ```bash theme={null} # Authenticate (once) firectl auth login # Export billing metrics to CSV firectl export billing-metrics ``` ## Examples Export all billing metrics for an account: ```bash theme={null} firectl export billing-metrics ``` Export metrics for a specific date range and filename: ```bash theme={null} firectl export billing-metrics \ --start-time "2025-01-01" \ --end-time "2025-01-31" \ --filename january_metrics.csv ``` ## Output format The exported CSV includes the following columns: * **email**: Account email * **start\_time**: Request start timestamp * **end\_time**: Request end timestamp * **usage\_type**: Type of usage (e.g., TEXT\_COMPLETION\_INFERENCE\_USAGE) * **accelerator\_type**: GPU/hardware type used * **accelerator\_seconds**: Compute time in seconds * **base\_model\_name**: The model used * **model\_bucket**: Model category * **parameter\_count**: Model size * **prompt\_tokens**: Input tokens * **completion\_tokens**: Output tokens ### Sample row ```csv theme={null} email,start_time,end_time,usage_type,accelerator_type,accelerator_seconds,base_model_name,model_bucket,parameter_count,prompt_tokens,completion_tokens user@example.com,2025-10-20 17:16:48 UTC,2025-10-20 17:16:48 UTC,TEXT_COMPLETION_INFERENCE_USAGE,,,accounts/fireworks/models/llama4-maverick-instruct-basic,Llama 4 Maverick Basic,401583781376,803,109 ``` ## Automation You can automate exports in cron jobs and load the CSV into your internal systems: ```bash theme={null} # Example: Daily export with dated filename firectl export billing-metrics \ --start-time "$(date -v-1d '+%Y-%m-%d')" \ --end-time "$(date '+%Y-%m-%d')" \ --filename "billing_$(date '+%Y%m%d').csv" ``` Run `firectl export billing-metrics --help` to see all available flags and options. ## Coverage This export includes: * **Serverless inference**: All serverless API usage * **On-demand deployments**: Deployment usage (see also [Exporting deployment metrics](/deployments/exporting-metrics) for real-time Prometheus metrics) * **Fine-tuning jobs**: Fine-tuning compute usage * **Other services**: All billable Fireworks services For real-time monitoring of on-demand deployment performance metrics (latency, throughput, etc.), use the [Prometheus metrics endpoint](/deployments/exporting-metrics) instead. ## See also * [firectl CLI overview](/tools-sdks/firectl/firectl) * [Exporting deployment metrics](/deployments/exporting-metrics) - Real-time Prometheus metrics for on-demand deployments * [Rate Limits & Quotas](/guides/quotas_usage/rate-limits) - Understanding spend limits and quotas --- # Source: https://docs.fireworks.ai/deployments/exporting-metrics.md # Exporting Metrics > Export metrics from your dedicated deployments to your observability stack ## Overview Fireworks provides a metrics endpoint in Prometheus format, enabling integration with popular observability tools like Prometheus, OpenTelemetry (OTel) Collector, Datadog Agent, and Vector. This page covers real-time performance metrics (latency, throughput, etc.) for on-demand deployments. For billing and usage data across all Fireworks services, see [Exporting Billing Metrics](/accounts/exporting-billing-metrics). ## Setting Up Metrics Collection ### Endpoint The metrics endpoint is as follows. This URL and authorization header can be directly used by services like Grafana Cloud to ingest Fireworks metrics. 
``` https://api.fireworks.ai/v1/accounts//metrics ``` ### Authentication Use the Authorization header with your Fireworks API key: ```json theme={null} { "Authorization": "Bearer YOUR_API_KEY" } ``` ### Scrape Interval We recommend using a 1-minute scrape interval as metrics are updated every 30s. ### Rate Limits To ensure service stability and fair usage: * Maximum of 6 requests per minute per account * Exceeding this limit results in HTTP 429 (Too Many Requests) responses * Use a 1-minute scrape interval to stay within limits ## Integration Options Fireworks metrics can be integrated with various observability platforms through multiple approaches: ### OpenTelemetry Collector Integration The Fireworks metrics endpoint can be integrated with OpenTelemetry Collector by configuring a Prometheus receiver that scrapes the endpoint. This allows Fireworks metrics to be pushed to a variety of popular exporters—see the [OpenTelemetry registry](https://opentelemetry.io/ecosystem/registry/) for a full list. ### Direct Prometheus Integration To integrate directly with Prometheus, specify the Fireworks metrics endpoint in your scrape config: ```yaml theme={null} global: scrape_interval: 60s scrape_configs: - job_name: 'fireworks' metrics_path: 'v1/accounts//metrics' authorization: type: "Bearer" credentials: "YOUR_API_KEY" static_configs: - targets: ['api.fireworks.ai'] scheme: https ``` For more details on Prometheus configuration, refer to the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/). ### Supported Platforms Fireworks metrics can be exported to various observability platforms including: * Prometheus * Datadog * Grafana * New Relic ## Available Metrics ### Common Labels All metrics include the following common labels: * `base_model`: The base model identifier (e.g., "accounts/fireworks/models/deepseek-v3") * `deployment`: Full deployment path (e.g., "accounts/account-name/deployments/deployment-id") * `deployment_account`: The account name * `deployment_id`: The deployment identifier ### Rate Metrics (per second) These metrics show activity rates calculated using 1-minute windows: #### Request Rate * `request_counter_total:sum_by_deployment`: Request rate per deployment #### Error Rate * `requests_error_total:sum_by_deployment`: Error rate per deployment, broken down by HTTP status code (includes additional `http_code` label) #### Token Processing Rates * `tokens_cached_prompt_total:sum_by_deployment`: Rate of cached prompt tokens per deployment * `tokens_prompt_total:sum_by_deployment`: Rate of total prompt tokens processed per deployment ### Latency Histogram Metrics These metrics provide latency distribution data with histogram buckets, calculated using 1-minute windows: #### Generation Latency * `latency_generation_per_token_ms_bucket:sum_by_deployment`: Per-token generation time distribution * `latency_generation_queue_ms_bucket:sum_by_deployment`: Time spent waiting in generation queue #### Request Latency * `latency_overall_ms_bucket:sum_by_deployment`: End-to-end request latency distribution * `latency_to_first_token_ms_bucket:sum_by_deployment`: Time to first token distribution #### Prefill Latency * `latency_prefill_ms_bucket:sum_by_deployment`: Prefill processing time distribution * `latency_prefill_queue_ms_bucket:sum_by_deployment`: Time spent waiting in prefill queue ### Token Distribution Metrics These histogram metrics show token count distributions per request, calculated using 1-minute windows: * 
`tokens_generated_per_request_bucket:sum_by_deployment`: Distribution of generated tokens per request * `tokens_prompt_per_request_bucket:sum_by_deployment`: Distribution of prompt tokens per request ### Resource Utilization Metrics These gauge metrics show average resource usage: * `generator_kv_blocks_fraction:avg_by_deployment`: Average fraction of KV cache blocks in use * `generator_kv_slots_fraction:avg_by_deployment`: Average fraction of KV cache slots in use * `generator_model_forward_time:avg_by_deployment`: Average time spent in model forward pass * `requests_coordinator_concurrent_count:avg_by_deployment`: Average number of concurrent requests * `prefiller_prompt_cache_ttl:avg_by_deployment`: Average prompt cache time-to-live --- # Source: https://docs.fireworks.ai/fine-tuning/fine-tuning-models.md # Supervised Fine Tuning - Text This guide focuses on using supervised fine-tuning (SFT) to fine-tune a model and deploy it with on-demand or serverless hosting. ## Fine-tuning a model using SFT You can confirm that a base model is available to fine-tune by looking for the `Tunable` tag in the model library or by using: ```bash theme={null} firectl get model -a fireworks ``` And looking for `Tunable: true`. Some base models cannot be tuned on Fireworks (`Tunable: false`) but still list support for LoRA (`Supports Lora: true`). This means that users can tune a LoRA for this base model on a separate platform and upload it to Fireworks for inference. Consult [importing fine-tuned models](/models/uploading-custom-models#importing-fine-tuned-models) for more information. Datasets must be in JSONL format, where each line represents a complete JSON-formatted training example. Make sure your data conforms to the following restrictions: * **Minimum examples:** 3 * **Maximum examples:** 3 million per dataset * **File format:** `.jsonl` * **Message schema:** Each training sample must include a `messages` array, where each message is an object with the following fields: * `role`: one of `system`, `user`, or `assistant`. A message with the `system` role is optional, but if specified, it must be the first message of the conversation * `content`: a string representing the message content * `weight`: an optional key whose value must be either 0 or 1; a message with `weight` set to 0 is skipped during training * **Sample weight:** Optional key `weight` at the root of the JSON object. It can be any floating point number (positive, negative, or 0) and is used as a loss multiplier for tokens in that sample. If used, this field must be present in all samples in the dataset.
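Before uploading, it can help to sanity-check a dataset against these restrictions. Below is a minimal, illustrative validation sketch for plain conversation datasets (the `validate_dataset` helper and the `train.jsonl` filename are hypothetical, not part of the Fireworks tooling):

```python theme={null}
import json

VALID_ROLES = {"system", "user", "assistant"}


def validate_dataset(path: str) -> None:
    """Lightweight structural check for a plain-conversation JSONL dataset (illustrative only)."""
    num_examples = 0
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            example = json.loads(line)  # each line must be one complete JSON object
            messages = example.get("messages")
            assert isinstance(messages, list) and messages, f"line {line_no}: missing 'messages'"
            for i, msg in enumerate(messages):
                assert msg.get("role") in VALID_ROLES, f"line {line_no}: unexpected role {msg.get('role')!r}"
                if msg["role"] == "system":
                    assert i == 0, f"line {line_no}: system message must be first"
                # Per-message weight, when present, must be 0 or 1.
                assert msg.get("weight", 1) in (0, 1), f"line {line_no}: message weight must be 0 or 1"
            num_examples += 1
    assert 3 <= num_examples <= 3_000_000, "dataset must contain between 3 and 3 million examples"
    print(f"{path}: {num_examples} examples passed the structural checks")


validate_dataset("train.jsonl")  # hypothetical file name
```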
Here is an example conversation dataset: ```json theme={null} { "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris."} ] } { "messages": [ {"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2", "weight": 0}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4"} ] } ``` Here is an example conversation dataset with sample weights: ```json theme={null} { "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris."} ], "weight": 0.5 } { "messages": [ {"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2", "weight": 0}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4"} ], "weight": 1.0 } ``` We also support function calling datasets that include a list of tools. An example looks like: ```json theme={null} { "tools": [ { "type": "function", "function": { "name": "get_car_specs", "description": "Fetches detailed specifications for a car based on the given trim ID.", "parameters": { "trimid": { "description": "The trim ID of the car for which to retrieve specifications.", "type": "int", "default": "" } } } } ], "messages": [ { "role": "user", "content": "What is the specs of the car with trim 121?" }, { "role": "assistant", "tool_calls": [ { "type": "function", "function": { "name": "get_car_specs", "arguments": "{\"trimid\": 121}" } } ] } ] } ``` For the subset of models that support thinking (e.g. DeepSeek R1, GPT OSS models, and Qwen3 thinking models), we also support fine-tuning with thinking traces. If you wish to fine-tune with thinking traces, the dataset can include a thinking trace for each assistant turn. Traces are optional, but ideally every assistant turn includes one. For example: ```json theme={null} { "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris.", "reasoning_content": "The user is asking about the capital city of France, it should be Paris."} ] } { "messages": [ {"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2", "weight": 0, "reasoning_content": "The user is asking about the result of 1+1, the answer is 2."}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4", "reasoning_content": "The user is asking about the result of 2+2, the answer should be 4."} ] } ``` Note that when fine-tuning with intermediate thinking traces, the number of total tuned tokens could exceed the number of total tokens in the dataset. This is because we perform preprocessing and expand the dataset to ensure train-inference consistency. There are a few ways to upload the dataset to the Fireworks platform for fine-tuning: `firectl`, `Restful API`, `builder SDK`, or `UI`. * You can simply navigate to the dataset tab, click `Create Dataset`, and follow the wizard. ```bash theme={null} firectl create dataset /path/to/jsonl/file ``` You need to make two separate HTTP requests: one to create the dataset entry and one to upload the dataset file. Full reference here: [Create dataset](/api-reference/create-dataset). Note that the `exampleCount` parameter needs to be provided by the client.
```jsx theme={null} // Create Dataset Entry const createDatasetPayload = { datasetId: "trader-poe-sample-data", dataset: { userUploaded: {} } // Additional params such as exampleCount }; const urlCreateDataset = `${BASE_URL}/datasets`; const response = await fetch(urlCreateDataset, { method: "POST", headers: HEADERS_WITH_CONTENT_TYPE, body: JSON.stringify(createDatasetPayload) }); ``` ```jsx theme={null} // Upload JSONL file const urlUpload = `${BASE_URL}/datasets/${DATASET_ID}:upload`; const files = new FormData(); files.append("file", localFileInput.files[0]); const uploadResponse = await fetch(urlUpload, { method: "POST", headers: HEADERS, body: files }); ``` While all of the above approaches work, the `UI` is more suitable for smaller datasets (`< 500MB`), while `firectl` works better for larger datasets. Ensure the dataset ID conforms to the [resource id restrictions](/getting-started/concepts#resource-names-and-ids). There are also a couple of ways to launch fine-tuning jobs. We highly recommend creating supervised fine-tuning jobs via the `UI`. Simply navigate to the `Fine-Tuning` tab, click `Fine-Tune a Model`, and follow the wizard from there. You can even pick a LoRA model as the starting point for continued training. Ensure the fine-tuned model ID conforms to the [resource id restrictions](/getting-started/concepts#resource-names-and-ids). This will return a fine-tuning job ID. For a full explanation of the settings available to control the fine-tuning process, including learning rate and epochs, consult [additional SFT job settings](#additional-sft-job-settings). ```bash theme={null} firectl create sftj --base-model --dataset --output-model ``` Similar to the UI, instead of tuning a base model, you can also start tuning from a previous LoRA model using ```bash theme={null} firectl create sftj --warm-start-from --dataset --output-model ``` Notice that we use `--warm-start-from` instead of `--base-model` when creating this job. With the `UI`, once the job is created, it will show in the list of jobs. Click into the job details to monitor its progress. With `firectl`, you can monitor the progress of the tuning job by running: ```bash theme={null} firectl get sftj ``` Once the job successfully completes, you will see the new LoRA model in your model list: ```bash theme={null} firectl list models ``` ## Deploying a fine-tuned model After fine-tuning completes, deploy your model to make it available for inference: ```bash theme={null} firectl create deployment ``` This creates a dedicated deployment with performance matching the base model. For more details on deploying fine-tuned models, including multi-LoRA and serverless deployments, see the [Deploying Fine Tuned Models guide](/fine-tuning/deploying-loras). ## Additional SFT job settings Additional tuning settings are available when starting a fine-tuning job. All of the below settings are optional and will have reasonable defaults if not specified. For settings that affect tuning quality like `epochs` and `learning rate`, we recommend using default settings and only changing hyperparameters if results are not as desired. By default, the fine-tuning job will run evaluation by running the fine-tuned model against an evaluation set that is automatically carved out from a portion of your training set. You have the option to explicitly specify a separate evaluation dataset to use instead of carving out training data.
`evaluation_dataset`: The ID of a separate dataset to use for evaluation. It must be pre-uploaded via `firectl`. ```shell theme={null} firectl create sftj \ --evaluation-dataset my-eval-set \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` The default context size depends on the size of the model. For most models, the default context size is >= 32768, and training examples will be cut off at 32768 tokens. Usually you do not need to set the max context length unless you encounter out-of-memory errors with a higher LoRA rank and a large max context length. ```shell theme={null} firectl create sftj \ --max-context-length 65536 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` Batch size is the number of tokens packed into one forward step during training. One batch can consist of multiple training samples: we perform sequence packing on the training samples, and batch size controls how many total tokens will be packed into each batch. ```shell theme={null} firectl create sftj \ --batch-size 65536 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` Epochs are the number of passes over the training data. Our default value is 1. If the model does not follow the training data as much as expected, increase the number of epochs by 1 or 2. Non-integer values are supported. **Note: we set a max value of 3 million dataset examples × epochs** ```shell theme={null} firectl create sftj \ --epochs 2.0 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` Learning rate controls how fast the model updates from data. We generally do not recommend changing the learning rate. The default value is set automatically based on your selected model. ```shell theme={null} firectl create sftj \ --learning-rate 0.0001 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` Learning rate warmup steps controls the number of training steps during which the learning rate is linearly ramped up to the set learning rate. ```shell theme={null} firectl create sftj \ --learning-rate 0.0001 \ --learning-rate-warmup-steps 200 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` Gradient accumulation steps controls the number of forward and backward passes (with gradients accumulated) before an optimizer step is taken. Gradient accumulation steps > 1 increases the effective batch size. ```shell theme={null} firectl create sftj \ --gradient-accumulation-steps 4 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` LoRA rank controls the number of parameters that will be tuned in your LoRA add-on. A higher LoRA rank increases the amount of information that can be captured while tuning. LoRA rank must be a power of 2 up to 64. Our default value is 8. ```shell theme={null} firectl create sftj \ --lora-rank 16 \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` The fine-tuning service integrates with Weights & Biases to provide observability into the tuning process. To use this feature, you must have a Weights & Biases account and have provisioned an API key. ```shell theme={null} firectl create sftj \ --wandb-entity my-org \ --wandb-api-key xxx \ --wandb-project "My Project" \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` By default, the fine-tuning job will generate a random unique ID for the model.
This ID is used to refer to the model at inference time. You can optionally specify a custom ID, within [ID constraints](/getting-started/concepts#resource-names-and-ids). ```shell theme={null} firectl create sftj \ --output-model my-model \ --base-model MY_BASE_MODEL \ --dataset cancerset ``` By default, the fine-tuning job will generate a random unique ID for the fine-tuning job. You can optionally choose a custom ID. ```shell theme={null} firectl create sftj \ --job-id my-fine-tuning-job \ --base-model MY_BASE_MODEL \ --dataset cancerset \ --output-model my-tuned-model ``` ## Appendix `Python builder SDK` [references](/tools-sdks/python-client/sdk-introduction) `Restful API` [references](/api-reference/introduction) `firectl` [references](/tools-sdks/firectl/firectl) --- # Source: https://docs.fireworks.ai/fine-tuning/fine-tuning-vlm.md # Supervised Fine Tuning - Vision > Learn how to fine-tune vision-language models on Fireworks AI with image and text datasets Vision-language model (VLM) fine-tuning allows you to adapt pre-trained models that can understand both text and images to your specific use cases. This is particularly valuable for tasks like document analysis, visual question answering, image captioning, and domain-specific visual understanding. To see all vision models that support fine-tuning, visit the [Model Library for vision models](https://app.fireworks.ai/models?filter=vision\&tunable=true). ## Fine-tuning a VLM using LoRA Vision datasets must be in JSONL format using the OpenAI-compatible chat format. Each line represents a complete training example. **Dataset Requirements:** * **Format**: `.jsonl` file * **Minimum examples**: 3 * **Maximum examples**: 3 million per dataset * **Images**: Must be base64 encoded with proper MIME type prefixes * **Supported image formats**: PNG, JPG, JPEG **Message Schema:** Each training example must include a `messages` array where each message has: * `role`: one of `system`, `user`, or `assistant` * `content`: an array containing text and image objects or just text ### Basic VLM Dataset Example ```json theme={null} { "messages": [ { "role": "system", "content": "You are a helpful visual assistant that can analyze images and answer questions about them." }, { "role": "user", "content": [ { "type": "text", "text": "What objects do you see in this image?" }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..." } } ] }, { "role": "assistant", "content": "I can see a red car, a tree, and a blue house in this image." } ] } ``` ### If your dataset contains image URLs Images must be base64 encoded with MIME type prefixes. If your dataset contains image URLs, you'll need to download and encode them to base64. ```json theme={null} { "type": "image_url", "image_url": { // ❌ Raw HTTP/HTTPS URLs are NOT supported "url": "https://example.com/image.jpg" } } ``` ```json theme={null} { "type": "image_url", "image_url": { // ✅ Use data URI with base64 encoding // Format: data:image/{format};base64,{base64_encoded_data} "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
} } ``` You can use the following script to automatically convert your dataset to the correct format: **Usage:** ```bash theme={null} # Install required dependency pip install requests # Download the script wget https://raw.githubusercontent.com/fw-ai/cookbook/refs/heads/main/learn/vlm-finetuning/utils/download_images_and_encode_to_b64.py # Run the script - will output a new dataset _base64.jsonl python download_images_and_encode_to_b64.py --input_file ``` ### Advanced Dataset Examples ```json expandable theme={null} { "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Compare these two images and tell me the differences" }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..." } }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4BBBSkZJRg..." } } ] }, { "role": "assistant", "content": "The first image shows a daytime scene while the second shows the same location at night. The lighting and shadows are completely different." } ] } ``` ```json expandable theme={null} { "messages": [ { "role": "system", "content": "You are a helpful visual assistant that can analyze images and remember details from previous images in our conversation." }, { "role": "user", "content": [ { "type": "text", "text": "Can you describe this kitchen layout for me?" }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ..." } } ] }, { "role": "assistant", "content": "This is a modern open-plan kitchen with white cabinets, granite countertops, and stainless steel appliances. The island is positioned in the center with bar seating for 3-4 people. There's a large window above the sink providing natural light." }, { "role": "user", "content": "Now look at this living room. Do you think the styles would work well together?" }, { "role": "assistant", "content": "I'd be happy to help compare the styles! However, I don't see a living room image in your message. Could you please share the living room photo so I can analyze how well it would coordinate with the modern kitchen style we just discussed?" }, { "role": "user", "content": [ { "type": "text", "text": "Sorry about that! Here's the living room:" }, { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4BBBSkZJRgABAQ..." } } ] }, { "role": "assistant", "content": "Perfect! This living room has a complementary modern aesthetic that would work beautifully with the kitchen. Both spaces feature clean lines and a neutral color palette. The living room's contemporary furniture and the kitchen's sleek design would create a cohesive flow in an open floor plan. The warm wood accents in the living room would add nice contrast to the kitchen's cooler tones." } ] } ``` ### Try with an Example Dataset To get a feel for how VLM fine-tuning works, you can use an example vision dataset. This is a classification dataset that contains images of food with `` tags for reasoning.
```bash theme={null} # Download the example dataset curl -L -o food_reasoning.jsonl https://huggingface.co/datasets/fireworks-ai/vision-food-reasoning-dataset/resolve/main/food_reasoning.jsonl ``` ```bash theme={null} # Download the example dataset wget https://huggingface.co/datasets/fireworks-ai/vision-food-reasoning-dataset/resolve/main/food_reasoning.jsonl ``` Upload your prepared JSONL dataset to Fireworks for training: ```bash theme={null} firectl create dataset my-vlm-dataset /path/to/vlm_training_data.jsonl ``` Navigate to the Datasets tab in the Fireworks console, click "Create Dataset", and upload your JSONL file through the wizard. Dataset creation interface ```javascript theme={null} // Create dataset entry const createDatasetPayload = { datasetId: "my-vlm-dataset", dataset: { userUploaded: {} } }; const response = await fetch(`${BASE_URL}/datasets`, { method: "POST", headers: { "Authorization": `Bearer ${API_KEY}`, "Content-Type": "application/json" }, body: JSON.stringify(createDatasetPayload) }); // Upload JSONL file const formData = new FormData(); formData.append("file", fileInput.files[0]); const uploadResponse = await fetch(`${BASE_URL}/datasets/my-vlm-dataset:upload`, { method: "POST", headers: { "Authorization": `Bearer ${API_KEY}` }, body: formData }); ``` For larger datasets (>500MB), use `firectl` as it handles large uploads more reliably than the web interface. For enhanced data control and security, we also support bring your own bucket (BYOB) configurations. See our [Secure Fine Tuning](/fine-tuning/secure-fine-tuning#gcs-bucket-integration) guide for setup details. Create a supervised fine-tuning job for your VLM: ```bash theme={null} firectl create sftj \ --base-model accounts/fireworks/models/qwen2p5-vl-32b-instruct \ --dataset my-vlm-dataset \ --output-model my-custom-vlm \ --epochs 3 ``` For additional parameters like learning rates, evaluation datasets, and batch sizes, see [Additional SFT job settings](/fine-tuning/fine-tuning-models#additional-sft-job-settings). 1. Navigate to the Fine-tuning tab in the Fireworks console 2. Click "Create Fine-tuning Job" 3. Select your VLM base model (Qwen 2.5 VL) 4. Choose your uploaded dataset 5. Configure training parameters 6. Launch the job Fine-tuning job creation interface VLM fine-tuning jobs typically take longer than text-only models due to the additional image processing. Expect training times of several hours depending on dataset size and model complexity. Track your VLM fine-tuning job in the [Fireworks console](https://app.fireworks.ai/dashboard/fine-tuning). VLM fine-tuning job in the Fireworks console Monitor key metrics: * **Training loss**: Should generally decrease over time * **Evaluation loss**: Monitor for overfitting if using evaluation dataset * **Training progress**: Epochs completed and estimated time remaining Your VLM fine-tuning job is complete when the status shows `COMPLETED` and your custom model is ready for deployment. Once training is complete, deploy your custom VLM: ```bash theme={null} # Create a deployment for your fine-tuned VLM firectl create deployment my-custom-vlm # Check deployment status firectl get deployment accounts/your-account/deployment/deployment-id ``` Deploy from the UI using the `Deploy` dropdown in the fine-tuning job page. 
Deploy dropdown in the fine-tuning job page ## Advanced Configuration For additional fine-tuning parameters and advanced settings like custom learning rates, batch sizes, and optimization options, see the [Additional SFT job settings](/fine-tuning/fine-tuning-models#additional-sft-job-settings) section in our comprehensive fine-tuning guide. ## Interactive Tutorials: Fine-tuning VLMs For a hands-on, step-by-step walkthrough of VLM fine-tuning, we've created two fine-tuning cookbooks that demonstrate the complete process, from dataset preparation and model deployment to evaluation. **Google Colab Notebook: Fine-tune Qwen2.5 VL on Fireworks AI** **Finetuning a VLM to beat SOTA closed source model** The cookbooks above cover the following: * Setting up your environment with Fireworks CLI * Preparing vision datasets in the correct format * Launching and monitoring VLM fine-tuning jobs * Testing your fine-tuned model * Best practices for VLM fine-tuning * Running inference on serverless VLMs * Running evals to show performance gains ## Testing Your Fine-tuned VLM After deployment, test your fine-tuned VLM using the same API patterns as base VLMs: ```python Python (OpenAI Compatible) theme={null} import openai client = openai.OpenAI( base_url="https://api.fireworks.ai/inference/v1", api_key="", ) response = client.chat.completions.create( model="accounts/your-account/models/my-custom-vlm", messages=[{ "role": "user", "content": [{ "type": "image_url", "image_url": { "url": "https://raw.githubusercontent.com/fw-ai/cookbook/refs/heads/main/learn/vlm-finetuning/images/icecream.jpeg" }, },{ "type": "text", "text": "What's in this image?", }], }] ) print(response.choices[0].message.content) ``` ```python Python (Fireworks SDK) theme={null} from fireworks import LLM # Use your fine-tuned model llm = LLM(model="accounts/your-account/models/my-custom-vlm") response = llm.chat.completions.create( messages=[{ "role": "user", "content": [{ "type": "image_url", "image_url": { "url": "https://raw.githubusercontent.com/fw-ai/cookbook/refs/heads/main/learn/vlm-finetuning/images/icecream.jpeg" }, },{ "type": "text", "text": "What's in this image?", }], }] ) print(response.choices[0].message.content) ``` If you fine-tuned using the example dataset, your model should include `` tags in its response. --- # Source: https://docs.fireworks.ai/fine-tuning/finetuning-intro.md # Fine Tuning Overview Fireworks helps you fine-tune models to improve quality and performance for your product use cases, without the burden of building & maintaining your own training infrastructure. ## Fine-tuning methods Train models using custom reward functions for complex reasoning tasks Train text models with labeled examples of desired outputs Train vision-language models with image and text pairs Align models with human preferences using pairwise comparisons ## Supported models Fireworks supports fine-tuning for most major open source models, including DeepSeek, Qwen, Kimi, and Llama model families, and supports fine-tuning large state-of-the-art models like Kimi K2 0905 and DeepSeek V3.1. To see all models that support fine-tuning, visit the [Model Library for text models](https://app.fireworks.ai/models?filter=LLM\&tunable=true) or [vision models](https://app.fireworks.ai/models?filter=vision\&tunable=true). ## Fireworks uses LoRA Fireworks uses **[Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685)** to fine-tune models efficiently.
The fine-tuning process generates a LoRA addon—a small adapter that modifies the base model's behavior without retraining all its weights. This approach is: * **Faster and cheaper** - Train models in hours, not days * **Easy to deploy** - Deploy LoRA addons instantly on Fireworks * **Flexible** - Run [multiple LoRAs](/fine-tuning/deploying-loras#multi-lora-deployment) on a single base model deployment ## When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT) In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that can be used to score the model's outputs. The model is iteratively trained to produce outputs that maximize this score. To learn more about the differences between SFT and RFT, see [when to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)](./finetuning-intro#when-to-use-supervised-fine-tuning-sft-vs-reinforcement-fine-tuning-models-rft). Supervised fine-tuning (SFT) works well for many common scenarios, especially when: * You have a sizable dataset (\~1000+ examples) with high-quality, ground-truth labels. * The dataset covers most possible input scenarios. * Tasks are relatively straightforward, such as: * Classification * Content extraction However, SFT may struggle in situations where: * Your dataset is small. * You lack ground-truth outputs (a.k.a. “golden generations”). * The task requires multi-step reasoning. Here is a simple decision tree: ```mermaid theme={null} flowchart TD B{"Do you have labeled ground truth data?"} B --"Yes"--> C{"How much?"} C --"more than 1000 examples"--> D["SFT"] C --"100-1000 examples"-->F{"Does reasoning help?"} C --"~100s examples"--> E["RFT"] F --"No"-->D F -- "Yes" -->E B --"No"--> G{"Is this a verifiable task (see below)?"} G -- "Yes" -->E G -- "No"-->H["RLHF / LLM as judge"] ``` `Verifiable` refers to whether it is relatively easy to make a judgement on the quality of the model generation. --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/firectl.md # Getting started > Learn to create, deploy, and manage resources using Firectl Firectl can be installed in several ways depending on your platform.
```bash homebrew theme={null} brew tap fw-ai/firectl brew install firectl # If you encounter a failed SHA256 check, try first running brew update ``` ```bash macOS (Apple Silicon) theme={null} curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-arm64.gz -o firectl.gz gzip -d firectl.gz && chmod a+x firectl sudo mv firectl /usr/local/bin/firectl sudo chown root: /usr/local/bin/firectl ``` ```bash macOS (x86_64) theme={null} curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-amd64.gz -o firectl.gz gzip -d firectl.gz && chmod a+x firectl sudo mv firectl /usr/local/bin/firectl sudo chown root: /usr/local/bin/firectl ``` ```bash Linux (x86_64) theme={null} wget -O firectl.gz https://storage.googleapis.com/fireworks-public/firectl/stable/linux-amd64.gz gunzip firectl.gz sudo install -o root -g root -m 0755 firectl /usr/local/bin/firectl ``` ```Text Windows (64 bit) theme={null} wget -L https://storage.googleapis.com/fireworks-public/firectl/stable/firectl.exe ``` ### Sign into Fireworks account To sign into your Fireworks account: ```bash theme={null} firectl signin ``` If you have set up [Custom SSO](/accounts/sso) then also pass your account ID: ```bash theme={null} firectl signin ``` ### Check you have signed in To show which account you have signed into: ```bash theme={null} firectl whoami ``` ### Check your installed version ```bash theme={null} firectl version ``` ### Upgrade to the latest version ```bash theme={null} sudo firectl upgrade ``` --- # Source: https://docs.fireworks.ai/faq-new/models-inference/flux-image-generation.md # FLUX image generation ## Can I generate multiple images in a single API call? No, FLUX serverless supports only one image per API call. For multiple images, send separate parallel requests—these will be automatically load-balanced across our replicas for optimal performance. ## Does FLUX support image-to-image generation? No, image-to-image generation is not currently supported. We are evaluating this feature for future implementation. If you have specific use cases, please share them with our support team to help inform development. ## Can I create custom LoRA models with FLUX? Inference on FLUX-LoRA adapters is currently supported. However, managed training on Fireworks with FLUX is not, although this feature is under development. Updates about our managed LoRA training service will be announced when available. --- # Source: https://docs.fireworks.ai/guides/function-calling.md # Tool Calling > Connect models to external tools and APIs Tool calling (also known as function calling) enables models to intelligently select and use external tools based on user input. You can build agents that access APIs, retrieve real-time data, or perform actions—all through [OpenAI-compatible](https://platform.openai.com/docs/guides/function-calling) tool specifications. **How it works:** 1. Define tools using [JSON Schema](https://json-schema.org/learn/getting-started-step-by-step) (name, description, parameters) 2. Model analyzes the query and decides whether to call a tool 3. If needed, model returns structured tool calls with parameters 4. 
You execute the tool and send results back for the final response ## Quick example Define tools and send a request - the model will return structured tool calls when needed: ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) tools = [{ "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": {"type": "string", "description": "City name"} }, "required": ["location"] } } }] response = client.chat.completions.create( model="accounts/fireworks/models/kimi-k2-instruct-0905", messages=[{"role": "user", "content": "What's the weather in San Francisco?"}], tools=tools, temperature=0.1 ) print(response.choices[0].message.tool_calls) # Output: [ChatCompletionMessageToolCall(id='call_abc123', function=Function(arguments='{"location":"San Francisco"}', name='get_weather'), type='function')] ``` For best results with tool calling, use a low temperature (0.0-0.3) to reduce hallucinated parameter values and ensure more deterministic tool selection. ```python theme={null} import os from openai import OpenAI import json client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) # Step 1: Define your tools tools = [{ "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": {"type": "string", "description": "City name"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]} }, "required": ["location"] } } }] # Step 2: Send initial request messages = [{"role": "user", "content": "What's the weather in San Francisco?"}] response = client.chat.completions.create( model="accounts/fireworks/models/kimi-k2-instruct-0905", messages=messages, tools=tools, temperature=0.1 ) # Step 3: Check if model wants to call a tool if response.choices[0].message.tool_calls: # Step 4: Execute the tool tool_call = response.choices[0].message.tool_calls[0] # Your actual tool implementation def get_weather(location, unit="celsius"): # In production, call your weather API here return {"temperature": 72, "condition": "sunny", "unit": unit} # Parse arguments and call your function function_args = json.loads(tool_call.function.arguments) function_response = get_weather(**function_args) # Step 5: Send tool response back to model messages.append(response.choices[0].message) # Add assistant's tool call messages.append({ "role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(function_response) }) # Step 6: Get final response final_response = client.chat.completions.create( model="accounts/fireworks/models/kimi-k2-instruct-0905", messages=messages, tools=tools, temperature=0.1 ) print(final_response.choices[0].message.content) # Output: "It's currently 72°F and sunny in San Francisco." ``` ## Defining tools Tools are defined using [JSON Schema](https://json-schema.org/understanding-json-schema/reference) format. Each tool requires: * **name**: Function identifier (a-z, A-Z, 0-9, underscores, dashes; max 64 characters) * **description**: Clear explanation of what the function does (used by the model to decide when to call it) * **parameters**: JSON Schema object describing the function's parameters Write detailed descriptions and parameter definitions. 
The model relies on these to select the correct tool and provide appropriate arguments. ### Parameter types JSON Schema supports: `string`, `number`, `integer`, `object`, `array`, `boolean`, and `null`. You can also: * Use `enum` to restrict values to specific options * Mark parameters as `required` or optional * Provide descriptions for each parameter ```python theme={null} tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get current weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "City name, e.g. San Francisco" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit" } }, "required": ["location"] } } }, { "type": "function", "function": { "name": "search_restaurants", "description": "Search for restaurants by cuisine type", "parameters": { "type": "object", "properties": { "cuisine": { "type": "string", "description": "Type of cuisine (e.g., Italian, Mexican)" }, "location": { "type": "string", "description": "City or neighborhood" }, "price_range": { "type": "string", "enum": ["$", "$$", "$$$", "$$$$"] } }, "required": ["cuisine", "location"] } } } ] ``` ## Additional configurations ### tool\_choice The [`tool_choice`](/api-reference/post-chatcompletions#body-tool-choice) parameter controls how the model uses tools: * **`auto`** (default): Model decides whether to call a tool or respond directly * **`none`**: Model will not call any tools * **`required`**: Model must call at least one tool * **Specific function**: Force the model to call a particular function ```python theme={null} # Force a specific tool response = client.chat.completions.create( model="accounts/fireworks/models/kimi-k2-instruct-0905", messages=[{"role": "user", "content": "What's the weather?"}], tools=tools, tool_choice={"type": "function", "function": {"name": "get_weather"}}, temperature=0.1 ) ``` Some models support parallel tool calling, where multiple tools can be called in a single response. Check the model's capabilities before relying on this feature. ## Streaming Tool calls work with streaming responses. 
Arguments are sent incrementally as the model generates them: ```python theme={null} import json import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) tools = [{ "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a city", "parameters": { "type": "object", "properties": { "city": {"type": "string", "description": "City name"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]} }, "required": ["city"] } } }] stream = client.chat.completions.create( model="accounts/fireworks/models/kimi-k2-instruct-0905", messages=[{"role": "user", "content": "What's the weather in San Francisco?"}], tools=tools, stream=True, temperature=0.1 ) # Accumulate tool call data tool_calls = {} for chunk in stream: if chunk.choices[0].delta.tool_calls: for tool_call in chunk.choices[0].delta.tool_calls: index = tool_call.index if index not in tool_calls: tool_calls[index] = {"id": "", "name": "", "arguments": ""} if tool_call.id: tool_calls[index]["id"] = tool_call.id if tool_call.function and tool_call.function.name: tool_calls[index]["name"] = tool_call.function.name if tool_call.function and tool_call.function.arguments: tool_calls[index]["arguments"] += tool_call.function.arguments if chunk.choices[0].finish_reason == "tool_calls": for tool_call in tool_calls.values(): args = json.loads(tool_call["arguments"]) print(f"Calling {tool_call['name']} with {args}") break ``` ## Troubleshooting * Check that your tool descriptions are clear and detailed * Ensure the user query clearly indicates a need for the tool * Try using `tool_choice="required"` to force tool usage * Verify your model supports tool calling (check `supportsTools` field) * Add more detailed parameter descriptions * Use lower temperature (0.0-0.3) for more deterministic outputs * Provide examples in parameter descriptions * Use `enum` to constrain values to specific options * Always validate tool call arguments before parsing * Handle partial or malformed JSON gracefully in production * Use try-catch blocks when parsing `tool_call.function.arguments` ## Next steps Enforce JSON schemas for consistent responses Learn about chat completions and other APIs Deploy models on dedicated GPUs Full chat completions API documentation --- # Source: https://docs.fireworks.ai/api-reference/generate-or-edit-image-using-flux-kontext.md # Generate or edit an image with FLUX.1 Kontext 💡 Note that this API is async and will return the **request\_id** instead of the image. Call the [get\_result](/api-reference/get-generated-image-from-flux-kontex) API to obtain the generated image. FLUX Kontext Pro is a specialized model for generating contextually-aware images from text descriptions. Designed for professional use cases requiring high-quality, consistent image generation. Use our [Playground](https://app.fireworks.ai/playground?model=accounts/fireworks/models/flux-kontext-pro) to quickly try it out in your browser. FLUX Kontext Max is the most advanced model in the Kontext series, offering maximum quality and context understanding. Ideal for enterprise applications requiring the highest level of image generation performance. Use our [Playground](https://app.fireworks.ai/playground?model=accounts/fireworks/models/flux-kontext-max) to quickly try it out in your browser. ## Path The model to use for image generation. Use **flux-kontext-pro** or **flux-kontext-max** as the model name in the API. 
## Headers The media type of the request body. Your Fireworks API key. ## Request Body Prompt to use for the image generation process. Base64 encoded image or URL to use with Kontext. Optional seed for reproducibility. Aspect ratio of the image between 21:9 and 9:21. Output format for the generated image. Can be 'jpeg' or 'png'. **Options:** `jpeg`, `png` URL to receive webhook notifications. **Length:** 1-2083 characters Optional secret for webhook signature verification. Whether to perform upsampling on the prompt. If active, automatically modifies the prompt for more creative generation. Tolerance level for input and output moderation. Between 0 and 6, 0 being most strict, 6 being least strict. Limit of 2 for Image to Image. **Range:** 0-6 ```python Python theme={null} import requests url = "https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/{model}" headers = { "Content-Type": "application/json", "Authorization": "Bearer $API_KEY", } data = { "prompt": "A beautiful sunset over the ocean", "input_image": "", "seed": 42, "aspect_ratio": "", "output_format": "jpeg", "webhook_url": "", "webhook_secret": "", "prompt_upsampling": False, "safety_tolerance": 2 } response = requests.post(url, headers=headers, json=data) ``` ```typescript TypeScript theme={null} import fs from "fs"; import fetch from "node-fetch"; (async () => { const response = await fetch("https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/{model}", { method: "POST", headers: { "Content-Type": "application/json", "Authorization": "Bearer $API_KEY" }, body: JSON.stringify({ prompt: "A beautiful sunset over the ocean" }), }); })().catch(console.error); ``` ```shell curl theme={null} curl --request POST \ -S --fail-with-body \ --url https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/{model} \ -H 'Content-Type: application/json' \ -H "Authorization: Bearer $API_KEY" \ --data ' { "prompt": "A beautiful sunset over the ocean" }' ``` ## Response Successful Response request id Unsuccessful Response error message --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-account.md # Source: https://docs.fireworks.ai/api-reference/get-account.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-account.md # Source: https://docs.fireworks.ai/api-reference/get-account.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-account.md # Source: https://docs.fireworks.ai/api-reference/get-account.md # Get Account ## OpenAPI ````yaml get /v1/accounts/{account_id} paths: path: /v1/accounts/{account_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: The resource name of the account. e.g. accounts/my-account readOnly: true displayName: allOf: - type: string description: >- Human-readable display name of the account. e.g. "My Account" Must be fewer than 64 characters long. 
createTime: allOf: - type: string format: date-time description: The creation time of the account. readOnly: true email: allOf: - type: string description: >- For developer accounts, this is the email of the developer user and is immutable. For ENTERPRISE and BUSINESS accounts, this is mutable and it is the email that will recieve the invoice for the account if automated billing is used. state: allOf: - $ref: '#/components/schemas/gatewayAccountState' description: The state of the account. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: Contains information about the account status. readOnly: true suspendState: allOf: - $ref: '#/components/schemas/AccountSuspendState' readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the account. readOnly: true title: 'Next ID: 25' refIdentifier: '#/components/schemas/gatewayAccount' requiredProperties: - email examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' email: state: STATE_UNSPECIFIED status: code: OK message: suspendState: UNSUSPENDED updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: AccountSuspendState: type: string enum: - UNSUSPENDED - FAILED_PAYMENTS - CREDIT_DEPLETED - MONTHLY_SPEND_LIMIT_EXCEEDED - BLOCKED_BY_ABUSE_RULE default: UNSUSPENDED gatewayAccountState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - UPDATING - DELETING default: STATE_UNSPECIFIED gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. 
HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/get-batch-inference-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/get-batch-inference-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-batch-inference-job.md # Source: https://docs.fireworks.ai/api-reference/get-batch-inference-job.md # Get Batch Inference Job ## OpenAPI ````yaml get /v1/accounts/{account_id}/batchInferenceJobs/{batch_inference_job_id} paths: path: /v1/accounts/{account_id}/batchInferenceJobs/{batch_inference_job_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id batch_inference_job_id: schema: - type: string required: true description: The Batch Inference Job Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the batch inference job. e.g. accounts/my-account/batchInferenceJobs/my-batch-inference-job readOnly: true displayName: allOf: - type: string title: >- Human-readable display name of the batch inference job. e.g. "My Batch Inference Job" createTime: allOf: - type: string format: date-time description: The creation time of the batch inference job. readOnly: true createdBy: allOf: - type: string description: >- The email address of the user who initiated this batch inference job. readOnly: true state: allOf: - $ref: '#/components/schemas/gatewayJobState' description: >- JobState represents the state an asynchronous job can be in. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' readOnly: true model: allOf: - type: string description: >- The name of the model to use for inference. This is required, except when continued_from_job_name is specified. inputDatasetId: allOf: - type: string description: >- The name of the dataset used for inference. This is required, except when continued_from_job_name is specified. outputDatasetId: allOf: - type: string description: >- The name of the dataset used for storing the results. This will also contain the error file. inferenceParameters: allOf: - $ref: '#/components/schemas/gatewayInferenceParameters' description: Parameters controlling the inference process. updateTime: allOf: - type: string format: date-time description: The update time for the batch inference job. readOnly: true precision: allOf: - $ref: '#/components/schemas/DeploymentPrecision' description: >- The precision with which the model should be served. 
If PRECISION_UNSPECIFIED, a default will be chosen based on the model. jobProgress: allOf: - $ref: '#/components/schemas/gatewayJobProgress' description: Job progress. readOnly: true continuedFromJobName: allOf: - type: string description: >- The resource name of the batch inference job that this job continues from. Used for lineage tracking to understand job continuation chains. title: 'Next ID: 31' refIdentifier: '#/components/schemas/gatewayBatchInferenceJob' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' createdBy: state: JOB_STATE_UNSPECIFIED status: code: OK message: model: inputDatasetId: outputDatasetId: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 updateTime: '2023-11-07T05:31:56Z' precision: PRECISION_UNSPECIFIED jobProgress: percent: 123 epoch: 123 totalInputRequests: 123 totalProcessedRequests: 123 successfullyProcessedRequests: 123 failedRequests: 123 outputRows: 123 inputTokens: 123 outputTokens: 123 cachedInputTokenCount: 123 continuedFromJobName: description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. 
HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayInferenceParameters: type: object properties: maxTokens: type: integer format: int32 description: Maximum number of tokens to generate per response. temperature: type: number format: float description: Sampling temperature, typically between 0 and 2. topP: type: number format: float description: Top-p sampling parameter, typically between 0 and 1. 'n': type: integer format: int32 description: Number of response candidates to generate per input. extraBody: type: string description: |- Additional parameters for the inference request as a JSON string. For example: "{\"stop\": [\"\\n\"]}". topK: type: integer format: int32 description: >- Top-k sampling parameter, limits the token selection to the top k tokens. description: Parameters for the inference requests. gatewayJobProgress: type: object properties: percent: type: integer format: int32 description: Progress percent, within the range from 0 to 100. epoch: type: integer format: int32 description: >- The epoch for which the progress percent is reported, usually starting from 0. This is optional for jobs that don't run in an epoch fasion, e.g. BIJ, EVJ. totalInputRequests: type: integer format: int32 description: Total number of input requests/rows in the job. totalProcessedRequests: type: integer format: int32 description: >- Total number of requests that have been processed (successfully or failed). successfullyProcessedRequests: type: integer format: int32 description: Number of requests that were processed successfully. failedRequests: type: integer format: int32 description: Number of requests that failed to process. outputRows: type: integer format: int32 description: Number of output rows generated. inputTokens: type: integer format: int32 description: Total number of input tokens processed. outputTokens: type: integer format: int32 description: Total number of output tokens generated. cachedInputTokenCount: type: integer format: int32 description: The number of input tokens that hit the prompt cache. description: Progress of a job, e.g. RLOR, EVJ, BIJ etc. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-batch-job-logs.md # Get Batch Job Logs ## OpenAPI ````yaml get /v1/accounts/{account_id}/batchJobs/{batch_job_id}:getLogs paths: path: /v1/accounts/{account_id}/batchJobs/{batch_job_id}:getLogs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id batch_job_id: schema: - type: string required: true description: The Batch Job Id query: ranks: schema: - type: array items: allOf: - type: integer format: int32 required: false description: Ranks, for which to fetch logs. explode: true pageSize: schema: - type: integer required: false description: >- The maximum number of log entries to return. The maximum page_size is 10,000, values above 10,000 will be coerced to 10,000. If unspecified, the default is 100. pageToken: schema: - type: string required: false description: >- A page token, received from a previous GetBatchJobLogsRequest call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to GetBatchJobLogsRequest must match the call that provided the page token. startTime: schema: - type: string required: false description: |- Entries before this timestamp won't be returned. If not specified, up to page_size last records will be returned. format: date-time filter: schema: - type: string required: false description: |- Only entries matching this filter will be returned. Currently only basic substring match is performed. startFromHead: schema: - type: boolean required: false description: >- Pagination direction, time-wise reverse direction by default (false). readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: entries: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayLogEntry' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. refIdentifier: '#/components/schemas/gatewayGetBatchJobLogsResponse' examples: example: value: entries: - logTime: '2023-11-07T05:31:56Z' rank: 123 message: nextPageToken: description: A successful response. deprecated: false type: path components: schemas: gatewayLogEntry: type: object properties: logTime: type: string format: date-time description: The timestamp of the log entry. rank: type: integer format: int32 description: The rank which produced the log entry. message: type: string description: The log messsage. ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-batch-job.md # Get Batch Job ## OpenAPI ````yaml get /v1/accounts/{account_id}/batchJobs/{batch_job_id} paths: path: /v1/accounts/{account_id}/batchJobs/{batch_job_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id batch_job_id: schema: - type: string required: true description: The Batch Job Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the batch job. e.g. 
accounts/my-account/clusters/my-cluster/batchJobs/123456789 readOnly: true displayName: allOf: - type: string description: >- Human-readable display name of the batch job. e.g. "My Batch Job" Must be fewer than 64 characters long. createTime: allOf: - type: string format: date-time description: The creation time of the batch job. readOnly: true startTime: allOf: - type: string format: date-time description: The time when the batch job started running. readOnly: true endTime: allOf: - type: string format: date-time description: >- The time when the batch job completed, failed, or was cancelled. readOnly: true createdBy: allOf: - type: string description: The email address of the user who created this batch job. readOnly: true nodePoolId: allOf: - type: string title: >- The ID of the node pool that this batch job should use. e.g. my-node-pool environmentId: allOf: - type: string description: >- The ID of the environment that this batch job should use. e.g. my-env If specified, image_ref must not be specified. snapshotId: allOf: - type: string description: >- The ID of the snapshot used by this batch job. If specified, environment_id must be specified and image_ref must not be specified. numRanks: allOf: - type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank. envVars: allOf: - type: object additionalProperties: type: string description: >- Environment variables to be passed during this job's execution. role: allOf: - type: string description: >- The ARN of the AWS IAM role that the batch job should assume. If not specified, the connection will fall back to the node pool's node_role. pythonExecutor: allOf: - $ref: '#/components/schemas/gatewayPythonExecutor' notebookExecutor: allOf: - $ref: '#/components/schemas/gatewayNotebookExecutor' shellExecutor: allOf: - $ref: '#/components/schemas/gatewayShellExecutor' imageRef: allOf: - type: string description: >- The container image used by this job. If specified, environment_id and snapshot_id must not be specified. annotations: allOf: - type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. state: allOf: - $ref: '#/components/schemas/gatewayBatchJobState' description: The current state of the batch job. readOnly: true status: allOf: - type: string description: >- Detailed information about the current status of the batch job. readOnly: true shared: allOf: - type: boolean description: >- Whether the batch job is shared with all users in the account. This allows all users to update, delete, clone, and create environments using the batch job. updateTime: allOf: - type: string format: date-time description: The update time for the batch job. 
readOnly: true title: 'Next ID: 22' refIdentifier: '#/components/schemas/gatewayBatchJob' requiredProperties: - nodePoolId examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' startTime: '2023-11-07T05:31:56Z' endTime: '2023-11-07T05:31:56Z' createdBy: nodePoolId: environmentId: snapshotId: numRanks: 123 envVars: {} role: pythonExecutor: targetType: TARGET_TYPE_UNSPECIFIED target: args: - notebookExecutor: notebookFilename: shellExecutor: command: imageRef: annotations: {} state: STATE_UNSPECIFIED status: shared: true updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: PythonExecutorTargetType: type: string enum: - TARGET_TYPE_UNSPECIFIED - MODULE - FILENAME default: TARGET_TYPE_UNSPECIFIED description: |2- - MODULE: Runs a python module, i.e. passed as -m argument. - FILENAME: Runs a python file. gatewayBatchJobState: type: string enum: - STATE_UNSPECIFIED - CREATING - QUEUED - PENDING - RUNNING - COMPLETED - FAILED - CANCELLING - CANCELLED - DELETING default: STATE_UNSPECIFIED description: |- - CREATING: The batch job is being created. - QUEUED: The batch job is in the queue and waiting to be scheduled. Currently unused. - PENDING: The batch job scheduled and is waiting for resource allocation. - RUNNING: The batch job is running. - COMPLETED: The batch job has finished successfully. - FAILED: The batch job has failed. - CANCELLING: The batch job is being cancelled. - CANCELLED: The batch job was cancelled. - DELETING: The batch job is being deleted. title: 'Next ID: 10' gatewayNotebookExecutor: type: object properties: notebookFilename: type: string description: Path to a notebook file to be executed. description: Execute a notebook file. required: - notebookFilename gatewayPythonExecutor: type: object properties: targetType: $ref: '#/components/schemas/PythonExecutorTargetType' description: The type of Python target to run. target: type: string description: A Python module or filename depending on TargetType. args: type: array items: type: string description: Command line arguments to pass to the Python process. description: Execute a Python process. required: - targetType - target gatewayShellExecutor: type: object properties: command: type: string title: Command we want to run for the shell script description: Execute a shell script. required: - command ```` --- # Source: https://docs.fireworks.ai/api-reference/get-batch-status.md # Check Batch Status This endpoint allows you to check the current status of a previously submitted batch request, and retrieve the final result if available. Check status of your batch request ### Headers Your Fireworks API key. e.g. `Authorization=FIREWORKS_API_KEY`. Alternatively, can be provided as a query param. ### Path Parameters The identifier of your Fireworks account. Must match the account used when the batch request was submitted. The unique identifier of the batch job to check.\ This should match the `batch_id` returned when the batch request was originally submitted. ### Response The response includes the status of the batch job and, if completed, the final result. The status of the batch job at the time of the request.\ Possible values include `"completed"` and `"processing"`. The unique identifier of the batch job whose status is being retrieved.\ This ID matches the one provided in the original request. A human-readable message describing the current state of the batch job.\ This field is typically `null` when the job has completed successfully. 
The original content type of the response body.\ This value can be used to determine how to parse the string in the `body` field. The serialized result of the batch job, this field is only present when `status` is `"completed"`.\ The format of this string depends on the `content_type` field and may vary across endpoints.\ Clients should use `content_type` to determine how to parse or interpret the value. ```curl curl theme={null} # Make request curl -X GET "https://audio-batch.api.fireworks.ai/v1/accounts/{account_id}/batch_job/{batch_id}" \ -H "Authorization: " ``` ```python python theme={null} !pip install requests import os import requests # Input api key and path parameters api_key = "" account_id = "" batch_id = "" # Send request url = f"https://audio-batch.api.fireworks.ai/v1/accounts/{account_id}/batch_job/{batch_id}" headers = {"Authorization": api_key} response = requests.get(url, headers=headers) print(response.text) ``` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-cluster-connection-info.md # Get Cluster Connection Info > Retrieve connection settings for the cluster to be put in kubeconfig ## OpenAPI ````yaml get /v1/accounts/{account_id}/clusters/{cluster_id}:getConnectionInfo paths: path: /v1/accounts/{account_id}/clusters/{cluster_id}:getConnectionInfo method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id cluster_id: schema: - type: string required: true description: The Cluster Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: endpoint: allOf: - type: string description: The cluster's Kubernetes API server endpoint. caData: allOf: - type: string description: Base64-encoded cluster's CA certificate. refIdentifier: '#/components/schemas/gatewayClusterConnectionInfo' examples: example: value: endpoint: caData: description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-cluster.md # Get Cluster ## OpenAPI ````yaml get /v1/accounts/{account_id}/clusters/{cluster_id} paths: path: /v1/accounts/{account_id}/clusters/{cluster_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id cluster_id: schema: - type: string required: true description: The Cluster Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the cluster. e.g. accounts/my-account/clusters/my-cluster readOnly: true displayName: allOf: - type: string description: >- Human-readable display name of the cluster. e.g. 
"My Cluster" Must be fewer than 64 characters long. createTime: allOf: - type: string format: date-time description: The creation time of the cluster. readOnly: true eksCluster: allOf: - $ref: '#/components/schemas/gatewayEksCluster' fakeCluster: allOf: - $ref: '#/components/schemas/gatewayFakeCluster' state: allOf: - $ref: '#/components/schemas/gatewayClusterState' description: The current state of the cluster. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: >- Detailed information about the current status of the cluster. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the cluster. readOnly: true title: 'Next ID: 15' refIdentifier: '#/components/schemas/gatewayCluster' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' eksCluster: awsAccountId: fireworksManagerRole: region: clusterName: storageBucketName: metricWriterRole: loadBalancerControllerRole: workloadIdentityPoolProviderId: inferenceRole: fakeCluster: projectId: location: clusterName: state: STATE_UNSPECIFIED status: code: OK message: updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayClusterState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED default: STATE_UNSPECIFIED description: |2- - CREATING: The cluster is still being created. - READY: The cluster is ready to be used. - DELETING: The cluster is being deleted. - FAILED: Cluster is not operational. Consult 'status' for detailed messaging. Cluster needs to be deleted and re-created. gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. 
HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. 
See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEksCluster: type: object properties: awsAccountId: type: string description: The 12-digit AWS account ID where this cluster lives. fireworksManagerRole: type: string title: >- The IAM role ARN used to manage Fireworks resources on AWS. If not specified, the default is arn:aws:iam:::role/FireworksManagerRole region: type: string description: >- The AWS region where this cluster lives. See https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html for a list of available regions. clusterName: type: string description: The EKS cluster name. storageBucketName: type: string description: The S3 bucket name. metricWriterRole: type: string description: >- The IAM role ARN used by Google Managed Prometheus role that will write metrics to Fireworks managed Prometheus. The role must be assumable by the `system:serviceaccount:gmp-system:collector` service account on the EKS cluster. If not specified, no metrics will be written to GCP. loadBalancerControllerRole: type: string description: >- The IAM role ARN used by the EKS load balancer controller (i.e. the load balancer automatically created for the k8s gateway resource). If not specified, no gateway will be created. workloadIdentityPoolProviderId: type: string title: |- The ID of the GCP workload identity pool provider in the Fireworks project for this cluster. The pool ID is assumed to be "byoc-pool" inferenceRole: type: string description: The IAM role ARN used by the inference pods on the cluster. title: |- An Amazon Elastic Kubernetes Service cluster. Next ID: 16 required: - awsAccountId - region gatewayFakeCluster: type: object properties: projectId: type: string location: type: string clusterName: type: string title: A fake cluster using https://pkg.go.dev/k8s.io/client-go/kubernetes/fake gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference/get-dataset-download-endpoint.md # Get Dataset Download Endpoint ## OpenAPI ````yaml get /v1/accounts/{account_id}/datasets/{dataset_id}:getDownloadEndpoint paths: path: /v1/accounts/{account_id}/datasets/{dataset_id}:getDownloadEndpoint method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dataset_id: schema: - type: string required: true description: The Dataset Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. downloadLineage: schema: - type: boolean required: false description: |- If true, downloads entire lineage chain (all related datasets). Filenames will be prefixed with dataset IDs to avoid collisions. 
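# Illustrative request (comment only, not part of the generated spec): assuming
# $FIREWORKS_API_KEY holds a valid API key, signed download URLs for a dataset
# named "my-dataset" could be requested with:
#
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/datasets/my-dataset:getDownloadEndpoint"
#
# The filenameToSignedUrls map in the response then gives one signed URL per file.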
header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: filenameToSignedUrls: allOf: - type: object additionalProperties: type: string title: Signed URLs for downloading dataset files refIdentifier: '#/components/schemas/gatewayGetDatasetDownloadEndpointResponse' examples: example: value: filenameToSignedUrls: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference/get-dataset-upload-endpoint.md # Get Dataset Upload Endpoint ## OpenAPI ````yaml post /v1/accounts/{account_id}/datasets/{dataset_id}:getUploadEndpoint paths: path: /v1/accounts/{account_id}/datasets/{dataset_id}:getUploadEndpoint method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dataset_id: schema: - type: string required: true description: The Dataset Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: filenameToSize: allOf: - type: object additionalProperties: type: string format: int64 description: A mapping from the file name to its size in bytes. readMask: allOf: - type: string description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. required: true refIdentifier: '#/components/schemas/GatewayGetDatasetUploadEndpointBody' requiredProperties: - filenameToSize examples: example: value: filenameToSize: {} readMask: response: '200': application/json: schemaArray: - type: object properties: filenameToSignedUrls: allOf: - type: object additionalProperties: type: string title: Signed URLs for uploading dataset files refIdentifier: '#/components/schemas/gatewayGetDatasetUploadEndpointResponse' examples: example: value: filenameToSignedUrls: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-dataset.md # Source: https://docs.fireworks.ai/api-reference/get-dataset.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-dataset.md # Source: https://docs.fireworks.ai/api-reference/get-dataset.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-dataset.md # Source: https://docs.fireworks.ai/api-reference/get-dataset.md # Get Dataset ## OpenAPI ````yaml get /v1/accounts/{account_id}/datasets/{dataset_id} paths: path: /v1/accounts/{account_id}/datasets/{dataset_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dataset_id: schema: - type: string required: true description: The Dataset Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. 
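# Illustrative request (comment only, not part of the generated spec): dataset
# metadata can be read with a plain GET; readMask (assumed here to accept a
# comma-separated field list) can narrow the returned fields:
#
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/datasets/my-dataset?readMask=state,exampleCount"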
header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - type: string createTime: allOf: - type: string format: date-time readOnly: true state: allOf: - $ref: '#/components/schemas/gatewayDatasetState' readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' readOnly: true exampleCount: allOf: - type: string format: int64 userUploaded: allOf: - $ref: '#/components/schemas/gatewayUserUploaded' evaluationResult: allOf: - $ref: '#/components/schemas/gatewayEvaluationResult' transformed: allOf: - $ref: '#/components/schemas/gatewayTransformed' splitted: allOf: - $ref: '#/components/schemas/gatewaySplitted' evalProtocol: allOf: - $ref: '#/components/schemas/gatewayEvalProtocol' externalUrl: allOf: - type: string title: >- The external URI of the dataset. e.g. gs://foo/bar/baz.jsonl format: allOf: - $ref: '#/components/schemas/DatasetFormat' createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the dataset. readOnly: true sourceJobName: allOf: - type: string description: >- The resource name of the job that created this dataset (e.g., batch inference job). Used for lineage tracking to understand dataset provenance. estimatedTokenCount: allOf: - type: string format: int64 description: The estimated number of tokens in the dataset. readOnly: true title: 'Next ID: 23' refIdentifier: '#/components/schemas/gatewayDataset' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: exampleCount: userUploaded: {} evaluationResult: evaluationJobId: transformed: sourceDatasetId: filter: originalFormat: FORMAT_UNSPECIFIED splitted: sourceDatasetId: evalProtocol: {} externalUrl: format: FORMAT_UNSPECIFIED createdBy: updateTime: '2023-11-07T05:31:56Z' sourceJobName: estimatedTokenCount: description: A successful response. deprecated: false type: path components: schemas: DatasetFormat: type: string enum: - FORMAT_UNSPECIFIED - CHAT - COMPLETION - RL default: FORMAT_UNSPECIFIED gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. 
For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. 
HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDatasetState: type: string enum: - STATE_UNSPECIFIED - UPLOADING - READY default: STATE_UNSPECIFIED gatewayEvalProtocol: type: object gatewayEvaluationResult: type: object properties: evaluationJobId: type: string required: - evaluationJobId gatewaySplitted: type: object properties: sourceDatasetId: type: string required: - sourceDatasetId gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayTransformed: type: object properties: sourceDatasetId: type: string filter: type: string originalFormat: $ref: '#/components/schemas/DatasetFormat' required: - sourceDatasetId gatewayUserUploaded: type: object ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployed-model.md # Source: https://docs.fireworks.ai/api-reference/get-deployed-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployed-model.md # Source: https://docs.fireworks.ai/api-reference/get-deployed-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployed-model.md # Source: https://docs.fireworks.ai/api-reference/get-deployed-model.md # Get LoRA ## OpenAPI ````yaml get /v1/accounts/{account_id}/deployedModels/{deployed_model_id} paths: path: /v1/accounts/{account_id}/deployedModels/{deployed_model_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id deployed_model_id: schema: - type: string required: true description: The Deployed Model Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name. e.g. accounts/my-account/deployedModels/my-deployed-model readOnly: true displayName: allOf: - type: string description: allOf: - type: string description: Description of the resource. createTime: allOf: - type: string format: date-time description: The creation time of the resource. readOnly: true model: allOf: - type: string title: |- The resource name of the model to be deployed. e.g. 
accounts/my-account/models/my-model deployment: allOf: - type: string description: >- The resource name of the base deployment the model is deployed to. default: allOf: - type: boolean description: >- If true, this is the default target when querying this model without the `#` suffix. The first deployment a model is deployed to will have this field set to true. state: allOf: - $ref: '#/components/schemas/gatewayDeployedModelState' description: The state of the deployed model. readOnly: true serverless: allOf: - type: boolean title: True if the underlying deployment is managed by Fireworks status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: Contains model deploy/undeploy details. readOnly: true public: allOf: - type: boolean description: If true, the deployed model will be publicly reachable. updateTime: allOf: - type: string format: date-time description: The update time for the deployed model. readOnly: true title: 'Next ID: 20' refIdentifier: '#/components/schemas/gatewayDeployedModel' examples: example: value: name: displayName: description: createTime: '2023-11-07T05:31:56Z' model: deployment: default: true state: STATE_UNSPECIFIED serverless: true status: code: OK message: public: true updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. 
`PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDeployedModelState: type: string enum: - STATE_UNSPECIFIED - UNDEPLOYING - DEPLOYING - DEPLOYED - UPDATING default: STATE_UNSPECIFIED description: |- - UNDEPLOYING: The model is being undeployed. - DEPLOYING: The model is being deployed. - DEPLOYED: The model is deployed and ready for inference. - UPDATING: there are updates happening with the deployed model title: 'Next ID: 6' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployment-shape-version.md # Source: https://docs.fireworks.ai/api-reference/get-deployment-shape-version.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployment-shape-version.md # Source: https://docs.fireworks.ai/api-reference/get-deployment-shape-version.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployment-shape-version.md # Source: https://docs.fireworks.ai/api-reference/get-deployment-shape-version.md # Get Deployment Shape Version ## OpenAPI ````yaml get /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}/versions/{version_id} paths: path: >- /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}/versions/{version_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id deployment_shape_id: schema: - type: string required: true description: The Deployment Shape Id version_id: schema: - type: string required: true description: The Version Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the deployment shape version. e.g. accounts/my-account/deploymentShapes/my-deployment-shape/versions/{version_id} readOnly: true createTime: allOf: - type: string format: date-time description: >- The creation time of the deployment shape version. Lists will be ordered by this field. readOnly: true snapshot: allOf: - $ref: '#/components/schemas/gatewayDeploymentShape' description: Full snapshot of the Deployment Shape at this version. readOnly: true validated: allOf: - type: boolean description: If true, this version has been validated. public: allOf: - type: boolean description: If true, this version will be publicly readable. latestValidated: allOf: - type: boolean description: >- If true, this version is the latest validated version. Only one version of the shape can be the latest validated version. readOnly: true title: >- A deployment shape version is a specific version of a deployment shape. Versions are immutable, only created on updates and deleted when the deployment shape is deleted. 
Next ID: 9 refIdentifier: '#/components/schemas/gatewayDeploymentShapeVersion' examples: example: value: name: createTime: '2023-11-07T05:31:56Z' snapshot: name: displayName: description: createTime: '2023-11-07T05:31:56Z' updateTime: '2023-11-07T05:31:56Z' baseModel: modelType: parameterCount: acceleratorCount: 123 acceleratorType: ACCELERATOR_TYPE_UNSPECIFIED precision: PRECISION_UNSPECIFIED enableAddons: true draftTokenCount: 123 draftModel: ngramSpeculationLength: 123 enableSessionAffinity: true numLoraDeviceCached: 123 presetType: PRESET_TYPE_UNSPECIFIED validated: true public: true latestValidated: true description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve DeploymentShapePresetType: type: string enum: - PRESET_TYPE_UNSPECIFIED - MINIMAL - FAST - THROUGHPUT - REINFORCEMENT_FINE_TUNING default: PRESET_TYPE_UNSPECIFIED gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB default: ACCELERATOR_TYPE_UNSPECIFIED gatewayDeploymentShape: type: object properties: name: type: string title: >- The resource name of the deployment shape. e.g. accounts/my-account/deploymentShapes/my-deployment-shape readOnly: true displayName: type: string description: >- Human-readable display name of the deployment shape. e.g. "My Deployment Shape" Must be fewer than 64 characters long. description: type: string description: >- The description of the deployment shape. Must be fewer than 1000 characters long. createTime: type: string format: date-time description: The creation time of the deployment shape. readOnly: true updateTime: type: string format: date-time description: The update time for the deployment shape. readOnly: true baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b modelType: type: string description: The model type of the base model. readOnly: true parameterCount: type: string format: int64 description: The parameter count of the base model . readOnly: true acceleratorCount: type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. enableAddons: type: boolean description: >- If true, LORA addons are enabled for deployments created from this shape. draftTokenCount: type: integer format: int32 description: |- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. draftModel: type: string description: >- The draft model name for speculative decoding. e.g. accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. 
this behavior. ngramSpeculationLength: type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: type: boolean description: Whether to apply sticky routing based on `user` field. numLoraDeviceCached: type: integer format: int32 title: How many LORA adapters to keep on GPU side for caching readOnly: true presetType: $ref: '#/components/schemas/DeploymentShapePresetType' description: Type of deployment shape for different deployment configurations. title: >- A deployment shape is a set of parameters that define the shape of a deployment. Deployments are created from a deployment shape. Next ID: 33 required: - baseModel ```` --- # Source: https://docs.fireworks.ai/api-reference/get-deployment-shape.md # Get Deployment Shape ## OpenAPI ````yaml get /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id} openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}: get: tags: - Gateway summary: Get Deployment Shape operationId: Gateway_GetDeploymentShape parameters: - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string - name: deployment_shape_id in: path required: true description: The Deployment Shape Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayDeploymentShape' components: schemas: gatewayDeploymentShape: type: object properties: name: type: string title: >- The resource name of the deployment shape. e.g. accounts/my-account/deploymentShapes/my-deployment-shape readOnly: true displayName: type: string description: >- Human-readable display name of the deployment shape. e.g. "My Deployment Shape" Must be fewer than 64 characters long. description: type: string description: >- The description of the deployment shape. Must be fewer than 1000 characters long. createTime: type: string format: date-time description: The creation time of the deployment shape. readOnly: true updateTime: type: string format: date-time description: The update time for the deployment shape. readOnly: true baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b modelType: type: string description: The model type of the base model. readOnly: true parameterCount: type: string format: int64 description: The parameter count of the base model . readOnly: true acceleratorCount: type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. disableDeploymentSizeValidation: type: boolean description: If true, the deployment size validation is disabled. enableAddons: type: boolean description: >- If true, LORA addons are enabled for deployments created from this shape. 
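# Editor's note (illustrative sketch, not part of the generated spec): a minimal example of calling the
# Get Deployment Shape endpoint documented above, assuming placeholder IDs "my-account" and
# "my-deployment-shape" and an API key in a FIREWORKS_API_KEY environment variable:
#   curl -s "https://api.fireworks.ai/v1/accounts/my-account/deploymentShapes/my-deployment-shape" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY"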
draftTokenCount: type: integer format: int32 description: |- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. draftModel: type: string description: >- The draft model name for speculative decoding. e.g. accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. this behavior. ngramSpeculationLength: type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: type: boolean description: Whether to apply sticky routing based on `user` field. numLoraDeviceCached: type: integer format: int32 title: How many LORA adapters to keep on GPU side for caching presetType: $ref: '#/components/schemas/DeploymentShapePresetType' description: Type of deployment shape for different deployment configurations. title: >- A deployment shape is a set of parameters that define the shape of a deployment. Deployments are created from a deployment shape. Next ID: 34 required: - baseModel gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB - AMD_MI350X_288GB default: ACCELERATOR_TYPE_UNSPECIFIED title: 'Next ID: 11' DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve DeploymentShapePresetType: type: string enum: - PRESET_TYPE_UNSPECIFIED - MINIMAL - FAST - THROUGHPUT - FULL_PRECISION default: PRESET_TYPE_UNSPECIFIED title: |- - MINIMAL: Preset for cheapest & most minimal type of deployment - FAST: Preset for fastest generation & TTFT deployment - THROUGHPUT: Preset for best throughput deployment - FULL_PRECISION: Preset for deployment with full precision for training & most accurate numerics securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployment.md # Source: https://docs.fireworks.ai/api-reference/get-deployment.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployment.md # Source: https://docs.fireworks.ai/api-reference/get-deployment.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-deployment.md # Source: https://docs.fireworks.ai/api-reference/get-deployment.md # Get Deployment ## OpenAPI ````yaml get /v1/accounts/{account_id}/deployments/{deployment_id} paths: path: /v1/accounts/{account_id}/deployments/{deployment_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id deployment_id: schema: - type: string required: true description: The Deployment Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the deployment. e.g. accounts/my-account/deployments/my-deployment readOnly: true displayName: allOf: - type: string description: >- Human-readable display name of the deployment. e.g. "My Deployment" Must be fewer than 64 characters long. description: allOf: - type: string description: Description of the deployment. createTime: allOf: - type: string format: date-time description: The creation time of the deployment. readOnly: true expireTime: allOf: - type: string format: date-time description: >- The time at which this deployment will automatically be deleted. purgeTime: allOf: - type: string format: date-time description: The time at which the resource will be hard deleted. readOnly: true deleteTime: allOf: - type: string format: date-time description: The time at which the resource will be soft deleted. readOnly: true state: allOf: - $ref: '#/components/schemas/gatewayDeploymentState' description: The state of the deployment. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: >- Detailed status information regarding the most recent operation. readOnly: true minReplicaCount: allOf: - type: integer format: int32 description: |- The minimum number of replicas. If not specified, the default is 0. maxReplicaCount: allOf: - type: integer format: int32 description: >- The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0. desiredReplicaCount: allOf: - type: integer format: int32 description: >- The desired number of replicas for this deployment. This represents the target replica count that the system is trying to achieve. readOnly: true replicaCount: allOf: - type: integer format: int32 readOnly: true autoscalingPolicy: allOf: - $ref: '#/components/schemas/gatewayAutoscalingPolicy' baseModel: allOf: - type: string title: >- The base model name. e.g. accounts/fireworks/models/falcon-7b acceleratorCount: allOf: - type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: allOf: - $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: allOf: - $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. cluster: allOf: - type: string description: >- If set, this deployment is deployed to a cloud-premise cluster. readOnly: true enableAddons: allOf: - type: boolean description: If true, PEFT addons are enabled for this deployment. draftTokenCount: allOf: - type: integer format: int32 description: >- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior. 
draftModel: allOf: - type: string description: >- The draft model name for speculative decoding. e.g. accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior. ngramSpeculationLength: allOf: - type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: allOf: - type: boolean description: Whether to apply sticky routing based on `user` field. directRouteApiKeys: allOf: - type: array items: type: string description: >- The set of API keys used to access the direct route deployment. If direct routing is not enabled, this field is unused. numPeftDeviceCached: allOf: - type: integer format: int32 title: How many peft adapters to keep on gpu side for caching readOnly: true directRouteType: allOf: - $ref: '#/components/schemas/gatewayDirectRouteType' description: >- If set, this deployment will expose an endpoint that bypasses the Fireworks API gateway. directRouteHandle: allOf: - type: string description: >- The handle for calling a direct route. The meaning of the handle depends on the direct route type of the deployment: INTERNET -> The host name for accessing the deployment GCP_PRIVATE_SERVICE_CONNECT -> The service attachment name used to create the PSC endpoint. AWS_PRIVATELINK -> The service name used to create the VPC endpoint. readOnly: true deploymentTemplate: allOf: - type: string description: >- The name of the deployment template to use for this deployment. Only available to enterprise accounts. autoTune: allOf: - $ref: '#/components/schemas/gatewayAutoTune' description: The performance profile to use for this deployment. placement: allOf: - $ref: '#/components/schemas/gatewayPlacement' description: >- The desired geographic region where the deployment must be placed. If unspecified, the default is the GLOBAL multi-region. region: allOf: - $ref: '#/components/schemas/gatewayRegion' description: >- The geographic region where the deployment is presently located. This region may change over time, but within the `placement` constraint. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the deployment. readOnly: true disableDeploymentSizeValidation: allOf: - type: boolean description: Whether the deployment size validation is disabled. enableMtp: allOf: - type: boolean description: If true, MTP is enabled for this deployment. enableHotReloadLatestAddon: allOf: - type: boolean description: >- Allows up to 1 addon at a time to be loaded, and will merge it into the base model. deploymentShape: allOf: - type: string description: >- The name of the deployment shape that this deployment is using. On the server side, this will be replaced with the deployment shape version name. activeModelVersion: allOf: - type: string description: >- The model version that is currently active and applied to running replicas of a deployment. targetModelVersion: allOf: - type: string description: >- The target model version that is being rolled out to the deployment. In a ready steady state, the target model version is the same as the active model version. 
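# Editor's note (illustrative sketch, not part of the generated spec): a minimal example of fetching a
# deployment and narrowing the response with the optional readMask query parameter (assuming the usual
# comma-separated field-mask form); "my-account" and "my-deployment" are placeholders:
#   curl -s "https://api.fireworks.ai/v1/accounts/my-account/deployments/my-deployment?readMask=name,state,replicaCount" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY"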
title: 'Next ID: 82' refIdentifier: '#/components/schemas/gatewayDeployment' requiredProperties: - baseModel examples: example: value: name: displayName: description: createTime: '2023-11-07T05:31:56Z' expireTime: '2023-11-07T05:31:56Z' purgeTime: '2023-11-07T05:31:56Z' deleteTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: minReplicaCount: 123 maxReplicaCount: 123 desiredReplicaCount: 123 replicaCount: 123 autoscalingPolicy: scaleUpWindow: scaleDownWindow: scaleToZeroWindow: loadTargets: {} baseModel: acceleratorCount: 123 acceleratorType: ACCELERATOR_TYPE_UNSPECIFIED precision: PRECISION_UNSPECIFIED cluster: enableAddons: true draftTokenCount: 123 draftModel: ngramSpeculationLength: 123 enableSessionAffinity: true directRouteApiKeys: - numPeftDeviceCached: 123 directRouteType: DIRECT_ROUTE_TYPE_UNSPECIFIED directRouteHandle: deploymentTemplate: autoTune: longPrompt: true placement: region: REGION_UNSPECIFIED multiRegion: MULTI_REGION_UNSPECIFIED regions: - REGION_UNSPECIFIED region: REGION_UNSPECIFIED updateTime: '2023-11-07T05:31:56Z' disableDeploymentSizeValidation: true enableMtp: true enableHotReloadLatestAddon: true deploymentShape: activeModelVersion: targetModelVersion: description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB default: ACCELERATOR_TYPE_UNSPECIFIED gatewayAutoTune: type: object properties: longPrompt: type: boolean description: If true, this deployment is optimized for long prompt lengths. gatewayAutoscalingPolicy: type: object properties: scaleUpWindow: type: string description: >- The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s. scaleDownWindow: type: string description: >- The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m. scaleToZeroWindow: type: string description: >- The duration after which there are no requests that the deployment will be scaled down to zero replicas, if min_replica_count==0. Default is 1h. This must be at least 5 minutes. loadTargets: type: object additionalProperties: type: number format: float title: >- Map of load metric names to their target utilization factors. Currently only the "default" key is supported, which specifies the default target for all metrics. If not specified, the default target is 0.8 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. 
For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. 
E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDeploymentState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED - UPDATING - DELETED default: STATE_UNSPECIFIED description: |2- - CREATING: The deployment is still being created. - READY: The deployment is ready to be used. - DELETING: The deployment is being deleted. - FAILED: The deployment failed to be created. See the `status` field for additional details on why it failed. - UPDATING: There are in-progress updates happening with the deployment. - DELETED: The deployment is soft-deleted. gatewayDirectRouteType: type: string enum: - DIRECT_ROUTE_TYPE_UNSPECIFIED - INTERNET - GCP_PRIVATE_SERVICE_CONNECT - AWS_PRIVATELINK default: DIRECT_ROUTE_TYPE_UNSPECIFIED title: |- - DIRECT_ROUTE_TYPE_UNSPECIFIED: No direct routing - INTERNET: The direct route is exposed via the public internet - GCP_PRIVATE_SERVICE_CONNECT: The direct route is exposed via GCP Private Service Connect - AWS_PRIVATELINK: The direct route is exposed via AWS PrivateLink gatewayMultiRegion: type: string enum: - MULTI_REGION_UNSPECIFIED - GLOBAL - US default: MULTI_REGION_UNSPECIFIED gatewayPlacement: type: object properties: region: $ref: '#/components/schemas/gatewayRegion' description: The region where the deployment must be placed. multiRegion: $ref: '#/components/schemas/gatewayMultiRegion' description: The multi-region where the deployment must be placed. regions: type: array items: $ref: '#/components/schemas/gatewayRegion' title: The list of regions where the deployment must be placed description: >- The desired geographic region where the deployment must be placed. Exactly one field will be specified. 
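# Editor's note (illustrative): per the gatewayPlacement description above, exactly one placement field is
# set, e.g. {"region": "US_IOWA_1"}, {"multiRegion": "GLOBAL"}, or {"regions": ["US_IOWA_1", "US_VIRGINIA_1"]}.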
gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference/get-dpo-job-metrics-file-endpoint.md # null ## OpenAPI ````yaml get /v1/accounts/{account_id}/dpoJobs/{dpo_job_id}:getMetricsFileEndpoint paths: path: /v1/accounts/{account_id}/dpoJobs/{dpo_job_id}:getMetricsFileEndpoint method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dpo_job_id: schema: - type: string required: true description: The Dpo Job Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: signedUrl: allOf: - type: string title: The signed URL for the metrics file title: |- when the JobMetrics file has been created for the DPO job and the file exists, we will populate this field empty otherwise refIdentifier: '#/components/schemas/gatewayGetDpoJobMetricsFileResponse' examples: example: value: signedUrl: description: A successful response. 
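# Editor's note (illustrative sketch, not part of the generated spec): a minimal example of requesting the
# DPO job metrics file endpoint and then downloading from the returned signedUrl when it is non-empty;
# "my-account" and "my-dpo-job" are placeholders:
#   curl -s "https://api.fireworks.ai/v1/accounts/my-account/dpoJobs/my-dpo-job:getMetricsFileEndpoint" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY"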
deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/get-dpo-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/get-dpo-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-dpo-job.md # Source: https://docs.fireworks.ai/api-reference/get-dpo-job.md # null ## OpenAPI ````yaml get /v1/accounts/{account_id}/dpoJobs/{dpo_job_id} paths: path: /v1/accounts/{account_id}/dpoJobs/{dpo_job_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id dpo_job_id: schema: - type: string required: true description: The Dpo Job Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - type: string createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time readOnly: true dataset: allOf: - type: string description: The name of the dataset used for training. state: allOf: - $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: allOf: - type: string description: The email address of the user who initiated this dpo job. readOnly: true trainingConfig: allOf: - $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. wandbConfig: allOf: - $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging job progress. title: 'Next ID: 13' refIdentifier: '#/components/schemas/gatewayDpoJob' requiredProperties: - dataset examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 wandbConfig: enabled: true apiKey: project: entity: runId: url: description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. 
If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-environment.md # Get Environment ## OpenAPI ````yaml get /v1/accounts/{account_id}/environments/{environment_id} paths: path: /v1/accounts/{account_id}/environments/{environment_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id environment_id: schema: - type: string required: true description: The Environment Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the environment. e.g. accounts/my-account/clusters/my-cluster/environments/my-env readOnly: true displayName: allOf: - type: string title: >- Human-readable display name of the environment. e.g. "My Environment" createTime: allOf: - type: string format: date-time description: The creation time of the environment. readOnly: true createdBy: allOf: - type: string description: >- The email address of the user who created this environment. readOnly: true state: allOf: - $ref: '#/components/schemas/gatewayEnvironmentState' description: The current state of the environment. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: The current error status of the environment. readOnly: true connection: allOf: - $ref: '#/components/schemas/gatewayEnvironmentConnection' description: Information about the current environment connection. readOnly: true baseImageRef: allOf: - type: string description: >- The URI of the base container image used for this environment. imageRef: allOf: - type: string description: >- The URI of the container image used for this environment. This is a image is an immutable snapshot of the base_image_ref when the environment was created. readOnly: true snapshotImageRef: allOf: - type: string description: >- The URI of the latest container image snapshot for this environment. readOnly: true shared: allOf: - type: boolean description: >- Whether the environment is shared with all users in the account. This allows all users to connect, disconnect, update, delete, clone, and create batch jobs using the environment. annotations: allOf: - type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. updateTime: allOf: - type: string format: date-time description: The update time for the environment. readOnly: true title: 'Next ID: 14' refIdentifier: '#/components/schemas/gatewayEnvironment' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' createdBy: state: STATE_UNSPECIFIED status: code: OK message: connection: nodePoolId: numRanks: 123 role: zone: useLocalStorage: true baseImageRef: imageRef: snapshotImageRef: shared: true annotations: {} updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. 
HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. 
HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEnvironmentConnection: type: object properties: nodePoolId: type: string description: The resource id of the node pool the environment is connected to. numRanks: type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank. If not specified, the default is 1. role: type: string description: |- The ARN of the AWS IAM role that the connection should assume. If not specified, the connection will fall back to the node pool's node_role. zone: type: string description: >- Current for the last zone that this environment is connected to. We want to warn the users about cross zone migration latency when they are connecting to node pool in a different zone as their persistent volume. readOnly: true useLocalStorage: type: boolean description: >- If true, the node's local storage will be mounted on /tmp. This flag has no effect if the node does not have local storage. title: 'Next ID: 8' required: - nodePoolId gatewayEnvironmentState: type: string enum: - STATE_UNSPECIFIED - CREATING - DISCONNECTED - CONNECTING - CONNECTED - DISCONNECTING - RECONNECTING - DELETING default: STATE_UNSPECIFIED description: |- - CREATING: The environment is being created. - DISCONNECTED: The environment is not connected. - CONNECTING: The environment is being connected to a node. - CONNECTED: The environment is connected to a node. - DISCONNECTING: The environment is being disconnected from a node. - RECONNECTING: The environment is reconnecting with new connection parameters. - DELETING: The environment is being deleted. title: 'Next ID: 8' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
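# Illustrative example only (assumption, not part of the generated spec): a populated
# error status on an environment might look like
#   status: { code: "FAILED_PRECONDITION", message: "environment is not connected to a node pool" }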
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference/get-evaluation-job-log-endpoint.md # Get Evaluation Job execution logs (stream log endpoint + tracing IDs). ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluationJobs/{evaluation_job_id}:getExecutionLogEndpoint openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluationJobs/{evaluation_job_id}:getExecutionLogEndpoint: get: tags: - Gateway summary: Get Evaluation Job execution logs (stream log endpoint + tracing IDs). operationId: Gateway_GetEvaluationJobExecutionLogEndpoint parameters: - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluation_job_id in: path required: true description: The Evaluation Job Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: >- #/components/schemas/gatewayGetEvaluationJobExecutionLogEndpointResponse components: schemas: gatewayGetEvaluationJobExecutionLogEndpointResponse: type: object properties: executionLogSignedUri: type: string description: >- Short-lived signed URL for the execution log file. Empty if the log file has not been created yet (e.g. job not started or still initializing). contentType: type: string description: |- Content type for the log file (e.g. "text/plain"). Only set when execution_log_signed_uri is present. expireTime: type: string format: date-time description: |- Expiration time of the signed URL. Only set when execution_log_signed_uri is present. description: |- Response carries the stream log URL (for VirtualizedLogViewer). Next ID: 4 securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/get-evaluation-job.md # Get Evaluation Job ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluationJobs/{evaluation_job_id} openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluationJobs/{evaluation_job_id}: get: tags: - Gateway summary: Get Evaluation Job operationId: Gateway_GetEvaluationJob parameters: - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluation_job_id in: path required: true description: The Evaluation Job Id schema: type: string responses: '200': description: A successful response. 
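# Illustrative usage sketch (assumption, not part of the generated spec): poll this endpoint
# and check `state` until it reaches JOB_STATE_COMPLETED (or JOB_STATE_FAILED), e.g.
#   curl -H "Authorization: Bearer $API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/<account_id>/evaluationJobs/<evaluation_job_id>"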
content: application/json: schema: $ref: '#/components/schemas/gatewayEvaluationJob' components: schemas: gatewayEvaluationJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true createdBy: type: string readOnly: true state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true evaluator: type: string description: >- The fully-qualified resource name of the Evaluation used by this job. Format: accounts/{account_id}/evaluators/{evaluator_id} inputDataset: type: string description: >- The fully-qualified resource name of the input Dataset used by this job. Format: accounts/{account_id}/datasets/{dataset_id} outputDataset: type: string description: >- The fully-qualified resource name of the output Dataset created by this job. Format: accounts/{account_id}/datasets/{output_dataset_id} metrics: type: object additionalProperties: type: number format: double readOnly: true outputStats: type: string description: The output dataset's aggregated stats for the evaluation job. updateTime: type: string format: date-time description: The update time for the evaluation job. readOnly: true title: 'Next ID: 18' required: - evaluator - inputDataset - outputDataset gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED - JOB_STATE_PAUSED default: JOB_STATE_UNSPECIFIED description: |- JobState represents the state an asynchronous job can be in. - JOB_STATE_PAUSED: Job is paused, typically due to account suspension or manual intervention. gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. 
For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. 
HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/get-evaluator-build-log-endpoint.md # Get Evaluator Build Log Endpoint > Returns a signed URL to download the evaluator's build logs. Useful for debugging `BUILD_FAILED` state. ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluators/{evaluator_id}:getBuildLogEndpoint openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluators/{evaluator_id}:getBuildLogEndpoint: get: tags: - Gateway summary: Get Evaluator Build Log Endpoint description: |- Returns a signed URL to download the evaluator's build logs. Useful for debugging `BUILD_FAILED` state. operationId: Gateway_GetEvaluatorBuildLogEndpoint parameters: - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluator_id in: path required: true description: The Evaluator Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: >- #/components/schemas/gatewayGetEvaluatorBuildLogEndpointResponse components: schemas: gatewayGetEvaluatorBuildLogEndpointResponse: type: object properties: buildLogSignedUri: type: string title: Signed URL for the build log securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-evaluator-revision.md # firectl get evaluator-revision > Get an evaluator revision ``` firectl get evaluator-revision [flags] ``` ### Examples ``` firectl get evaluator-revision accounts/my-account/evaluators/my-evaluator/versions/latest ``` ### Flags ``` -h, --help help for evaluator-revision ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. 
--dry-run Print the request proto without running it. -o, --output Output Set the output format to "text", "json", or "flag". (default text) -p, --profile string fireworks auth and settings profile to use. ``` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/get-evaluator-source-code-endpoint.md # Get Evaluator Source Code Endpoint > Returns a signed URL to download the evaluator's source code archive. Useful for debugging or reviewing the uploaded code. ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluators/{evaluator_id}:getSourceCodeSignedUrl openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluators/{evaluator_id}:getSourceCodeSignedUrl: get: tags: - Gateway summary: Get Evaluator Source Code Endpoint description: |- Returns a signed URL to download the evaluator's source code archive. Useful for debugging or reviewing the uploaded code. operationId: Gateway_GetEvaluatorSourceCodeEndpoint parameters: - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluator_id in: path required: true description: The Evaluator Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: >- #/components/schemas/gatewayGetEvaluatorSourceCodeEndpointResponse components: schemas: gatewayGetEvaluatorSourceCodeEndpointResponse: type: object properties: filenameToSignedUrls: type: object additionalProperties: type: string title: Mapping from filename to signed URL for downloading the source code securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/get-evaluator-upload-endpoint.md # Get Evaluator Upload Endpoint > Returns signed URLs for uploading evaluator source code (**step 3** in the [Create Evaluator](/api-reference/create-evaluator) workflow). After receiving the signed URL, upload your `.tar.gz` archive using HTTP `PUT` with `Content-Type: application/octet-stream` header. ## OpenAPI ````yaml post /v1/accounts/{account_id}/evaluators/{evaluator_id}:getUploadEndpoint openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluators/{evaluator_id}:getUploadEndpoint: post: tags: - Gateway summary: Get Evaluator Upload Endpoint description: >- Returns signed URLs for uploading evaluator source code (**step 3** in the [Create Evaluator](/api-reference/create-evaluator) workflow). After receiving the signed URL, upload your `.tar.gz` archive using HTTP `PUT` with `Content-Type: application/octet-stream` header. 
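# Illustrative upload step (sketch, not part of the generated spec): once a signed URL is
# returned in `filenameToSignedUrls`, the archive can be uploaded with, e.g.,
#   curl -X PUT "<signed_url>" \
#     -H "Content-Type: application/octet-stream" \
#     --data-binary @evaluator.tar.gz
# where evaluator.tar.gz is a placeholder for your source archive.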
operationId: Gateway_GetEvaluatorUploadEndpoint parameters: - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluator_id in: path required: true description: The Evaluator Id schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/GatewayGetEvaluatorUploadEndpointBody' required: true responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayGetEvaluatorUploadEndpointResponse' components: schemas: GatewayGetEvaluatorUploadEndpointBody: type: object properties: filenameToSize: type: object additionalProperties: type: string format: int64 readMask: type: string required: - filenameToSize gatewayGetEvaluatorUploadEndpointResponse: type: object properties: filenameToSignedUrls: type: object additionalProperties: type: string securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/get-evaluator.md # Get Evaluator > Retrieves an evaluator by name. Use this to monitor build progress after creation (**step 6** in the [Create Evaluator](/api-reference/create-evaluator) workflow). Possible states: - `BUILDING` - Environment is being prepared - `ACTIVE` - Evaluator is ready to use - `BUILD_FAILED` - Check build logs via [Get Evaluator Build Log Endpoint](/api-reference/get-evaluator-build-log-endpoint) ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluators/{evaluator_id} openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluators/{evaluator_id}: get: tags: - Gateway summary: Get Evaluator description: >- Retrieves an evaluator by name. Use this to monitor build progress after creation (**step 6** in the [Create Evaluator](/api-reference/create-evaluator) workflow). Possible states: - `BUILDING` - Environment is being prepared - `ACTIVE` - Evaluator is ready to use - `BUILD_FAILED` - Check build logs via [Get Evaluator Build Log Endpoint](/api-reference/get-evaluator-build-log-endpoint) operationId: Gateway_GetEvaluator parameters: - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string - name: evaluator_id in: path required: true description: The Evaluator Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayEvaluator' components: schemas: gatewayEvaluator: type: object properties: name: type: string readOnly: true displayName: type: string description: type: string createTime: type: string format: date-time readOnly: true createdBy: type: string readOnly: true updateTime: type: string format: date-time readOnly: true state: $ref: '#/components/schemas/gatewayEvaluatorState' readOnly: true requirements: type: string title: Content for the requirements.txt for package installation entryPoint: type: string title: >- entry point of the evaluator inside the codebase. 
In module::function or path::function format status: $ref: '#/components/schemas/gatewayStatus' title: Status of the evaluator, used to expose build status to the user readOnly: true commitHash: type: string title: Commit hash of this evaluator from the user's original codebase source: $ref: '#/components/schemas/gatewayEvaluatorSource' description: Source information for the evaluator codebase. defaultDataset: type: string title: Default dataset that is associated with the evaluator title: 'Next ID: 17' gatewayEvaluatorState: type: string enum: - STATE_UNSPECIFIED - ACTIVE - BUILDING - BUILD_FAILED default: STATE_UNSPECIFIED title: |- - ACTIVE: The evaluator is ready to use for evaluation - BUILDING: The evaluator is being built, i.e. building the e2b template - BUILD_FAILED: The evaluator build failed, and it cannot be used for evaluation gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayEvaluatorSource: type: object properties: type: $ref: '#/components/schemas/EvaluatorSourceType' description: Identifies how the evaluator source code is provided. githubRepositoryName: type: string description: >- Normalized GitHub repository name (e.g. owner/repository) when the source is GitHub. gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. 
`PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] EvaluatorSourceType: type: string enum: - TYPE_UNSPECIFIED - TYPE_UPLOAD - TYPE_GITHUB - TYPE_TEMPORARY default: TYPE_UNSPECIFIED title: |- - TYPE_UPLOAD: Source code is uploaded by the user - TYPE_GITHUB: Source code is from a GitHub repository - TYPE_TEMPORARY: Source code is temporary code uploaded via the UI securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-feature-flag.md # firectl get feature-flag > Gets a feature flag. ``` firectl get feature-flag [flags] ``` ### Examples ``` firectl get feature-flag my-account my-feature-flag ``` ### Flags ``` -h, --help help for feature-flag ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --dry-run Print the request proto without running it. -o, --output Output Set the output format to "text", "json", or "flag". (default text) -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/get-generated-image-from-flux-kontex.md # Get generated image from FLUX.1 Kontext Replace **model** with **flux-kontext-pro** or **flux-kontext-max** in the API to get the result. ## Path The model to use for image generation. Use **flux-kontext-pro** or **flux-kontext-max** as the model name in the API. ## Headers The media type of the request body. Your Fireworks API key. ## Request Body Request id generated from create/edit image request. ```python Python theme={null} import requests url = "https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/{model}/get_result" headers = { "Content-Type": "application/json", "Authorization": "Bearer $API_KEY", } data = {"id": "request_id"} response = requests.post(url, headers=headers, json=data) print(response.text) ``` ```typescript TypeScript theme={null} import fetch from "node-fetch"; (async () => { const response = await fetch("https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/{model}/get_result", { method: "POST", headers: { "Content-Type": "application/json", "Authorization": "Bearer $API_KEY" }, body: JSON.stringify({ id: "request_id" }), }); })().catch(console.error); ``` ```shell curl theme={null} curl --request POST \ -S --fail-with-body \ --url https://api.fireworks.ai/inference/v1/workflows/accounts/fireworks/models/{model}/get_result \ -H 'Content-Type: application/json' \ -H "Authorization: Bearer $API_KEY" \ --data '{"id": "request_id"}' ``` ## Response Task id for retrieving result Available options: Task not found, Pending, Request Moderated, Content Moderated, Ready, Error --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-identity-provider.md # firectl get identity-provider > Prints information about an identity provider.
``` firectl get identity-provider [flags] ``` ### Examples ``` firectl get identity-provider my-provider ``` ### Flags ``` -h, --help help for identity-provider ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --dry-run Print the request proto without running it. -o, --output Output Set the output format to "text", "json", or "flag". (default text) -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/get-model-download-endpoint.md # Get Model Download Endpoint ## OpenAPI ````yaml get /v1/accounts/{account_id}/models/{model_id}:getDownloadEndpoint paths: path: /v1/accounts/{account_id}/models/{model_id}:getDownloadEndpoint method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id model_id: schema: - type: string required: true description: The Model Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: filenameToSignedUrls: allOf: - type: object additionalProperties: type: string title: Signed URLs for for downloading model files refIdentifier: '#/components/schemas/gatewayGetModelDownloadEndpointResponse' examples: example: value: filenameToSignedUrls: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference/get-model-upload-endpoint.md # Get Model Upload Endpoint ## OpenAPI ````yaml post /v1/accounts/{account_id}/models/{model_id}:getUploadEndpoint paths: path: /v1/accounts/{account_id}/models/{model_id}:getUploadEndpoint method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id model_id: schema: - type: string required: true description: The Model Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: filenameToSize: allOf: - type: object additionalProperties: type: string format: int64 description: A mapping from the file name to its size in bytes. enableResumableUpload: allOf: - type: boolean description: If true, enable resumable upload instead of PUT. readMask: allOf: - type: string description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. 
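# Illustrative request body (assumption, not part of the generated spec):
#   {"filenameToSize": {"model.safetensors": "4294967296"}, "enableResumableUpload": true}
# where the size is the file size in bytes, serialized as a string per the int64 JSON mapping,
# and "model.safetensors" is a placeholder filename.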
required: true refIdentifier: '#/components/schemas/GatewayGetModelUploadEndpointBody' requiredProperties: - filenameToSize examples: example: value: filenameToSize: {} enableResumableUpload: true readMask: response: '200': application/json: schemaArray: - type: object properties: filenameToSignedUrls: allOf: - type: object additionalProperties: type: string title: Signed URLs for uploading model files filenameToUnsignedUris: allOf: - type: object additionalProperties: type: string description: >- Unsigned URIs (e.g. s3://bucket/key, gs://bucket/key) for uploading model files. Returned when the caller has permission to upload to the URIs. refIdentifier: '#/components/schemas/gatewayGetModelUploadEndpointResponse' examples: example: value: filenameToSignedUrls: {} filenameToUnsignedUris: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-model.md # Source: https://docs.fireworks.ai/api-reference/get-model.md # Get Model ## OpenAPI ````yaml get /v1/accounts/{account_id}/models/{model_id} paths: path: /v1/accounts/{account_id}/models/{model_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id model_id: schema: - type: string required: true description: The Model Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the model. e.g. accounts/my-account/models/my-model readOnly: true displayName: allOf: - type: string description: |- Human-readable display name of the model. e.g. "My Model" Must be fewer than 64 characters long. description: allOf: - type: string description: >- The description of the model. Must be fewer than 1000 characters long. createTime: allOf: - type: string format: date-time description: The creation time of the model. readOnly: true state: allOf: - $ref: '#/components/schemas/gatewayModelState' description: The state of the model. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: >- Contains detailed message when the last model operation fails. readOnly: true kind: allOf: - $ref: '#/components/schemas/gatewayModelKind' description: |- The kind of model. If not specified, the default is HF_PEFT_ADDON. githubUrl: allOf: - type: string description: The URL to the GitHub repository of the model. huggingFaceUrl: allOf: - type: string description: The URL to the Hugging Face model. baseModelDetails: allOf: - $ref: '#/components/schemas/gatewayBaseModelDetails' description: >- Base model details. Required if kind is HF_BASE_MODEL. Must not be set otherwise. peftDetails: allOf: - $ref: '#/components/schemas/gatewayPEFTDetails' description: |- PEFT addon details.
Required if kind is HF_PEFT_ADDON or HF_TEFT_ADDON. teftDetails: allOf: - $ref: '#/components/schemas/gatewayTEFTDetails' description: >- TEFT addon details. Required if kind is HF_TEFT_ADDON. Must not be set otherwise. public: allOf: - type: boolean description: If true, the model will be publicly readable. conversationConfig: allOf: - $ref: '#/components/schemas/gatewayConversationConfig' description: >- If set, the Chat Completions API will be enabled for this model. contextLength: allOf: - type: integer format: int32 description: The maximum context length supported by the model. supportsImageInput: allOf: - type: boolean description: If set, images can be provided as input to the model. supportsTools: allOf: - type: boolean description: >- If set, tools (i.e. functions) can be provided as input to the model, and the model may respond with one or more tool calls. importedFrom: allOf: - type: string description: >- The name of the the model from which this was imported. This field is empty if the model was not imported. readOnly: true fineTuningJob: allOf: - type: string description: >- If the model was created from a fine-tuning job, this is the fine-tuning job name. readOnly: true defaultDraftModel: allOf: - type: string description: >- The default draft model to use when creating a deployment. If empty, speculative decoding is disabled by default. defaultDraftTokenCount: allOf: - type: integer format: int32 description: >- The default draft token count to use when creating a deployment. Must be specified if default_draft_model is specified. deployedModelRefs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayDeployedModelRef' description: Populated from GetModel API call only. readOnly: true cluster: allOf: - type: string description: >- The resource name of the BYOC cluster to which this model belongs. e.g. accounts/my-account/clusters/my-cluster. Empty if it belongs to a Fireworks cluster. readOnly: true deprecationDate: allOf: - $ref: '#/components/schemas/typeDate' description: >- If specified, this is the date when the serverless deployment of the model will be taken down. calibrated: allOf: - type: boolean description: >- If true, the model is calibrated and can be deployed to non-FP16 precisions. readOnly: true tunable: allOf: - type: boolean description: >- If true, the model can be fine-tuned. The value will be true if the tunable field is true, and the model is validated against the model_type field. readOnly: true supportsLora: allOf: - type: boolean description: Whether this model supports LoRA. useHfApplyChatTemplate: allOf: - type: boolean description: >- If true, the model will use the Hugging Face apply_chat_template API to apply the chat template. updateTime: allOf: - type: string format: date-time description: The update time for the model. readOnly: true defaultSamplingParams: allOf: - type: object additionalProperties: type: number format: float description: >- A json object that contains the default sampling parameters for the model. readOnly: true rlTunable: allOf: - type: boolean description: If true, the model is RL tunable. 
readOnly: true supportedPrecisions: allOf: - type: array items: $ref: '#/components/schemas/DeploymentPrecision' title: Supported precisions readOnly: true supportedPrecisionsWithCalibration: allOf: - type: array items: $ref: '#/components/schemas/DeploymentPrecision' title: Supported precisions if calibrated readOnly: true trainingContextLength: allOf: - type: integer format: int32 description: The maximum context length supported by the model. snapshotType: allOf: - $ref: '#/components/schemas/ModelSnapshotType' title: 'Next ID: 56' refIdentifier: '#/components/schemas/gatewayModel' examples: example: value: name: displayName: description: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: kind: KIND_UNSPECIFIED githubUrl: huggingFaceUrl: baseModelDetails: worldSize: 123 checkpointFormat: CHECKPOINT_FORMAT_UNSPECIFIED parameterCount: moe: true tunable: true modelType: supportsFireattention: true defaultPrecision: PRECISION_UNSPECIFIED supportsMtp: true peftDetails: baseModel: r: 123 targetModules: - baseModelType: mergeAddonModelName: teftDetails: {} public: true conversationConfig: style: system: template: contextLength: 123 supportsImageInput: true supportsTools: true importedFrom: fineTuningJob: defaultDraftModel: defaultDraftTokenCount: 123 deployedModelRefs: - name: deployment: state: STATE_UNSPECIFIED default: true public: true cluster: deprecationDate: year: 123 month: 123 day: 123 calibrated: true tunable: true supportsLora: true useHfApplyChatTemplate: true updateTime: '2023-11-07T05:31:56Z' defaultSamplingParams: {} rlTunable: true supportedPrecisions: - PRECISION_UNSPECIFIED supportedPrecisionsWithCalibration: - PRECISION_UNSPECIFIED trainingContextLength: 123 snapshotType: FULL_SNAPSHOT description: A successful response. deprecated: false type: path components: schemas: BaseModelDetailsCheckpointFormat: type: string enum: - CHECKPOINT_FORMAT_UNSPECIFIED - NATIVE - HUGGINGFACE default: CHECKPOINT_FORMAT_UNSPECIFIED DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve ModelSnapshotType: type: string enum: - FULL_SNAPSHOT - INCREMENTAL_SNAPSHOT default: FULL_SNAPSHOT gatewayBaseModelDetails: type: object properties: worldSize: type: integer format: int32 description: |- The default number of GPUs the model is served with. If not specified, the default is 1. checkpointFormat: $ref: '#/components/schemas/BaseModelDetailsCheckpointFormat' parameterCount: type: string format: int64 description: >- The number of model parameters. For serverless models, this determines the price per token. moe: type: boolean description: >- If true, this is a Mixture of Experts (MoE) model. For serverless models, this affects the price per token. tunable: type: boolean description: If true, this model is available for fine-tuning. modelType: type: string description: The type of the model. supportsFireattention: type: boolean description: Whether this model supports fireattention. defaultPrecision: $ref: '#/components/schemas/DeploymentPrecision' description: Default precision of the model. readOnly: true supportsMtp: type: boolean description: If true, this model supports MTP. 
title: 'Next ID: 11' gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. 
For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayConversationConfig: type: object properties: style: type: string description: The chat template to use. system: type: string description: The system prompt (if the chat style supports it). template: type: string description: The Jinja template (if style is "jinja"). required: - style gatewayDeployedModelRef: type: object properties: name: type: string title: >- The resource name. e.g. accounts/my-account/deployedModels/my-deployed-model readOnly: true deployment: type: string description: The resource name of the base deployment the model is deployed to. readOnly: true state: $ref: '#/components/schemas/gatewayDeployedModelState' description: The state of the deployed model. readOnly: true default: type: boolean description: >- If true, this is the default target when querying this model without the `#` suffix. The first deployment a model is deployed to will have this field set to true automatically. readOnly: true public: type: boolean description: If true, the deployed model will be publicly reachable. readOnly: true title: 'Next ID: 6' gatewayDeployedModelState: type: string enum: - STATE_UNSPECIFIED - UNDEPLOYING - DEPLOYING - DEPLOYED - UPDATING default: STATE_UNSPECIFIED description: |- - UNDEPLOYING: The model is being undeployed. - DEPLOYING: The model is being deployed. - DEPLOYED: The model is deployed and ready for inference. 
- UPDATING: there are updates happening with the deployed model title: 'Next ID: 6' gatewayModelKind: type: string enum: - KIND_UNSPECIFIED - HF_BASE_MODEL - HF_PEFT_ADDON - HF_TEFT_ADDON - FLUMINA_BASE_MODEL - FLUMINA_ADDON - DRAFT_ADDON - FIRE_AGENT - LIVE_MERGE - CUSTOM_MODEL - EMBEDDING_MODEL - SNAPSHOT_MODEL default: KIND_UNSPECIFIED description: |2- - HF_BASE_MODEL: An LLM base model. - HF_PEFT_ADDON: A parameter-efficent fine-tuned addon. - HF_TEFT_ADDON: A token-eficient fine-tuned addon. - FLUMINA_BASE_MODEL: A Flumina base model. - FLUMINA_ADDON: A Flumina addon. - DRAFT_ADDON: A draft model used for speculative decoding in a deployment. - FIRE_AGENT: A FireAgent model. - LIVE_MERGE: A live-merge model. - CUSTOM_MODEL: A customized model - EMBEDDING_MODEL: An Embedding model. - SNAPSHOT_MODEL: A snapshot model. gatewayModelState: type: string enum: - STATE_UNSPECIFIED - UPLOADING - READY default: STATE_UNSPECIFIED description: |- - UPLOADING: The model is still being uploaded (upload is asynchronous). - READY: The model is ready to be used. title: 'Next ID: 7' gatewayPEFTDetails: type: object properties: baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b r: type: integer format: int32 description: |- The rank of the update matrices. Must be between 4 and 64, inclusive. targetModules: type: array items: type: string title: >- This is the target modules for an adapter that we extract from for more information what target module means, check out https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft baseModelType: type: string description: The type of the model. readOnly: true mergeAddonModelName: type: string title: >- The resource name of the model to merge with base model, e.g accounts/fireworks/models/falcon-7b-lora title: |- PEFT addon details. Next ID: 6 required: - baseModel - r - targetModules gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayTEFTDetails: type: object typeDate: type: object properties: year: type: integer format: int32 description: >- Year of the date. Must be from 1 to 9999, or 0 to specify a date without a year. month: type: integer format: int32 description: >- Month of a year. Must be from 1 to 12, or 0 to specify a year without a month and day. day: type: integer format: int32 description: >- Day of a month. Must be from 1 to 31 and valid for the year and month, or 0 to specify a year by itself or a year and month where the day isn't significant. description: >- * A full date, with non-zero year, month, and day values * A month and day value, with a zero year, such as an anniversary * A year on its own, with zero month and day values * A year and month value, with a zero day, such as a credit card expiration date Related types are [google.type.TimeOfDay][google.type.TimeOfDay] and `google.protobuf.Timestamp`. title: >- Represents a whole or partial calendar date, such as a birthday. The time of day and time zone are either specified elsewhere or are insignificant. The date is relative to the Gregorian Calendar. 
This can represent one of the following: ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-node-pool-stats.md # Get Node Pool Stats ## OpenAPI ````yaml get /v1/accounts/{account_id}/nodePools/{node_pool_id}:getStats paths: path: /v1/accounts/{account_id}/nodePools/{node_pool_id}:getStats method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id node_pool_id: schema: - type: string required: true description: The Node Pool Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: nodeCount: allOf: - type: integer format: int32 description: The number of nodes currently available in this pool. ranksPerNode: allOf: - type: integer format: int32 description: >- The number of ranks available per node. This is determined by the machine type of the nodes in this node pool. environmentCount: allOf: - type: integer format: int32 description: The number of environments connected to this node pool. environmentRanks: allOf: - type: integer format: int32 description: >- The number of ranks in this node pool that are currently allocated to environment connections. batchJobCount: allOf: - type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of batch jobs in that state allocated to this node pool. batchJobRanks: allOf: - type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of ranks allocated to batch jobs in that state in this node pool. title: 'Next ID: 7' refIdentifier: '#/components/schemas/gatewayNodePoolStats' examples: example: value: nodeCount: 123 ranksPerNode: 123 environmentCount: 123 environmentRanks: 123 batchJobCount: {} batchJobRanks: {} description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-node-pool.md # Get Node Pool ## OpenAPI ````yaml get /v1/accounts/{account_id}/nodePools/{node_pool_id} paths: path: /v1/accounts/{account_id}/nodePools/{node_pool_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id node_pool_id: schema: - type: string required: true description: The Node Pool Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the node pool. e.g. 
accounts/my-account/clusters/my-cluster/nodePools/my-pool readOnly: true displayName: allOf: - type: string description: >- Human-readable display name of the node pool. e.g. "My Node Pool" Must be fewer than 64 characters long. createTime: allOf: - type: string format: date-time description: The creation time of the node pool. readOnly: true minNodeCount: allOf: - type: integer format: int32 description: >- https://cloud.google.com/kubernetes-engine/quotas Minimum number of nodes in this node pool. Must be a non-negative integer less than or equal to max_node_count. If not specified, the default is 0. maxNodeCount: allOf: - type: integer format: int32 description: >- https://cloud.google.com/kubernetes-engine/quotas Maximum number of nodes in this node pool. Must be a positive integer greater than or equal to min_node_count. If not specified, the default is 1. overprovisionNodeCount: allOf: - type: integer format: int32 description: >- The number of nodes to overprovision by the autoscaler. Must be a non-negative integer and less than or equal to min_node_count and max_node_count-min_node_count. If not specified, the default is 0. eksNodePool: allOf: - $ref: '#/components/schemas/gatewayEksNodePool' fakeNodePool: allOf: - $ref: '#/components/schemas/gatewayFakeNodePool' annotations: allOf: - type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. state: allOf: - $ref: '#/components/schemas/gatewayNodePoolState' description: The current state of the node pool. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: >- Contains detailed message when the last node pool operation fails, e.g. when node pool is in FAILED state or when last node pool update fails. readOnly: true nodePoolStats: allOf: - $ref: '#/components/schemas/gatewayNodePoolStats' description: Live statistics of the node pool. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the node pool. readOnly: true title: 'Next ID: 16' refIdentifier: '#/components/schemas/gatewayNodePool' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' minNodeCount: 123 maxNodeCount: 123 overprovisionNodeCount: 123 eksNodePool: nodeRole: instanceType: spot: true nodeGroupName: subnetIds: - zone: placementGroup: launchTemplate: fakeNodePool: machineType: numNodes: 123 serviceAccount: annotations: {} state: STATE_UNSPECIFIED status: code: OK message: nodePoolStats: nodeCount: 123 ranksPerNode: 123 environmentCount: 123 environmentRanks: 123 batchJobCount: {} batchJobRanks: {} updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. 
For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. 
E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEksNodePool: type: object properties: nodeRole: type: string description: |- If not specified, the parent cluster's system_node_group_role will be used. title: |- The IAM role ARN to associate with nodes. The role must have the following IAM policies attached: - AmazonEKSWorkerNodePolicy - AmazonEC2ContainerRegistryReadOnly - AmazonEKS_CNI_Policy instanceType: type: string description: >- The type of instance used in this node pool. See https://aws.amazon.com/ec2/instance-types/ for a list of valid instance types. spot: type: boolean title: >- If true, nodes are created as preemptible VM instances. See https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html#managed-node-group-capacity-types nodeGroupName: type: string description: |- The name of the node group. If not specified, the default is the node pool ID. subnetIds: type: array items: type: string description: >- A list of subnet IDs for nodes in this node pool. If not specified, the parent cluster's default subnet IDs that match the zone will be used. Note that all the subnets will need to be in the same zone. zone: type: string description: >- Zone for the node pool. If not specified, a random zone in the cluster's region will be selected. placementGroup: type: string description: Cluster placement group to colocate hosts in this pool. launchTemplate: type: string description: Launch template to create for this node group. title: |- An Amazon Elastic Kubernetes Service node pool. Next ID: 10 required: - instanceType gatewayFakeNodePool: type: object properties: machineType: type: string numNodes: type: integer format: int32 serviceAccount: type: string description: A fake node pool to be used with FakeCluster. gatewayNodePoolState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED default: STATE_UNSPECIFIED description: |2- - CREATING: The node pool is still being created. - READY: The node pool is ready to be used.
- DELETING: The node pool is being deleted. - FAILED: Node pool is not operational. Consult 'status' for detailed messaging. Node pool needs to be deleted and re-created. gatewayNodePoolStats: type: object properties: nodeCount: type: integer format: int32 description: The number of nodes currently available in this pool. ranksPerNode: type: integer format: int32 description: >- The number of ranks available per node. This is determined by the machine type of the nodes in this node pool. environmentCount: type: integer format: int32 description: The number of environments connected to this node pool. environmentRanks: type: integer format: int32 description: |- The number of ranks in this node pool that are currently allocated to environment connections. batchJobCount: type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of batch jobs in that state allocated to this node pool. batchJobRanks: type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of ranks allocated to batch jobs in that state in this node pool. title: 'Next ID: 7' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-quota.md # firectl get quota > Prints information about a quota. ``` firectl get quota [flags] ``` ### Examples ``` firectl get quota serverless-inference-rpm firectl get quota accounts/my-account/quotas/serverless-inference-rpm ``` ### Flags ``` -h, --help help for quota ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --dry-run Print the request proto without running it. -o, --output Output Set the output format to "text", "json", or "flag". (default text) -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-reinforcement-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/get-reinforcement-fine-tuning-job.md # Get Reinforcement Fine-tuning Job ## OpenAPI ````yaml get /v1/accounts/{account_id}/reinforcementFineTuningJobs/{reinforcement_fine_tuning_job_id} paths: path: >- /v1/accounts/{account_id}/reinforcementFineTuningJobs/{reinforcement_fine_tuning_job_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key.
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id reinforcement_fine_tuning_job_id: schema: - type: string required: true description: The Reinforcement Fine-tuning Job Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - type: string createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time description: The completed time for the reinforcement fine-tuning job. readOnly: true dataset: allOf: - type: string description: The name of the dataset used for training. evaluationDataset: allOf: - type: string description: The name of a separate dataset to use for evaluation. evalAutoCarveout: allOf: - type: boolean description: Whether to auto-carve the dataset for eval. state: allOf: - $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true trainingConfig: allOf: - $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. evaluator: allOf: - type: string description: >- The evaluator resource name to use for RLOR fine-tuning job. wandbConfig: allOf: - $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. outputStats: allOf: - type: string description: >- The output dataset's aggregated stats for the evaluation job. readOnly: true inferenceParameters: allOf: - $ref: '#/components/schemas/gatewayInferenceParameters' description: BIJ parameters. outputMetrics: allOf: - type: string readOnly: true mcpServer: allOf: - type: string title: 'Next ID: 29' refIdentifier: '#/components/schemas/gatewayReinforcementFineTuningJob' requiredProperties: - dataset - evaluator examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: evaluationDataset: evalAutoCarveout: true state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 evaluator: wandbConfig: enabled: true apiKey: project: entity: runId: url: outputStats: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 outputMetrics: mcpServer: description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. 
jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayInferenceParameters: type: object properties: maxTokens: type: integer format: int32 description: Maximum number of tokens to generate per response. temperature: type: number format: float description: Sampling temperature, typically between 0 and 2. topP: type: number format: float description: Top-p sampling parameter, typically between 0 and 1. 
'n': type: integer format: int32 description: Number of response candidates to generate per input. extraBody: type: string description: |- Additional parameters for the inference request as a JSON string. For example: "{\"stop\": [\"\\n\"]}". topK: type: integer format: int32 description: >- Top-k sampling parameter, limits the token selection to the top k tokens. description: Parameters for the inference requests. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. 
```` --- # Source: https://docs.fireworks.ai/api-reference/get-reinforcement-fine-tuning-step.md # Get Reinforcement Fine-tuning Step ## OpenAPI ````yaml get /v1/accounts/{account_id}/rlorTrainerJobs/{rlor_trainer_job_id} paths: path: /v1/accounts/{account_id}/rlorTrainerJobs/{rlor_trainer_job_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id rlor_trainer_job_id: schema: - type: string required: true description: The Rlor Trainer Job Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - type: string createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time readOnly: true dataset: allOf: - type: string description: The name of the dataset used for training. evaluationDataset: allOf: - type: string description: The name of a separate dataset to use for evaluation. evalAutoCarveout: allOf: - type: boolean description: Whether to auto-carve the dataset for eval. state: allOf: - $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true trainingConfig: allOf: - $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. rewardWeights: allOf: - type: array items: type: string description: >- A list of reward metrics to use for training in format of "=". wandbConfig: allOf: - $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. title: 'Next ID: 18' refIdentifier: '#/components/schemas/gatewayRlorTrainerJob' examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: evaluationDataset: evalAutoCarveout: true state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 rewardWeights: - wandbConfig: enabled: true apiKey: project: entity: runId: url: description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. 
If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-reservation.md # firectl get reservation > Prints information about a reservation. ``` firectl get reservation [flags] ``` ### Examples ``` firectl get reservation abcdef firectl get reservation accounts/my-account/reservations/abcdef ``` ### Flags ``` -h, --help help for reservation ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --dry-run Print the request proto without running it. 
-o, --output Output Set the output format to "text", "json", or "flag". (default text) -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/get-response.md # Get Response ## OpenAPI ````yaml get /v1/responses/{response_id} paths: path: /v1/responses/{response_id} method: get servers: - url: https://api.fireworks.ai/inference request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: response_id: schema: - type: string required: true title: Response Id query: {} header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: id: allOf: - anyOf: - type: string - type: 'null' title: Id description: >- The unique identifier of the response. Will be None if store=False. object: allOf: - type: string title: Object description: The object type, which is always 'response'. default: response created_at: allOf: - type: integer title: Created At description: >- The Unix timestamp (in seconds) when the response was created. status: allOf: - type: string title: Status description: >- The status of the response. Can be 'completed', 'in_progress', 'incomplete', 'failed', or 'cancelled'. model: allOf: - type: string title: Model description: >- The model used to generate the response (e.g., `accounts//models/`). output: allOf: - items: anyOf: - $ref: '#/components/schemas/Message' - $ref: '#/components/schemas/ToolCall' - $ref: '#/components/schemas/ToolOutput' type: array title: Output description: >- An array of output items produced by the model. Can contain messages, tool calls, and tool outputs. previous_response_id: allOf: - anyOf: - type: string - type: 'null' title: Previous Response Id description: >- The ID of the previous response in the conversation, if this response continues a conversation. usage: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Usage description: >- Token usage information for the request. Contains 'prompt_tokens', 'completion_tokens', and 'total_tokens'. error: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Error description: >- Error information if the response failed. Contains 'type', 'code', and 'message' fields. incomplete_details: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Incomplete Details description: >- Details about why the response is incomplete, if status is 'incomplete'. Contains 'reason' field which can be 'max_output_tokens', 'max_tool_calls', or 'content_filter'. instructions: allOf: - anyOf: - type: string - type: 'null' title: Instructions description: >- System instructions that guide the model's behavior. Similar to a system message. max_output_tokens: allOf: - anyOf: - type: integer - type: 'null' title: Max Output Tokens description: >- The maximum number of tokens that can be generated in the response. Must be at least 1. max_tool_calls: allOf: - anyOf: - type: integer minimum: 1 - type: 'null' title: Max Tool Calls description: >- The maximum number of tool calls allowed in a single response. Must be at least 1. parallel_tool_calls: allOf: - type: boolean title: Parallel Tool Calls description: >- Whether to enable parallel function calling during tool use. Default is True. 
default: true reasoning: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Reasoning description: >- Reasoning output from the model, if reasoning is enabled. Contains 'content' and 'type' fields. store: allOf: - anyOf: - type: boolean - type: 'null' title: Store description: >- Whether to store this response for future retrieval. If False, the response will not be persisted and previous_response_id cannot reference it. Default is True. default: true temperature: allOf: - type: number maximum: 2 minimum: 0 title: Temperature description: >- The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0. default: 1 text: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Text description: Text generation configuration parameters, if applicable. tool_choice: allOf: - anyOf: - type: string - additionalProperties: true type: object title: Tool Choice description: >- Controls which (if any) tool the model should use. Can be 'none', 'auto', 'required', or an object specifying a particular tool. Default is 'auto'. default: auto tools: allOf: - items: additionalProperties: true type: object type: array title: Tools description: >- A list of tools the model may call. Each tool is defined with a type and function specification following the OpenAI tool format. Supports 'function', 'mcp', 'sse', and 'python' tool types. top_p: allOf: - type: number maximum: 1 minimum: 0 title: Top P description: >- An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. default: 1 truncation: allOf: - type: string title: Truncation description: >- The truncation strategy to use for the context. Can be 'auto' or 'disabled'. Default is 'disabled'. default: disabled user: allOf: - anyOf: - type: string - type: 'null' title: User description: >- A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. metadata: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Metadata description: >- Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information about the response in a structured format. title: Response description: >- Represents a response object returned from the API. A response includes the model output, token usage, configuration parameters, and metadata about the conversation state. 
refIdentifier: '#/components/schemas/Response' requiredProperties: - created_at - status - model - output examples: example: value: id: object: response created_at: 123 status: model: output: - id: type: message role: content: - type: text: status: previous_response_id: usage: {} error: {} incomplete_details: {} instructions: max_output_tokens: 123 max_tool_calls: 2 parallel_tool_calls: true reasoning: {} store: true temperature: 1 text: {} tool_choice: tools: - {} top_p: 1 truncation: disabled user: metadata: {} description: Successful Response '422': application/json: schemaArray: - type: object properties: detail: allOf: - items: $ref: '#/components/schemas/ValidationError' type: array title: Detail title: HTTPValidationError refIdentifier: '#/components/schemas/HTTPValidationError' examples: example: value: detail: - loc: - msg: type: description: Validation Error deprecated: false type: path components: schemas: Message: properties: id: type: string title: Id description: The unique identifier of the message. type: type: string title: Type description: The object type, always 'message'. default: message role: type: string title: Role description: >- The role of the message sender. Can be 'user', 'assistant', or 'system'. content: items: $ref: '#/components/schemas/MessageContent' type: array title: Content description: >- An array of content parts that make up the message. Each part has a type and associated data. status: type: string title: Status description: The status of the message. Can be 'in_progress' or 'completed'. type: object required: - id - role - content - status title: Message description: Represents a message in a conversation. MessageContent: properties: type: type: string title: Type description: >- The type of the content part. Can be 'input_text', 'output_text', 'image', etc. text: anyOf: - type: string - type: 'null' title: Text description: The text content, if applicable. type: object required: - type title: MessageContent description: Represents a piece of content within a message. ToolCall: properties: id: type: string title: Id description: The unique identifier of the tool call. type: type: string title: Type description: The type of tool call. Can be 'function', 'tool_call', or 'mcp'. function: anyOf: - additionalProperties: true type: object - type: 'null' title: Function description: >- The function definition for function tool calls. Contains 'name' and 'arguments' keys. mcp: anyOf: - additionalProperties: true type: object - type: 'null' title: Mcp description: >- The MCP (Model Context Protocol) tool call definition for MCP tool calls. type: object required: - id - type title: ToolCall description: Represents a tool call made by the model. ToolOutput: properties: type: type: string title: Type description: The object type, always 'tool_output'. default: tool_output tool_call_id: type: string title: Tool Call Id description: The ID of the tool call that this output corresponds to. output: type: string title: Output description: The output content from the tool execution. type: object required: - tool_call_id - output title: ToolOutput description: Represents the output/result of a tool call. 
ValidationError: properties: loc: items: anyOf: - type: string - type: integer type: array title: Location msg: type: string title: Message type: type: string title: Error Type type: object required: - loc - msg - type title: ValidationError ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-secret.md # Source: https://docs.fireworks.ai/api-reference/get-secret.md # Get Secret > Retrieves a secret by name. Note that the `value` field is not returned in the response for security reasons. Only the `name` and `key_name` fields are included. ## OpenAPI ````yaml get /v1/accounts/{account_id}/secrets/{secret_id} paths: path: /v1/accounts/{account_id}/secrets/{secret_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id secret_id: schema: - type: string required: true description: The Secret Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: |- name follows the convention accounts/account-id/secrets/unkey-key-id keyName: allOf: - type: string title: >- name of the key. In this case, it can be WOLFRAM_ALPHA_API_KEY value: allOf: - type: string example: sk-1234567890abcdef description: >- The secret value. This field is INPUT_ONLY and will not be returned in GET or LIST responses for security reasons. The value is only accepted when creating or updating secrets. refIdentifier: '#/components/schemas/gatewaySecret' requiredProperties: - name - keyName examples: example: value: name: keyName: value: sk-1234567890abcdef description: A successful response. deprecated: false type: path components: schemas: {} ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/get-snapshot.md # Get Snapshot ## OpenAPI ````yaml get /v1/accounts/{account_id}/snapshots/{snapshot_id} paths: path: /v1/accounts/{account_id}/snapshots/{snapshot_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id snapshot_id: schema: - type: string required: true description: The Snapshot Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the snapshot. e.g.
accounts/my-account/clusters/my-cluster/environments/my-env/snapshots/1 readOnly: true createTime: allOf: - type: string format: date-time description: The creation time of the snapshot. readOnly: true state: allOf: - $ref: '#/components/schemas/gatewaySnapshotState' description: The state of the snapshot. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: The status code and message of the snapshot. readOnly: true imageRef: allOf: - type: string description: The URI of the container image for this snapshot. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the snapshot. readOnly: true title: 'Next ID: 7' refIdentifier: '#/components/schemas/gatewaySnapshot' examples: example: value: name: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: imageRef: updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewaySnapshotState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - FAILED - DELETING default: STATE_UNSPECIFIED gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/get-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/get-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-supervised-fine-tuning-job.md # Source: https://docs.fireworks.ai/api-reference/get-supervised-fine-tuning-job.md # Get Supervised Fine-tuning Job ## OpenAPI ````yaml get /v1/accounts/{account_id}/supervisedFineTuningJobs/{supervised_fine_tuning_job_id} paths: path: >- /v1/accounts/{account_id}/supervisedFineTuningJobs/{supervised_fine_tuning_job_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id supervised_fine_tuning_job_id: schema: - type: string required: true description: The Supervised Fine-tuning Job Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string readOnly: true displayName: allOf: - type: string createTime: allOf: - type: string format: date-time readOnly: true completedTime: allOf: - type: string format: date-time readOnly: true dataset: allOf: - type: string description: The name of the dataset used for training. state: allOf: - $ref: '#/components/schemas/gatewayJobState' readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: allOf: - type: string description: >- The email address of the user who initiated this fine-tuning job. readOnly: true outputModel: allOf: - type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: allOf: - type: string description: >- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: allOf: - type: string description: >- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: allOf: - type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration earlyStop: allOf: - type: boolean description: >- Whether to stop training early if the validation loss does not improve. epochs: allOf: - type: integer format: int32 description: The number of epochs to train for. learningRate: allOf: - type: number format: float description: The learning rate used for training. maxContextLength: allOf: - type: integer format: int32 description: The maximum context length to use with the model. loraRank: allOf: - type: integer format: int32 description: The rank of the LoRA layers. 
wandbConfig: allOf: - $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. evaluationDataset: allOf: - type: string description: The name of a separate dataset to use for evaluation. isTurbo: allOf: - type: boolean description: Whether to run the fine-tuning job in turbo mode. evalAutoCarveout: allOf: - type: boolean description: Whether to auto-carve the dataset for eval. region: allOf: - $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. updateTime: allOf: - type: string format: date-time description: The update time for the supervised fine-tuning job. readOnly: true nodes: allOf: - type: integer format: int32 description: The number of nodes to use for the fine-tuning job. batchSize: allOf: - type: integer format: int32 title: The batch size for sequence packing in training mtpEnabled: allOf: - type: boolean title: Whether to enable MTP (Model-Token-Prediction) mode mtpNumDraftTokens: allOf: - type: integer format: int32 title: Number of draft tokens to use in MTP mode mtpFreezeBaseModel: allOf: - type: boolean title: >- Whether to freeze the base model parameters during MTP training hiddenStatesGenConfig: allOf: - $ref: '#/components/schemas/gatewayHiddenStatesGenConfig' description: >- Config for generating dataset with hidden states for training. metricsFileSignedUrl: allOf: - type: string title: The signed URL for the metrics file gradientAccumulationSteps: allOf: - type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: allOf: - type: integer format: int32 title: Number of steps for learning rate warm up title: 'Next ID: 42' refIdentifier: '#/components/schemas/gatewaySupervisedFineTuningJob' requiredProperties: - dataset examples: example: value: name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: outputModel: baseModel: warmStartFrom: jinjaTemplate: earlyStop: true epochs: 123 learningRate: 123 maxContextLength: 123 loraRank: 123 wandbConfig: enabled: true apiKey: project: entity: runId: url: evaluationDataset: isTurbo: true evalAutoCarveout: true region: REGION_UNSPECIFIED updateTime: '2023-11-07T05:31:56Z' nodes: 123 batchSize: 123 mtpEnabled: true mtpNumDraftTokens: 123 mtpFreezeBaseModel: true hiddenStatesGenConfig: deployedModel: maxWorkers: 123 maxTokens: 123 inputOffset: 123 inputLimit: 123 maxContextLen: 123 regenerateAssistant: true outputActivations: true apiKey: metricsFileSignedUrl: gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. 
Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. 
For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayHiddenStatesGenConfig: type: object properties: deployedModel: type: string maxWorkers: type: integer format: int32 maxTokens: type: integer format: int32 inputOffset: type: integer format: int32 inputLimit: type: integer format: int32 maxContextLen: type: integer format: int32 regenerateAssistant: type: boolean outputActivations: type: boolean apiKey: type: string description: >- Config for generating dataset with hidden states for SFTJ or eagle training. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. 
California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-user.md # Source: https://docs.fireworks.ai/api-reference/get-user.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-user.md # Source: https://docs.fireworks.ai/api-reference/get-user.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/get-user.md # Source: https://docs.fireworks.ai/api-reference/get-user.md # Get User ## OpenAPI ````yaml get /v1/accounts/{account_id}/users/{user_id} paths: path: /v1/accounts/{account_id}/users/{user_id} method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id user_id: schema: - type: string required: true description: The User Id query: readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: name: allOf: - type: string title: >- The resource name of the user. e.g. accounts/my-account/users/my-user readOnly: true displayName: allOf: - type: string description: |- Human-readable display name of the user. e.g. "Alice" Must be fewer than 64 characters long. serviceAccount: allOf: - type: boolean title: >- Whether this user is a service account (can only be set by admins) createTime: allOf: - type: string format: date-time description: The creation time of the user. readOnly: true role: allOf: - type: string description: The user's role, e.g. admin or user. email: allOf: - type: string description: The user's email address. state: allOf: - $ref: '#/components/schemas/gatewayUserState' description: The state of the user. readOnly: true status: allOf: - $ref: '#/components/schemas/gatewayStatus' description: Contains information about the user status. readOnly: true updateTime: allOf: - type: string format: date-time description: The update time for the user. 
readOnly: true title: 'Next ID: 13' refIdentifier: '#/components/schemas/gatewayUser' requiredProperties: - role examples: example: value: name: displayName: serviceAccount: true createTime: '2023-11-07T05:31:56Z' role: email: state: STATE_UNSPECIFIED status: code: OK message: updateTime: '2023-11-07T05:31:56Z' description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. 
Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayUserState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - UPDATING - DELETING default: STATE_UNSPECIFIED ```` --- # Source: https://docs.fireworks.ai/faq-new/account-access/how-do-i-close-my-fireworksai-account.md # How do I close my Fireworks.ai account? To close your account: 1. Email [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai) 2. 
Include in your request: * Your account ID * A clear request for account deletion Before closing your account, please ensure: * All outstanding invoices are paid * Any active deployments are terminated * Important data is backed up if needed --- # Source: https://docs.fireworks.ai/faq-new/models-inference/how-do-i-control-output-image-sizes-when-using-sdxl-controlnet.md # How do I control output image sizes when using SDXL ControlNet? When using **SDXL ControlNet** (e.g., canny control), the output image size is determined by the explicit **width** and **height** parameters in your API request: The input control signal image will be automatically: * **Resized** to fit your specified dimensions * **Cropped** to preserve aspect ratio **Example**: To generate a 768x1344 image, explicitly include these parameters in your request: ```json theme={null} { "width": 768, "height": 1344 } ``` *Note*: While these parameters may not appear in the web interface examples, they are supported API parameters that can be included in your requests. --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/how-does-autoscaling-affect-my-costs.md # How does autoscaling affect my costs? * **Scaling from 0**: No minimum cost when scaled to zero * **Scaling up**: Each new replica adds to your total cost proportionally. For example: * Scaling from 1 to 2 replicas doubles your GPU costs * If each replica uses multiple GPUs, costs scale accordingly (e.g., scaling from 1 to 2 replicas with 2 GPUs each means paying for 4 GPUs total) For current pricing details, please visit our [pricing page](https://fireworks.ai/pricing). --- # Source: https://docs.fireworks.ai/faq-new/billing-pricing/how-does-billing-and-credit-usage-work.md # How does billing and credit usage work? Usage and billing operate through a **tiered system**: * Each **tier** has a monthly usage limit, regardless of available credits. * Once you reach your tier's limit, **service will be suspended** even if you have remaining credits. * **Usage limits** reset at the beginning of each month. * Pre-purchased credits do not prevent additional charges once the limit is exceeded. For detailed information about spend limits, tiers, and how to manage them, see our [Rate Limits & Quotas guide](/guides/quotas_usage/rate-limits#spend-limits). --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/how-does-billing-and-scaling-work-for-on-demand-gpu-deployments.md # How does billing and scaling work for on-demand GPU deployments? 
On-demand GPU deployments have unique billing and scaling characteristics compared to serverless deployments: **Billing**: * Charges start when the server begins accepting requests * **Billed by GPU-second** for each active instance * Costs accumulate even if there are no active API calls **Scaling options**: * Supports **autoscaling** from 0 to multiple GPUs * Each additional GPU **adds to the billing rate** * Can handle unlimited requests within the GPU’s capacity **Management requirements**: * Not fully serverless; requires some manual management * **Manually delete deployments** when no longer needed * Or configure autoscaling to **scale down to 0** during inactive periods **Cost control tips**: * Regularly **monitor active deployments** * **Delete unused deployments** to avoid unnecessary costs * Consider **serverless options** for intermittent usage * Use **autoscaling to 0** to optimize costs during low-demand times --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/how-does-billing-work-for-on-demand-deployments.md # How does billing work for on-demand deployments? On-demand deployments come with automatic cost optimization features: * **Default autoscaling**: Automatically scales to 0 replicas when not in use * **Pay for what you use**: Charged only for GPU time when replicas are active * **Flexible configuration**: Customize autoscaling behavior to match your needs **Best practices for cost management**: 1. **Leverage default autoscaling**: The system automatically scales down deployments when not in use 2. **Customize carefully**: While you can modify autoscaling behavior using our [configuration options](https://docs.fireworks.ai/guides/ondemand-deployments#customizing-autoscaling-behavior), note that preventing scale-to-zero will result in continuous GPU charges 3. **Consider your use case**: For intermittent or low-frequency usage, serverless deployments might be more cost-effective For detailed configuration options, see our [deployment guide](https://docs.fireworks.ai/guides/ondemand-deployments#replica-count-horizontal-scaling). --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/how-does-the-system-scale.md # How does the system scale? Our system is **horizontally scalable**, meaning it: * Scales linearly with additional **replicas** of the deployment * **Automatically allocates resources** based on demand * Manages **distributed load handling** efficiently --- # Source: https://docs.fireworks.ai/faq-new/billing-pricing/how-many-tokens-per-image.md # How many tokens per image? > Learn how to calculate token usage for images in vision models and understand pricing implications Image token consumption varies by model and resolution, typically ranging from 1,000 to 2,500 tokens per image for most common resolutions. ## Common resolution token counts The following table shows the token counts for a single image for Qwen2.5 VL at different image resolutions: | Resolution | Token Count | | ---------- | ----------- | | 336×336 | 144 | | 672×672 | 576 | | 1024×1024 | 1,369 | | 1280×720 | 1,196 | | 1920×1080 | 2,769 | | 2560×1440 | 4,641 | | 3840×2160 | 10,549 | ## Calculating exact token count for your images You can determine exact token usage by processing your images through the model's tokenizer. 
For instance, for Qwen2.5 VL, you can use the following code:

```bash
pip install torch torchvision transformers pillow
```

```python Tokenizing your image
import requests
from PIL import Image
from transformers import AutoProcessor
import os

# Your image source - can be URL or local path
IMAGE_URL_OR_PATH = "https://images.unsplash.com/photo-1519125323398-675f0ddb6308"

def load_image(source):
    """Load image from URL or local file path"""
    if source.startswith(('http://', 'https://')):
        print(f"Downloading image from URL: {source}")
        # Stream the image once and open it directly from the response
        response = requests.get(source, stream=True)
        response.raise_for_status()
        return Image.open(response.raw)
    else:
        print(f"Loading image from path: {source}")
        if not os.path.exists(source):
            raise FileNotFoundError(f"Image file not found: {source}")
        return Image.open(source)

def count_image_tokens(image):
    """Count how many tokens an image takes using the Qwen 2.5 VL processor"""
    processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=text, images=[image], return_tensors="pt")
    input_ids = inputs["input_ids"][0]

    # Count the image pad tokens (151655 is Qwen2.5 VL's image token ID)
    image_tokens = (input_ids == 151655).sum().item()
    return image_tokens, input_ids

def main():
    import sys

    image_source = sys.argv[1] if len(sys.argv) > 1 else IMAGE_URL_OR_PATH
    print(f"Processing image: {image_source}")

    image = load_image(image_source)
    print(f"Image size: {image.size}")
    print(f"Image mode: {image.mode}")

    print("\nCalculating tokens...")
    image_tokens, input_ids = count_image_tokens(image)

    print(f"Total tokens: {len(input_ids)}")
    print(f"Image tokens: {image_tokens}")
    print(f"Text tokens: {len(input_ids) - image_tokens}")

if __name__ == "__main__":
    main()
```

```bash Usage
# Calculate tokens for an image URL
python token_calculator.py "https://example.com/image.jpg"

# Calculate tokens for a local image
python token_calculator.py "path/to/your/image.png"
```

---

# Source: https://docs.fireworks.ai/faq-new/billing-pricing/how-much-does-fireworks-cost.md

# How much does Fireworks cost?

Fireworks AI operates on a **pay-as-you-go** model for all non-Enterprise usage, and new users automatically receive free credits. You pay based on:

* **Per token** for serverless inference
* **Per GPU usage time** for on-demand deployments
* **Per token of training data** for fine-tuning

For customers needing **enterprise-grade security and reliability**, please reach out to us at [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai) to discuss options.

Find out more about our current pricing on our [Pricing page](https://fireworks.ai/pricing).

---

# Source: https://docs.fireworks.ai/fine-tuning/how-rft-works.md

# Basics

> Understand the reinforcement learning fundamentals behind RFT

## What is reinforcement fine-tuning?

In traditional supervised fine-tuning, you provide a dataset with labeled examples showing exactly what the model should output. In reinforcement fine-tuning, you instead provide:

1. **A dataset**: Prompts, with input examples for the model to respond to
2. **An evaluator**: Code that scores the model's outputs from 0.0 (bad) to 1.0 (good), also known as a reward function (see the sketch below)
3. **An agent**: An LLM application, with access to tools, APIs, and data needed for your task

During training, the model generates responses to each prompt, receives scores from your reward function, and learns to produce outputs that maximize the reward.
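To make the evaluator idea concrete, here is a minimal, framework-agnostic sketch of a reward function for a structured-output task. The function name, the `REQUIRED_KEYS` schema, and the plain-string input are illustrative assumptions only; the exact evaluator interface Fireworks expects is covered in the reward-function guide referenced under Next steps below.

```python
import json

# Hypothetical schema for this example: the model is asked to return a JSON
# object containing these keys.
REQUIRED_KEYS = {"name", "email", "plan"}


def score_response(response_text: str) -> float:
    """Illustrative reward function: returns a score in [0.0, 1.0]."""
    # Output that is not valid JSON gets the minimum reward.
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0

    # Partial credit: the fraction of required keys that are present and non-empty.
    present = sum(1 for key in REQUIRED_KEYS if data.get(key))
    return present / len(REQUIRED_KEYS)


if __name__ == "__main__":
    print(score_response('{"name": "Ada", "email": "ada@example.com", "plan": "pro"}'))  # 1.0
    print(score_response('{"name": "Ada"}'))  # ~0.33
    print(score_response("not json"))  # 0.0
```

Graded rewards like this (partial credit rather than a strict pass/fail check) generally give the training process a smoother signal to optimize.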
## Use cases

Reinforcement fine-tuning helps you train models to excel at:

* **Code generation and analysis** - Writing and debugging functions with verifiable execution results or test outcomes
* **Structured output generation** - JSON formatting, data extraction, classification, and schema compliance with programmatic validation
* **Domain-specific reasoning** - Legal analysis, financial modeling, or medical triage with verifiable criteria and compliance checks
* **Tool-using agents** - Multi-step workflows where agents call external APIs with measurable success criteria

## How it works

1. Define how you'll score model outputs from 0 to 1. For example, score outputs higher when your agent calls the right tools, or when your LLM-as-judge rates the output highly.
2. Create a JSONL file with prompts (system and user messages). These will be used to generate rollouts during training.
3. Train locally, or connect your agent as a remote server to Fireworks with our `/init` and `/status` endpoints.
4. Create an RFT job via the UI or CLI. Fireworks orchestrates rollouts, evaluates them, and trains the model to maximize reward.
5. Once training completes, deploy your fine-tuned LoRA model to production with an on-demand deployment.

### RFT works best when:

1. You can determine whether a model's output is "good" or "bad," even if only approximately
2. You have prompts but lack perfect "golden" completions to learn from
3. The task requires multi-step reasoning where evaluating intermediate steps is hard
4. You want the model to explore creative solutions beyond your training examples

## Next steps

Learn how to design effective reward functions

Learn how to launch and configure RFT jobs

---

# Source: https://docs.fireworks.ai/faq-new/models-inference/how-to-check-if-a-model-is-available-on-serverless.md

# How to check if a model is available on serverless?

## Web UI

Go to [https://app.fireworks.ai/models?filter=LLM\&serverless=true](https://app.fireworks.ai/models?filter=LLM\&serverless=true)

## Programmatically

You can use the [`is_available_on_serverless`](/tools-sdks/python-client/sdk-reference#is-available-on-serverless) method on the [LLM](/tools-sdks/python-client/sdk-reference#llm) object in our [Build SDK](/tools-sdks/python-client/sdk-introduction) to check if a model is available on serverless.

```python
from fireworks import LLM

llm = LLM(model="llama4-maverick-instruct-basic", deployment_type="auto")
print(llm.is_available_on_serverless())  # True

llm = LLM(model="qwen2p5-7b-instruct", deployment_type="auto")
# An error will be raised saying: "LLM(id=...) must be provided when deployment_strategy is on-demand",
# which means the model is not available on serverless: the deployment_strategy
# resolved to "on-demand" when deployment_type was "auto".
```

---

# Source: https://docs.fireworks.ai/faq-new/account-access/i-have-multiple-fireworks-accounts-when-i-try-to-login-with-google-on-fireworks.md

# I have multiple Fireworks accounts. When I try to login with Google on Fireworks' web UI, I'm getting signed into the wrong account. How do I fix this?

If you log in with Google, account management is controlled by Google. You can log in using incognito mode or create separate Chrome/browser profiles to log in with different Google accounts.
You could also follow the steps in this [guide](https://support.google.com/accounts/answer/13533235?hl=en#zippy=%2Csign-in-with-google) to disassociate Fireworks.ai with a particular Google account sign-in. If you have more complex issues please contact us on Discord. --- # Source: https://docs.fireworks.ai/guides/inference-error-codes.md # Inference Error Codes > Common error codes, their meanings, and resolutions for inference requests Understanding error codes helps you quickly diagnose and resolve issues when making inference requests to the Fireworks API. ## Common error codes | **Code** | **Error Name** | **Possible Issue(s)** | **How to Resolve** | | -------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `400` | Bad Request | Invalid input or malformed request. | Review the request parameters and ensure they match the expected format. | | `401` | Unauthorized | Invalid API key or insufficient permissions. | Verify your API key and ensure it has the correct permissions. | | `402` | Payment Required | Account is not on a paid plan or has exceeded usage limits. | Check your billing status and ensure your payment method is up to date. Upgrade your plan if necessary. | | `403` | Forbidden | Authentication issues. | Verify you have the correct API key. | | `404` | Not Found | The API endpoint path doesn't exist, the model doesn't exist, the model is not deployed, or you don't have permission to access it. | Verify the URL path in your request and ensure you are using the correct API endpoint. Check if the model exists and is available. Ensure you have the necessary permissions. | | `405` | Method Not Allowed | Using an unsupported HTTP method (e.g., using GET instead of POST). | Check the API documentation for the correct HTTP method. | | `408` | Request Timeout | The request took too long to complete, possibly due to server overload or network issues. | Retry the request after a brief wait. Consider increasing the timeout value if applicable. | | `412` | Precondition Failed | Account is suspended or there's an issue with account status. This error also occurs when attempting to invoke a LoRA model that failed to load. | Check your account status and billing information. For LoRA models, ensure the model was uploaded correctly and is compatible. Contact support if the issue persists. | | `413` | Payload Too Large | Input data exceeds the allowed size limit. | Reduce the size of the input payload (e.g., by trimming large text or image data). | | `429` | Over Quota | You've reached the API rate limit. | Wait for the quota to reset or upgrade your plan for a higher rate limit. See [Rate Limits & Quotas](/guides/quotas_usage/rate-limits). | | `500` | Internal Server Error | Server-side code bug that is unlikely to resolve on its own. | Contact Fireworks support immediately, as this error typically requires intervention from the engineering team. | | `502` | Bad Gateway | The server received an invalid response from an upstream server. | Wait and retry the request. If the error persists, it may indicate a server outage. | | `503` | Service Unavailable | The service is down for maintenance or experiencing issues. | Retry the request after some time. 
Check the [status page](https://status.fireworks.ai) for maintenance announcements. | | `504` | Gateway Timeout | The server did not receive a response in time from an upstream server. | Wait briefly and retry the request. Consider using a shorter input prompt if applicable. | | `520` | Unknown Error | An unexpected error occurred with no clear explanation. | Retry the request. If the issue persists, contact support for further assistance. | ## Troubleshooting tips If you encounter an error not listed here: * Review the API documentation for the correct usage of endpoints and parameters * Check the [Fireworks status page](https://status.fireworks.ai) for any ongoing service disruptions * Contact support at [support@fireworks.ai](mailto:support@fireworks.ai) or join our [Discord](https://discord.gg/fireworks-ai) Enable detailed error logging in your application to capture the full error response, including error messages and request IDs, which helps with debugging. --- # Source: https://docs.fireworks.ai/ecosystem/integrations.md # Cloud Integrations > Cloud Integrations ## Cloud Deployments Deploy Fireworks models on AWS SageMaker Run Fireworks on Amazon Elastic Kubernetes Service Deploy using Amazon Elastic Container Service Build and deploy AI agents with AgentCore ## Need Help? For assistance with cloud deployments or custom integrations, [contact our team](https://fireworks.ai/contact). --- # Source: https://docs.fireworks.ai/getting-started/introduction.md # Source: https://docs.fireworks.ai/examples/introduction.md # Source: https://docs.fireworks.ai/api-reference/introduction.md # Source: https://docs.fireworks.ai/getting-started/introduction.md # Source: https://docs.fireworks.ai/examples/introduction.md # Source: https://docs.fireworks.ai/api-reference/introduction.md # Source: https://docs.fireworks.ai/getting-started/introduction.md # Source: https://docs.fireworks.ai/examples/introduction.md # Source: https://docs.fireworks.ai/api-reference/introduction.md # Source: https://docs.fireworks.ai/getting-started/introduction.md # Source: https://docs.fireworks.ai/examples/introduction.md # Source: https://docs.fireworks.ai/api-reference/introduction.md # Introduction Fireworks AI REST API enables you to interact with various language, image and embedding models using an API Key. It also lets you automate management of models, deployments, datasets, and more. ## Authentication All requests made to the Fireworks AI REST API must include an `Authorization` header with a valid `Bearer` token using your API key, along with the `Content-Type: application/json` header. ### Getting your API key You can obtain an API key by: * Using the [`firectl create api-key`](/tools-sdks/firectl/commands/create-api-key) command * Generating one through the [Fireworks AI dashboard](https://app.fireworks.ai/settings/users/api-keys) ### Request headers Include the following headers in your REST API requests: ```json theme={null} authorization: Bearer content-type: application/json ``` --- # Source: https://docs.fireworks.ai/faq-new/deployment-infrastructure/is-latency-guaranteed-for-serverless-models.md # Are there SLAs for serverless? Our multi-tenant serverless offering does not currently come with Service Level Agreements (SLAs) for latency or availability. 
If you have specific performance or availability requirements, we recommend: * **On-demand deployments**: Provides dedicated resources with predictable performance * **Contact sales**: [Reach out to discuss](https://fireworks.ai/company/contact-us) custom solutions and enterprise options --- # Source: https://docs.fireworks.ai/faq-new/billing-pricing/is-prompt-caching-billed-differently.md # Is prompt caching billed differently for serverless models? No, **prompt caching does not affect billing for serverless models**. You are charged the same amount regardless of whether your request benefits from prompt caching or not. --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-accounts.md # Source: https://docs.fireworks.ai/api-reference/list-accounts.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-accounts.md # Source: https://docs.fireworks.ai/api-reference/list-accounts.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-accounts.md # Source: https://docs.fireworks.ai/api-reference/list-accounts.md # List Accounts ## OpenAPI ````yaml get /v1/accounts paths: path: /v1/accounts method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: {} query: pageSize: schema: - type: integer required: false description: >- The maximum number of accounts to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListAccounts call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListAccounts must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only accounts satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: |- Not supported. Accounts will be returned ordered by `name`. readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: accounts: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayAccount' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of accounts. refIdentifier: '#/components/schemas/gatewayListAccountsResponse' examples: example: value: accounts: - name: displayName: createTime: '2023-11-07T05:31:56Z' email: state: STATE_UNSPECIFIED status: code: OK message: suspendState: UNSUSPENDED updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: AccountSuspendState: type: string enum: - UNSUSPENDED - FAILED_PAYMENTS - CREDIT_DEPLETED - MONTHLY_SPEND_LIMIT_EXCEEDED - BLOCKED_BY_ABUSE_RULE default: UNSUSPENDED gatewayAccount: type: object properties: name: type: string title: The resource name of the account. e.g. 
accounts/my-account readOnly: true displayName: type: string description: |- Human-readable display name of the account. e.g. "My Account" Must be fewer than 64 characters long. createTime: type: string format: date-time description: The creation time of the account. readOnly: true email: type: string description: >- For developer accounts, this is the email of the developer user and is immutable. For ENTERPRISE and BUSINESS accounts, this is mutable and it is the email that will recieve the invoice for the account if automated billing is used. state: $ref: '#/components/schemas/gatewayAccountState' description: The state of the account. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: Contains information about the account status. readOnly: true suspendState: $ref: '#/components/schemas/AccountSuspendState' readOnly: true updateTime: type: string format: date-time description: The update time for the account. readOnly: true title: 'Next ID: 25' required: - email gatewayAccountState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - UPDATING - DELETING default: STATE_UNSPECIFIED gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). 
This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
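# Illustrative request sketch (not part of the generated schema): one way to call
# GET /v1/accounts and page through results using the pageSize/pageToken parameters
# and the nextPageToken response field documented in this spec. The API key value is
# a placeholder; the endpoint, parameters, and bearer auth come from the spec above.
#
#   curl -s -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts?pageSize=50"
#
#   # If the response contains a non-empty nextPageToken, request the next page with:
#   curl -s -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts?pageSize=50&pageToken=<nextPageToken>"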
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-api-key.md # firectl list api-key > Prints all API keys for the signed in user. ``` firectl list api-key [flags] ``` ### Examples ``` firectl list api-keys ``` ### Flags ``` --all-users Admin only: list API keys for all users in the account -h, --help help for api-key ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/list-api-keys.md # List API Keys ## OpenAPI ````yaml get /v1/accounts/{account_id}/users/{user_id}/apiKeys paths: path: /v1/accounts/{account_id}/users/{user_id}/apiKeys method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id user_id: schema: - type: string required: true description: The User Id query: pageSize: schema: - type: integer required: false description: >- Number of API keys to return in the response. Pagination support to be added. pageToken: schema: - type: string required: false description: >- Token for fetching the next page of results. Pagination support to be added. filter: schema: - type: string required: false description: Field for filtering results. orderBy: schema: - type: string required: false description: Field for ordering results. readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: apiKeys: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayApiKey' description: List of API keys retrieved. nextPageToken: allOf: - type: string title: >- Token for fetching the next page of results. Pagination support to be added. TODO: Implement pagination totalSize: allOf: - type: integer format: int32 description: The total number of API keys. refIdentifier: '#/components/schemas/gatewayListApiKeysResponse' examples: example: value: apiKeys: - keyId: displayName: key: createTime: '2023-11-07T05:31:56Z' secure: true email: prefix: expireTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayApiKey: type: object properties: keyId: type: string description: >- Unique identifier (Key ID) for the API key, used primarily for deletion. 
readOnly: true displayName: type: string description: >- Display name for the API key, defaults to "default" if not specified. key: type: string description: >- The actual API key value, only available upon creation and not stored thereafter. readOnly: true createTime: type: string format: date-time description: Timestamp indicating when the API key was created. readOnly: true secure: type: boolean description: >- Indicates whether the plaintext value of the API key is unknown to Fireworks. If true, Fireworks does not know this API key's plaintext value. If false, Fireworks does know the plaintext value. readOnly: true email: type: string description: Email of the user who owns this API key. readOnly: true prefix: type: string title: The first few characters of the API key to visually identify it readOnly: true expireTime: type: string format: date-time description: >- Timestamp indicating when the API key will expire. If not set, the key never expires. ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-aws-iam-role-bindings.md # List Aws Iam Role Bindings ## OpenAPI ````yaml get /v1/accounts/{account_id}/awsIamRoleBindings paths: path: /v1/accounts/{account_id}/awsIamRoleBindings method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of bindings to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListAwsIamRoleBindings call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListAwsIamRoleBindings must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only bindings satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: awsIamRoleBindings: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayAwsIamRoleBinding' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of AWS IAM role bindings. 
refIdentifier: '#/components/schemas/gatewayListAwsIamRoleBindingsResponse' examples: example: value: awsIamRoleBindings: - accountId: createTime: '2023-11-07T05:31:56Z' principal: role: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayAwsIamRoleBinding: type: object properties: accountId: type: string description: The account ID that this binding is associated with. readOnly: true createTime: type: string format: date-time description: The creation time of the AWS IAM role binding. readOnly: true principal: type: string description: >- The principal that is allowed to assume the AWS IAM role. This must be the email address of the user. role: type: string description: The AWS IAM role ARN that is allowed to be assumed by the principal. required: - principal - role ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-batch-inference-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-batch-inference-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-batch-inference-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-batch-inference-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-batch-inference-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-batch-inference-jobs.md # List Batch Inference Jobs ## OpenAPI ````yaml get /v1/accounts/{account_id}/batchInferenceJobs paths: path: /v1/accounts/{account_id}/batchInferenceJobs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of batch inference jobs to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListBatchInferenceJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListBatchInferenceJobs must match the call that provided the page token. filter: schema: - type: string required: false description: |- Only jobs satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "created_time". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: batchInferenceJobs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayBatchInferenceJob' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. 
totalSize: allOf: - type: integer format: int32 description: The total number of batch inference jobs. refIdentifier: '#/components/schemas/gatewayListBatchInferenceJobsResponse' examples: example: value: batchInferenceJobs: - name: displayName: createTime: '2023-11-07T05:31:56Z' createdBy: state: JOB_STATE_UNSPECIFIED status: code: OK message: model: inputDatasetId: outputDatasetId: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 updateTime: '2023-11-07T05:31:56Z' precision: PRECISION_UNSPECIFIED jobProgress: percent: 123 epoch: 123 totalInputRequests: 123 totalProcessedRequests: 123 successfullyProcessedRequests: 123 failedRequests: 123 outputRows: 123 inputTokens: 123 outputTokens: 123 cachedInputTokenCount: 123 continuedFromJobName: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve gatewayBatchInferenceJob: type: object properties: name: type: string title: >- The resource name of the batch inference job. e.g. accounts/my-account/batchInferenceJobs/my-batch-inference-job readOnly: true displayName: type: string title: >- Human-readable display name of the batch inference job. e.g. "My Batch Inference Job" createTime: type: string format: date-time description: The creation time of the batch inference job. readOnly: true createdBy: type: string description: >- The email address of the user who initiated this batch inference job. readOnly: true state: $ref: '#/components/schemas/gatewayJobState' description: JobState represents the state an asynchronous job can be in. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true model: type: string description: >- The name of the model to use for inference. This is required, except when continued_from_job_name is specified. inputDatasetId: type: string description: >- The name of the dataset used for inference. This is required, except when continued_from_job_name is specified. outputDatasetId: type: string description: >- The name of the dataset used for storing the results. This will also contain the error file. inferenceParameters: $ref: '#/components/schemas/gatewayInferenceParameters' description: Parameters controlling the inference process. updateTime: type: string format: date-time description: The update time for the batch inference job. readOnly: true precision: $ref: '#/components/schemas/DeploymentPrecision' description: >- The precision with which the model should be served. If PRECISION_UNSPECIFIED, a default will be chosen based on the model. jobProgress: $ref: '#/components/schemas/gatewayJobProgress' description: Job progress. readOnly: true continuedFromJobName: type: string description: >- The resource name of the batch inference job that this job continues from. Used for lineage tracking to understand job continuation chains. 
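# Illustrative request sketch (not part of the generated schema): listing batch
# inference jobs for an account with an AIP-160 filter, per the filter parameter
# documented in this spec. "my-account" and the API key are placeholders, and the
# filter expression below is an assumed example of the grammar linked above.
#
#   curl -s -G -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode 'filter=state="JOB_STATE_RUNNING"' \
#     "https://api.fireworks.ai/v1/accounts/my-account/batchInferenceJobs"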
title: 'Next ID: 31' gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. 
For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayInferenceParameters: type: object properties: maxTokens: type: integer format: int32 description: Maximum number of tokens to generate per response. temperature: type: number format: float description: Sampling temperature, typically between 0 and 2. topP: type: number format: float description: Top-p sampling parameter, typically between 0 and 1. 'n': type: integer format: int32 description: Number of response candidates to generate per input. extraBody: type: string description: |- Additional parameters for the inference request as a JSON string. For example: "{\"stop\": [\"\\n\"]}". topK: type: integer format: int32 description: >- Top-k sampling parameter, limits the token selection to the top k tokens. description: Parameters for the inference requests. gatewayJobProgress: type: object properties: percent: type: integer format: int32 description: Progress percent, within the range from 0 to 100. epoch: type: integer format: int32 description: >- The epoch for which the progress percent is reported, usually starting from 0. This is optional for jobs that don't run in an epoch fasion, e.g. BIJ, EVJ. totalInputRequests: type: integer format: int32 description: Total number of input requests/rows in the job. totalProcessedRequests: type: integer format: int32 description: >- Total number of requests that have been processed (successfully or failed). 
successfullyProcessedRequests: type: integer format: int32 description: Number of requests that were processed successfully. failedRequests: type: integer format: int32 description: Number of requests that failed to process. outputRows: type: integer format: int32 description: Number of output rows generated. inputTokens: type: integer format: int32 description: Total number of input tokens processed. outputTokens: type: integer format: int32 description: Total number of output tokens generated. cachedInputTokenCount: type: integer format: int32 description: The number of input tokens that hit the prompt cache. description: Progress of a job, e.g. RLOR, EVJ, BIJ etc. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-batch-jobs.md # List Batch Jobs ## OpenAPI ````yaml get /v1/accounts/{account_id}/batchJobs paths: path: /v1/accounts/{account_id}/batchJobs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of batch jobs to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListBatchJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListBatchJobs must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only batch jobs satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "create_time desc". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. 
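# Illustrative request sketch (not part of the generated schema): listing batch jobs
# newest-first. The orderBy value mirrors this spec's documented default of
# "create_time desc"; "my-account" and the API key are placeholders.
#
#   curl -s -G -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode 'orderBy=create_time desc' \
#     "https://api.fireworks.ai/v1/accounts/my-account/batchJobs"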
header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: batchJobs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayBatchJob' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of batch jobs. refIdentifier: '#/components/schemas/gatewayListBatchJobsResponse' examples: example: value: batchJobs: - name: displayName: createTime: '2023-11-07T05:31:56Z' startTime: '2023-11-07T05:31:56Z' endTime: '2023-11-07T05:31:56Z' createdBy: nodePoolId: environmentId: snapshotId: numRanks: 123 envVars: {} role: pythonExecutor: targetType: TARGET_TYPE_UNSPECIFIED target: args: - notebookExecutor: notebookFilename: shellExecutor: command: imageRef: annotations: {} state: STATE_UNSPECIFIED status: shared: true updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: PythonExecutorTargetType: type: string enum: - TARGET_TYPE_UNSPECIFIED - MODULE - FILENAME default: TARGET_TYPE_UNSPECIFIED description: |2- - MODULE: Runs a python module, i.e. passed as -m argument. - FILENAME: Runs a python file. gatewayBatchJob: type: object properties: name: type: string title: |- The resource name of the batch job. e.g. accounts/my-account/clusters/my-cluster/batchJobs/123456789 readOnly: true displayName: type: string description: |- Human-readable display name of the batch job. e.g. "My Batch Job" Must be fewer than 64 characters long. createTime: type: string format: date-time description: The creation time of the batch job. readOnly: true startTime: type: string format: date-time description: The time when the batch job started running. readOnly: true endTime: type: string format: date-time description: The time when the batch job completed, failed, or was cancelled. readOnly: true createdBy: type: string description: The email address of the user who created this batch job. readOnly: true nodePoolId: type: string title: >- The ID of the node pool that this batch job should use. e.g. my-node-pool environmentId: type: string description: >- The ID of the environment that this batch job should use. e.g. my-env If specified, image_ref must not be specified. snapshotId: type: string description: >- The ID of the snapshot used by this batch job. If specified, environment_id must be specified and image_ref must not be specified. numRanks: type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank. envVars: type: object additionalProperties: type: string description: Environment variables to be passed during this job's execution. role: type: string description: |- The ARN of the AWS IAM role that the batch job should assume. If not specified, the connection will fall back to the node pool's node_role. pythonExecutor: $ref: '#/components/schemas/gatewayPythonExecutor' notebookExecutor: $ref: '#/components/schemas/gatewayNotebookExecutor' shellExecutor: $ref: '#/components/schemas/gatewayShellExecutor' imageRef: type: string description: >- The container image used by this job. If specified, environment_id and snapshot_id must not be specified. annotations: type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. 
Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. state: $ref: '#/components/schemas/gatewayBatchJobState' description: The current state of the batch job. readOnly: true status: type: string description: Detailed information about the current status of the batch job. readOnly: true shared: type: boolean description: >- Whether the batch job is shared with all users in the account. This allows all users to update, delete, clone, and create environments using the batch job. updateTime: type: string format: date-time description: The update time for the batch job. readOnly: true title: 'Next ID: 22' required: - nodePoolId gatewayBatchJobState: type: string enum: - STATE_UNSPECIFIED - CREATING - QUEUED - PENDING - RUNNING - COMPLETED - FAILED - CANCELLING - CANCELLED - DELETING default: STATE_UNSPECIFIED description: |- - CREATING: The batch job is being created. - QUEUED: The batch job is in the queue and waiting to be scheduled. Currently unused. - PENDING: The batch job scheduled and is waiting for resource allocation. - RUNNING: The batch job is running. - COMPLETED: The batch job has finished successfully. - FAILED: The batch job has failed. - CANCELLING: The batch job is being cancelled. - CANCELLED: The batch job was cancelled. - DELETING: The batch job is being deleted. title: 'Next ID: 10' gatewayNotebookExecutor: type: object properties: notebookFilename: type: string description: Path to a notebook file to be executed. description: Execute a notebook file. required: - notebookFilename gatewayPythonExecutor: type: object properties: targetType: $ref: '#/components/schemas/PythonExecutorTargetType' description: The type of Python target to run. target: type: string description: A Python module or filename depending on TargetType. args: type: array items: type: string description: Command line arguments to pass to the Python process. description: Execute a Python process. required: - targetType - target gatewayShellExecutor: type: object properties: command: type: string title: Command we want to run for the shell script description: Execute a shell script. required: - command ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-clusters.md # List Clusters ## OpenAPI ````yaml get /v1/accounts/{account_id}/clusters paths: path: /v1/accounts/{account_id}/clusters method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of clusters to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListClusters call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListClusters must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only clusters satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. 
orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: clusters: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayCluster' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of clusters. refIdentifier: '#/components/schemas/gatewayListClustersResponse' examples: example: value: clusters: - name: displayName: createTime: '2023-11-07T05:31:56Z' eksCluster: awsAccountId: fireworksManagerRole: region: clusterName: storageBucketName: metricWriterRole: loadBalancerControllerRole: workloadIdentityPoolProviderId: inferenceRole: fakeCluster: projectId: location: clusterName: state: STATE_UNSPECIFIED status: code: OK message: updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCluster: type: object properties: name: type: string title: >- The resource name of the cluster. e.g. accounts/my-account/clusters/my-cluster readOnly: true displayName: type: string description: |- Human-readable display name of the cluster. e.g. "My Cluster" Must be fewer than 64 characters long. createTime: type: string format: date-time description: The creation time of the cluster. readOnly: true eksCluster: $ref: '#/components/schemas/gatewayEksCluster' fakeCluster: $ref: '#/components/schemas/gatewayFakeCluster' state: $ref: '#/components/schemas/gatewayClusterState' description: The current state of the cluster. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: Detailed information about the current status of the cluster. readOnly: true updateTime: type: string format: date-time description: The update time for the cluster. readOnly: true title: 'Next ID: 15' gatewayClusterState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED default: STATE_UNSPECIFIED description: |2- - CREATING: The cluster is still being created. - READY: The cluster is ready to be used. - DELETING: The cluster is being deleted. - FAILED: Cluster is not operational. Consult 'status' for detailed messaging. Cluster needs to be deleted and re-created. gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. 
For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. 
E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEksCluster: type: object properties: awsAccountId: type: string description: The 12-digit AWS account ID where this cluster lives. fireworksManagerRole: type: string title: >- The IAM role ARN used to manage Fireworks resources on AWS. If not specified, the default is arn:aws:iam:::role/FireworksManagerRole region: type: string description: >- The AWS region where this cluster lives. See https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html for a list of available regions. clusterName: type: string description: The EKS cluster name. storageBucketName: type: string description: The S3 bucket name. metricWriterRole: type: string description: >- The IAM role ARN used by Google Managed Prometheus role that will write metrics to Fireworks managed Prometheus. The role must be assumable by the `system:serviceaccount:gmp-system:collector` service account on the EKS cluster. If not specified, no metrics will be written to GCP. loadBalancerControllerRole: type: string description: >- The IAM role ARN used by the EKS load balancer controller (i.e. the load balancer automatically created for the k8s gateway resource). If not specified, no gateway will be created. workloadIdentityPoolProviderId: type: string title: |- The ID of the GCP workload identity pool provider in the Fireworks project for this cluster. The pool ID is assumed to be "byoc-pool" inferenceRole: type: string description: The IAM role ARN used by the inference pods on the cluster. title: |- An Amazon Elastic Kubernetes Service cluster. Next ID: 16 required: - awsAccountId - region gatewayFakeCluster: type: object properties: projectId: type: string location: type: string clusterName: type: string title: A fake cluster using https://pkg.go.dev/k8s.io/client-go/kubernetes/fake gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. 
message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-credit-redemptions.md # firectl list credit-redemptions > Lists credit code redemptions for the current account. ``` firectl list credit-redemptions [flags] ``` ### Examples ``` firectl list credit-redemptions ``` ### Flags ``` -h, --help help for credit-redemptions ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-datasets.md # Source: https://docs.fireworks.ai/api-reference/list-datasets.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-datasets.md # Source: https://docs.fireworks.ai/api-reference/list-datasets.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-datasets.md # Source: https://docs.fireworks.ai/api-reference/list-datasets.md # List Datasets ## OpenAPI ````yaml get /v1/accounts/{account_id}/datasets paths: path: /v1/accounts/{account_id}/datasets method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of datasets to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListDatasets call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDatasets must match the call that provided the page token. filter: schema: - type: string required: false description: |- Only model satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. 
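# Illustrative request sketch (not part of the generated schema): listing datasets
# and trimming the response with the readMask parameter documented in this spec.
# "my-account" and the API key are placeholders, and the field list "name,state"
# is an assumed example of a field mask drawn from the dataset schema below.
#
#   curl -s -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/datasets?pageSize=50&readMask=name,state"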
header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: datasets: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayDataset' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 title: The total number of datasets refIdentifier: '#/components/schemas/gatewayListDatasetsResponse' examples: example: value: datasets: - name: displayName: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: exampleCount: userUploaded: {} evaluationResult: evaluationJobId: transformed: sourceDatasetId: filter: originalFormat: FORMAT_UNSPECIFIED splitted: sourceDatasetId: evalProtocol: {} externalUrl: format: FORMAT_UNSPECIFIED createdBy: updateTime: '2023-11-07T05:31:56Z' sourceJobName: estimatedTokenCount: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: DatasetFormat: type: string enum: - FORMAT_UNSPECIFIED - CHAT - COMPLETION - RL default: FORMAT_UNSPECIFIED gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). 
This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDataset: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true state: $ref: '#/components/schemas/gatewayDatasetState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true exampleCount: type: string format: int64 userUploaded: $ref: '#/components/schemas/gatewayUserUploaded' evaluationResult: $ref: '#/components/schemas/gatewayEvaluationResult' transformed: $ref: '#/components/schemas/gatewayTransformed' splitted: $ref: '#/components/schemas/gatewaySplitted' evalProtocol: $ref: '#/components/schemas/gatewayEvalProtocol' externalUrl: type: string title: The external URI of the dataset. e.g. gs://foo/bar/baz.jsonl format: $ref: '#/components/schemas/DatasetFormat' createdBy: type: string description: The email address of the user who initiated this fine-tuning job. readOnly: true updateTime: type: string format: date-time description: The update time for the dataset. readOnly: true sourceJobName: type: string description: >- The resource name of the job that created this dataset (e.g., batch inference job). Used for lineage tracking to understand dataset provenance. estimatedTokenCount: type: string format: int64 description: The estimated number of tokens in the dataset. readOnly: true title: 'Next ID: 23' gatewayDatasetState: type: string enum: - STATE_UNSPECIFIED - UPLOADING - READY default: STATE_UNSPECIFIED gatewayEvalProtocol: type: object gatewayEvaluationResult: type: object properties: evaluationJobId: type: string required: - evaluationJobId gatewaySplitted: type: object properties: sourceDatasetId: type: string required: - sourceDatasetId gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayTransformed: type: object properties: sourceDatasetId: type: string filter: type: string originalFormat: $ref: '#/components/schemas/DatasetFormat' required: - sourceDatasetId gatewayUserUploaded: type: object ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployed-models.md # Source: https://docs.fireworks.ai/api-reference/list-deployed-models.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployed-models.md # Source: https://docs.fireworks.ai/api-reference/list-deployed-models.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployed-models.md # Source: https://docs.fireworks.ai/api-reference/list-deployed-models.md # List LoRAs ## OpenAPI ````yaml get /v1/accounts/{account_id}/deployedModels paths: path: /v1/accounts/{account_id}/deployedModels method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of deployed models to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. 
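# Illustrative request sketch (not part of the generated schema): listing deployed
# models (LoRAs) for an account with the default page size. The endpoint and bearer
# auth come from this spec; "my-account" and the API key are placeholders.
#
#   curl -s -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/deployedModels?pageSize=50"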
pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListDeployedModels call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDeployedModels must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only depoyed models satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: deployedModels: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayDeployedModel' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 title: The total number of deployed models refIdentifier: '#/components/schemas/gatewayListDeployedModelsResponse' examples: example: value: deployedModels: - name: displayName: description: createTime: '2023-11-07T05:31:56Z' model: deployment: default: true state: STATE_UNSPECIFIED serverless: true status: code: OK message: public: true updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. 
Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. 
HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDeployedModel: type: object properties: name: type: string title: >- The resource name. e.g. accounts/my-account/deployedModels/my-deployed-model readOnly: true displayName: type: string description: type: string description: Description of the resource. createTime: type: string format: date-time description: The creation time of the resource. readOnly: true model: type: string title: |- The resource name of the model to be deployed. e.g. accounts/my-account/models/my-model deployment: type: string description: The resource name of the base deployment the model is deployed to. default: type: boolean description: >- If true, this is the default target when querying this model without the `#` suffix. The first deployment a model is deployed to will have this field set to true. state: $ref: '#/components/schemas/gatewayDeployedModelState' description: The state of the deployed model. readOnly: true serverless: type: boolean title: True if the underlying deployment is managed by Fireworks status: $ref: '#/components/schemas/gatewayStatus' description: Contains model deploy/undeploy details. readOnly: true public: type: boolean description: If true, the deployed model will be publicly reachable. updateTime: type: string format: date-time description: The update time for the deployed model. readOnly: true title: 'Next ID: 20' gatewayDeployedModelState: type: string enum: - STATE_UNSPECIFIED - UNDEPLOYING - DEPLOYING - DEPLOYED - UPDATING default: STATE_UNSPECIFIED description: |- - UNDEPLOYING: The model is being undeployed. - DEPLOYING: The model is being deployed. - DEPLOYED: The model is deployed and ready for inference. - UPDATING: there are updates happening with the deployed model title: 'Next ID: 6' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. 
title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployment-shape-versions.md # Source: https://docs.fireworks.ai/api-reference/list-deployment-shape-versions.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployment-shape-versions.md # Source: https://docs.fireworks.ai/api-reference/list-deployment-shape-versions.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployment-shape-versions.md # Source: https://docs.fireworks.ai/api-reference/list-deployment-shape-versions.md # List Deployment Shapes Versions ## OpenAPI ````yaml get /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}/versions paths: path: /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}/versions method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id deployment_shape_id: schema: - type: string required: true description: The Deployment Shape Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of deployment shape versions to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListDeploymentShapeVersions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDeploymentShapeVersions must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only deployment shape versions satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "create_time". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: deploymentShapeVersions: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayDeploymentShapeVersion' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of deployment shape versions. 
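# --- Illustrative pagination sketch; not part of the generated OpenAPI spec. ---
# To walk every deployment shape version, repeat the request and pass the
# returned nextPageToken back as pageToken until it is omitted. "my-account",
# "my-deployment-shape", and $FIREWORKS_API_KEY are placeholders:
#
#   curl -G "https://api.fireworks.ai/v1/accounts/my-account/deploymentShapes/my-deployment-shape/versions" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode "pageSize=200" \
#     --data-urlencode "pageToken=<nextPageToken from the previous response>"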
refIdentifier: '#/components/schemas/gatewayListDeploymentShapeVersionsResponse' examples: example: value: deploymentShapeVersions: - name: createTime: '2023-11-07T05:31:56Z' snapshot: name: displayName: description: createTime: '2023-11-07T05:31:56Z' updateTime: '2023-11-07T05:31:56Z' baseModel: modelType: parameterCount: acceleratorCount: 123 acceleratorType: ACCELERATOR_TYPE_UNSPECIFIED precision: PRECISION_UNSPECIFIED enableAddons: true draftTokenCount: 123 draftModel: ngramSpeculationLength: 123 enableSessionAffinity: true numLoraDeviceCached: 123 presetType: PRESET_TYPE_UNSPECIFIED validated: true public: true latestValidated: true nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve DeploymentShapePresetType: type: string enum: - PRESET_TYPE_UNSPECIFIED - MINIMAL - FAST - THROUGHPUT - REINFORCEMENT_FINE_TUNING default: PRESET_TYPE_UNSPECIFIED gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB default: ACCELERATOR_TYPE_UNSPECIFIED gatewayDeploymentShape: type: object properties: name: type: string title: >- The resource name of the deployment shape. e.g. accounts/my-account/deploymentShapes/my-deployment-shape readOnly: true displayName: type: string description: >- Human-readable display name of the deployment shape. e.g. "My Deployment Shape" Must be fewer than 64 characters long. description: type: string description: >- The description of the deployment shape. Must be fewer than 1000 characters long. createTime: type: string format: date-time description: The creation time of the deployment shape. readOnly: true updateTime: type: string format: date-time description: The update time for the deployment shape. readOnly: true baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b modelType: type: string description: The model type of the base model. readOnly: true parameterCount: type: string format: int64 description: The parameter count of the base model . readOnly: true acceleratorCount: type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. enableAddons: type: boolean description: >- If true, LORA addons are enabled for deployments created from this shape. draftTokenCount: type: integer format: int32 description: |- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. draftModel: type: string description: >- The draft model name for speculative decoding. e.g. 
accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. this behavior. ngramSpeculationLength: type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: type: boolean description: Whether to apply sticky routing based on `user` field. numLoraDeviceCached: type: integer format: int32 title: How many LORA adapters to keep on GPU side for caching readOnly: true presetType: $ref: '#/components/schemas/DeploymentShapePresetType' description: Type of deployment shape for different deployment configurations. title: >- A deployment shape is a set of parameters that define the shape of a deployment. Deployments are created from a deployment shape. Next ID: 33 required: - baseModel gatewayDeploymentShapeVersion: type: object properties: name: type: string title: >- The resource name of the deployment shape version. e.g. accounts/my-account/deploymentShapes/my-deployment-shape/versions/{version_id} readOnly: true createTime: type: string format: date-time description: >- The creation time of the deployment shape version. Lists will be ordered by this field. readOnly: true snapshot: $ref: '#/components/schemas/gatewayDeploymentShape' description: Full snapshot of the Deployment Shape at this version. readOnly: true validated: type: boolean description: If true, this version has been validated. public: type: boolean description: If true, this version will be publicly readable. latestValidated: type: boolean description: |- If true, this version is the latest validated version. Only one version of the shape can be the latest validated version. readOnly: true title: >- A deployment shape version is a specific version of a deployment shape. Versions are immutable, only created on updates and deleted when the deployment shape is deleted. Next ID: 9 ```` --- # Source: https://docs.fireworks.ai/api-reference/list-deployment-shapes.md # List Deployment Shapes ## OpenAPI ````yaml get /v1/accounts/{account_id}/deploymentShapes openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/deploymentShapes: get: tags: - Gateway summary: List Deployment Shapes operationId: Gateway_ListDeploymentShapes parameters: - name: pageSize description: >- The maximum number of deployments to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. in: query required: false schema: type: integer format: int32 - name: pageToken description: >- A page token, received from a previous ListDeploymentShapes call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDeploymentShapes must match the call that provided the page token. in: query required: false schema: type: string - name: filter description: >- Only deployment satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. in: query required: false schema: type: string - name: orderBy description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "create_time". 
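# --- Illustrative request sketch; not part of the generated OpenAPI spec. ---
# Example of the documented orderBy syntax (append " desc" for descending);
# "my-account" and $FIREWORKS_API_KEY are placeholders:
#
#   curl -G "https://api.fireworks.ai/v1/accounts/my-account/deploymentShapes" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode "orderBy=create_time desc" \
#     --data-urlencode "pageSize=50"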
in: query required: false schema: type: string - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: targetModel description: >- Target model that the returned deployment shapes should be compatible with. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayListDeploymentShapesResponse' components: schemas: gatewayListDeploymentShapesResponse: type: object properties: deploymentShapes: type: array items: $ref: '#/components/schemas/gatewayDeploymentShape' type: object nextPageToken: type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: type: integer format: int32 description: The total number of deployment shapes. gatewayDeploymentShape: type: object properties: name: type: string title: >- The resource name of the deployment shape. e.g. accounts/my-account/deploymentShapes/my-deployment-shape readOnly: true displayName: type: string description: >- Human-readable display name of the deployment shape. e.g. "My Deployment Shape" Must be fewer than 64 characters long. description: type: string description: >- The description of the deployment shape. Must be fewer than 1000 characters long. createTime: type: string format: date-time description: The creation time of the deployment shape. readOnly: true updateTime: type: string format: date-time description: The update time for the deployment shape. readOnly: true baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b modelType: type: string description: The model type of the base model. readOnly: true parameterCount: type: string format: int64 description: The parameter count of the base model . readOnly: true acceleratorCount: type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. disableDeploymentSizeValidation: type: boolean description: If true, the deployment size validation is disabled. enableAddons: type: boolean description: >- If true, LORA addons are enabled for deployments created from this shape. draftTokenCount: type: integer format: int32 description: |- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. draftModel: type: string description: >- The draft model name for speculative decoding. e.g. accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. this behavior. ngramSpeculationLength: type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: type: boolean description: Whether to apply sticky routing based on `user` field. 
numLoraDeviceCached: type: integer format: int32 title: How many LORA adapters to keep on GPU side for caching presetType: $ref: '#/components/schemas/DeploymentShapePresetType' description: Type of deployment shape for different deployment configurations. title: >- A deployment shape is a set of parameters that define the shape of a deployment. Deployments are created from a deployment shape. Next ID: 34 required: - baseModel gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB - AMD_MI350X_288GB default: ACCELERATOR_TYPE_UNSPECIFIED title: 'Next ID: 11' DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve DeploymentShapePresetType: type: string enum: - PRESET_TYPE_UNSPECIFIED - MINIMAL - FAST - THROUGHPUT - FULL_PRECISION default: PRESET_TYPE_UNSPECIFIED title: |- - MINIMAL: Preset for cheapest & most minimal type of deployment - FAST: Preset for fastest generation & TTFT deployment - THROUGHPUT: Preset for best throughput deployment - FULL_PRECISION: Preset for deployment with full precision for training & most accurate numerics securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployments.md # Source: https://docs.fireworks.ai/api-reference/list-deployments.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployments.md # Source: https://docs.fireworks.ai/api-reference/list-deployments.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-deployments.md # Source: https://docs.fireworks.ai/api-reference/list-deployments.md # List Deployments ## OpenAPI ````yaml get /v1/accounts/{account_id}/deployments paths: path: /v1/accounts/{account_id}/deployments method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of deployments to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListDeployments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDeployments must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only deployment satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. 
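# --- Illustrative request sketch; not part of the generated OpenAPI spec. ---
# Example of listing deployments with an AIP-160-style filter; the filter
# expression, "my-account", and $FIREWORKS_API_KEY are placeholders:
#
#   curl -G "https://api.fireworks.ai/v1/accounts/my-account/deployments" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode 'filter=state="READY"' \
#     --data-urlencode "pageSize=50"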
orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "create_time". showDeleted: schema: - type: boolean required: false description: If set, DELETED deployments will be included. readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: deployments: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayDeployment' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of deployments. refIdentifier: '#/components/schemas/gatewayListDeploymentsResponse' examples: example: value: deployments: - name: displayName: description: createTime: '2023-11-07T05:31:56Z' expireTime: '2023-11-07T05:31:56Z' purgeTime: '2023-11-07T05:31:56Z' deleteTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: minReplicaCount: 123 maxReplicaCount: 123 desiredReplicaCount: 123 replicaCount: 123 autoscalingPolicy: scaleUpWindow: scaleDownWindow: scaleToZeroWindow: loadTargets: {} baseModel: acceleratorCount: 123 acceleratorType: ACCELERATOR_TYPE_UNSPECIFIED precision: PRECISION_UNSPECIFIED cluster: enableAddons: true draftTokenCount: 123 draftModel: ngramSpeculationLength: 123 enableSessionAffinity: true directRouteApiKeys: - numPeftDeviceCached: 123 directRouteType: DIRECT_ROUTE_TYPE_UNSPECIFIED directRouteHandle: deploymentTemplate: autoTune: longPrompt: true placement: region: REGION_UNSPECIFIED multiRegion: MULTI_REGION_UNSPECIFIED regions: - REGION_UNSPECIFIED region: REGION_UNSPECIFIED updateTime: '2023-11-07T05:31:56Z' disableDeploymentSizeValidation: true enableMtp: true enableHotReloadLatestAddon: true deploymentShape: activeModelVersion: targetModelVersion: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve gatewayAcceleratorType: type: string enum: - ACCELERATOR_TYPE_UNSPECIFIED - NVIDIA_A100_80GB - NVIDIA_H100_80GB - AMD_MI300X_192GB - NVIDIA_A10G_24GB - NVIDIA_A100_40GB - NVIDIA_L4_24GB - NVIDIA_H200_141GB - NVIDIA_B200_180GB - AMD_MI325X_256GB default: ACCELERATOR_TYPE_UNSPECIFIED gatewayAutoTune: type: object properties: longPrompt: type: boolean description: If true, this deployment is optimized for long prompt lengths. gatewayAutoscalingPolicy: type: object properties: scaleUpWindow: type: string description: >- The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s. 
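# Illustrative shape of this policy object as it may appear in a returned
# deployment; the values mirror the documented defaults (30s / 10m / 1h / 0.8)
# and are examples only, not part of the generated spec:
#
#   autoscalingPolicy:
#     scaleUpWindow: "30s"
#     scaleDownWindow: "10m"
#     scaleToZeroWindow: "1h"
#     loadTargets:
#       default: 0.8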
scaleDownWindow: type: string description: >- The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m. scaleToZeroWindow: type: string description: >- The duration after which there are no requests that the deployment will be scaled down to zero replicas, if min_replica_count==0. Default is 1h. This must be at least 5 minutes. loadTargets: type: object additionalProperties: type: number format: float title: >- Map of load metric names to their target utilization factors. Currently only the "default" key is supported, which specifies the default target for all metrics. If not specified, the default target is 0.8 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. 
HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDeployment: type: object properties: name: type: string title: >- The resource name of the deployment. e.g. accounts/my-account/deployments/my-deployment readOnly: true displayName: type: string description: |- Human-readable display name of the deployment. e.g. "My Deployment" Must be fewer than 64 characters long. description: type: string description: Description of the deployment. createTime: type: string format: date-time description: The creation time of the deployment. readOnly: true expireTime: type: string format: date-time description: The time at which this deployment will automatically be deleted. 
purgeTime: type: string format: date-time description: The time at which the resource will be hard deleted. readOnly: true deleteTime: type: string format: date-time description: The time at which the resource will be soft deleted. readOnly: true state: $ref: '#/components/schemas/gatewayDeploymentState' description: The state of the deployment. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: Detailed status information regarding the most recent operation. readOnly: true minReplicaCount: type: integer format: int32 description: |- The minimum number of replicas. If not specified, the default is 0. maxReplicaCount: type: integer format: int32 description: |- The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0. desiredReplicaCount: type: integer format: int32 description: >- The desired number of replicas for this deployment. This represents the target replica count that the system is trying to achieve. readOnly: true replicaCount: type: integer format: int32 readOnly: true autoscalingPolicy: $ref: '#/components/schemas/gatewayAutoscalingPolicy' baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b acceleratorCount: type: integer format: int32 description: >- The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model. acceleratorType: $ref: '#/components/schemas/gatewayAcceleratorType' description: |- The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB. precision: $ref: '#/components/schemas/DeploymentPrecision' description: The precision with which the model should be served. cluster: type: string description: If set, this deployment is deployed to a cloud-premise cluster. readOnly: true enableAddons: type: boolean description: If true, PEFT addons are enabled for this deployment. draftTokenCount: type: integer format: int32 description: >- The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior. draftModel: type: string description: >- The draft model name for speculative decoding. e.g. accounts/fireworks/models/my-draft-model If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior. ngramSpeculationLength: type: integer format: int32 description: >- The length of previous input sequence to be considered for N-gram speculation. enableSessionAffinity: type: boolean description: Whether to apply sticky routing based on `user` field. directRouteApiKeys: type: array items: type: string description: >- The set of API keys used to access the direct route deployment. If direct routing is not enabled, this field is unused. numPeftDeviceCached: type: integer format: int32 title: How many peft adapters to keep on gpu side for caching readOnly: true directRouteType: $ref: '#/components/schemas/gatewayDirectRouteType' description: >- If set, this deployment will expose an endpoint that bypasses the Fireworks API gateway. directRouteHandle: type: string description: >- The handle for calling a direct route. 
The meaning of the handle depends on the direct route type of the deployment: INTERNET -> The host name for accessing the deployment GCP_PRIVATE_SERVICE_CONNECT -> The service attachment name used to create the PSC endpoint. AWS_PRIVATELINK -> The service name used to create the VPC endpoint. readOnly: true deploymentTemplate: type: string description: |- The name of the deployment template to use for this deployment. Only available to enterprise accounts. autoTune: $ref: '#/components/schemas/gatewayAutoTune' description: The performance profile to use for this deployment. placement: $ref: '#/components/schemas/gatewayPlacement' description: |- The desired geographic region where the deployment must be placed. If unspecified, the default is the GLOBAL multi-region. region: $ref: '#/components/schemas/gatewayRegion' description: >- The geographic region where the deployment is presently located. This region may change over time, but within the `placement` constraint. readOnly: true updateTime: type: string format: date-time description: The update time for the deployment. readOnly: true disableDeploymentSizeValidation: type: boolean description: Whether the deployment size validation is disabled. enableMtp: type: boolean description: If true, MTP is enabled for this deployment. enableHotReloadLatestAddon: type: boolean description: >- Allows up to 1 addon at a time to be loaded, and will merge it into the base model. deploymentShape: type: string description: >- The name of the deployment shape that this deployment is using. On the server side, this will be replaced with the deployment shape version name. activeModelVersion: type: string description: >- The model version that is currently active and applied to running replicas of a deployment. targetModelVersion: type: string description: >- The target model version that is being rolled out to the deployment. In a ready steady state, the target model version is the same as the active model version. title: 'Next ID: 82' required: - baseModel gatewayDeploymentState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED - UPDATING - DELETED default: STATE_UNSPECIFIED description: |2- - CREATING: The deployment is still being created. - READY: The deployment is ready to be used. - DELETING: The deployment is being deleted. - FAILED: The deployment failed to be created. See the `status` field for additional details on why it failed. - UPDATING: There are in-progress updates happening with the deployment. - DELETED: The deployment is soft-deleted. gatewayDirectRouteType: type: string enum: - DIRECT_ROUTE_TYPE_UNSPECIFIED - INTERNET - GCP_PRIVATE_SERVICE_CONNECT - AWS_PRIVATELINK default: DIRECT_ROUTE_TYPE_UNSPECIFIED title: |- - DIRECT_ROUTE_TYPE_UNSPECIFIED: No direct routing - INTERNET: The direct route is exposed via the public internet - GCP_PRIVATE_SERVICE_CONNECT: The direct route is exposed via GCP Private Service Connect - AWS_PRIVATELINK: The direct route is exposed via AWS PrivateLink gatewayMultiRegion: type: string enum: - MULTI_REGION_UNSPECIFIED - GLOBAL - US default: MULTI_REGION_UNSPECIFIED gatewayPlacement: type: object properties: region: $ref: '#/components/schemas/gatewayRegion' description: The region where the deployment must be placed. multiRegion: $ref: '#/components/schemas/gatewayMultiRegion' description: The multi-region where the deployment must be placed. 
regions: type: array items: $ref: '#/components/schemas/gatewayRegion' title: The list of regions where the deployment must be placed description: >- The desired geographic region where the deployment must be placed. Exactly one field will be specified. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-dpo-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-dpo-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-dpo-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-dpo-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-dpo-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-dpo-jobs.md # null ## OpenAPI ````yaml get /v1/accounts/{account_id}/dpoJobs paths: path: /v1/accounts/{account_id}/dpoJobs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of dpo jobs to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListDpoJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDpoJobs must match the call that provided the page token. filter: schema: - type: string required: false description: >- Filter criteria for the returned jobs. See https://google.aip.dev/160 for the filter syntax specification. 
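# --- Illustrative request sketch; not part of the generated OpenAPI spec. ---
# Example of listing DPO fine-tuning jobs; "my-account" and $FIREWORKS_API_KEY
# are placeholders:
#
#   curl -G "https://api.fireworks.ai/v1/accounts/my-account/dpoJobs" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode "pageSize=50"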
orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: dpoJobs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayDpoJob' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 title: The total number of dpo jobs refIdentifier: '#/components/schemas/gatewayListDpoJobsResponse' examples: example: value: dpoJobs: - name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 wandbConfig: enabled: true apiKey: project: entity: runId: url: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. 
Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. 
For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayDpoJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true completedTime: type: string format: date-time readOnly: true dataset: type: string description: The name of the dataset used for training. state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: type: string description: The email address of the user who initiated this dpo job. readOnly: true trainingConfig: $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. wandbConfig: $ref: '#/components/schemas/gatewayWandbConfig' description: The Weights & Biases team/user account for logging job progress. title: 'Next ID: 13' required: - dataset gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. 
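# Illustrative usage note; not part of the generated spec. When tracking a job
# returned by this endpoint, re-fetch and stop once `state` reaches a terminal
# value (for example JOB_STATE_COMPLETED, JOB_STATE_FAILED, or
# JOB_STATE_CANCELLED). A narrowed re-fetch, with readMask assumed to accept a
# comma-separated field list:
#
#   curl -G "https://api.fireworks.ai/v1/accounts/my-account/dpoJobs" \
#     -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     --data-urlencode "readMask=name,state"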
gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-environments.md # List Environments ## OpenAPI ````yaml get /v1/accounts/{account_id}/environments paths: path: /v1/accounts/{account_id}/environments method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of environments to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListEnvironments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListEnvironments must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only environments satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. 
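# Illustrative request (not part of the generated spec): listing environments with
# the query parameters documented here. The account ID "my-account" is a
# hypothetical placeholder; authentication uses the bearer header described above.
#
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/environments?pageSize=50"
#
# To fetch the next page, repeat the call with &pageToken=<nextPageToken from the
# previous response>, keeping all other parameters unchanged.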
orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: environments: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayEnvironment' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of environments. refIdentifier: '#/components/schemas/gatewayListEnvironmentsResponse' examples: example: value: environments: - name: displayName: createTime: '2023-11-07T05:31:56Z' createdBy: state: STATE_UNSPECIFIED status: code: OK message: connection: nodePoolId: numRanks: 123 role: zone: useLocalStorage: true baseImageRef: imageRef: snapshotImageRef: shared: true annotations: {} updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. 
HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEnvironment: type: object properties: name: type: string title: >- The resource name of the environment. e.g. accounts/my-account/clusters/my-cluster/environments/my-env readOnly: true displayName: type: string title: >- Human-readable display name of the environment. e.g. "My Environment" createTime: type: string format: date-time description: The creation time of the environment. readOnly: true createdBy: type: string description: The email address of the user who created this environment. readOnly: true state: $ref: '#/components/schemas/gatewayEnvironmentState' description: The current state of the environment. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: The current error status of the environment. readOnly: true connection: $ref: '#/components/schemas/gatewayEnvironmentConnection' description: Information about the current environment connection. readOnly: true baseImageRef: type: string description: The URI of the base container image used for this environment. imageRef: type: string description: >- The URI of the container image used for this environment. This is an immutable snapshot of the base_image_ref taken when the environment was created. readOnly: true snapshotImageRef: type: string description: The URI of the latest container image snapshot for this environment. readOnly: true shared: type: boolean description: >- Whether the environment is shared with all users in the account. This allows all users to connect, disconnect, update, delete, clone, and create batch jobs using the environment. annotations: type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. updateTime: type: string format: date-time description: The update time for the environment. readOnly: true title: 'Next ID: 14' gatewayEnvironmentConnection: type: object properties: nodePoolId: type: string description: The resource id of the node pool the environment is connected to. numRanks: type: integer format: int32 description: |- For GPU node pools: one GPU per rank w/ host packing, for CPU node pools: one host per rank. If not specified, the default is 1. role: type: string description: |- The ARN of the AWS IAM role that the connection should assume. If not specified, the connection will fall back to the node pool's node_role. zone: type: string description: >- The last zone that this environment is connected to. Used to warn users about cross-zone migration latency when they connect to a node pool in a different zone than their persistent volume. readOnly: true useLocalStorage: type: boolean description: >- If true, the node's local storage will be mounted on /tmp. This flag has no effect if the node does not have local storage. title: 'Next ID: 8' required: - nodePoolId gatewayEnvironmentState: type: string enum: - STATE_UNSPECIFIED - CREATING - DISCONNECTED - CONNECTING - CONNECTED - DISCONNECTING - RECONNECTING - DELETING default: STATE_UNSPECIFIED description: |- - CREATING: The environment is being created. - DISCONNECTED: The environment is not connected. - CONNECTING: The environment is being connected to a node. 
- CONNECTED: The environment is connected to a node. - DISCONNECTING: The environment is being disconnected from a node. - RECONNECTING: The environment is reconnecting with new connection parameters. - DELETING: The environment is being deleted. title: 'Next ID: 8' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/api-reference/list-evaluation-jobs.md # List Evaluation Jobs ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluationJobs openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluationJobs: get: tags: - Gateway summary: List Evaluation Jobs operationId: Gateway_ListEvaluationJobs parameters: - name: pageSize in: query required: false schema: type: integer format: int32 - name: pageToken in: query required: false schema: type: string - name: filter in: query required: false schema: type: string - name: orderBy in: query required: false schema: type: string - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayListEvaluationJobsResponse' components: schemas: gatewayListEvaluationJobsResponse: type: object properties: evaluationJobs: type: array items: $ref: '#/components/schemas/gatewayEvaluationJob' type: object nextPageToken: type: string totalSize: type: integer format: int32 gatewayEvaluationJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true createdBy: type: string readOnly: true state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true evaluator: type: string description: >- The fully-qualified resource name of the Evaluation used by this job. Format: accounts/{account_id}/evaluators/{evaluator_id} inputDataset: type: string description: >- The fully-qualified resource name of the input Dataset used by this job. Format: accounts/{account_id}/datasets/{dataset_id} outputDataset: type: string description: >- The fully-qualified resource name of the output Dataset created by this job. Format: accounts/{account_id}/datasets/{output_dataset_id} metrics: type: object additionalProperties: type: number format: double readOnly: true outputStats: type: string description: The output dataset's aggregated stats for the evaluation job. updateTime: type: string format: date-time description: The update time for the evaluation job. 
readOnly: true title: 'Next ID: 18' required: - evaluator - inputDataset - outputDataset gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED - JOB_STATE_PAUSED default: JOB_STATE_UNSPECIFIED description: |- JobState represents the state an asynchronous job can be in. - JOB_STATE_PAUSED: Job is paused, typically due to account suspension or manual intervention. gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-evaluator-revisions.md # firectl list evaluator-revisions > List evaluator revisions ``` firectl list evaluator-revisions [flags] ``` ### Examples ``` firectl list evaluator-revisions accounts/my-account/evaluators/my-evaluator ``` ### Flags ``` -h, --help help for evaluator-revisions ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/api-reference/list-evaluators.md # List Evaluators > Lists all evaluators for an account with pagination support. ## OpenAPI ````yaml get /v1/accounts/{account_id}/evaluators openapi: 3.1.0 info: title: Gateway REST API version: 4.15.25 servers: - url: https://api.fireworks.ai security: - BearerAuth: [] tags: - name: Gateway paths: /v1/accounts/{account_id}/evaluators: get: tags: - Gateway summary: List Evaluators description: Lists all evaluators for an account with pagination support. operationId: Gateway_ListEvaluators parameters: - name: pageSize in: query required: false schema: type: integer format: int32 - name: pageToken in: query required: false schema: type: string - name: filter in: query required: false schema: type: string - name: orderBy in: query required: false schema: type: string - name: readMask description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. in: query required: false schema: type: string - name: account_id in: path required: true description: The Account Id schema: type: string responses: '200': description: A successful response. content: application/json: schema: $ref: '#/components/schemas/gatewayListEvaluatorsResponse' components: schemas: gatewayListEvaluatorsResponse: type: object properties: evaluators: type: array items: $ref: '#/components/schemas/gatewayEvaluator' type: object nextPageToken: type: string totalSize: type: integer format: int32 gatewayEvaluator: type: object properties: name: type: string readOnly: true displayName: type: string description: type: string createTime: type: string format: date-time readOnly: true createdBy: type: string readOnly: true updateTime: type: string format: date-time readOnly: true state: $ref: '#/components/schemas/gatewayEvaluatorState' readOnly: true requirements: type: string title: Content for the requirements.txt for package installation entryPoint: type: string title: >- entry point of the evaluator inside the codebase. 
In module::function or path::function format status: $ref: '#/components/schemas/gatewayStatus' title: Status of the evaluator, used to expose build status to the user readOnly: true commitHash: type: string title: Commit hash of this evaluator from the user's original codebase source: $ref: '#/components/schemas/gatewayEvaluatorSource' description: Source information for the evaluator codebase. defaultDataset: type: string title: Default dataset that is associated with the evaluator title: 'Next ID: 17' gatewayEvaluatorState: type: string enum: - STATE_UNSPECIFIED - ACTIVE - BUILDING - BUILD_FAILED default: STATE_UNSPECIFIED title: |- - ACTIVE: The evaluator is ready to use for evaluation - BUILDING: The evaluator is being built, i.e. building the e2b template - BUILD_FAILED: The evaluator build failed, and it cannot be used for evaluation gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayEvaluatorSource: type: object properties: type: $ref: '#/components/schemas/EvaluatorSourceType' description: Identifies how the evaluator source code is provided. githubRepositoryName: type: string description: >- Normalized GitHub repository name (e.g. owner/repository) when the source is GitHub. gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. 
`PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. 
HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] EvaluatorSourceType: type: string enum: - TYPE_UNSPECIFIED - TYPE_UPLOAD - TYPE_GITHUB - TYPE_TEMPORARY default: TYPE_UNSPECIFIED title: |- - TYPE_UPLOAD: Source code is uploaded by the user - TYPE_GITHUB: Source code is from a GitHub repository - TYPE_TEMPORARY: Source code is a temporary UI uploaded code securitySchemes: BearerAuth: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer bearerFormat: API_KEY ```` --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-identity-providers.md # firectl list identity-providers > List identity providers for an account ``` firectl list identity-providers [flags] ``` ### Examples ``` firectl list identity-providers ``` ### Flags ``` -h, --help help for identity-providers ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-invoices.md # firectl list invoices > Prints information about invoices. ``` firectl list invoices [flags] ``` ### Examples ``` firectl list invoices ``` ### Flags ``` -h, --help help for invoices --show-pending If true, only pending invoices are shown. ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. 
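# Example: show only pending invoices using the --show-pending flag documented above:
#   firectl list invoices --show-pending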
``` --- # Source: https://docs.fireworks.ai/api-reference/list-models.md # List Models ## OpenAPI ````yaml get /v1/accounts/{account_id}/models paths: path: /v1/accounts/{account_id}/models method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of models to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListModels call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListModels must match the call that provided the page token. filter: schema: - type: string required: false description: |- Only model satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: models: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayModel' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. 
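# Illustrative pagination pattern (not part of the generated spec; "my-account" is a
# hypothetical placeholder):
#   1. GET https://api.fireworks.ai/v1/accounts/my-account/models?pageSize=200
#      with the Authorization: Bearer header described above.
#   2. While the response includes a non-empty nextPageToken, repeat the request with
#      &pageToken=<nextPageToken>, keeping all other parameters identical.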
totalSize: allOf: - type: integer format: int32 title: The total number of models refIdentifier: '#/components/schemas/gatewayListModelsResponse' examples: example: value: models: - name: displayName: description: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: kind: KIND_UNSPECIFIED githubUrl: huggingFaceUrl: baseModelDetails: worldSize: 123 checkpointFormat: CHECKPOINT_FORMAT_UNSPECIFIED parameterCount: moe: true tunable: true modelType: supportsFireattention: true defaultPrecision: PRECISION_UNSPECIFIED supportsMtp: true peftDetails: baseModel: r: 123 targetModules: - baseModelType: mergeAddonModelName: teftDetails: {} public: true conversationConfig: style: system: template: contextLength: 123 supportsImageInput: true supportsTools: true importedFrom: fineTuningJob: defaultDraftModel: defaultDraftTokenCount: 123 deployedModelRefs: - name: deployment: state: STATE_UNSPECIFIED default: true public: true cluster: deprecationDate: year: 123 month: 123 day: 123 calibrated: true tunable: true supportsLora: true useHfApplyChatTemplate: true updateTime: '2023-11-07T05:31:56Z' defaultSamplingParams: {} rlTunable: true supportedPrecisions: - PRECISION_UNSPECIFIED supportedPrecisionsWithCalibration: - PRECISION_UNSPECIFIED trainingContextLength: 123 snapshotType: FULL_SNAPSHOT nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: BaseModelDetailsCheckpointFormat: type: string enum: - CHECKPOINT_FORMAT_UNSPECIFIED - NATIVE - HUGGINGFACE default: CHECKPOINT_FORMAT_UNSPECIFIED DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve ModelSnapshotType: type: string enum: - FULL_SNAPSHOT - INCREMENTAL_SNAPSHOT default: FULL_SNAPSHOT gatewayBaseModelDetails: type: object properties: worldSize: type: integer format: int32 description: |- The default number of GPUs the model is served with. If not specified, the default is 1. checkpointFormat: $ref: '#/components/schemas/BaseModelDetailsCheckpointFormat' parameterCount: type: string format: int64 description: >- The number of model parameters. For serverless models, this determines the price per token. moe: type: boolean description: >- If true, this is a Mixture of Experts (MoE) model. For serverless models, this affects the price per token. tunable: type: boolean description: If true, this model is available for fine-tuning. modelType: type: string description: The type of the model. supportsFireattention: type: boolean description: Whether this model supports fireattention. defaultPrecision: $ref: '#/components/schemas/DeploymentPrecision' description: Default precision of the model. readOnly: true supportsMtp: type: boolean description: If true, this model supports MTP. title: 'Next ID: 11' gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. 
HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. 
HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayConversationConfig: type: object properties: style: type: string description: The chat template to use. system: type: string description: The system prompt (if the chat style supports it). template: type: string description: The Jinja template (if style is "jinja"). required: - style gatewayDeployedModelRef: type: object properties: name: type: string title: >- The resource name. e.g. accounts/my-account/deployedModels/my-deployed-model readOnly: true deployment: type: string description: The resource name of the base deployment the model is deployed to. readOnly: true state: $ref: '#/components/schemas/gatewayDeployedModelState' description: The state of the deployed model. readOnly: true default: type: boolean description: >- If true, this is the default target when querying this model without the `#` suffix. The first deployment a model is deployed to will have this field set to true automatically. readOnly: true public: type: boolean description: If true, the deployed model will be publicly reachable. readOnly: true title: 'Next ID: 6' gatewayDeployedModelState: type: string enum: - STATE_UNSPECIFIED - UNDEPLOYING - DEPLOYING - DEPLOYED - UPDATING default: STATE_UNSPECIFIED description: |- - UNDEPLOYING: The model is being undeployed. - DEPLOYING: The model is being deployed. - DEPLOYED: The model is deployed and ready for inference. - UPDATING: there are updates happening with the deployed model title: 'Next ID: 6' gatewayModel: type: object properties: name: type: string title: >- The resource name of the model. e.g. accounts/my-account/models/my-model readOnly: true displayName: type: string description: |- Human-readable display name of the model. e.g. "My Model" Must be fewer than 64 characters long. description: type: string description: >- The description of the model. Must be fewer than 1000 characters long. 
createTime: type: string format: date-time description: The creation time of the model. readOnly: true state: $ref: '#/components/schemas/gatewayModelState' description: The state of the model. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: Contains detailed message when the last model operation fails. readOnly: true kind: $ref: '#/components/schemas/gatewayModelKind' description: |- The kind of model. If not specified, the default is HF_PEFT_ADDON. githubUrl: type: string description: The URL to GitHub repository of the model. huggingFaceUrl: type: string description: The URL to the Hugging Face model. baseModelDetails: $ref: '#/components/schemas/gatewayBaseModelDetails' description: |- Base model details. Required if kind is HF_BASE_MODEL. Must not be set otherwise. peftDetails: $ref: '#/components/schemas/gatewayPEFTDetails' description: |- PEFT addon details. Required if kind is HF_PEFT_ADDON or HF_TEFT_ADDON. teftDetails: $ref: '#/components/schemas/gatewayTEFTDetails' description: |- TEFT addon details. Required if kind is HF_TEFT_ADDON. Must not be set otherwise. public: type: boolean description: If true, the model will be publicly readable. conversationConfig: $ref: '#/components/schemas/gatewayConversationConfig' description: If set, the Chat Completions API will be enabled for this model. contextLength: type: integer format: int32 description: The maximum context length supported by the model. supportsImageInput: type: boolean description: If set, images can be provided as input to the model. supportsTools: type: boolean description: >- If set, tools (i.e. functions) can be provided as input to the model, and the model may respond with one or more tool calls. importedFrom: type: string description: >- The name of the the model from which this was imported. This field is empty if the model was not imported. readOnly: true fineTuningJob: type: string description: >- If the model was created from a fine-tuning job, this is the fine-tuning job name. readOnly: true defaultDraftModel: type: string description: |- The default draft model to use when creating a deployment. If empty, speculative decoding is disabled by default. defaultDraftTokenCount: type: integer format: int32 description: |- The default draft token count to use when creating a deployment. Must be specified if default_draft_model is specified. deployedModelRefs: type: array items: type: object $ref: '#/components/schemas/gatewayDeployedModelRef' description: Populated from GetModel API call only. readOnly: true cluster: type: string description: |- The resource name of the BYOC cluster to which this model belongs. e.g. accounts/my-account/clusters/my-cluster. Empty if it belongs to a Fireworks cluster. readOnly: true deprecationDate: $ref: '#/components/schemas/typeDate' description: >- If specified, this is the date when the serverless deployment of the model will be taken down. calibrated: type: boolean description: >- If true, the model is calibrated and can be deployed to non-FP16 precisions. readOnly: true tunable: type: boolean description: >- If true, the model can be fine-tuned. The value will be true if the tunable field is true, and the model is validated against the model_type field. readOnly: true supportsLora: type: boolean description: Whether this model supports LoRA. useHfApplyChatTemplate: type: boolean description: >- If true, the model will use the Hugging Face apply_chat_template API to apply the chat template. 
updateTime: type: string format: date-time description: The update time for the model. readOnly: true defaultSamplingParams: type: object additionalProperties: type: number format: float description: >- A JSON object that contains the default sampling parameters for the model. readOnly: true rlTunable: type: boolean description: If true, the model is RL tunable. readOnly: true supportedPrecisions: type: array items: $ref: '#/components/schemas/DeploymentPrecision' title: Supported precisions readOnly: true supportedPrecisionsWithCalibration: type: array items: $ref: '#/components/schemas/DeploymentPrecision' title: Supported precisions if calibrated readOnly: true trainingContextLength: type: integer format: int32 description: The maximum context length supported by the model. snapshotType: $ref: '#/components/schemas/ModelSnapshotType' title: 'Next ID: 56' gatewayModelKind: type: string enum: - KIND_UNSPECIFIED - HF_BASE_MODEL - HF_PEFT_ADDON - HF_TEFT_ADDON - FLUMINA_BASE_MODEL - FLUMINA_ADDON - DRAFT_ADDON - FIRE_AGENT - LIVE_MERGE - CUSTOM_MODEL - EMBEDDING_MODEL - SNAPSHOT_MODEL default: KIND_UNSPECIFIED description: |2- - HF_BASE_MODEL: An LLM base model. - HF_PEFT_ADDON: A parameter-efficient fine-tuned addon. - HF_TEFT_ADDON: A token-efficient fine-tuned addon. - FLUMINA_BASE_MODEL: A Flumina base model. - FLUMINA_ADDON: A Flumina addon. - DRAFT_ADDON: A draft model used for speculative decoding in a deployment. - FIRE_AGENT: A FireAgent model. - LIVE_MERGE: A live-merge model. - CUSTOM_MODEL: A customized model. - EMBEDDING_MODEL: An Embedding model. - SNAPSHOT_MODEL: A snapshot model. gatewayModelState: type: string enum: - STATE_UNSPECIFIED - UPLOADING - READY default: STATE_UNSPECIFIED description: |- - UPLOADING: The model is still being uploaded (upload is asynchronous). - READY: The model is ready to be used. title: 'Next ID: 7' gatewayPEFTDetails: type: object properties: baseModel: type: string title: The base model name. e.g. accounts/fireworks/models/falcon-7b r: type: integer format: int32 description: |- The rank of the update matrices. Must be between 4 and 64, inclusive. targetModules: type: array items: type: string title: >- The target modules for an adapter that we extract from. For more information on what target modules mean, check out https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft baseModelType: type: string description: The type of the model. readOnly: true mergeAddonModelName: type: string title: >- The resource name of the model to merge with base model, e.g. accounts/fireworks/models/falcon-7b-lora title: |- PEFT addon details. Next ID: 6 required: - baseModel - r - targetModules gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayTEFTDetails: type: object typeDate: type: object properties: year: type: integer format: int32 description: >- Year of the date. Must be from 1 to 9999, or 0 to specify a date without a year. month: type: integer format: int32 description: >- Month of a year. Must be from 1 to 12, or 0 to specify a year without a month and day. day: type: integer format: int32 description: >- Day of a month. Must be from 1 to 31 and valid for the year and month, or 0 to specify a year by itself or a year and month where the day isn't significant. 
description: >- * A full date, with non-zero year, month, and day values * A month and day value, with a zero year, such as an anniversary * A year on its own, with zero month and day values * A year and month value, with a zero day, such as a credit card expiration date Related types are [google.type.TimeOfDay][google.type.TimeOfDay] and `google.protobuf.Timestamp`. title: >- Represents a whole or partial calendar date, such as a birthday. The time of day and time zone are either specified elsewhere or are insignificant. The date is relative to the Gregorian Calendar. This can represent one of the following: ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-node-pool-bindings.md # List Node Pool Bindings ## OpenAPI ````yaml get /v1/accounts/{account_id}/nodePoolBindings paths: path: /v1/accounts/{account_id}/nodePoolBindings method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of bindings to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListNodePoolBindings call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListNodePoolBindings must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only bindings satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: nodePoolBindings: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayNodePoolBinding' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of node pool bindings. refIdentifier: '#/components/schemas/gatewayListNodePoolBindingsResponse' examples: example: value: nodePoolBindings: - accountId: clusterId: nodePoolId: createTime: '2023-11-07T05:31:56Z' principal: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayNodePoolBinding: type: object properties: accountId: type: string description: The account ID that this binding is associated with. readOnly: true clusterId: type: string description: The cluster ID that this binding is associated with. 
readOnly: true nodePoolId: type: string description: The node pool ID that this binding is associated with. readOnly: true createTime: type: string format: date-time description: The creation time of the node pool binding. readOnly: true principal: type: string description: |- The principal that is allowed to use the node pool. This must be the email address of the user. required: - principal ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-node-pools.md # List Node Pools ## OpenAPI ````yaml get /v1/accounts/{account_id}/nodePools paths: path: /v1/accounts/{account_id}/nodePools method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of node pools to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListNodePools call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListNodePools must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only node pools satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: nodePools: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayNodePool' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of node pools. refIdentifier: '#/components/schemas/gatewayListNodePoolsResponse' examples: example: value: nodePools: - name: displayName: createTime: '2023-11-07T05:31:56Z' minNodeCount: 123 maxNodeCount: 123 overprovisionNodeCount: 123 eksNodePool: nodeRole: instanceType: spot: true nodeGroupName: subnetIds: - zone: placementGroup: launchTemplate: fakeNodePool: machineType: numNodes: 123 serviceAccount: annotations: {} state: STATE_UNSPECIFIED status: code: OK message: nodePoolStats: nodeCount: 123 ranksPerNode: 123 environmentCount: 123 environmentRanks: 123 batchJobCount: {} batchJobRanks: {} updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. 
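# Illustrative sketch (comment only): paginating this endpoint with curl, using the pageSize
# and pageToken parameters documented above. The account ID "my-account" is a placeholder and
# FIREWORKS_API_KEY is assumed to hold your API key:
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/nodePools?pageSize=50"
#   # Pass the returned nextPageToken back as pageToken to fetch the next page:
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/my-account/nodePools?pageSize=50&pageToken=NEXT_PAGE_TOKEN"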
deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. 
For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayEksNodePool: type: object properties: nodeRole: type: string description: |- If not specified, the parent cluster's system_node_group_role will be used. title: |- The IAM role ARN to associate with nodes. The role must have the following IAM policies attached: - AmazonEKSWorkerNodePolicy - AmazonEC2ContainerRegistryReadOnly - AmazonEKS_CNI_Policy instanceType: type: string description: >- The type of instance used in this node pool. See https://aws.amazon.com/ec2/instance-types/ for a list of valid instance types. spot: type: boolean title: >- If true, nodes are created as preemptible VM instances. See https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html#managed-node-group-capacity-types nodeGroupName: type: string description: |- The name of the node group. If not specified, the default is the node pool ID. subnetIds: type: array items: type: string description: >- A list of subnet IDs for nodes in this node pool. If not specified, the parent cluster's default subnet IDs that matches the zone will be used. Note that all the subnets will need to be in the same zone. zone: type: string description: >- Zone for the node pool. If not specified, a random zone in the cluster's region will be selected. placementGroup: type: string description: Cluster placement group to colocate hosts in this pool. launchTemplate: type: string description: Launch template to create for this node group. 
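# Illustrative sketch (comment only): an eksNodePool block using the fields above. Only
# instanceType is required; every value shown here is a placeholder, not a documented default:
#   eksNodePool:
#     instanceType: p4d.24xlarge
#     spot: false
#     nodeGroupName: my-node-group
#     subnetIds: ["subnet-0123456789abcdef0"]
#     zone: us-east-1a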
title: |- An Amazon Elastic Kubernetes Service node pool. Next ID: 10 required: - instanceType gatewayFakeNodePool: type: object properties: machineType: type: string numNodes: type: integer format: int32 serviceAccount: type: string description: A fake node pool to be used with FakeCluster. gatewayNodePool: type: object properties: name: type: string title: >- The resource name of the node pool. e.g. accounts/my-account/clusters/my-cluster/nodePools/my-pool readOnly: true displayName: type: string description: |- Human-readable display name of the node pool. e.g. "My Node Pool" Must be fewer than 64 characters long. createTime: type: string format: date-time description: The creation time of the node pool. readOnly: true minNodeCount: type: integer format: int32 description: >- https://cloud.google.com/kubernetes-engine/quotas Minimum number of nodes in this node pool. Must be a non-negative integer less than or equal to max_node_count. If not specified, the default is 0. maxNodeCount: type: integer format: int32 description: >- https://cloud.google.com/kubernetes-engine/quotas Maximum number of nodes in this node pool. Must be a positive integer greater than or equal to min_node_count. If not specified, the default is 1. overprovisionNodeCount: type: integer format: int32 description: |- The number of nodes to overprovision by the autoscaler. Must be a non-negative integer and less than or equal to min_node_count and max_node_count-min_node_count. If not specified, the default is 0. eksNodePool: $ref: '#/components/schemas/gatewayEksNodePool' fakeNodePool: $ref: '#/components/schemas/gatewayFakeNodePool' annotations: type: object additionalProperties: type: string description: >- Arbitrary, user-specified metadata. Keys and values must adhere to Kubernetes constraints: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set Additionally, the "fireworks.ai/" prefix is reserved. state: $ref: '#/components/schemas/gatewayNodePoolState' description: The current state of the node pool. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: >- Contains detailed message when the last node pool operation fails, e.g. when node pool is in FAILED state or when last node pool update fails. readOnly: true nodePoolStats: $ref: '#/components/schemas/gatewayNodePoolStats' description: Live statistics of the node pool. readOnly: true updateTime: type: string format: date-time description: The update time for the node pool. readOnly: true title: 'Next ID: 16' gatewayNodePoolState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - DELETING - FAILED default: STATE_UNSPECIFIED description: |2- - CREATING: The cluster is still being created. - READY: The node pool is ready to be used. - DELETING: The node pool is being deleted. - FAILED: Node pool is not operational. Consult 'status' for detailed messaging. Node pool needs to be deleted and re-created. gatewayNodePoolStats: type: object properties: nodeCount: type: integer format: int32 description: The number of nodes currently available in this pool. ranksPerNode: type: integer format: int32 description: >- The number of ranks available per node. This is determined by the machine type of the nodes in this node pool. environmentCount: type: integer format: int32 description: The number of environments connected to this node pool. 
environmentRanks: type: integer format: int32 description: |- The number of ranks in this node pool that are currently allocated to environment connections. batchJobCount: type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of batch jobs in that state allocated to this node pool. batchJobRanks: type: object additionalProperties: type: integer format: int32 description: >- The key is the string representation of BatchJob.State (e.g. "RUNNING"). The value is the number of ranks allocated to batch jobs in that state in this node pool. title: 'Next ID: 7' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-quotas.md # firectl list quotas > Prints all quotas. ``` firectl list quotas [flags] ``` ### Examples ``` firectl list quotas ``` ### Flags ``` -h, --help help for quotas ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-reinforcement-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-reinforcement-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-reinforcement-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-reinforcement-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-reinforcement-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-reinforcement-fine-tuning-jobs.md # List Reinforcement Fine-tuning Jobs ## OpenAPI ````yaml get /v1/accounts/{account_id}/reinforcementFineTuningJobs paths: path: /v1/accounts/{account_id}/reinforcementFineTuningJobs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of fine-tuning jobs to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListReinforcementLearningFineTuningJobs call. Provide this to retrieve the subsequent page. 
When paginating, all other parameters provided to ListReinforcementLearningFineTuningJobs must match the call that provided the page token. filter: schema: - type: string required: false description: >- Filter criteria for the returned jobs. See https://google.aip.dev/160 for the filter syntax specification. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: reinforcementFineTuningJobs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayReinforcementFineTuningJob' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 title: The total number of fine-tuning jobs refIdentifier: >- #/components/schemas/gatewayListReinforcementFineTuningJobsResponse examples: example: value: reinforcementFineTuningJobs: - name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: evaluationDataset: evalAutoCarveout: true state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 evaluator: wandbConfig: enabled: true apiKey: project: entity: runId: url: outputStats: inferenceParameters: maxTokens: 123 temperature: 123 topP: 123 'n': 123 extraBody: topK: 123 outputMetrics: mcpServer: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. 
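# Illustrative sketch (comment only): a trainingConfig using the fields above. Exactly one of
# baseModel or warmStartFrom should be set, per the descriptions above; all values shown are
# placeholders:
#   trainingConfig:
#     outputModel: my-tuned-model
#     baseModel: accounts/fireworks/models/falcon-7b
#     learningRate: 0.0001
#     maxContextLength: 8192
#     loraRank: 8
#     epochs: 1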
gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. 
(b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayInferenceParameters: type: object properties: maxTokens: type: integer format: int32 description: Maximum number of tokens to generate per response. temperature: type: number format: float description: Sampling temperature, typically between 0 and 2. topP: type: number format: float description: Top-p sampling parameter, typically between 0 and 1. 'n': type: integer format: int32 description: Number of response candidates to generate per input. extraBody: type: string description: |- Additional parameters for the inference request as a JSON string. For example: "{\"stop\": [\"\\n\"]}". topK: type: integer format: int32 description: >- Top-k sampling parameter, limits the token selection to the top k tokens. description: Parameters for the inference requests. 
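# Illustrative sketch (comment only): inferenceParameters for a reinforcement fine-tuning job,
# using the fields documented above. Note that extraBody is a JSON string, as in the example
# given for that field; all values shown are placeholders:
#   inferenceParameters:
#     maxTokens: 1024
#     temperature: 0.7
#     topP: 1
#     n: 4
#     topK: 40
#     extraBody: "{\"stop\": [\"\\n\"]}"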
gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayReinforcementFineTuningJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true completedTime: type: string format: date-time description: The completed time for the reinforcement fine-tuning job. readOnly: true dataset: type: string description: The name of the dataset used for training. evaluationDataset: type: string description: The name of a separate dataset to use for evaluation. evalAutoCarveout: type: boolean description: Whether to auto-carve the dataset for eval. state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: type: string description: The email address of the user who initiated this fine-tuning job. readOnly: true trainingConfig: $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. evaluator: type: string description: The evaluator resource name to use for RLOR fine-tuning job. wandbConfig: $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. outputStats: type: string description: The output dataset's aggregated stats for the evaluation job. readOnly: true inferenceParameters: $ref: '#/components/schemas/gatewayInferenceParameters' description: BIJ parameters. outputMetrics: type: string readOnly: true mcpServer: type: string title: 'Next ID: 29' required: - dataset - evaluator gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. 
message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/api-reference/list-reinforcement-fine-tuning-steps.md # List Reinforcement Fine-tuning Steps ## OpenAPI ````yaml get /v1/accounts/{account_id}/rlorTrainerJobs paths: path: /v1/accounts/{account_id}/rlorTrainerJobs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of fine-tuning jobs to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListRlorTuningJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListRlorTuningJobs must match the call that provided the page token. filter: schema: - type: string required: false description: >- Filter criteria for the returned jobs. See https://google.aip.dev/160 for the filter syntax specification. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: rlorTrainerJobs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayRlorTrainerJob' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. 
totalSize: allOf: - type: integer format: int32 title: The total number of fine-tuning jobs refIdentifier: '#/components/schemas/gatewayListRlorTrainerJobsResponse' examples: example: value: rlorTrainerJobs: - name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: evaluationDataset: evalAutoCarveout: true state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: trainingConfig: outputModel: baseModel: warmStartFrom: jinjaTemplate: learningRate: 123 maxContextLength: 123 loraRank: 123 region: REGION_UNSPECIFIED epochs: 123 batchSize: 123 gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 rewardWeights: - wandbConfig: enabled: true apiKey: project: entity: runId: url: nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayBaseTrainingConfig: type: object properties: outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. epochs: type: integer format: int32 description: The number of epochs to train for. batchSize: type: integer format: int32 description: >- The maximum packed number of tokens per batch for training in sequence packing. gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: |- BaseTrainingConfig contains common configuration fields shared across different training job types. Next ID: 19 gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). 
HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. 
We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayRlorTrainerJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true completedTime: type: string format: date-time readOnly: true dataset: type: string description: The name of the dataset used for training. evaluationDataset: type: string description: The name of a separate dataset to use for evaluation. evalAutoCarveout: type: boolean description: Whether to auto-carve the dataset for eval. 
state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: type: string description: The email address of the user who initiated this fine-tuning job. readOnly: true trainingConfig: $ref: '#/components/schemas/gatewayBaseTrainingConfig' description: Common training configurations. rewardWeights: type: array items: type: string description: >- A list of reward metrics to use for training in format of "=". wandbConfig: $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. title: 'Next ID: 18' gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-reservations.md # firectl list reservations > Prints active reservations. ``` firectl list reservations [flags] ``` ### Examples ``` firectl list reservations ``` ### Flags ``` -h, --help help for reservations --show-inactive Show all reservations ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/list-responses.md # List Responses > Get a list of all responses for the authenticated account. Args: limit: Maximum number of responses to return (default: 20, max: 100) after: Cursor for pagination - return responses after this ID before: Cursor for pagination - return responses before this ID ## OpenAPI ````yaml get /v1/responses paths: path: /v1/responses method: get servers: - url: https://api.fireworks.ai/inference request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: {} query: limit: schema: - type: integer required: false title: Limit default: 20 after: schema: - type: string required: false title: After - type: 'null' required: false title: After before: schema: - type: string required: false title: Before - type: 'null' required: false title: Before header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: object: allOf: - type: string title: Object description: The object type, which is always 'list'. default: list data: allOf: - items: $ref: '#/components/schemas/Response' type: array title: Data description: >- An array of response objects, sorted by creation time in descending order (most recent first). has_more: allOf: - type: boolean title: Has More description: >- Indicates whether there are more responses available beyond this page. If true, use the 'last_id' value as the 'after' cursor to fetch the next page. first_id: allOf: - anyOf: - type: string - type: 'null' title: First Id description: >- The ID of the first response in the current page. Used for pagination. last_id: allOf: - anyOf: - type: string - type: 'null' title: Last Id description: >- The ID of the last response in the current page. Use this as the 'after' cursor to fetch the next page if has_more is true. title: ResponseList description: >- Response model for listing responses. Returned from the GET /v1/responses endpoint. Provides a paginated list of response objects with cursor-based pagination support. refIdentifier: '#/components/schemas/ResponseList' requiredProperties: - data - has_more examples: example: value: object: list data: - id: object: response created_at: 123 status: model: output: - id: type: message role: content: - type: text: status: previous_response_id: usage: {} error: {} incomplete_details: {} instructions: max_output_tokens: 123 max_tool_calls: 2 parallel_tool_calls: true reasoning: {} store: true temperature: 1 text: {} tool_choice: tools: - {} top_p: 1 truncation: disabled user: metadata: {} has_more: true first_id: last_id: description: Successful Response '422': application/json: schemaArray: - type: object properties: detail: allOf: - items: $ref: '#/components/schemas/ValidationError' type: array title: Detail title: HTTPValidationError refIdentifier: '#/components/schemas/HTTPValidationError' examples: example: value: detail: - loc: - msg: type: description: Validation Error deprecated: false type: path components: schemas: Message: properties: id: type: string title: Id description: The unique identifier of the message. type: type: string title: Type description: The object type, always 'message'. default: message role: type: string title: Role description: >- The role of the message sender. Can be 'user', 'assistant', or 'system'. content: items: $ref: '#/components/schemas/MessageContent' type: array title: Content description: >- An array of content parts that make up the message. Each part has a type and associated data. status: type: string title: Status description: The status of the message. Can be 'in_progress' or 'completed'. type: object required: - id - role - content - status title: Message description: Represents a message in a conversation. MessageContent: properties: type: type: string title: Type description: >- The type of the content part. Can be 'input_text', 'output_text', 'image', etc. text: anyOf: - type: string - type: 'null' title: Text description: The text content, if applicable. 
type: object required: - type title: MessageContent description: Represents a piece of content within a message. Response: properties: id: anyOf: - type: string - type: 'null' title: Id description: The unique identifier of the response. Will be None if store=False. object: type: string title: Object description: The object type, which is always 'response'. default: response created_at: type: integer title: Created At description: The Unix timestamp (in seconds) when the response was created. status: type: string title: Status description: >- The status of the response. Can be 'completed', 'in_progress', 'incomplete', 'failed', or 'cancelled'. model: type: string title: Model description: >- The model used to generate the response (e.g., `accounts//models/`). output: items: anyOf: - $ref: '#/components/schemas/Message' - $ref: '#/components/schemas/ToolCall' - $ref: '#/components/schemas/ToolOutput' type: array title: Output description: >- An array of output items produced by the model. Can contain messages, tool calls, and tool outputs. previous_response_id: anyOf: - type: string - type: 'null' title: Previous Response Id description: >- The ID of the previous response in the conversation, if this response continues a conversation. usage: anyOf: - additionalProperties: true type: object - type: 'null' title: Usage description: >- Token usage information for the request. Contains 'prompt_tokens', 'completion_tokens', and 'total_tokens'. error: anyOf: - additionalProperties: true type: object - type: 'null' title: Error description: >- Error information if the response failed. Contains 'type', 'code', and 'message' fields. incomplete_details: anyOf: - additionalProperties: true type: object - type: 'null' title: Incomplete Details description: >- Details about why the response is incomplete, if status is 'incomplete'. Contains 'reason' field which can be 'max_output_tokens', 'max_tool_calls', or 'content_filter'. instructions: anyOf: - type: string - type: 'null' title: Instructions description: >- System instructions that guide the model's behavior. Similar to a system message. max_output_tokens: anyOf: - type: integer - type: 'null' title: Max Output Tokens description: >- The maximum number of tokens that can be generated in the response. Must be at least 1. max_tool_calls: anyOf: - type: integer minimum: 1 - type: 'null' title: Max Tool Calls description: >- The maximum number of tool calls allowed in a single response. Must be at least 1. parallel_tool_calls: type: boolean title: Parallel Tool Calls description: >- Whether to enable parallel function calling during tool use. Default is True. default: true reasoning: anyOf: - additionalProperties: true type: object - type: 'null' title: Reasoning description: >- Reasoning output from the model, if reasoning is enabled. Contains 'content' and 'type' fields. store: anyOf: - type: boolean - type: 'null' title: Store description: >- Whether to store this response for future retrieval. If False, the response will not be persisted and previous_response_id cannot reference it. Default is True. default: true temperature: type: number maximum: 2 minimum: 0 title: Temperature description: >- The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0. default: 1 text: anyOf: - additionalProperties: true type: object - type: 'null' title: Text description: Text generation configuration parameters, if applicable. 
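# Illustrative sketch (comment only): cursor-based pagination for GET /v1/responses with curl,
# using the limit and after parameters described above. FIREWORKS_API_KEY is assumed to hold
# your API key, and LAST_ID stands in for the last_id returned by the previous page:
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/inference/v1/responses?limit=20"
#   # If has_more is true, request the next page with last_id as the after cursor:
#   curl -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/inference/v1/responses?limit=20&after=LAST_ID"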
tool_choice: anyOf: - type: string - additionalProperties: true type: object title: Tool Choice description: >- Controls which (if any) tool the model should use. Can be 'none', 'auto', 'required', or an object specifying a particular tool. Default is 'auto'. default: auto tools: items: additionalProperties: true type: object type: array title: Tools description: >- A list of tools the model may call. Each tool is defined with a type and function specification following the OpenAI tool format. Supports 'function', 'mcp', 'sse', and 'python' tool types. top_p: type: number maximum: 1 minimum: 0 title: Top P description: >- An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. default: 1 truncation: type: string title: Truncation description: >- The truncation strategy to use for the context. Can be 'auto' or 'disabled'. Default is 'disabled'. default: disabled user: anyOf: - type: string - type: 'null' title: User description: >- A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. metadata: anyOf: - additionalProperties: true type: object - type: 'null' title: Metadata description: >- Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information about the response in a structured format. type: object required: - created_at - status - model - output title: Response description: >- Represents a response object returned from the API. A response includes the model output, token usage, configuration parameters, and metadata about the conversation state. ToolCall: properties: id: type: string title: Id description: The unique identifier of the tool call. type: type: string title: Type description: The type of tool call. Can be 'function', 'tool_call', or 'mcp'. function: anyOf: - additionalProperties: true type: object - type: 'null' title: Function description: >- The function definition for function tool calls. Contains 'name' and 'arguments' keys. mcp: anyOf: - additionalProperties: true type: object - type: 'null' title: Mcp description: >- The MCP (Model Context Protocol) tool call definition for MCP tool calls. type: object required: - id - type title: ToolCall description: Represents a tool call made by the model. ToolOutput: properties: type: type: string title: Type description: The object type, always 'tool_output'. default: tool_output tool_call_id: type: string title: Tool Call Id description: The ID of the tool call that this output corresponds to. output: type: string title: Output description: The output content from the tool execution. type: object required: - tool_call_id - output title: ToolOutput description: Represents the output/result of a tool call. ValidationError: properties: loc: items: anyOf: - type: string - type: integer type: array title: Location msg: type: string title: Message type: type: string title: Error Type type: object required: - loc - msg - type title: ValidationError ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-secret.md # firectl list secret > Lists all secrets for the signed in user. ``` firectl list secret [flags] ``` ### Examples ``` firectl list secrets ``` ### Flags ``` -h, --help help for secret ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. 
--api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/list-secrets.md # List Secrets > Lists all secrets for an account. Note that the `value` field is not returned in the response for security reasons. Only the `name` and `key_name` fields are included for each secret. ## OpenAPI ````yaml get /v1/accounts/{account_id}/secrets paths: path: /v1/accounts/{account_id}/secrets method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false pageToken: schema: - type: string required: false filter: schema: - type: string required: false description: Unused but required to use existing ListRequest functionality. orderBy: schema: - type: string required: false description: Unused but required to use existing ListRequest functionality. readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: secrets: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewaySecret' nextPageToken: allOf: - type: string totalSize: allOf: - type: integer format: int32 description: The total number of secrets. refIdentifier: '#/components/schemas/gatewayListSecretsResponse' examples: example: value: secrets: - name: keyName: value: sk-1234567890abcdef nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewaySecret: type: object properties: name: type: string title: |- name follows the convention accounts/account-id/secrets/unkey-key-id keyName: type: string title: name of the key. In this case, it can be WOLFRAM_ALPHA_API_KEY value: type: string example: sk-1234567890abcdef description: >- The secret value. This field is INPUT_ONLY and will not be returned in GET or LIST responses for security reasons. The value is only accepted when creating or updating secrets. required: - name - keyName ```` --- # Source: https://docs.fireworks.ai/api-reference-dlde/list-snapshots.md # List Snapshots ## OpenAPI ````yaml get /v1/accounts/{account_id}/snapshots paths: path: /v1/accounts/{account_id}/snapshots method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. 
Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of snapshots to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListEnvironments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListEnvironments must match the call that provided the page token. filter: schema: - type: string required: false description: >- Only snapshots satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "create_time". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: snapshots: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewaySnapshot' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of snapshots. refIdentifier: '#/components/schemas/gatewayListSnapshotsResponse' examples: example: value: snapshots: - name: createTime: '2023-11-07T05:31:56Z' state: STATE_UNSPECIFIED status: code: OK message: imageRef: updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. 
For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. 
HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewaySnapshot: type: object properties: name: type: string title: >- The resource name of the snapshot. e.g. accounts/my-account/clusters/my-cluster/environments/my-env/snapshots/1 readOnly: true createTime: type: string format: date-time description: The creation time of the snapshot. readOnly: true state: $ref: '#/components/schemas/gatewaySnapshotState' description: The state of the snapshot. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: The status code and message of the snapshot. readOnly: true imageRef: type: string description: The URI of the container image for this snapshot. readOnly: true updateTime: type: string format: date-time description: The update time for the snapshot. readOnly: true title: 'Next ID: 7' gatewaySnapshotState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - FAILED - DELETING default: STATE_UNSPECIFIED gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-supervised-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-supervised-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-supervised-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-supervised-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-supervised-fine-tuning-jobs.md # Source: https://docs.fireworks.ai/api-reference/list-supervised-fine-tuning-jobs.md # List Supervised Fine-tuning Jobs ## OpenAPI ````yaml get /v1/accounts/{account_id}/supervisedFineTuningJobs paths: path: /v1/accounts/{account_id}/supervisedFineTuningJobs method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of fine-tuning jobs to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListSupervisedFineTuningJobs call. Provide this to retrieve the subsequent page. 
When paginating, all other parameters provided to ListSupervisedFineTuningJobs must match the call that provided the page token. filter: schema: - type: string required: false description: >- Filter criteria for the returned jobs. See https://google.aip.dev/160 for the filter syntax specification. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: supervisedFineTuningJobs: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewaySupervisedFineTuningJob' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 title: The total number of fine-tuning jobs refIdentifier: '#/components/schemas/gatewayListSupervisedFineTuningJobsResponse' examples: example: value: supervisedFineTuningJobs: - name: displayName: createTime: '2023-11-07T05:31:56Z' completedTime: '2023-11-07T05:31:56Z' dataset: state: JOB_STATE_UNSPECIFIED status: code: OK message: createdBy: outputModel: baseModel: warmStartFrom: jinjaTemplate: earlyStop: true epochs: 123 learningRate: 123 maxContextLength: 123 loraRank: 123 wandbConfig: enabled: true apiKey: project: entity: runId: url: evaluationDataset: isTurbo: true evalAutoCarveout: true region: REGION_UNSPECIFIED updateTime: '2023-11-07T05:31:56Z' nodes: 123 batchSize: 123 mtpEnabled: true mtpNumDraftTokens: 123 mtpFreezeBaseModel: true hiddenStatesGenConfig: deployedModel: maxWorkers: 123 maxTokens: 123 inputOffset: 123 inputLimit: 123 maxContextLen: 123 regenerateAssistant: true outputActivations: true apiKey: metricsFileSignedUrl: gradientAccumulationSteps: 123 learningRateWarmupSteps: 123 nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). 
HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. 
We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayHiddenStatesGenConfig: type: object properties: deployedModel: type: string maxWorkers: type: integer format: int32 maxTokens: type: integer format: int32 inputOffset: type: integer format: int32 inputLimit: type: integer format: int32 maxContextLen: type: integer format: int32 regenerateAssistant: type: boolean outputActivations: type: boolean apiKey: type: string description: >- Config for generating dataset with hidden states for SFTJ or eagle training. gatewayJobState: type: string enum: - JOB_STATE_UNSPECIFIED - JOB_STATE_CREATING - JOB_STATE_RUNNING - JOB_STATE_COMPLETED - JOB_STATE_FAILED - JOB_STATE_CANCELLED - JOB_STATE_DELETING - JOB_STATE_WRITING_RESULTS - JOB_STATE_VALIDATING - JOB_STATE_DELETING_CLEANING_UP - JOB_STATE_PENDING - JOB_STATE_EXPIRED - JOB_STATE_RE_QUEUEING - JOB_STATE_CREATING_INPUT_DATASET - JOB_STATE_IDLE - JOB_STATE_CANCELLING - JOB_STATE_EARLY_STOPPED default: JOB_STATE_UNSPECIFIED description: JobState represents the state an asynchronous job can be in. gatewayRegion: type: string enum: - REGION_UNSPECIFIED - US_IOWA_1 - US_VIRGINIA_1 - US_ILLINOIS_1 - AP_TOKYO_1 - US_ARIZONA_1 - US_TEXAS_1 - US_ILLINOIS_2 - EU_FRANKFURT_1 - US_TEXAS_2 - EU_ICELAND_1 - EU_ICELAND_2 - US_WASHINGTON_1 - US_WASHINGTON_2 - US_WASHINGTON_3 - AP_TOKYO_2 - US_CALIFORNIA_1 - US_UTAH_1 - US_TEXAS_3 - US_GEORGIA_1 - US_GEORGIA_2 - US_WASHINGTON_4 - US_GEORGIA_3 default: REGION_UNSPECIFIED title: |- - US_IOWA_1: GCP us-central1 (Iowa) - US_VIRGINIA_1: AWS us-east-1 (N. Virginia) - US_ILLINOIS_1: OCI us-chicago-1 - AP_TOKYO_1: OCI ap-tokyo-1 - US_ARIZONA_1: OCI us-phoenix-1 - US_TEXAS_1: Lambda us-south-3 (C. Texas) - US_ILLINOIS_2: Lambda us-midwest-1 (Illinois) - EU_FRANKFURT_1: OCI eu-frankfurt-1 - US_TEXAS_2: Lambda us-south-2 (N. Texas) - EU_ICELAND_1: Crusoe eu-iceland1 - EU_ICELAND_2: Crusoe eu-iceland1 (network1) - US_WASHINGTON_1: Voltage Park us-pyl-1 (Detach audio cluster from control_plane) - US_WASHINGTON_2: Voltage Park us-seattle-2 - US_WASHINGTON_3: Vultr Seattle 1 - AP_TOKYO_2: AWS ap-northeast-1 - US_CALIFORNIA_1: AWS us-west-1 (N. California) - US_UTAH_1: GCP us-west3 (Utah) - US_TEXAS_3: Crusoe us-southcentral1 - US_GEORGIA_1: DigitalOcean us-atl1 - US_GEORGIA_2: Vultr Atlanta 1 - US_WASHINGTON_4: Coreweave us-west-09b-1 - US_GEORGIA_3: Alicloud us-southeast-1 gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. 
message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewaySupervisedFineTuningJob: type: object properties: name: type: string readOnly: true displayName: type: string createTime: type: string format: date-time readOnly: true completedTime: type: string format: date-time readOnly: true dataset: type: string description: The name of the dataset used for training. state: $ref: '#/components/schemas/gatewayJobState' readOnly: true status: $ref: '#/components/schemas/gatewayStatus' readOnly: true createdBy: type: string description: The email address of the user who initiated this fine-tuning job. readOnly: true outputModel: type: string description: >- The model ID to be assigned to the resulting fine-tuned model. If not specified, the job ID will be used. baseModel: type: string description: |- The name of the base model to be fine-tuned Only one of 'base_model' or 'warm_start_from' should be specified. warmStartFrom: type: string description: |- The PEFT addon model in Fireworks format to be fine-tuned from Only one of 'base_model' or 'warm_start_from' should be specified. jinjaTemplate: type: string title: >- The Jinja template for conversation formatting. If not specified, defaults to the base model's conversation template configuration earlyStop: type: boolean description: >- Whether to stop training early if the validation loss does not improve. epochs: type: integer format: int32 description: The number of epochs to train for. learningRate: type: number format: float description: The learning rate used for training. maxContextLength: type: integer format: int32 description: The maximum context length to use with the model. loraRank: type: integer format: int32 description: The rank of the LoRA layers. wandbConfig: $ref: '#/components/schemas/gatewayWandbConfig' description: >- The Weights & Biases team/user account for logging training progress. evaluationDataset: type: string description: The name of a separate dataset to use for evaluation. isTurbo: type: boolean description: Whether to run the fine-tuning job in turbo mode. evalAutoCarveout: type: boolean description: Whether to auto-carve the dataset for eval. region: $ref: '#/components/schemas/gatewayRegion' description: The region where the fine-tuning job is located. updateTime: type: string format: date-time description: The update time for the supervised fine-tuning job. readOnly: true nodes: type: integer format: int32 description: The number of nodes to use for the fine-tuning job. batchSize: type: integer format: int32 title: The batch size for sequence packing in training mtpEnabled: type: boolean title: Whether to enable MTP (Model-Token-Prediction) mode mtpNumDraftTokens: type: integer format: int32 title: Number of draft tokens to use in MTP mode mtpFreezeBaseModel: type: boolean title: Whether to freeze the base model parameters during MTP training hiddenStatesGenConfig: $ref: '#/components/schemas/gatewayHiddenStatesGenConfig' description: Config for generating dataset with hidden states for training. 
metricsFileSignedUrl: type: string title: The signed URL for the metrics file gradientAccumulationSteps: type: integer format: int32 title: Number of gradient accumulation steps learningRateWarmupSteps: type: integer format: int32 title: Number of steps for learning rate warm up title: 'Next ID: 42' required: - dataset gatewayWandbConfig: type: object properties: enabled: type: boolean description: Whether to enable wandb logging. apiKey: type: string description: The API key for the wandb service. project: type: string description: The project name for the wandb service. entity: type: string description: The entity name for the wandb service. runId: type: string description: The run ID for the wandb service. url: type: string description: The URL for the wandb service. readOnly: true description: >- WandbConfig is the configuration for the Weights & Biases (wandb) logging which will be used by a training job. ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/list-user.md # firectl list user > Prints all users in the account. ``` firectl list user [flags] ``` ### Examples ``` firectl list users ``` ### Flags ``` -h, --help help for user ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. --filter string Only resources satisfying the provided filter will be listed. See https://google.aip.dev/160 for the filter grammar. --no-paginate List all resources without pagination. --order-by string A list of fields to order by. To specify a descending order for a field, append a " desc" suffix --page-size int32 The maximum number of resources to list. --page-token string The page to list. A number from 0 to the total number of pages (number of entities / page size). -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/api-reference/list-users.md # List Users ## OpenAPI ````yaml get /v1/accounts/{account_id}/users paths: path: /v1/accounts/{account_id}/users method: get servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id query: pageSize: schema: - type: integer required: false description: >- The maximum number of users to return. The maximum page_size is 200, values above 200 will be coerced to 200. If unspecified, the default is 50. pageToken: schema: - type: string required: false description: >- A page token, received from a previous ListUsers call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListUsers must match the call that provided the page token. filter: schema: - type: string required: false description: |- Only users satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar. orderBy: schema: - type: string required: false description: >- A comma-separated list of fields to order by. e.g. "foo,bar" The default sort order is ascending. To specify a descending order for a field, append a " desc" suffix. e.g. "foo desc,bar" Subfields are specified with a "." character. e.g. "foo.bar" If not specified, the default order is by "name". 
readMask: schema: - type: string required: false description: >- The fields to be returned in the response. If empty or "*", all fields will be returned. header: {} cookie: {} body: {} response: '200': application/json: schemaArray: - type: object properties: users: allOf: - type: array items: type: object $ref: '#/components/schemas/gatewayUser' nextPageToken: allOf: - type: string description: >- A token, which can be sent as `page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages. totalSize: allOf: - type: integer format: int32 description: The total number of users. refIdentifier: '#/components/schemas/gatewayListUsersResponse' examples: example: value: users: - name: displayName: serviceAccount: true createTime: '2023-11-07T05:31:56Z' role: email: state: STATE_UNSPECIFIED status: code: OK message: updateTime: '2023-11-07T05:31:56Z' nextPageToken: totalSize: 123 description: A successful response. deprecated: false type: path components: schemas: gatewayCode: type: string enum: - OK - CANCELLED - UNKNOWN - INVALID_ARGUMENT - DEADLINE_EXCEEDED - NOT_FOUND - ALREADY_EXISTS - PERMISSION_DENIED - UNAUTHENTICATED - RESOURCE_EXHAUSTED - FAILED_PRECONDITION - ABORTED - OUT_OF_RANGE - UNIMPLEMENTED - INTERNAL - UNAVAILABLE - DATA_LOSS default: OK description: |- - OK: Not an error; returned on success. HTTP Mapping: 200 OK - CANCELLED: The operation was cancelled, typically by the caller. HTTP Mapping: 499 Client Closed Request - UNKNOWN: Unknown error. For example, this error may be returned when a `Status` value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error. HTTP Mapping: 500 Internal Server Error - INVALID_ARGUMENT: The client specified an invalid argument. Note that this differs from `FAILED_PRECONDITION`. `INVALID_ARGUMENT` indicates arguments that are problematic regardless of the state of the system (e.g., a malformed file name). HTTP Mapping: 400 Bad Request - DEADLINE_EXCEEDED: The deadline expired before the operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire. HTTP Mapping: 504 Gateway Timeout - NOT_FOUND: Some requested entity (e.g., file or directory) was not found. Note to server developers: if a request is denied for an entire class of users, such as gradual feature rollout or undocumented allowlist, `NOT_FOUND` may be used. If a request is denied for some users within a class of users, such as user-based access control, `PERMISSION_DENIED` must be used. HTTP Mapping: 404 Not Found - ALREADY_EXISTS: The entity that a client attempted to create (e.g., file or directory) already exists. HTTP Mapping: 409 Conflict - PERMISSION_DENIED: The caller does not have permission to execute the specified operation. `PERMISSION_DENIED` must not be used for rejections caused by exhausting some resource (use `RESOURCE_EXHAUSTED` instead for those errors). `PERMISSION_DENIED` must not be used if the caller can not be identified (use `UNAUTHENTICATED` instead for those errors). This error code does not imply the request is valid or the requested entity exists or satisfies other pre-conditions. 
HTTP Mapping: 403 Forbidden - UNAUTHENTICATED: The request does not have valid authentication credentials for the operation. HTTP Mapping: 401 Unauthorized - RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space. HTTP Mapping: 429 Too Many Requests - FAILED_PRECONDITION: The operation was rejected because the system is not in a state required for the operation's execution. For example, the directory to be deleted is non-empty, an rmdir operation is applied to a non-directory, etc. Service implementors can use the following guidelines to decide between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`: (a) Use `UNAVAILABLE` if the client can retry just the failing call. (b) Use `ABORTED` if the client should retry at a higher level. For example, when a client-specified test-and-set fails, indicating the client should restart a read-modify-write sequence. (c) Use `FAILED_PRECONDITION` if the client should not retry until the system state has been explicitly fixed. For example, if an "rmdir" fails because the directory is non-empty, `FAILED_PRECONDITION` should be returned since the client should not retry unless the files are deleted from the directory. HTTP Mapping: 400 Bad Request - ABORTED: The operation was aborted, typically due to a concurrency issue such as a sequencer check failure or transaction abort. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 409 Conflict - OUT_OF_RANGE: The operation was attempted past the valid range. E.g., seeking or reading past end-of-file. Unlike `INVALID_ARGUMENT`, this error indicates a problem that may be fixed if the system state changes. For example, a 32-bit file system will generate `INVALID_ARGUMENT` if asked to read at an offset that is not in the range [0,2^32-1], but it will generate `OUT_OF_RANGE` if asked to read from an offset past the current file size. There is a fair bit of overlap between `FAILED_PRECONDITION` and `OUT_OF_RANGE`. We recommend using `OUT_OF_RANGE` (the more specific error) when it applies so that callers who are iterating through a space can easily look for an `OUT_OF_RANGE` error to detect when they are done. HTTP Mapping: 400 Bad Request - UNIMPLEMENTED: The operation is not implemented or is not supported/enabled in this service. HTTP Mapping: 501 Not Implemented - INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. HTTP Mapping: 500 Internal Server Error - UNAVAILABLE: The service is currently unavailable. This is most likely a transient condition, which can be corrected by retrying with a backoff. Note that it is not always safe to retry non-idempotent operations. See the guidelines above for deciding between `FAILED_PRECONDITION`, `ABORTED`, and `UNAVAILABLE`. HTTP Mapping: 503 Service Unavailable - DATA_LOSS: Unrecoverable data loss or corruption. HTTP Mapping: 500 Internal Server Error title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto] gatewayStatus: type: object properties: code: $ref: '#/components/schemas/gatewayCode' description: The status code. message: type: string description: A developer-facing error message in English. title: >- Mimics [https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto] gatewayUser: type: object properties: name: type: string title: >- The resource name of the user. e.g. 
accounts/my-account/users/my-user readOnly: true displayName: type: string description: |- Human-readable display name of the user. e.g. "Alice" Must be fewer than 64 characters long. serviceAccount: type: boolean title: Whether this user is a service account (can only be set by admins) createTime: type: string format: date-time description: The creation time of the user. readOnly: true role: type: string description: The user's role, e.g. admin or user. email: type: string description: The user's email address. state: $ref: '#/components/schemas/gatewayUserState' description: The state of the user. readOnly: true status: $ref: '#/components/schemas/gatewayStatus' description: Contains information about the user status. readOnly: true updateTime: type: string format: date-time description: The update time for the user. readOnly: true title: 'Next ID: 13' required: - role gatewayUserState: type: string enum: - STATE_UNSPECIFIED - CREATING - READY - UPDATING - DELETING default: STATE_UNSPECIFIED ```` --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/load-lora.md # firectl load-lora > Loads a LoRA model to a deployment. ### Usage Loads a LoRA model to a deployment. If a deployment is not specified, the model will be loaded to Fireworks' serverless platform (if supported). If a deployment is specified, it will be loaded to the given dedicated deployment. If successful, a DeployedModel resource will be created. ``` firectl load-lora [flags] ``` ### Examples ``` firectl load-lora my-lora # To load it to serverless firectl load-lora my-lora --deployment abcd1234 # To load it to a dedicated deployment ``` ### Flags ``` --deployment string The resource ID of the deployment where the LoRA model is to be loaded. -h, --help help for load-lora --public If true, the LoRA model will be publicly available for inference. --replace-merged-addon Required when loading an addon to a hot reload deployment. If there is already an existing addon, it will be replaced. --wait Wait until the model is deployed. --wait-timeout duration Maximum time to wait when using --wait flag. (default 30m0s) ``` ### Global flags ``` -a, --account-id string The Fireworks account ID. If not specified, reads account_id from ~/.fireworks/auth.ini. --api-key string An API key used to authenticate with Fireworks. -p, --profile string fireworks auth and settings profile to use. ``` --- # Source: https://docs.fireworks.ai/ecosystem/integrations/mlops-observability.md # MLOps & Observability > Track and monitor your Fireworks AI deployments with leading MLOps and observability platforms Fireworks AI integrates with industry-leading MLOps and observability platforms to help you monitor, track, and optimize your AI applications in production. ## Supported Platforms Track fine-tuning experiments and visualize training metrics with W\&B Mlflow Tracing to track prompts, outputs, latency etc as your build AI applications with FireworksAI ## Need Help? For assistance with MLOps and observability integrations, [contact our team](https://fireworks.ai/contact) or join our [Discord community](https://discord.gg/fireworks-ai). --- # Source: https://docs.fireworks.ai/fine-tuning/monitor-training.md # Monitor Training > Track RFT job progress and diagnose issues in real-time Once your RFT job is running, the Fireworks dashboard provides comprehensive monitoring tools to track progress, inspect individual rollouts, and debug issues as they arise. 
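If you prefer checking on a job from a script rather than the browser, the job resource can also be read over the REST API. The sketch below is an assumption rather than documented behavior: it presumes a standard `GET` on the RFT job resource exists alongside the `/metrics` endpoint shown under "Exporting metrics" later in this guide, and it uses placeholder account and job IDs.

```bash theme={null}
# Assumed endpoint: GET on the RFT job resource (its /metrics sibling is
# documented under "Exporting metrics" below). Replace the placeholders
# with your account ID and job ID.
curl -s \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  "https://api.fireworks.ai/v1/accounts/<account-id>/reinforcementFineTuningJobs/<job-id>"
```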
## Accessing the monitoring dashboard After creating your RFT job, you'll receive a dashboard link in the CLI output: ``` Dashboard Links: RFT Job: https://app.fireworks.ai/dashboard/fine-tuning/reinforcement/abc123 ``` Click this link or navigate manually: 1. Go to [Fireworks Dashboard](https://app.fireworks.ai) 2. Click **Fine-Tuning** in the sidebar 3. Select your job from the list ## Understanding the overview The main dashboard shows your job's current state and key metrics. ### Job status Your job is queued waiting for GPU resources. Queue time depends on current demand and your account priority. **Action**: None needed. Job will start automatically when resources become available. Fireworks is validating your dataset to ensure it meets format requirements and quality standards. **Duration**: Typically 1-2 minutes **Action**: None needed. If validation fails, you'll receive specific error messages about issues in your dataset. Training is actively in progress. Rollouts are being generated, evaluated, and the model is learning. **Action**: Monitor metrics and rollout quality. This is when you'll watch reward curves improve. Training finished successfully. Your fine-tuned model is ready for deployment. **Action**: Review final metrics, then [deploy your model](/fine-tuning/deploying-loras). Training encountered an unrecoverable error and stopped. **Action**: Check error logs and troubleshooting section below. Common causes include evaluator errors, resource limits, or dataset issues. You or another user manually stopped the job. **Action**: Review partial results if needed. Create a new job to continue training. Training stopped automatically because the full epoch showed no improvement. All rollouts received the same scores, indicating no training progress. **Action**: This typically indicates an issue with your evaluator or training setup. Check that: * Your evaluator is returning varied scores (not all 0s or all 1s) * The reward function can distinguish between good and bad outputs * The model is actually generating different responses Review the troubleshooting section below for common causes. ### Key metrics at a glance The overview panel displays: * **Elapsed time**: How long the job has been running * **Progress**: Current epoch and step counts * **Reward**: Latest mean reward from rollouts * **Model**: Base model and output model names ## Training metrics ### Reward curves The most important metric in RFT is the reward curve, which shows how well your model is performing over time. **What to look for**: * **Upward trend** - Model is learning and improving * **Plateauing** - Model may have converged; consider stopping or adjusting parameters * **Decline** - Potential issue with evaluator or training instability * **Spikes** - Could indicate noisy rewards or outliers in evaluation Reward curve showing upward trend over training epochs Healthy training shows steady reward improvement. Don't worry about minor fluctuations—focus on the overall trend. ### Training loss Loss measures how well the model is fitting the training data: * **Decreasing loss** - Normal learning behavior * **Increasing loss** - Learning rate may be too high * **Flat loss** - Model may not be learning; check evaluator rewards ### Evaluation metrics If you provided an evaluation dataset, you'll see validation metrics: * **Eval reward**: Model performance on held-out data * **Generalization gap**: Difference between training and eval rewards Large gaps between training and eval rewards suggest overfitting. 
Consider reducing epochs or adding more diverse training data. ## Inspecting rollouts Understanding individual rollouts helps you verify your evaluator is working correctly and identify quality issues. ### Rollout overview table Click any **Epoch** in the training timeline, then click the **table icon** to view all rollouts for that step. Table showing rollout IDs, prompts, responses, and rewards The table shows: * **Row ID**: Unique identifier for each dataset row used in this rollout * **Prompt**: The input prompt sent to the model * **Messages**: The model's generated response messages * **Valid**: Whether the rollout completed successfully without errors * **Reason**: Explanation if the rollout failed or was marked invalid * **Score**: Reward score assigned by your evaluator (0.0 to 1.0) **What to check**: * Most rollouts succeeding (status: complete) * Reward distribution makes sense (high for good outputs, low for bad) * Many failures indicate evaluator issues * All rewards identical may indicate evaluator is broken ### Individual rollout details Click any row in the rollout table to see full details: Detailed view of a single rollout showing full prompt, response, and evaluation You'll see: 1. **Full prompt**: Exact messages sent to the model 2. **Model response**: Complete generated output 3. **Evaluation result**: Reward score and reasoning (if provided) 4. **Metadata**: Token counts, timing, temperature settings 5. **Tool calls**: For agentic rollouts with function calling Copy and paste model outputs to test them manually. For example, if you're training a code generator, try running the generated code yourself to verify your evaluator is scoring correctly. ### Quality spot checks Regularly inspect rollouts at different stages of training: **Early training (first epoch)**: * Verify evaluator is working correctly * Check that high-reward rollouts are actually good * Ensure low-reward rollouts are actually bad **Mid-training**: * Confirm model quality is improving * Look for new strategies or behaviors emerging * Check that evaluator isn't being gamed **Late training**: * Verify final model quality meets your standards * Check for signs of overfitting (memorizing training data) * Ensure diversity in responses (not all identical) ## Live logs Real-time logs show what's happening inside your training job. ### Accessing logs Click the **Logs icon** next to the table icon to view real-time logs for your training job. Live log streaming showing rollout processing and evaluation ### Using logs for debugging When things go wrong, logs are your first stop: 1. **Filter by error level**: Focus on `[ERROR]` and `[WARNING]` messages 2. **Search for rollout IDs**: Track specific rollouts through their lifecycle 3. **Look for patterns**: Repeated errors indicate systematic issues 4. **Check timestamps**: Correlate errors with metric changes ## Common issues and solutions **Symptoms**: Reward curve flat or very low throughout training **Possible causes**: * Evaluator always returning 0 or very low scores * Model outputs not matching expected format * Task too difficult for base model **Solutions**: 1. Inspect rollouts to verify evaluator is working: * Check that some rollouts get high rewards * Verify reward logic makes sense 2. Test evaluator locally on known good/bad outputs 3. Simplify the task or provide more examples 4. 
Try a stronger base model **Symptoms**: Reward increases then crashes and stays low **Possible causes**: * Learning rate too high causing training instability * Model found an exploit in the evaluator (reward hacking) * Catastrophic forgetting **Solutions**: 1. Stop training and use the last good checkpoint 2. Restart with lower learning rate (e.g., `--learning-rate 5e-5`) 3. Review recent rollouts for reward hacking behavior 4. Improve evaluator to be more robust **Symptoms**: Rollout table shows lots of errors or timeouts **Possible causes**: * Evaluator code errors * Timeout too short for evaluation * External API failures (for remote evaluators) * Resource exhaustion **Solutions**: 1. Check error logs for specific error messages 2. Test evaluator locally to reproduce errors 3. Increase `--rollout-timeout` if evaluations need more time 4. Add better error handling in evaluator code 5. For remote evaluators: check server health and logs **Symptoms**: Loss goes up instead of down **Possible causes**: * Learning rate too high * Conflicting reward signals * Numerical instability **Solutions**: 1. Reduce learning rate by 2-5x 2. Check that rewards are consistent (same prompt gets similar rewards) 3. Verify rewards are in valid range \[0, 1] 4. Consider reducing batch size **Symptoms**: Model generates the same response for every prompt **Possible causes**: * Temperature too low (near 0) * Model found one high-reward response and overfit to it * Evaluator only rewards one specific output **Solutions**: 1. Increase `--temperature` to 0.8-1.0 2. Make evaluator more flexible to accept diverse good answers 3. Use more diverse prompts in training data 4. Reduce epochs to prevent overfitting **Symptoms**: Many rollouts timing out with remote environment **Possible causes**: * Remote server slow or overloaded * Network latency issues * Evaluator not logging completion correctly **Solutions**: 1. Check remote server logs for errors 2. Verify server is logging `Status.rollout_finished()` 3. Increase `--rollout-timeout` to allow more time 4. Scale remote server to handle concurrent requests 5. Optimize evaluator code for performance ## Performance optimization ### Speeding up training If training is slower than expected: **Slow evaluators directly increase training time**: * Profile your evaluator code to find bottlenecks * Cache expensive computations * Use batch processing for API calls * Add timeouts to prevent hanging **For remote evaluators**: * Add more worker instances to handle concurrent rollouts * Use faster machines (more CPU, memory) * Optimize network connectivity to Fireworks Target: Evaluations should complete in 1-5 seconds per rollout. **Reduce compute while maintaining quality**: * Decrease `--n` (e.g., from 8 to 4 rollouts per prompt) * Reduce `--max-tokens` if responses don't need to be long * Lower temperature slightly to speed up sampling Caution: Too few rollouts (n \< 4) may hurt training quality. ### Cost optimization Reduce costs without sacrificing too much quality: 1. **Start small**: Experiment with `qwen3-0p6b` before scaling to larger models 2. **Reduce rollouts**: Use `--n 4` instead of 8 3. **Shorter responses**: Lower `--max-tokens` to minimum needed 4. **Fewer epochs**: Start with 1 epoch, only add more if needed 5. **Efficient evaluators**: Minimize API calls and computation ## Stopping and resuming jobs ### Stopping a running job If you need to stop training: 1. Click **Cancel Job** in the dashboard 2. 
Or via CLI: ```bash theme={null} firectl delete rftj ``` The model state at the last checkpoint is saved and can be deployed. Cancelled jobs cannot be resumed. If you want to continue training, create a new job starting from the last checkpoint. ### Using checkpoints Checkpoints are automatically saved during training. To continue from a checkpoint: ```bash theme={null} eval-protocol create rft \ --warm-start-from accounts/your-account/models/previous-checkpoint \ --output-model continued-training ``` This is useful for: * Extending training after early stopping * Trying different hyperparameters on a trained model * Building on previous successful training runs ## Comparing multiple jobs Running multiple experiments? Compare them side-by-side: 1. Navigate to **Fine-Tuning** dashboard 2. Select multiple jobs using checkboxes 3. Click **Compare** This shows: * Reward curves overlaid on same graph * Parameter differences highlighted * Final metrics comparison * Training time and cost comparison Use consistent naming for experiments (e.g., `math-lr-1e4`, `math-lr-5e5`) to make comparisons easier. ## Exporting metrics For deeper analysis or paper writing: ### Via dashboard 1. Click **Export** button in job view 2. Choose format: CSV, JSON 3. Select metrics to export (rewards, loss, rollout data) ### Via API ```python theme={null} import requests response = requests.get( f"https://api.fireworks.ai/v1/accounts/{account}/reinforcementFineTuningJobs/{job_id}/metrics", headers={"Authorization": f"Bearer {api_key}"} ) metrics = response.json() ``` ### Weights & Biases integration If you enabled W\&B when creating the job: ```bash theme={null} eval-protocol create rft \ --wandb-project my-experiments \ --wandb-entity my-org \ ... ``` All metrics automatically sync to W\&B for advanced analysis, comparison, and sharing. ## Best practices Check your job within the first 15-30 minutes of training: * Verify evaluator is working correctly * Confirm rewards are in expected range * Catch configuration errors early Don't wait until training completes to discover issues. Every few epochs, inspect 5-10 random rollouts: * Manually verify high-reward outputs are actually good * Check low-reward outputs are actually bad * Look for unexpected model behaviors This catches evaluator bugs and reward hacking. When you find good hyperparameters, save the command: ```bash theme={null} # Save to file for reproducibility echo "eval-protocol create rft --base-model ... --learning-rate 5e-5 ..." > best_config.sh ``` Makes it easy to reproduce results or share with team. Name jobs descriptively: * Good: `math-solver-llama8b-temp08-n8` * Bad: `test1`, `try2`, `final-final` Future you will thank you when comparing experiments. Keep notes on what worked and what didn't: * Hypothesis for each experiment * Parameters changed * Results and insights * Next steps Build institutional knowledge for your team. ## Next steps Once training completes, deploy your fine-tuned model for inference Learn how to adjust parameters for better results Improve your reward functions based on training insights Start a new experiment using the CLI --- # Source: https://docs.fireworks.ai/guides/ondemand-deployments.md # Deployments > Configure and manage on-demand deployments on dedicated GPUs **New to deployments?** Start with our [Deployments Quickstart](/getting-started/ondemand-quickstart) to deploy and query your first model in minutes, then return here to learn about configuration options. 
On-demand deployments give you dedicated GPUs for your models, providing several advantages over serverless: * **Better performance** – Lower latency, higher throughput, and predictable performance unaffected by other users * **No hard rate limits** – Only limited by your deployment's capacity * **Cost-effective at scale** – Cheaper under high utilization. Unlike serverless models (billed per token), on-demand deployments are [billed by GPU-second](https://fireworks.ai/pricing). * **Broader model selection** – Access models not available on serverless * **Custom models** – Upload your own models (for supported architectures) from Hugging Face or elsewhere Need higher GPU quotas or want to reserve capacity? [Contact us](https://fireworks.ai/contact). ## Creating & querying deployments **Create a deployment:** ```bash theme={null} # This command returns your DEPLOYMENT_NAME - save it for querying firectl create deployment accounts/fireworks/models/ --wait ``` See [Deployment shapes](#deployment-shapes) below to optimize for speed, throughput, or cost. **Query your deployment:** After creating a deployment, query it using this format: ``` # ``` You can find your deployment name anytime with `firectl list deployments` and `firectl get deployment `. **Examples:** ``` accounts/fireworks/models/mixtral-8x7b#accounts/alice/deployments/12345678 ``` * Model: `accounts/fireworks/models/mixtral-8x7b` * Deployment: `accounts/alice/deployments/12345678` You can also use shorthand: `fireworks/mixtral-8x7b#alice/12345678` ``` accounts/alice/models/custom-model#accounts/alice/deployments/12345678 ``` * Model: `accounts/alice/models/custom-model` * Deployment: `accounts/alice/deployments/12345678` You can also use shorthand: `alice/custom-model#alice/12345678` ### Code examples ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.chat.completions.create( model="accounts/fireworks/models/gpt-oss-120b#", messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}] ) print(response.choices[0].message.content) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const response = await client.chat.completions.create({ model: "accounts/fireworks/models/gpt-oss-120b#", messages: [ { role: "user", content: "Explain quantum computing in simple terms", }, ], }); console.log(response.choices[0].message.content); ``` ```bash theme={null} curl https://api.fireworks.ai/inference/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FIREWORKS_API_KEY" \ -d '{ "model": "accounts/fireworks/models/gpt-oss-120b#", "messages": [ { "role": "user", "content": "Explain quantum computing in simple terms" } ] }' ``` ## Deployment shapes Deployment shapes are the primary way to configure deployments. They're pre-configured templates optimized for speed, cost, or efficiency, including hardware, quantization, and other [performance factors](/faq/deployment/performance/optimization#performance-factors). 
* **Fast** – Low latency for interactive workloads * **Throughput** – Cost-per-token at scale for high-volume workloads * **Minimal** – Lowest cost for testing or light workloads **Usage:** ```bash theme={null} # List available shapes firectl list deployment-shape-versions --base-model # Create with a shape (shorthand) firectl create deployment accounts/fireworks/models/deepseek-v3 --deployment-shape throughput # Create with full shape ID firectl create deployment accounts/fireworks/models/llama-v3p3-70b-instruct \ --deployment-shape accounts/fireworks/deploymentShapes/llama-v3p3-70b-instruct-fast # View shape details firectl get deployment-shape-version ``` Need even better performance with tailored optimizations? [Contact our team](https://fireworks.ai/contact). ## Managing & configuring deployments ### Basic management ```bash theme={null} # List all deployments firectl list deployments # Check deployment status firectl get deployment # Delete a deployment firectl delete deployment ``` By default, deployments scale to zero if unused for 1 hour. Deployments with min replicas set to 0 are automatically deleted after 7 days of no traffic. ### GPU hardware Choose GPU type with `--accelerator-type`: * `NVIDIA_A100_80GB` * `NVIDIA_H100_80GB` * `NVIDIA_H200_141GB` GPU availability varies by [region](/deployments/regions). See [Hardware selection guide→](https://docs.fireworks.ai/faq/deployment/ondemand/hardware-options#hardware-selection) ### Autoscaling Control replica counts, scale timing, and load targets for your deployment. See the [Autoscaling guide](/deployments/autoscaling) for configuration options. ### Multiple GPUs per replica Use multiple GPUs to improve latency and throughput: ```bash theme={null} firectl create deployment --accelerator-count 2 ``` More GPUs = faster generation. Note that scaling is sub-linear (2x GPUs ≠ 2x performance). ## Advanced * **[Speculative decoding](/deployments/speculative-decoding)** - Speed up text generation using draft models or n-gram speculation * **[Quantization](/models/quantization)** - Reduce model precision (e.g., FP16 to FP8) to improve speeds and reduce costs by 30-50% * **[Performance benchmarking](/deployments/benchmarking)** - Measure and optimize your deployment's performance with load testing * **[Managing default deployments](/deployments/managing-default-deployments)** - Control which deployment handles queries when using just the model name * **[Publishing deployments](/deployments/publishing-deployments)** - Make your deployment accessible to other Fireworks users ## Next steps Configure autoscaling for optimal cost and performance Deploy your own models from Hugging Face Reduce costs with model quantization Choose deployment regions for optimal latency Purchase reserved GPUs for guaranteed capacity Fine-tune models for your specific use case --- # Source: https://docs.fireworks.ai/getting-started/ondemand-quickstart.md # Deployments Quickstart > Deploy models on dedicated GPUs in minutes On-demand deployments are dedicated GPUs that give you better performance, no rate limits, fast autoscaling, and a wider selection of models than serverless. This quickstart will help you spin up your first on-demand deployment in minutes. ## Step 1: Create and export an API key Before you begin, create an API key in the [Fireworks dashboard](https://app.fireworks.ai/settings/users/api-keys). Click **Create API key** and store it in a safe location. 
Once you have your API key, export it as an environment variable in your terminal: ```bash theme={null} export FIREWORKS_API_KEY="your_api_key_here" ``` ```powershell theme={null} setx FIREWORKS_API_KEY "your_api_key_here" ``` ## Step 2: Install the CLI To create and manage on-demand deployments, you'll need the `firectl` CLI tool. Install it using one of the following methods, based on your platform: ```bash homebrew theme={null} brew tap fw-ai/firectl brew install firectl # If you encounter a failed SHA256 check, try first running brew update ``` ```bash macOS (Apple Silicon) theme={null} curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-arm64.gz -o firectl.gz gzip -d firectl.gz && chmod a+x firectl sudo mv firectl /usr/local/bin/firectl sudo chown root: /usr/local/bin/firectl ``` ```bash macOS (x86_64) theme={null} curl https://storage.googleapis.com/fireworks-public/firectl/stable/darwin-amd64.gz -o firectl.gz gzip -d firectl.gz && chmod a+x firectl sudo mv firectl /usr/local/bin/firectl sudo chown root: /usr/local/bin/firectl ``` ```bash Linux (x86_64) theme={null} wget -O firectl.gz https://storage.googleapis.com/fireworks-public/firectl/stable/linux-amd64.gz gunzip firectl.gz sudo install -o root -g root -m 0755 firectl /usr/local/bin/firectl ``` ```Text Windows (64 bit) theme={null} wget -L https://storage.googleapis.com/fireworks-public/firectl/stable/firectl.exe ``` Then, sign in: ```bash theme={null} firectl signin ``` ## Step 3: Create a deployment This command will create a deployment of GPT OSS 120B optimized for speed. It will take a few minutes to complete. The resulting deployment will scale up to 1 replica. ```bash theme={null} firectl create deployment accounts/fireworks/models/gpt-oss-120b \ --deployment-shape fast \ --scale-down-window 5m \ --scale-up-window 30s \ --min-replica-count 0 \ --max-replica-count 1 \ --scale-to-zero-window 5m \ --wait ``` `fast` is called a [deployment shape](/guides/ondemand-deployments#deployment-shapes), which is a pre-configured deployment template created by the Fireworks team that sets sensible defaults for most deployment options (such as hardware type). You can also pass `throughput` or `cost` to `--deployment-shape`: * `throughput` creates a deployment that trades off latency for lower cost-per-token at scale * `cost` creates a deployment that trades off latency and throughput for lowest cost-per-token at small scale, usually for early experimentation and prototyping While we recommend using a deployment shape, you are also free to pass your own configuration to the deployment via our [deployment options](/guides/ondemand-deployments#deployment-options). The response will look like this: ```bash theme={null} Name: accounts//deployments/ Create Time: Expire Time: Created By: State: CREATING Status: OK Min Replica Count: 0 Max Replica Count: 1 Desired Replica Count: 0 Replica Count: 0 Autoscaling Policy: Scale Up Window: 30s Scale Down Window: 5m0s Scale To Zero Window: 5m0s Base Model: accounts/fireworks/models/gpt-oss-120b ...other fields... ``` Take note of the `Name:` field in the response, as it will be used in the next step to query your deployment. [Learn more about deployment options→](/guides/ondemand-deployments#deployment-options) [Learn more about autoscaling options→](/guides/ondemand-deployments#customizing-autoscaling-behavior) ## Step 4: Query your deployment Now you can query your on-demand deployment using the same API as serverless models, but using your dedicated deployment. 
Replace `` in the below snippets with the value from the `Name:` field in the previous step: ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.chat.completions.create( model="accounts/fireworks/models/gpt-oss-120b#", messages=[{ "role": "user", "content": "Explain quantum computing in simple terms", }], ) print(response.choices[0].message.content) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const response = await client.chat.completions.create({ model: "accounts/fireworks/models/gpt-oss-120b#", messages: [ { role: "user", content: "Explain quantum computing in simple terms", }, ], }); console.log(response.choices[0].message.content); ``` ```bash theme={null} curl https://api.fireworks.ai/inference/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FIREWORKS_API_KEY" \ -d '{ "model": "accounts/fireworks/models/gpt-oss-120b#", "messages": [ { "role": "user", "content": "Explain quantum computing in simple terms" } ] }' ``` The examples from the Serverless quickstart will work with this deployment as well, just replace the model string with the deployment-specific model string from above. [Serverless quickstart→](/getting-started/quickstart) ## Common use cases ### Autoscale based on requests per second ```bash theme={null} firectl create deployment accounts/fireworks/models/gpt-oss-120b \ --deployment-shape fast \ --scale-down-window 5m \ --scale-up-window 30s \ --scale-to-zero-window 5m \ --min-replica-count 0 \ --max-replica-count 4 \ --load-targets requests_per_second=5 \ --wait ``` ### Autoscale based on concurrent requests ```bash theme={null} firectl create deployment accounts/fireworks/models/gpt-oss-120b \ --deployment-shape fast \ --scale-down-window 5m \ --scale-up-window 30s \ --scale-to-zero-window 5m \ --min-replica-count 0 \ --max-replica-count 4 \ --load-targets concurrent_requests=5 \ --wait ``` ## Next steps Ready to scale to production, explore other modalities, or customize your models? Bring your own model and deploy it on Fireworks Improve model quality with supervised and reinforcement learning Real-time or batch audio transcription Use embeddings & reranking in search & context retrieval Run async inference jobs at scale, faster and cheaper Explore all available models across modalities Complete API documentation --- # Source: https://docs.fireworks.ai/tools-sdks/openai-compatibility.md # OpenAI compatibility You can use the [OpenAI Python client library](https://github.com/openai/openai-python) to interact with Fireworks. This makes migration of existing applications already using OpenAI particularly easy. 
## Specify endpoint and API key

### Using the OpenAI client

You can use the OpenAI client by initializing it with your Fireworks configuration:

```python theme={null}
from openai import OpenAI

# Initialize with Fireworks parameters
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="",
)
```

You can also use environment variables with the client:

```python theme={null}
import os
from openai import OpenAI

# Initialize using environment variables
client = OpenAI(
    base_url=os.environ.get("OPENAI_API_BASE", "https://api.fireworks.ai/inference/v1"),
    api_key=os.environ.get("OPENAI_API_KEY"),  # Set to your Fireworks API key
)
```

### Using environment variables

```shell theme={null}
export OPENAI_API_BASE="https://api.fireworks.ai/inference/v1"
export OPENAI_API_KEY=""
```

### Alternative approach

```python theme={null}
import openai

# warning: it has a process-wide effect
openai.api_base = "https://api.fireworks.ai/inference/v1"
openai.api_key = ""
```

## Usage

Use the OpenAI SDK as you normally would. Just ensure that the `model` parameter refers to one of the [Fireworks models](https://fireworks.ai/models).

### Completion

A simple completion API that doesn't modify the provided prompt in any way:

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="",
)
completion = client.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    prompt="The quick brown fox",
)
print(completion.choices[0].text)
```

### Chat Completion

Works best for models fine-tuned for conversation (e.g. llama\*-chat variants):

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="",
)
chat_completion = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Say this is a test",
        },
    ],
)
print(chat_completion.choices[0].message.content)
```

## API compatibility

### Differences

The following options have minor differences:

* `stop`: the returned string includes the stop word for Fireworks, while it's omitted by OpenAI (it can easily be truncated client-side)
* `max_tokens`: behaves differently if the model's context length is exceeded. If the length of `prompt` or `messages` plus `max_tokens` is higher than the model's context window, `max_tokens` will be adjusted lower accordingly. OpenAI returns an invalid request error in this situation. This behavior can be adjusted with the `context_length_exceeded_behavior` parameter.

### Token usage for streaming responses

The OpenAI API returns usage stats (number of tokens in the prompt and completion) for non-streaming responses but not for streaming ones (see this [forum post](https://community.openai.com/t/chat-completion-stream-api-token-usage/352964)). Fireworks.ai returns usage stats in both cases. For streaming responses, the `usage` field is returned in the very last chunk of the response (i.e. the one with `finish_reason` set).
For example:

```bash cURL theme={null}
curl --request POST \
  --url https://api.fireworks.ai/inference/v1/completions \
  --header "accept: application/json" \
  --header "authorization: Bearer $API_KEY" \
  --header "content-type: application/json" \
  --data '{"model": "accounts/fireworks/models/starcoder-16b-w8a16", "prompt": "def say_hello_world():", "max_tokens": 100, "stream": true}'
```

```
data: {..., "choices":[{"text":"\n print('Hello,","index":0,"finish_reason":null,"logprobs":null}],"usage":null}
data: {..., "choices":[{"text":" World!')\n\n\n","index":0,"finish_reason":null,"logprobs":null}],"usage":null}
data: {..., "choices":[{"text":"say_hello_","index":0,"finish_reason":null,"logprobs":null}],"usage":null}
data: {..., "choices":[{"text":"world()\n","index":0,"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":7,"total_tokens":24,"completion_tokens":17}}
data: [DONE]
```

Note that if you're using the OpenAI SDK, the `usage` field won't be listed in the SDK's structure definition, but it can be accessed directly. For example:

* In the Python SDK, you can access the attribute directly, e.g. `for chunk in openai.ChatCompletion.create(...): print(chunk["usage"])`.
* In the TypeScript SDK, you need to cast away the typing, e.g. `for await (const chunk of await openai.chat.completions.create(...)) { console.log((chunk as any).usage); }`.

### Not supported options

The following options are not yet supported:

* `presence_penalty`
* `frequency_penalty`
* `best_of`: you can use `n` instead
* `logit_bias`
* `functions`: you can use our [LangChain integration](https://python.langchain.com/docs/integrations/providers/fireworks) to achieve similar functionality client-side

Please reach out to us on [Discord](https://discord.gg/fireworks-ai) if you have a use case requiring one of these.

---

# Source: https://docs.fireworks.ai/fine-tuning/parameter-tuning.md

# Parameter Tuning

> Learn how training parameters affect model behavior and outcomes

## Overview

Reinforcement fine-tuning uses two categories of parameters to control model training: **training parameters** that govern how the model learns, and **rollout (sampling) parameters** that control how the model generates responses during training.

Most experiments converge well with the default values. Adjust parameters only when you have a clear hypothesis based on your training metrics and reward curves.

## Training Parameters

Core parameters that control how your model learns during the training process.

**What it does**: Controls how aggressively the model updates its weights during each training step. Think of it as the "step size" when descending the loss landscape.

**Default**: `1e-4` (0.0001)\
**Valid range**: `1e-5` to `5e-4`

**How it affects outcome**:

* **Too high** → Unstable training where reward spikes briefly then collapses as the model overshoots optimal weights.
* **Too low** → Painfully slow convergence. The reward curve plateaus too early before reaching optimal performance.
* **Just right** → Steady, consistent reward improvement throughout training.

**When to adjust**:

* **Decrease** when you see reward spikes followed by crashes in your training metrics
* **Increase** when the reward curve plateaus too early and stops improving
* Keep changes within 2× of the default value

**What it does**: The number of complete passes through your training dataset. Each epoch processes every example once.
**Default**: `1`\ **Valid range**: `1` to `10` (whole numbers only) **How it affects outcome**: * **Too few** → The model hasn't had enough exposure to learn patterns from your data * **Too many** → Overfitting risk where the model memorizes the training set instead of generalizing * **Just right** → Reward curve shows steady improvement and plateaus near the end of training **When to adjust**: * **Add 1-2 more epochs** if the reward is still climbing steadily at the end of training * **Keep at 1** for most tasks—the default works well * Watch your reward curves to detect when adding more epochs stops helping **What it does**: Controls the number of trainable parameters in your LoRA adapter. LoRA (Low-Rank Adaptation) adds small adapter layers to the base model rather than training all weights. Higher rank means more capacity to learn new behaviors. **Default**: `8`\ **Valid range**: `4` to `32` (must be powers of 2: 4, 8, 16, 32) **How it affects outcome**: * **Lower rank (4-8)** → Faster training, but may lack capacity for complex tasks * **Just right (8-16)** → Balances capacity and efficiency for most tasks * **Higher rank (32)** → More learning capacity, but requires significantly more GPUs and risks overfitting **When to adjust**: * **Increase** for complex reasoning tasks or when the model struggles to learn desired behaviors * Consider task complexity: simple style changes need lower rank, complex reasoning needs higher **What it does**: The amount of data (measured in tokens) processed in each training step before updating model weights. Unlike traditional batch sizes that count sequences (e.g., 32 or 64 sequences), Fireworks RFT uses **token-based batch sizing**. For example, with an 8k max sequence length, a 64k batch size allows up to 8 sequences per batch (64k tokens ÷ 8k tokens/sequence = 8 sequences). **Default**: `32k tokens` **How it affects outcome**: * **Smaller batches** → Noisier gradient updates that may help exploration, but slower training throughput * **Larger batches** → Smoother, more stable updates and faster training throughput **When to adjust**: * Most users should stick with the default. Modify if you want a smaller/larger amount of tokens per train step ## Rollout (Sampling) Parameters Parameters that control how the model generates responses during training rollouts. **What it does**: Controls the randomness of the model's token selection during generation. Higher temperature = more random/creative, lower = more deterministic/focused. **Default**: `0.7`\ **Valid range**: `0.1` to `2.0` (must be >0) **How it affects outcome**: * **0.0-0.1 (near-greedy)** → Deterministic outputs with no exploration. Leads to mode collapse and repetitive text. **Avoid in RFT.** * **0.5-1.0 (sweet spot)** → Good balance of exploration and coherence. Ideal for most RLHF applications. * **>1.2 (high randomness)** → Very creative but potentially incoherent outputs **When to adjust**: * **Lower (0.3-0.5)** for tasks requiring precision, factual accuracy, or safety (less toxic outputs) * **Raise (1.0-1.2)** for creative tasks like story generation or when you need more diverse rollout exploration * **Never use 0.0**—greedy sampling breaks RFT by eliminating exploration **What it does**: Dynamically limits token sampling to the smallest set of tokens whose cumulative probability exceeds threshold p. Only considers the most probable tokens that together make up the top p% of probability mass. 
**Default**: `1.0` (considers all tokens)\ **Valid range**: `0` to `1` **How it affects outcome**: * Lower values (0.2-0.5) filter out long-tail, low-probability tokens that often cause hallucinations * Higher values (0.9-1.0) allow more diversity in outputs * Prevents the model from selecting very unlikely tokens that may be nonsensical **When to adjust**: * **Lower to 0.2-0.5** when your reward function penalizes hallucinations or factual errors * **Keep at 0.9-1.0** for creative tasks that benefit from diverse vocabulary * Works well in combination with temperature for fine-grained control **What it does**: Limits sampling to only the K most probable tokens at each step. A fixed-size cutoff (unlike top-p which is dynamic). **Default**: `40`\ **Valid range**: `0` to `100` (0 = disabled) **How it affects outcome**: * Similar to top-p but uses a fixed number of candidates instead of a probability threshold * Lower k = more focused, less diverse outputs * Higher k = more exploration and creativity **When to adjust**: * **Combine with temperature** (e.g., temp 0.8 + top-k 40) for balanced creative exploration * **Keep ≤50** to maintain reasonable inference latency * Consider using top-p instead for most use cases—it adapts better to varying probability distributions **What it does**: How many different responses the model generates for each prompt during training. The policy optimization algorithm compares these candidates to compute the KL divergence term and learn which responses are better. **Default**: `4`\ **Valid range**: `2` to `8` (minimum 2 required) **How it affects outcome**: * **n=1** → **Not allowed.** Policy optimization requires multiple candidates to learn from comparisons * **n=2-4** → Minimal viable exploration. Faster and cheaper but less signal for learning * **n=4-8** → Good balance of learning signal and cost for most tasks * **n>8** → Diminishing returns. Significantly slower and more expensive with marginal quality gains **When to adjust**: * **Increase to 6-8** when you need higher quality and cost isn't a concern * **Keep at 4** for most experiments—it's the sweet spot * **Never set to 1**—this will cause training to fail * Consider the tradeoff: more rollouts = better signal but linearly higher cost **What it does**: The maximum number of tokens the model can generate in a single response during rollouts. **Default**: `2048`\ **Valid range**: `16` to `16384` **How it affects outcome**: * Directly affects task completion: too short and the model can't finish complex tasks * Longer responses improve reward on summarization, story generation, and reasoning tasks * Linearly increases training cost—every token generated costs compute **When to adjust**: * **Increase** when your tasks require longer reasoning chains, detailed summaries, or complex multi-step solutions * **Decrease** to reduce costs for tasks with naturally short outputs (classification, short-form Q\&A) * Monitor your reward curves: if the model is cutting off mid-response, increase max tokens ## Parameter Interactions Parameters don't work in isolation—they interact in important ways. These three work together to control sampling behavior. Using all three gives you fine-grained control: * **Temperature** sets the overall randomness * **Top-p** dynamically filters by probability mass * **Top-k** sets a hard limit on candidate tokens Example: `temperature=0.8, top_p=0.9, top_k=40` gives creative but controlled outputs. 
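Before committing a sampling combination to a training job, it can help to see how it behaves at inference time. The sketch below is illustrative rather than part of the RFT tooling: it calls the Fireworks OpenAI-compatible endpoint shown elsewhere in these docs, the model ID is only a placeholder, and `top_k` is passed via `extra_body` because it's a Fireworks extension to the OpenAI schema:

```python theme={null}
import os
from openai import OpenAI

# Illustrative sketch: preview a sampling combination before using it for RFT rollouts.
# Assumes FIREWORKS_API_KEY is exported; the model ID below is only an example.
client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[{"role": "user", "content": "Briefly explain why the sky is blue."}],
    temperature=0.8,           # overall randomness
    top_p=0.9,                 # nucleus filtering by probability mass
    extra_body={"top_k": 40},  # hard cap on candidate tokens (Fireworks extension)
    n=4,                       # several candidates, loosely mimicking rollouts per prompt
    max_tokens=256,
)

for i, choice in enumerate(response.choices):
    print(f"--- candidate {i + 1} ---")
    print(choice.message.content)
```

If the candidates look nearly identical, consider raising `temperature` or `top_p` before training; if they're incoherent, lower them.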
Larger batch sizes provide more stable gradients, which may allow for slightly higher learning rates. However, the default learning rate is tuned for the default batch size—only adjust if you have evidence from your training curves. Larger base models (70B+) may need higher LoRA ranks to capture complex behaviors, but they also require more resources. For smaller models (\<13B), rank 8-16 is usually sufficient. ## Tuning Strategies Best practices for adjusting parameters to achieve your training goals. The default parameters are carefully tuned to work well for most RFT tasks. Don't change them unless you have a clear hypothesis based on your training metrics. Run at least one baseline experiment with defaults before making any adjustments. This gives you: * A performance benchmark to compare against * Understanding of whether parameter tuning is actually needed * Evidence about which metrics need improvement Many successful RFT jobs use all default parameters. When you do adjust parameters, change only one at a time and measure the impact on your reward curves and evaluation metrics. **Good workflow:** 1. Run baseline with defaults 2. Identify specific issue (e.g., reward crashes, slow convergence) 3. Change ONE parameter that should address that issue 4. Compare results 5. Repeat **Avoid:** Changing multiple parameters simultaneously—you won't know which change caused the improvement or regression. Use Weights & Biases integration to: * Compare training curves across experiments * Track reward progression over time * Log all hyperparameters automatically This makes it easy to identify which parameter changes actually helped and which hurt performance. Quick reference for goal-directed parameter tuning: * **Faster convergence** → ↑ epochs (add 1-2), tune learning rate (stay \<2× default) * **Better quality** → ↑ temperature (1.0-1.2), ↑ rollouts (6-8), ↑ max tokens * **Safer/less toxic** → ↓ temperature (0.3-0.5), ↓ top-p (0.5), ↓ top-k * **More creative** → ↑ temperature (1.0-1.2), top-p = 0.9 * **Lower cost** → ↓ rollouts, ↓ max tokens, ↓ batch size * **Higher capacity** → ↑ LoRA rank (16-32), but monitor memory usage * **Prevent overfitting** → Keep epochs = 1, consider lower LoRA rank ## Next Steps Quick lookup table for all parameters with defaults and valid ranges Launch your RFT job Hands-on tutorial showing parameter tuning in practice Learn about the RFT training process and workflow --- # Source: https://docs.fireworks.ai/api-reference/post-chatcompletions.md # Create Chat Completion > Creates a model response for the given chat conversation. ## OpenAPI ````yaml post /chat/completions paths: path: /chat/completions method: post servers: - url: https://api.fireworks.ai/inference/v1/ request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer cookie: {} parameters: path: {} query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: model: allOf: - description: The name of the model to use. type: string example: accounts/fireworks/models/llama-v3p1-8b-instruct messages: allOf: - description: A list of messages comprising the conversation so far. type: array minItems: 1 items: $ref: '#/components/schemas/ChatCompletionRequestMessage' tool_choice: allOf: - description: > Controls which (if any) tool is called by the model. - `none`: the model will not call any tool and instead generates a message. - `auto`: the model can pick between generating a message or calling one or more tools. 
- `required` (alias: `any`): the model must call one or more tools. To force a specific function, pass an object of the form `{ "type": "function", "name": "my_function" }` or `{ "type": "function", "function": { "name": "my_function" }` for OpenAI compatibility. oneOf: - $ref: '#/components/schemas/ToolChoiceOptions' - $ref: '#/components/schemas/ToolChoiceFunction' tools: allOf: - type: array description: > A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. See the guide for more information and the list of supported models: https://docs.fireworks.ai/guides/function-calling#supported-models items: $ref: '#/components/schemas/ChatCompletionTool' max_tokens: allOf: - description: > The maximum number of tokens to generate in the completion. If the token count of your prompt (previous messages) plus `max_tokens` exceed the model's context length, the behavior is depends on `context_length_exceeded_behavior`. By default, `max_tokens` will be lowered to fit in the context window instead of returning an error. default: 2000 type: integer prompt_truncate_len: allOf: - description: > The size (in tokens) to which to truncate chat prompts. This includes the system prompt (if any), previous user/assistant messages, and the current user message. Earlier user/assistant messages will be evicted first to fit the prompt into this length. The system prompt is preserved whenever possible and only truncated as a last resort. This should usually be set to a number much smaller << than the model's maximum context size, to allow enough remaining tokens for generating a response. If omitted, you may receive "prompt too long" errors in your responses as conversations grow. Note that even with this set, you may still receive "prompt too long" errors if individual messages (such as a very long system prompt or user message) exceed the model's context window on their own. default: 1500 nullable: true type: integer temperature: allOf: - type: number minimum: 0 maximum: 2 default: 1 example: 1 nullable: true description: > What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. top_p: allOf: - type: number minimum: 0 maximum: 1 default: 1 example: 1 nullable: true description: > An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature` but not both. top_k: allOf: - type: integer minimum: 0 maximum: 100 example: 50 nullable: true description: > Top-k sampling is another sampling method where the k most probable next tokens are filtered and the probability mass is redistributed among only those k next tokens. The value of k controls the number of candidates for the next token at each step during text generation. Must be between 0 and 100. frequency_penalty: allOf: - type: number default: 0 minimum: -2 maximum: 2 nullable: true description: > Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. 
Reasonable value is around 0.1 to 1 if the aim is to just reduce repetitive samples somewhat. If the aim is to strongly suppress repetition, then one can increase the coefficients up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition. See also `presence_penalty` for penalizing tokens that have at least one appearance at a fixed rate. OpenAI compatible (follows OpenAI's conventions for handling token frequency and repetition penalties). perf_metrics_in_response: allOf: - type: boolean default: false nullable: true description: > Whether to include performance metrics in the response body. **Non-streaming requests:** Performance metrics are always included in response headers (e.g., `fireworks-prompt-tokens`, `fireworks-server-time-to-first-token`). Setting this to `true` additionally includes the same metrics in the response body under the `perf_metrics` field. **Streaming requests:** Performance metrics are only included in the response body under the `perf_metrics` field in the final chunk (when `finish_reason` is set). This is because headers may not be accessible during streaming. The response body `perf_metrics` field contains the following metrics: **Basic Metrics (all deployments):** - `prompt-tokens`: Number of tokens in the prompt - `server-time-to-first-token`: Time from request start to first token (in seconds) - `server-processing-time`: Total processing time (in seconds, only for completed requests) **Predicted Outputs Metrics:** - `speculation-prompt-tokens`: Number of speculative prompt tokens - `speculation-prompt-matched-tokens`: Number of matched speculative prompt tokens (for completed requests) **Dedicated Deployment Only Metrics:** - `speculation-generated-tokens`: Number of speculative generated tokens (for completed requests) - `speculation-acceptance`: Speculation acceptance rates by position - `cached-prompt-tokens`: Number of cached prompt tokens - `backend-host`: Hostname of the backend server - `num-concurrent-requests`: Number of concurrent requests - `deployment`: Deployment name - `tokenizer-queue-duration`: Time spent in tokenizer queue - `tokenizer-duration`: Time spent in tokenizer - `prefill-queue-duration`: Time spent in prefill queue - `prefill-duration`: Time spent in prefill - `generation-queue-duration`: Time spent in generation queue presence_penalty: allOf: - type: number default: 0 minimum: -2 maximum: 2 nullable: true description: > Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Reasonable value is around 0.1 to 1 if the aim is to just reduce repetitive samples somewhat. If the aim is to strongly suppress repetition, then one can increase the coefficients up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition. See also `frequence_penalty` for penalizing tokens at an increasing rate depending on how often they appear. OpenAI compatible (follows OpenAI's conventions for handling token frequency and repetition penalties). repetition_penalty: allOf: - type: number default: 1 minimum: 0 maximum: 2 nullable: true description: > Applies a penalty to repeated tokens to discourage or encourage repetition. A value of `1.0` means no penalty, allowing free repetition. Values above `1.0` penalize repetition, reducing the likelihood of repeating tokens. 
Values between `0.0` and `1.0` reward repetition, increasing the chance of repeated tokens. For a good balance, a value of `1.2` is often recommended. Note that the penalty is applied to both the generated output and the prompt in decoder-only models. reasoning_effort: allOf: - oneOf: - type: string enum: - none - low - medium - high - type: integer nullable: true description: > Applicable to reasoning models only, this option controls the reasoning token length, and can be set to either 'none', 'low', 'medium', 'high' or an integer. 'low', 'medium' and 'high' correspond to progressively higher thinking effort and thus longer reasoning tokens. 'none' means disable thinking. You can alterntively set the option to an integer controlling the hard-cutoff for reasoning token length (this is not entirely OpenAI compatible, you might have to use fireworks.ai client library to bypass the schema check). Note: For OpenAI GPT OSS models, only the string values ('low', 'medium', 'high') are supported. Integer values will not work with these models. mirostat_lr: allOf: - type: number default: 0.1 nullable: true description: > Specifies the learning rate for the Mirostat sampling algorithm, which controls how quickly the model adjusts its token distribution to maintain the target perplexity. A smaller value slows down the adjustments, leading to more stable but gradual shifts, while higher values speed up corrections at the cost of potential instability. mirostat_target: allOf: - type: number nullable: true description: > Defines the target perplexity for the Mirostat algorithm. Perplexity measures the unpredictability of the generated text, with higher values encouraging more diverse and creative outputs, while lower values prioritize predictability and coherence. The algorithm dynamically adjusts the token selection to maintain this target during text generation. If not specified, Mirostat sampling is disabled. 'n': allOf: - type: integer minimum: 1 maximum: 128 default: 1 example: 1 nullable: true description: > How many completions to generate for each prompt. **Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`. ignore_eos: allOf: - description: > This setting controls whether the model should ignore the End of Sequence (EOS) token. When set to `True`, the model will continue generating tokens even after the EOS token is produced. By default, it stops when the EOS token is reached. type: boolean nullable: true default: false stop: allOf: - description: > Up to 4 sequences where the API will stop generating further tokens. The returned text will NOT contain the stop sequence. default: null oneOf: - type: string nullable: true - type: array minItems: 1 maxItems: 4 items: type: string response_format: allOf: - type: object description: > Allows to force the model to produce specific output format. Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. If "type" is "json_schema", a JSON schema must be provided. E.g., `response_format = {"type": "json_schema", "json_schema": }`. **Important:** when using JSON mode, it's crucial to also instruct the model to produce JSON via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. 
Also note that the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. In this case the return value might not be a valid JSON. nullable: true default: null properties: type: type: string enum: - text - json_object - json_schema example: json_object default: text description: Must be one of `text`, `json_object` or `json_schema`. json_schema: type: object default: null nullable: true description: > JSON schema according to https://json-schema.org/specification that can be provided if `"type": "json_schema"`. Most common fields like `type`, `properties`, `items`, `required` and `anyOf` are supported. More sophisticated cases like `oneOf` might not be covered. Note: it's an OpenAI API extension. Example: `{"type": "object", "properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}, "required": ["foo"]}` required: - type stream: allOf: - description: > Whether to stream back partial progress. If set, tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. type: boolean nullable: true default: false context_length_exceeded_behavior: allOf: - type: string enum: - truncate - error description: > What to do if the token count of prompt plus `max_tokens` exceeds the model's context window. Passing `truncate` limits the `max_tokens` to at most `context_window_length - prompt_length`. This is the default. Passing `error` would trigger a request error. The default of 'truncate' is selected as it allows to ask for high `max_tokens` value while respecting the context window length without having to do client-side prompt tokenization. Note, that it differs from OpenAI's behavior that matches that of `error`. logprobs: allOf: - oneOf: - type: boolean - type: integer minimum: 0 maximum: 5 default: null nullable: true description: > Include log probabilities in the response. This accepts either a boolean or an integer: - If set to `true`, log probabilities are included and the number of alternatives can be controlled via `top_logprobs` (OpenAI-compatible behavior). - If set to an integer N (0-5), include log probabilities for up to N most likely tokens per position in the legacy format. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response when an integer is used. The maximum value for the integer form is 5. top_logprobs: allOf: - type: integer minimum: 0 maximum: 5 default: null nullable: true description: > An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. The minimum value is 0 and the maximum value is 5. When `logprobs` is set, `top_logprobs` can be used to modify how many top log probabilities are returned. If `top_logprobs` is not set, the API will return up to `logprobs` tokens per position. echo: allOf: - type: boolean default: false nullable: true description: Echo back the prompt in addition to the completion. min_p: allOf: - type: number minimum: 0 maximum: 1 default: 0 nullable: true description: > Minimum probability threshold for token selection. Only tokens with probability >= min_p are considered for selection. This is an alternative to top_p and top_k sampling. 
typical_p: allOf: - type: number minimum: 0 maximum: 1 default: 1 nullable: true description: > Typical-p sampling is an alternative to nucleus sampling. It considers the most typical tokens whose cumulative probability is at most typical_p. logit_bias: allOf: - type: object additionalProperties: type: number nullable: true description: > Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. user: allOf: - description: >- A unique identifier representing your end-user, which can help monitor and detect abuse type: string nullable: true required: true refIdentifier: '#/components/schemas/BaseCreateCompletionRequest' requiredProperties: - model - messages examples: example: value: model: accounts/fireworks/models/llama-v3p1-8b-instruct messages: - role: system content: name: tool_choice: none tools: - type: function function: description: name: parameters: type: object required: - properties: {} max_tokens: 2000 prompt_truncate_len: 1500 temperature: 1 top_p: 1 top_k: 50 frequency_penalty: 0 perf_metrics_in_response: false presence_penalty: 0 repetition_penalty: 1 reasoning_effort: none mirostat_lr: 0.1 mirostat_target: 123 'n': 1 ignore_eos: false stop: response_format: null stream: false context_length_exceeded_behavior: truncate logprobs: true top_logprobs: null echo: false min_p: 0 typical_p: 1 logit_bias: {} user: response: '200': application/json: schemaArray: - type: object properties: id: allOf: - type: string description: A unique identifier of the response. object: allOf: - type: string description: The object type, which is always "chat.completion". created: allOf: - type: integer description: The Unix time in seconds when the response was generated. model: allOf: - type: string description: The model used for the chat completion. choices: allOf: - type: array description: The list of chat completion choices. items: type: object required: - index - message - finish_reason properties: index: type: integer description: The index of the chat completion choice. message: $ref: '#/components/schemas/ChatCompletionResponseMessage' finish_reason: type: string description: > The reason the model stopped generating tokens. This will be "stop" if the model hit a natural stop point or a provided stop sequence, or "length" if the maximum number of tokens specified in the request was reached. enum: - stop - length usage: allOf: - $ref: '#/components/schemas/UsageInfo' refIdentifier: '#/components/schemas/CreateChatCompletionResponse' requiredProperties: - id - object - created - model - choices examples: example: value: id: object: created: 123 model: choices: - index: 123 message: role: system content: reasoning_content: tool_calls: - id: type: function function: name: arguments: finish_reason: stop usage: prompt_tokens: 123 completion_tokens: 123 total_tokens: 123 description: OK deprecated: false type: path components: schemas: ChatCompletionRequestMessage: type: object properties: role: type: string enum: - system - user - assistant description: >- The role of the messages author. One of `system`, `user`, or `assistant`. content: oneOf: - type: string nullable: true description: > The contents of the message. `content` is required for all messages, and may be null for assistant messages with function calls. 
- type: array description: A list of chat messages that could contain images or texts items: $ref: '#/components/schemas/ChatMessageContent' name: type: string description: >- The name of the author of this message. May contain a-z, A-Z, 0-9, and underscores, with a maximum length of 64 characters. required: - role - content ChatMessageContent: description: | The content of the message. Can either be text or image_url. oneOf: - type: object description: A message containing text properties: type: type: string enum: - text text: type: string description: The content of the message. - type: object description: A message containing image properties: type: type: string enum: - image_url image_url: type: object properties: url: type: string description: > base64 encoded string for image formatted as MIME_TYPE,\ eg. data:image/jpeg;base64,\ ChatCompletionResponseMessage: type: object properties: role: type: string enum: - system - user - assistant description: The role of the author of this message. content: type: string description: The contents of the message. nullable: true reasoning_content: type: string description: >- The reasoning or thinking process generated by the model. This field is only available for certain reasoning models (GLM 4.5, GLM 4.5 Air, GPT OSS 120B, GPT OSS 20B) and contains the model's internal reasoning that would otherwise appear in tags within the content field. nullable: true tool_calls: $ref: '#/components/schemas/ChatCompletionMessageToolCalls' required: - role ChatCompletionMessageToolCalls: type: array description: The tool calls generated by the model, such as function calls. items: $ref: '#/components/schemas/ChatCompletionMessageToolCall' ChatCompletionMessageToolCall: type: object properties: id: type: string description: The ID of the tool call. type: type: string enum: - function description: The type of the tool. Currently, only `function` is supported. function: type: object description: The function that the model called. properties: name: type: string description: The name of the function to call. arguments: type: string description: >- The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function. required: - name - arguments required: - id - type - function ChatCompletionTool: type: object properties: type: type: string enum: - function description: The type of the tool. Currently, only `function` is supported. function: $ref: '#/components/schemas/FunctionObject' required: - type - function FunctionObject: type: object properties: description: type: string description: >- A description of what the function does, used by the model to choose when and how to call the function. name: type: string description: >- The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. 
parameters: $ref: '#/components/schemas/FunctionParameters' required: - name - parameters FunctionParameters: type: object properties: type: type: string enum: - object description: type of parameter required: type: array description: which one of the parameter is required items: type: string properties: type: object additionalProperties: type: object properties: type: type: string description: The type of the property description: type: string description: A description of the property description: >- A map of property names to their types and descriptions. Each property is an object with 'type' and 'description' fields. description: >- The parameters the functions accepts, described as a JSON Schema object. To describe a function that accepts no parameters, provide the value `{"type": "object", "properties": {}}`. UsageInfo: type: object description: > Usage statistics. For streaming responses, `usage` field is included in the very last response chunk returned. Note that returning `usage` for streaming requests is an OpenAI API extension. If you use OpenAI SDK, you might access the field directly even if it's not present in the type signature in the SDK. properties: prompt_tokens: type: integer description: The number of tokens in the prompt. completion_tokens: type: integer description: The number of tokens in the generated completion. total_tokens: type: integer description: >- The total number of tokens used in the request (prompt + completion). required: - prompt_tokens - completion_tokens - total_tokens ToolChoiceOptions: type: string title: Tool choice mode description: > Controls which (if any) tool is called by the model. `none` means the model will not call any tool and instead generates a message. `auto` means the model can pick between generating a message or calling one or more tools. `required` means the model must call one or more tools. This is equivalent to `any`. enum: - none - auto - required - any ToolChoiceFunction: type: object title: Function tool description: | Use this option to force the model to call a specific function. properties: type: type: string enum: - function description: For function calling, the type is always `function`. x-stainless-const: true name: type: string description: The name of the function to call. function: type: object description: >- OpenAI-compatible nested function object. Either `name` or `function.name` must be provided. properties: name: type: string description: The name of the function to call. required: - type ```` --- # Source: https://docs.fireworks.ai/api-reference/post-completions.md # Create Completion > Creates a completion for the provided prompt and parameters. ## OpenAPI ````yaml post /completions paths: path: /completions method: post servers: - url: https://api.fireworks.ai/inference/v1/ request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer cookie: {} parameters: path: {} query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: model: allOf: - description: The name of the model to use. type: string example: accounts/fireworks/models/llama-v3p1-8b-instruct prompt: allOf: - description: > The prompt to generate completions for. It can be a single string or an array of strings. It can also be an array of integers or an array of integer arrays, which allows to pass already tokenized prompt. If multiple prompts are specified, several choices with corresponding `index` will be returned in the output." 
oneOf: - type: string example: The sky is - type: array minItems: 1 items: type: string example: The sky is - type: array minItems: 1 items: type: integer example: '[123, 10, 456]' - type: array minItems: 1 items: type: array minItems: 1 items: type: integer example: '[[123, 10, 456], [100, 543]]' images: allOf: - description: > The list of base64 encoded images for visual language completition generation. They should be formatted as MIME_TYPE,\ eg. data:image/jpeg;base64,\ Additionally, the number of images provided should match the number of '\' special token in the prompt type: array items: type: string max_tokens: allOf: - type: integer minimum: 0 default: 16 example: 16 nullable: true description: > The maximum number of tokens to generate in the completion. If the token count of your prompt plus `max_tokens` exceed the model's context length, the behavior is depends on `context_length_exceeded_behavior`. By default, `max_tokens` will be lowered to fit in the context window instead of returning an error. echo_last: allOf: - type: integer minimum: 0 nullable: true description: > Echo back the last N tokens of the prompt in addition to the completion. This is useful for obtaining logprobs of the prompt suffix but without transferring too much data. Passing `echo_last=len(prompt)` is the same as `echo=True` temperature: allOf: - type: number minimum: 0 maximum: 2 default: 1 example: 1 nullable: true description: > What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both. top_p: allOf: - type: number minimum: 0 maximum: 1 default: 1 example: 1 nullable: true description: > An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature` but not both. top_k: allOf: - type: integer minimum: 0 maximum: 100 example: 50 nullable: true description: > Top-k sampling is another sampling method where the k most probable next tokens are filtered and the probability mass is redistributed among only those k next tokens. The value of k controls the number of candidates for the next token at each step during text generation. Must be between 0 and 100. frequency_penalty: allOf: - type: number default: 0 minimum: -2 maximum: 2 nullable: true description: > Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Reasonable value is around 0.1 to 1 if the aim is to just reduce repetitive samples somewhat. If the aim is to strongly suppress repetition, then one can increase the coefficients up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition. See also `presence_penalty` for penalizing tokens that have at least one appearance at a fixed rate. OpenAI compatible (follows OpenAI's conventions for handling token frequency and repetition penalties). perf_metrics_in_response: allOf: - type: boolean default: false nullable: true description: > Whether to include performance metrics in the response body. 
**Non-streaming requests:** Performance metrics are always included in response headers (e.g., `fireworks-prompt-tokens`, `fireworks-server-time-to-first-token`). Setting this to `true` additionally includes the same metrics in the response body under the `perf_metrics` field. **Streaming requests:** Performance metrics are only included in the response body under the `perf_metrics` field in the final chunk (when `finish_reason` is set). This is because headers may not be accessible during streaming. The response body `perf_metrics` field contains the following metrics: **Basic Metrics (all deployments):** - `prompt-tokens`: Number of tokens in the prompt - `server-time-to-first-token`: Time from request start to first token (in seconds) - `server-processing-time`: Total processing time (in seconds, only for completed requests) **Predicted Outputs Metrics:** - `speculation-prompt-tokens`: Number of speculative prompt tokens - `speculation-prompt-matched-tokens`: Number of matched speculative prompt tokens (for completed requests) **Dedicated Deployment Only Metrics:** - `speculation-generated-tokens`: Number of speculative generated tokens (for completed requests) - `speculation-acceptance`: Speculation acceptance rates by position - `cached-prompt-tokens`: Number of cached prompt tokens - `backend-host`: Hostname of the backend server - `num-concurrent-requests`: Number of concurrent requests - `deployment`: Deployment name - `tokenizer-queue-duration`: Time spent in tokenizer queue - `tokenizer-duration`: Time spent in tokenizer - `prefill-queue-duration`: Time spent in prefill queue - `prefill-duration`: Time spent in prefill - `generation-queue-duration`: Time spent in generation queue presence_penalty: allOf: - type: number default: 0 minimum: -2 maximum: 2 nullable: true description: > Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Reasonable value is around 0.1 to 1 if the aim is to just reduce repetitive samples somewhat. If the aim is to strongly suppress repetition, then one can increase the coefficients up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition. See also `frequence_penalty` for penalizing tokens at an increasing rate depending on how often they appear. OpenAI compatible (follows OpenAI's conventions for handling token frequency and repetition penalties). repetition_penalty: allOf: - type: number default: 1 minimum: 0 maximum: 2 nullable: true description: > Applies a penalty to repeated tokens to discourage or encourage repetition. A value of `1.0` means no penalty, allowing free repetition. Values above `1.0` penalize repetition, reducing the likelihood of repeating tokens. Values between `0.0` and `1.0` reward repetition, increasing the chance of repeated tokens. For a good balance, a value of `1.2` is often recommended. Note that the penalty is applied to both the generated output and the prompt in decoder-only models. reasoning_effort: allOf: - oneOf: - type: string enum: - none - low - medium - high - type: integer nullable: true description: > Applicable to reasoning models only, this option controls the reasoning token length, and can be set to either 'none', 'low', 'medium', 'high' or an integer. 'low', 'medium' and 'high' correspond to progressively higher thinking effort and thus longer reasoning tokens. 'none' means disable thinking. 
You can alterntively set the option to an integer controlling the hard-cutoff for reasoning token length (this is not entirely OpenAI compatible, you might have to use fireworks.ai client library to bypass the schema check). Note: For OpenAI GPT OSS models, only the string values ('low', 'medium', 'high') are supported. Integer values will not work with these models. mirostat_lr: allOf: - type: number default: 0.1 nullable: true description: > Specifies the learning rate for the Mirostat sampling algorithm, which controls how quickly the model adjusts its token distribution to maintain the target perplexity. A smaller value slows down the adjustments, leading to more stable but gradual shifts, while higher values speed up corrections at the cost of potential instability. mirostat_target: allOf: - type: number nullable: true description: > Defines the target perplexity for the Mirostat algorithm. Perplexity measures the unpredictability of the generated text, with higher values encouraging more diverse and creative outputs, while lower values prioritize predictability and coherence. The algorithm dynamically adjusts the token selection to maintain this target during text generation. If not specified, Mirostat sampling is disabled. 'n': allOf: - type: integer minimum: 1 maximum: 128 default: 1 example: 1 nullable: true description: > How many completions to generate for each prompt. **Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`. ignore_eos: allOf: - description: > This setting controls whether the model should ignore the End of Sequence (EOS) token. When set to `True`, the model will continue generating tokens even after the EOS token is produced. By default, it stops when the EOS token is reached. type: boolean nullable: true default: false stop: allOf: - description: > Up to 4 sequences where the API will stop generating further tokens. The returned text will NOT contain the stop sequence. default: null oneOf: - type: string nullable: true - type: array minItems: 1 maxItems: 4 items: type: string response_format: allOf: - type: object description: > Allows to force the model to produce specific output format. Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. If "type" is "json_schema", a JSON schema must be provided. E.g., `response_format = {"type": "json_schema", "json_schema": }`. **Important:** when using JSON mode, it's crucial to also instruct the model to produce JSON via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. In this case the return value might not be a valid JSON. nullable: true default: null properties: type: type: string enum: - text - json_object - json_schema example: json_object default: text description: Must be one of `text`, `json_object` or `json_schema`. json_schema: type: object default: null nullable: true description: > JSON schema according to https://json-schema.org/specification that can be provided if `"type": "json_schema"`. 
Most common fields like `type`, `properties`, `items`, `required` and `anyOf` are supported. More sophisticated cases like `oneOf` might not be covered. Note: it's an OpenAI API extension. Example: `{"type": "object", "properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}, "required": ["foo"]}` required: - type stream: allOf: - description: > Whether to stream back partial progress. If set, tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. type: boolean nullable: true default: false context_length_exceeded_behavior: allOf: - type: string enum: - truncate - error description: > What to do if the token count of prompt plus `max_tokens` exceeds the model's context window. Passing `truncate` limits the `max_tokens` to at most `context_window_length - prompt_length`. This is the default. Passing `error` would trigger a request error. The default of 'truncate' is selected as it allows to ask for high `max_tokens` value while respecting the context window length without having to do client-side prompt tokenization. Note, that it differs from OpenAI's behavior that matches that of `error`. logprobs: allOf: - oneOf: - type: boolean - type: integer minimum: 0 maximum: 5 default: null nullable: true description: > Include log probabilities in the response. This accepts either a boolean or an integer: - If set to `true`, log probabilities are included and the number of alternatives can be controlled via `top_logprobs` (OpenAI-compatible behavior). - If set to an integer N (0-5), include log probabilities for up to N most likely tokens per position in the legacy format. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response when an integer is used. The maximum value for the integer form is 5. top_logprobs: allOf: - type: integer minimum: 0 maximum: 5 default: null nullable: true description: > An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. The minimum value is 0 and the maximum value is 5. When `logprobs` is set, `top_logprobs` can be used to modify how many top log probabilities are returned. If `top_logprobs` is not set, the API will return up to `logprobs` tokens per position. echo: allOf: - type: boolean default: false nullable: true description: Echo back the prompt in addition to the completion. min_p: allOf: - type: number minimum: 0 maximum: 1 default: 0 nullable: true description: > Minimum probability threshold for token selection. Only tokens with probability >= min_p are considered for selection. This is an alternative to top_p and top_k sampling. typical_p: allOf: - type: number minimum: 0 maximum: 1 default: 1 nullable: true description: > Typical-p sampling is an alternative to nucleus sampling. It considers the most typical tokens whose cumulative probability is at most typical_p. logit_bias: allOf: - type: object additionalProperties: type: number nullable: true description: > Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. 
user: allOf: - description: >- A unique identifier representing your end-user, which can help monitor and detect abuse type: string nullable: true required: true refIdentifier: '#/components/schemas/BaseCreateCompletionRequest' requiredProperties: - model - prompt examples: example: value: model: accounts/fireworks/models/llama-v3p1-8b-instruct prompt: The sky is images: - max_tokens: 16 echo_last: 1 temperature: 1 top_p: 1 top_k: 50 frequency_penalty: 0 perf_metrics_in_response: false presence_penalty: 0 repetition_penalty: 1 reasoning_effort: none mirostat_lr: 0.1 mirostat_target: 123 'n': 1 ignore_eos: false stop: response_format: null stream: false context_length_exceeded_behavior: truncate logprobs: true top_logprobs: null echo: false min_p: 0 typical_p: 1 logit_bias: {} user: response: '200': application/json: schemaArray: - type: object properties: id: allOf: - type: string description: A unique identifier of the response. object: allOf: - type: string description: The object type, which is always "text_completion". created: allOf: - type: integer description: The Unix time in seconds when the response was generated. model: allOf: - type: string description: The model used for the completion. choices: allOf: - type: array description: The list of generated completion choices. items: type: object required: - text - index - logprobs - finish_reason properties: text: type: string description: The completion response. index: type: integer description: The index of the completion choice. logprobs: type: object description: The log probabilities of the most likely tokens. nullable: true properties: tokens: type: array items: type: string token_logprobs: type: array items: type: number top_logprobs: type: array items: type: object additionalProperties: type: integer text_offset: type: array items: type: integer finish_reason: type: string description: > The reason the model stopped generating tokens. This will be "stop" if the model hit a natural stop point or a provided stop sequence, or "length" if the maximum number of tokens specified in the request was reached. enum: - stop - length usage: allOf: - $ref: '#/components/schemas/UsageInfo' refIdentifier: '#/components/schemas/CreateCompletionResponse' requiredProperties: - id - object - created - model - choices examples: example: value: id: object: created: 123 model: choices: - text: index: 123 logprobs: tokens: - token_logprobs: - 123 top_logprobs: - {} text_offset: - 123 finish_reason: stop usage: prompt_tokens: 123 completion_tokens: 123 total_tokens: 123 description: OK deprecated: false type: path components: schemas: UsageInfo: type: object description: > Usage statistics. For streaming responses, `usage` field is included in the very last response chunk returned. Note that returning `usage` for streaming requests is an OpenAI API extension. If you use OpenAI SDK, you might access the field directly even if it's not present in the type signature in the SDK. properties: prompt_tokens: type: integer description: The number of tokens in the prompt. completion_tokens: type: integer description: The number of tokens in the generated completion. total_tokens: type: integer description: >- The total number of tokens used in the request (prompt + completion). required: - prompt_tokens - completion_tokens - total_tokens ```` --- # Source: https://docs.fireworks.ai/api-reference/post-responses.md # Create Response > Creates a model response, optionally interacting with custom tools via the Model Context Protocol (MCP). 
This endpoint supports conversational continuation and streaming. Explore our cookbooks for detailed examples: - [Basic MCP Usage](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/fireworks_mcp_examples.ipynb) - [Streaming with MCP](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/fireworks_mcp_with_streaming.ipynb) - [Conversational History with `previous_response_id`](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/fireworks_previous_response_cookbook.ipynb) - [Basic Streaming](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/fireworks_streaming_example.ipynb) - [Controlling Response Storage](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/mcp_server_with_store_false_argument.ipynb) ## OpenAPI ````yaml post /v1/responses paths: path: /v1/responses method: post servers: - url: https://api.fireworks.ai/inference request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: {} query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: model: allOf: - type: string title: Model description: >- The model to use for generating the response. Example: `accounts//models/`. input: allOf: - anyOf: - type: string - items: additionalProperties: true type: object type: array title: Input description: >- The input to the model. Can be a simple text string or a list of message objects for complex inputs with multiple content types. previous_response_id: allOf: - anyOf: - type: string - type: 'null' title: Previous Response Id description: >- The ID of a previous response to continue the conversation from. When provided, the conversation history from that response will be automatically loaded. instructions: allOf: - anyOf: - type: string - type: 'null' title: Instructions description: >- System instructions that guide the model's behavior throughout the conversation. Similar to a system message. max_output_tokens: allOf: - anyOf: - type: integer - type: 'null' title: Max Output Tokens description: >- The maximum number of tokens that can be generated in the response. Must be at least 1. If not specified, the model will generate up to its maximum context length. max_tool_calls: allOf: - anyOf: - type: integer minimum: 1 - type: 'null' title: Max Tool Calls description: >- The maximum number of tool calls allowed in a single response. Useful for controlling costs and limiting tool execution. Must be at least 1. metadata: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Metadata description: >- Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information in a structured format. parallel_tool_calls: allOf: - anyOf: - type: boolean - type: 'null' title: Parallel Tool Calls description: >- Whether to enable parallel function calling during tool use. When true, the model can call multiple tools simultaneously. Default is True. default: true reasoning: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Reasoning description: >- Configuration for reasoning output. When enabled, the model will return its reasoning process along with the response. store: allOf: - anyOf: - type: boolean - type: 'null' title: Store description: >- Whether to store the response. 
When set to false, the response will not be stored and will not be retrievable via the API. This is useful for ephemeral or sensitive data. See an example in our [Controlling Response Storage cookbook](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/mcp_server_with_store_false_argument.ipynb). Default is True. default: true stream: allOf: - anyOf: - type: boolean - type: 'null' title: Stream description: >- Whether to stream the response back as Server-Sent Events (SSE). When true, tokens are sent incrementally as they are generated. Default is False. default: false temperature: allOf: - anyOf: - type: number maximum: 2 minimum: 0 - type: 'null' title: Temperature description: >- The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0. default: 1 text: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Text description: >- Text generation configuration parameters. Used for advanced text generation settings. tool_choice: allOf: - anyOf: - type: string - additionalProperties: true type: object - type: 'null' title: Tool Choice description: >- Controls which (if any) tool the model should use. Can be 'none' (never call tools), 'auto' (model decides), 'required' (must call at least one tool), or an object specifying a particular tool to call. Default is 'auto'. default: auto tools: allOf: - anyOf: - items: additionalProperties: true type: object type: array - type: 'null' title: Tools description: >- A list of MCP tools the model may call. See our cookbooks for examples on [basic MCP usage](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/fireworks_mcp_examples.ipynb) and [streaming with MCP](https://github.com/fw-ai/cookbook/blob/main/learn/response-api/fireworks_mcp_with_streaming.ipynb). top_p: allOf: - anyOf: - type: number maximum: 1 minimum: 0 - type: 'null' title: Top P description: >- An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. We generally recommend altering this or temperature but not both. default: 1 truncation: allOf: - anyOf: - type: string - type: 'null' title: Truncation description: >- The truncation strategy to use for the context when it exceeds the model's maximum length. Can be 'auto' (automatically truncate) or 'disabled' (return error if context too long). Default is 'disabled'. default: disabled user: allOf: - anyOf: - type: string - type: 'null' title: User description: >- A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. This can be a username, email, or any other unique identifier. required: true title: CreateResponse description: >- Request model for creating a new response. This model defines all the parameters needed to create a new model response, including model configuration, input data, tool definitions, and conversation continuation. 
refIdentifier: '#/components/schemas/CreateResponse' requiredProperties: - model - input examples: example: value: model: input: previous_response_id: instructions: max_output_tokens: 123 max_tool_calls: 2 metadata: {} parallel_tool_calls: true reasoning: {} store: true stream: true temperature: 1 text: {} tool_choice: tools: - {} top_p: 0.5 truncation: user: response: '200': application/json: schemaArray: - type: object properties: id: allOf: - anyOf: - type: string - type: 'null' title: Id description: >- The unique identifier of the response. Will be None if store=False. object: allOf: - type: string title: Object description: The object type, which is always 'response'. default: response created_at: allOf: - type: integer title: Created At description: >- The Unix timestamp (in seconds) when the response was created. status: allOf: - type: string title: Status description: >- The status of the response. Can be 'completed', 'in_progress', 'incomplete', 'failed', or 'cancelled'. model: allOf: - type: string title: Model description: >- The model used to generate the response (e.g., `accounts//models/`). output: allOf: - items: anyOf: - $ref: '#/components/schemas/Message' - $ref: '#/components/schemas/ToolCall' - $ref: '#/components/schemas/ToolOutput' type: array title: Output description: >- An array of output items produced by the model. Can contain messages, tool calls, and tool outputs. previous_response_id: allOf: - anyOf: - type: string - type: 'null' title: Previous Response Id description: >- The ID of the previous response in the conversation, if this response continues a conversation. usage: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Usage description: >- Token usage information for the request. Contains 'prompt_tokens', 'completion_tokens', and 'total_tokens'. error: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Error description: >- Error information if the response failed. Contains 'type', 'code', and 'message' fields. incomplete_details: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Incomplete Details description: >- Details about why the response is incomplete, if status is 'incomplete'. Contains 'reason' field which can be 'max_output_tokens', 'max_tool_calls', or 'content_filter'. instructions: allOf: - anyOf: - type: string - type: 'null' title: Instructions description: >- System instructions that guide the model's behavior. Similar to a system message. max_output_tokens: allOf: - anyOf: - type: integer - type: 'null' title: Max Output Tokens description: >- The maximum number of tokens that can be generated in the response. Must be at least 1. max_tool_calls: allOf: - anyOf: - type: integer minimum: 1 - type: 'null' title: Max Tool Calls description: >- The maximum number of tool calls allowed in a single response. Must be at least 1. parallel_tool_calls: allOf: - type: boolean title: Parallel Tool Calls description: >- Whether to enable parallel function calling during tool use. Default is True. default: true reasoning: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Reasoning description: >- Reasoning output from the model, if reasoning is enabled. Contains 'content' and 'type' fields. store: allOf: - anyOf: - type: boolean - type: 'null' title: Store description: >- Whether to store this response for future retrieval. If False, the response will not be persisted and previous_response_id cannot reference it. Default is True. 
default: true temperature: allOf: - type: number maximum: 2 minimum: 0 title: Temperature description: >- The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0. default: 1 text: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Text description: Text generation configuration parameters, if applicable. tool_choice: allOf: - anyOf: - type: string - additionalProperties: true type: object title: Tool Choice description: >- Controls which (if any) tool the model should use. Can be 'none', 'auto', 'required', or an object specifying a particular tool. Default is 'auto'. default: auto tools: allOf: - items: additionalProperties: true type: object type: array title: Tools description: >- A list of tools the model may call. Each tool is defined with a type and function specification following the OpenAI tool format. Supports 'function', 'mcp', 'sse', and 'python' tool types. top_p: allOf: - type: number maximum: 1 minimum: 0 title: Top P description: >- An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. default: 1 truncation: allOf: - type: string title: Truncation description: >- The truncation strategy to use for the context. Can be 'auto' or 'disabled'. Default is 'disabled'. default: disabled user: allOf: - anyOf: - type: string - type: 'null' title: User description: >- A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. metadata: allOf: - anyOf: - additionalProperties: true type: object - type: 'null' title: Metadata description: >- Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information about the response in a structured format. title: Response description: >- Represents a response object returned from the API. A response includes the model output, token usage, configuration parameters, and metadata about the conversation state. refIdentifier: '#/components/schemas/Response' requiredProperties: - created_at - status - model - output examples: example: value: id: object: response created_at: 123 status: model: output: - id: type: message role: content: - type: text: status: previous_response_id: usage: {} error: {} incomplete_details: {} instructions: max_output_tokens: 123 max_tool_calls: 2 parallel_tool_calls: true reasoning: {} store: true temperature: 1 text: {} tool_choice: tools: - {} top_p: 1 truncation: disabled user: metadata: {} description: Successful Response '422': application/json: schemaArray: - type: object properties: detail: allOf: - items: $ref: '#/components/schemas/ValidationError' type: array title: Detail title: HTTPValidationError refIdentifier: '#/components/schemas/HTTPValidationError' examples: example: value: detail: - loc: - msg: type: description: Validation Error deprecated: false type: path components: schemas: Message: properties: id: type: string title: Id description: The unique identifier of the message. type: type: string title: Type description: The object type, always 'message'. default: message role: type: string title: Role description: >- The role of the message sender. Can be 'user', 'assistant', or 'system'. 
content: items: $ref: '#/components/schemas/MessageContent' type: array title: Content description: >- An array of content parts that make up the message. Each part has a type and associated data. status: type: string title: Status description: The status of the message. Can be 'in_progress' or 'completed'. type: object required: - id - role - content - status title: Message description: Represents a message in a conversation. MessageContent: properties: type: type: string title: Type description: >- The type of the content part. Can be 'input_text', 'output_text', 'image', etc. text: anyOf: - type: string - type: 'null' title: Text description: The text content, if applicable. type: object required: - type title: MessageContent description: Represents a piece of content within a message. ToolCall: properties: id: type: string title: Id description: The unique identifier of the tool call. type: type: string title: Type description: The type of tool call. Can be 'function', 'tool_call', or 'mcp'. function: anyOf: - additionalProperties: true type: object - type: 'null' title: Function description: >- The function definition for function tool calls. Contains 'name' and 'arguments' keys. mcp: anyOf: - additionalProperties: true type: object - type: 'null' title: Mcp description: >- The MCP (Model Context Protocol) tool call definition for MCP tool calls. type: object required: - id - type title: ToolCall description: Represents a tool call made by the model. ToolOutput: properties: type: type: string title: Type description: The object type, always 'tool_output'. default: tool_output tool_call_id: type: string title: Tool Call Id description: The ID of the tool call that this output corresponds to. output: type: string title: Output description: The output content from the tool execution. type: object required: - tool_call_id - output title: ToolOutput description: Represents the output/result of a tool call. ValidationError: properties: loc: items: anyOf: - type: string - type: integer type: array title: Location msg: type: string title: Message type: type: string title: Error Type type: object required: - loc - msg - type title: ValidationError ```` --- # Source: https://docs.fireworks.ai/guides/predicted-outputs.md # Using predicted outputs > Use Predicted Outputs to boost output generation speeds for editing / rewriting use cases This feature is in beta and we are working on improvements. We welcome your feedback on [Discord](https://discord.gg/fireworks-ai) In cases where large parts of the LLM output are known in advance, e.g. editing or rewriting a document or code snippet, you can improve output generation speeds with predicted outputs. Predicted outputs allows you to provide strong "guesses" of what output may look like. To use Predicted Outputs, set the `prediction` field in the Fireworks API with the predicted output. For example, you may want to edit a survey and add an option to contact users by text message: ``` { "questions": [ { "question": "Name", "type": "text" }, { "question": "Age", "type": "number" }, { "question": "Feedback", "type": "text_area" }, { "question": "How to Contact", "type": "multiple_choice", "options": ["Email", "Phone"], "optional": true } ] } ``` In this case, we expect most of the code will remain the same. We set the ‘prediction’ field to be the original survey code. The output generation speed increases using predicted outputs. 
```python Python (Fireworks) theme={null} from fireworks.client import Fireworks code = """{ "questions": [ { "question": "Name", "type": "text" }, { "question": "Age", "type": "number" }, { "question": "Feedback", "type": "text_area" }, { "question": "How to Contact", "type": "multiple_choice", "options": ["Email", "Phone"], "optional": true } ] } """ client = Fireworks(api_key="") response = client.chat.completions.create( model="accounts/fireworks/models/llama-v3p1-70b-instruct", messages=[{ "role": "user", "content": "Edit the How to Contact question to add an option called Text Message. Output the full edited code, with no markdown or explanations.", }, { "role": "user", "content": code } ], temperature=0, prediction={"type": "content", "content": code} ) print(response.choices[0].message.content) ``` ### Additional information on Predicted Outputs: * Using Predicted Outputs is free at this time * We recommend setting `temperature=0` for best results for most intended use cases of Predicted Outputs. In these cases, using Predicted Outputs does not impact the quality of outputs generated * If the prediction is substantially different from the generated output, output generation speed may decrease * The max length of the `prediction` field is set by `max_tokens` and is 2048 by default, and needs to be updated if you have a longer input and prediction. * If you are using an on-demand deployment, you can set `rewrite_speculation=True` and potentially get even faster output generation. We are working on rolling this out to Serverless soon. --- # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/prepare-model.md # Source: https://docs.fireworks.ai/api-reference/prepare-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/prepare-model.md # Source: https://docs.fireworks.ai/api-reference/prepare-model.md # Source: https://docs.fireworks.ai/tools-sdks/firectl/commands/prepare-model.md # Source: https://docs.fireworks.ai/api-reference/prepare-model.md # Prepare Model for different precisions ## OpenAPI ````yaml post /v1/accounts/{account_id}/models/{model_id}:prepare paths: path: /v1/accounts/{account_id}/models/{model_id}:prepare method: post servers: - url: https://api.fireworks.ai request: security: - title: BearerAuth parameters: query: {} header: Authorization: type: http scheme: bearer description: >- Bearer authentication using your Fireworks API key. Format: Bearer cookie: {} parameters: path: account_id: schema: - type: string required: true description: The Account Id model_id: schema: - type: string required: true description: The Model Id query: {} header: {} cookie: {} body: application/json: schemaArray: - type: object properties: precision: allOf: - $ref: '#/components/schemas/DeploymentPrecision' title: the precision with which the model will be prepared readMask: allOf: - type: string title: >- The fields to be returned in the response. If empty or "*", all fields will be returned. This is added as is used in getResource() required: true refIdentifier: '#/components/schemas/GatewayPrepareModelBody' examples: example: value: precision: PRECISION_UNSPECIFIED readMask: response: '200': application/json: schemaArray: - type: object properties: {} examples: example: value: {} description: A successful response. 
deprecated: false type: path components: schemas: DeploymentPrecision: type: string enum: - PRECISION_UNSPECIFIED - FP16 - FP8 - FP8_MM - FP8_AR - FP8_MM_KV_ATTN - FP8_KV - FP8_MM_V2 - FP8_V2 - FP8_MM_KV_ATTN_V2 - NF4 - FP4 - BF16 - FP4_BLOCKSCALED_MM - FP4_MX_MOE default: PRECISION_UNSPECIFIED title: >- - PRECISION_UNSPECIFIED: if left unspecified we will treat this as a legacy model created before self serve ```` --- # Source: https://docs.fireworks.ai/guides/prompt-caching.md # Prompt caching Prompt caching is a performance optimization feature that allows Fireworks to respond faster to requests with prompts that share common prefixes. In many situations, it can reduce time to first token (TTFT) by as much as 80%. Prompt caching is enabled by default for all Fireworks models and deployments. For dedicated deployments, prompt caching frees up resources, leading to higher throughput on the same hardware. Dedicated deployments on the Enterprise plan allow additional configuration options to further optimize cache performance. ## Using prompt caching ### Common use cases Requests to LLMs often share a large portion of their prompt. For example: * Long system prompts with detailed instructions * Descriptions of available tools for function calling * Growing previous conversation history for chat use cases * Shared per-user context, like a current file for a coding assistant Prompt caching avoids re-processing the cached prefix of the prompt and starts output generation much sooner. ### Structuring prompts for caching Prompt caching works only for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. For function calling models, tools are considered part of the prompt. ## Optimization techniques for maximum cache hits Due to the autoregressive nature of LLMs, even a single-token difference can invalidate the cache from that token onward. Here are key strategies to maximize your cache hit rates: ### Keep your prompt prefix stable The most critical rule for effective prompt caching is maintaining a stable prefix. Any change to the beginning of your prompt will invalidate the entire cache chain that follows. **Common mistake:** Including timestamps or other dynamic content at the beginning of your system prompt. ```python theme={null} # ❌ DON'T: This kills cache hit rates system_prompt = f""" Current time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} You are a helpful AI assistant... """ ``` Even a one-second difference in the timestamp will invalidate the entire cache, making it completely ineffective. ### Structure prompts for caching success **✅ DO:** Place static content first, dynamic content last ```python theme={null} from fireworks import LLM # ✅ Good: Static content first system_prompt = """ You are a helpful AI assistant with expertise in software development. 
Your guidelines: - Provide clear, concise explanations - Include practical examples when helpful - Ask clarifying questions when requirements are unclear Available tools: - web_search: Search the internet for current information - code_executor: Run code snippets safely - file_manager: Read and write files """ # Build the complete prompt user_message = "" # Add dynamic content at the end if user_context: user_message += f"User context: {user_context}\n\n" if current_time_needed: user_message += f"Current time: {datetime.now().isoformat()}\n\n" # User query goes last user_message += user_query # Use with Fireworks Build SDK llm = LLM(model="llama-v3p1-70b-instruct", deployment_type="auto") response = llm.chat.completions.create( messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": user_message} ] ) ``` ### Smart timestamp handling When you need to provide current time information, consider these strategies: **Option 1: Rounded timestamps** ```python theme={null} # ✅ Round to larger intervals to increase cache hits current_hour = datetime.now().replace(minute=0, second=0, microsecond=0) system_prompt = f""" You are a helpful assistant. Current hour: {current_hour.strftime('%Y-%m-%d %H:00')} ... """ ``` **Option 2: Conditional time injection** ```python theme={null} # ✅ Only add time when the query actually needs it def build_prompt(user_query, system_base): prompt = system_base # Only add timestamp for time-sensitive queries time_keywords = ['today', 'now', 'current', 'latest', 'recent'] if any(keyword in user_query.lower() for keyword in time_keywords): prompt += f"\nCurrent time: {datetime.now().isoformat()}" prompt += f"\nUser: {user_query}" return prompt ``` **Option 3: Move time to user message** ```python theme={null} from fireworks import LLM # ✅ Keep system prompt static, add time context to user message system_prompt = """ You are a helpful AI assistant... """ # This stays the same user_message = f""" Current time: {datetime.now().isoformat()} User query: {user_query} """ # Use with Fireworks Build SDK llm = LLM(model="llama-v3p1-70b-instruct", deployment_type="auto") response = llm.chat.completions.create( messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": user_message} ] ) ``` By following these optimization techniques, you can significantly improve your application's performance through effective prompt caching while maintaining the quality and functionality of your AI system. ### How it works Fireworks will automatically find the longest prefix of the request that is present in the cache and reuse it. The remaining portion of the prompt will be processed as usual. The entire prompt is stored in the cache for future reuse. Cached prompts usually stay in the cache for at least several minutes. Depending on the model, load level, and deployment configuration, it can be up to several hours. The oldest prompts are evicted from the cache first. Prompt caching doesn't alter the result generated by the model. The response you receive will be identical to what you would get if prompt caching was not used. Each generation is sampled from the model independently on each request and is not cached for future usage. ## Monitoring For dedicated deployments, information about prompt caching is returned in the response headers. The header `fireworks-prompt-tokens` contains the number of tokens in the prompt, out of which `fireworks-cached-prompt-tokens` are cached. 
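The sketch below is an illustrative example rather than an official snippet: it shows one way to read these cache headers with `requests` against the OpenAI-compatible chat completions endpoint. The model identifier is a placeholder for whatever model your dedicated deployment serves.

```python theme={null}
# Minimal sketch: inspect prompt-cache headers returned by a dedicated deployment.
# The model path below is a placeholder; substitute the model your deployment serves.
import os
import requests

response = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "accounts/<your-account>/models/<your-model>",  # placeholder
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

# `fireworks-cached-prompt-tokens` out of `fireworks-prompt-tokens` were served from cache
print("prompt tokens:", response.headers.get("fireworks-prompt-tokens"))
print("cached prompt tokens:", response.headers.get("fireworks-cached-prompt-tokens"))
```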
Aggregated metrics are also available in the [usage dashboard](https://fireworks.ai/account/usage?type=deployments). ## Data privacy Serverless deployments maintain separate caches for each Fireworks account to prevent data leakage and timing attacks. Dedicated deployments by default share a single cache across all requests. Because prompt caching doesn't change the outputs, privacy is preserved even if the deployment powers a multi-tenant application. It does open a minor risk of a timing attack: potentially, an adversary can learn that a particular prompt is cached by observing the response time. To ensure full isolation, you can pass the `x-prompt-cache-isolation-key` header or the `prompt_cache_isolation_key` field in the body of the request. It can contain an arbitrary string that acts as an additional cache key, i.e., no sharing will occur between requests with different IDs. ## Limiting or turning off caching Additionally, you can pass the `prompt_cache_max_len` field in the request body to limit the maximum prefix of the prompt (in tokens) that is considered for caching. It's rarely needed in real applications but can come in handy for benchmarking the performance of dedicated deployments by passing `"prompt_cache_max_len": 0`. ## Advanced: cache locality for Enterprise deployments Dedicated deployments on an Enterprise plan allow you to pass an additional hint in the request to improve cache hit rates. First, the deployment needs to be created or updated with an additional flag: ```bash theme={null} firectl create deployment ... --enable-session-affinity ``` Then the client can pass an opaque identifier representing a single user or session in the `user` field of the body or in the `x-session-affinity` header. Fireworks will try to route requests with the identifier to the same server, further reducing response times. It's best to choose an identifier that groups requests with long shared prompt prefixes. For example, it can be a chat session with the same user or an assistant working with the same shared context. ### Migration and traffic management When migrating between deployments that use prompt caching, it's crucial to implement proper traffic routing to maintain optimal cache hit rates. When gradually routing traffic to a new deployment, use consistent user/session-based sampling rather than random sampling. Here's the recommended implementation for traffic routing: ```python theme={null} import hashlib # Configure traffic fraction (e.g., 20% to new deployment) fireworks_traffic_fraction = 0.2 user_id = "session-id-123" # Generate deterministic hash from user_id hashed_user_id = int(hashlib.md5(user_id.encode()).hexdigest(), 16) # MD5 hash on user-id and convert to integer MAX_HASH = 2**128 - 1 # MD5 hash maximum value # Compute ratio for consistent routing ratio = hashed_user_id / MAX_HASH # Returns 0.0 to 1.0 if (ratio < fireworks_traffic_fraction): send_to_new_deployment(user=hashed_user_id) # Pass user ID for caching else: send_elsewhere() # Route to old deployment or serverless ``` Avoid random sampling for traffic routing as it can negatively impact cache hit rates: ```python theme={null} # Don't do this: if random() < fireworks_traffic_fraction: # ❌ Reduces cache effectiveness send_to_new_deployment(user=hashed_user_id) ``` Using consistent user-based routing ensures complete user sessions are maintained on the same deployment, optimizing prompt cache performance regardless of the traffic fraction. 
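As a rough illustration of the routing logic above, `send_to_new_deployment` might simply forward the session identifier in the `user` field of the request body (or the `x-session-affinity` header). The function below is a hypothetical sketch, and the model path is a placeholder for your dedicated deployment's model.

```python theme={null}
# Hypothetical sketch of send_to_new_deployment: pass the session identifier so
# Fireworks can route requests from the same session to the same server.
# The model path is a placeholder; substitute your deployment's model identifier.
import os
import requests

def send_to_new_deployment(user: str, prompt: str = "Hello!"):
    return requests.post(
        "https://api.fireworks.ai/inference/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "x-session-affinity": user,  # optional: header form of the same hint
        },
        json={
            "model": "accounts/<your-account>/models/<your-model>",  # placeholder
            "messages": [{"role": "user", "content": prompt}],
            "user": user,  # opaque user/session identifier for cache locality
        },
    ).json()
```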
--- # Source: https://docs.fireworks.ai/tools-sdks/python-sdk.md # Python SDK The official Python SDK for the Fireworks AI API is available on [GitHub](https://github.com/fw-ai-external/python-sdk) and [PyPI](https://pypi.org/project/fireworks-ai/). ## Fireworks vs. OpenAI SDK Fireworks is [OpenAI-compatible](/tools-sdks/openai-compatibility), so you can use the OpenAI SDK with Fireworks. The Fireworks SDK offers additional benefits: * **Better concurrency defaults** — Optimized connection pooling for high-throughput workloads * **Fireworks-exclusive features** — Access parameters and response fields not available in the OpenAI API * **Platform automation** — Manage datasets, evals, fine-tuning, and deployments programmatically ## Installation Requires Python 3.9+ and an API key. See [Getting your API key](/api-reference/introduction#getting-your-api-key) for instructions. The SDK is currently in alpha. Use the `--pre` flag when installing to get the latest version. ```bash pip theme={null} pip install --pre fireworks-ai ``` ```bash poetry theme={null} poetry add --pre fireworks-ai ``` ```bash uv theme={null} uv add --pre fireworks-ai ``` For detailed usage instructions, see the [README.md](https://github.com/fw-ai-external/python-sdk#readme). To quickly get started with serverless, see our [Serverless Quickstart](/getting-started/quickstart). --- > To find navigation and other pages in this documentation, fetch the llms.txt file at: https://docs.fireworks.ai/llms.txt --- # Source: https://docs.fireworks.ai/models/quantization.md # Quantization > Reduce model precision to improve performance and lower costs Quantization reduces the number of bits used to serve a model, improving performance and reducing cost by 30-50%. However, this can change model numerics which may introduce small changes to the output. Read our [blog post](https://fireworks.ai/blog/fireworks-quantization) for a detailed treatment of how quantization affects model quality. ## Checking available precisions Models may support different numerical precisions like FP16, FP8, BF16, or INT8, which affect memory usage and inference speed. **Check default precision:** ```bash theme={null} firectl get model accounts/fireworks/models/llama-v3p1-8b-instruct | grep "Default Precision" ``` **Check supported precisions:** ```bash theme={null} firectl get model accounts/fireworks/models/llama-v3p1-8b-instruct | grep -E "(Supported Precisions|Supported Precisions With Calibration)" ``` The `Precisions` field indicates what precisions the model has been prepared for. ## Quantizing a model A model can be quantized to 8-bit floating-point (FP8) precision. ```bash theme={null} firectl prepare-model ``` ```python theme={null} import os import requests ACCOUNT_ID = os.environ.get("FIREWORKS_ACCOUNT_ID") API_KEY = os.environ.get("FIREWORKS_API_KEY") MODEL_ID = "" # The ID of the model you want to prepare response = requests.post( f"https://api.fireworks.ai/v1/accounts/{ACCOUNT_ID}/models/{MODEL_ID}:prepare", headers={ "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json" }, json={ "precision": "FP8" } ) print(response.json()) ``` This is an additive process that enables creating deployments with additional precisions. The original FP16 checkpoint is still available for use. 
You can check on the status of preparation by running: ```bash theme={null} firectl get model ``` ```python theme={null} import os import requests ACCOUNT_ID = os.environ.get("FIREWORKS_ACCOUNT_ID") API_KEY = os.environ.get("FIREWORKS_API_KEY") MODEL_ID = "" # The ID of the model you want to get response = requests.get( f"https://api.fireworks.ai/v1/accounts/{ACCOUNT_ID}/models/{MODEL_ID}", headers={ "Authorization": f"Bearer {API_KEY}" } ) print(response.json()) ``` and checking if the state is still in `PREPARING`. A successfully prepared model will have the desired precision added to the `Precisions` list. ## Creating an FP8 deployment By default, creating a deployment uses the FP16 checkpoint. To use a quantized FP8 checkpoint, first ensure the model has been prepared for FP8 (see [Checking available precisions](#checking-available-precisions) above), then pass the `--precision` flag when creating your deployment: ```bash theme={null} firectl create deployment --accelerator-type NVIDIA_H100_80GB --precision FP8 ``` ```python theme={null} import os import requests ACCOUNT_ID = os.environ.get("FIREWORKS_ACCOUNT_ID") API_KEY = os.environ.get("FIREWORKS_API_KEY") # The ID of the model you want to deploy. # The model must be prepared for FP8 precision. MODEL_ID = "" DEPLOYMENT_NAME = "My FP8 Deployment" response = requests.post( f"https://api.fireworks.ai/v1/accounts/{ACCOUNT_ID}/deployments", headers={ "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json" }, json={ "displayName": DEPLOYMENT_NAME, "baseModel": MODEL_ID, "acceleratorType": "NVIDIA_H100_80GB", "precision": "FP8", } ) print(response.json()) ``` Quantized deployments can only be served using H100 GPUs. --- # Source: https://docs.fireworks.ai/guides/querying-asr-models.md # Speech to Text > Convert audio to text with streaming and pre-recorded transcription Fireworks AI provides three ASR (Automatic Speech Recognition) features: **Streaming Transcription**, **Pre-recorded Transcription**, and **Pre-recorded Translation**. This guide shows you how to get started with each feature. ## Streaming Transcription Convert audio to text in real-time using WebSocket connections. Perfect for voice agents and live applications. ### Quick Start **Available Models:** * [`fireworks-asr-large`](https://app.fireworks.ai/models/fireworks/fireworks-asr-large): Cost efficient model for real-time transcription over web-sockets * [`fireworks-asr-v2`](https://app.fireworks.ai/models/fireworks/fireworks-asr-v2): Next generation and ultra-low latency audio streaming for real-time transcription over web-sockets For a working example of streaming transcription see the following resources: 1. [Python notebook](https://colab.research.google.com/github/fw-ai/cookbook/blob/main/learn/audio/audio_streaming_speech_to_text/audio_streaming_speech_to_text.ipynb) 2. [Python cookbook](https://github.com/fw-ai/cookbook/blob/main/learn/audio/audio_streaming_speech_to_text/python) For more detailed information, see the [full streaming API documentation](/api-reference/audio-streaming-transcriptions) and the [source code](https://github.com/fw-ai/cookbook/tree/main/learn/audio/audio_streaming_speech_to_text) ## Pre-recorded Transcription Convert audio files to text. Supports files up to 1GB in formats like MP3, FLAC, and WAV. Transcribe multiple hours of audio in minutes. 
### Quick Start For a working example of pre-recorded transcription see the [Python notebook](https://colab.research.google.com/github/fw-ai/cookbook/blob/main/learn/audio/audio_prerecorded_speech_to_text/audio_prerecorded_speech_to_text.ipynb) **Available Models:** * [`whisper-v3`](https://app.fireworks.ai/models/fireworks/whisper-v3): Highest accuracy * model=`whisper-v3` * base\_url=`https://audio-prod.api.fireworks.ai` * [`whisper-v3-turbo`](https://app.fireworks.ai/models/fireworks/whisper-v3-turbo): Faster processing * model=`whisper-v3-turbo` * base\_url=`https://audio-turbo.api.fireworks.ai` For more detailed information, see the [full transcription API documentation](/api-reference/audio-transcriptions) ## Pre-recorded Translation Translate audio from any of our supported languages to English. Supports files up to 1GB in formats like MP3, FLAC, and WAV. ### Quick Start ```python Python (fireworks sdk) theme={null} !pip install fireworks-ai requests from fireworks.client.audio import AudioInference import requests import time from dotenv import load_dotenv import os load_dotenv() # Prepare client audio = requests.get("https://tinyurl.com/3cy7x44v").content client = AudioInference( model="whisper-v3", base_url="https://audio-prod.api.fireworks.ai", # # Or for the turbo version # model="whisper-v3-turbo", # base_url="https://audio-turbo.api.fireworks.ai", api_key=os.getenv("FIREWORKS_API_KEY") ) # Make request start = time.time() r = await client.translate_async(audio=audio) print(f"Took: {(time.time() - start):.3f}s. Text: '{r.text}'") ``` ```python Python (openai sdk) theme={null} !pip install openai requests from openai import OpenAI import requests from dotenv import load_dotenv import os load_dotenv() client = OpenAI( base_url="https://audio-prod.api.fireworks.ai/v1", api_key=os.getenv("FIREWORKS_API_KEY"), ) audio_file= requests.get("https://tinyurl.com/3cy7x44v").content translation = client.audio.translations.create( model="whisper-v3", file=audio_file, ) print(translation.text) ``` ```bash curl theme={null} # Download audio file curl -L -o "audio.flac" "https://tinyurl.com/4997djsh" # Make request curl -X POST "https://audio-prod.api.fireworks.ai/v1/audio/translations" \ -H "Authorization: " \ -F "file=@audio.flac" ``` For more detailed information, see the [full translation API documentation](/api-reference/audio-translations) ## Supported Languages We support 95+ languages including English, Spanish, French, German, Chinese, Japanese, Russian, Portuguese, and many more. See the [complete language list](/api-reference/audio-transcriptions#supported-languages). ## Common Use Cases * **Call Center / Customer Service**: Transcribe or translate customer calls * **Note Taking**: Transcribe audio for automated note taking ## Next Steps 1. Explore [advanced features](/api-reference/audio-transcriptions) like speaker diarization and custom prompts 2. Contact us at [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai) for dedicated endpoints and enterprise features --- # Source: https://docs.fireworks.ai/tools-sdks/python-client/querying-dedicated-deployments.md # Querying Dedicated Deployments > Learn how to connect to and query dedicated deployments that were created outside the SDK This SDK documentation applies to version [0.19.20](https://pypi.org/project/fireworks-ai/0.19.20/) and earlier. The Build SDK will be deprecated and replaced with version 1.0.0 of the SDK (see our [changelog](/updates/changelog#2025-11-12) for more details). 
Please migrate to the new SDK when it becomes available. When you have dedicated deployments that were created via `firectl` or the Fireworks web UI, you can easily connect to them using the Build SDK to run inference. This is particularly useful when you want to leverage existing infrastructure or when deployments are managed by different teams. ## Prerequisites Before you begin, make sure you have: * An existing dedicated deployment running on Fireworks * The deployment ID or name * Your Fireworks API key configured You can find your deployment ID in the [Fireworks dashboard](https://app.fireworks.ai/dashboard/deployments) under the deployments section. ## Connecting to an existing deployment To query an existing dedicated deployment, you simply need to create an `LLM` instance with the `deployment_type="on-demand"` and provide the deployment `id`: ```python theme={null} from fireworks import LLM # Connect to your existing dedicated deployment llm = LLM( model="llama-v3p2-3b-instruct", # The model your deployment is running deployment_type="on-demand", id="my-custom-deployment", # Your deployment ID ) # Start using the deployment immediately - no .apply() needed response = llm.chat.completions.create( messages=[{"role": "user", "content": "Hello from my dedicated deployment!"}] ) print(response.choices[0].message.content) ``` Since you're connecting to an existing deployment, you don't need to call `.apply()` - the deployment is already running and ready to serve requests. ## Important considerations ### No resource creation When connecting to existing deployments: * **No new resources are created** - The SDK connects to your existing deployment * **No `.apply()` call needed** - The deployment is already active * **Immediate availability** - You can start making inference calls right away ### Deployment ID requirements The `id` parameter should match exactly with your existing deployment: * Use the deployment name/ID as shown in the Fireworks dashboard * The ID is case-sensitive and must match exactly * If the deployment doesn't exist, you'll receive an error when making requests ### Model specification While you need to specify the `model` parameter, it should match the model that your deployment is actually running: ```python theme={null} # If your deployment is running Llama 3.2 3B Instruct llm = LLM( model="llama-v3p2-3b-instruct", deployment_type="on-demand", id="production-llama-deployment" ) # If your deployment is running Qwen 2.5 72B Instruct llm = LLM( model="qwen2p5-72b-instruct", deployment_type="on-demand", id="qwen-high-capacity-deployment" ) ``` ## Complete example Here's a complete example that demonstrates connecting to an existing deployment and using it for a conversation: ```python Basic usage theme={null} from fireworks import LLM # Connect to existing deployment llm = LLM( model="llama-v3p2-3b-instruct", deployment_type="on-demand", id="my-existing-deployment", ) # Use OpenAI-compatible chat completions response = llm.chat.completions.create( messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Explain quantum computing in simple terms."} ], max_tokens=150, temperature=0.7 ) print(response.choices[0].message.content) ``` ```python Streaming responses theme={null} from fireworks import LLM llm = LLM( model="llama-v3p2-3b-instruct", deployment_type="on-demand", id="my-existing-deployment", ) # Stream the response stream = llm.chat.completions.create( messages=[{"role": "user", "content": "Write a short poem about 
AI."}], stream=True, max_tokens=100 ) for chunk in stream: if chunk.choices[0].delta.content is not None: print(chunk.choices[0].delta.content, end="") ``` ## Troubleshooting ### Common issues and solutions **Problem**: Getting 404 errors when trying to use the deployment. **Solutions**: * Verify the deployment ID is correct in the [Fireworks dashboard](https://app.fireworks.ai/dashboard/deployments) * Ensure the deployment is in "Running" status * Check that you're using the correct Fireworks API key * Confirm the deployment belongs to your account/organization **Problem**: The model parameter doesn't match the actual deployed model. **Solutions**: * Check what model your deployment is actually running in the dashboard * Update the `model` parameter to match the deployed model * If unsure, you can often find the model information in the deployment details **Problem**: Getting authentication errors when connecting to the deployment. **Solutions**: * Verify your `FIREWORKS_API_KEY` environment variable is set correctly * Ensure your API key has access to the deployment * Check that the deployment belongs to your account or organization ## Next steps Now that you can connect to existing deployments, you might want to: * Learn about [fine-tuning models](/tools-sdks/python-client/sdk-basics#fine-tuning-a-model) to create custom deployments * Explore the [complete SDK tutorial](/tools-sdks/python-client/the-tutorial) for more advanced usage * Check out the [SDK reference documentation](/tools-sdks/python-client/sdk-reference) for all available options --- # Source: https://docs.fireworks.ai/guides/querying-embeddings-models.md # Embeddings & Reranking > Generate embeddings and rerank results for semantic search Fireworks hosts embedding and reranking models, which are useful for tasks like RAG and semantic search. ## Generating embeddings Embeddings models take text as input and output a vector of floating point numbers to use for tasks like similarity comparisons and search. Our embedding service is OpenAI compatible. Refer to OpenAI's embeddings [guide](https://platform.openai.com/docs/guides/embeddings) and OpenAI's [embeddings documentation](https://platform.openai.com/docs/api-reference/embeddings) for more information on using these models. ```python Python theme={null} import requests url = "https://api.fireworks.ai/inference/v1/embeddings" payload = { "input": "The quick brown fox jumped over the lazy dog", "model": "fireworks/qwen3-embedding-8b", } headers = { "Authorization": "Bearer ", "Content-Type": "application/json" } response = requests.post(url, json=payload, headers=headers) print(response.json()) ``` To generate variable-length embeddings, you can add the `dimensions` parameter to the request, for example, `dimensions: 128`. The API usage for embedding models is identical for BERT-based and LLM-based embeddings. Simply use the `/v1/embeddings` endpoint with your chosen model. ## Model Availability Fireworks hosts several purpose-built embeddings models, which are optimized specifically for tasks like semantic search and document similarity comparison. We host the SOTA Qwen3 Embeddings family of models: * `fireworks/qwen3-embedding-8b` (\*available on serverless) * `fireworks/qwen3-embedding-4b` * `fireworks/qwen3-embedding-0p6b` You can retrieve embeddings from any LLM in our model library. 
Here are some examples of LLMs that work with the embeddings API: * `fireworks/glm-4p5` * `fireworks/gpt-oss-20b` * `fireworks/kimi-k2-instruct-0905` * `fireworks/deepseek-r1-0528` You can also retrieve embeddings from any models you bring yourself through [custom model upload](/models/uploading-custom-models). These BERT-based models are available on serverless only: * `nomic-ai/nomic-embed-text-v1.5` * `nomic-ai/nomic-embed-text-v1` * `WhereIsAI/UAE-Large-V1` * `thenlper/gte-large` * `thenlper/gte-base` * `BAAI/bge-base-en-v1.5` * `BAAI/bge-small-en-v1.5` * `mixedbread-ai/mxbai-embed-large-v1` * `sentence-transformers/all-MiniLM-L6-v2` * `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` ## Reranking documents Reranking models are used to rerank a list of documents based on a query. We only support reranking with the Qwen3 Reranker family of models: * `fireworks/qwen3-reranker-8b` (\*available on serverless) * `fireworks/qwen3-reranker-4b` * `fireworks/qwen3-reranker-0p6b` The reranking model takes a query and a list of documents as input and outputs the list of documents scored by relevance to the query. ```python Python theme={null} import requests url = "https://api.fireworks.ai/inference/v1/rerank" payload = { "model": "fireworks/qwen3-reranker-8b", "query": "What was the primary objective of the Apollo 10 mission?", "documents": [ "The Apollo 10 mission was launched in May 1969 and served as a 'dress rehearsal' for the Apollo 11 lunar landing.", "The crew of Apollo 10 consisted of astronauts Thomas Stafford, John Young, and Eugene Cernan.", "The command module for Apollo 10 was nicknamed 'Charlie Brown' and the lunar module was called 'Snoopy', after characters from the Peanuts comics.", "The Apollo program was a series of NASA missions that successfully landed the first humans on the Moon and returned them safely to Earth." ], "top_n": 3, "return_documents": True } headers = { "Authorization": "Bearer ", "Content-Type": "application/json" } response = requests.post(url, json=payload, headers=headers) print(response.json()) ``` ## Deploying embeddings and reranking models While Qwen3 Embedding 8b and Qwen3 Reranker 8b are available on serverless, you also have the option to deploy them via [on-demand deployments](/guides/ondemand-deployments). --- # Source: https://docs.fireworks.ai/guides/querying-text-models.md # Text Models > Query, track and manage inference for text models New to Fireworks? Start with the [Serverless Quickstart](/getting-started/quickstart) for a step-by-step guide to making your first API call. Fireworks provides fast, cost-effective access to leading open-source text models through OpenAI-compatible APIs. Query models via serverless inference or dedicated deployments using the chat completions API (recommended), completions API, or responses API. 
[Browse 100+ available models →](https://fireworks.ai/models) ## Chat Completions API ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{ "role": "user", "content": "Explain quantum computing in simple terms" }] ) print(response.choices[0].message.content) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const response = await client.chat.completions.create({ model: "accounts/fireworks/models/deepseek-v3p1", messages: [ { role: "user", content: "Explain quantum computing in simple terms", }, ], }); console.log(response.choices[0].message.content); ``` ```bash theme={null} curl https://api.fireworks.ai/inference/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FIREWORKS_API_KEY" \ -d '{ "model": "accounts/fireworks/models/deepseek-v3p1", "messages": [ { "role": "user", "content": "Explain quantum computing in simple terms" } ] }' ``` Most models automatically format your messages with the correct template. To verify the exact prompt used, enable the [`echo`](#debugging--advanced-options) parameter. ## Alternative query methods Fireworks also supports [Completions API](/guides/completions-api) and [Responses API](/guides/response-api). ## Querying dedicated deployments For consistent performance, guaranteed capacity, or higher throughput, you can query [on-demand deployments](/guides/ondemand-deployments) instead of serverless models. Deployments use the same APIs with a deployment-specific model identifier: ``` # ``` For example: ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1#accounts//deployments/", messages=[{"role": "user", "content": "Hello"}] ) ``` ## Common patterns ### Multi-turn conversations Maintain conversation history by including all previous messages: ```python theme={null} messages = [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}, {"role": "user", "content": "What's its population?"} ] response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=messages ) print(response.choices[0].message.content) ``` The model uses the full conversation history to provide contextually relevant responses. ### System prompts Override the default system prompt by setting the first message with `role: "system"`: ```python theme={null} messages = [ {"role": "system", "content": "You are a helpful Python expert who provides concise code examples."}, {"role": "user", "content": "How do I read a CSV file?"} ] response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=messages ) ``` To completely omit the system prompt, set the first message's `content` to an empty string. ### Streaming responses Stream tokens as they're generated for real time, interactive UX. Covered in detail in the [Serverless Quickstart](/getting-started/quickstart#streaming-responses). 
```python theme={null} stream = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Tell me a story"}], stream=True ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="") ``` **Aborting streams:** Close the connection to stop generation and avoid billing for ungenerated tokens: ```python theme={null} for chunk in stream: print(chunk.choices[0].delta.content, end="") if some_condition: stream.close() break ``` ### Async requests Use async clients to make multiple concurrent requests for better throughput: ```python theme={null} import asyncio from openai import AsyncOpenAI client = AsyncOpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) async def main(): response = await client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}] ) print(response.choices[0].message.content) asyncio.run(main()) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); async function main() { const response = await client.chat.completions.create({ model: "accounts/fireworks/models/deepseek-v3p1", messages: [{ role: "user", content: "Hello" }], }); console.log(response.choices[0].message.content); } main(); ``` ### Usage & performance tracking Every response includes token usage information and performance metrics for debugging and observability. For aggregate metrics over time, see the [usage dashboard](https://app.fireworks.ai/account/usage). **Token usage** (prompt, completion, total tokens) is included in the response body for all requests. **Performance metrics** (latency, time-to-first-token, etc.) are included in response headers for non-streaming requests. For streaming requests, use the `perf_metrics_in_response` parameter to include all metrics in the response body. ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}] ) # Token usage (always included) print(response.usage.prompt_tokens) # Tokens in your prompt print(response.usage.completion_tokens) # Tokens generated print(response.usage.total_tokens) # Total tokens billed # Performance metrics are in response headers: # fireworks-prompt-tokens, fireworks-server-time-to-first-token, etc. 
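# One way to read these headers (a sketch using the OpenAI SDK's raw-response
# interface, shown in more detail in the sampling options section below):
# raw = client.chat.completions.with_raw_response.create(
#     model="accounts/fireworks/models/deepseek-v3p1",
#     messages=[{"role": "user", "content": "Hello"}],
# )
# print(raw.headers.get("fireworks-server-time-to-first-token"))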
``` ```python theme={null} stream = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], stream=True ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="") # Usage is included in the final chunk if chunk.usage: print(f"\n\nTokens used: {chunk.usage.total_tokens}") print(f"Prompt: {chunk.usage.prompt_tokens}, Completion: {chunk.usage.completion_tokens}") ``` ```python theme={null} stream = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello, world!"}], stream=True, extra_body={"perf_metrics_in_response": True} ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="") # Both usage and performance metrics are in the final chunk if chunk.choices[0].finish_reason: if chunk.usage: print(f"\n\nTokens: {chunk.usage.total_tokens}") if hasattr(chunk, 'perf_metrics'): print(f"Performance: {chunk.perf_metrics}") ``` Usage information is automatically included in the final chunk for streaming responses (the chunk with `finish_reason` set). This is a Fireworks extension - OpenAI SDK doesn't return usage for streaming by default. For all available metrics and details, see the [API reference documentation](/api-reference/post-chatcompletions#response-perf_metrics). If you encounter errors during inference, see [Inference Error Codes](/guides/inference-error-codes) for common issues and resolutions. ## Advanced capabilities Extend text models with additional features for structured outputs, tool integration, and performance optimization: Connect models to external tools and APIs with type-safe parameters Enforce JSON schemas for reliable data extraction Multi-step reasoning for complex problem-solving Speed up edits by predicting unchanged sections Cache common prompts to reduce latency and cost Process large volumes of requests asynchronously ## Configuration & debugging Control how the model generates text. Fireworks automatically uses recommended sampling parameters from each model's HuggingFace `generation_config.json` when you don't specify them explicitly, ensuring optimal performance out-of-the-box. We pull `temperature`, `top_k`, `top_p`, `min_p`, and `typical_p` from the model's configuration when not explicitly provided. ### Temperature Adjust randomness (0 = deterministic, higher = more creative): ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Write a poem"}], temperature=0.7 # Override model default ) ``` ### Max tokens Control the maximum number of tokens in the generated completion: ```python theme={null} max_tokens=100 # Generate at most 100 tokens ``` **Important notes:** * Default value is 2048 tokens if not specified * Most models support up to their full context window (e.g., 128K for DeepSeek R1) * When the limit is reached, you'll see `"finish_reason": "length"` in the response Set `max_tokens` appropriately for your use case to avoid truncated responses. Check the model's context window in the [Model Library](https://fireworks.ai/models). 
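To detect when a response has been cut off by the token cap, check `finish_reason` on the returned choice. A minimal sketch, reusing the `client` from the examples above (the prompt is illustrative):

```python theme={null}
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Summarize the history of computing."}],
    max_tokens=100
)

# "length" means the token cap was hit before the model finished generating.
if response.choices[0].finish_reason == "length":
    print("Response was truncated; consider raising max_tokens.")

print(response.choices[0].message.content)
```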
### Top-p (nucleus sampling) Consider only the most probable tokens summing to `top_p` probability mass: ```python theme={null} top_p=0.9 # Consider top 90% probability mass ``` ### Top-k Consider only the k most probable tokens: ```python theme={null} top_k=50 # Consider top 50 tokens ``` ### Min-p Exclude tokens below a probability threshold: ```python theme={null} min_p=0.05 # Exclude tokens with <5% probability ``` ### Typical-p Use typical sampling to select tokens with probability close to the entropy of the distribution: ```python theme={null} typical_p=0.95 # Consider tokens with typical probability ``` ### Repetition penalties Reduce repetitive text with `frequency_penalty`, `presence_penalty`, or `repetition_penalty`: ```python theme={null} frequency_penalty=0.5, # Penalize frequent tokens (OpenAI compatible) presence_penalty=0.5, # Penalize any repeated token (OpenAI compatible) repetition_penalty=1.1 # Exponential penalty from prompt + output ``` ### Sampling options header The `fireworks-sampling-options` header contains the actual default sampling parameters used for the model, including values from the model's HuggingFace `generation_config.json`: ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.chat.completions.with_raw_response.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}] ) # Access headers from the raw response sampling_options = response.headers.get('fireworks-sampling-options') print(sampling_options) # e.g., '{"temperature": 0.7, "top_p": 0.9}' completion = response.parse() # get the parsed response object print(completion.choices[0].message.content) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const response = await client.chat.completions.with_raw_response.create({ model: "accounts/fireworks/models/deepseek-v3p1", messages: [{ role: "user", content: "Hello" }], }); // Access headers from the raw response const samplingOptions = response.headers.get('fireworks-sampling-options'); console.log(samplingOptions); // e.g., '{"temperature": 0.7, "top_p": 0.9}' const completion = response.parse(); // get the parsed response object console.log(completion.choices[0].message.content); ``` See the [API reference](/api-reference/post-chatcompletions) for detailed parameter descriptions. 
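Several of the sampling options above can be combined in a single request. A minimal sketch, assuming (as with `perf_metrics_in_response` earlier) that parameters outside the OpenAI spec, such as `top_k`, `min_p`, and `repetition_penalty`, are passed through `extra_body` when using the OpenAI SDK; the values shown are illustrative:

```python theme={null}
response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Write a product description for a travel mug."}],
    temperature=0.7,  # standard OpenAI parameter, overrides the model default
    top_p=0.9,        # standard OpenAI parameter
    extra_body={
        # Fireworks-specific parameters are assumed to go in extra_body here
        "top_k": 50,
        "min_p": 0.05,
        "repetition_penalty": 1.1
    }
)
print(response.choices[0].message.content)
```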
Generate multiple completions in one request: ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Tell me a joke"}], n=3 # Generate 3 different jokes ) for choice in response.choices: print(choice.message.content) ``` Inspect token probabilities for debugging or analysis: ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], logprobs=True, top_logprobs=5 # Show top 5 alternatives per token ) for content in response.choices[0].logprobs.content: print(f"Token: {content.token}, Logprob: {content.logprob}") ``` Verify how your prompt was formatted: **Echo:** Return the prompt along with the generation: ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], echo=True ) ``` **Raw output:** See raw token IDs and prompt fragments: Experimental API - may change without notice. ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], raw_output=True ) print(response.raw_output.prompt_token_ids) # Token IDs print(response.raw_output.completion) # Raw completion ``` Force generation to continue past the end-of-sequence token (useful for benchmarking): ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], ignore_eos=True, max_tokens=100 # Will always generate exactly 100 tokens ) ``` Output quality may degrade when ignoring EOS. This API is experimental and should not be relied upon for production use cases. Modify token probabilities to encourage or discourage specific tokens: ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], logit_bias={ 123: 10.0, # Strongly encourage token ID 123 456: -50.0 # Strongly discourage token ID 456 } ) ``` Control perplexity dynamically using the [Mirostat algorithm](https://arxiv.org/abs/2007.14966): ```python theme={null} response = client.chat.completions.create( model="accounts/fireworks/models/deepseek-v3p1", messages=[{"role": "user", "content": "Hello"}], mirostat_target=5.0, # Target perplexity mirostat_lr=0.1 # Learning rate for adjustments ) ``` ## Understanding tokens Language models process text in chunks called **tokens**. In English, a token can be as short as one character or as long as one word. Different model families use different **tokenizers**, so the same text may translate to different token counts depending on the model. **Why tokens matter:** * Models have maximum context lengths measured in tokens * Pricing is based on token usage (prompt + completion) * Token count affects response time For Llama models, use [this tokenizer tool](https://belladoreai.github.io/llama-tokenizer-js/example-demo/build/) to estimate token counts. Actual usage is returned in the `usage` field of every API response. ## OpenAI SDK Migration Fireworks provides an OpenAI-compatible API, making migration straightforward. 
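In most cases, migrating only requires pointing your existing OpenAI client at the Fireworks base URL and swapping in a Fireworks API key and model name. A minimal sketch, mirroring the examples earlier in this guide:

```python theme={null}
import os
from openai import OpenAI

# The only changes from a stock OpenAI setup: base_url, api_key, and model name.
client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1"
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```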
However, there are some minor differences to be aware of: ### Behavioral differences **`stop` parameter:** * **Fireworks**: Returns text including the stop word * **OpenAI**: Omits the stop word * *You can easily truncate it client-side if needed* **`max_tokens` with context limits:** * **Fireworks**: Automatically adjusts `max_tokens` lower if `prompt + max_tokens` exceeds the model's context window * **OpenAI**: Returns an invalid request error * *Control this behavior with the `context_length_exceeded_behavior` parameter* **Streaming usage stats:** * **Fireworks**: Returns `usage` field in the final chunk (where `finish_reason` is set) for both streaming and non-streaming * **OpenAI**: Only returns usage for non-streaming responses Example accessing streaming usage: ```python theme={null} for chunk in client.chat.completions.create(stream=True, ...): if chunk.usage: # Available in final chunk print(f"Tokens: {chunk.usage.total_tokens}") ``` ### Unsupported parameters The following OpenAI parameters are not yet supported: * `presence_penalty` * `frequency_penalty` * `best_of` (use `n` instead) * `logit_bias` * `functions` (deprecated - use [Tool Calling](/guides/function-calling) with the `tools` parameter instead) Have a use case requiring one of these? [Join our Discord](https://discord.gg/fireworks-ai) to discuss. ## Next steps Process images alongside text Transcribe and translate audio Generate vector representations for search Deploy models on dedicated GPUs Customize models for your use case Troubleshoot common inference errors Complete API documentation --- # Source: https://docs.fireworks.ai/guides/querying-vision-language-models.md # Vision Models > Query vision-language models to analyze images and visual content New to Fireworks? Start with the [Serverless Quickstart](/getting-started/quickstart#vision-models) to see a vision model example, then return here for more details. Vision-language models (VLMs) process both text and images in a single request, enabling image captioning, visual question answering, document analysis, chart interpretation, OCR, and content moderation. Use VLMs via serverless inference or [dedicated deployments](/getting-started/ondemand-quickstart). [Browse available vision models →](https://app.fireworks.ai/models?filter=Vision) ## Chat Completions API Provide images via URL or base64 encoding: ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.chat.completions.create( model="accounts/fireworks/models/qwen2p5-vl-32b-instruct", messages=[ { "role": "user", "content": [ {"type": "text", "text": "Can you describe this image?"}, { "type": "image_url", "image_url": { "url": "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80" } } ] } ] ) print(response.choices[0].message.content) ``` ```javascript theme={null} import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.FIREWORKS_API_KEY, baseURL: "https://api.fireworks.ai/inference/v1", }); const response = await client.chat.completions.create({ model: "accounts/fireworks/models/qwen2p5-vl-32b-instruct", messages: [ { role: "user", content: [ { type: "text", text: "Can you describe this image?" 
}, { type: "image_url", image_url: { url: "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80" } } ] } ] }); console.log(response.choices[0].message.content); ``` ```bash theme={null} curl https://api.fireworks.ai/inference/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FIREWORKS_API_KEY" \ -d '{ "model": "accounts/fireworks/models/qwen2p5-vl-32b-instruct", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Can you describe this image?" }, { "type": "image_url", "image_url": { "url": "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80" } } ] } ] }' ``` Instead of URLs, you can provide base64-encoded images prefixed with MIME types: ```python theme={null} import os import base64 from openai import OpenAI # Helper function to encode the image def encode_image(image_path): with open(image_path, "rb") as image_file: return base64.b64encode(image_file.read()).decode('utf-8') # Encode your image image_base64 = encode_image("your_image.jpg") client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.chat.completions.create( model="accounts/fireworks/models/qwen2p5-vl-32b-instruct", messages=[ { "role": "user", "content": [ {"type": "text", "text": "Can you describe this image?"}, { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{image_base64}" } } ] } ] ) print(response.choices[0].message.content) ``` ## Working with images Vision-language models support [prompt caching](/guides/prompt-caching) to improve performance for requests with repeated content. Both text and image portions can benefit from caching to reduce time to first token by up to 80%. **Tips for optimal performance:** * **Use URLs for long conversations** – Reduces latency compared to base64 encoding * **Downsize images** – Smaller images use fewer tokens and process faster * **Structure prompts for caching** – Place static instructions at the beginning, variable content at the end * **Include metadata in prompts** – Add context about the image directly in your text prompt ## Advanced capabilities Fine-tune VLMs for specialized visual tasks Deploy custom LoRA adapters for vision models Deploy VLMs on dedicated GPUs for better performance ## Alternative query methods For the Completions API, manually insert the image token `` in your prompt and supply images as an ordered list: ```python theme={null} import os from openai import OpenAI client = OpenAI( api_key=os.environ.get("FIREWORKS_API_KEY"), base_url="https://api.fireworks.ai/inference/v1" ) response = client.completions.create( model="accounts/fireworks/models/qwen2p5-vl-32b-instruct", prompt="SYSTEM: Hello\n\nUSER:\ntell me about the image\n\nASSISTANT:", extra_body={ "images": ["https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80"] } ) print(response.choices[0].text) ``` ## Known limitations 1. **Maximum images per request**: 30 images maximum, regardless of format (base64 or URL) 2. **Base64 size limit**: Total base64-encoded images must be less than 10MB 3. **URL size and timeout**: Each image URL must be smaller than 5MB and download within 1.5 seconds 4. 
**Supported formats**: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tiff`, `.ppm`
5. **Llama 3.2 Vision models**: Pass images before text in the content field to avoid refusals (temporary limitation)

---

# Source: https://docs.fireworks.ai/fine-tuning/quickstart-math.md

# Single-Turn Training Quickstart

> Train a model to be an expert at answering GSM8K math questions

**Following the [RFT Overview](/fine-tuning/reinforcement-fine-tuning-models)?** This is the **Single-Turn Training** path—the fastest way to get started with RFT.

In this quickstart, you'll train a small language model—`Qwen3 0.6B`—to solve mathematical reasoning problems from the GSM8K dataset.

## What you'll learn

* How to set up and test an evaluator locally, using the Eval Protocol SDK
* How to take that evaluator and use it in an RFT job, from the command line
* How to monitor training progress and evaluate accuracy improvements

Prefer a notebook experience? You can also [run this tutorial in Google Colab](https://colab.research.google.com/drive/16xrb9rx6AoAEOtrDXumzo71HjhunaoPi#scrollTo=CP18QX4tgi-0). Note that Colab requires billing enabled on your Google account.

## Prerequisites

* Python 3.10+
* A Fireworks API key (stored in your shell or .env)
* Command-line access (terminal or shell)

## 1. Install dependencies and set up files

You can set up the evaluator and dataset in either of two ways: clone the quickstart repository, or install the SDK and download the files directly.

**Option 1: Clone the quickstart repository**

Clone the quickstart-gsm8k repository and install its dependencies:

```bash theme={null}
git clone https://github.com/eval-protocol/quickstart-gsm8k.git
cd quickstart-gsm8k
pip install -r requirements.txt
```

Create the `gsm8k_artifacts/` folder structure and copy the files into it:

```bash theme={null}
mkdir -p gsm8k_artifacts/{tests/pytest/gsm8k,development}
cp evaluation.py gsm8k_artifacts/tests/pytest/gsm8k/test_pytest_math_example.py
cp gsm8k_sample.jsonl gsm8k_artifacts/development/gsm8k_sample.jsonl
```

The repository includes:

* **Evaluator** (`evaluation.py`): Defines how to evaluate math answers
* **Dataset** (`gsm8k_sample.jsonl`): Contains example math problems to test on

**Option 2: Install the SDK and download the files directly**

Install the latest `eval-protocol` SDK, `pytest`, and `requests`:

```bash theme={null}
python -m pip install --upgrade pip
python -m pip install pytest requests git+https://github.com/eval-protocol/python-sdk.git
```

Then run the Python script below to download two files from the Eval Protocol repository into a folder on your machine called `gsm8k_artifacts/`:
* **Test script** (`test_pytest_math_example.py`): Defines how to evaluate math answers * **Sample dataset** (`gsm8k_sample.jsonl`): Contains example math problems to test on ```python tutorial/download_gsm8k_assets.py theme={null} from pathlib import Path import requests ARTIFACT_ROOT = Path("gsm8k_artifacts") TEST_PATH = ARTIFACT_ROOT / "tests" / "pytest" / "gsm8k" / "test_pytest_math_example.py" DATASET_PATH = ARTIFACT_ROOT / "development" / "gsm8k_sample.jsonl" files_to_download = { TEST_PATH: "https://raw.githubusercontent.com/eval-protocol/python-sdk/main/tests/pytest/gsm8k/test_pytest_math_example.py", DATASET_PATH: "https://raw.githubusercontent.com/eval-protocol/python-sdk/main/development/gsm8k_sample.jsonl", } for local_path, url in files_to_download.items(): local_path.parent.mkdir(parents=True, exist_ok=True) response = requests.get(url, timeout=30) response.raise_for_status() local_path.write_bytes(response.content) print(f"Saved {url} -> {local_path}") ``` Expected output: ``` Saved https://raw.githubusercontent.com/.../test_pytest_math_example.py -> gsm8k_artifacts/tests/pytest/gsm8k/test_pytest_math_example.py Saved https://raw.githubusercontent.com/.../gsm8k_sample.jsonl -> gsm8k_artifacts/development/gsm8k_sample.jsonl ``` ## 2. Test your evaluator locally In this step, we will test your evaluator by examining the output locally. Feel free to iterate on the evaluator you downloaded in the last step until it gives the output you want. Open a terminal and run: ```bash theme={null} ep logs ``` This will start a local server, navigate to `http://localhost:8000`. Keep this terminal running. In a **new terminal**, call the test script to run the evaluator on your dataset of sample math problems. ```bash theme={null} cd gsm8k_artifacts ep local-test ``` This command discovers and runs your `@evaluation_test` with pytest. As the test runs, you'll see evaluation scores appear in the browser, with detailed logs for each problem the model attempts. `pytest` will also register your evaluator and dataset with Fireworks automatically, so you can use them in the next step for RFT. GSM8K evaluation UI showing model scores and trajectories ## 3. Start training First, set your Fireworks API key so the Fireworks CLI can authenticate you: ```bash theme={null} export FIREWORKS_API_KEY="" ``` Next, we'll launch the RFT job using the evaluator and dataset you just registered. We're using a small base model (`qwen3-0p6b`) to keep training fast and inexpensive. Because your evaluator and dataset were already registered with Fireworks in the last step, we don't need to specify them again here. ```bash theme={null} cd .. eval-protocol create rft --base-model accounts/fireworks/models/qwen3-0p6b ``` The CLI will output dashboard links where you can monitor your training job in real-time. GSM8K evaluation score showing upward trajectory You can also store your API key in a `.env` file instead of exporting it each session. ## Monitor your training progress Your RFT job is now running. You can monitor progress in the dashboard links provided by the CLI output. Re-run the pytest evaluation command to measure your model's performance on new checkpoints: ```bash theme={null} cd gsm8k_artifacts pytest -q tests/pytest/gsm8k/test_pytest_math_example.py::test_math_dataset -s ``` This helps you see how your model's accuracy improves over time and decide when to stop training. 
You can adjust the evaluation logic to better fit your needs: * **Modify reward shaping**: Edit the scoring logic in `test_pytest_math_example.py` to match your answer format expectations * **Use your own data**: Replace the sample dataset by either editing the JSONL file locally or passing `--dataset-jsonl` when creating the RFT job ### What's happening behind the scenes Understanding the training workflow: 1. **Evaluation registration**: The pytest script evaluates a small GSM8K subset using numeric answer checking, then automatically registers both your evaluator and dataset with Fireworks 2. **RFT job creation**: The `create rft` command connects your registered evaluator and dataset to a Reinforcement Fine-Tuning job for your chosen base model 3. **Continuous improvement**: As training progresses, evaluation scores on the held-out set reflect improved accuracy, allowing you to iterate quickly before scaling to larger experiments ## Next steps Learn all CLI options to customize your training parameters Train agents that run in your production infrastructure Understand how reinforcement fine-tuning works --- # Source: https://docs.fireworks.ai/fine-tuning/quickstart-svg-agent.md # Remote Agent Quickstart > Train an SVG drawing agent running in a remote environment **Following the [RFT Overview](/fine-tuning/reinforcement-fine-tuning-models)?** This is the **Remote Agent Training** path—for training agents that run in your production infrastructure. In this quickstart, you'll train an agent to generate SVG drawings. Your agent runs in a remote server (Vercel), which means rollouts happen remotely while Fireworks handles the training. This approach lets you train agents that already live in your production environment. Here's a quick walkthrough: