# Runpod > ## Documentation Index --- # Source: https://docs.runpod.io/tutorials/sdks/python/101/aggregate.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Aggregating outputs in Runpod serverless functions This tutorial will guide you through using the `return_aggregate_stream` feature in Runpod to simplify result handling in your serverless functions. Using `return_aggregate_stream` allows you to automatically collect and aggregate all yielded results from a generator handler into a single response. This simplifies result handling, making it easier to manage and return a consolidated set of results from asynchronous tasks, such as concurrent sentiment analysis or object detection, without needing additional code to collect and format the results manually. We'll create a multi-purpose analyzer that can perform sentiment analysis on text and object detection in images, demonstrating how to aggregate outputs efficiently. ## Setting up your Serverless Function Let's break down the process of creating our multi-purpose analyzer into steps. ### Import required libraries First, import the necessary libraries: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import time import random ``` ### Create Helper Functions Define functions to simulate sentiment analysis and object detection: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def analyze_sentiment(text): """Simulate sentiment analysis of text.""" sentiments = ["Positive", "Neutral", "Negative"] score = random.uniform(-1, 1) sentiment = random.choice(sentiments) return f"Sentiment: {sentiment}, Score: {score:.2f}" def detect_objects(image_url): """Simulate object detection in an image.""" objects = ["person", "car", "dog", "cat", "tree", "building"] detected = random.sample(objects, random.randint(1, 4)) confidences = [random.uniform(0.7, 0.99) for _ in detected] return [f"{obj}: {conf:.2f}" for obj, conf in zip(detected, confidences)] ``` These functions: 1. Simulate sentiment analysis, returning a random sentiment and score 2. Simulate object detection, returning a list of detected objects with confidence scores ### Create the main Handler Function Now, let's create the main handler function that processes jobs and yields results: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def handler(job): job_input = job["input"] task_type = job_input.get("task_type", "sentiment") items = job_input.get("items", []) results = [] for item in items: time.sleep(random.uniform(0.5, 2)) # Simulate processing time if task_type == "sentiment": result = analyze_sentiment(item) elif task_type == "object_detection": result = detect_objects(item) else: result = f"Unknown task type: {task_type}" results.append(result) yield result return results ``` This handler: 1. Determines the task type (sentiment analysis or object detection) 2. Processes each item in the input 3. Yields results incrementally 4. 
Returns the complete list of results ### Set up the Serverless Function starter Create a function to start the serverless handler with proper configuration: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def start_handler(): def wrapper(job): generator = handler(job) if job.get("id") == "local_test": return list(generator) return generator runpod.serverless.start({"handler": wrapper, "return_aggregate_stream": True}) if __name__ == "__main__": start_handler() ``` This setup: 1. Creates a wrapper to handle both local testing and Runpod environments 2. Uses `return_aggregate_stream=True` to automatically aggregate yielded results ## Complete code example Here's the full code for our multi-purpose analyzer with output aggregation: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import time import random def analyze_sentiment(text): """Simulate sentiment analysis of text.""" sentiments = ["Positive", "Neutral", "Negative"] score = random.uniform(-1, 1) sentiment = random.choice(sentiments) return f"Sentiment: {sentiment}, Score: {score:.2f}" def detect_objects(image_url): """Simulate object detection in an image.""" objects = ["person", "car", "dog", "cat", "tree", "building"] detected = random.sample(objects, random.randint(1, 4)) confidences = [random.uniform(0.7, 0.99) for _ in detected] return [f"{obj}: {conf:.2f}" for obj, conf in zip(detected, confidences)] def handler(job): job_input = job["input"] task_type = job_input.get("task_type", "sentiment") items = job_input.get("items", []) results = [] for item in items: time.sleep(random.uniform(0.5, 2)) # Simulate processing time if task_type == "sentiment": result = analyze_sentiment(item) elif task_type == "object_detection": result = detect_objects(item) else: result = f"Unknown task type: {task_type}" results.append(result) yield result return results def start_handler(): def wrapper(job): generator = handler(job) if job.get("id") == "local_test": return list(generator) return generator runpod.serverless.start({"handler": wrapper, "return_aggregate_stream": True}) if __name__ == "__main__": start_handler() ``` ## Testing your Serverless Function To test your function locally, use these commands: For sentiment analysis: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python your_script.py --test_input ' { "input": { "task_type": "sentiment", "items": [ "I love this product!", "The service was terrible.", "It was okay, nothing special." ] } }' ``` For object detection: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python your_script.py --test_input ' { "input": { "task_type": "object_detection", "items": [ "image1.jpg", "image2.jpg", "image3.jpg" ] } }' ``` ### Understanding the output When you run the sentiment analysis test, you'll see output similar to this: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} --- Starting Serverless Worker | Version 1.6.2 --- INFO | test_input set, using test_input as job input. DEBUG | Retrieved local job: {'input': {'task_type': 'sentiment', 'items': ['I love this product!', 'The service was terrible.', 'It was okay, nothing special.']}, 'id': 'local_test'} INFO | local_test | Started. 
DEBUG | local_test | Handler output: ['Sentiment: Positive, Score: 0.85', 'Sentiment: Negative, Score: -0.72', 'Sentiment: Neutral, Score: 0.12'] DEBUG | local_test | run_job return: {'output': ['Sentiment: Positive, Score: 0.85', 'Sentiment: Negative, Score: -0.72', 'Sentiment: Neutral, Score: 0.12']} INFO | Job local_test completed successfully. INFO | Job result: {'output': ['Sentiment: Positive, Score: 0.85', 'Sentiment: Negative, Score: -0.72', 'Sentiment: Neutral, Score: 0.12']} INFO | Local testing complete, exiting. ``` This output demonstrates: 1. The serverless worker starting and processing the job 2. The handler generating results for each input item 3. The aggregation of results into a single list ## Conclusion You've now created a serverless function using Runpod's Python SDK that demonstrates efficient output aggregation for both local testing and production environments. This approach simplifies result handling and ensures consistent behavior across different execution contexts. To further enhance this application, consider: * Implementing real sentiment analysis and object detection models * Adding error handling and logging for each processing step * Exploring Runpod's advanced features for handling larger datasets or parallel processing Runpod's serverless library, with features like `return_aggregate_stream`, provides a powerful foundation for building scalable, efficient applications that can process and aggregate data seamlessly. --- # Source: https://docs.runpod.io/get-started/api-keys.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Manage API keys > Learn how to create, edit, and disable Runpod API keys. Legacy API keys generated before November 11, 2024 have either Read/Write or Read Only access to GraphQL based on what was set for that key. All legacy keys have full access to AI API. To improve security, generate a new key with **Restricted** permission and select the minimum permission needed for your use case. ## Create an API key Follow these steps to create a new Runpod API key: 1. In the Runpod console, navigate to the [Settings page](https://www.console.runpod.io/user/settings). 2. Expand the **API Keys** section and select **Create API Key**. 3. Give your key a name and set its permissions (**All**, **Restricted**, or **Read Only**). If you choose **Restricted**, you can customize access for each Runpod API: * **None**: No access * **Restricted**: Customize access for each of your [Serverless endpoints](/serverless/overview). (Default: None.) * **Read/Write**: Full access to your endpoints. * **Read Only**: Read access without write access. 4. Select **Create**, then select your newly-generated key to copy it to your clipboard. Runpod does not store your API key, so you may wish to save it elsewhere (e.g., in your password manager, or in a GitHub secret). Treat your API key like a password and don't share it with anyone. ## Edit API key permissions To edit an API key: 1. Navigate to the [Settings page](https://www.console.runpod.io/user/settings). 2. Under **API Keys**, select the pencil icon for the key you wish to update 3. Update the key with your desired permissions, then select **Update**. ## Enable/disable an API key To enable/disable an API key: 1. Navigate to the [Settings page](https://www.console.runpod.io/user/settings). 2. 
Under **API Keys**, select the toggle for the API key you wish to enable/disable, then select **Yes** in the confirmation modal. ## Delete an API key To delete an API key: 1. From the console, select **Settings**. 2. Under **API Keys**, select the trash can icon and select **Revoke Key** in the confirmation modal. --- # Source: https://docs.runpod.io/sdks/python/apis.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## API Wrapper This document outlines the core functionalities provided by the Runpod API, including how to interact with Endpoints, manage Templates, and list available GPUs. These operations let you dynamically manage computational resources within the Runpod environment. ## Get Endpoints To retrieve a comprehensive list of all available endpoint configurations within Runpod, you can use the `get_endpoints()` function. This function returns a list of endpoint configurations, allowing you to understand what's available for use in your projects. ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import os runpod.api_key = os.getenv("RUNPOD_API_KEY") # Fetching all available endpoints endpoints = runpod.get_endpoints() # Displaying the list of endpoints print(endpoints) ``` ## Create Template Templates in Runpod serve as predefined configurations for setting up environments efficiently. The `create_template()` function facilitates the creation of new templates by specifying a name and a Docker image. ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import os runpod.api_key = os.getenv("RUNPOD_API_KEY") try: # Creating a new template with a specified name and Docker image new_template = runpod.create_template(name="test", image_name="runpod/base:0.1.0") # Output the created template details print(new_template) except runpod.error.QueryError as err: # Handling potential errors during template creation print(err) print(err.query) ``` ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} { "id": "n6m0htekvq", "name": "test", "imageName": "runpod/base:0.1.0", "dockerArgs": "", "containerDiskInGb": 10, "volumeInGb": 0, "volumeMountPath": "/workspace", "ports": "", "env": [], "isServerless": false } ``` ## Create Endpoint Create a new endpoint with the `create_endpoint()` function. This function requires you to specify a `name` and a `template_id`. Additional configurations such as GPUs, number of Workers, and more can also be specified depending on your requirements. 
```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import os runpod.api_key = os.getenv("RUNPOD_API_KEY") try: # Creating a template to use with the new endpoint new_template = runpod.create_template( name="test", image_name="runpod/base:0.4.4", is_serverless=True ) # Output the created template details print(new_template) # Creating a new endpoint using the previously created template new_endpoint = runpod.create_endpoint( name="test", template_id=new_template["id"], gpu_ids="AMPERE_16", workers_min=0, workers_max=1, ) # Output the created endpoint details print(new_endpoint) except runpod.error.QueryError as err: # Handling potential errors during endpoint creation print(err) print(err.query) ``` ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} { "id": "Unique_Id", "name": "YourTemplate", "imageName": "runpod/base:0.4.4", "dockerArgs": "", "containerDiskInGb": 10, "volumeInGb": 0, "volumeMountPath": "/workspace", "ports": null, "env": [], "isServerless": true } { "id": "Unique_Id", "name": "YourTemplate", "templateId": "Unique_Id", "gpuIds": "AMPERE_16", "networkVolumeId": null, "locations": null, "idleTimeout": 5, "scalerType": "QUEUE_DELAY", "scalerValue": 4, "workersMin": 0, "workersMax": 1 } ``` ## Get GPUs For understanding the computational resources available, the `get_gpus()` function lists all GPUs that can be allocated to endpoints in Runpod. This enables optimal resource selection based on your computational needs. ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import json import os runpod.api_key = os.getenv("RUNPOD_API_KEY") # Fetching all available GPUs gpus = runpod.get_gpus() # Displaying the GPUs in a formatted manner print(json.dumps(gpus, indent=2)) ``` ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} [ { "id": "NVIDIA A100 80GB PCIe", "displayName": "A100 80GB", "memoryInGb": 80 }, { "id": "NVIDIA A100-SXM4-80GB", "displayName": "A100 SXM 80GB", "memoryInGb": 80 } // Additional GPUs omitted for brevity ] ``` ## Get GPU by Id Use `get_gpu()` and pass in a GPU Id to retrieve details about a specific GPU model by its ID. This is useful when understanding the capabilities and costs associated with various GPU models. ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import json import os runpod.api_key = os.getenv("RUNPOD_API_KEY") gpus = runpod.get_gpu("NVIDIA A100 80GB PCIe") print(json.dumps(gpus, indent=2)) ``` ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} { "maxGpuCount": 8, "id": "NVIDIA A100 80GB PCIe", "displayName": "A100 80GB", "manufacturer": "Nvidia", "memoryInGb": 80, "cudaCores": 0, "secureCloud": true, "communityCloud": true, "securePrice": 1.89, "communityPrice": 1.59, "oneMonthPrice": null, "threeMonthPrice": null, "oneWeekPrice": null, "communitySpotPrice": 0.89, "secureSpotPrice": null, "lowestPrice": { "minimumBidPrice": 0.89, "uninterruptablePrice": 1.59 } } ``` Through these functionalities, the Runpod API enables efficient and flexible management of computational resources, catering to a wide range of project requirements. --- # Source: https://docs.runpod.io/tutorials/sdks/python/101/async.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. 
## Building an async generator handler for weather data simulation This tutorial will guide you through creating a serverless function using Runpod's Python SDK that simulates fetching weather data for multiple cities concurrently. Use asynchronous functions to handle multiple concurrent operations efficiently, especially when dealing with tasks that involve waiting for external resources, such as network requests or I/O operations. Asynchronous programming allows your code to perform other tasks while waiting, rather than blocking the entire program. This is particularly useful in a serverless environment where you want to maximize resource utilization and minimize response times. We'll use an async generator handler to stream results incrementally, demonstrating how to manage multiple concurrent operations efficiently in a serverless environment. ## Setting up your Serverless Function Let's break down the process of creating our weather data simulator into steps. ### Import required libraries First, import the necessary libraries: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import asyncio import random import json import sys ``` ### Create the Weather Data Fetcher Define an asynchronous function that simulates fetching weather data: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} async def fetch_weather_data(city, delay): await asyncio.sleep(delay) temperature = random.uniform(-10, 40) humidity = random.uniform(0, 100) return { "city": city, "temperature": round(temperature, 1), "humidity": round(humidity, 1) } ``` This function: 1. Simulates a network delay using `asyncio.sleep()` 2. Generates random temperature and humidity data 3. Returns a dictionary with the weather data for a city ### Create the Async Generator Handler Now, let's create the main handler function: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} async def async_generator_handler(job): job_input = job['input'] cities = job_input.get('cities', ['New York', 'London', 'Tokyo', 'Sydney', 'Moscow']) update_interval = job_input.get('update_interval', 2) duration = job_input.get('duration', 10) print(f"Weather Data Stream | Starting job {job['id']}") print(f"Monitoring cities: {', '.join(cities)}") start_time = asyncio.get_event_loop().time() while asyncio.get_event_loop().time() - start_time < duration: tasks = [fetch_weather_data(city, random.uniform(0.5, 2)) for city in cities] for completed_task in asyncio.as_completed(tasks): weather_data = await completed_task yield { "timestamp": round(asyncio.get_event_loop().time() - start_time, 2), "data": weather_data } await asyncio.sleep(update_interval) yield {"status": "completed", "message": "Weather monitoring completed"} ``` This handler: 1. Extracts parameters from the job input 2. Logs the start of the job 3. Creates tasks for fetching weather data for each city 4. Uses `asyncio.as_completed()` to yield results as they become available 5. 
Continues fetching data at specified intervals for the given duration ### Set up the Main Execution Finally, Set up the main execution block: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} async def run_test(job): async for item in async_generator_handler(job): print(json.dumps(item)) if __name__ == "__main__": if "--test_input" in sys.argv: # Code for local testing (see full example) else: runpod.serverless.start({ "handler": async_generator_handler, "return_aggregate_stream": True }) ``` This block allows for both local testing and deployment as a Runpod serverless function. ## Complete code example Here's the full code for our serverless weather data simulator: ```python fetch_weather_data.py theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import asyncio import random import json import sys async def fetch_weather_data(city, delay): await asyncio.sleep(delay) temperature = random.uniform(-10, 40) humidity = random.uniform(0, 100) return { "city": city, "temperature": round(temperature, 1), "humidity": round(humidity, 1) } async def async_generator_handler(job): job_input = job['input'] cities = job_input.get('cities', ['New York', 'London', 'Tokyo', 'Sydney', 'Moscow']) update_interval = job_input.get('update_interval', 2) duration = job_input.get('duration', 10) print(f"Weather Data Stream | Starting job {job['id']}") print(f"Monitoring cities: {', '.join(cities)}") start_time = asyncio.get_event_loop().time() while asyncio.get_event_loop().time() - start_time < duration: tasks = [fetch_weather_data(city, random.uniform(0.5, 2)) for city in cities] for completed_task in asyncio.as_completed(tasks): weather_data = await completed_task yield { "timestamp": round(asyncio.get_event_loop().time() - start_time, 2), "data": weather_data } await asyncio.sleep(update_interval) yield {"status": "completed", "message": "Weather monitoring completed"} async def run_test(job): async for item in async_generator_handler(job): print(json.dumps(item)) if __name__ == "__main__": if "--test_input" in sys.argv: test_input_index = sys.argv.index("--test_input") if test_input_index + 1 < len(sys.argv): test_input_json = sys.argv[test_input_index + 1] try: job = json.loads(test_input_json) asyncio.run(run_test(job)) except json.JSONDecodeError: print("Error: Invalid JSON in test_input") else: print("Error: --test_input requires a JSON string argument") else: runpod.serverless.start({ "handler": async_generator_handler, "return_aggregate_stream": True }) ``` ## Testing Your Serverless Function To test your function locally, use this command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python your_script.py --test_input ' { "input": { "cities": ["New York", "London", "Tokyo", "Paris", "Sydney"], "update_interval": 3, "duration": 15 }, "id": "local_test" }' ``` ### Understanding the output When you run the test, you'll see output similar to this: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} Weather Data Stream | Starting job local_test Monitoring cities: New York, London, Tokyo, Paris, Sydney {"timestamp": 0.84, "data": {"city": "London", "temperature": 11.0, "humidity": 7.3}} {"timestamp": 0.99, "data": {"city": "Paris", "temperature": -5.9, "humidity": 59.3}} {"timestamp": 1.75, "data": {"city": "Tokyo", "temperature": 18.4, "humidity": 34.1}} {"timestamp": 1.8, "data": {"city": "Sydney", "temperature": 26.8, "humidity": 91.0}} {"timestamp": 1.99, "data": {"city": "New York", "temperature": 35.9, 
"humidity": 27.5}} {"status": "completed", "message": "Weather monitoring completed"} ``` This output demonstrates: 1. The concurrent processing of weather data for multiple cities 2. Real-time updates with timestamps 3. A completion message when the monitoring duration is reached ## Conclusion You've now created a serverless function using Runpod's Python SDK that simulates concurrent weather data fetching for multiple cities. This example showcases how to handle multiple asynchronous operations and stream results incrementally in a serverless environment. To further enhance this application, consider: * Implementing real API calls to fetch actual weather data * Adding error handling for network failures or API limits * Exploring Runpod's documentation for advanced features like scaling for high-concurrency scenarios Runpod's serverless library provides a powerful foundation for building scalable, efficient applications that can process and stream data concurrently in real-time without the need to manage infrastructure. --- # Source: https://docs.runpod.io/instant-clusters/axolotl.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Deploy an Instant Cluster with Axolotl This tutorial demonstrates how to use Instant Clusters with [Axolotl](https://axolotl.ai/) to fine-tune large language models (LLMs) across multiple GPUs. By leveraging PyTorch's distributed training capabilities and Runpod's high-speed networking infrastructure, you can significantly accelerate your training process compared to single-GPU setups. Follow the steps below to deploy a cluster and start training your models efficiently. ## Step 1: Deploy an Instant Cluster 1. Open the [Instant Clusters page](https://www.console.runpod.io/cluster) on the Runpod web interface. 2. Click **Create Cluster**. 3. Use the UI to name and configure your Cluster. For this walkthrough, keep **Pod Count** at **2** and select the option for **16x H100 SXM** GPUs. Keep the **Pod Template** at its default setting (Runpod PyTorch). 4. Click **Deploy Cluster**. You should be redirected to the Instant Clusters page after a few seconds. ## Step 2: Set up Axolotl on each Pod 1. Click your cluster to expand the list of Pods. 2. Click on a Pod, for example `CLUSTERNAME-pod-0`, to expand the Pod. 3. Click **Connect**, then click **Web Terminal**. 4. In the terminal that opens, run this command to clone the Axolotl repository into the Pod's main directory: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} git clone https://github.com/axolotl-ai-cloud/axolotl ``` 5. Navigate to the `axolotl` directory: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} cd axolotl ``` 6. Install the required packages: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} pip3 install -U packaging setuptools wheel ninja pip3 install --no-build-isolation -e '.[flash-attn,deepspeed]' ``` 7. Navigate to the `examples/llama-3` directory: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} cd examples/llama-3 ``` Repeat these steps for **each Pod** in your cluster. 
## Step 3: Start the training process on each Pod Run this command in the web terminal of **each Pod**: ```php theme={"theme":{"light":"github-light","dark":"github-dark"}} torchrun \ --nnodes $NUM_NODES \ --node_rank $NODE_RANK \ --nproc_per_node $NUM_TRAINERS \ --rdzv_id "myjob" \ --rdzv_backend static \ --rdzv_endpoint "$PRIMARY_ADDR:$PRIMARY_PORT" -m axolotl.cli.train lora-1b.yml ``` Currently, the dynamic `c10d` backend is not supported. Please keep the `rdzv_backend` flag set to `static`. After running the command on the last Pod, you should see output similar to this after the training process is complete: ```csharp theme={"theme":{"light":"github-light","dark":"github-dark"}} ... {'loss': 1.2569, 'grad_norm': 0.11112671345472336, 'learning_rate': 5.418275829936537e-06, 'epoch': 0.9} {'loss': 1.2091, 'grad_norm': 0.11100614815950394, 'learning_rate': 3.7731999690749585e-06, 'epoch': 0.92} {'loss': 1.2216, 'grad_norm': 0.10450132936239243, 'learning_rate': 2.420361737256438e-06, 'epoch': 0.93} {'loss': 1.223, 'grad_norm': 0.10873789340257645, 'learning_rate': 1.3638696597277679e-06, 'epoch': 0.95} {'loss': 1.2529, 'grad_norm': 0.1063728854060173, 'learning_rate': 6.069322682050516e-07, 'epoch': 0.96} {'loss': 1.2304, 'grad_norm': 0.10996092110872269, 'learning_rate': 1.518483566683826e-07, 'epoch': 0.98} {'loss': 1.2334, 'grad_norm': 0.10642101615667343, 'learning_rate': 0.0, 'epoch': 0.99} {'train_runtime': 61.7602, 'train_samples_per_second': 795.189, 'train_steps_per_second': 1.085, 'train_loss': 1.255359119443751, 'epoch': 0.99} 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [01:00<00:00, 1.11it/s] [2025-04-01 19:24:22,603] [INFO] [axolotl.train.save_trained_model:211] [PID:1009] [RANK:0] Training completed! Saving pre-trained model to ./outputs/lora-out. ``` Congrats! You've successfully trained a model using Axolotl on an Instant Cluster. Your fine-tuned model has been saved to the `./outputs/lora-out` directory. You can now use this model for inference or continue training with different parameters. ## Step 4: Clean up If you no longer need your cluster, make sure you return to the [Instant Clusters page](https://www.console.runpod.io/cluster) and delete your cluster to avoid incurring extra charges. You can monitor your cluster usage and spending using the **Billing Explorer** at the bottom of the [Billing page](https://www.console.runpod.io/user/billing) section under the **Cluster** tab. ## Next steps Now that you've successfully deployed and tested an Axolotl distributed training job on an Instant Cluster, you can: * **Fine-tune your own models** by modifying the configuration files in Axolotl to suit your specific requirements. * **Scale your training** by adjusting the number of Pods in your cluster (and the size of their containers and volumes) to handle larger models or datasets. * **Try different optimization techniques** such as DeepSpeed, FSDP (Fully Sharded Data Parallel), or other distributed training strategies. For more information on fine-tuning with Axolotl, refer to the [Axolotl documentation](https://github.com/OpenAccess-AI-Collective/axolotl). --- # Source: https://docs.runpod.io/serverless/development/benchmarking.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. 
## Benchmark workers and requests > Measure the performance of your Serverless workers and identify bottlenecks. Benchmarking your Serverless workers helps you identify bottlenecks and [optimize your code](/serverless/development/optimization) for performance and cost. Performance is measured by two key metrics: * **Delay time**: The time spent waiting for a worker to become available. This includes the cold start time if a new worker needs to be spun up. * **Execution time**: The time the GPU takes to process the request once the worker has received the job. ## Send a test request To gather initial metrics, use `curl` to send a request to your endpoint. This will initiate the job and return a request ID that you can use to poll for status. ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X POST https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_API_KEY" \ -d '{"input": {"prompt": "Hello, world!"}}' ``` This returns a JSON object containing the request ID. Poll the `/status` endpoint to get the delay time and execution time: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X GET https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/status/REQUEST_ID \ -H "Content-Type: application/json" \ -H "Authorization: Bearer YOUR_API_KEY" ``` This returns a JSON object: ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} { "id": "1234567890", "status": "COMPLETED", "delayTime": 1000, "executionTime": 2000 } ``` ### Automate benchmarking To get a statistically significant view of your worker's performance, you should automate the benchmarking process. The following Python script sends multiple requests and calculates the minimum, maximum, and average times for both delay and execution. ```python benchmark.py theme={"theme":{"light":"github-light","dark":"github-dark"}} import requests import time import statistics ENDPOINT_ID = "YOUR_ENDPOINT_ID" API_KEY = "YOUR_API_KEY" BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}" HEADERS = { "Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}" } def run_benchmark(num_requests=5): delay_times = [] execution_times = [] for i in range(num_requests): # Send request response = requests.post( f"{BASE_URL}/run", headers=HEADERS, json={"input": {"prompt": f"Test request {i+1}"}} ) request_id = response.json()["id"] # Poll for completion while True: status_response = requests.get( f"{BASE_URL}/status/{request_id}", headers=HEADERS ) status_data = status_response.json() if status_data["status"] == "COMPLETED": delay_times.append(status_data["delayTime"]) execution_times.append(status_data["executionTime"]) break elif status_data["status"] == "FAILED": print(f"Request {i+1} failed") break time.sleep(1) # Calculate statistics print(f"Delay Time - Min: {min(delay_times)}ms, Max: {max(delay_times)}ms, Avg: {statistics.mean(delay_times):.0f}ms") print(f"Execution Time - Min: {min(execution_times)}ms, Max: {max(execution_times)}ms, Avg: {statistics.mean(execution_times):.0f}ms") if __name__ == "__main__": run_benchmark(num_requests=5) ``` --- # Source: https://docs.runpod.io/references/billing-information.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Billing information > Understand how billing works for Pods, storage, network volumes, refunds, and spending limits. 
All billing, including per-hour compute and storage billing, is charged per minute. ## How billing works Every Pod has an hourly cost based on [GPU type](/references/gpu-types). Your Runpod credits are charged every minute the Pod is running. If you run out of credits, your Pods are automatically stopped and you'll receive an email notification. Pods are eventually terminated if you don't refill your credits. Runpod pre-emptively stops all your Pods when your account balance is projected to cover less than 10 minutes of remaining runtime. This ensures your account retains a small balance to help preserve your data volumes. If your balance is completely drained, all Pods are subject to deletion. Setting up [automatic payments](https://www.console.runpod.io/user/billing) is recommended to avoid service interruptions. You must have at least one hour's worth of runtime in your balance to rent a Pod at your given spec. If your balance is insufficient, consider renting the Pod on Spot, depositing additional funds, or lowering your GPU spec requirements. ## Storage billing Storage billing varies depending on Pod state. Running Pods are charged \$0.10 per GB per month for all storage, while stopped Pods are charged \$0.20 per GB per month for volume storage. Storage is charged per minute. You are not charged for storage if the host machine is down or unavailable from the public internet. ## Network volume billing Network volumes are billed hourly based on storage size. For storage below 1TB, you'll pay \$0.07 per GB per month. Above 1TB, the rate drops to \$0.05 per GB per month. Network volumes are hosted on storage servers located in the same datacenters where you rent GPU servers. These servers are connected via a high-speed local network (25Gbps to 200Gbps depending on location) and use NVME SSDs for storage. If your machine-based storage or network volume is terminated due to lack of funds, that disk space is immediately freed up for use by other clients. Runpod cannot assist in recovering lost storage. Runpod is not designed as a cloud storage system—storage is provided to support running tasks on GPUs. Back up critical data regularly to your local machine or a dedicated cloud storage provider. ## Refunds and withdrawals Runpod does not offer the option to withdraw your unused balance after depositing funds. When you add funds to your Runpod account, credits are non-refundable and can only be used for Runpod services. Only deposit the amount you intend to use. If you aren't sure if Runpod is right for you, you can load as little as \$10 into your account to try things out. Visit the [Discord community](https://discord.gg/pJ3P2DbUUq) to ask questions or email [help@runpod.io](mailto:help@runpod.io). Refunds and trial credits are not currently offered due to processing overhead. If you have questions about billing or need assistance planning your Runpod expenses, contact support at [help@runpod.io](mailto:help@runpod.io). ## Spending limits Spending limits are implemented for newer accounts to prevent fraud. These limits grow over time and should not impact normal usage. If you need an increased spending limit, [contact support](https://www.runpod.io/contact) and share your use case. ### Payment methods Runpod accepts several payment methods for funding your account: 1. **Credit Card**: You can use your credit card to fund your Runpod account. However, be aware that card declines are more common than you might think, and the reasons for them might not always be clear. 
If you're using a prepaid card, it's recommended to deposit in transactions of at least \$100 to avoid unexpected blocks due to Stripe's minimums for prepaid cards. For more information, review [cards accepted by Stripe](https://stripe.com/docs/payments/cards/supported-card-brands?ref=blog.runpod.io). 2. **Crypto Payments**: Runpod also accepts crypto payments. It's recommended to set up a [crypto.com](https://crypto.com/?ref=blog.runpod.io) account and go through any KYC checks they may require ahead of time. This provides an alternate way of funding your account in case you run into issues with card payment. 3. **Business Invoicing**: For large transactions (over \$5,000), Runpod offers business invoicing through ACH, credit card, and local and international wire transfers. If you're having trouble with your card payments, you can contact [Runpod support](https://www.runpod.io/contact) for assistance. --- # Source: https://docs.runpod.io/serverless/load-balancing/build-a-worker.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Build a load balancing worker > Learn how to implement and deploy a load balancing worker with FastAPI. This tutorial shows how to build a load balancing worker using FastAPI and deploy it as a Serverless endpoint on Runpod. ## What you'll learn In this tutorial you'll learn how to: * Create a FastAPI application to serve your API endpoints. * Implement proper health checks for your workers. * Deploy your application as a load balancing Serverless endpoint. * Test and interact with your custom APIs. ## Requirements Before you begin you'll need: * A Runpod account. * Basic familiarity with Python and REST APIs. * Docker installed on your local machine. ## Step 1: Create a basic FastAPI application You can download a preconfigured repository containing the completed code for this tutorial [on GitHub](https://github.com/runpod-workers/worker-load-balancing/). First, let's create a simple FastAPI application that will serve as our API. 
Create a file named `app.py`: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import os from fastapi import FastAPI, HTTPException from pydantic import BaseModel # Create FastAPI app app = FastAPI() # Define request models class GenerationRequest(BaseModel): prompt: str max_tokens: int = 100 temperature: float = 0.7 class GenerationResponse(BaseModel): generated_text: str # Global variable to track requests request_count = 0 # Health check endpoint; required for Runpod to monitor worker health @app.get("/ping") async def health_check(): return {"status": "healthy"} # Our custom generation endpoint @app.post("/generate", response_model=GenerationResponse) async def generate(request: GenerationRequest): global request_count request_count += 1 # A simple mock implementation; we'll replace this with an actual model later generated_text = f"Response to: {request.prompt} (request #{request_count})" return {"generated_text": generated_text} # A simple endpoint to show request stats @app.get("/stats") async def stats(): return {"total_requests": request_count} # Run the app when the script is executed if __name__ == "__main__": import uvicorn port = int(os.getenv("PORT", 80)) print(f"Starting server on port {port}") # Start the server uvicorn.run(app, host="0.0.0.0", port=port) ``` This simple application defines the following endpoints: * A health check endpoint at `/ping` * A text generation endpoint at `/generate` * A statistics endpoint at `/stats` ## Step 2: Create a Dockerfile Now, let's create a `Dockerfile` to package our application: ```dockerfile FROM nvidia/cuda:12.1.0-base-ubuntu22.04 RUN apt-get update -y \ && apt-get install -y python3-pip RUN ldconfig /usr/local/cuda-12.1/compat/ # Install Python dependencies COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy application code COPY app.py . # Start the handler CMD ["python3", "app.py"] ``` You'll also need to create a `requirements.txt` file: ```text fastapi==0.95.1 uvicorn==0.22.0 pydantic==1.10.7 ``` ## Step 3: Build and push the Docker image Build and push your Docker image to a container registry: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} # Build the image docker build --platform linux/amd64 -t YOUR_DOCKER_USERNAME/loadbalancer-example:v1.0 . # Push to Docker Hub docker push YOUR_DOCKER_USERNAME/loadbalancer-example:v1.0 ``` ## Step 4: Deploy to Runpod Now, let's deploy our application to a Serverless endpoint: 1. Go to the [Serverless page](https://www.runpod.io/console/serverless) in the Runpod console. 2. Click **New Endpoint** 3. Click **Import from Docker Registry**. 4. In the **Container Image** field, enter your Docker image URL: ```text YOUR_DOCKER_USERNAME/loadbalancer-example:v1.0 ``` Then click **Next**. 5. Give your endpoint a name. 6. Under **Endpoint Type**, select **Load Balancer**. 7. Under **GPU Configuration**, select at least one GPU type (16 GB or 24 GB GPUs are fine for this example). 8. Leave all other settings at their defaults. 9. Click **Deploy Endpoint**. 
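Workers take some time to initialize after the endpoint is deployed. Before sending real traffic, it can help to poll the health route with a few retries (the troubleshooting notes later in this tutorial recommend this for production). The script below is a minimal sketch, assuming the `/ping` route from Step 1, the `https://ENDPOINT_ID.api.runpod.ai` URL format described in the next step, and placeholder values for `ENDPOINT_ID` and `API_KEY`:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import time
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
API_KEY = "YOUR_API_KEY"          # placeholder

HEALTH_URL = f"https://{ENDPOINT_ID}.api.runpod.ai/ping"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def wait_for_worker(retries=10, delay=5):
    """Poll the /ping route until a worker responds or retries run out."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(HEALTH_URL, headers=HEADERS, timeout=10)
            if response.status_code == 200:
                print(f"Worker healthy after {attempt} attempt(s): {response.json()}")
                return True
            print(f"Attempt {attempt}: HTTP {response.status_code}, retrying in {delay}s...")
        except requests.RequestException as err:
            print(f"Attempt {attempt}: {err}, retrying in {delay}s...")
        time.sleep(delay)
    return False

if __name__ == "__main__":
    wait_for_worker()
```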
## Step 5: Access your custom API Once your endpoint is created, you can access your custom APIs at: ```text https://ENDPOINT_ID.api.runpod.ai/PATH ``` For example, the load balancing worker we defined in step 1 exposes these endpoints: * Health check: `https://ENDPOINT_ID.api.runpod.ai/ping` * Generate text: `https://ENDPOINT_ID.api.runpod.ai/generate` * Get request count: `https://ENDPOINT_ID.api.runpod.ai/stats` Try running one or more of these commands, replacing `ENDPOINT_ID` and `RUNPOD_API_KEY` with your actual endpoint ID and API key: ```bash generate theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X POST "https://ENDPOINT_ID.api.runpod.ai/generate" \ -H 'Authorization: Bearer RUNPOD_API_KEY' \ -H "Content-Type: application/json" \ -d '{"prompt": "Hello, world!"}' ``` ```bash ping theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X GET "https://ENDPOINT_ID.api.runpod.ai/ping" \ -H 'Authorization: Bearer RUNPOD_API_KEY' \ -H "Content-Type: application/json" \ ``` ```bash stats theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X GET "https://ENDPOINT_ID.api.runpod.ai/stats" \ -H 'Authorization: Bearer RUNPOD_API_KEY' \ -H "Content-Type: application/json" \ ``` After sending a request, your workers will take some time to initialize. You can track their progress by checking the logs in the **Workers** tab of your endpoint page. If you see: `{"error":"no workers available"}%` after running the request, this means your workers did not initialize in time to process it. If you try running the request again, this will usually resolve the issue. For production applications, implement a health check with retries before sending requests. See [Handling cold start errors](/serverless/load-balancing/overview#handling-cold-start-errors) for a complete code example. Congratulations! You've now successfully deployed and tested a load balancing endpoint. If you want to use a real model, you can follow the [vLLM worker](/serverless/load-balancing/vllm-worker) tutorial. ## (Optional) Advanced endpoint definitions For a more complex API, you can define multiple endpoints and organize them logically. 
Here's an example of how to structure a more complex API: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} from fastapi import FastAPI, HTTPException, Depends, Query from pydantic import BaseModel import os app = FastAPI() # --- Authentication middleware --- def verify_api_key(api_key: str = Query(None, alias="api_key")): if api_key != os.getenv("API_KEY", "test_key"): raise HTTPException(401, "Invalid API key") return api_key # --- Models --- class TextRequest(BaseModel): text: str max_length: int = 100 class ImageRequest(BaseModel): prompt: str width: int = 512 height: int = 512 # --- Text endpoints --- @app.post("/v1/text/summarize") async def summarize(request: TextRequest, api_key: str = Depends(verify_api_key)): # Implement text summarization return {"summary": f"Summary of: {request.text[:30]}..."} @app.post("/v1/text/translate") async def translate(request: TextRequest, target_lang: str, api_key: str = Depends(verify_api_key)): # Implement translation return {"translation": f"Translation to {target_lang}: {request.text[:30]}..."} # --- Image endpoints --- @app.post("/v1/image/generate") async def generate_image(request: ImageRequest, api_key: str = Depends(verify_api_key)): # Implement image generation return {"image_url": f"https://example.com/images/{hash(request.prompt)}.jpg"} # --- Health check --- @app.get("/ping") async def health_check(): return {"status": "healthy"} ``` ## Troubleshooting Here are some common issues and methods for troubleshooting: * **No workers available**: If your request returns `{"error":"no workers available"}%`, this means your workers did not initialize in time to process the request. Running the request again will usually fix this issue. * **Worker unhealthy**: Check your health endpoint implementation and ensure it's returning proper status codes. * **API not accessible**: If your request returns `{"error":"not allowed for QB API"}`, verify that your endpoint type is set to "Load Balancer". * **Port issues**: Make sure the environment variable for `PORT` matches what your application is using, and that the `PORT_HEALTH` variable is set to a different port. * **Model errors**: Check your model's requirements and whether it's compatible with your GPU. ## Next steps Now that you've learned how to build a basic load balancing worker, you can try [implementing a real model with vLLM](/serverless/load-balancing/vllm-worker). --- # Source: https://docs.runpod.io/tutorials/pods/build-docker-images.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Build Docker Images on Runpod with Bazel Runpod's GPU Pods use custom Docker images to run your code. This means you can't directly spin up your own Docker instance or build Docker containers on a GPU Pod. Tools like Docker Compose are also unavailable. This limitation can be frustrating when you need to create custom Docker images for your Runpod templates. Fortunately, many use cases can be addressed by creating a custom template with the desired Docker image. In this tutorial, you'll learn how to use the [Bazel](https://bazel.build) build tool to build and push Docker images from inside a Runpod container. By the end of this tutorial, you'll be able to build custom Docker images on Runpod and push them to Docker Hub for use in your own templates. 
## Prerequisites Before you begin this guide you'll need the following: * A Docker Hub account and access token for authenticating the docker login command * Enough disk space on your Pod's volume to build your image ## Create a Pod 1. Navigate to [Pods](https://www.console.runpod.io/pods) and select **+ Deploy**. 2. Choose between **GPU** and **CPU**. 3. Customize your instance by setting up the following: 1. (optional) Specify a Network volume. 2. Select an instance type. For example, **A40**. 3. (optional) Provide a template. For example, **Runpod Pytorch**. 4. (GPU only) Specify your compute count. 4. Review your configuration and select **Deploy On-Demand**. For more information, see [Manage Pods](/pods/manage-pods#start-a-pod). Wait for the Pod to spin up, then connect to your Pod through the Web Terminal: 1. Select **Connect**. 2. Choose **Start Web Terminal** and then **Connect to Web Terminal**. 3. Enter your username and password. Now you can clone the example GitHub repository. ## Clone the example GitHub repository Clone the example code repository that demonstrates building Docker images with Bazel: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} git clone https://github.com/therealadityashankar/build-docker-in-runpod.git && cd build-docker-in-runpod ``` ## Install dependencies Install the required dependencies inside the Runpod container: Update packages and install sudo: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} apt update && apt install -y sudo ``` Install Docker using the convenience script: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh get-docker.sh ``` Log in to Docker using an access token: 1. Go to [https://hub.docker.com/settings/security](https://hub.docker.com/settings/security) and click "New Access Token". 2. Enter a description like "Runpod Token" and select "Read/Write" permissions. 3. Click "Generate" and copy the token that appears. 4. In the terminal, run the following, replacing `YOUR_USERNAME` with your Docker Hub username: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker login -u YOUR_USERNAME ``` When prompted, paste in the access token you copied instead of your password. Install Bazel via the Bazelisk version manager: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64 chmod +x bazelisk-linux-amd64 sudo cp ./bazelisk-linux-amd64 /usr/local/bin/bazel ``` ## Configure the Bazel Build First, install nano if it’s not already installed and open the `BUILD.bazel` file for editing: ```bash BUILD.bazel theme={"theme":{"light":"github-light","dark":"github-dark"}} sudo apt install nano nano BUILD.bazel ``` Replace the `{YOUR_USERNAME}` placeholder with your Docker Hub username in the `BUILD.bazel` file: ```bash BUILD.bazel theme={"theme":{"light":"github-light","dark":"github-dark"}} oci_push( name = "push_custom_image", image = ":custom_image", repository = "index.docker.io/{YOUR_USERNAME}/custom_image", remote_tags = ["latest"] ) ``` ## Build and Push the Docker Image Run the bazel command to build the Docker image and push it to your Docker Hub account: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} bazel run //:push_custom_image ``` Once the command completes, go to [https://hub.docker.com/](https://hub.docker.com/) and log in. You should see a new repository called `custom_image` containing the Docker image you just built. 
You can now reference this custom image in your own Runpod templates. ## Conclusion In this tutorial, you learned how to use Bazel to build and push Docker images from inside Runpod containers. By following the steps outlined, you can now create and utilize custom Docker images for your Runpod templates. The techniques demonstrated can be further expanded to build more complex images, providing a flexible solution for your containerization needs on Runpod. --- # Source: https://docs.runpod.io/hosting/burn-testing.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. ## Burn testing Machines should be thoroughly tested before they are listed on the Runpod platform. Here is a simple guide to running a burn test for a few days. Stop the Runpod agent by running: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} sudo systemctl stop runpod ``` Then you can kick off a gpu-burn run by typing: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker run --gpus all --rm jorghi21/gpu-burn-test 172800 ``` You should also verify that your memory, CPU, and disk are up to the task. You can use the [stress-ng tool](https://wiki.ubuntu.com/Kernel/Reference/stress-ngstress) to accomplish this. When everything is verified okay, start the Runpod agent again by running: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} sudo systemctl start runpod ``` Then, on your [machine dashboard](https://www.console.runpod.io/host/machines), self-rent your machine to ensure it's working well with most popular templates. --- # Source: https://docs.runpod.io/pods/choose-a-pod.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Choose a Pod > Select the right Pod by evaluating your resource requirements. Selecting the appropriate Pod configuration is a crucial step in maximizing performance and efficiency for your specific workloads. This guide will help you understand the key factors to consider when choosing a Pod that meets your requirements. ## Understanding your workload needs Before selecting a Pod, take time to analyze your specific project requirements. Different applications have varying demands for computing resources: * Machine learning models require sufficient VRAM and powerful GPUs. * Data processing tasks benefit from higher CPU core counts and RAM. * Rendering workloads need both strong GPU capabilities and adequate storage. For machine learning models, check the model's documentation on platforms like Hugging Face or review the `config.json` file to understand its resource requirements. ## Resource assessment tools There are several online tools that can help you estimate your resource requirements: * [Hugging Face's Model Memory Usage Calculator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage) provides memory estimates for transformer models. * [Vokturz's Can it run LLM calculator](https://huggingface.co/spaces/Vokturz/can-it-run-llm) helps determine if your hardware can run specific language models. * [Alexander Smirnov's VRAM Estimator](https://vram.asmirnov.xyz) offers GPU memory requirement approximations. ## Key factors to consider ### GPU selection The GPU is the cornerstone of computational performance for many workloads. 
When selecting your GPU, consider the architecture that best suits your software requirements. NVIDIA GPUs with CUDA support are essential for most machine learning frameworks, while some applications might perform better on specific GPU generations. Evaluate both the raw computing power (CUDA cores, tensor cores) and the memory bandwidth to ensure optimal performance for your specific tasks. For machine learning inference, a mid-range GPU might be sufficient, while training large models requires more powerful options. Check framework-specific recommendations, as PyTorch, TensorFlow, and other frameworks may perform differently across GPU types. For a full list of available GPUs, see [GPU types](/references/gpu-types). ### VRAM requirements VRAM (video RAM) is the dedicated memory on your GPU that stores data being processed. Insufficient VRAM can severely limit your ability to work with large models or datasets. For machine learning models, VRAM requirements increase with model size, batch size, and input dimensions. When working with LLMs, a general guideline is to **allocate approximately 2GB of VRAM per billion parameters**. For example, running a 13-billion parameter model efficiently would require around 26GB of VRAM. Following this guideline helps ensure smooth model operation and prevents out-of-memory errors. ### Storage configuration Your storage configuration affects both data access speeds and your ability to maintain persistent workspaces. Runpod offers both temporary and persistent [storage options](/pods/storage/types). When determining your storage needs, account for raw data size, intermediate files generated during processing, and space for output results. For data-intensive workloads, prioritize both capacity and speed to avoid bottlenecks. ## Balancing performance and cost When selecting a Pod, consider these strategies for balancing performance and cost: 1. Use right-sized resources for your workload. For development and testing, a smaller Pod configuration may be sufficient, while production workloads might require more powerful options. 2. Take advantage of spot instances for non-critical or fault-tolerant workloads to reduce costs. For consistent availability needs, on-demand or reserved Pods provide greater reliability. 3. For extended usage, explore Runpod's [savings plans](/pods/pricing#savings-plans) to optimize your spending while ensuring access to the resources you need. ## Secure Cloud vs Community Cloud Secure Cloud operates in T3/T4 data centers with high reliability, redundancy, security, and fast response times to minimize downtime. It's designed for sensitive and enterprise workloads. Community Cloud connects individual compute providers to users through a peer-to-peer GPU computing platform. Hosts are invite-only and vetted to maintain quality standards. Community Cloud offers competitive pricing with good server quality, though with less redundancy for power and networking compared to Secure Cloud. ## Next steps Once you've determined your resource requirements, you can learn how to: * [Deploy a Pod](/get-started). * [Manage your Pods](/pods/manage-pods). * [Connect to a Pod](/pods/connect-to-a-pod). Remember that you can always deploy a new Pod if your requirements evolve. Start with a configuration that meets your immediate needs, then scale up or down based on actual usage patterns and performance metrics. 
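As a rough sanity check on the VRAM guideline above (about 2 GB of VRAM per billion parameters for LLMs), the sketch below turns that rule of thumb into a quick estimate. The function name and the example model sizes are illustrative assumptions, not an official sizing tool:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
def estimate_llm_vram_gb(params_in_billions, gb_per_billion=2):
    """Estimate VRAM needed using the ~2 GB per billion parameters rule of thumb."""
    return params_in_billions * gb_per_billion

# Example: a 13B-parameter model needs roughly 26 GB of VRAM,
# so a 24 GB GPU is too small while an 80 GB GPU fits it comfortably.
for size in (7, 13, 70):
    print(f"{size}B parameters -> ~{estimate_llm_vram_gb(size)} GB VRAM")
```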
--- # Source: https://docs.runpod.io/serverless/development/cleanup.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Clean up temporary files > Manage disk space by automatically removing temporary files. The Runpod SDK's `clean()` function helps maintain the health of your Serverless worker by removing temporary files and folders after processing completes. This is particularly important for workers that download large assets or generate temporary artifacts, as accumulated data can lead to `DiskQuotaExceeded` errors over time. ## Import the `clean()` function To use the `clean()` function, import it from the `utils.rp_cleanup` module: ```python from runpod.serverless.utils.rp_cleanup import clean ``` ## Default behavior When called without arguments, `clean()` targets a specific set of default directories for removal: * `input_objects/` * `output_objects/` * `job_files/` * `output.zip` These are standard locations used by various SDK operations, and cleaning them ensures a fresh state for the next request. ## Custom cleanup If your handler generates files in non-standard directories, you can override the default behavior by passing a list of folder names to the `folder_list` parameter. ```python clean(folder_list=["temp_images", "cache", "downloads"]) ``` ## Use `clean()` in your handler You should integrate cleanup logic into your handler's lifecycle, typically within a `finally` block or right before returning the result. ```python import runpod from runpod.serverless.utils.rp_cleanup import clean import requests import os def download_image(url, save_path): response = requests.get(url) if response.status_code == 200: with open(save_path, "wb") as file: file.write(response.content) return True return False def handler(event): try: image_url = event["input"]["image_url"] # Create a temporary directory os.makedirs("temp_images", exist_ok=True) image_path = "temp_images/downloaded_image.jpg" # Download the image if not download_image(image_url, image_path): raise Exception("Failed to download image") # Process the image (your code here) result = f"Processed image from: {image_url}" # Cleanup specific folders after processing clean(folder_list=["temp_images"]) return {"output": result} except Exception as e: # Attempt cleanup even if an error occurs clean(folder_list=["temp_images"]) return {"error": str(e)} runpod.serverless.start({"handler": handler}) ``` ## Best practices To ensure reliability, always call `clean()` at the end of your handler execution. We recommend wrapping your cleanup calls in a `try...except` or `finally` block so that disk space is recovered even if your main processing logic fails. Be cautious when adding custom folders to the cleanup list to avoid accidentally deleting persistent data, and consider logging cleanup actions during development to verify that the correct paths are being targeted. --- # Source: https://docs.runpod.io/pods/storage/cloud-sync.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Sync Pod data with cloud storage providers > Learn how to sync your Pod data with popular cloud storage providers. Runpod's Cloud Sync feature makes it easy to upload your Pod data to external cloud storage providers, or download data from cloud storage providers to your Pod. 
This guide walks you through setting up and using Cloud Sync with supported providers. Cloud Sync supports syncing data with these cloud storage providers: * Amazon S3 * Google Cloud Storage * Microsoft Azure Blob Storage * Dropbox * Backblaze B2 Cloud Storage ## Security best practices When using Cloud Sync, follow these security guidelines to protect your data and credentials: * Keep all access keys, tokens, and credentials confidential. * Use dedicated service accounts or application-specific credentials when possible. * Grant only the minimum permissions required for data transfer. * Regularly rotate your access credentials. * Monitor your cloud storage logs for unauthorized access. ## Amazon S3 Amazon S3 provides scalable object storage that integrates seamlessly with Runpod through Cloud Sync. Follow the steps below to sync your data with Amazon S3: Navigate to the [Amazon S3 bucket creation form](https://s3.console.aws.amazon.com/s3/bucket/create?region=us-east-1) in your AWS console. Provide a descriptive name for your bucket and select your preferred AWS Region (this affects data storage location and access speeds). If you need your bucket to be publicly accessible, uncheck the **Block public access** option at the bottom of the form. For most use cases, keeping this checked provides better security. Go to **Security credentials** in your AWS account settings. Create a new Access Key on the Security credentials page. Your Secret Access Key will be displayed only once during creation, so make sure to save it securely. In the Runpod console, navigate to the [Pods page](https://runpod.io/console/pods) and select the Pod containing your data. Click **Cloud Sync**, then select **AWS S3** from the available providers. Enter your **AWS Access Key ID** and **Secret Access Key** in the provided fields. Specify the **AWS Region** where your bucket is located and provide the complete bucket path where you want to store your data. Click **Copy to/from AWS S3** to initiate the transfer. The transfer progress will be displayed in the Runpod interface. Large datasets may take time depending on your Pod's network connection. ## Google Cloud Platform Storage Cloud Sync is compatible with Google Cloud Storage, but **not Google Drive**. However, you can transfer files between your Pods and Drive [using the Runpod CLI](/pods/storage/transfer-files#transfer-files-between-google-drive-and-runpod). Google Cloud Storage offers high-performance object storage with global availability and strong consistency. Follow the steps below to sync your data with Google Cloud Storage: Access the Google Cloud Storage dashboard and click **Buckets → Create** to start the bucket creation process. Choose a globally unique name for your bucket. Leave most configuration options at their default settings unless you have specific requirements. To allow public access to your bucket contents, uncheck **Enforce Public Access Prevention On This Bucket**. Keep this checked for better security unless public access is required. Create a service account specifically for Runpod access. This provides better security than using your primary account credentials. Follow [Google's guide on creating service account keys](https://cloud.google.com/iam/docs/keys-create-delete) to generate a JSON key file. This key contains all necessary authentication information. In the Runpod console, navigate to the [Pods page](https://runpod.io/console/pods) and select the Pod containing your data. 
Click **Cloud Sync**, then select **Google Cloud Storage** from the available providers. Paste the entire contents of your Service Account JSON key into the provided field. Specify the source/destination path within your bucket and select which folders from your Pod to transfer. Click **Copy to/from Google Cloud Storage** to initiate the transfer. The transfer progress will be displayed in the Runpod interface. Large datasets may take time depending on your Pod's network connection. ## Microsoft Azure Blob Storage Azure Blob Storage provides massively scalable object storage for unstructured data, with seamless integration into the Azure ecosystem. Follow the steps below to sync your data with Microsoft Azure Blob Storage: Start by creating a Resource Group to organize your Azure resources. Navigate to [Resource Groups](https://portal.azure.com/#view/HubsExtension/BrowseResourceGroups) in the Azure portal and click **Create**. Next, set up a Storage Account under [Storage Accounts](https://portal.azure.com/#view/HubsExtension/BrowseResource/resourceType/Microsoft.Storage%2FStorageAccounts). Click **Create** and assign it to your newly created Resource Group. Navigate to **Security + Networking → Access Keys** in your storage account to retrieve the authentication key. Create a Blob Container by going to **Storage Browser → Blob Containers** and clicking **Add Container**. Consider creating folders within the container for better organization if you plan to sync data to/from multiple Pods. In the Runpod console, navigate to the [Pods page](https://runpod.io/console/pods) and select the Pod containing your data. Click **Cloud Sync**, then select **Azure Blob Storage** from the available providers. Enter your **Azure Account Name** and **Account Key** in the provided fields. Specify the source/destination path in your blob storage where you want to store your data. Click **Copy to/from Azure Blob Storage** to initiate the transfer. The transfer progress will be displayed in the Runpod interface. Large datasets may take time depending on your Pod's network connection. ## Backblaze B2 Cloud Storage Backblaze B2 offers affordable cloud storage with S3-compatible APIs and straightforward pricing. Follow the steps below to sync your data with Backblaze B2 Cloud Storage: Navigate to [B2 Cloud Storage Buckets](https://secure.backblaze.com/b2_buckets.htm) and click **Create a Bucket**. Set the bucket visibility to **Public** to allow Runpod access. You can restrict access later using application keys if needed. Visit [App Keys](https://secure.backblaze.com/app_keys.htm) to create a new application key. This key provides authenticated access to your bucket. Save both the KeyID and applicationKey securely—the applicationKey cannot be retrieved after creation. In the Runpod console, navigate to the [Pods page](https://runpod.io/console/pods) and select the Pod containing your data. Click **Cloud Sync**, then select **Backblaze B2** from the available providers. Enter your **Backblaze B2 Account ID**, **Application Key**, and **bucket path** as shown in the Backblaze interface. Click **Copy to/from Backblaze B2** to initiate the transfer. The transfer progress will be displayed in the Runpod interface. ## Dropbox Dropbox integration allows you to sync your Pod data with your Dropbox account using OAuth authentication. Follow the steps below to sync your data with Dropbox: Go to the [Dropbox App Console](https://www.dropbox.com/developers/apps/create) to create a new app. 
Select **Scoped Access** for API options and **Full Dropbox** for access type. Choose a descriptive name for your app. In the Dropbox App Console, navigate to the **Permissions** tab. Enable all required checkboxes for read and write access to ensure Cloud Sync can transfer files properly. Return to the **Settings** tab of your app. In the OAuth2 section, click **Generate** under Generated Access Token. Save this token immediately—it won't be shown again after you leave the page. This token authenticates Runpod's access to your Dropbox. In the Runpod console, navigate to the [Pods page](https://runpod.io/console/pods) and select the Pod containing your data. Click **Cloud Sync**, then select **Dropbox** from the available providers. Paste your **Dropbox Access Token** and specify the remote path where you want to store the data. Creating a dedicated folder in Dropbox beforehand helps with organization. Click **Copy to/from Dropbox** to initiate the transfer. The transfer progress will be displayed in the Runpod interface. ## Alternative transfer methods While Cloud Sync provides the easiest way to sync data with cloud providers, you can also transfer files between your Pod and other destinations using: * **runpodctl**: A built-in CLI tool for peer-to-peer transfers using one-time codes. * **SSH-based tools**: Use SCP or rsync for direct transfers to your local machine. * **Network volumes**: For persistent storage across multiple Pods. For detailed instructions on these methods, see our [file transfer guide](/pods/storage/transfer-files). ## Troubleshooting If you encounter issues during syncing: * **Transfer fails immediately**: Verify your credentials are correct and have the necessary permissions. * **Slow transfer speeds**: Large datasets take time to transfer. Consider compressing data before syncing or using incremental transfers. * **Permission denied errors**: Ensure your bucket or container has the correct access policies. Some providers require specific permission configurations for external access. * **Connection timeouts**: Check that your Pod has stable network connectivity. You may need to retry the transfer. For additional support, consult your cloud provider's documentation or contact Runpod support. --- # Source: https://docs.runpod.io/tutorials/serverless/comfyui.md # Source: https://docs.runpod.io/tutorials/pods/comfyui.md # Source: https://docs.runpod.io/tutorials/serverless/comfyui.md # Source: https://docs.runpod.io/tutorials/pods/comfyui.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Generate images with ComfyUI > Deploy ComfyUI on Runpod to create AI-generated images. This tutorial walks you through how to configure ComfyUI on a [GPU Pod](/pods/overview) and use it to generate images with text-to-image models. [ComfyUI](https://www.comfy.org/) is a node-based graphical interface for creating AI image generation workflows. Instead of writing code, you connect different components visually to build custom image generation pipelines. This approach provides flexibility to experiment with various models and techniques while maintaining an intuitive interface. This tutorial uses the [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) model and a matching template, but you can adapt these instructions for any model/template combination you want to use. 
When you're just getting started with ComfyUI, it's important to use a workflow that was created for the specific model you intend to use. You usually can't just switch the "Load Checkpoint" node from one model to another and expect optimal performance or results. For example, if you load a workflow created for the Flux Dev model and try to use it with SDXL-Turbo, the workflow might run, but with poor speed or image quality. ## What you'll learn In this tutorial, you'll learn how to: * Deploy a Pod with ComfyUI pre-installed. * Connect to the ComfyUI web interface. * Browse pre-configured workflow templates. * Install new models to your Pod. * Generate an image. ## Requirements Before you begin, you'll need: * A [Runpod account](/get-started/manage-accounts). * At least \$10 in Runpod credits. * A basic understanding of AI image generation. ## Step 1: Deploy a ComfyUI Pod First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre-installs ComfyUI and the ComfyUI Manager plugin: Runpod provides official ComfyUI templates built from the [comfyui-base](https://github.com/runpod-workers/comfyui-base) repository. Choose the one that matches your GPU: * **Standard GPUs (RTX 4090, L40, A100, etc.):** Use the [ComfyUI](https://console.runpod.io/hub/template/comfyui?id=cw3nka7d08) template. * **Blackwell GPUs (RTX 5090, B200):** Use the [ComfyUI Blackwell Edition](https://console.runpod.io/hub/template/comfyui-blackwell-edition-5090-b200?id=2lv7ev3wfp) template. Blackwell GPUs use a different architecture, so this dedicated template ensures compatibility. Click **Deploy** on the template that matches your target GPU. Configure your Pod with these settings: * **GPU selection:** Choose an L40 or RTX 4090 for optimal performance with SDXL-Turbo. Lower VRAM GPUs may work for smaller models. If you selected the Blackwell Edition template, choose an RTX 5090 or B200. * **Storage:** The default container and volume disk sizes set by the template should be sufficient for SDXL-Turbo. You can also add a [network volume](/storage/network-volumes) to your Pod if you want persistent storage. * **Deployment type:** Select **On-Demand** for flexibility. Click **Deploy On-Demand** to create your Pod. The Pod can take up to 30 minutes to initialize the container and start the ComfyUI HTTP service. ## Step 2: Open the ComfyUI interface Once your Pod has finished initializing, you can open the ComfyUI interface: Go to the [Pods section](https://www.runpod.io/console/pods) in the Runpod console, then find your deployed ComfyUI Pod and expand it. The Pod may take up to 30 minutes to initialize when first deployed. Future starts will generally take much less time. Click **Connect** on your Pod, then select the last HTTP service button in the list, labeled **Connect to HTTP Service \[Port 8188]**. This will open the ComfyUI interface in a new browser tab. The URL follows the format: `https://[POD_ID]-8188.proxy.runpod.net`. If you see the label "Not Ready" on the HTTP service button, or you get a "Bad Gateway" error when first connecting, wait 2–3 minutes for the service to fully start, then refresh the page. ## Step 3: Load a workflow template ComfyUI workflows consist of a series of nodes that are connected to each other to create an AI generation pipeline. Rather than creating our own workflow from scratch, we'll load a pre-configured workflow that was created for the specific model we intend to use: When you first open the ComfyUI interface, the template browser should open automatically.
If it doesn't, click the **Workflow** button in the top right corner of the ComfyUI interface, then select **Browse Templates**. In the sidebar to the left of the browser, select the **Image** tab. Find the **SDXL-Turbo** template and click on it to load a basic image generation workflow. ## Step 4: Install the SDXL-Turbo model As soon as you load the workflow, you'll see a popup labeled **Missing Models**. This happens because the Pod template we deployed doesn't come pre-installed with any models, so we'll need to install them now. Rather than clicking the download button (which downloads the missing model to your local machine), use the ComfyUI Manager plugin to install the missing model directly onto the Pod: Close the **Missing Models** popup by clicking the **X** in the top right corner. Then click **Manager** in the top right of the ComfyUI interface, and select **Model Manager** from the list of options. In the search bar, enter `SDXL-Turbo 1.0 (fp16)`, then click **Install**. Before you can use the model, you'll need to refresh the ComfyUI interface. You can do this by either refreshing the browser tab where it's running, or by pressing R. Find the node labeled **Load Checkpoint** in the workflow. It should be the first node on the left side of the canvas. Click on the dropdown menu labeled `ckpt_name` and select the SDXL-Turbo model checkpoint you just installed (named `SDXL-TURBO/sd_xl_turbo_1.0_fp16.safetensors`). ## Step 5: Generate an image Your workflow is now ready! Follow these steps to generate an image: Locate the text input node labeled **CLIP Text Encode (Prompt)** in the workflow. Click on the text field containing the default prompt and replace it with your desired image description. Example prompts: * "A serene mountain landscape at sunset with a crystal clear lake." * "A futuristic cityscape with neon lights and flying vehicles." * "A detailed portrait of a robot reading a book in a library." Click **Run** at the bottom of the workflow (or press Ctrl+Enter) to begin the image generation process. Watch as the workflow progresses through each node: * Text encoding. * Model loading. * Image generation steps. * Final output processing. The first generation may take a few minutes to complete as the model checkpoint must be loaded. Subsequent generations will be much faster. The generated image appears in the output node when complete. Right-click the image to save it to your local machine, view it at full resolution, or copy it to your clipboard. Congratulations! You've just generated your first image with ComfyUI on Runpod. ## Troubleshooting Here are some common issues you may encounter and possible solutions: * **Connection errors**: Wait for the Pod to fully initialize (up to 30 minutes for initial deployment). * **HTTP service not ready**: Wait at least 2 to 3 minutes after Pod deployment for the HTTP service to fully start. You can also check the Pod logs in the Runpod console to look for deployment errors. * **Out of memory errors**: Reduce image resolution or batch size in your workflow. * **Slow generation**: Make sure you're using an appropriate GPU for your selected model. See [Choose a Pod](/pods/choose-a-pod) for guidance. ## Next steps Once you're comfortable with basic image generation, explore the [ComfyUI documentation](https://docs.comfy.org/) to learn how to build more advanced workflows.
Here are some ideas for where to start: ### Experiment with different workflow templates Use the template browser from [Step 3](#step-3%3A-load-a-workflow-template) to test out new models and find a workflow that suits your needs. You can also browse the web for a preconfigured workflow and import it by clicking **Workflow** in the top right corner of the ComfyUI interface, selecting **Open**, then selecting the workflow file you want to import. Don't forget to install any missing models using the model manager. If you need a model that isn't available in the model manager, you can download it from the web to your local machine, then use the [Runpod CLI](/runpodctl/overview) to transfer the model files directly into your Pod's `/workspace/madapps/ComfyUI/models` directory. ### Create custom workflows Build your own workflows by: 1. Right-clicking the canvas to add new nodes. 2. Connecting node outputs to inputs by dragging between connection points. 3. Saving your custom workflow with Ctrl+S or by clicking **Workflow** and selecting **Save**. ### Manage your Pod While working with ComfyUI, you can monitor your usage by checking GPU/disk utilization in the [Pods page](https://console.runpod.io/pods) of the Runpod console. Stop your Pod when you're finished to avoid unnecessary charges. It's also a good practice to download any custom workflows to your local machine before stopping the Pod. For persistent storage of models and outputs across sessions, consider using a [network volume](/storage/network-volumes). --- # Source: https://docs.runpod.io/get-started/concepts.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Concepts > Key concepts and terminology for understanding Runpod's platform and products. ## [Runpod console](https://console.runpod.io) The web interface for managing your compute resources, account, teams, and billing. ## [Serverless](/serverless/overview) A pay-as-you-go compute solution designed for dynamic autoscaling in production AI/ML apps. ## [Pod](/pods/overview) A dedicated GPU or CPU instance for containerized AI/ML workloads, such as training models, running inference, or other compute-intensive tasks. ## [Public Endpoint](/hub/public-endpoints) An AI model API hosted by Runpod that you can access directly without deploying your own infrastructure. ## [Instant Cluster](/instant-clusters) A managed compute cluster with high-speed networking for multi-node distributed workloads like training large AI models. ## [Network volume](/storage/network-volumes) Persistent storage that exists independently of your other compute resources and can be attached to multiple Pods or Serverless endpoints to share data between machines. ## [S3-compatible API](/storage/s3-api) A storage interface compatible with Amazon S3 for uploading, downloading, and managing files in your network volumes. ## [Runpod Hub](/hub/overview) A repository for discovering, deploying, and sharing preconfigured AI projects optimized for Runpod. ## Container A Docker-based environment that packages your code, dependencies, and runtime into a portable unit that runs consistently across machines. ## Data center Physical facilities where Runpod's GPU and CPU hardware is located. Your choice of data center can affect latency, available GPU types, and pricing. ## Machine The physical server hardware within a data center that hosts your workloads. 
Each machine contains CPUs, GPUs, memory, and storage. --- # Source: https://docs.runpod.io/serverless/workers/concurrent-handler.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Build a concurrent handler > Build a concurrent handler function to process multiple requests simultaneously on a single worker. ## What you'll learn In this guide you will learn how to: * Create an asynchronous handler function. * Create a concurrency modifier to dynamically adjust concurrency levels. * Optimize worker resources based on request patterns. * Test your concurrent handler locally. ## Requirements * You've [created a Runpod account](/get-started/manage-accounts). * You've installed the Runpod SDK (`pip install runpod`). * You know how to build a [basic handler function](/serverless/workers/handler-functions). ## Step 1: Set up your environment First, set up a virtual environment and install the necessary packages: ```sh # Create a Python virtual environment python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install runpod # asyncio is part of the Python standard library, so it doesn't need to be installed separately ``` ## Step 2: Create a concurrent handler file Create a file named `concurrent_handler.py` and add the following code: ```python import runpod import asyncio import random # Global variable to simulate a varying request rate request_rate = 0 async def process_request(job): # This function processes incoming requests concurrently. # # Args: # job (dict): Contains the input data and request metadata # # Returns: # str: The processed result # Extract input data job_input = job["input"] delay = job_input.get("delay", 1) # Simulate an asynchronous task (like a database query or API call) await asyncio.sleep(delay) return f"Processed: {job_input}" # Placeholder code for a dynamic concurrency adjustment function def adjust_concurrency(current_concurrency): return 50 def update_request_rate(): """Simulates changes in the request rate to mimic real-world scenarios.""" global request_rate request_rate = random.randint(20, 100) # Start the Serverless function when the script is run if __name__ == "__main__": runpod.serverless.start({ "handler": process_request, "concurrency_modifier": adjust_concurrency }) ``` The `process_request` function uses the `async` keyword, enabling it to use non-blocking I/O operations with `await`. This allows the function to pause during I/O operations (simulated with `asyncio.sleep()`) and handle other requests while waiting. The `update_request_rate` function simulates monitoring request patterns for adaptive scaling. This example uses a simple random number generator to simulate changing request patterns. In a production environment, you would: * Track actual request counts and response times. * Monitor system resource usage, such as CPU and memory. * Adjust concurrency based on real performance metrics. ## Step 3: Implement dynamic concurrency adjustment Let's enhance our handler with dynamic concurrency adjustment. This will allow your worker to handle more requests during high traffic periods and conserve resources during low traffic periods. Replace the placeholder `adjust_concurrency` function with this improved version: ```python def adjust_concurrency(current_concurrency): # Dynamically adjust the worker's concurrency level based on request load.
# # Args: # current_concurrency (int): The current concurrency level # # Returns: # int: The new concurrency level global request_rate # In production, this would use real metrics update_request_rate() max_concurrency = 10 # Maximum allowable concurrency min_concurrency = 1 # Minimum concurrency to maintain high_request_rate_threshold = 50 # Threshold for high request volume # Increase concurrency if under max limit and request rate is high if (request_rate > high_request_rate_threshold and current_concurrency < max_concurrency): return current_concurrency + 1 # Decrease concurrency if above min limit and request rate is low elif (request_rate <= high_request_rate_threshold and current_concurrency > min_concurrency): return current_concurrency - 1 return current_concurrency ``` Let's break down how this function works: 1. **Control parameters**: * `max_concurrency = 10`: Sets an upper limit on concurrency to prevent resource exhaustion. * `min_concurrency = 1`: Ensures at least one request can be processed at a time. * `high_request_rate_threshold = 50`: Defines when to consider traffic "high". You can adjust these parameters based on your specific workload. 2. **Scaling up logic**: ```python if (request_rate > high_request_rate_threshold and current_concurrency < max_concurrency): return current_concurrency + 1 ``` This increases concurrency by 1 when: * The request rate exceeds our threshold (50 requests). * We haven't reached our maximum concurrency limit. 3. **Scaling down logic**: ```python elif (request_rate <= high_request_rate_threshold and current_concurrency > min_concurrency): return current_concurrency - 1 ``` This decreases concurrency by 1 when: * The request rate is at or below our threshold. * We're above our minimum concurrency level. 4. **Default behavior**: ```python return current_concurrency ``` If neither condition is met, maintain the current concurrency level. With these enhancements, your concurrent handler will now dynamically adjust its concurrency level based on the observed request rate, optimizing resource usage and responsiveness. ## Step 4: Create a test input file Now we're ready to test our handler. Create a file named `test_input.json` to test your handler locally: ```json { "input": { "message": "Test concurrent processing", "delay": 0.5 } } ``` ## Step 5: Test your handler locally Run your handler to verify that it works correctly: ```sh python concurrent_handler.py ``` You should see output similar to this: ```sh --- Starting Serverless Worker | Version 1.7.9 --- INFO | Using test_input.json as job input. DEBUG | Retrieved local job: {'input': {'message': 'Test concurrent processing', 'delay': 0.5}, 'id': 'local_test'} INFO | local_test | Started. DEBUG | local_test | Handler output: Processed: {'message': 'Test concurrent processing', 'delay': 0.5} DEBUG | local_test | run_job return: {'output': "Processed: {'message': 'Test concurrent processing', 'delay': 0.5}"} INFO | Job local_test completed successfully. INFO | Job result: {'output': "Processed: {'message': 'Test concurrent processing', 'delay': 0.5}"} INFO | Local testing complete, exiting. ``` ## (Optional) Step 6: Implement real metrics collection In a production environment, you should replace the `update_request_rate` function with real metrics collection. Here is an example of how you could build this functionality: ```python def update_request_rate(request_history): # Collects real metrics about request patterns.
# # Args: # request_history (list): A list of request timestamps # # Updates the global request_rate with the latest value global request_rate # Option 1: Track request count over a time window (requires `import time`) current_time = time.time() # Count requests in the last minute recent_requests = [r for r in request_history if r > current_time - 60] request_rate = len(recent_requests) # Option 2: Use an exponential moving average # request_rate = 0.9 * request_rate + 0.1 * new_requests # Option 3: Read from a shared metrics service like Redis # request_rate = redis_client.get('recent_request_rate') ``` ## Next steps Now that you've created a concurrent handler, you're ready to: * [Package and deploy your handler as a Serverless worker.](/serverless/workers/deploy) * [Add error handling for more robust processing.](/serverless/workers/handler-functions#error-handling) * [Implement streaming responses with generator functions.](/serverless/workers/handler-functions#generator-handlers) * [Configure your endpoint for optimal performance.](/serverless/endpoints/endpoint-configurations) --- # Source: https://docs.runpod.io/serverless/vllm/configuration.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Configure vLLM to work with your model > Learn how to set up vLLM endpoints to work with your chosen model. Most LLMs need specific configuration to run properly on vLLM. You need to understand what settings your model expects for loading, tokenization, and generation. This guide covers how to configure your vLLM endpoints for different model families, how environment variables map to vLLM command-line flags, recommended configurations for popular models, and how to select the right GPU for your model. ## Why is vLLM sometimes hard to configure? vLLM supports hundreds of models, but default settings only work out of the box for a subset of them. Without the right settings, your vLLM workers may fail to load, produce incorrect outputs, or miss key features. Different model architectures have different requirements for tokenization, attention mechanisms, and features like tool calling or reasoning. For example, Mistral models use a specialized tokenizer mode and config format, while reasoning models like DeepSeek-R1 require you to specify a reasoning parser. When deploying a model, check its Hugging Face README and the [vLLM documentation](https://docs.vllm.ai/en/latest/usage/) for required or recommended settings. ## Mapping environment variables to vLLM CLI flags When running vLLM with `vllm serve`, the engine is configured using [command-line flags](https://docs.vllm.ai/en/latest/configuration/engine_args/). On Runpod, you set these options with [environment variables](/serverless/vllm/environment-variables) instead. Each vLLM command-line argument has a corresponding environment variable. Convert the flag name to uppercase with underscores: `--tokenizer_mode` becomes `TOKENIZER_MODE`, `--enable-auto-tool-choice` becomes `ENABLE_AUTO_TOOL_CHOICE`, and so on.
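If you want to script this conversion when preparing endpoint configurations, here's a minimal sketch of the naming rule described above. The `flag_to_env_var` helper is illustrative only; it isn't part of the Runpod SDK or vLLM:

```python
def flag_to_env_var(flag: str) -> str:
    """Convert a vLLM CLI flag name (e.g. --tokenizer_mode or
    --enable-auto-tool-choice) into the matching environment variable
    name used on Runpod (TOKENIZER_MODE, ENABLE_AUTO_TOOL_CHOICE)."""
    # Drop the leading dashes, normalize remaining dashes to underscores,
    # and uppercase the result.
    return flag.lstrip("-").replace("-", "_").upper()

if __name__ == "__main__":
    for flag in ["--tokenizer_mode", "--enable-auto-tool-choice", "--max-model-len"]:
        print(f"{flag} -> {flag_to_env_var(flag)}")
    # --tokenizer_mode -> TOKENIZER_MODE
    # --enable-auto-tool-choice -> ENABLE_AUTO_TOOL_CHOICE
    # --max-model-len -> MAX_MODEL_LEN
```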
### Example: Deploying Mistral To launch a Mistral model using the vLLM CLI, you would run a command similar to this: ```bash vllm serve mistralai/Ministral-8B-Instruct-2410 \ --tokenizer_mode mistral \ --config_format mistral \ --load_format mistral \ --enable-auto-tool-choice \ --tool-call-parser mistral ``` On Runpod, set these options as environment variables when configuring your endpoint: | Environment variable | Value | | ------------------------- | -------------------------------------- | | `MODEL_NAME` | `mistralai/Ministral-8B-Instruct-2410` | | `TOKENIZER_MODE` | `mistral` | | `CONFIG_FORMAT` | `mistral` | | `LOAD_FORMAT` | `mistral` | | `ENABLE_AUTO_TOOL_CHOICE` | `true` | | `TOOL_CALL_PARSER` | `mistral` | This pattern applies to any vLLM command-line flag. Find the corresponding environment variable name and add it to your endpoint configuration. ## Model-specific configurations The table below lists recommended environment variables for popular model families. These settings handle common requirements like tokenization modes, tool calling support, and reasoning capabilities. Not all models in a family require all settings. Check your model's documentation for exact requirements. | Model family | Example model | Key environment variables | Notes | | ------------ | ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | Qwen3 | `Qwen/Qwen3-8B` | `ENABLE_AUTO_TOOL_CHOICE=true` `TOOL_CALL_PARSER=hermes` | Qwen models often ship in various quantization formats. If you are deploying an AWQ or GPTQ version, ensure `QUANTIZATION` is set correctly (e.g., `awq`). | | OpenChat | `openchat/openchat-3.5-0106` | None required | OpenChat relies heavily on specific chat templates. If the default templates produce poor results, use `CUSTOM_CHAT_TEMPLATE` to inject the precise Jinja2 template required for the OpenChat correction format. | | Gemma | `google/gemma-3-1b-it` | None required | Gemma models require an active Hugging Face token. Ensure your `HF_TOKEN` is set as a secret. Gemma also performs best when `DTYPE` is explicitly set to `bfloat16` to match its native training precision. | | DeepSeek-R1 | `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` | `REASONING_PARSER=deepseek_r1` | Enables reasoning mode for chain-of-thought outputs. | | Phi-4 | `microsoft/Phi-4-mini-instruct` | None required | Phi models are compact but have specific architectural quirks. Setting `ENFORCE_EAGER=true` can sometimes resolve initialization issues with Phi models on older CUDA versions, though it may slightly reduce performance compared to CUDA graphs. | | Llama 3 | `meta-llama/Llama-3.2-3B-Instruct` | `TOOL_CALL_PARSER=llama3_json` `ENABLE_AUTO_TOOL_CHOICE=true` | Llama 3 models often require strict attention to context window limits. Use `MAX_MODEL_LEN` to prevent the KV cache from exceeding your GPU VRAM. If you are using a 24 GB GPU like a 4090, setting `MAX_MODEL_LEN` to `8192` or `16384` is a safe starting point. 
| | Mistral | `mistralai/Ministral-8B-Instruct-2410` | `TOKENIZER_MODE=mistral`, `CONFIG_FORMAT=mistral`, `LOAD_FORMAT=mistral`, `TOOL_CALL_PARSER=mistral` `ENABLE_AUTO_TOOL_CHOICE=true` | Mistral models use specialized tokenizers to work properly. | ## Selecting GPU size based on the model Selecting the right GPU for vLLM is a balance between **model size**, **quantization**, and your required **context length**. Because vLLM pre-allocates memory for its KV (Key-Value) cache to enable high-throughput serving, you generally need more VRAM than the bare minimum required just to load the model. ### VRAM estimation formula A reliable rule of thumb for estimating the required VRAM for a model in vLLM is: * **FP16/BF16 (unquantized):** 2 bytes per parameter. * **INT8 quantized:** 1 byte per parameter. * **INT4 (AWQ/GPTQ):** 0.5 bytes per parameter. * **KV cache buffer:** vLLM typically reserves 10-30% of remaining VRAM for the KV cache to handle concurrent requests. Use the table below as a starting point to select a hardware configuration for your model. | Model size (parameters) | Recommended GPUs | VRAM | | ----------------------- | ------------------- | --------- | | **Small (\<10B)** | RTX 4090, A6000, L4 | 16–24 GB | | **Medium (10B–30B)** | A6000, L40S | 32–48 GB | | **Large (30B–70B)** | A100, H100, B200 | 80–180 GB | *** ### Context window vs. VRAM The more context you need (e.g., 32k or 128k tokens), the more VRAM the KV cache consumes. If you encounter Out-of-Memory (OOM) errors, use the `MAX_MODEL_LEN` environment variable to cap the context. For example, a 7B model that OOMs at 32k context on a 24 GB card will often run perfectly at 16k. ### GPU memory utilization By default, vLLM attempts to use 90% of the available VRAM (`GPU_MEMORY_UTILIZATION=0.90`). * **If you OOM during initialization:** Lower this to `0.85`. * **If you have extra headroom:** Increase it to `0.95` to allow for more concurrent requests. ### Quantization (AWQ/GPTQ) If you are limited by a single GPU, use a quantized version of the model (e.g., `Meta-Llama-3-8B-Instruct-AWQ`). This reduces the weight memory by 50-75% compared to `FP16`, allowing you to fit larger models on cards like the RTX 4090 (24 GB) or A4000 (16 GB). For production workloads where high availability is key, always select **multiple GPU types** in your [Serverless endpoint configuration](/serverless/endpoints/endpoint-configurations). This allows the system to fall back to a different hardware tier if your primary choice is out of stock in a specific data center. ## vLLM recipes vLLM provides step-by-step recipes for common deployment scenarios, including deploying specific models, optimizing performance, and integrating with frameworks. Find the recipes at [docs.vllm.ai/projects/recipes](https://docs.vllm.ai/projects/recipes/en/latest/index.html). They are community-maintained and updated regularly as vLLM evolves. You can often find further information in the documentation for the specific model you are deploying. For example: * [Mistral + vLLM deployment guide](https://docs.mistral.ai/deployment/self-deployment/vllm). * [Qwen + vLLM deployment guide](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#). --- # Source: https://docs.runpod.io/sdks/graphql/configurations.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. 
# Configurations For details on queries, mutations, fields, and inputs, see the [Runpod GraphQL Spec](https://graphql-spec.runpod.io/). When configuring your environment, certain arguments are essential to ensure the correct setup and operation. Below is a detailed overview of each required argument: ### `containerDiskInGb` * **Description**: Specifies the size of the disk allocated for the container in gigabytes. This space is used for the operating system, installed applications, and any data generated or used by the container. * **Type**: Integer * **Example**: `10` for a 10 GB disk size. ### `dockerArgs` * **Description**: If specified, overrides the [container start command](https://docs.docker.com/engine/reference/builder/#cmd). If this argument is not provided, it will rely on the start command provided in the docker image. * **Type**: String * **Example**: `sleep infinity` to run the container in the background. ### `env` * **Description**: A set of environment variables to be set within the container. These can configure application settings, external service credentials, or any other configuration data required by the software running in the container. * **Type**: Dictionary or Object * **Example**: `{"DATABASE_URL": "postgres://user:password@localhost/dbname"}`. ### `imageName` * **Description**: The name of the Docker image to use for the container. This should include the repository name and tag, if applicable. * **Type**: String * **Example**: `"nginx:latest"` for the latest version of the Nginx image. ### `name` * **Description**: The name assigned to the container instance. This name is used for identification and must be unique within the context it's being used. * **Type**: String * **Example**: `"my-app-container"`. ### `volumeInGb` * **Description**: Defines the size of an additional persistent volume in gigabytes. This volume is used for storing data that needs to persist between container restarts or redeployments. * **Type**: Integer * **Example**: `5` for a 5GB persistent volume. Ensure that these arguments are correctly specified in your configuration to avoid errors during deployment. Optional arguments may also be available, providing additional customization and flexibility for your setup. --- # Source: https://docs.runpod.io/pods/connect-to-a-pod.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Connection options > Explore our Pod connection options, including the web terminal, SSH, JupyterLab, and VSCode/Cursor. ## Web terminal connection The web terminal offers a convenient, browser-based method to quickly connect to your Pod and run commands. However, it's not recommended for long-running processes, such as training an LLM, as the connection might not be as stable or persistent as a direct [SSH connection](#ssh-terminal-connection). The availability of the web terminal depends on the [Pod's template](/pods/templates/overview). To connect using the web terminal: 1. Navigate to the [Pods page](https://console.runpod.io/pods) in the Runpod console. 2. Expand the desired Pod and select **Connect**. 3. If your web terminal is **Stopped**, click **Start**. If clicking **Start** does nothing, try refreshing the page. 4. Click **Open Web Terminal** to open a new tab in your browser with a web terminal session. 
## JupyterLab connection JupyterLab provides an interactive, web-based environment for running code, managing files, and performing data analysis. Many Runpod templates, especially those geared towards machine learning and data science, come with JupyterLab pre-configured and accessible via HTTP. To connect to JupyterLab (if it's available on your Pod): 1. Deploy your Pod, ensuring that the template is configured to run JupyterLab. Official Runpod templates like "Runpod Pytorch" are usually compatible. 2. Once the Pod is running, navigate to the [Pods page](https://console.runpod.io/pods) in the Runpod console. 3. Find the Pod you created and click the **Connect** button. If it's grayed out, your Pod hasn't finished starting up yet. 4. In the window that opens, under **HTTP Services**, look for a link to **Jupyter Lab** (or a similarly named service on the configured HTTP port, often 8888). Click this link to open the JupyterLab workspace in your browser. If the JupyterLab tab displays a blank page for more than a minute or two, try restarting the Pod and opening it again. 5. Once in JupyterLab, you can create new notebooks (e.g., under **Notebook**, select **Python 3 (ipykernel)**), upload files, and run code interactively. ## SSH terminal connection Connecting to a Pod via an SSH (Secure Shell) terminal provides a secure and reliable method for interacting with your instance. To establish an SSH connection, you'll need an SSH client installed on your local machine. The exact command will vary slightly depending on whether you're using the basic proxy connection or a direct connection to a public IP. To learn more, see [Connect to a Pod with SSH](/pods/configuration/use-ssh). ## Connect to VSCode or Cursor For a more integrated development experience, you can connect directly to your Pod instance through Visual Studio Code (VSCode) or Cursor. This allows you to work within your Pod's volume directory as if the files were stored on your local machine, leveraging VSCode's or Cursor's powerful editing and debugging features. For a step-by-step guide, see [Connect to a Pod with VSCode or Cursor](/pods/configuration/connect-to-ide). --- # Source: https://docs.runpod.io/pods/configuration/connect-to-ide.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Connect to a Pod with VSCode or Cursor > Set up remote development on your Pod using VSCode or Cursor. This guide explains how to connect directly to your Pod through VSCode or Cursor using the Remote-SSH extension, allowing you to work within your Pod's volume directories as if the files were stored on your local machine. ## Requirements Before you begin, you'll need: * A local development environment with VSCode or Cursor installed. * [Download VSCode](https://code.visualstudio.com/download). * [Download Cursor](https://cursor.sh/). * Familiarity with basic command-line operations and SSH. ## Step 1: Install the Remote-SSH extension To connect to a Pod, you'll need to install the Remote-SSH extension for your IDE: 1. Open VSCode or Cursor and navigate to the **Extensions** view (Ctrl+Shift+X or Cmd+Shift+X). 2. Search for and install the Remote-SSH extension: * VSCode: **Remote - SSH** by **ms-vscode-remote**. * Cursor: **Remote-SSH** by **Anysphere**. ## Step 2: Generate an SSH key Before you can connect to a Pod, you'll need an SSH key that is paired with your Runpod account. 
If you don't have one, follow these steps: 1. Generate an SSH key using this command on your local terminal: ```sh ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -C "YOUR_EMAIL@DOMAIN.COM" ``` 2. To retrieve your public SSH key, run this command: ```sh cat ~/.ssh/id_ed25519.pub ``` This will output something similar to this: ```sh ssh-ed25519 AAAAC4NzaC1lZDI1JTE5AAAAIGP+L8hnjIcBqUb8NRrDiC32FuJBvRA0m8jLShzgq6BQ YOUR_EMAIL@DOMAIN.COM ``` 3. Copy and paste the output into the **SSH Public Keys** field in your [Runpod user account settings](https://www.runpod.io/console/user/settings). To enable SSH access, your public key must be present in the `~/.ssh/authorized_keys` file on your Pod. If you upload your public key to the settings page before your Pod starts, the system will automatically inject it into that file at startup. If your Pod is already running when you upload the key, the system will not perform this injection. To enable SSH access, you'll need to either terminate/redeploy the Pod, or open a [web terminal](/pods/connect-to-a-pod#web-terminal-connection) on the running Pod and run the following commands: ```sh export PUBLIC_KEY="" echo "$PUBLIC_KEY" >> ~/.ssh/authorized_keys ``` ## Step 3: Deploy a Pod Next, deploy the Pod you want to connect to. For detailed deployment instructions, see [Manage Pods -> Create a Pod](/pods/manage-pods#create-a-pod). To connect with VSCode/Cursor, your Pod template must support SSH over exposed TCP. To determine whether your Pod template supports this, during deployment, after selecting a template, look for a checkbox under **Instance Pricing** labeled **SSH Terminal Access** and make sure it's checked. All official Runpod Pytorch templates support SSH over exposed TCP. ## Step 4: Configure SSH for your IDE Next, you'll configure SSH access to your Pod using the Remote-SSH extension. The instructions are different for VSCode and Cursor: 1. From the [Pods](https://www.runpod.io/console/pods) page, select the Pod you deployed. 2. Select **Connect**, then select the **SSH** tab. 3. Copy the second command, under **SSH over exposed TCP**. It will look similar to this: ```bash ssh root@123.456.789.80 -p 12345 -i ~/.ssh/id_ed25519 ``` If you only see one command under SSH, then SSH over exposed TCP is not supported by your selected Pod template. This means you won't be able to connect to your Pod directly through VSCode/Cursor, but you can still connect using [basic SSH](/pods/connect-to-a-pod#basic-ssh-connection) via the terminal. 4. In VSCode, open the **Command Palette** (Ctrl+Shift+P or Cmd+Shift+P) and choose **Remote-SSH: Connect to Host**, then select **Add New SSH Host**. 5. Enter the copied SSH command from step 3 (`ssh root@***.***.***.** -p ***** -i ~/.ssh/id_ed25519`) and press **Enter**. This will add a new entry to your SSH config file. 1. From the [Pods](https://www.runpod.io/console/pods) page, select the Pod you deployed. 2. Select **Connect**. 3. Under **Direct TCP Ports**, look for a line similar to: ``` TCP port -> 69.48.159.6:25634 -> :22 ``` If you don't see a **Direct TCP Ports** section, then SSH over exposed TCP is not supported by your selected Pod template. This means you won't be able to connect to your Pod directly through VSCode/Cursor, but you can still connect using [basic SSH](/pods/configuration/use-ssh#basic-ssh-connection) via the terminal. Here's what these values mean: * `69.48.159.6` is the IP address of your Pod. * `25634` is the port number for the Pod's SSH service. 
Make a note of these values (they will likely be different for your Pod), as you'll need them for the following steps. 4. In Cursor, open the **Command Palette** (Ctrl+Shift+P or Cmd+Shift+P) and choose **Remote-SSH: Connect to Host**, then select **Add New SSH Host**. This opens the SSH config file in Cursor. 5. Add the following to the SSH config file: ``` Host POD_NAME HostName POD_IP User root Port POD_PORT IdentityFile ~/.ssh/id_ed25519 ``` Replace: * `POD_NAME` with a descriptive name for your Pod. This will be used to identify your Pod in the SSH config file, and does not need to match the name you gave your Pod in the Runpod console. * `POD_IP` with the IP address of your Pod from step 3. * `POD_PORT` with the port number of your Pod from step 3. So, for the example Pod, the SSH config file will look like: ``` Host my-pod HostName 69.48.159.6 User root Port 25634 IdentityFile ~/.ssh/id_ed25519 ``` If you are using a custom SSH key, replace `~/.ssh/id_ed25519` with the path to your SSH key. 6. Save and close the file. ## Step 5: Connect to your Pod Now you can connect to your Pod with the Remote-SSH extension. 1. Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P). 2. Select **Remote-SSH: Connect to Host**. 3. Choose your Pod from the list (either by IP or custom name if you configured one). 4. VSCode/Cursor will open a new window and connect to your Pod. 5. When prompted, select the platform (Linux). 6. Once connected, click **Open Folder** and navigate to your workspace directory (typically `/workspace`). You should now be connected to your Pod instance, where you can edit files in your volume directories as if they were local. If you stop and then resume your Pod, the port numbers may change. If so, you'll need to go back to the previous step and update your SSH config file using the new port numbers before reconnecting. ## Working with your Pod Once connected through Remote-SSH, you can: * Edit files with full IntelliSense and language support. * Run and debug applications with access to GPU resources. * Use integrated terminal for command execution. * Install extensions that run on the remote host. * Forward ports to access services locally. * Commit and push code using integrated Git support. Here are some important directories to be aware of: * `/workspace`: Default [persistent storage](/pods/storage/types) directory. * `/tmp`: Temporary files (cleared when Pod stops). * `/root`: Home directory for the root user. ## Troubleshooting If you can't connect to your Pod: 1. Verify your Pod is running and fully initialized. 2. Check that your SSH key is properly configured in Runpod settings. 3. Ensure the Pod has SSH enabled in its template. If the VSCode/Cursor server fails to install: 1. Check that your Pod has sufficient disk space. 2. Ensure your Pod has internet connectivity. 3. Try manually removing the `.vscode-server` or `.cursor-server` directory and reconnecting: ```sh rm -rf ~/.vscode-server ``` --- # Source: https://docs.runpod.io/get-started/connect-to-runpod.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Choose a workflow > Review the available methods for accessing and managing Runpod resources. Runpod offers multiple ways to access and manage your compute resources. 
Choose the method that best fits your workflow: ## Runpod console The Runpod console provides an intuitive web interface to manage Pods and endpoints, access Pod terminals, send endpoint requests, monitor resource usage, and view billing and usage history. [Launch the Runpod console →](https://www.console.runpod.io) ## Connect directly to Pods You can connect directly to your running Pods and execute code on them using a variety of methods, including a built-in web terminal, an SSH connection from your local machine, a JupyterLab instance, or a remote VSCode/Cursor development environment. [Learn more about Pod connection options →](/pods/connect-to-a-pod) ## REST API The Runpod REST API allows you to programmatically manage and control compute resources. Use the API to manage Pod lifecycles and Serverless endpoints, monitor resource utilization, and integrate Runpod into your applications. [Explore the API reference →](/api-reference/docs/GET/openapi-json) ## SDKs Runpod provides SDKs in Python, JavaScript, Go, and GraphQL to help you integrate Runpod services into your applications. [Explore the SDKs →](/sdks/python/overview) ## Command-line interface (CLI) The Runpod CLI allows you to manage Pods from your terminal, execute code on Pods, transfer data between Runpod and local systems, and programmatically manage Serverless endpoints. Every Pod comes pre-installed with the `runpodctl` command and includes a Pod-scoped API key for seamless command-line management. [Learn more about runpodctl →](/runpodctl/overview) --- # Source: https://docs.runpod.io/api-reference/container-registry-auths/GET/containerregistryauth/containerRegistryAuthId.md # Source: https://docs.runpod.io/api-reference/container-registry-auths/DELETE/containerregistryauth/containerRegistryAuthId.md # Source: https://docs.runpod.io/api-reference/container-registry-auths/GET/containerregistryauth/containerRegistryAuthId.md # Source: https://docs.runpod.io/api-reference/container-registry-auths/DELETE/containerregistryauth/containerRegistryAuthId.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Delete a container registry auth > Delete a container registry auth. ## OpenAPI ````yaml DELETE /containerregistryauth/{containerRegistryAuthId} openapi: 3.0.3 info: title: Runpod API description: Public Rest API for managing Runpod programmatically. version: 0.1.0 contact: name: help url: https://contact.runpod.io/hc/requests/new email: help@runpod.io servers: - url: https://rest.runpod.io/v1 security: - ApiKey: [] tags: - name: docs description: This documentation page. - name: pods description: Manage Pods. - name: endpoints description: Manage Serverless endpoints. - name: network volumes description: Manage Runpod network volumes. - name: templates description: Manage Pod and Serverless templates. - name: container registry auths description: >- Manage authentication for container registries such as dockerhub to use private images. - name: billing description: Retrieve billing history for your Runpod account. externalDocs: description: Find out more about Runpod. url: https://runpod.io paths: /containerregistryauth/{containerRegistryAuthId}: delete: tags: - container registry auths summary: Delete a container registry auth description: Delete a container registry auth. 
operationId: DeleteContainerRegistryAuth parameters: - name: containerRegistryAuthId in: path description: Container registry auth ID to delete. required: true schema: type: string responses: '204': description: Container registry auth successfully deleted. '400': description: Invalid container registry auth ID. '401': description: Unauthorized. components: securitySchemes: ApiKey: type: http scheme: bearer bearerFormat: Bearer ```` --- # Source: https://docs.runpod.io/api-reference/container-registry-auths/POST/containerregistryauth.md # Source: https://docs.runpod.io/api-reference/container-registry-auths/GET/containerregistryauth.md # Source: https://docs.runpod.io/api-reference/container-registry-auths/POST/containerregistryauth.md # Source: https://docs.runpod.io/api-reference/container-registry-auths/GET/containerregistryauth.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # List container registry auths > Returns a list of container registry auths. ## OpenAPI ````yaml GET /containerregistryauth openapi: 3.0.3 info: title: Runpod API description: Public Rest API for managing Runpod programmatically. version: 0.1.0 contact: name: help url: https://contact.runpod.io/hc/requests/new email: help@runpod.io servers: - url: https://rest.runpod.io/v1 security: - ApiKey: [] tags: - name: docs description: This documentation page. - name: pods description: Manage Pods. - name: endpoints description: Manage Serverless endpoints. - name: network volumes description: Manage Runpod network volumes. - name: templates description: Manage Pod and Serverless templates. - name: container registry auths description: >- Manage authentication for container registries such as dockerhub to use private images. - name: billing description: Retrieve billing history for your Runpod account. externalDocs: description: Find out more about Runpod. url: https://runpod.io paths: /containerregistryauth: get: tags: - container registry auths summary: List container registry auths description: Returns a list of container registry auths. operationId: ListContainerRegistryAuths responses: '200': description: Successful operation. content: application/json: schema: $ref: '#/components/schemas/ContainerRegistryAuths' '400': description: Invalid ID supplied. '404': description: Container registry auth not found. components: schemas: ContainerRegistryAuths: type: array items: $ref: '#/components/schemas/ContainerRegistryAuth' ContainerRegistryAuth: type: object properties: id: type: string example: clzdaifot0001l90809257ynb description: A unique string identifying a container registry authentication. name: type: string example: my creds description: >- A user-defined name for a container registry authentication. The name must be unique. securitySchemes: ApiKey: type: http scheme: bearer bearerFormat: Bearer ```` --- # Source: https://docs.runpod.io/tutorials/introduction/containers.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Overview > Learn about containers and how to use them with Runpod ## What are containers? > A container is an isolated environment for your code. This means that a container has no knowledge of your operating system, or your files. It runs on the environment provided to you by Docker Desktop. 
Containers have everything that your code needs in order to run, down to a base operating system. [From Docker's website](https://docs.docker.com/guides/walkthroughs/what-is-a-container/#:~:text=A%20container%20is%20an%20isolated,to%20a%20base%20operating%20system) Developers package their applications, frameworks, and libraries into a Docker container. Then, those containers can run outside their development environment. ### Why use containers? > Build, ship, and run anywhere. Containers are self-contained and run anywhere Docker runs. This means you can run a container on-premises or in the cloud, as well as in hybrid environments. Containers include both the application and any dependencies, such as libraries and frameworks, configuration data, and certificates needed to run your application. In cloud computing, you get the best cold start times with containers. ## What are images? Docker images are fixed templates for creating containers. They ensure that applications operate consistently and reliably across different environments, which is vital for modern software development. To create Docker images, you use a process known as "Docker build." This process uses a Dockerfile, a text document containing a sequence of commands, as instructions guiding Docker on how to build the image. ### Why use images? Using Docker images helps in various stages of software development, including testing, development, and deployment. Images ensure a seamless workflow across diverse computing environments. ### Why not use images? You must rebuild and push the container image, then edit your endpoint to use the new image each time you iterate on your code. Since development requires changing your code every time you need to troubleshoot a problem or add a feature, this workflow can be inconvenient. ### What is Docker Hub? After their creation, Docker images are stored in a registry, such as Docker Hub. From these registries, you can download images and use them to generate containers, which make it easy to widely distribute and deploy applications. Now that you've got an understanding of Docker, containers, images, and whether containerization is right for you, let's move on to installing Docker. ## Installing Docker For this walkthrough, install Docker Desktop. Docker Desktop bundles a variety of tools including: * Docker GUI * Docker CLI * Docker extensions * Docker Compose The majority of this walkthrough uses the Docker CLI, but feel free to use the GUI if you prefer. For the best installation experience, see Docker's [official documentation](https://docs.docker.com/get-started/get-docker/). ### Running your first command Now that you've installed Docker, open a terminal window and run the following command: ```bash docker version ``` You should see something similar to the following output. ```bash docker version Client: Docker Engine - Community Version: 24.0.7 API version: 1.43 Go version: go1.21.3 Git commit: afdd53b4e3 Built: Thu Oct 26 07:06:42 2023 OS/Arch: darwin/arm64 Context: desktop-linux Server: Docker Desktop 4.26.1 (131620) Engine: Version: 24.0.7 API version: 1.43 (minimum version 1.12) Go version: go1.20.10 Git commit: 311b9ff Built: Thu Oct 26 09:08:15 2023 OS/Arch: linux/arm64 Experimental: false containerd: Version: 1.6.25 GitCommit: abcd runc: Version: 1.1.10 GitCommit: v1.1.10-0-g18a0cb0 docker-init: Version: 0.19.0 ``` If at any point you need help with a command, you can use the `--help` flag to see documentation on the command you're running. 
```bash docker --help ``` Let's run `busybox` from the command line to print out today's date. ```bash docker run busybox sh -c 'echo "The time is: $(date)"' # The time is: Thu Jan 11 06:35:39 UTC 2024 ``` * `busybox` is a lightweight Docker image with the bare minimum Linux utilities installed, including `echo` * The `echo` command prints the container's uptime. You've successfully installed Docker and run your first commands. --- # Source: https://docs.runpod.io/references/cpu-types.md > ## Documentation Index > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Serverless CPU types The following list contains all CPU types available on Runpod. | Display Name | Cores | Threads Per Core | | ----------------------------------------------- | ----- | ---------------- | | 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz | 6 | 2 | | 11th Gen Intel(R) Core(TM) i5-11400F @ 2.60GHz | 6 | 2 | | 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz | 2 | 1 | | 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz | 8 | 2 | | 11th Gen Intel(R) Core(TM) i7-11700F @ 2.50GHz | 8 | 2 | | 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 8 | 2 | | 11th Gen Intel(R) Core(TM) i7-11700KF @ 3.60GHz | 8 | 2 | | 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz | 8 | 2 | | 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz | 8 | 2 | | 12th Gen Intel(R) Core(TM) i3-12100 | 4 | 2 | | 12th Gen Intel(R) Core(TM) i7-12700F | 12 | 1 | | 12th Gen Intel(R) Core(TM) i7-12700K | 12 | 1 | | 13th Gen Intel(R) Core(TM) i3-13100F | 4 | 2 | | 13th Gen Intel(R) Core(TM) i5-13600K | 14 | 1 | | 13th Gen Intel(R) Core(TM) i7-13700K | 16 | 1 | | 13th Gen Intel(R) Core(TM) i7-13700KF | 16 | 1 | | 13th Gen Intel(R) Core(TM) i9-13900F | 24 | 1 | | 13th Gen Intel(R) Core(TM) i9-13900K | 24 | 1 | | 13th Gen Intel(R) Core(TM) i9-13900KF | 24 | 1 | | AMD Eng Sample: 100-000000053-04\_32/20\_N | 48 | 1 | | AMD Eng Sample: 100-000000897-03 | 32 | 2 | | AMD EPYC 4564P 16-Core Processor | 16 | 2 | | AMD EPYC 7251 8-Core Processor | 8 | 2 | | AMD EPYC 7252 8-Core Processor | 8 | 2 | | AMD EPYC 7272 12-Core Processor | 12 | 2 | | AMD EPYC 7281 16-Core Processor | 16 | 2 | | AMD EPYC 7282 16-Core Processor | 16 | 2 | | AMD EPYC 7302 16-Core Processor | 16 | 2 | | AMD EPYC 7302P 16-Core Processor | 16 | 2 | | AMD EPYC 7313 16-Core Processor | 16 | 2 | | AMD EPYC 7313P 16-Core Processor | 16 | 2 | | AMD EPYC 7343 16-Core Processor | 16 | 2 | | AMD EPYC 7351P 16-Core Processor | 16 | 2 | | AMD EPYC 7352 24-Core Processor | 24 | 2 | | AMD EPYC 7371 16-Core Processor | 16 | 2 | | AMD EPYC 7402 24-Core Processor | 24 | 2 | | AMD EPYC 7402P 24-Core Processor | 24 | 2 | | AMD EPYC 7413 24-Core Processor | 24 | 2 | | AMD EPYC 7443 24-Core Processor | 48 | 1 | | AMD EPYC 7443P 24-Core Processor | 24 | 2 | | AMD EPYC 7452 32-Core Processor | 32 | 2 | | AMD EPYC 7453 28-Core Processor | 28 | 1 | | AMD EPYC 74F3 24-Core Processor | 24 | 2 | | AMD EPYC 7502 32-Core Processor | 32 | 1 | | AMD EPYC 7502P 32-Core Processor | 32 | 1 | | AMD EPYC 7513 32-Core Processor | 32 | 2 | | AMD EPYC 7532 32-Core Processor | 32 | 2 | | AMD EPYC 7542 32-Core Processor | 32 | 2 | | AMD EPYC 7543 32-Core Processor | 28 | 1 | | AMD EPYC 7543P 32-Core Processor | 32 | 2 | | AMD EPYC 7551 32-Core Processor | 32 | 2 | | AMD EPYC 7551P 32-Core Processor | 32 | 2 | | AMD EPYC 7552 48-Core Processor | 48 | 2 | | AMD EPYC 75F3 32-Core Processor | 32 | 2 | | AMD EPYC 7601 32-Core Processor | 32 | 2 | | AMD 
EPYC 7642 48-Core Processor | 48 | 2 | | AMD EPYC 7643 48-Core Processor | 48 | 2 | | AMD EPYC 7663 56-Core Processor | 56 | 2 | | AMD EPYC 7702 64-Core Processor | 64 | 2 | | AMD EPYC 7702P 64-Core Processor | 64 | 2 | | AMD EPYC 7713 64-Core Processor | 64 | 1 | | AMD EPYC 7713P 64-Core Processor | 64 | 2 | | AMD EPYC 7742 64-Core Processor | 64 | 2 | | AMD EPYC 7763 64-Core Processor | 64 | 2 | | AMD EPYC 7773X 64-Core Processor | 64 | 2 | | AMD EPYC 7B12 64-Core Processor | 64 | 2 | | AMD EPYC 7B13 64-Core Processor | 64 | 1 | | AMD EPYC 7C13 64-Core Processor | 64 | 2 | | AMD EPYC 7F32 8-Core Processor | 8 | 2 | | AMD EPYC 7F52 16-Core Processor | 16 | 2 | | AMD EPYC 7F72 24-Core Processor | 24 | 2 | | AMD EPYC 7H12 64-Core Processor | 64 | 2 | | AMD EPYC 7J13 64-Core Processor | 64 | 2 | | AMD EPYC 7K62 48-Core Processor | 48 | 2 | | AMD EPYC 7R32 48-Core Processor | 48 | 2 | | AMD EPYC 7T83 64-Core Processor | 127 | 1 | | AMD EPYC 7V13 64-Core Processor | 24 | 1 | | AMD EPYC 9124 16-Core Processor | 16 | 2 | | AMD EPYC 9254 24-Core Processor | 24 | 2 | | AMD EPYC 9274F 24-Core Processor | 24 | 2 | | AMD EPYC 9334 32-Core Processor | 32 | 2 | | AMD EPYC 9335 32-Core Processor | 32 | 2 | | AMD EPYC 9354 32-Core Processor | 32 | 2 | | AMD EPYC 9354P | 64 | 1 | | AMD EPYC 9354P 32-Core Processor | 32 | 2 | | AMD EPYC 9355 32-Core Processor | 32 | 2 | | AMD EPYC 9355P 32-Core Processor | 32 | 2 | | AMD EPYC 9374F 32-Core Processor | 32 | 1 | | AMD EPYC 9454 48-Core Processor | 48 | 2 | | AMD EPYC 9454P 48-Core Emb Processor | 48 | 2 | | AMD EPYC 9455P 48-Core Processor | 48 | 2 | | AMD EPYC 9474F 48-Core Processor | 48 | 2 | | AMD EPYC 9534 64-Core Processor | 64 | 2 | | AMD EPYC 9554 64-Core Emb Processor | 64 | 1 | | AMD EPYC 9554 64-Core Processor | 126 | 1 | | AMD EPYC 9555 64-Core Processor | 56 | 2 | | AMD EPYC 9654 96-Core Emb Processor | 96 | 1 | | AMD EPYC 9654 96-Core Processor | 96 | 2 | | AMD EPYC 9754 128-Core Processor | 128 | 2 | | AMD EPYC Processor | 1 | 1 | | AMD EPYC Processor (with IBPB) | 16 | 1 | | AMD EPYC-Rome Processor | 16 | 1 | | AMD Ryzen 3 2200G with Radeon Vega Graphics | 4 | 1 | | AMD Ryzen 3 3200G with Radeon Vega Graphics | 4 | 1 | | AMD Ryzen 3 4100 4-Core Processor | 4 | 2 | | AMD Ryzen 5 1600 Six-Core Processor | 6 | 2 | | AMD Ryzen 5 2600 Six-Core Processor | 6 | 2 | | AMD Ryzen 5 2600X Six-Core Processor | 6 | 2 | | AMD Ryzen 5 3600 6-Core Processor | 6 | 2 | | AMD Ryzen 5 3600X 6-Core Processor | 6 | 2 | | AMD Ryzen 5 5500 | 6 | 2 | | AMD Ryzen 5 5600G with Radeon Graphics | 6 | 2 | | Ryzen 5 5600X | 6 | 2 | | AMD Ryzen 5 7600 6-Core Processor | 6 | 2 | | AMD Ryzen 5 8600G w/ Radeon 760M Graphics | 6 | 2 | | AMD Ryzen 5 PRO 2600 Six-Core Processor | 6 | 2 | | AMD Ryzen 7 1700 Eight-Core Processor | 8 | 2 | | AMD Ryzen 7 1700X Eight-Core Processor | 8 | 2 | | AMD Ryzen 7 5700G with Radeon Graphics | 8 | 2 | | AMD Ryzen 7 5700X 8-Core Processor | 8 | 2 | | AMD Ryzen 7 5800X 8-Core Processor | 8 | 2 | | AMD Ryzen 7 7700 8-Core Processor | 8 | 2 | | AMD Ryzen 7 PRO 3700 8-Core Processor | 8 | 2 | | AMD Ryzen 9 3900X 12-Core Processor | 12 | 2 | | Ryzen 9 5900X | 12 | 2 | | AMD Ryzen 9 5950X 16-Core Processor | 16 | 2 | | AMD Ryzen 9 7900 12-Core Processor | 12 | 2 | | AMD Ryzen 9 7950X 16-Core Processor | 16 | 2 | | AMD Ryzen 9 7950X3D 16-Core Processor | 16 | 2 | | AMD Ryzen 9 9950X 16-Core Processor | 16 | 2 | | AMD Ryzen Threadripper 1900X 8-Core Processor | 8 | 2 | | AMD Ryzen Threadripper 1920X 12-Core Processor | 12 | 2 | | AMD Ryzen Threadripper 
1950X 16-Core Processor | 16 | 2 | | AMD Ryzen Threadripper 2920X 12-Core Processor | 12 | 2 | | AMD Ryzen Threadripper 2950X 16-Core Processor | 16 | 2 | | AMD Ryzen Threadripper 2970WX 24-Core Processor | 24 | 1 | | AMD Ryzen Threadripper 2990WX 32-Core Processor | 32 | 2 | | AMD Ryzen Threadripper 3960X 24-Core Processor | 24 | 2 | | AMD Ryzen Threadripper 7960X 24-Cores | 24 | 2 | | Ryzen Threadripper PRO 3955WX | 16 | 2 | | AMD Ryzen Threadripper PRO 3975WX 32-Cores | 32 | 2 | | AMD Ryzen Threadripper PRO 3995WX 64-Cores | 64 | 2 | | AMD Ryzen Threadripper PRO 5945WX 12-Cores | 12 | 2 | | AMD Ryzen Threadripper PRO 5955WX 16-Cores | 16 | 2 | | AMD Ryzen Threadripper PRO 5965WX 24-Cores | 24 | 2 | | AMD Ryzen Threadripper PRO 5975WX 32-Cores | 32 | 2 | | AMD Ryzen Threadripper PRO 5995WX 64-Cores | 18 | 1 | | AMD Ryzen Threadripper PRO 7955WX 16-Cores | 16 | 2 | | AMD Ryzen Threadripper PRO 7965WX 24-Cores | 24 | 2 | | AMD Ryzen Threadripper PRO 7975WX 32-Cores | 32 | 2 | | AMD Ryzen Threadripper PRO 7985WX 64-Cores | 112 | 1 | | Common KVM processor | 28 | 1 | | Genuine Intel(R) CPU @ 2.20GHz | 14 | 2 | | Genuine Intel(R) CPU \$0000%@ | 24 | 2 | | Intel Xeon E3-12xx v2 (Ivy Bridge) | 1 | 1 | | Intel Xeon Processor (Icelake) | 40 | 2 | | Intel(R) Celeron(R) CPU G3900 @ 2.80GHz | 2 | 1 | | Intel(R) Celeron(R) G5905 CPU @ 3.50GHz | 2 | 1 | | Intel(R) Core(TM) i3-10100F CPU @ 3.60GHz | 4 | 2 | | Intel(R) Core(TM) i3-10105F CPU @ 3.70GHz | 4 | 2 | | Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz | 2 | 2 | | Intel(R) Core(TM) i3-9100F CPU @ 3.60GHz | 4 | 1 | | Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz | 6 | 2 | | Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz | 6 | 2 | | Intel(R) Core(TM) i5-10600 CPU @ 3.30GHz | 6 | 2 | | Intel(R) Core(TM) i5-14500 | 14 | 2 | | Intel(R) Core(TM) i5-14600K | 14 | 2 | | Intel(R) Core(TM) i5-14600KF | 14 | 2 | | Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz | 4 | 1 | | Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz | 4 | 1 | | Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz | 4 | 1 | | Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz | 4 | 1 | | Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz | 6 | 1 | | Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz | 8 | 2 | | Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz | 8 | 2 | | Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz | 4 | 2 | | Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz | 4 | 2 | | Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz | 4 | 2 | | Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz | 4 | 2 | | Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz | 6 | 2 | | Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz | 4 | 2 | | Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz | 6 | 2 | | Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz | 8 | 1 | | Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz | 14 | 2 | | Intel(R) Core(TM) i9-14900K | 24 | 1 | | Intel(R) Core(TM) Ultra 5 245K | 1 | 1 | | Intel(R) Pentium(R) CPU G3260 @ 3.30GHz | 2 | 1 | | Intel(R) Pentium(R) CPU G4560 @ 3.50GHz | 2 | 2 | | Intel(R) Xeon(R) 6747P | 48 | 2 | | Intel(R) Xeon(R) 6767P | 64 | 2 | | Intel(R) Xeon(R) 6960P | 72 | 2 | | Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz | 6 | 1 | | Intel(R) Xeon(R) CPU X5660 @ 2.80GHz | 6 | 2 | | Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz | 4 | 1 | | Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz | 4 | 1 | | Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz | 6 | 2 | | Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz | 6 | 1 | | Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz | 4 | 1 | | Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz | 1 | 1 | | Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 8 | 2 | | Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz | 6 | 2 | | Intel(R) Xeon(R) 
CPU E5-2630 v2 @ 2.60GHz | 6 | 2 | | Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz | 8 | 2 | | Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz | 10 | 2 | | Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz | 4 | 2 | | Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz | 4 | 1 | | Intel(R) Xeon(R) CPU E5-2648L v3 @ 1.80GHz | 12 | 2 | | Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz | 16 | 1 | | Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz | 10 | 2 | | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 12 | 2 | | Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz | 10 | 2 | | Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz | 1 | 1 | | Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz | 8 | 2 | | Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 1 | 1 | | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 8 | 2 | | Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | 10 | 2 | | Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz | 20 | 2 | | Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz | 12 | 2 | | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 12 | 2 | | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 2 | | Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz | 16 | 2 | | Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz | 8 | 2 | | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 14 | 2 | | Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz | 18 | 2 | | Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz | 18 | 2 | | Intel(R) Xeon(R) CPU E5-2696 v4 @ 2.20GHz | 22 | 2 | | Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz | 16 | 2 | | Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz | 20 | 2 | | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 1 | 1 | | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 22 | 2 | | Intel(R) Xeon(R) CPU E5-4667 v3 @ 2.00GHz | 16 | 2 | | Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz | 12 | 2 | | Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz | 20 | 2 | | Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz | 32 | 1 | | Intel(R) Xeon(R) Gold 5318N CPU @ 2.10GHz | 24 | 2 | | Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz | 26 | 2 | | Intel(R) Xeon(R) Gold 5420+ | 28 | 2 | | Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz | 16 | 2 | | Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz | 40 | 1 | | Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz | 12 | 2 | | Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz | 20 | 2 | | Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz | 18 | 2 | | Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz | 12 | 2 | | Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz | 8 | 2 | | Intel(R) Xeon(R) Gold 6238R CPU @ 2.20GHz | 28 | 2 | | Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz | 24 | 2 | | Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz | 16 | 1 | | Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz | 24 | 1 | | Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz | 22 | 2 | | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | 24 | 2 | | Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz | 28 | 2 | | Intel(R) Xeon(R) Gold 6448Y | 32 | 2 | | INTEL(R) XEON(R) GOLD 6548Y+ | 32 | 2 | | Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz | 24 | 2 | | Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz | 24 | 2 | | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz | 26 | 2 | | Intel(R) Xeon(R) Platinum 8173M CPU @ 2.00GHz | 28 | 2 | | Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz | 28 | 2 | | Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz | 28 | 2 | | Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz | 36 | 2 | | Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz | 32 | 2 | | Intel(R) Xeon(R) Platinum 8452Y | 36 | 2 | | Intel(R) Xeon(R) Platinum 8460Y+ | 40 | 2 | | Intel(R) Xeon(R) Platinum 8462Y+ | 32 | 2 | | Intel(R) Xeon(R) Platinum 8468 | 48 | 2 | | Intel(R) Xeon(R) Platinum 8468V | 44 | 2 | | Intel(R) Xeon(R) Platinum 8470 | 52 | 2 | | Intel(R) Xeon(R) Platinum 8480+ | 56 
| 2 | | Intel(R) Xeon(R) Platinum 8480C | 56 | 2 | | Intel(R) Xeon(R) Platinum 8480CL | 56 | 2 | | INTEL(R) XEON(R) PLATINUM 8558 | 48 | 2 | | INTEL(R) XEON(R) PLATINUM 8568Y+ | 48 | 2 | | INTEL(R) XEON(R) PLATINUM 8570 | 56 | 2 | | Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz | 10 | 2 | | Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz | 10 | 2 | | Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz | 24 | 1 | | Intel(R) Xeon(R) Silver 4310T CPU @ 2.30GHz | 10 | 2 | | Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz | 16 | 2 | | Intel(R) Xeon(R) W-2223 CPU @ 3.60GHz | 4 | 2 | | Intel(R) Xeon(R) w5-2455X | 12 | 2 | | Intel(R) Xeon(R) w7-3465X | 28 | 2 | | QEMU Virtual CPU version 2.5+ | 16 | 1 | --- # Source: https://docs.runpod.io/pods/templates/create-custom-template.md ## Build a custom Pod template > A step-by-step guide to extending Runpod's official templates. You can find the complete code for this tutorial, including automated build options with GitHub Actions, in the [runpod-workers/pod-template](https://github.com/runpod-workers/pod-template) repository. This tutorial shows how to build a custom Pod template from the ground up. You'll extend an official Runpod template, add your own dependencies, configure how your container starts, and pre-load machine learning models. This approach saves time during Pod initialization and ensures consistent environments across deployments. By creating custom templates, you can package everything your project needs into a reusable Docker image. Once built, you can deploy your workload in seconds instead of reinstalling dependencies every time you start a new Pod. You can also share your template with members of your team and the wider Runpod community. ## What you'll learn In this tutorial, you'll learn how to: * Create a Dockerfile that extends a Runpod base image. * Configure container startup options (JupyterLab/SSH, application + services, or application only). * Add Python dependencies and system packages. * Pre-load machine learning models from Hugging Face, local files, or custom sources. * Build and test your image, then push it to Docker Hub. * Create a custom Pod template in the Runpod console * Deploy a Pod using your custom template. ## Requirements Before you begin, you'll need: * A [Runpod account](/get-started/manage-accounts). * Docker installed on your local machine or a remote server. * A Docker Hub account (or access to another container registry). * Basic familiarity with Docker and Python. ## Step 1: Set up your project structure First, create a directory for your custom template and the necessary files. Create a new directory for your template project: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} mkdir my-custom-pod-template cd my-custom-pod-template ``` Create the following files in your project directory: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} touch Dockerfile requirements.txt main.py ``` Your project structure should now look like this: ``` my-custom-pod-template/ ├── Dockerfile ├── requirements.txt └── main.py ``` ## Step 2: Choose a base image and create your Dockerfile Runpod offers base images with PyTorch, CUDA, and common dependencies pre-installed. You'll extend one of these images to build your custom template. Runpod offers several base images. You can explore available base images on [Docker Hub](https://hub.docker.com/u/runpod). 
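If you'd like to examine a base image locally before extending it, you can pull it and review its layers. A quick sketch using the PyTorch tag that the next step builds on (any other Runpod base image tag works the same way):

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Pull the base image this tutorial extends (optional, but useful for local inspection)
docker pull runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404

# Show the layers and approximate sizes that your custom image will build on
docker history runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404
```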
For this tutorial, we'll use the PyTorch image, `runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404` which includes PyTorch 2.8.0, CUDA 12.8.1, and Ubuntu 24.04. Open `Dockerfile` and add the following content: ```dockerfile Dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} # Use Runpod PyTorch base image FROM runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404 # Set environment variables # This ensures Python output is immediately visible in logs ENV PYTHONUNBUFFERED=1 # Set the working directory WORKDIR /app # Install system dependencies if needed RUN apt-get update --yes && \ DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends \ wget \ curl \ && rm -rf /var/lib/apt/lists/* # Copy requirements file COPY requirements.txt /app/ # Install Python dependencies RUN pip install --no-cache-dir --upgrade pip && \ pip install --no-cache-dir -r requirements.txt # Copy application files COPY . /app ``` This basic Dockerfile: * Extends the Runpod PyTorch base image. * Installs system packages (`wget`, `curl`). * Installs Python dependencies from `requirements.txt`. * Copies your application code to `/app`. ## Step 3: Add Python dependencies Now define the Python packages your application needs. Open `requirements.txt` and add your Python dependencies: ```txt requirements.txt theme={"theme":{"light":"github-light","dark":"github-dark"}} # Python dependencies # Add your packages here numpy>=1.24.0 requests>=2.31.0 transformers>=4.40.0 ``` These packages will be installed when you build your Docker image. Add any additional libraries your application requires. ## Step 4: Configure container startup behavior Runpod base images come with built-in services like Jupyter and SSH. You can choose how your container starts: whether to keep all the base image services running, run your application alongside those services, or run only your application. There are three ways to configure how your container starts: ### Option 1: Keep all base image services (no changes needed) If you want the default behavior with Jupyter and SSH services, you don't need to modify the Dockerfile. The base image's `/start.sh` script handles everything automatically. This is already configured in the Dockerfile from Step 2. ### Option 2: Automatically run the application after services start If you want to run your application alongside Jupyter/SSH services, add these lines to the end of your Dockerfile: ```dockerfile Dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} # Run application after services start COPY run.sh /app/run.sh RUN chmod +x /app/run.sh CMD ["/app/run.sh"] ``` Create a new file named `run.sh` in the same directory as your `Dockerfile`: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} touch run.sh ``` Then add the following content to it: ```bash run.sh theme={"theme":{"light":"github-light","dark":"github-dark"}} #!/bin/bash # Start base image services (Jupyter/SSH) in background /start.sh & # Wait for services to start sleep 2 # Run your application python /app/main.py # Wait for background processes wait ``` This script starts the base services in the background, then runs your application. 
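If your application depends on those services being reachable before it starts, you could replace the fixed `sleep 2` with an explicit readiness check. Here's a hedged sketch of an alternative `run.sh`, assuming JupyterLab listens on port 8888 (the port exposed later in this tutorial) and that `curl` is installed, as it is in the Dockerfile above:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
#!/bin/bash
# Start base image services (Jupyter/SSH) in the background
/start.sh &

# Wait until JupyterLab answers on port 8888 (up to ~30 seconds)
for _ in $(seq 1 30); do
  if curl -sf http://localhost:8888 > /dev/null; then
    break
  fi
  sleep 1
done

# Run your application
python /app/main.py

# Wait for background processes
wait
```

This keeps Option 2's behavior, but avoids launching your application before the background services are ready.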
### Option 3: Configure application-only mode For production deployments where you don't need Jupyter or SSH, add these lines to the end of your Dockerfile: ```dockerfile Dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} # Clear entrypoint and run application only ENTRYPOINT [] CMD ["python", "/app/main.py"] ``` This overrides the base image entrypoint and runs only your Python application. *** For this tutorial, we'll use option 1 (default behavior for the base image services) so we can test out the various connection options. ## Step 5: Pre-load a model into your template Pre-loading models into your Docker image means that you won't need to re-download a model every time you start up a new Pod, enabling you to create easily reusable and shareable environments for ML inference. There are two ways to pre-load models: * **Option 1: Automatic download from Hugging Face (recommended)**: This is the simplest approach. During the Docker build, Python downloads and caches the model using the transformers library. * **Option 2: Manual download with wget**: This gives you explicit control and works with custom or hosted models. For this tutorial, we'll use Option 1 (automatic download from Hugging Face) for ease of setup and testing, but you can use Option 2 if you need more control. ### Option 1: Pre-load models from Hugging Face Add these lines to your Dockerfile before the `COPY . /app` line: ```dockerfile Dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} # Set Hugging Face cache directory ENV HF_HOME=/app/models ENV HF_HUB_ENABLE_HF_TRANSFER=0 # Pre-download model during build RUN python -c "from transformers import pipeline; pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')" ``` During the build, Python will download the model and cache it in `/app/models`. When you deploy Pods with this template, the model loads instantly from the cache. ### Option 2: Pre-load models with wget For more control or to use models from custom sources, you can manually download model files during the build. Add these lines to your Dockerfile before the `COPY . /app` line: ```dockerfile Dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} # Create model directory and download files RUN mkdir -p /app/models/distilbert-model && \ cd /app/models/distilbert-model && \ wget -q https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json && \ wget -q https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/model.safetensors && \ wget -q https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/tokenizer_config.json && \ wget -q https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/vocab.txt ``` *** For this tutorial, we'll use option 1 (automatic download from Hugging Face). ## Step 6: Create your application Next we'll create the Python application that will run in your Pod. Open `main.py` and add your application code. Here's an example app that loads a machine learning model and performs inference on sample texts. (You can also replace this with your own application logic.) ```python main.py theme={"theme":{"light":"github-light","dark":"github-dark"}} """ Example Pod template application with sentiment analysis. 
""" import sys import torch import time import signal from transformers import pipeline def main(): print("Hello from your custom Runpod template!") print(f"Python version: {sys.version.split()[0]}") print(f"PyTorch version: {torch.__version__}") print(f"CUDA available: {torch.cuda.is_available()}") if torch.cuda.is_available(): print(f"CUDA version: {torch.version.cuda}") print(f"GPU device: {torch.cuda.get_device_name(0)}") # Initialize model print("\nLoading sentiment analysis model...") device = 0 if torch.cuda.is_available() else -1 # MODEL LOADING OPTIONS: # OPTION 1: From Hugging Face Hub cache (default) # Bakes the model into the container image using transformers pipeline # Behavior: Loads model from the cache, requires local_files_only=True classifier = pipeline( "sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=device, model_kwargs={"local_files_only": True}, ) # OPTION 2: From a local directory # Download the model files using wget, loads them from the local directory # Behavior: Loads directly from /app/models/distilbert-model # To use: Uncomment the pipeline object below, comment OPTION 1 above # classifier = pipeline('sentiment-analysis', # model='/app/models/distilbert-model', # device=device) print("Model loaded successfully!") # Example inference test_texts = [ "This is a wonderful experience!", "I really don't like this at all.", "The weather is nice today.", ] print("\n--- Running sentiment analysis ---") for text in test_texts: result = classifier(text) print(f"Text: {text}") print(f"Result: {result[0]['label']} (confidence: {result[0]['score']:.4f})\n") print("Container is running. Press Ctrl+C to stop.") # Keep container running def signal_handler(sig, frame): print("\nShutting down...") sys.exit(0) signal.signal(signal.SIGINT, signal_handler) signal.signal(signal.SIGTERM, signal_handler) try: while True: time.sleep(60) except KeyboardInterrupt: signal_handler(None, None) if __name__ == "__main__": main() ``` If you're pre-loading a model with `wget` (option 2 from step 5), make sure to uncomment the `classifier = pipeline()` object in `main.py` and comment out the `classifier = pipeline()` object for option 1. ## Step 7: Build and test your Docker image Now that your template is configured, you can build and test your Docker image locally to make sure it works correctly: Run the Docker build command from your project directory: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker build --platform linux/amd64 -t my-custom-template:latest . ``` The `--platform linux/amd64` flag ensures compatibility with Runpod's infrastructure, and is required if you're building on a Mac or ARM system. The build process will: * Download the base image. * Install system dependencies. * Install Python packages. * Download and cache models (if configured). * Copy your application files. This may take 5-15 minutes depending on your dependencies and model sizes. 
Check that your image was created successfully: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker images | grep my-custom-template ``` You should see your image listed with the `latest` tag, similar to this: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} my-custom-template latest 54c3d1f97912 10 seconds ago 10.9GB ``` To test the container locally, run the following command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker run --rm -it --platform linux/amd64 my-custom-template:latest /bin/bash ``` This starts the container and connects you to a shell inside it, exactly like the Runpod web terminal but running locally on your machine. You can use this shell to test your application and verify that your dependencies are installed correctly. (Press `Ctrl+D` when you want to return to your local terminal.) When you connect to the container shell, you'll be taken directly to the `/app` directory, which contains your application code (`main.py`) and `requirements.txt`. Your models can be found in `/app/models`. Try running the sample application (or any custom code you added): ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python main.py ``` You should see output from the application in your terminal, including the model loading and inference results. Press `Ctrl+C` to stop the application and `Ctrl+D` when you're ready to exit the container. ## Step 8: Push to Docker Hub To use your template with Runpod, push to Docker Hub (or another container registry). Tag your image with your Docker Hub username: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker tag my-custom-template:latest YOUR_DOCKER_USERNAME/my-custom-template:latest ``` Replace `YOUR_DOCKER_USERNAME` with your actual Docker Hub username. Authenticate with Docker Hub: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker login ``` If you aren't already logged in to Docker Hub, you'll be prompted to enter your Docker Hub username and password. Push your image to Docker Hub: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker push YOUR_DOCKER_USERNAME/my-custom-template:latest ``` This uploads your image to Docker Hub, making it accessible to Runpod. Large images may take several minutes to upload. ## Step 9: Create a Pod template in the Runpod console Next, create a Pod template using your custom Docker image: Navigate to the [Templates page](https://console.runpod.io/user/templates) in the Runpod console and click **New Template**. Configure your template with these settings: * **Name**: Give your template a descriptive name (e.g., "my-custom-template"). * **Container Image**: Enter the Docker Hub image name and tag: `YOUR_DOCKER_USERNAME/my-custom-template:latest`. * **Container Disk**: Set to at least 15 GB. * **HTTP Ports**: Expand the section, click **Add port**, then enter **JupyterLab** as the port label and **8888** as the port number. * **TCP Ports**: Expand the section, click **Add port**, then enter **SSH** as the port label and **22** as the port number. Leave all other settings on their defaults and click **Save Template**. ## Step 10: Deploy and test your template Now you can deploy and test your template on a Pod: Go to the [Pods page](https://console.runpod.io/pods) in the Runpod console and click **Deploy**. 
Configure your Pod with these settings: * **GPU**: The Distilbert model used in this tutorial is very small, so you can **select any available GPU**. If you're using a different model, you'll need to [select a GPU](/pods/choose-a-pod) that matches its requirements. * **Pod Template**: Click **Change Template**. You should see your custom template ("my-custom-template") in the list. Click it to select it. Leave all other settings on their defaults and click **Deploy On-Demand**. Your Pod will start with all your pre-installed dependencies and models. The first deployment may take a few minutes as Runpod downloads your image. Once your Pod is running, click on your Pod to open the connection options panel. Try one or more connection options: * **Web Terminal**: Click **Enable Web Terminal** and then **Open Web Terminal** to access it. * **JupyterLab**: It may take a few minutes for JupyterLab to start. Once it's labeled as **Ready**, click the **JupyterLab** link to access it. * **SSH**: Copy the SSH command and run it in your local terminal to access it. (See [Connect to a Pod with SSH](/pods/configuration/use-ssh) for details on how to use SSH.) After you've connected, try running the sample application (or any custom code you added): ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python main.py ``` You should see output from the application in your terminal, including the model loading and inference results. To avoid incurring unnecessary charges, make sure to stop and then terminate your Pod when you're finished. (See [Manage Pods](/pods/manage-pods) for detailed instructions.) ## Next steps Congratulations! You've built a custom Pod template and deployed it to Runpod. You can use this as a jumping off point to build your own custom templates with your own applications, dependencies, and models. For example, you can try: * Adding more dependencies and models to your template. * Creating different template versions for different use cases. * Automating builds using GitHub Actions or other CI/CD tools. * Using [Runpod secrets](/pods/templates/secrets) to manage sensitive information. For more information on working with templates, see the [Manage Pod templates](/pods/templates/manage-templates) guide. For more advanced template management, you can use the [Runpod REST API](/api-reference/templates/POST/templates) to programmatically create and update templates. --- # Source: https://docs.runpod.io/serverless/workers/create-dockerfile.md # Create a Dockerfile > Package your handler function for deployment. A Dockerfile defines the build process for a Docker image containing your handler function and all its dependencies. This page explains how to organize your project files and create a Dockerfile for your Serverless worker. 
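For context, the `src/handler.py` referenced below can be very small. Here's a minimal sketch using the Runpod Python SDK's handler pattern; replace the echo logic with your own inference code:

```python title="src/handler.py" theme={"theme":{"light":"github-light","dark":"github-dark"}}
import runpod


def handler(job):
    """Minimal handler: echo the job input back to the caller."""
    job_input = job["input"]
    return {"echo": job_input}


runpod.serverless.start({"handler": handler})
```

The Dockerfile's job is simply to package this file and its dependencies so the container's `CMD` can launch it.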
## Project organization Organize your project files in a clear directory structure: ```text project_directory ├── Dockerfile # Instructions for building the Docker image ├── src │ └── handler.py # Your handler function └── builder └── requirements.txt # Dependencies required by your handler ``` Your `requirements.txt` file should list all Python packages your handler needs: ```txt title="requirements.txt" theme={"theme":{"light":"github-light","dark":"github-dark"}} # Example requirements.txt runpod~=1.7.6 torch==2.0.1 pillow==9.5.0 transformers==4.30.2 ``` ## Basic Dockerfile structure A basic Dockerfile for a Runpod Serverless worker follows this structure: ```dockerfile title="Dockerfile" theme={"theme":{"light":"github-light","dark":"github-dark"}} FROM python:3.11.1-slim WORKDIR / # Copy and install requirements COPY builder/requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy your handler code COPY src/handler.py . # Command to run when the container starts CMD ["python", "-u", "/handler.py"] ``` This Dockerfile: 1. Starts with a Python base image. 2. Sets the working directory to the root. 3. Copies and installs Python dependencies. 4. Copies your handler code. 5. Specifies the command to run when the container starts. ## Choosing a base image The base image you choose affects your image size, startup time, and available system dependencies. Common options include: ### Python slim images Recommended for most use cases. These images are smaller and faster to download: ```dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} FROM python:3.11.1-slim ``` ### Python full images Include more system tools and libraries but are larger: ```dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} FROM python:3.11.1 ``` ### CUDA images Required if you need CUDA libraries for GPU-accelerated workloads: ```dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04 # Install Python RUN apt-get update && apt-get install -y python3.11 python3-pip ``` ### Custom base images You can build on top of specialized images for specific frameworks: ```dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}} FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime ``` ## Including models and files If your model is available on Hugging Face, we strongly recommend enabling [cached models](/serverless/endpoints/model-caching) instead of baking/downloading the model into your Docker image. Cached models provide faster startup times, lower costs, and uses less storage. ### Baking models into the image If you need to include model files or other assets in your image, use the `COPY` instruction: ```dockerfile title="Dockerfile" theme={"theme":{"light":"github-light","dark":"github-dark"}} FROM python:3.11.1-slim WORKDIR / # Copy and install requirements COPY builder/requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy your code and model files COPY src/handler.py . 
COPY models/ /models/

# Set environment variables if needed
ENV MODEL_PATH=/models/my_model.pt

# Command to run when the container starts
CMD ["python", "-u", "/handler.py"]
```

### Downloading models during build

You can download models during the Docker build process:

```dockerfile title="Dockerfile" theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Download model files
RUN wget -q URL_TO_YOUR_MODEL -O /models/my_model.pt

# Or use a script to download from Hugging Face
RUN python -c "from transformers import AutoModel; AutoModel.from_pretrained('model-name')"
```

## Environment variables

Set environment variables to configure your application without hardcoding values:

```dockerfile title="Dockerfile" theme={"theme":{"light":"github-light","dark":"github-dark"}}
ENV MODEL_PATH=/models/my_model.pt
ENV LOG_LEVEL=INFO
ENV MAX_BATCH_SIZE=4
```

You can override these at runtime through the Runpod console when configuring your endpoint. For details on how to access environment variables in your handler functions, see [Environment variables](/serverless/development/environment-variables).

## Optimizing image size

Smaller images download and start faster, reducing cold start times. Use these techniques to minimize image size:

### Use multi-stage builds

Multi-stage builds let you compile dependencies in one stage and copy only the necessary files to the final image:

```dockerfile title="Dockerfile" theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Build stage
FROM python:3.11.1 AS builder
WORKDIR /build
COPY builder/requirements.txt .
RUN pip install --no-cache-dir --target=/build/packages -r requirements.txt

# Runtime stage
FROM python:3.11.1-slim
WORKDIR /
COPY --from=builder /build/packages /usr/local/lib/python3.11/site-packages
COPY src/handler.py .
CMD ["python", "-u", "/handler.py"]
```

### Clean up build artifacts

Remove unnecessary files after installation:

```dockerfile title="Dockerfile" theme={"theme":{"light":"github-light","dark":"github-dark"}}
RUN apt-get update && apt-get install -y build-essential \
    && pip install --no-cache-dir -r requirements.txt \
    && apt-get remove -y build-essential \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*
```

### Use .dockerignore

Create a `.dockerignore` file to exclude unnecessary files from the build context:

```txt title=".dockerignore" theme={"theme":{"light":"github-light","dark":"github-dark"}}
.git
.gitignore
README.md
tests/
*.pyc
__pycache__/
.venv/
venv/
```

## Next steps

After creating your Dockerfile, you can:

* [Build and deploy your image from Docker Hub](/serverless/workers/deploy).
* [Deploy directly from GitHub](/serverless/workers/github-integration).
* [Test your handler locally](/serverless/development/local-testing) before building the image.

---

# Source: https://docs.runpod.io/tutorials/introduction/containers/create-dockerfiles.md

# Dockerfile

In the previous step, you ran a command that printed the current date and time from inside a container. Now you'll create a Dockerfile to customize the contents of your own Docker image.

## Create a Dockerfile

Create a new file called `Dockerfile` and add the following content:

```dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}}
FROM busybox
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```

This Dockerfile starts from the same `busybox` image we used before. It then adds a custom `entrypoint.sh` script, makes it executable, and configures it as the entrypoint.
## The entrypoint script

Now let's create `entrypoint.sh` with the following contents:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
#!/bin/sh
echo "The time is: $(date)"
```

This is a simple script that prints the current time when the container starts.

While we named this script `entrypoint.sh`, you'll see a variety of naming conventions, such as:

- `start.sh`
- `CMD.sh`
- `entry_path.sh`

These files are often placed in a folder called `script`, but the location depends on the maintainers of the repository.

### Why use an entrypoint script?

- It lets you customize what command runs when a container starts from your image.
- For example, our script runs `date` to print the time.
- Without a command to run, the container would exit immediately after starting.
- Entrypoints make images executable and easier to reuse.

## Build the image

With those files created, we can now build a Docker image using our Dockerfile:

```bash
docker image build -t my-time-image .
```

This builds an image named `my-time-image` from the Dockerfile in the current directory.

### Why build a custom image?

- It lets you package up custom dependencies and configurations.
- For example, you can install extra software needed for your app.
- It makes deploying applications more reliable and portable.
- Instead of installing things manually on every server, you just use your image.
- Custom images can be shared and reused easily across environments.
- Building images puts your application into a standardized unit that "runs anywhere".
- You can version images over time as you update configurations.

## Run the image

Finally, let's run a container from our new image:

```bash
docker run my-time-image
```

We should see the same output as before, printing the current time!

Entrypoints and Dockerfiles let you define reusable, executable containers that run the software and commands you need. This makes deploying and sharing applications much easier, with no per-server configuration. By putting commands like this into a Dockerfile, you can easily build reusable and shareable images.

---

# Source: https://docs.runpod.io/serverless/workers/deploy.md

# Deploy workers from Docker Hub

> Build, test, and deploy your worker image from Docker Hub.

After [creating a Dockerfile](/serverless/workers/create-dockerfile) for your worker, you can build the image, test it locally, and deploy it to a Serverless endpoint.

## Requirements

* A [Dockerfile](/serverless/workers/create-dockerfile) that packages your handler function.
* [Docker](https://docs.docker.com/get-started/get-docker/) installed on your development machine.
* A [Docker Hub](https://hub.docker.com/) account.

## Build the Docker image

From your terminal, navigate to your project directory and build the Docker image:

```sh theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker build --platform linux/amd64 \
  -t DOCKER_USERNAME/WORKER_NAME:VERSION .
```

Replace `DOCKER_USERNAME` with your Docker Hub username, `WORKER_NAME` with a descriptive name for your worker, and `VERSION` with an appropriate version tag.

The `--platform linux/amd64` flag is required to ensure compatibility with Runpod's infrastructure. This is especially important if you're building on an ARM-based system (like Apple Silicon Macs), as the default build platform on those systems isn't compatible with Runpod.
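Before testing, you can optionally confirm that the image was built for the expected platform. A quick check, using the same placeholder image tag as the build command above:

```sh theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker inspect --format '{{.Os}}/{{.Architecture}}' DOCKER_USERNAME/WORKER_NAME:VERSION
# Expected output: linux/amd64
```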
## Test the image locally Before pushing it to the registry, you should test your Docker image locally: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} docker run -it DOCKER_USERNAME/WORKER_NAME:VERSION ``` If your handler is properly configured with a [test input](/serverless/workers/handler-functions#local-testing), you should see it process the test input and provide output. ## Push the image to Docker Hub Make your image available to Runpod by pushing it to Docker Hub: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} # Log in to Docker Hub docker login # Push the image docker push DOCKER_USERNAME/WORKER_NAME:VERSION ``` Once your image is in the Docker container registry, you can [create a Serverless endpoint](/serverless/endpoints/overview#create-an-endpoint) through the Runpod console. ## Image versioning For production workloads, use SHA tags for absolute reproducibility: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} # Get the SHA after pushing docker inspect --format='{{index .RepoDigests 0}}' DOCKER_USERNAME/WORKER_NAME:VERSION # Use the SHA when deploying # DOCKER_USERNAME/WORKER_NAME:VERSION@sha256:4d3d4b3c5a5c2b3a5a5c3b2a5a4d2b3a2b3c5a3b2a5d2b3a3b4c3d3b5c3d4a3 ``` Versioning best practices: * Never rely on the `:latest` tag for production. * Use semantic versioning AND SHA tags for clarity and reproducibility. * Document the specific image SHA in your deployment documentation. * Keep images as small as possible for faster startup times. ## Deploy an endpoint If your files are hosted on GitHub, you can [deploy your worker directly from a GitHub repository](/serverless/workers/github-integration) through the Runpod console. You can deploy your worker image directly from a Docker registry through the Runpod console: 1. Navigate to the [Serverless section](https://www.console.runpod.io/serverless) of the Runpod console. 2. Click **New Endpoint**. 3. Click **Import from Docker Registry**. 4. In the **Container Image** field, enter your Docker image URL (e.g., `docker.io/yourusername/worker-name:v1.0.0`), then click **Next**. 5. Configure your endpoint settings: * Enter an **Endpoint Name**. * Choose your **Endpoint Type**: select **Queue** for traditional queue-based processing or **Load Balancer** for direct HTTP access (see [Load balancing endpoints](/serverless/load-balancing/overview) for details). * Under **GPU Configuration**, select the appropriate GPU types for your workload. * Configure [other settings](/serverless/endpoints/endpoint-configurations) as needed (active/max workers, timeouts, environment variables). 6. Click **Deploy Endpoint** to deploy your worker. ## Troubleshoot deployment issues If your worker fails to start or process requests: 1. Check the [logs](/serverless/development/logs) in the Runpod console for error messages. 2. Verify your handler function works correctly in [local testing](/serverless/development/local-testing). 3. Ensure all dependencies are properly installed in the [Docker image](/serverless/workers/create-dockerfile). 4. Check that your Docker image is compatible with the selected GPU type. 5. Verify your [input format](/serverless/endpoints/send-requests) matches what your handler expects. --- # Source: https://docs.runpod.io/tutorials/introduction/containers/docker-commands.md # Docker commands Runpod enables bring-your-own-container (BYOC) development. If you choose this workflow, you will be using Docker commands to build, run, and manage your containers. 
The following is a reference sheet for some of the most commonly used Docker commands.

## Login

Log in to a registry (like Docker Hub) from the CLI. This saves your credentials locally.

```bash
docker login
docker login -u myusername
```

## Images

`docker push` - Uploads a container image to a registry like Docker Hub.

`docker pull` - Downloads container images from a registry like Docker Hub.

`docker images` - Lists container images that have been downloaded locally.

`docker rmi` - Removes a Docker container image from the machine.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker push myuser/myimage:v1 # Push custom image
docker pull someimage # Pull shared image
docker images # List downloaded images
docker rmi # Remove/delete image
```

## Containers

`docker run` - Launches a new container from a Docker image.

`docker ps` - Prints a list of currently running containers.

`docker logs` - Shows stdout/stderr logs for a specific container.

`docker stop/rm` - Stops a running container or removes a stopped one.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker run # Start new container from image
docker ps # List running containers
docker logs # Print logs from container
docker stop # Stop running container
docker rm # Remove/delete container
```

## Dockerfile

`docker build` - Builds a Docker image by reading build instructions from a Dockerfile.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker build # Build image from Dockerfile
docker build --platform=linux/amd64 # Build for specific architecture
```

When using Docker with Runpod, make sure your build command uses the `--platform=linux/amd64` flag to build for the correct architecture.

## Volumes

When working with Docker and Runpod, see how to [attach a network volume](/storage/network-volumes).

`docker volume create` - Creates a persistent, managed volume that can outlive containers.

`docker run -v` - Mounts a volume into a specific container so data persists beyond the container's lifecycle.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker volume create # Create volume
docker run -v :/data # Mount volume into container
```

## Network

`docker network create` - Creates a custom virtual network for containers to communicate over.

`docker run --network=` - Starts a container attached to a Docker user-defined network.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker network create # Create user-defined network
docker run --network= # Connect container
```

## Execute

`docker exec` - Executes a command in an already running container. Useful for debugging and inspecting containers:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker exec
docker exec mycontainer ls -l /etc # List files in container
```

---

# Source: https://docs.runpod.io/integrations/dstack.md

# Manage Pods with dstack on Runpod

[dstack](https://dstack.ai/) is an open-source tool that simplifies the orchestration of Pods for AI and ML workloads. By defining your application and resource requirements in YAML configuration files, it automates the provisioning and management of cloud resources on Runpod, allowing you to focus on your application logic rather than the infrastructure.

In this guide, we'll walk through setting up [dstack](https://dstack.ai/) with Runpod to deploy [vLLM](https://github.com/vllm-project/vllm).
We'll serve the `meta-llama/Llama-3.1-8B-Instruct` model from Hugging Face using a Python environment. ## Prerequisites * [A Runpod account with an API key](/get-started/api-keys) * On your local machine: * Python 3.8 or higher * `pip` (or `pip3` on macOS) * Basic utilities: `curl` * These instructions are applicable for macOS, Linux, and Windows systems. ### Windows Users * It's recommended to use [WSL (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install) or tools like [Git Bash](https://gitforwindows.org/) to follow along with the Unix-like commands used in this tutorial * Alternatively, Windows users can use PowerShell or Command Prompt and adjust commands accordingly ## Installation ### Setting Up the dstack Server 1. **Prepare Your Workspace** Open a terminal or command prompt and create a new directory for this tutorial: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} mkdir runpod-dstack-tutorial cd runpod-dstack-tutorial ``` 2. **Set Up a Python Virtual Environment** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python3 -m venv .venv source .venv/bin/activate ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python3 -m venv .venv source .venv/bin/activate ``` **Command Prompt:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python -m venv .venv .venv\Scripts\activate ``` **PowerShell:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python -m venv .venv .venv\Scripts\Activate.ps1 ``` 3. **Install dstack** Use `pip` to install dstack: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} pip3 install -U "dstack[all]" ``` **Note:** If `pip3` is not available, you may need to install it or use `pip`. ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} pip install -U "dstack[all]" ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} pip install -U "dstack[all]" ``` ### Configuring dstack for Runpod 1. **Create the Global Configuration File** The following `config.yml` file is a **global configuration** used by [dstack](https://dstack.ai/) for all deployments on your computer. It's essential to place it in the correct configuration directory. * **Create the configuration directory:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} mkdir -p ~/.dstack/server ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} mkdir -p ~/.dstack/server ``` **Command Prompt or PowerShell:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} mkdir %USERPROFILE%\.dstack\server ``` * **Navigate to the configuration directory:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} cd ~/.dstack/server ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} cd ~/.dstack/server ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} cd %USERPROFILE%\.dstack\server ``` * **Create the `config.yml` File** In the configuration directory, create a file named `config.yml` with the following content: ```yml theme={"theme":{"light":"github-light","dark":"github-dark"}} projects: - name: main backends: - type: runpod creds: type: api_key api_key: YOUR_RUNPOD_API_KEY ``` Replace `YOUR_RUNPOD_API_KEY` with the API key you obtained from Runpod. 2. 
**Start the dstack Server** From the configuration directory, start the dstack server: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} dstack server ``` You should see output indicating that the server is running: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} [INFO] Applying ~/.dstack/server/config.yml... [INFO] The admin token is ADMIN-TOKEN [INFO] The dstack server is running at http://127.0.0.1:3000 ``` The `ADMIN-TOKEN` displayed is important for accessing the dstack web UI. 3. **Access the dstack Web UI** * Open your web browser and navigate to `http://127.0.0.1:3000`. * When prompted for an admin token, enter the `ADMIN-TOKEN` from the server output. * The web UI allows you to monitor and manage your deployments. ## Deploying vLLM as a Task ### Step 1: Configure the Deployment Task 1. **Prepare for Deployment** * Open a new terminal or command prompt window. * Navigate to your tutorial directory: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} cd runpod-dstack-tutorial ``` * **Activate the Python Virtual Environment** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} source .venv/bin/activate ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} source .venv/bin/activate ``` **Command Prompt:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} .venv\Scripts\activate ``` **PowerShell:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} .venv\Scripts\Activate.ps1 ``` 2. **Create a Directory for the Task** Create and navigate to a new directory for the deployment task: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} mkdir task-vllm-llama cd task-vllm-llama ``` 3. **Create the dstack Configuration File** * **Create the `.dstack.yml` File** Create a file named `.dstack.yml` (or `dstack.yml` if your system doesn't allow filenames starting with a dot) with the following content: ```yml theme={"theme":{"light":"github-light","dark":"github-dark"}} type: task name: vllm-llama-3.1-8b-instruct python: "3.10" env: - HUGGING_FACE_HUB_TOKEN=YOUR_HUGGING_FACE_HUB_TOKEN - MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct - MAX_MODEL_LEN=8192 commands: - pip install vllm - vllm serve $MODEL_NAME --port 8000 --max-model-len $MAX_MODEL_LEN ports: - 8000 spot_policy: on-demand resources: gpu: name: "RTX4090" memory: "24GB" cpu: 16.. ``` Replace `YOUR_HUGGING_FACE_HUB_TOKEN` with your actual [Hugging Face access token](https://huggingface.co/settings/tokens) (read-access is enough) or define the token in your environment variables. Without this token, the model cannot be downloaded as it is gated. ### Step 2: Initialize and Deploy the Task 1. **Initialize dstack** Run the following command **in the directory where your `.dstack.yml` file is located**: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} dstack init ``` 2. **Apply the Configuration** Deploy the task by applying the configuration: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} dstack apply ``` * You will see an output summarizing the deployment configuration and available instances. * When prompted: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} Submit the run vllm-llama-3.1-8b-instruct? [y/n]: ``` Type `y` and press `Enter` to confirm. * The `ports` configuration provides port forwarding from the deployed pod to `localhost`, allowing you to access the deployed vLLM via `localhost:8000`. 3. 
**Monitor the Deployment** * After executing `dstack apply`, you'll see all the steps that dstack performs: * Provisioning the pod on Runpod. * Downloading the Docker image. * Installing required packages. * Downloading the model from Hugging Face. * Starting the vLLM server. * The logs of vLLM will be displayed in the terminal. * To monitor the logs at any time, run: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} dstack logs vllm-llama-3.1-8b-instruct ``` * Wait until you see logs indicating that vLLM is serving the model, such as: ```bash INFO: Started server process [1] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) ``` ### Step 3: Test the Model Server 1. **Access the Service** Since the `ports` configuration forwards port `8000` from the deployed pod to `localhost`, you can access the vLLM server via `http://localhost:8000`. 2. **Test the Service Using `curl`** Use the following `curl` command to test the deployed model: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X POST http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [ {"role": "system", "content": "You are Poddy, a helpful assistant."}, {"role": "user", "content": "What is your name?"} ], "temperature": 0, "max_tokens": 150 }' ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X POST http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [ {"role": "system", "content": "You are Poddy, a helpful assistant."}, {"role": "user", "content": "What is your name?"} ], "temperature": 0, "max_tokens": 150 }' ``` **Command Prompt:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X POST http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{ \"model\": \"meta-llama/Llama-3.1-8B-Instruct\", \"messages\": [ {\"role\": \"system\", \"content\": \"You are Poddy, a helpful assistant.\"}, {\"role\": \"user\", \"content\": \"What is your name?\"} ], \"temperature\": 0, \"max_tokens\": 150 }" ``` **PowerShell:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} curl.exe -Method Post http://localhost:8000/v1/chat/completions ` -Headers @{ "Content-Type" = "application/json" } ` -Body '{ "model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [ {"role": "system", "content": "You are Poddy, a helpful assistant."}, {"role": "user", "content": "What is your name?"} ], "temperature": 0, "max_tokens": 150 }' ``` 3. **Verify the Response** You should receive a JSON response similar to the following: ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} { "id": "chat-f0566a5143244d34a0c64c968f03f80c", "object": "chat.completion", "created": 1727902323, "model": "meta-llama/Llama-3.1-8B-Instruct", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "My name is Poddy, and I'm here to assist you with any questions or information you may need.", "tool_calls": [] }, "logprobs": null, "finish_reason": "stop", "stop_reason": null } ], "usage": { "prompt_tokens": 49, "total_tokens": 199, "completion_tokens": 150 }, "prompt_logprobs": null } ``` This confirms that the model is running and responding as expected. 
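If you prefer Python to `curl`, you can send the same request with the `requests` library. This is a minimal sketch (it assumes `requests` is installed in your local environment and that the dstack port forward to `localhost:8000` is active):

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import requests

# Same request as the curl example above, sent from Python.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are Poddy, a helpful assistant."},
        {"role": "user", "content": "What is your name?"},
    ],
    "temperature": 0,
    "max_tokens": 150,
}

# dstack forwards port 8000 from the pod to localhost, so the server is reachable locally.
response = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```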
### Step 4: Clean Up To avoid incurring additional costs, it's important to stop the task when you're finished. 1. **Stop the Task** In the terminal where you ran `dstack apply`, you can stop the task by pressing `Ctrl + C`. You'll be prompted: ```bash Stop the run vllm-llama-3.1-8b-instruct before detaching? [y/n]: ``` Type `y` and press `Enter` to confirm stopping the task. 2. **Terminate the Instance** The instance will terminate automatically after stopping the task. If you wish to ensure the instance is terminated immediately, you can run: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} dstack stop vllm-llama-3.1-8b-instruct ``` 3. **Verify Termination** Check your Runpod dashboard or the [dstack](https://dstack.ai/) web UI to ensure that the instance has been terminated. ## Additional Tips: Using Volumes for Persistent Storage If you need to retain data between runs or cache models to reduce startup times, you can use volumes. ### Creating a Volume Create a separate [dstack](https://dstack.ai/) file named `volume.dstack.yml` with the following content: ```yml theme={"theme":{"light":"github-light","dark":"github-dark"}} type: volume name: llama31-volume backend: runpod region: EUR-IS-1 # Required size size: 100GB ``` The `region` ties your volume to a specific region, which then also ties your Pod to that same region. Apply the volume configuration: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} dstack apply -f volume.dstack.yml ``` This will create the volume named `llama31-volume`. ### Using the Volume in Your Task Modify your `.dstack.yml` file to include the volume: ```yml theme={"theme":{"light":"github-light","dark":"github-dark"}} volumes: - name: llama31-volume path: /data ``` This configuration will mount the volume to the `/data` directory inside your container. By doing this, you can store models and data persistently, which can be especially useful for large models that take time to download. For more information on using volumes with Runpod, refer to the [dstack blog on volumes](https://dstack.ai/blog/volumes-on-runpod/). *** ## Conclusion By leveraging [dstack](https://dstack.ai/) on Runpod, you can efficiently deploy and manage Pods, accelerating your development workflow and reducing operational overhead. --- # Source: https://docs.runpod.io/serverless/development/dual-mode-worker.md # Pod-first development > Develop on a Pod before deploying your worker to Serverless for faster iteration. Developing machine learning applications often requires powerful GPUs, making local development challenging. Instead of repeatedly deploying your worker to Serverless for testing, you can develop on a Pod first and then deploy the same Docker image to Serverless when ready. This "Pod-first" workflow lets you develop and test interactively in a GPU environment, then seamlessly transition to Serverless for production. You'll use a Pod as your cloud-based development machine with tools like Jupyter Notebooks and SSH, catching issues early before deploying your worker to Serverless. To get started quickly, you can [clone this repository](https://github.com/justinwlin/Runpod-GPU-And-Serverless-Base) for a pre-configured template for a dual-mode worker. ## What you'll learn In this tutorial you'll learn how to: * Set up a project for a dual-mode Serverless worker. * Create a handler that adapts based on an environment variable. * Write a startup script to manage different operational modes. 
* Build a Docker image that works in both Pod and Serverless environments.
* Deploy and test your worker in both environments.

## Requirements

* You've [created a Runpod account](/get-started/manage-accounts).
* You've installed [Python 3.x](https://www.python.org/downloads/) and [Docker](https://docs.docker.com/get-started/get-docker/) and configured them for your command line.
* Basic understanding of Docker concepts and shell scripting.

## Step 1: Set up your project structure

Create a directory for your project and the necessary files:

```sh theme={"theme":{"light":"github-light","dark":"github-dark"}}
mkdir dual-mode-worker
cd dual-mode-worker
touch handler.py start.sh Dockerfile requirements.txt
```

This creates:

* `handler.py`: Your Python script with the Runpod handler logic.
* `start.sh`: A shell script that will be the entrypoint for your Docker container.
* `Dockerfile`: Instructions to build your Docker image.
* `requirements.txt`: A file to list Python dependencies.

## Step 2: Create the handler

This Python script checks the `MODE_TO_RUN` environment variable to determine whether to run in Pod or Serverless mode. Add the following code to `handler.py`:

```python handler.py theme={"theme":{"light":"github-light","dark":"github-dark"}}
import os
import asyncio
import runpod

# Read the MODE_TO_RUN environment variable; fall back to "pod" if not set
mode_to_run = os.getenv("MODE_TO_RUN", "pod")

print("------- ENVIRONMENT VARIABLES -------")
print("Mode running: ", mode_to_run)
print("------- -------------------- -------")


async def handler(event):
    # Echo the input back; replace this with your own processing logic
    inputReq = event.get("input", {})
    return inputReq


if mode_to_run == "pod":
    # Pod mode: run a single sample request against the handler for quick iteration
    async def main():
        prompt = "Hello World"
        requestObject = {"input": {"prompt": prompt}}
        response = await handler(requestObject)
        print(response)

    asyncio.run(main())
else:
    # Serverless mode: hand the function over to the Runpod Serverless worker
    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": lambda current: 1,
    })
```

Key features:

* `mode_to_run = os.getenv("MODE_TO_RUN", "pod")`: Reads the mode from an environment variable, defaulting to `pod`.
* `async def handler(event)`: Your core logic.
* `if mode_to_run == "pod" ... else`: This conditional controls what happens when the script is executed directly.
  * In `pod` mode, it runs a sample test call to your `handler` function, allowing for quick iteration.
  * In `serverless` mode, it starts the Runpod Serverless worker.

## Step 3: Create the `start.sh` script

The `start.sh` script serves as the entrypoint for your Docker container and manages different operational modes. It reads the `MODE_TO_RUN` environment variable and configures the container accordingly. Add the following code to `start.sh`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
#!/bin/bash
set -e # Exit the script if any statement returns a non-true return value

# Set workspace directory from env or default
WORKSPACE_DIR="${WORKSPACE_DIR:-/workspace}"

# Start nginx service
start_nginx() {
    echo "Starting Nginx service..."
    service nginx start
}

# Execute script if exists
execute_script() {
    local script_path=$1
    local script_msg=$2
    if [[ -f ${script_path} ]]; then
        echo "${script_msg}"
        bash ${script_path}
    fi
}

# Setup ssh
setup_ssh() {
    if [[ $PUBLIC_KEY ]]; then
        echo "Setting up SSH..."
        mkdir -p ~/.ssh
        echo "$PUBLIC_KEY" >> ~/.ssh/authorized_keys
        chmod 700 -R ~/.ssh

        # Generate SSH host keys if not present
        generate_ssh_keys

        service ssh start

        echo "SSH host keys:"
        cat /etc/ssh/*.pub
    fi
}

# Generate SSH host keys
generate_ssh_keys() {
    ssh-keygen -A
}

# Export env vars
export_env_vars() {
    echo "Exporting environment variables..."
    printenv | grep -E '^RUNPOD_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' >> /etc/rp_environment
    echo 'source /etc/rp_environment' >> ~/.bashrc
}

# Start jupyter lab
start_jupyter() {
    echo "Starting Jupyter Lab..."
    mkdir -p "$WORKSPACE_DIR" && \
    cd / && \
    nohup jupyter lab --allow-root --no-browser --port=8888 --ip=* --NotebookApp.token='' --NotebookApp.password='' --FileContentsManager.delete_to_trash=False --ServerApp.terminado_settings='{"shell_command":["/bin/bash"]}' --ServerApp.allow_origin=* --ServerApp.preferred_dir="$WORKSPACE_DIR" &> /jupyter.log &
    echo "Jupyter Lab started without a password"
}

# Call Python handler if mode is serverless
call_python_handler() {
    echo "Calling Python handler.py..."
    python $WORKSPACE_DIR/handler.py
}

# ---------------------------------------------------------------------------- #
#                                 Main Program                                  #
# ---------------------------------------------------------------------------- #

start_nginx

echo "Pod Started"

setup_ssh

case $MODE_TO_RUN in
    serverless)
        echo "Running in serverless mode"
        call_python_handler
        ;;
    pod)
        echo "Running in pod mode"
        start_jupyter
        ;;
    *)
        echo "Invalid MODE_TO_RUN value: $MODE_TO_RUN. Expected 'serverless' or 'pod'."
        exit 1
        ;;
esac

export_env_vars

echo "Start script(s) finished"

sleep infinity
```

Here are some key features of this script:

* `case $MODE_TO_RUN in ... esac`: This structure directs the startup based on the mode.
  * `serverless` mode: Executes `handler.py`, which starts the Runpod Serverless worker and blocks while it processes requests.
  * `pod` mode: Starts the JupyterLab server for Pod development, then runs `sleep infinity` to keep the container alive so you can connect to it (e.g., via SSH or `docker exec`). You would then manually run `python /app/handler.py` inside the Pod to test your handler logic.
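Once you're connected to the Pod (in `pod` mode), you can also exercise the handler with your own payloads from a Python REPL or a short script. The following is a quick sketch, not part of the template; it assumes you run it from the directory containing `handler.py`, and note that importing the module also triggers its built-in pod-mode sample call once:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio

import handler  # importing handler.py also runs its built-in pod-mode test once

# Call the handler directly with a custom payload, mimicking a Serverless job.
result = asyncio.run(handler.handler({"input": {"prompt": "Testing from inside the Pod"}}))
print(result)  # -> {'prompt': 'Testing from inside the Pod'}
```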
## Step 4: Create the `Dockerfile`

Create a `Dockerfile` that includes your handler and startup script:

```dockerfile theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Use an official Runpod base image
FROM runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04

# Environment variables
ENV PYTHONUNBUFFERED=1

# Supported modes: pod, serverless
ARG MODE_TO_RUN=pod
ENV MODE_TO_RUN=$MODE_TO_RUN

# Set up the working directory
ARG WORKSPACE_DIR=/app
ENV WORKSPACE_DIR=${WORKSPACE_DIR}
WORKDIR $WORKSPACE_DIR

# Install dependencies in a single RUN command to reduce layers and clean up in the same layer to reduce image size
RUN apt-get update --yes --quiet && \
    DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --no-install-recommends \
    software-properties-common \
    gpg-agent \
    build-essential \
    apt-utils \
    ca-certificates \
    curl && \
    add-apt-repository --yes ppa:deadsnakes/ppa && \
    apt-get update --yes --quiet && \
    DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --no-install-recommends

# Create and activate a Python virtual environment
RUN python3 -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"

# Install Python packages
RUN pip install --no-cache-dir \
    asyncio \
    requests \
    runpod

# Install requirements.txt
COPY requirements.txt ./requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Deletes the default start.sh file from Runpod (so we can replace it with our own below)
RUN rm ../start.sh

# Copy all of our files into the container
COPY handler.py $WORKSPACE_DIR/handler.py
COPY start.sh $WORKSPACE_DIR/start.sh

# Make sure start.sh is executable
RUN chmod +x start.sh

# Verify that start.sh is present in the workspace directory
RUN ls -la $WORKSPACE_DIR/start.sh

CMD $WORKSPACE_DIR/start.sh
```

Key features of this `Dockerfile`:

* `FROM runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04`: Starts with a Runpod base image that comes with nginx, runpodctl, and other helpful base packages.
* `ARG WORKSPACE_DIR=/app` and `ENV WORKSPACE_DIR=${WORKSPACE_DIR}`: Allows the workspace directory to be set at build time, defaulting to `/app`.
* `WORKDIR $WORKSPACE_DIR`: Sets the working directory to the value of `WORKSPACE_DIR`.
* `COPY requirements.txt ./requirements.txt` and `RUN pip install ...`: Installs Python dependencies.
* `COPY handler.py $WORKSPACE_DIR/handler.py` and `COPY start.sh $WORKSPACE_DIR/start.sh`: Copies your application files into the workspace directory.
* `ARG MODE_TO_RUN=pod` and `ENV MODE_TO_RUN=$MODE_TO_RUN`: Sets the default operational mode to `pod`. This can be overridden at runtime.
* `CMD $WORKSPACE_DIR/start.sh`: Specifies `start.sh` as the command to run when the container starts.

## Step 5: Build and push your Docker image

Instead of building your image locally and pushing it to Docker Hub, you can also [deploy your worker from a GitHub repository](/serverless/workers/github-integration).

Now you're ready to build your Docker image and push it to Docker Hub.

Build your Docker image, replacing `YOUR_USERNAME` with your Docker Hub username and choosing a suitable image name:

```sh theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker build --platform linux/amd64 --tag YOUR_USERNAME/dual-mode-worker .
```

The `--platform linux/amd64` flag is important for compatibility with Runpod's infrastructure.

Push the image to Docker Hub:

```sh theme={"theme":{"light":"github-light","dark":"github-dark"}}
docker push YOUR_USERNAME/dual-mode-worker:latest
```

You might need to run `docker login` first.
## Step 6: Testing in Pod mode

Now that you've finished building your Docker image, let's explore how you would use the Pod-first development workflow in practice.

Deploy the image to a Pod by following these steps:

1. Navigate to the [Pods page](https://www.runpod.io/console/pods) in the Runpod console.
2. Click **Deploy**.
3. Select your preferred GPU.
4. Under **Container Image**, enter `YOUR_USERNAME/dual-mode-worker:latest`.
5. Under **Public Environment Variables**, select **Add environment variable** and add:
   * Key: `MODE_TO_RUN`
   * Value: `pod`
6. Click **Deploy**.

Once your Pod is running, you can:

* [Connect via the web terminal, JupyterLab, or SSH](/pods/connect-to-a-pod) to test your handler interactively.
* Debug and iterate on your code.
* Test GPU-specific operations.
* Edit `handler.py` within the Pod and re-run it for rapid iteration.

## Step 7: Deploy to a Serverless endpoint

Once you're confident in the `handler.py` logic you tested in Pod mode, you're ready to deploy your dual-mode worker to a Serverless endpoint.

1. Navigate to the [Serverless page](https://www.runpod.io/console/serverless) in the Runpod console.
2. Click **New Endpoint**.
3. Click **Import from Docker Registry**.
4. In the **Container Image** field, enter your Docker image URL: `docker.io/YOUR_USERNAME/dual-mode-worker:latest`, then click **Next**.
5. Under **Environment Variables**, add:
   * Key: `MODE_TO_RUN`
   * Value: `serverless`
6. Configure your endpoint settings (GPU type, workers, etc.).
7. Click **Deploy Endpoint**.

The *same* image will be used for your workers, but `start.sh` will now direct them to run in Serverless mode, using the `runpod.serverless.start` function to process requests.

## Step 8: Test your endpoint

After deploying your endpoint in Serverless mode, you can test it by sending API requests.

1. Navigate to your endpoint's detail page in the Runpod console.
2. Click the **Requests** tab.
3. Use the following JSON as test input:

   ```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
   {
     "input": {
       "prompt": "Hello World!"
     }
   }
   ```

4. Click **Run**.

After a few moments for initialization and processing, you should see output similar to this (the example handler simply echoes the input back):

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  "delayTime": 12345, // This will vary
  "executionTime": 50, // This will vary
  "id": "some-unique-id",
  "output": {
    "prompt": "Hello World!"
  },
  "status": "COMPLETED"
}
```

## Explore the Pod-first development workflow

Congratulations! You've successfully built, deployed, and tested a dual-mode Serverless worker. Now, let's explore the recommended iteration process for a Pod-first development workflow:

1. Deploy your initial Docker image to a Runpod Pod, ensuring `MODE_TO_RUN` is set to `pod` (or rely on the Dockerfile default).
2. [Connect to your Pod](/pods/connect-to-a-pod) (via SSH or web terminal).
3. Navigate to the `/app` directory.
4. As you develop, install any necessary Python packages (`pip install PACKAGE_NAME`) or system dependencies (`apt-get install PACKAGE_NAME`).
5. Iterate on your `handler.py` script. Test your changes frequently by running `python handler.py` directly in the Pod's terminal. This will execute the test harness you defined in the `if mode_to_run == "pod":` block, giving you immediate feedback.

Once you're satisfied with a set of changes and have new dependencies: 1. Add new Python packages to your `requirements.txt` file. 2.
Add system installation commands (e.g., `RUN apt-get update && apt-get install -y PACKAGE_NAME`) to your `Dockerfile`. 3. Ensure your updated `handler.py` is saved. 1. Re-deploy your worker image to a Serverless endpoint using [Docker Hub](/serverless/workers/deploy) or [GitHub](/serverless/workers/github-integration). 2. During deployment, ensure that the `MODE_TO_RUN` environment variable for the endpoint is set to `serverless`. For instructions on how to set environment variables during deployment, see [Manage endpoints](/serverless/endpoints/overview). 3. After your endpoint is deployed, you can test it by [sending API requests](/serverless/endpoints/send-requests). This iterative loop (write your handler, update the Docker image, test in Pod mode, then deploy to Serverless) enables you to rapidly develop and debug your Serverless workers. --- # Source: https://docs.runpod.io/serverless/endpoints/endpoint-configurations.md # Endpoint settings > Reference guide for all Serverless endpoint settings and parameters. This guide details the configuration options available for Runpod Serverless endpoints. These settings control how your endpoint scales, how it utilizes hardware, and how it manages request lifecycles. Some settings can only be updated after deploying your endpoint. For instructions on modifying an existing endpoint, see [Edit an endpoint](/serverless/endpoints/overview#edit-an-endpoint). ## General configuration ### Endpoint name The name assigned to your endpoint helps you identify it within the Runpod console. This is a local display name and does not impact the endpoint ID used for API requests. ### Endpoint type Select the architecture that best fits your application's traffic pattern: **Queue based endpoints** utilize a built-in queueing system to manage requests. They are ideal for asynchronous tasks, batch processing, and long-running jobs where immediate synchronous responses are not required. These endpoints provide guaranteed execution and automatic retries for failed requests. Queue based endpoints are implemented using [handler functions](/serverless/workers/handler-functions). **Load balancing endpoints** route traffic directly to available workers, bypassing the internal queue. They are designed for high-throughput, low-latency applications that require synchronous request/response cycles, such as real-time inference or custom REST APIs. For implementation details, see [Load balancing endpoints](/serverless/load-balancing/overview). ### GPU configuration This setting determines the hardware tier your workers will utilize. You can select multiple GPU categories to create a prioritized list. Runpod attempts to allocate the first category in your list. If that hardware is unavailable, it automatically falls back to the subsequent options. Selecting multiple GPU types significantly improves endpoint availability during periods of high demand. | **GPU type(s)** | **Memory** | **Flex cost per second** | **Active cost per second** | **Description** | | ----------------------- | ---------- | ------------------------ | -------------------------- | ----------------------------------------------------- | | A4000, A4500, RTX 4000 | 16 GB | \$0.00016 | \$0.00011 | The most cost-effective for small models. | | 4090 PRO | 24 GB | \$0.00031 | \$0.00021 | Extreme throughput for small-to-medium models. | | L4, A5000, 3090 | 24 GB | \$0.00019 | \$0.00013 | Great for small-to-medium sized inference workloads. 
| | L40, L40S, 6000 Ada PRO | 48 GB | \$0.00053 | \$0.00037 | Extreme inference throughput on LLMs like Llama 3 7B. | | A6000, A40 | 48 GB | \$0.00034 | \$0.00024 | A cost-effective option for running big models. | | H100 PRO | 80 GB | \$0.00116 | \$0.00093 | Extreme throughput for big models. | | A100 | 80 GB | \$0.00076 | \$0.00060 | High throughput GPU, yet still very cost-effective. | | H200 PRO | 141 GB | \$0.00155 | \$0.00124 | Extreme throughput for huge models. | | B200 | 180 GB | \$0.00240 | \$0.00190 | Maximum throughput for huge models. | ## Worker scaling ### Active workers This setting defines the minimum number of workers that remain warm and ready to process requests at all times. Setting this to 1 or higher eliminates cold starts for the initial wave of requests. Active workers incur charges even when idle, but they receive a 20-30% discount compared to on-demand workers. ### Max workers This setting controls the maximum number of concurrent instances your endpoint can scale to. This acts as a safety limit for costs and a cap on concurrency. We recommend setting your max worker count approximately 20% higher than your expected maximum concurrency. This buffer allows for smoother scaling during traffic spikes. ### GPUs per worker This defines how many GPUs are assigned to a single worker instance. The default is 1. When choosing between multiple lower-tier GPUs or fewer high-end GPUs, you should generally prioritize high-end GPUs with lower GPU count per worker when possible. ### Auto-scaling type This setting determines the logic used to scale workers up and down. **Queue delay** scaling adds workers based on wait times. If requests sit in the queue for longer than a defined threshold (default 4 seconds), the system provisions new workers. This is best for workloads where slight delays are acceptable in exchange for higher utilization. **Request count** scaling is more aggressive. It adjusts worker numbers based on the total volume of pending and active work. The formula used is `Math.ceil((requestsInQueue + requestsInProgress) / scalerValue)`. Use a scaler value of 1 for maximum responsiveness, or increase it to scale more conservatively. This strategy is recommended for LLM workloads or applications with frequent, short requests. ## Lifecycle and timeouts ### Idle timeout The idle timeout determines how long a worker remains active after completing a request before shutting down. While a worker is idle, you are billed for the time, but the worker remains "warm," allowing it to process subsequent requests immediately. The default is 5 seconds. ### Execution timeout The execution timeout acts as a failsafe to prevent runaway jobs from consuming infinite resources. It specifies the maximum duration a single job is allowed to run before being forcibly terminated. We strongly recommend keeping this enabled. The default is 600 seconds (10 minutes), and it can be extended up to 24 hours. ### Job TTL (time-to-live) This setting defines how long a job request remains valid in the queue before expiring. If a worker does not pick up the job within this window, the system discards it. The default is 24 hours. ## Performance features ### FlashBoot FlashBoot reduces cold start times by retaining the state of worker resources shortly after they spin down. This allows the system to "revive" a worker much faster than a standard fresh boot. FlashBoot is most effective on endpoints with consistent traffic, where workers frequently cycle between active and idle states. 
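To make the request count auto-scaling formula described above concrete, here is a small Python sketch of how the target worker count is derived. This is illustrative only; the function and variable names are assumptions, and the platform's internal implementation may differ:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import math

def target_workers(requests_in_queue: int, requests_in_progress: int, scaler_value: int, max_workers: int) -> int:
    """Illustrative version of the request count auto-scaling formula."""
    desired = math.ceil((requests_in_queue + requests_in_progress) / scaler_value)
    return min(desired, max_workers)  # scaling is always capped by the max workers setting

# Example: 10 queued + 5 in-progress requests with a scaler value of 4 -> 4 workers.
print(target_workers(10, 5, 4, max_workers=8))  # 4
```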
### Model The Model field allows you to select from a list of [cached models](/serverless/endpoints/model-caching). When selected, Runpod schedules your workers on host machines that already have these large model files pre-loaded. This significantly reduces the time required to load models during worker initialization. ## Advanced settings ### Data centers You can restrict your endpoint to specific geographical regions. For maximum reliability and availability, we recommend allowing all data centers. Restricting this list decreases the pool of available GPUs your endpoint can draw from. ### Network volumes [Network volumes](/storage/network-volumes) provide persistent storage that survives worker restarts. While they enable data sharing between workers, they introduce network latency and restrict your endpoint to the specific data center where the volume resides. Use network volumes only if your workload specifically requires shared persistence or datasets larger than the container limit. ### CUDA version selection This filter ensures your workers are scheduled on host machines with compatible drivers. While you should select the version your code requires, we recommend also selecting all newer versions. CUDA is generally backward compatible, and selecting a wider range of versions increases the pool of available hardware. ### Expose HTTP/TCP ports Enabling this option exposes the public IP and port of the worker, allowing for direct external communication. This is required for applications that need persistent connections, such as WebSockets. --- # Source: https://docs.runpod.io/api-reference/endpoints/PATCH/endpoints/endpointId.md # Source: https://docs.runpod.io/api-reference/endpoints/GET/endpoints/endpointId.md # Source: https://docs.runpod.io/api-reference/endpoints/DELETE/endpoints/endpointId.md # Source: https://docs.runpod.io/api-reference/endpoints/PATCH/endpoints/endpointId.md # Source: https://docs.runpod.io/api-reference/endpoints/GET/endpoints/endpointId.md # Source: https://docs.runpod.io/api-reference/endpoints/DELETE/endpoints/endpointId.md # Delete an endpoint > Delete an endpoint. ## OpenAPI ````yaml DELETE /endpoints/{endpointId} openapi: 3.0.3 info: title: Runpod API description: Public Rest API for managing Runpod programmatically. version: 0.1.0 contact: name: help url: https://contact.runpod.io/hc/requests/new email: help@runpod.io servers: - url: https://rest.runpod.io/v1 security: - ApiKey: [] tags: - name: docs description: This documentation page. - name: pods description: Manage Pods. - name: endpoints description: Manage Serverless endpoints. - name: network volumes description: Manage Runpod network volumes. - name: templates description: Manage Pod and Serverless templates. - name: container registry auths description: >- Manage authentication for container registries such as dockerhub to use private images. - name: billing description: Retrieve billing history for your Runpod account. externalDocs: description: Find out more about Runpod. url: https://runpod.io paths: /endpoints/{endpointId}: delete: tags: - endpoints summary: Delete an endpoint description: Delete an endpoint. operationId: DeleteEndpoint parameters: - name: endpointId in: path description: Endpoint ID to delete. required: true schema: type: string responses: '204': description: Endpoint successfully deleted. '400': description: Invalid endpoint ID. '401': description: Unauthorized. 
components: securitySchemes: ApiKey: type: http scheme: bearer bearerFormat: Bearer ```` --- # Source: https://docs.runpod.io/sdks/python/endpoints.md # Source: https://docs.runpod.io/sdks/javascript/endpoints.md # Source: https://docs.runpod.io/sdks/go/endpoints.md # Source: https://docs.runpod.io/api-reference/endpoints/POST/endpoints.md # Source: https://docs.runpod.io/api-reference/endpoints/GET/endpoints.md # Source: https://docs.runpod.io/api-reference/billing/GET/billing/endpoints.md # Source: https://docs.runpod.io/sdks/python/endpoints.md # Source: https://docs.runpod.io/sdks/javascript/endpoints.md # Source: https://docs.runpod.io/sdks/go/endpoints.md # Source: https://docs.runpod.io/api-reference/endpoints/POST/endpoints.md # Source: https://docs.runpod.io/api-reference/endpoints/GET/endpoints.md # Source: https://docs.runpod.io/api-reference/billing/GET/billing/endpoints.md # Serverless billing history > Retrieve billing information about your Serverless endpoints. ## OpenAPI ````yaml GET /billing/endpoints openapi: 3.0.3 info: title: Runpod API description: Public Rest API for managing Runpod programmatically. version: 0.1.0 contact: name: help url: https://contact.runpod.io/hc/requests/new email: help@runpod.io servers: - url: https://rest.runpod.io/v1 security: - ApiKey: [] tags: - name: docs description: This documentation page. - name: pods description: Manage Pods. - name: endpoints description: Manage Serverless endpoints. - name: network volumes description: Manage Runpod network volumes. - name: templates description: Manage Pod and Serverless templates. - name: container registry auths description: >- Manage authentication for container registries such as dockerhub to use private images. - name: billing description: Retrieve billing history for your Runpod account. externalDocs: description: Find out more about Runpod. url: https://runpod.io paths: /billing/endpoints: get: tags: - billing summary: Serverless billing history description: Retrieve billing information about your Serverless endpoints. operationId: EndpointBilling parameters: - name: bucketSize in: query schema: type: string enum: - hour - day - week - month - year default: day description: >- The length of each billing time bucket. The billing time bucket is the time range over which each billing record is aggregated. - name: dataCenterId in: query schema: type: array example: - EU-RO-1 - CA-MTL-1 default: - EU-RO-1 - CA-MTL-1 - EU-SE-1 - US-IL-1 - EUR-IS-1 - EU-CZ-1 - US-TX-3 - EUR-IS-2 - US-KS-2 - US-GA-2 - US-WA-1 - US-TX-1 - CA-MTL-3 - EU-NL-1 - US-TX-4 - US-CA-2 - US-NC-1 - OC-AU-1 - US-DE-1 - EUR-IS-3 - CA-MTL-2 - AP-JP-1 - EUR-NO-1 - EU-FR-1 - US-KS-3 - US-GA-1 items: type: string enum: - EU-RO-1 - CA-MTL-1 - EU-SE-1 - US-IL-1 - EUR-IS-1 - EU-CZ-1 - US-TX-3 - EUR-IS-2 - US-KS-2 - US-GA-2 - US-WA-1 - US-TX-1 - CA-MTL-3 - EU-NL-1 - US-TX-4 - US-CA-2 - US-NC-1 - OC-AU-1 - US-DE-1 - EUR-IS-3 - CA-MTL-2 - AP-JP-1 - EUR-NO-1 - EU-FR-1 - US-KS-3 - US-GA-1 description: >- Filter to endpoints located in any of the provided Runpod data centers. The data center IDs are listed in the response of the /pods endpoint. - name: endpointId in: query schema: type: string example: jpnw0v75y3qoql description: Filter to a specific endpoint. - name: endTime in: query schema: type: string format: date-time example: '2023-01-31T23:59:59Z' description: The end date of the billing period to retrieve. 
- name: gpuTypeId in: query schema: type: array items: type: string enum: - NVIDIA GeForce RTX 4090 - NVIDIA A40 - NVIDIA RTX A5000 - NVIDIA GeForce RTX 5090 - NVIDIA H100 80GB HBM3 - NVIDIA GeForce RTX 3090 - NVIDIA RTX A4500 - NVIDIA L40S - NVIDIA H200 - NVIDIA L4 - NVIDIA RTX 6000 Ada Generation - NVIDIA A100-SXM4-80GB - NVIDIA RTX 4000 Ada Generation - NVIDIA RTX A6000 - NVIDIA A100 80GB PCIe - NVIDIA RTX 2000 Ada Generation - NVIDIA RTX A4000 - NVIDIA RTX PRO 6000 Blackwell Server Edition - NVIDIA H100 PCIe - NVIDIA H100 NVL - NVIDIA L40 - NVIDIA B200 - NVIDIA GeForce RTX 3080 Ti - NVIDIA RTX PRO 6000 Blackwell Workstation Edition - NVIDIA GeForce RTX 3080 - NVIDIA GeForce RTX 3070 - AMD Instinct MI300X OAM - NVIDIA GeForce RTX 4080 SUPER - Tesla V100-PCIE-16GB - Tesla V100-SXM2-32GB - NVIDIA RTX 5000 Ada Generation - NVIDIA GeForce RTX 4070 Ti - NVIDIA RTX 4000 SFF Ada Generation - NVIDIA GeForce RTX 3090 Ti - NVIDIA RTX A2000 - NVIDIA GeForce RTX 4080 - NVIDIA A30 - NVIDIA GeForce RTX 5080 - Tesla V100-FHHL-16GB - NVIDIA H200 NVL - Tesla V100-SXM2-16GB - NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition - NVIDIA A5000 Ada - Tesla V100-PCIE-32GB - NVIDIA RTX A4500 - NVIDIA A30 - NVIDIA GeForce RTX 3080TI - Tesla T4 - NVIDIA RTX A30 example: NVIDIA GeForce RTX 4090 description: Filter to endpoints with the provided GPU type attached. - name: grouping in: query schema: type: string enum: - endpointId - podId - gpuTypeId default: endpointId description: Group the billing records by the provided field. - name: imageName in: query schema: type: string example: runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 description: Filter to endpoints created with the provided image. - name: startTime in: query schema: type: string format: date-time example: '2023-01-01T00:00:00Z' description: The start date of the billing period to retrieve. - name: templateId in: query schema: type: string example: 30zmvf89kd description: Filter to endpoints created from the provided template. responses: '200': description: Successful operation. content: application/json: schema: $ref: '#/components/schemas/BillingRecords' components: schemas: BillingRecords: type: array items: type: object properties: amount: type: number description: The amount charged for the group for the billing period, in USD. example: 100.5 diskSpaceBilledGb: type: integer description: >- The amount of disk space billed for the billing period, in gigabytes (GB). Does not apply to all resource types. example: 50 endpointId: type: string description: If grouping by endpoint ID, the endpoint ID of the group. gpuTypeId: type: string description: If grouping by GPU type ID, the GPU type ID of the group. podId: type: string description: If grouping by Pod ID, the Pod ID of the group. time: type: string format: date-time description: The start of the period for which the billing record applies. example: '2023-01-01T00:00:00Z' timeBilledMs: type: integer description: >- The total time billed for the billing period, in milliseconds. Does not apply to all resource types. 
example: 3600000 securitySchemes: ApiKey: type: http scheme: bearer bearerFormat: Bearer ```` --- # Source: https://docs.runpod.io/serverless/vllm/environment-variables.md # Source: https://docs.runpod.io/serverless/development/environment-variables.md # Source: https://docs.runpod.io/pods/templates/environment-variables.md # Source: https://docs.runpod.io/serverless/vllm/environment-variables.md # Source: https://docs.runpod.io/serverless/development/environment-variables.md # Source: https://docs.runpod.io/pods/templates/environment-variables.md # Environment variables > Learn how to use environment variables in Runpod Pods for configuration, security, and automation Environment variables in are key-value pairs that you can configure for your Pods. They are accessible within your containerized application and provide a flexible way to pass configuration settings, secrets, and runtime information to your application without hardcoding them into your code or container image. ## What are environment variables? Environment variables are dynamic values that exist in your Pod's operating system environment. They act as a bridge between your Pod's configuration and your running applications, allowing you to: * Store configuration settings that can change between deployments. * Pass sensitive information like API keys securely. * Access Pod metadata and system information. * Configure application behavior without modifying code. * Reference [Runpod secrets](/pods/templates/secrets) in your containers. When you set an environment variable in your Pod configuration, it becomes available to all processes running inside that Pod's container. ## Why use environment variables in Pods? Environment variables offer several key benefits for containerized applications: **Configuration flexibility**: Environment variables allow you to easily change application settings without modifying your code or rebuilding your container image. For example, you can set different model names, API endpoints, or processing parameters for different deployments: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} # Set a model name that your application can read MODEL_NAME=llama-2-7b-chat API_ENDPOINT=https://api.example.com/v1 MAX_BATCH_SIZE=32 ``` **Security**: Sensitive information such as API keys, database passwords, or authentication tokens can be injected as environment variables, keeping them out of your codebase and container images. This prevents accidental exposure in version control or public repositories. **Pod metadata access**: Runpod provides [predefined environment variables](#runpod-provided-environment-variables) that give your application information about the Pod's environment, resources, and network configuration. This metadata helps your application adapt to its runtime environment automatically. **Automation and scaling**: Environment variables make it easier to automate deployments and scale applications. You can use the same container image with different settings for development, staging, and production environments by simply changing the environment variables. ## Setting environment variables You can configure up to 50 environment variables per Pod through the Runpod interface when creating or editing a Pod or Pod template. ### During Pod creation 1. When creating a new Pod, click **Edit Template** and expand the **Environment Variables** section. 2. Click **Add Environment Variable**. 3. Enter the **Key** (variable name) and **Value**. 4. Repeat for additional variables. 
### In Pod templates 1. Navigate to [My Templates](https://www.console.runpod.io/user/templates) in the console. 2. Create a new template or edit an existing one. 3. Add environment variables in the **Environment Variables** section. 4. Save the template for reuse across multiple Pods. ### Using secrets For sensitive data, you can reference [Runpod secrets](/pods/templates/secrets) in environment variables using the `RUNPOD_SECRET_` prefix. For example: ```bash API_KEY={{ RUNPOD_SECRET_my_api_key }} DATABASE_PASSWORD={{ RUNPOD_SECRET_db_password }} ``` ## Updating environment variables To update environment variables in your Pod: 1. Navigate to the [Pods](https://www.console.runpod.io/user/pods) section of the console. 2. Click the three dots to the right of the Pod you want to update and select **Edit Pod**. 3. Click the **Environment Variables** section to expand it. 4. Add or update the environment variables. 5. Click **Save** to save your changes. When you update environment variables your Pod will restart, clearing all data outside of your volume mount path (`/workspace` by default). ## Accessing environment variables Once set, environment variables are available to your application through standard operating system mechanisms. ### Verify variables in your Pod You can check if environment variables are properly set by running commands in your Pod's terminal: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} # View a specific environment variable echo $ENVIRONMENT_VARIABLE_KEY # List all environment variables env # Search for specific variables env | grep RUNPOD ``` ### Accessing variables in your applications Different programming languages provide various ways to access environment variables: **Python:** ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import os model_name = os.environ.get('MODEL_NAME', 'default-model') api_key = os.environ['API_KEY'] # Raises error if not found ``` **Node.js:** ```javascript theme={"theme":{"light":"github-light","dark":"github-dark"}} const modelName = process.env.MODEL_NAME || 'default-model'; const apiKey = process.env.API_KEY; ``` **Bash scripts:** ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} #!/bin/bash MODEL_NAME=${MODEL_NAME:-"default-model"} echo "Using model: $MODEL_NAME" ``` ## Runpod-provided environment variables Runpod automatically sets several environment variables that provide information about your Pod's environment and resources: | Variable | Description | | --------------------- | ---------------------------------------------------------------------------- | | `RUNPOD_POD_ID` | The unique identifier assigned to your Pod. | | `RUNPOD_DC_ID` | The identifier of the data center where your Pod is located. | | `RUNPOD_POD_HOSTNAME` | The hostname of the server where your Pod is running. | | `RUNPOD_GPU_COUNT` | The total number of GPUs available to your Pod. | | `RUNPOD_CPU_COUNT` | The total number of CPUs available to your Pod. | | `RUNPOD_PUBLIC_IP` | The publicly accessible IP address for your Pod, if available. | | `RUNPOD_TCP_PORT_22` | The public port mapped to SSH (port 22) for your Pod. | | `RUNPOD_ALLOW_IP` | A comma-separated list of IP addresses or ranges allowed to access your Pod. | | `RUNPOD_VOLUME_ID` | The ID of the network volume attached to your Pod. | | `RUNPOD_API_KEY` | The API key for making Runpod API calls scoped specifically to this Pod. | | `PUBLIC_KEY` | The SSH public keys authorized to access your Pod over SSH. 
| | `CUDA_VERSION` | The version of CUDA installed in your Pod environment. | | `PYTORCH_VERSION` | The version of PyTorch installed in your Pod environment. | | `PWD` | The current working directory inside your Pod. | ## Common use cases Environment variables are particularly useful for: **Model configuration**: Configure which AI models to load without rebuilding your container: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} MODEL_NAME=gpt-3.5-turbo MODEL_PATH=/workspace/models MAX_TOKENS=2048 TEMPERATURE=0.7 ``` **Service configuration**: Set up web services and APIs with flexible configuration: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} API_PORT=8000 DEBUG_MODE=false LOG_LEVEL=INFO CORS_ORIGINS=https://myapp.com,https://staging.myapp.com ``` **Database and external service connections**: Connect to databases and external APIs securely: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} DATABASE_URL=postgresql://user:pass@host:5432/db REDIS_URL=redis://localhost:6379 API_BASE_URL=https://api.external-service.com ``` **Development vs. production settings**: Use different configurations for different environments: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} ENVIRONMENT=production CACHE_ENABLED=true RATE_LIMIT=1000 MONITORING_ENABLED=true ``` **Port management**: When configuring symmetrical ports, your application can discover assigned ports through environment variables. This is particularly useful for services that need to know their external port numbers. For more details, see [Expose ports](/pods/configuration/expose-ports#symmetrical-port-mapping). ## Best practices Follow these guidelines when working with environment variables: **Security considerations**: * **Never hardcode secrets**: Use [Runpod secrets](/pods/templates/secrets) for sensitive data. * **Use descriptive names**: Choose clear, descriptive variable names like `DATABASE_PASSWORD` instead of `DB_PASS`. **Configuration management**: * **Provide defaults**: Use default values for non-critical configuration options. * **Document your variables**: Maintain clear documentation of what each environment variable does. * **Group related variables**: Use consistent prefixes for related configuration (for example, `DB_HOST`, `DB_PORT`, `DB_NAME`). **Application design**: * **Validate required variables**. Check that critical environment variables are set before your application starts. If the variable is missing, your application should throw an error or return a clear message indicating which variable is not set. This helps prevent unexpected failures and makes debugging easier. * **Type conversion**: Convert string environment variables to appropriate types (such as integers or booleans) in your application. * **Configuration validation**: Validate environment variable values to catch configuration errors early. --- # Source: https://docs.runpod.io/serverless/development/error-handling.md # Error handling > Implement robust error handling for your Serverless endpoints. Robust error handling is essential for production Serverless endpoints. It prevents your workers from crashing silently and ensures that useful error messages are returned to the client, making debugging significantly easier. ## Basic error handling The simplest way to handle errors is to wrap your handler logic in a `try...except` block. This ensures that even if your logic fails, the worker remains stable and returns a readable error message. 
```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod def handler(job): try: input = job["input"] # Replace process_input() with your own handler logic result = process_input(input) return {"output": result} except KeyError as e: return {"error": f"Missing required input: {str(e)}"} except Exception as e: return {"error": f"An error occurred: {str(e)}"} runpod.serverless.start({"handler": handler}) ``` ## Structured error responses For more complex applications, you should return consistent error objects. This allows the client consuming your API to programmatically handle different types of errors, such as [validation failures](/serverless/development/validation) versus unexpected server errors. ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import traceback def handler(job): try: # Validate input if "prompt" not in job.get("input", {}): return { "error": { "type": "ValidationError", "message": "Missing required field: prompt", "details": "The 'prompt' field is required in the input object" } } prompt = job["input"]["prompt"] result = process_prompt(prompt) return {"output": result} except ValueError as e: return { "error": { "type": "ValueError", "message": str(e), "details": "Invalid input value provided" } } except Exception as e: # Log the full traceback for debugging print(f"Unexpected error: {traceback.format_exc()}") return { "error": { "type": "UnexpectedError", "message": "An unexpected error occurred", "details": str(e) } } runpod.serverless.start({"handler": handler}) ``` ## Timeout handling You can also set an execution timeout in your [endpoint settings](/serverless/endpoints/endpoint-configurations#execution-timeout) to automatically terminate a job after a certain amount of time. For long-running operations, you may want to implement timeout logic within your handler. This prevents a job from hanging indefinitely and consuming credits without producing a result. ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import signal class TimeoutError(Exception): pass def timeout_handler(signum, frame): raise TimeoutError("Operation timed out") def handler(job): try: # Set a timeout (e.g., 60 seconds) signal.signal(signal.SIGALRM, timeout_handler) signal.alarm(60) # Your processing code here result = long_running_operation(job["input"]) # Cancel the timeout signal.alarm(0) return {"output": result} except TimeoutError: return {"error": "Request timed out after 60 seconds"} except Exception as e: return {"error": str(e)} runpod.serverless.start({"handler": handler}) ``` --- # Source: https://docs.runpod.io/tutorials/sdks/python/101/error.md # Implementing error handling and logging in Runpod serverless functions This tutorial will guide you through implementing effective error handling and logging in your Runpod serverless functions. Proper error handling ensures that your serverless functions can handle unexpected situations gracefully. This prevents crashes and ensures that your application can continue running smoothly, even if some parts encounter issues. We'll create a simulated image classification model to demonstrate these crucial practices, ensuring your serverless deployments are robust and maintainable. ## Setting up your Serverless Function Let's break down the process of creating our error-aware image classifier into steps. 
### Import required libraries and Set Up Logging First, import the necessary libraries and Set up the Runpod logger: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod from runpod import RunPodLogger import time import random log = RunPodLogger() ``` ### Create Helper Functions Define functions to simulate various parts of the image classification process: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def load_model(): """Simulate loading a machine learning model.""" log.info("Loading image classification model...") time.sleep(2) # Simulate model loading time return "ImageClassifier" def preprocess_image(image_url): """Simulate image preprocessing.""" log.debug(f"Preprocessing image: {image_url}") time.sleep(0.5) # Simulate preprocessing time return f"Preprocessed_{image_url}" def classify_image(model, preprocessed_image): """Simulate image classification.""" classes = ["cat", "dog", "bird", "fish", "horse"] confidence = random.uniform(0.7, 0.99) predicted_class = random.choice(classes) return predicted_class, confidence ``` These functions: 1. Simulate model loading, logging the process 2. Preprocess images, with debug logging 3. Classify images, returning random results for demonstration ### Create the Main Handler Function Now, let's create the main handler function with error handling and logging: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def handler(job): job_input = job["input"] images = job_input.get("images", []) # Process mock logs if provided for job_log in job_input.get("mock_logs", []): log_level = job_log.get("level", "info").lower() if log_level == "debug": log.debug(job_log["message"]) elif log_level == "info": log.info(job_log["message"]) elif log_level == "warn": log.warn(job_log["message"]) elif log_level == "error": log.error(job_log["message"]) try: # Load model model = load_model() log.info("Model loaded successfully") results = [] for i, image_url in enumerate(images): # Preprocess image preprocessed_image = preprocess_image(image_url) # Classify image predicted_class, confidence = classify_image(model, preprocessed_image) result = { "image": image_url, "predicted_class": predicted_class, "confidence": round(confidence, 2), } results.append(result) # Log progress progress = (i + 1) / len(images) * 100 log.info(f"Progress: {progress:.2f}%") # Simulate some processing time time.sleep(random.uniform(0.5, 1.5)) log.info("Classification completed successfully") # Simulate error if mock_error is True if job_input.get("mock_error", False): raise Exception("Mock error") return {"status": "success", "results": results} except Exception as e: log.error(f"An error occurred: {str(e)}") return {"error": str(e)} ``` This handler: 1. Processes mock logs to demonstrate different logging levels 2. Uses a try-except block to handle potential errors 3. Simulates image classification with progress logging 4. 
Returns results or an error message based on the execution ### Start the Serverless Function Finally, start the Runpod serverless function: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} runpod.serverless.start({"handler": handler}) ``` ## Complete code example Here's the full code for our error-aware image classification simulator: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod from runpod import RunPodLogger import time import random log = RunPodLogger() def load_model(): """Simulate loading a machine learning model.""" log.info("Loading image classification model...") time.sleep(2) # Simulate model loading time return "ImageClassifier" def preprocess_image(image_url): """Simulate image preprocessing.""" log.debug(f"Preprocessing image: {image_url}") time.sleep(0.5) # Simulate preprocessing time return f"Preprocessed_{image_url}" def classify_image(model, preprocessed_image): """Simulate image classification.""" classes = ["cat", "dog", "bird", "fish", "horse"] confidence = random.uniform(0.7, 0.99) predicted_class = random.choice(classes) return predicted_class, confidence def handler(job): job_input = job["input"] images = job_input.get("images", []) # Process mock logs if provided for job_log in job_input.get("mock_logs", []): log_level = job_log.get("level", "info").lower() if log_level == "debug": log.debug(job_log["message"]) elif log_level == "info": log.info(job_log["message"]) elif log_level == "warn": log.warn(job_log["message"]) elif log_level == "error": log.error(job_log["message"]) try: # Load model model = load_model() log.info("Model loaded successfully") results = [] for i, image_url in enumerate(images): # Preprocess image preprocessed_image = preprocess_image(image_url) # Classify image predicted_class, confidence = classify_image(model, preprocessed_image) result = { "image": image_url, "predicted_class": predicted_class, "confidence": round(confidence, 2), } results.append(result) # Log progress progress = (i + 1) / len(images) * 100 log.info(f"Progress: {progress:.2f}%") # Simulate some processing time time.sleep(random.uniform(0.5, 1.5)) log.info("Classification completed successfully") # Simulate error if mock_error is True if job_input.get("mock_error", False): raise Exception("Mock error") return {"status": "success", "results": results} except Exception as e: log.error(f"An error occurred: {str(e)}") return {"error": str(e)} runpod.serverless.start({"handler": handler}) ``` ## Testing Your Serverless Function To test your function locally, use this command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python your_script.py --test_input '{ "input": { "images": ["image1.jpg", "image2.jpg", "image3.jpg"], "mock_logs": [ {"level": "info", "message": "Starting job"}, {"level": "debug", "message": "Debug information"}, {"level": "warn", "message": "Warning: low disk space"}, {"level": "error", "message": "Error: network timeout"} ], "mock_error": false } }' ``` ### Understanding the output When you run the test, you'll see output similar to this: ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} { "status": "success", "results": [ { "image": "image1.jpg", "predicted_class": "cat", "confidence": 0.85 }, { "image": "image2.jpg", "predicted_class": "dog", "confidence": 0.92 }, { "image": "image3.jpg", "predicted_class": "bird", "confidence": 0.78 } ] } ``` This output demonstrates: 1. Successful processing of all images 2. 
Random classification results for each image 3. The overall success status of the job ## Conclusion You've now created a serverless function using Runpod's Python SDK that demonstrates effective error handling and logging practices. This approach ensures that your serverless functions are robust, maintainable, and easier to debug. To further enhance this application, consider: * Implementing more specific error types and handling * Adding more detailed logging for each step of the process * Exploring Runpod's advanced logging features and integrations Runpod's serverless library provides a powerful foundation for building reliable, scalable applications with comprehensive error management and logging capabilities. --- # Source: https://docs.runpod.io/pods/configuration/expose-ports.md # Expose ports > Learn how to make your Pod services accessible from the internet using HTTP proxy and TCP port forwarding Runpod provides flexible options for exposing your Pod services to the internet. This guide explains how to configure port exposure for different use cases and requirements. ## Understanding port mapping When exposing services from your Pod, it's important to understand that the publicly accessible port usually differs from your internal service port. This mapping ensures security and allows multiple Pods to coexist on the same infrastructure. For example, if you run a web API inside your Pod on port 4000 like this: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} uvicorn main:app --host 0.0.0.0 --port 4000 ``` The external port users connect to will be different, depending on your chosen exposure method. ## HTTP access via Runpod proxy Runpod's HTTP proxy provides the easiest way to expose web services from your Pod. This method works well for REST APIs, web applications, and any HTTP-based service. ### Configure external HTTP ports To configure HTTP ports during Pod deployment, click **Edit Template** and add a comma-separated list of ports to the **Expose HTTP Ports (Max 10)** field. To configure HTTP ports for an existing Pod, navigate to the [Pod page](https://www.console.runpod.io/pods), expand your Pod, click the hamburger menu on the bottom-left, select **Edit Pod**, then add your port(s) to the **Expose HTTP Ports (Max 10)** field. You can also configure HTTP ports for a Pod template in the [My Templates](https://www.console.runpod.io/user/templates) section of the console. ### Access your service Once your Pod is running and your service is active, access it using the proxy URL format: ```bash https://[POD_ID]-[INTERNAL_PORT].proxy.runpod.net ``` Replace `[POD_ID]` with your Pod's unique identifier and `[INTERNAL_PORT]` with your service's internal port. For example: * Pod ID: `abc123xyz` * Internal port: `4000` * Access URL: `https://abc123xyz-4000.proxy.runpod.net` A Pod that's listed as **Running** in the console (with a green dot in the Pod UI) may not be ready to use. The best way to check if your Pod is ready is by checking the **Telemetry** tab in the Pod details page in the Runpod console. If a Pod is receiving telemetry, it should be ready to use, but individual services (JupyterLab, HTTP services, etc.) may take a few minutes to start up. 
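Once the service is up, you can sanity-check the proxy connection from your local machine. The following is a minimal sketch using Python's `requests` library; it assumes the example values above (Pod ID `abc123xyz` and a web API listening on internal port `4000`) and requests the root path, so substitute your own Pod ID, port, and route.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import requests

# Replace these with your own Pod ID and internal service port.
pod_id = "abc123xyz"
internal_port = 4000

# Proxy URL format: https://[POD_ID]-[INTERNAL_PORT].proxy.runpod.net
url = f"https://{pod_id}-{internal_port}.proxy.runpod.net/"

# A successful status code means the proxy reached your service.
response = requests.get(url, timeout=30)
print(response.status_code)
print(response.text[:200])
```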
### Proxy limitations and behavior The HTTP proxy route includes several intermediaries that affect connection behavior: ```bash User → Cloudflare → Runpod Load Balancer → Your Pod ``` This architecture introduces important limitations: * **100-second timeout**: Cloudflare enforces a maximum connection time of 100 seconds. If your service doesn't respond within this time, the connection closes with a `524` error. * **HTTPS only**: All connections are secured with HTTPS, even if your internal service uses HTTP. * **Public accessibility**: Your service becomes publicly accessible. While the Pod ID provides some obscurity, implement proper authentication in your application. Design your application with these constraints in mind. For long-running operations, consider: * Implementing progress endpoints that return status updates. * Using background job queues with status polling. * Breaking large operations into smaller chunks. * Returning immediate responses with job IDs for later retrieval. ## TCP access via public IP Pods do not support UDP connections. If your application relies on UDP, you'll need to modify your application to use TCP-based communication instead. For services requiring direct TCP connections, lower latency, or protocols other than HTTP, use TCP port exposure with public IP addresses. ### Configure TCP ports In your Pod or template configuration, follow the same steps as for [HTTP ports](#configure-external-http-ports), but add ports to the **Expose TCP Ports** field. This enables direct TCP forwarding with a public IP address. ### Find your connection details After your Pod starts, check the **Connect** menu to find your assigned public IP and external port mapping under **Direct TCP Ports**. For example: ```bash TCP port 213.173.109.39:13007 -> :22 ``` Public IP addresses may change for Community Cloud Pods if your Pod is migrated or restarted, but they should remain stable for Secure Cloud Pods. External port mappings change whenever your Pod resets. ## Symmetrical port mapping Some applications require the external port to match the internal port. Runpod supports this through a special configuration syntax. ### Requesting symmetrical ports To request symmetrical mapping, specify port numbers above 70000 in your TCP configuration. These aren't valid port numbers, but signal Runpod to allocate matching internal and external ports. After Pod creation, check the **Connect** menu to see which symmetrical ports were assigned under **Direct TCP Ports**. ### Accessing port mappings programmatically Your application can discover assigned ports through environment variables. 
For example, if you specify `70000` and `70001` in your Pod configuration, you could use the following commands to retrieve the assigned ports: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} echo $RUNPOD_TCP_PORT_70000 echo $RUNPOD_TCP_PORT_70001 ``` You can use these environment variables in your application configuration to automatically adapt to assigned ports: **Python example:** ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import os # Get the assigned port or use a default port = os.environ.get('RUNPOD_TCP_PORT_70000', '8000') app.run(host='0.0.0.0', port=int(port)) ``` **Configuration file example:** ```yaml theme={"theme":{"light":"github-light","dark":"github-dark"}} server: host: 0.0.0.0 port: ${RUNPOD_TCP_PORT_70000} ``` ## Best practices When exposing ports from your Pods, follow these guidelines for security and reliability: ### Security considerations * **Implement authentication**: Both HTTP proxy and TCP access make your services publicly accessible. Always implement proper authentication and authorization in your applications. * **Use HTTPS for sensitive data**: While the proxy automatically provides HTTPS, TCP connections do not. Implement TLS in your application when handling sensitive data over TCP. * **Validate input**: Public endpoints are targets for malicious traffic. Implement robust input validation and rate limiting. ### Performance optimization * **Choose the right method**: Use HTTP proxy for web services and TCP for everything else. The proxy adds latency but provides automatic HTTPS and load balancing. * **Handle timeouts gracefully**: Design your application to work within the 100-second proxy timeout or use TCP for long-running connections. * **Monitor your services**: Implement health checks and monitoring to ensure your exposed services remain accessible. ### Configuration tips * **Document your ports**: Maintain clear documentation of which services run on which ports, especially in complex deployments. * **Use templates**: Define port configurations in templates for consistent deployments across multiple Pods. * **Test thoroughly**: Verify your port configurations work correctly before deploying production workloads. ## Common use cases Different types of applications benefit from different exposure methods: * **Web APIs and REST services**: Use HTTP proxy for automatic HTTPS and simple configuration. * **WebSocket applications**: TCP exposure often works better for persistent connections that might exceed timeout limits. * **Database connections**: Use TCP with proper security measures. Consider using Runpod's global networking for internal-only databases. * **Development environments**: HTTP proxy works well for web-based IDEs and development servers. ## Troubleshooting Try these fixes if you're having issues with port exposure: * **Service not accessible via proxy**: Ensure your service binds to `0.0.0.0` (all interfaces) not just `localhost` or `127.0.0.1`. * **524 timeout errors**: If your service takes longer than 100 seconds to respond, consider using TCP or restructuring your application for faster responses. * **Connection refused**: Verify your service is running and listening on the correct port inside the Pod. * **Port already in use**: Check that no other services in your Pod are using the same port. * **Unstable connections**: For Community Cloud Pods, implement reconnection logic to handle IP address changes. 
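For the "Connection refused" and "Service not accessible via proxy" items above, a quick way to confirm that something is listening on the expected port from inside the Pod is a short socket check. This is only a sketch; adjust the port to match your service, and note that a successful local connection doesn't by itself prove the service is bound to `0.0.0.0`.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import socket

port = 4000  # The internal port your service should be listening on.

# Attempt a TCP connection to the port on the local interface.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    result = sock.connect_ex(("127.0.0.1", port))

if result == 0:
    print(f"A service is accepting connections on port {port}.")
else:
    print(f"Nothing is listening on port {port}; check that your app is running and bound to 0.0.0.0.")
```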
## Next steps Once you've exposed your ports, consider: * Setting up [SSH access](/pods/configuration/use-ssh) for secure Pod administration. * Implementing [global networking](/pods/networking) for secure Pod-to-Pod communication. * Configuring health checks and monitoring for your exposed services. --- # Source: https://docs.runpod.io/tutorials/pods/fine-tune-llm-axolotl.md # Fine tune an LLM with Axolotl on Runpod Runpod provides an easier method to fine tune an LLM. For more information, see [Fine tune a model](/fine-tune/). [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) is a tool that simplifies the process of training large language models (LLMs). It provides a streamlined workflow that makes it easier to fine-tune AI models on various configurations and architectures. When combined with Runpod's GPU resources, Axolotl enables you to harness the power needed to efficiently train LLMs. In addition to its user-friendly interface, Axolotl offers a comprehensive set of YAML examples covering a wide range of LLM families, such as LLaMA2, Gemma, LLaMA3, and Jamba. These examples serve as valuable references, helping users understand the role of each parameter and guiding them in making appropriate adjustments for their specific use cases. It is highly recommended to explore [these examples](https://github.com/OpenAccess-AI-Collective/axolotl/tree/main/examples) to gain a deeper understanding of the fine-tuning process and optimize the model's performance according to your requirements. In this tutorial, we'll walk through the steps of training an LLM using Axolotl on Runpod and uploading your model to Hugging Face. ## Setting up the environment Fine-tuning a large language model (LLM) can take up a lot of compute power. Because of this, we recommend fine-tuning using Runpod's GPUs. To do this, you'll need to create a Pod, specify a container, and then begin training. A Pod is an instance on a GPU or multiple GPUs that you can use to run your training job. You also specify a Docker image like `axolotlai/axolotl-cloud:main-latest` that you want installed on your Pod. 1. Log in to [Runpod](https://www.console.runpod.io/console/home) and deploy your Pod. 1. Select **Deploy**. 2. Select an appropriate GPU instance. 3. Specify the `axolotlai/axolotl-cloud:main-latest` image as your Template image. 4. Select your GPU count. 5. Select **Deploy**. For optimal compatibility, we recommend using A100, H100, V100, or RTX 3090 Pods for Axolotl fine-tuning. Now that you have your Pod set up and running, connect to it over secure SSH. 2. Wait for the Pod to start up, then connect to it using secure SSH. 1. On your Pod page, select **Connect**. 2. Copy the secure SSH string and paste it into your terminal on your machine. It follows this pattern: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} ssh [USERNAME]@[POD_IP_ADDRESS] -p [SSH_PORT] -i [PATH_TO_SSH_KEY] ``` Follow the on-screen prompts to SSH into your Pod. You should use the SSH connection to your Pod, as it is a persistent connection. The Web UI terminal shouldn't be relied on for long-running processes, as it will be disconnected after a period of inactivity. With the Pod deployed and connected via SSH, we're ready to move on to preparing our dataset. ## Preparing the dataset The dataset you provide to your LLM is crucial, as it's the data your model will learn from during fine-tuning. You can create your own dataset to fine-tune your model, or use a pre-made one.
To continue, use either a [local dataset](#using-a-local-dataset) or one [stored on Hugging Face](#using-a-hugging-face-dataset). ### Using a local dataset To use a local dataset, you'll need to transfer it to your Runpod instance. You can do this using Runpod CLI to securely transfer files from your local machine to the one hosted by Runpod. All Pods automatically come with `runpodctl` installed with a Pod-scoped API key. ### To send a file Run the following on the computer that has the file you want to send, enter the following command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} runpodctl send data.jsonl ``` ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} Sending 'data.jsonl' (5 B) Code is: 8338-galileo-collect-fidel On the other computer run runpodctl receive 8338-galileo-collect-fidel ``` ### To receive a file The following is an example of a command you'd run on your Runpod machine. ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} runpodctl receive 8338-galileo-collect-fidel ``` The following is an example of an output. ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} Receiving 'data.jsonl' (5 B) Receiving (<-149.36.0.243:8692) data.jsonl 100% |████████████████████| ( 5/ 5B, 0.040 kB/s) ``` Once the local dataset is transferred to your Runpod machine, we can proceed to updating requirements and preprocessing the data. ### Using a Hugging Face dataset If your dataset is stored on Hugging Face, you can specify its path in the `lora.yaml` configuration file under the `datasets` key. Axolotl will automatically download the dataset during the preprocessing step. Review the [configuration file](https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/docs/config.qmd) in detail and make any adjustments to your file as needed. Now update your Runpod machine's requirement and preprocess your data. ## Updating requirements and preprocessing data Before you can start training, you'll need to install the necessary dependencies and preprocess our dataset. In some cases, your Pod will not contain the Axolotl repository. To add the required repository, run the following commands and then continue with the tutorial: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} git clone https://github.com/OpenAccess-AI-Collective/axolotl cd axolotl ``` 1. Install the required packages by running the following commands: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} pip3 install packaging ninja pip3 install -e '.[flash-attn,deepspeed]' ``` 2. Update the `lora.yml` configuration file with your dataset path and other training settings. You can use any of the examples in the `examples` folder as a starting point. 3. Preprocess your dataset by running: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml ``` This step converts your dataset into a format that Axolotl can use for training. Having updated the requirements and preprocessed the data, we're now ready to fine-tune the LLM. ### Fine-tuning the LLM With your environment set up and data preprocessed, you're ready to start fine-tuning the LLM. Run the following command to fine-tune the base model. 
```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml ``` This will start the training process using the settings specified in your `lora.yml` file. The training time will depend on factors like your model size, dataset size, and GPU type. Be prepared to wait a while, especially for larger models and datasets. Once training is complete, we can move on to testing our fine-tuned model through inference. ### Inference Once training is complete, you can test your fine-tuned model using the inference script: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml --lora_model_dir="./lora-out" ``` This will allow you to interact with your model and see how it performs on new prompts. If you're satisfied with your model's performance, you can merge the LoRA weights with the base model using the `merge_lora` script. ### Merge the model You will merge the base model with the LoRA weights using the `merge_lora` script. Run the following command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python3 -m axolotl.cli.merge_lora examples/openllama-3b/lora.yml \ --lora_model_dir="./lora-out" ``` This creates a standalone model that doesn't require LoRA layers for inference. ### Upload the model to Hugging Face Finally, you can share your fine-tuned model with others by uploading it to Hugging Face. 1. Login to Hugging Face through the CLI: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} huggingface-cli login ``` 2. Create a new model repository on Hugging Face using `huggingface-cli`. ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} huggingface-cli repo create your_model_name --type model ``` 3. Then, use the `huggingface-cli upload` command to upload your merged model to the repository. ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} huggingface-cli upload your_model_name path_to_your_model ``` With our model uploaded to Hugging Face, we've successfully completed the fine-tuning process and made our work available for others to use and build upon. ## Conclusion By following these steps and leveraging the power of Axolotl and Runpod, you can efficiently fine-tune LLMs to suit your specific use cases. The combination of Axolotl's user-friendly interface and Runpod's GPU resources makes the process more accessible and streamlined. Remember to explore the provided YAML examples to gain a deeper understanding of the various parameters and make appropriate adjustments for your own projects. With practice and experimentation, you can unlock the full potential of fine-tuned LLMs and create powerful, customized AI models. --- # Source: https://docs.runpod.io/fine-tune.md > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Fine-tune a model This guide explains how to fine-tune a large language model using Runpod and Axolotl. You'll learn how to select a base model, configure your training environment, and start the fine-tuning process. ## Prerequisites Before you begin fine-tuning, ensure you have: * A Runpod account with access to the Fine Tuning feature * (Optional) A Hugging Face access token for gated models ## Select a base model To start fine-tuning, you'll need to choose a base model from Hugging Face: 1. 
Navigate to the **Fine Tuning** section in the sidebar 2. Enter the Hugging Face model ID in the **Base Model** field * Example: `NousResearch/Meta-Llama-3-8B` 3. For gated models (requiring special access): 1. Generate a Hugging Face token with appropriate permissions 2. Add your token in the designated field ## Select a dataset You can choose a dataset from Hugging Face for fine-tuning: 1. Browse available datasets on [Hugging Face](https://huggingface.co/datasets?task_categories=task_categories:text-generation\&sort=trending) 2. Enter your chosen dataset identifier in the **Dataset** field * Example: `tatsu-lab/alpaca` ## Deploy the fine-tuning pod Follow these steps to set up your training environment: 1. Click **Deploy the Fine Tuning Pod** 2. Select a GPU instance based on your model's requirements: * Smaller models: Choose GPUs with less memory * Larger models/datasets: Choose GPUs with higher memory capacity 3. Monitor the system logs for deployment progress 4. Wait for the success message: `"You've successfully configured your training environment!"` ## Connect to your training environment After your pod is deployed and active, you can connect using any of these methods: 1. Go to your Fine Tuning pod dashboard 2. Click **Connect** and choose your preferred connection method: * **Jupyter Notebook**: Browser-based notebook interface * **Web Terminal**: Browser-based terminal * **SSH**: Local machine terminal connection To use SSH, add your public SSH key in your account settings. The system automatically adds your key to the pod's `authorized_keys` file. ## Configure your environment Your training environment includes this directory structure in `/workspace/fine-tuning/`: * `examples/`: Sample configurations and scripts * `outputs/`: Training results and model outputs * `config.yaml`: Training parameters for your model The system generates an initial `config.yaml` based on your selected base model and dataset. ## Review and modify the configuration The `config.yaml` file controls your fine-tuning parameters. Here's how to customize it: 1. Open the configuration file: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} nano config.yaml ``` 2. 
Review and adjust the parameters based on your specific use case. Here's an example configuration with common parameters: ```yaml theme={"theme":{"light":"github-light","dark":"github-dark"}} base_model: NousResearch/Meta-Llama-3.1-8B # Model loading settings load_in_8bit: false load_in_4bit: false strict: false # Dataset configuration datasets: - path: tatsu-lab/alpaca type: alpaca dataset_prepared_path: last_run_prepared val_set_size: 0.05 output_dir: ./outputs/out # Training parameters sequence_len: 8192 sample_packing: true pad_to_sequence_len: true # Weights & Biases logging (optional) wandb_project: wandb_entity: wandb_watch: wandb_name: wandb_log_model: # Training optimization gradient_accumulation_steps: 8 micro_batch_size: 1 num_epochs: 1 optimizer: paged_adamw_8bit lr_scheduler: cosine learning_rate: 2e-5 # Additional settings train_on_inputs: false group_by_length: false bf16: auto fp16: tf32: false gradient_checkpointing: true gradient_checkpointing_kwargs: use_reentrant: false early_stopping_patience: resume_from_checkpoint: logging_steps: 1 xformers_attention: flash_attention: true warmup_steps: 100 evals_per_epoch: 2 eval_table_size: saves_per_epoch: 1 debug: deepspeed: weight_decay: 0.0 fsdp: fsdp_config: special_tokens: pad_token: <|end_of_text|> ``` The `config.yaml` file contains all hyperparameters needed for fine-tuning. You may need to iterate on these settings to achieve optimal results. For more configuration examples, visit the [Axolotl examples repository](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples). ## Start the fine-tuning process Once your configuration is ready, follow these steps: 1. Start the training process: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} axolotl train config.yaml ``` 2. Monitor the training progress in your terminal ## Push your model to Hugging Face After completing the fine-tuning process, you can share your model: 1. Log in to Hugging Face: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} huggingface-cli login ``` 2. Create a new repository on Hugging Face if needed 3. Upload your model: ```sh theme={"theme":{"light":"github-light","dark":"github-dark"}} huggingface-cli upload [USERNAME]/[MODEL_NAME] ./output ``` Replace `[USERNAME]` with your Hugging Face username and `[MODEL_NAME]` with your desired model name. ## Additional resources For more information about fine-tuning with Axolotl, see: * [Axolotl Documentation](https://github.com/OpenAccess-AI-Collective/axolotl) --- # Source: https://docs.runpod.io/tutorials/serverless/generate-sdxl-turbo.md > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Generate images with SDXL Turbo When it comes to working with an AI image generator, the speed at which images are generated is often a compromise. Runpod's Serverless Workers allow you to host [SDXL Turbo](https://huggingface.co/stabilityai/sdxl-turbo) from Stability AI, which is a fast text-to-image model. In this tutorial, you'll build a web application that leverages a Runpod Serverless Worker and Endpoint to return an image from a text-based input. By the end of this tutorial, you'll have an understanding of running a Serverless Worker on Runpod and sending requests to an Endpoint to receive a response. You can proceed with the tutorial by following the build steps outlined here or skip directly to the [Deploy a Serverless Endpoint](#deploy-a-serverless-endpoint) section.
## Prerequisites This section assumes you understand the terminal and can execute commands from it. Before starting this tutorial, you'll need access to: ### Runpod To continue with this quick start, you'll need access to the following from Runpod: * Runpod account * Runpod API Key ### Docker To build your Docker image, you'll need access to the following: * Docker installed * Docker account You can also use the prebuilt image from [runpod/sdxl-turbo](https://hub.docker.com/r/runpod/sdxl-turbo). ### GitHub To clone the `worker-sdxl-turbo` repo, you'll need access to the following: * Git installed * Permissions to clone GitHub repos With the prerequisites covered, get started by building and pushing a Docker image to a container registry. ## Build and push your Docker image This step will walk you through building and pushing your Docker image to your container registry. This is useful for building custom images for your use case. If you prefer, you can use the prebuilt image from [runpod/sdxl-turbo](https://hub.docker.com/r/runpod/sdxl-turbo) instead of building your own. Building a Docker image allows you to specify the container when creating a Worker. The Docker image includes the [Runpod Handler](https://github.com/runpod-workers/worker-sdxl-turbo/blob/main/src/handler.py), which is how you provide instructions to the Worker to perform a task. In this example, the Handler is responsible for taking a Job and returning a base64-encoded version of the image. 1. Clone the [Runpod Worker SDXL Turbo](https://github.com/runpod-workers/worker-sdxl-turbo) repository: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} gh repo clone runpod-workers/worker-sdxl-turbo ``` 2. Navigate to the root of the cloned repo: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} cd worker-sdxl-turbo ``` 3. Build the Docker image: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker build --tag [DOCKER_USERNAME]/[IMAGE_NAME]:[TAG] . ``` 4. Push the image to your container registry: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} docker push [DOCKER_USERNAME]/[IMAGE_NAME]:[TAG] ``` Now that you've pushed your image to a container registry, you're ready to deploy your Serverless Endpoint to Runpod. ## Deploy a Serverless Endpoint The container you just built will run on the Worker you're creating. Here, you will configure and deploy the Endpoint. This includes the GPU and the storage needed for your Worker. This step will walk you through deploying a Serverless Endpoint to Runpod. 1. Log in to the [Runpod Serverless console](https://www.console.runpod.io/serverless). 2. Select **+ New Endpoint**. 3. Provide the following: 1. Endpoint name. 2. Select a GPU. 3. Configure the number of Workers. 4. (optional) Select **FlashBoot**. 5. (optional) Select a template. 6. Enter the name of your Docker image. * For example, `runpod/sdxl-turbo:dev`. 7. Specify enough memory for your Docker image. 4. Select **Deploy**. Now, let's send a request to your Endpoint. ## Send a request Now that your Endpoint is deployed, you can begin interacting with it and integrating it into an application. Before writing the logic into the application, ensure that you can interact with the Endpoint by sending a request.
Run the following command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} curl -X POST "https://api.runpod.ai/v2/${YOUR_ENDPOINT}/runsync" \ -H "accept: application/json" \ -H "content-type: application/json" \ -H "authorization: ${YOUR_API_KEY}" \ -d '{ "input": { "prompt": "${YOUR_PROMPT}", "num_inference_steps": 25, "refiner_inference_steps": 50, "width": 1024, "height": 1024, "guidance_scale": 7.5, "strength": 0.3, "seed": null, "num_images": 1 } }' ``` ```JSON theme={"theme":{"light":"github-light","dark":"github-dark"}} { "delayTime": 168, "executionTime": 251, "id": "sync-fa542d19-92b2-47d0-8e58-c01878f0365d-u1", "output": "BASE_64", "status": "COMPLETED" } ``` Export your variable names in your terminal session or replace them inline: * `YOUR_ENDPOINT`: Your endpoint ID. * `YOUR_API_KEY`: An API key with read and write access. * `YOUR_PROMPT`: The custom prompt passed to the model. You should see output like the example above. The status may initially return `PENDING`, but it will quickly change to `COMPLETED` when you query the job ID. ## Integrate into your application Now, let's create a web application that takes a prompt and generates an image from it. While these steps are specific to JavaScript, you can make requests against your Endpoint in any language of your choice. To do that, you'll create two files: * `index.html`: The frontend of your web application. * `script.js`: The JavaScript that reads the prompt and calls the Serverless Endpoint. The HTML file (`index.html`) sets up a user interface with an input box for the prompt and a button to trigger the image generation. ```HTML index.html theme={"theme":{"light":"github-light","dark":"github-dark"}}

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Runpod AI Image Generator</title>
  </head>
  <body>
    <h1>Runpod AI Image Generator</h1>
    <!-- Prompt input and button wired to generateImage() in script.js -->
    <input type="text" id="promptInput" placeholder="Enter a prompt" />
    <button onclick="generateImage()">Generate Image</button>
    <!-- The generated image is inserted here by script.js -->
    <div id="imageResult"></div>
    <script src="script.js"></script>
  </body>
</html>
```
The JavaScript file (`script.js`) contains the `generateImage` function. This function reads the user's input, makes a POST request to the Runpod serverless endpoint, and handles the response. The server's response is expected to contain the base64-encoded image, which is then displayed on the webpage. ```JavaScript script.js theme={"theme":{"light":"github-light","dark":"github-dark"}} async function generateImage() { const prompt = document.getElementById("promptInput").value; if (!prompt) { alert("Please enter a prompt!"); return; } const options = { method: "POST", headers: { accept: "application/json", "content-type": "application/json", // Replace with your actual API key authorization: "Bearer ${process.env.REACT_APP_AUTH_TOKEN}", }, body: JSON.stringify({ input: { prompt: prompt, num_inference_steps: 25, width: 1024, height: 1024, guidance_scale: 7.5, seed: null, num_images: 1, }, }), }; try { const response = await fetch( // Replace with your actual Endpoint Id "https://api.runpod.ai/v2/${process.env.REACT_APP_ENDPOINT_ID}/runsync", options, ); const data = await response.json(); if (data && data.output) { const imageBase64 = data.output; const imageUrl = `data:image/jpeg;base64,${imageBase64}`; document.getElementById("imageResult").innerHTML = `<img src="${imageUrl}" alt="Generated Image" />`; } else { alert("Failed to generate image"); } } catch (error) { console.error("Error:", error); alert("Error generating image"); } } ```
1. Replace `${process.env.REACT_APP_AUTH_TOKEN}` with your actual API key. 2. Replace `${process.env.REACT_APP_ENDPOINT_ID}` with your specific Endpoint. 3. Open `index.html` in a web browser, enter a prompt, and select **Generate Image** to see the result. This web application serves as a basic example of how to interact with your Runpod serverless endpoint from a client-side application. It can be expanded or modified to fit more complex use cases. ## Run a server You can run a server through Python or by opening the `index.html` page in your browser. Run the following command to start a server locally using Python. ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python -m http.server 8000 ``` **Open the File in a Browser** Open the `index.html` file directly in your web browser. 1. Navigate to the folder where your `index.html` file is located. 2. Right-click on the file and choose **Open with** and select your preferred web browser. * Alternatively, you can drag and drop the `index.html` file into an open browser window. * The URL will look something like `file:///path/to/your/index.html`. --- # Source: https://docs.runpod.io/tutorials/sdks/python/101/generator.md > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Building a streaming handler for text to speech simulation This tutorial will guide you through creating a serverless function using Runpod's Python SDK that simulates a text-to-speech (TTS) process. We'll use a streaming handler to stream results incrementally, demonstrating how to handle long-running tasks efficiently in a serverless environment. A streaming handler in the Runpod's Python SDK is a special type of function that allows you to iterate over a sequence of values lazily. Instead of returning a single value and exiting, a streaming handler yields multiple values, one at a time, pausing the function's state between each yield. This is particularly useful for handling large data streams or long-running tasks, as it allows the function to produce and return results incrementally, rather than waiting until the entire process is complete. ## Setting up your Serverless Function Let's break down the process of creating our TTS simulator into steps. ### Import required libraries First, import the necessary libraries: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import time import re import json import sys ``` ### Create the TTS Simulator Define a function that simulates the text-to-speech process: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def text_to_speech_simulator(text, chunk_size=5, delay=0.5): words = re.findall(r'\w+', text) for i in range(0, len(words), chunk_size): chunk = words[i:i+chunk_size] audio_chunk = f"Audio chunk {i//chunk_size + 1}: {' '.join(chunk)}" time.sleep(delay) # Simulate processing time yield audio_chunk ``` This function: 1. Splits the input text into words 2. Processes the words in chunks 3. Simulates a delay for each chunk 4. 
Yields each "audio chunk" as it's processed ### Create the Streaming Handler Now, let's create the main handler function: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} def streaming_handler(job): job_input = job['input'] text = job_input.get('text', "Welcome to Runpod's text-to-speech simulator!") chunk_size = job_input.get('chunk_size', 5) delay = job_input.get('delay', 0.5) print(f"TTS Simulator | Starting job {job['id']}") print(f"Processing text: {text}") for audio_chunk in text_to_speech_simulator(text, chunk_size, delay): yield {"status": "processing", "chunk": audio_chunk} yield {"status": "completed", "message": "Text-to-speech conversion completed"} ``` This handler: 1. Extracts parameters from the job input 2. Logs the start of the job 3. Calls the TTS simulator and yields each chunk as it's processed using a streaming handler 4. Yields a completion message when finished ### Set up the main function Finally, set up the main execution block: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} if __name__ == "__main__": if "--test_input" in sys.argv: # Code for local testing (see full example) else: runpod.serverless.start({"handler": streaming_handler, "return_aggregate_stream": True}) ``` This block allows for both local testing and deployment as a Runpod serverless function. ## Complete code example Here's the full code for our serverless TTS simulator using a streaming handler: ```python theme={"theme":{"light":"github-light","dark":"github-dark"}} import runpod import time import re import json import sys def text_to_speech_simulator(text, chunk_size=5, delay=0.5): words = re.findall(r'\w+', text) for i in range(0, len(words), chunk_size): chunk = words[i:i+chunk_size] audio_chunk = f"Audio chunk {i//chunk_size + 1}: {' '.join(chunk)}" time.sleep(delay) # Simulate processing time yield audio_chunk def streaming_handler(job): job_input = job['input'] text = job_input.get('text', "Welcome to Runpod's text-to-speech simulator!") chunk_size = job_input.get('chunk_size', 5) delay = job_input.get('delay', 0.5) print(f"TTS Simulator | Starting job {job['id']}") print(f"Processing text: {text}") for audio_chunk in text_to_speech_simulator(text, chunk_size, delay): yield {"status": "processing", "chunk": audio_chunk} yield {"status": "completed", "message": "Text-to-speech conversion completed"} if __name__ == "__main__": if "--test_input" in sys.argv: test_input_index = sys.argv.index("--test_input") if test_input_index + 1 < len(sys.argv): test_input_json = sys.argv[test_input_index + 1] try: job = json.loads(test_input_json) gen = streaming_handler(job) for item in gen: print(json.dumps(item)) except json.JSONDecodeError: print("Error: Invalid JSON in test_input") else: print("Error: --test_input requires a JSON string argument") else: runpod.serverless.start({"handler": streaming_handler, "return_aggregate_stream": True}) ``` ## Testing your Serverless Function To test your function locally, use this command: ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}} python your_script.py --test_input ' { "input": { "text": "This is a test of the Runpod text-to-speech simulator. 
It processes text in chunks and simulates audio generation.", "chunk_size": 4, "delay": 1 }, "id": "local_test" }' ``` ### Understanding the output When you run the test, you'll see output similar to this: ```json theme={"theme":{"light":"github-light","dark":"github-dark"}} {"status": "processing", "chunk": "Audio chunk 1: This is a test"} {"status": "processing", "chunk": "Audio chunk 2: of the Runpod"} {"status": "processing", "chunk": "Audio chunk 3: text to speech"} {"status": "processing", "chunk": "Audio chunk 4: simulator It processes"} {"status": "processing", "chunk": "Audio chunk 5: text in chunks"} {"status": "processing", "chunk": "Audio chunk 6: and simulates audio"} {"status": "processing", "chunk": "Audio chunk 7: generation"} {"status": "completed", "message": "Text-to-speech conversion completed"} ``` This output demonstrates: 1. The incremental processing of text chunks 2. Real-time status updates for each chunk 3. A completion message when the entire text is processed ## Conclusion You've now created a serverless function using Runpod's Python SDK that simulates a streaming text-to-speech process. This example showcases how to handle long-running tasks and stream results incrementally in a serverless environment. To further enhance this application, consider: * Implementing a real text-to-speech model * Adding error handling for various input types * Exploring Runpod's documentation for advanced features like GPU acceleration for audio processing Runpod's serverless library provides a powerful foundation for building scalable, efficient applications that can process and stream data in real-time without the need to manage infrastructure. --- # Source: https://docs.runpod.io/serverless/vllm/get-started.md # Source: https://docs.runpod.io/get-started.md # Source: https://docs.runpod.io/serverless/vllm/get-started.md # Source: https://docs.runpod.io/get-started.md > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Deploy your first Pod > Run code on a remote GPU in minutes. Follow this guide to learn how to create an account, deploy your first GPU Pod, and use it to execute code remotely. ## Step 1: Create an account Start by creating a Runpod account: 1. [Sign up here](https://www.console.runpod.io/signup). 2. Verify your email address. 3. Set up two-factor authentication (recommended for security). Planning to share compute resources with your team? You can convert your personal account to a team account later. See [Manage accounts](/get-started/manage-accounts) for details. ## Step 2: Deploy a Pod Now that you've created your account, you're ready to deploy your first Pod: 1. Open the [Pods page](https://www.console.runpod.io/pods) in the web interface. 2. Click the **Deploy** button. 3. Select **A40** from the list of graphics cards. 4. In the field under **Pod Name**, enter the name **quickstart-pod**. 5. Keep all other fields (Pod Template, GPU Count, and Instance Pricing) on their default settings. 6. Click **Deploy On-Demand** to deploy and start your Pod. You'll be redirected back to the Pods page after a few seconds. If you haven't set up payments yet, you'll be prompted to add a payment method and purchase credits for your account. ## Step 3: Explore the Pod detail pane On the [Pods page](https://www.console.runpod.io/pods), click the Pod you just created to open the Pod detail pane. 
The pane opens onto the **Connect** tab, where you'll find options for connecting to your Pod so you can execute code on your GPU (after it's done initializing). Take a minute to explore the other tabs: * **Details**: Information about your Pod, such as hardware specs, pricing, and storage. * **Telemetry**: Realtime utilization metrics for your Pod's CPU, memory, and storage. * **Logs**: Logs streamed from your container (including stdout from any applications inside) and the Pod management system. * **Template Readme**: Details about the template your Pod is running. Your Pod is configured with the latest official Runpod PyTorch template. ## Step 4: Execute code on your Pod with JupyterLab 1. Go back to the **Connect** tab, and under **HTTP Services**, click **Jupyter Lab** to open a JupyterLab workspace on your Pod. 2. Under **Notebook**, select **Python 3 (ipykernel)**. 3. Type `print("Hello, world!")` in the first line of the notebook. 4. Click the play button to run your code. And that's it—congrats! You just ran your first line of code on Runpod. ## Step 5: Clean up To avoid incurring unnecessary charges, follow these steps to clean up your Pod resources: 1. Return to the [Pods page](https://www.console.runpod.io/pods) and click your running Pod. 2. Click the **Stop** button (pause icon) to stop your Pod. 3. Click **Stop Pod** in the modal that opens to confirm. You'll still be charged a small amount for storage on stopped Pods (\$0.20 per GB per month). If you don't need to retain any data on your Pod, you should terminate it completely. To terminate your Pod: 1. Click the **Terminate** button (trash icon). 2. Click **Terminate Pod** to confirm. Terminating a Pod permanently deletes all data that isn't stored in a [network volume](/storage/network-volumes). Be sure that you've saved any data you might need to access again. To learn more about how storage works, see the [Pod storage overview](/pods/storage/types). ## Next steps Now that you've learned the basics, you're ready to: * [Generate API keys](/get-started/api-keys) for programmatic resource management. * Experiment with various options for [accessing and managing Runpod resources](/get-started/connect-to-runpod). * Learn how to [choose the right Pod](/pods/choose-a-pod) for your workload. * Review options for [Pod pricing](/pods/pricing). * [Explore our tutorials](/tutorials/introduction/overview) for specific AI/ML use cases. * Start building production-ready applications with [Runpod Serverless](/serverless/overview). ## Need help? * Join the Runpod community [on Discord](https://discord.gg/cUpRmau42V). * Submit a support request using our [contact page](https://contact.runpod.io/hc/requests/new). * Reach out to us via [email](mailto:help@runpod.io). --- # Source: https://docs.runpod.io/serverless/workers/github-integration.md > Fetch the complete documentation index at: https://docs.runpod.io/llms.txt > Use this file to discover all available pages before exploring further. # Deploy workers from GitHub > Speed up development by deploying workers directly from GitHub. Runpod's GitHub integration simplifies your workflow by pulling your code and Dockerfile from GitHub, building the container image, storing it in Runpod's secure container registry, and deploying it to your endpoint.
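The code being built is usually a standard Runpod handler. As a rough sketch only (the file name `handler.py` and its contents here are illustrative, not a requirement of the integration), the Python side of such a repository typically follows the handler pattern used throughout these docs, alongside a Dockerfile that installs the `runpod` package and runs the file:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import runpod

def handler(job):
    """Minimal example handler; replace with your own logic."""
    job_input = job["input"]
    prompt = job_input.get("prompt", "")
    return {"output": f"Received prompt: {prompt}"}

runpod.serverless.start({"handler": handler})
```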