# Introduction

- Source: https://docs.lambda.ai/education/

---

# Introduction

- [Using Multi-Instance GPU (MIG)](using-mig/)

## Generative AI (GAI)

- [How to serve the FLUX.1 prompt-to-image models using Lambda Cloud on-demand instances](generative-ai/flux-prompt-to-image/)
- [Fine-tuning the Mochi video generation model on GH200](fine-tune-mochi-gh200/)

## Large language models (LLMs)

- [Deploying a Llama 3 inference endpoint](large-language-models/deploying-a-llama-3-inference-endpoint/)
- [Deploying Llama 3.2 3B in a Kubernetes (K8s) cluster](large-language-models/k8s-ollama-llama-3-2/)
- [Using KubeAI to deploy Nous Research's Hermes 3 and other LLMs](large-language-models/kubeai-hermes-3/)
- [Serving Llama 3.1 405B on a Lambda 1-Click Cluster](large-language-models/serving-llama-3-1-405b/)
- [Serving the Llama 3.1 8B and 70B models using Lambda Cloud on-demand instances](large-language-models/serving-llama-3-1-docker/)
- [Running DeepSeek-R1 70B using Ollama](large-language-models/deepseek-r1-ollama/)
- [Running Nemotron 3 Nano using vLLM](large-language-models/deploying-nemotron-3-nano/)

## Linux usage and system administration

- [Basic Linux commands and system administration](linux-usage/basic-linux-commands-and-system-administration/)
- [Configuring Software RAID](linux-usage/configuring-software-raid/)
- [Lambda Stack and recovery images](linux-usage/lambda-stack-and-recovery-images/)
- [Troubleshooting and debugging](linux-usage/troubleshooting-and-debugging/)
- [Using the Lambda bug report to troubleshoot your system](linux-usage/using-the-lambda-bug-report-to-troubleshoot-your-system/)
- [Using the nvidia-bug-report.log file to troubleshoot your system](linux-usage/using-the-nvidia-bug-report.log-file-to-troubleshoot-your-system/)

## Programming

- [Virtual environments and Docker containers](programming/virtual-environments-containers/)
- [Running Hugging Face Transformers and Diffusers on an NVIDIA GH200 instance](running-huggingface-diffusers-transformers-gh200/)

## Scheduling and orchestration

- [Orchestrating AI workloads with dstack](scheduling-and-orchestration/orchestrating-workloads-with-dstack/)
- [Using SkyPilot to deploy a Kubernetes cluster](scheduling-and-orchestration/skypilot-deploy-kubernetes/)

## Benchmarking

- [Running a PyTorch®-based benchmark on an NVIDIA GH200 instance](running-benchmark-gh200/)

---

# Fine-tuning the Mochi video generation model on GH200

- Source: https://docs.lambda.ai/education/fine-tune-mochi-gh200/

---

[generative ai](../../tags/#tag:generative-ai) [on-demand cloud](../../tags/#tag:on-demand-cloud)

# Fine-tuning the Mochi video generation model on GH200

This guide helps you get started fine-tuning [Genmo's Mochi video generation model](https://www.genmo.ai/) using a [Lambda On-Demand Cloud](https://lambda.ai/service/gpu-cloud) GH200 instance.

## Launch your GH200 instance

Begin by launching a GH200 instance:

- In the Lambda Cloud console, navigate to the [SSH keys page](https://cloud.lambda.ai/ssh-keys), click **Add SSH Key**, and then add or generate an SSH key.
- Navigate to the [Instances page](https://cloud.lambda.ai/instances) and click **Launch Instance**.
- Follow the steps in the instance launch wizard.
    - *Instance type:* Select **1x GH200 (96 GB)**.
    - *Region:* Select an available region.
    - *Filesystem:* Don't attach a filesystem.
    - *SSH key:* Use the key you created in step 1.
- Click **Launch instance**.
- Review the EULAs. If you agree to them, click **I agree to the above** to start launching your new instance. Instances can take up to five minutes to fully launch.
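Once the instance finishes launching, connect to it from your local terminal before installing anything. A minimal sketch, where `INSTANCE-IP` is a placeholder for the public IP shown on the Instances page:

```bash
# Connect to the GH200 instance as the default ubuntu user.
# INSTANCE-IP is a placeholder for the IP shown in the Lambda Cloud console.
ssh ubuntu@INSTANCE-IP
```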
## Install dependencies

- Install the dependencies needed for this guide by running:

```bash
git clone https://github.com/genmoai/mochi.git mochi-tune
cd mochi-tune
pip install --upgrade pip setuptools wheel packaging
pip install -e . --no-build-isolation
pip install moviepy==1.0.3 pillow==9.5.0 av==13.1.0
sudo apt -y install bc
```

---

# How to serve the FLUX.1 prompt-to-image models using Lambda Cloud on-demand instances

- Source: https://docs.lambda.ai/education/generative-ai/flux-prompt-to-image/

---

[generative ai](../../../tags/#tag:generative-ai) [stable diffusion](../../../tags/#tag:stable-diffusion)

# How to serve the FLUX.1 prompt-to-image models using Lambda Cloud on-demand instances

This tutorial shows you how to use [Lambda Cloud](https://lambda.ai/service/gpu-cloud) A100 and H100 on-demand instances to download and serve a [FLUX.1 prompt-to-image model](https://blackforestlabs.ai/). The model will be served as a [Gradio app](https://www.gradio.app/) accessible with a link you can share.

Note

You can download and serve the [FLUX.1 [schnell]](https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell) model without a [Hugging Face](https://huggingface.co/) account. However, to download and serve the [FLUX.1 [dev]](https://huggingface.co/black-forest-labs/FLUX.1-dev) model, you need a Hugging Face account and a [User Access Token](https://huggingface.co/docs/hub/en/security-tokens). You also need to review and accept the model license agreement.

---

# Running DeepSeek-R1 70B using Ollama

- Source: https://docs.lambda.ai/education/large-language-models/deepseek-r1-ollama/

---

[llm](../../../tags/#tag:llm)

# Running DeepSeek-R1 70B using Ollama

## Introduction

This short tutorial teaches you how to use a [Lambda Cloud on-demand instance](https://lambda.ai/service/gpu-cloud) to run the [DeepSeek-R1 distilled Llama 3.3 70B model using Ollama](https://ollama.com/library/deepseek-r1) in a Docker container. Because the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) is preinstalled as part of [Lambda Stack](https://lambda.ai/lambda-stack-deep-learning-software) on all on-demand instances, Docker containers can use instance GPUs without any additional configuration.

## Prerequisites

For this tutorial, it's recommended that you use an instance type with more than 40 GB of VRAM, for example, a 1x GH200 or 1x H100.

## Download Ollama and start the Ollama server

- Log into your instance using SSH or by opening a terminal in JupyterLab.
- Download Ollama and start the Ollama server:

```bash
sudo docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
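The excerpt above ends with the server container running; the model itself still needs to be pulled. A minimal sketch of that next step, assuming the `deepseek-r1:70b` tag from the Ollama library and the `ollama` container name used above:

```bash
# Pull and chat with the distilled 70B model inside the running container;
# the first run downloads the quantized weights, which takes a while.
sudo docker exec -it ollama ollama run deepseek-r1:70b

# Optionally confirm the server is up and the model is registered.
curl http://localhost:11434/api/tags
```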
---

# Deploying a Llama 3 inference endpoint

- Source: https://docs.lambda.ai/education/large-language-models/deploying-a-llama-3-inference-endpoint/

---

# Deploying a Llama 3 inference endpoint

Meta's Llama 3 large language models (LLMs) are generative text models recognized for their state-of-the-art performance on common industry benchmarks. This guide covers the deployment of a Meta [Llama 3](https://llama.meta.com/llama3/) inference endpoint using Lambda [On-Demand Cloud](https://lambda.ai/service/gpu-cloud).

This tutorial uses the Llama 3 models hosted by [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3-8B). The model is available in 8B and 70B sizes:

| Model size | Characteristics |
| --- | --- |
| 8B (8 billion parameters) | More efficient and accessible, suitable for tasks where resources are constrained. The 8B model requires a 1x A100 or H100 GPU node. |
| 70B (70 billion parameters) | Superior performance and capabilities, ideal for complex or high-stakes applications. The 70B model requires an 8x A100 or H100 GPU node. |

### Prerequisites

This tutorial assumes the following prerequisites:

- Lambda On-Demand Cloud instances appropriate for the Llama 3 model size you want to run.
    - The 8B model ([meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)) requires a 1x A100 or H100 GPU node.
    - The 70B model ([meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B)) requires an 8x A100 or H100 GPU node.
- A Hugging Face [user account](https://huggingface.co/join).
- An approved [Hugging Face user access token](https://huggingface.co/docs/hub/en/security-tokens) that includes repository read permissions for the meta-llama-3 model repository you wish to use.

JSON outputs in this tutorial are formatted using [jq](https://jqlang.github.io/jq/).

### Set up the inference endpoint

Once you have the appropriate Lambda On-Demand Cloud instances and Hugging Face permissions, begin by setting up an inference endpoint.

- [Launch your Lambda On-Demand Cloud instance](https://cloud.lambda.ai/sign-up).
- [Add or generate an SSH key](../../../public-cloud/console/#add-generate-and-delete-ssh-keys) to access the instance.
- SSH into your instance.
- Create a dedicated Python environment.

---

# Deploying NVIDIA Nemotron 3 Nano using vLLM

- Source: https://docs.lambda.ai/education/large-language-models/deploying-nemotron-3-nano/

---

[llm](../../../tags/#tag:llm)

# Deploying NVIDIA Nemotron 3 Nano using vLLM

[NVIDIA Nemotron 3 Nano](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) uses a [Mamba state-space layer](https://github.com/state-spaces/mamba) on top of a transformer Mixture-of-Experts (MoE) backbone. This architecture yields up to four times as many output tokens per unit of energy as Nemotron Nano 2 while still scoring at or above current open frontier models on SWE-Bench, GPQA Diamond, and IFBench. This efficiency gain comes from a user-supplied *thinking budget* parameter that caps per-request reasoning length, allowing users to tune the latency and accuracy trade-off without touching the core model.

This document provides an overview of Nemotron 3 Nano, and then shows you how to deploy and benchmark the model on Lambda Cloud.
## Model details

### Overview

- *Name:* `NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`
- *Author:* NVIDIA
- *Architecture:* MoE
- *Core capabilities:* Fast reasoning, long-context understanding, robust coding performance, robust tool-use performance
- *License:* [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)

### Specifications

- *Context window:* 1,000,000 tokens
- *Weights-on-disk:* 58 GB
- *Idle VRAM usage:* 120 GB

### Recommended Lambda VRAM configuration

- *Instances:* **1x B200 (180 GB SXM6)** or **1x H100 (80 GB SXM5)** (minimum recommended, 4096 sequence length)
- *1-Click Clusters:* **16x B200 (180 GB)** (max sequence length with FP8 KV cache)

## Deployment and benchmarking

### Deploying to a single-GPU instance

You can run NVIDIA Nemotron 3 Nano on any instance type that has enough VRAM to comfortably support it. For example, to deploy Nemotron 3 Nano on a **1x B200 (180 GB SXM6)** instance running the `Lambda Stack 22.04` image:

- In the Lambda Cloud Console, navigate to the [Instances page](https://cloud.lambda.ai/instances) and click **Launch instance**. A modal appears.
- Follow the steps in the instance launch wizard. Select the following options:
    - *Instance type:* Select **1x B200 (180 GB SXM6)**.
    - *Base image:* Select **Lambda Stack 22.04**.
    - *Security:* Create a new firewall ruleset called `nemotron-3-nano` and add a rule to allow incoming traffic to port `TCP/8000`.
- After your instance launches, find the row for your instance, and then click **Launch** in the **Cloud IDE** column. JupyterLab opens in a new window.
- In JupyterLab's **Launcher** tab, under **Other**, click **Terminal** to open a new terminal.
- In your terminal, install `uv`, set up a Python virtual environment, and then begin serving Nemotron 3 Nano with vLLM.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --torch-backend=auto
VLLM_SERVER_DEV_MODE=1 vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
  --port 8000 \
  --served-model-name nemotron-3-nano \
  --trust-remote-code \
  --enable-sleep-mode
```

---

# Deploying Llama 3.2 3B in a Kubernetes (K8s) cluster

- Source: https://docs.lambda.ai/education/large-language-models/k8s-ollama-llama-3-2/

---

[api](../../../tags/#tag:api) [kubernetes](../../../tags/#tag:kubernetes) [llama](../../../tags/#tag:llama) [llm](../../../tags/#tag:llm)

# Deploying Llama 3.2 3B in a Kubernetes (K8s) cluster

## Introduction

In this tutorial, you'll:

- Stand up a single-node Kubernetes cluster on an [on-demand instance](https://lambda.ai/service/gpu-cloud) using [K3s](https://k3s.io/).
- Install the [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html) so your cluster can use your instance's GPUs.
- Deploy [Ollama](https://ollama.com/) in your cluster to serve the [Llama 3.2 3B model](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/).
- Install the Ollama client.
- Interact with the Llama 3.2 3B model.

Note

You don't need a Kubernetes cluster to run Ollama and serve the Llama 3.2 3B model. Part of this tutorial is to demonstrate that it's possible to stand up a Kubernetes cluster on on-demand instances.
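The single-node cluster above is built with [K3s](https://k3s.io/). For orientation only, a minimal sketch of the upstream K3s installer on an on-demand instance; the tutorial's exact installation flags may differ:

```bash
# Install a single-node K3s server using the upstream installer.
curl -sfL https://get.k3s.io | sh -

# Confirm the node registers and reaches the Ready state.
sudo k3s kubectl get nodes
```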
---

# Using KubeAI to deploy Nous Research's Hermes 3 and other LLMs

- Source: https://docs.lambda.ai/education/large-language-models/kubeai-hermes-3/

---

[kubernetes](../../../tags/#tag:kubernetes) [llama](../../../tags/#tag:llama) [llm](../../../tags/#tag:llm)

# Using KubeAI to deploy Nous Research's Hermes 3 and other LLMs

## Introduction

[See our video tutorial on using KubeAI to deploy Nous Research's Hermes 3 and other LLMs.](https://youtu.be/HEtPO2Wuiac?)

[KubeAI: Private Open AI on Kubernetes](https://github.com/substratusai/kubeai) is a Kubernetes solution for running inference on open-weight large language models (LLMs), including [Nous Research's Hermes 3 fine-tuned Llama 3.1 8B model](https://nousresearch.com/hermes3/) and [NVIDIA's Nemotron fine-tuned Llama 3.1 70B model](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct). Using model servers such as [vLLM](https://blog.vllm.ai/2023/06/20/vllm.html) and [Ollama](https://ollama.com/), KubeAI enables you to interact with LLMs through both a web UI powered by [Open WebUI](https://openwebui.com/) and an OpenAI-compatible API.

In this tutorial, you'll:

- Stand up a single-node Kubernetes cluster on an 8x H100 [on-demand instance](https://lambda.ai/service/gpu-cloud) using [K3s](https://k3s.io/).
- Install the [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html) so your Kubernetes cluster can use your instance's GPUs.
- Deploy KubeAI in your Kubernetes cluster to serve both Nous Research's Hermes 3 model and NVIDIA's Nemotron model.
- Interact with the models using KubeAI's web UI.
- Interact with the models using KubeAI's OpenAI-compatible API.
- Use [NVTOP](https://github.com/Syllo/nvtop) to observe GPU utilization.

## Stand up a single-node Kubernetes cluster

- Use the [console](https://cloud.lambda.ai/instances) or [Cloud API](https://docs.lambda.ai/api/cloud#launchInstance) to launch an 8x H100 instance. Then, SSH into your instance by running:

```bash
ssh ubuntu@INSTANCE-IP -L 8080:localhost:8080
```

---

# Serving Llama 3.1 405B on a Lambda 1-Click Cluster

- Source: https://docs.lambda.ai/education/large-language-models/serving-llama-3-1-405b/

---

# Serving Llama 3.1 405B on a Lambda 1-Click Cluster

In this tutorial, you'll learn how to use a 1-Click Cluster (1CC) to serve the [Meta Llama 3.1 405B model](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B) using [vLLM](https://docs.vllm.ai/en/latest/index.html) with pipeline parallelism.

## Prerequisites

For this tutorial, you need:

- A [Lambda Cloud account](https://cloud.lambda.ai/sign-up).
- A [1-Click Cluster](https://lambda.ai/service/gpu-cloud/1-click-clusters).
- A [Hugging Face](https://huggingface.co/) account to download the Llama 3.1 405B model.
- A [User Access Token](https://huggingface.co/docs/hub/en/security-tokens) with the **Read** role.
- Before you can download the Llama 3.1 405B model, you need to review and accept the model's license agreement. Once you accept the agreement, a request to access the repository will be submitted for approval; approval tends to be fast. You can see the status of the request in your [Hugging Face account settings](https://huggingface.co/settings/gated-repos).
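Before downloading the 405B weights, it can help to confirm that your token actually has access to the gated repository. A minimal sketch, assuming the `huggingface_hub` package (which provides `huggingface-cli`) is installed on the machine you're working from:

```bash
# Log in with your User Access Token (paste it when prompted).
huggingface-cli login

# Verify the token resolves to your account.
huggingface-cli whoami

# Fetch a small file from the gated repo used later in this tutorial;
# this fails if your access request hasn't been approved yet.
huggingface-cli download meta-llama/Meta-Llama-3.1-405B-Instruct config.json
```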
### Download the Llama 3.1 405B model and set up a head node

First, follow the [instructions for accessing your 1CC](../../../public-cloud/1-click-clusters/#accessing-your-1-click-cluster), which includes steps to set up SSH. Then SSH into one of your 1CC GPU nodes. You can find the node names on the [1-Click Clusters](https://cloud.lambda.ai/one-click-clusters/running) page in the Lambda Cloud console. You'll use this GPU node as a head node for cluster management.

On the head node, set environment variables needed for this tutorial by running:

```bash
export HEAD_IP=HEAD-IP
export SHARED_DIR=/home/ubuntu/FILE-SYSTEM-NAME
export HF_TOKEN=HF-TOKEN
export HF_HOME="${SHARED_DIR}/.cache/huggingface"
export MODEL_REPO=meta-llama/Meta-Llama-3.1-405B-Instruct
```

---

# Serving the Llama 3.1 8B and 70B models using Lambda Cloud on-demand instances

- Source: https://docs.lambda.ai/education/large-language-models/serving-llama-3-1-docker/

---

[docker](../../../tags/#tag:docker) [llama](../../../tags/#tag:llama) [llm](../../../tags/#tag:llm)

# Serving the Llama 3.1 8B and 70B models using Lambda Cloud on-demand instances

This tutorial shows you how to use a [Lambda Cloud](https://lambda.ai/service/gpu-cloud) 1x or 8x A100 or H100 on-demand instance to serve the Llama 3.1 8B and 70B models. You'll serve the model using [vLLM running inside of a Docker container](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html).

## Start the vLLM API server

If you haven't already, use the [console](https://cloud.lambda.ai/instances) or [Cloud API](https://docs.lambda.ai/api/cloud) to launch an instance. Then, SSH into your instance. Run:

```bash
export HF_TOKEN=HF-TOKEN HF_HOME="/home/ubuntu/.cache/huggingface" MODEL_REPO=meta-llama/MODEL
```

---

# Basic Linux commands and system administration

- Source: https://docs.lambda.ai/education/linux-usage/basic-linux-commands-and-system-administration/

---

# Basic Linux commands and system administration

## Importing SSH keys from GitHub accounts

To import an SSH key from a GitHub account and add it to your server (or Lambda GPU Cloud on-demand instance):

- Using your existing SSH key, SSH into your server. Alternatively, if you're using an on-demand instance, open a terminal in [JupyterLab](../../../public-cloud/on-demand/getting-started/#how-do-i-open-jupyterlab-on-my-instance).
- Import the SSH key from the GitHub account by running:

```bash
ssh-import-id gh:USERNAME
```

---

# Configuring Software RAID

- Source: https://docs.lambda.ai/education/linux-usage/configuring-software-raid/

---

# Configuring Software RAID

Software RAID (redundant array of independent disks) provides fast and resilient storage for your machine learning data. This document shows you how to configure software RAID in your cluster using [`mdadm`](https://linux.die.net/man/8/mdadm).

- Install new drives as needed, then power on the machine.
- Check that the drives are present with `lsblk`. Your output should look similar to the following:

```bash
ubuntu@ubuntu:~$ lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
nvme1n1  259:1    0  1.8T  0 disk
nvme3n1  259:2    0  1.8T  0 disk
nvme2n1  259:3    0  1.8T  0 disk
...
```
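The excerpt stops after verifying that the drives are visible. For orientation, a minimal sketch of assembling and mounting an array with `mdadm`, assuming a three-drive RAID 0 layout built from the NVMe devices shown above (choose the RAID level that matches your redundancy needs):

```bash
# Create a striped (RAID 0) array from the three NVMe drives.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Format the array and mount it.
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid
sudo mount /dev/md0 /mnt/raid

# Persist the array configuration so it assembles at boot.
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u
```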
---

# Lambda Stack and recovery images

- Source: https://docs.lambda.ai/education/linux-usage/lambda-stack-and-recovery-images/

---

# Lambda Stack and recovery images

## Removing and reinstalling Lambda Stack

To remove and reinstall [Lambda Stack](https://lambda.ai/lambda-stack-deep-learning-software):

Uninstall (purge) the existing Lambda Stack by running:

```bash
sudo rm -f /etc/apt/sources.list.d/{graphics,nvidia,cuda}* && \
dpkg -l | \
awk '/cuda|lib(accinj64|cu(blas|dart|dnn|fft|inj|pti|rand|solver|sparse)|magma|nccl|npp|nv[^p])|nv(idia|ml)|tensor(flow|board)|torch/ { print $2 }' | \
sudo xargs -or apt -y remove --purge
```

---

# Troubleshooting and debugging

- Source: https://docs.lambda.ai/education/linux-usage/troubleshooting-and-debugging/

---

[troubleshooting](../../../tags/#tag:troubleshooting)

# Troubleshooting and debugging

## Linux

### Generate a Lambda bug report

Lambda bug reports are useful for troubleshooting systems with NVIDIA GPUs, including Cloud instances. To generate a Lambda bug report, run:

```bash
wget -nv -O - https://raw.githubusercontent.com/lambdal-support/lambda-public-tools/main/lambda-bug-report.sh | bash -
```

---

# Using the Lambda bug report to troubleshoot your system

- Source: https://docs.lambda.ai/education/linux-usage/using-the-lambda-bug-report-to-troubleshoot-your-system/

---

[troubleshooting](../../../tags/#tag:troubleshooting)

# Using the Lambda bug report to troubleshoot your system

The Lambda bug report helps simplify the process of troubleshooting by collecting system information for you into one place. This article helps you utilize the `lambda-bug-report.log` file to troubleshoot common issues.

Warning

The `lambda-bug-report.sh` script is intended for use on [Vector](https://lambda.ai/gpu-workstations/vector), [Scalar](https://lambda.ai/products/scalar), [Hyperplane](https://lambda.ai/deep-learning/servers/hyperplane), and [On-Demand](https://lambda.ai/service/gpu-cloud) products only. Do not run this script on a cluster as it installs packages that may cause unintended outcomes.

---

# Using the nvidia-bug-report.log file to troubleshoot your system

- Source: https://docs.lambda.ai/education/linux-usage/using-the-nvidia-bug-report.log-file-to-troubleshoot-your-system/

---

# Using the nvidia-bug-report.log file to troubleshoot your system

NVIDIA provides a script that generates a log file that you can use to troubleshoot issues with NVIDIA GPUs. This log file has comprehensive information about your system, including information about individual devices, configuration of NVIDIA drivers, system journals, and more.
## Generate the log file To generate the log file, log in as the root user or use `sudo`, then run the following command: ```bash sudo nvidia-bug-report.sh ``` --- # Virtual environments and Docker containers - Source: https://docs.lambda.ai/education/programming/virtual-environments-containers/ --- [Virtualization](../../../tags/#tag:virtualization) # Virtual environments and Docker containers ## What are virtual environments? Virtual environments allow you to create and maintain development environments that are isolated from each other. Lambda recommends using either: - [Python venv](#creating-a-python-virtual-environment) - [conda](#creating-a-conda-virtual-environment) ### Creating a Python virtual environment Create a Python virtual environment using the `venv` module by running: ```bash python -m venv --system-site-packages NAME ``` --- # Running a PyTorch®-based benchmark on an NVIDIA GH200 instance - Source: https://docs.lambda.ai/education/running-benchmark-gh200/ --- [on-demand cloud](../../tags/#tag:on-demand-cloud) # Running a PyTorch®-based benchmark on an NVIDIA GH200 instance This tutorial describes how to run an NGC-based benchmark on an On-Demand Cloud (ODC) instance backed with the NVIDIA GH200 Grace Hopper Superchip. The tutorial also outlines how to run the benchmark on other ODC instance types to compare performance. The benchmark uses a variety of PyTorch® examples from NVIDIA's [Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch) repository. ## Prerequisites To run this tutorial successfully, you'll need the following: - A GitHub account and some familiarity with a Git-based workflow. - The following tools and libraries installed on the machine or instance you plan to benchmark. These tools and libraries are installed by default on your ODC instances: - NVIDIA driver - Docker - Git - nvidia-container-toolkit - Python ## Setting up your environment ### Launch your GH200 instance Begin by launching a GH200 instance: - In the Lambda Cloud console, navigate to the [SSH keys page](https://cloud.lambda.ai/ssh-keys), click **Add SSH Key**, and then add or generate a SSH key. - Navigate to the [Instances page](https://cloud.lambda.ai/instances) and click **Launch Instance**. - Follow the steps in the instance launch wizard. - *Instance type:* Select **1x GH200 (96 GB)**. - *Region:* Select an available region. - *Filesystem:* Don't attach a filesystem. - *SSH key:* Use the key you created in step 1. - Click **Launch instance**. - Review the EULAs. If you agree to them, click **I agree to the above** to start launching your new instance. Instances can take up to five minutes to fully launch. ### Set the required environment variables Next, set the environment variables you need to run the benchmark: - In the Lambda Cloud console, navigate to the [Instances page](https://cloud.lambda.ai/instances), find the row for your instance, and then click **Launch** in the **Cloud IDE** column. JupyterLab opens in a new window. - In JupyterLab's **Launcher** tab, under **Other**, click **Terminal** to open a new terminal. 
- Open your `.bashrc` file for editing: ```bash nano ~/.bashrc ``` --- # Running Hugging Face Transformers and Diffusers on an NVIDIA GH200 instance - Source: https://docs.lambda.ai/education/running-huggingface-diffusers-transformers-gh200/ --- [on-demand cloud](../../tags/#tag:on-demand-cloud) # Running Hugging Face Transformers and Diffusers on an NVIDIA GH200 instance [Hugging Face](https://huggingface.co/) provides several powerful Python libraries that provide easy access to a wide range of pre-trained models. Among the most popular are [Diffusers](https://huggingface.co/docs/diffusers/index), which focuses on diffusion-based generative AI, and [Transformers](https://huggingface.co/docs/transformers/en/index), which supports common AI/ML tasks across several different modalities. This tutorial demonstrates how to use these libraries to generate images and chatbot-style responses on an On-Demand Cloud (ODC) instance backed with the NVIDIA GH200 Grace Hopper Superchip. ## Setting up your environment ### Launch your GH200 instance Begin by launching a GH200 instance: - In the Lambda Cloud console, navigate to the [SSH keys page](https://cloud.lambda.ai/ssh-keys), click **Add SSH Key**, and then add or generate a SSH key. - Navigate to the [Instances page](https://cloud.lambda.ai/instances) and click **Launch Instance**. - Follow the steps in the instance launch wizard. - *Instance type:* Select **1x GH200 (96 GB)**. - *Region:* Select an available region. - *Filesystem:* Don't attach a filesystem. - *SSH key:* Use the key you created in step 1. - Click **Launch instance**. - Review the EULAs. If you agree to them, click **I agree to the above** to start launching your new instance. Instances can take up to five minutes to fully launch. ### Set up your Python virtual environment Next, create a new Python virtual environment and install the required libraries: - In the Lambda Cloud console, navigate to the [Instances page](https://cloud.lambda.ai/instances), find the row for your instance, and then click **Launch** in the **Cloud IDE** column. JupyterLab opens in a new window. - In JupyterLab's **Launcher** tab, under **Other**, click **Terminal** to open a new terminal. - In your terminal, create a Python virtual environment: ```bash python -m venv --system-site-packages hf-tests ``` --- # Orchestrating AI workloads with dstack - Source: https://docs.lambda.ai/education/scheduling-and-orchestration/orchestrating-workloads-with-dstack/ --- [api](../../../tags/#tag:api) [llm](../../../tags/#tag:llm) # Orchestrating AI workloads with dstack ## Introduction [dstack](https://dstack.ai/) is an open-source alternative to Kubernetes and Slurm, built for orchestrating containerized AI and ML workloads. It simplifies the development, training, and deployment of AI models. With dstack, you use YAML configuration files to define how your applications run. These files specify which [Lambda On-Demand Cloud](https://lambda.ai/service/gpu-cloud) resources to use and how to start your workloads. You can run one-off jobs, set up full-featured remote development environments that open in VS Code, or deploy persistent services that expose APIs for your models. In this tutorial, you'll learn how to: - Run a [Task](https://dstack.ai/docs/concepts/tasks/) that evaluates an LLM's ability to solve multiplication problems. - Set up a remote [development environment](https://dstack.ai/docs/concepts/dev-environments/) for use with VS Code. 
- Launch a [Service](https://dstack.ai/docs/concepts/services/) that serves an LLM via an OpenAI-compatible API endpoint. - Create an [SSH Fleet](https://dstack.ai/docs/concepts/fleets/#ssh) on a [Lambda 1-Click Cluster](https://lambda.ai/service/gpu-cloud/1-click-clusters). ## Prerequisites All of the instructions in this tutorial should be followed on your local machine, not on an on-demand instance. Before you begin, make sure the following tools are installed: - `python3` - `python3-pip` - `git` - `curl` - `jq` On Ubuntu, you can install these packages by running: ```bash sudo apt update && sudo apt install -y python3 python3-pip git curl jq ``` --- # Using SkyPilot to deploy a Kubernetes cluster - Source: https://docs.lambda.ai/education/scheduling-and-orchestration/skypilot-deploy-kubernetes/ --- [api](../../../tags/#tag:api) [kubernetes](../../../tags/#tag:kubernetes) # Using SkyPilot to deploy a Kubernetes cluster ## Introduction [SkyPilot](https://skypilot.readthedocs.io/en/latest/docs/index.html) makes it easy to deploy a Kubernetes cluster using [Lambda Cloud](https://lambda.ai/service/gpu-cloud) on-demand instances. The [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html) is preinstalled so you can immediately use your instances' GPUs. In this tutorial, you'll: - [Configure your Lambda Cloud firewall and a Cloud API key for SkyPilot and Kubernetes](#configure-your-lambda-cloud-firewall-and-generate-a-cloud-api-key). - [Install SkyPilot](#install-skypilot). - [Configure SkyPilot for Lambda Cloud](#configure-skypilot-for-lambda-cloud). - [Use SkyPilot to launch 2 1x H100 on-demand instances and deploy a 2-node Kubernetes cluster using these instances](#use-skypilot-to-launch-instances-and-deploy-kubernetes). Note [**You're billed for all of the time the instances are running.**](../../../public-cloud/billing/#on-demand-cloud-odc-instances) --- # Using Multi-Instance GPU (MIG) - Source: https://docs.lambda.ai/education/using-mig/ --- [docker](../../tags/#tag:docker) [llama](../../tags/#tag:llama) [llm](../../tags/#tag:llm) [on-demand cloud](../../tags/#tag:on-demand-cloud) # Using Multi-Instance GPU (MIG) [See our video tutorial on using Multi-Instance GPU (MIG).](https://youtu.be/KbB5e_V6THw) [NVIDIA Multi-Instance GPU](https://www.nvidia.com/en-us/technologies/multi-instance-gpu/), or MIG, allows you to partition your GPUs into isolated instances. MIG enables you to run simultaneous workloads on a single GPU. For example, you can run inference on multiple models at the same time. Tip [See NVIDIA's MIG User Guide to learn more about MIG](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/). --- # Introduction - Source: https://docs.lambda.ai/hardware/ --- # Introduction Getting started and troubleshooting instructions for Lambda Servers and Vector workstations. [](servers/getting-started)Servers --- # Getting started - Source: https://docs.lambda.ai/hardware/servers/getting-started/ --- [scalar](../../../tags/#tag:scalar) # Getting started ## Where can I download the user manual for my server chassis? User manuals for Lambda server chassis can be downloaded below. Tip You can run `sudo dmidecode -t 1` to know your server chassis. The command will output, for example: ```bash # dmidecode 3.2 Getting SMBIOS data from sysfs. SMBIOS 3.3.0 present. # SMBIOS implementations newer than version 3.2.0 are not # fully supported by this version of dmidecode. 
Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Supermicro Product Name: AS -1114CS-TNR Version: 0123456789 Serial Number: S452392X2826686 UUID: 51605a00-c54f-11ec-8000-3cecefcdb48b Wake-up Type: Power Switch SKU Number: To be filled by O.E.M. Family: To be filled by O.E.M. ``` --- # Set lower power limits (TDPs) for NVIDIA GPUs - Source: https://docs.lambda.ai/hardware/servers/set-lower-gpu-power-limits/ --- # Set lower power limits (TDPs) for NVIDIA GPUs You can set lower power limits (TDPs) for your NVIDIA GPUs using `nvidia-smi` and a simple script. You can configure the script to run automatically at boot using a systemd service. Lowering power limits can reduce power usage and heat output, which is helpful for thermally constrained systems or energy-aware environments. - **Check your GPU's minimum power limit.** Run: ```bash nvidia-smi -q -d POWER ``` --- # Getting started - Source: https://docs.lambda.ai/hardware/workstations/getting-started/ --- [vector](../../../tags/#tag:vector) [vector one](../../../tags/#tag:vector-one) [vector pro](../../../tags/#tag:vector-pro) # Getting started [See our video on Lambda's line of Vector desktops and workstations.](https://youtu.be/NaLzEXRb2bw) [See our quick start video guide.](https://youtu.be/CNysLDjOzKk) ## What are the buttons and ports at the front and top of my Vector One? [![Vector One front and top buttons and ports](../../../assets/images/vector-one-buttons.png)](../../../assets/images/vector-one-buttons.png) Your Vector One's power button is located at the front-top. The first button at the top, closest to the front, switches between the various RGB modes. The second button at the top changes the color. The first port at the top, closest to the front, is used to connect USB-C 3.1 devices. The following 2 ports are used to connect USB-A 3.0 devices. The jack at the top is used to connect a headset/microphone. ## How do I fix Wi-Fi issues with my Vector One? There are [known issues in Ubuntu with the Wi-Fi adapter](https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2049220) installed in Vector Ones. In some cases, the Wi-Fi adapter isn't detected at all. In other cases, the Wi-Fi adapter is detected but exhibits slow performance. These issues are fixed in updated firmware for the Wi-Fi adapter. In order to download and install the updated firmware, you need to connect your Vector One to the Internet. If your Wi-Fi adapter isn't detected at all, try booting using a previous kernel version. Your Wi-Fi adapter might be detected and you can download the updated firmware. Note You can also connect your Vector One to the Internet using Ethernet (recommended), a USB Wi-Fi adapter, or by tethering your iPhone or Android phone. --- # Troubleshooting Workstations and Desktops - Source: https://docs.lambda.ai/hardware/workstations/troubleshooting/ --- [vector](../../../tags/#tag:vector) [vector one](../../../tags/#tag:vector-one) [vector pro](../../../tags/#tag:vector-pro) # Troubleshooting Workstations and Desktops This document provides guidelines for troubleshooting common issues with Vector desktops and workstations. By following this guide, you can often resolve problems without the need for a Repair Merchandise Authorization (RMA). To mitigate unnecessary downtime with shipping and repair, Lambda may expect you to attempt basic troubleshooting before approving a request for RMA. 
Note When you encounter any issue with your product, your first step should be to file a [support ticket](https://support.lambda.ai/hc/en-us/requests/new). Doing so creates a record of your issue and allows our team to provide ongoing assistance if needed. If possible, generate a Lambda bug report and attach the report to your ticket. This report provides our Support Team with valuable system information that can expedite the troubleshooting process. --- # Introduction - Source: https://docs.lambda.ai/private-cloud/ --- [private cloud](../tags/#tag:private-cloud) # Introduction Lambda Private Cloud provides bare metal, single-tenant clusters for defined reservation periods. Private Cloud clusters are built to your specifications, with Lambda providing 24x7 support for cluster hardware. Access to the cluster and firewall is restricted solely to you, and all hardware is isolated from other customer workloads. Additionally, Lambda offers workload management for Private Cloud clusters, like [Managed Kubernetes](managed-kubernetes/), as an added service. [](https://lambda.ai/talk-to-an-engineer)Private Cloud --- # Accessing your Lambda Private Cloud cluster - Source: https://docs.lambda.ai/private-cloud/accessing-private-cloud/ --- [security](../../tags/#tag:security) # Accessing your Lambda Private Cloud cluster ## Introduction [Lambda Private Cloud](https://lambda.ai/service/gpu-cloud/private-cloud) clusters use Fortinet FortiGate firewall appliances to enable secure VPN access. To access your Private Cloud cluster through the firewall, you must first install and configure FortiClient VPN (FortiClient). ## Download and install FortiClient Download and install the appropriate FortiClient package for your computer: - [Download for Windows](https://links.fortinet.com/forticlient/win/vpnagent) - [Download for MacOS](https://links.fortinet.com/forticlient/mac/vpnagent) - [Download for Ubuntu](https://links.fortinet.com/forticlient/deb/vpnagent) (and other `.deb`-based distributions) - [Download for Red Hat](https://links.fortinet.com/forticlient/rhel/vpnagent) (and other `.rpm`-based distributions) FortiClient for other devices, including ARM64 systems, can be downloaded from Fortinet's [product downloads](https://www.fortinet.com/support/product-downloads#vpn) page. ## Configure the VPN connection Next, you'll configure FortiClient using the credentials provided in your 1Password vault, which looks like: [![Screenshot of a 1Password vault for Private Cloud](../../assets/images/private-cloud/1password-vault.png)](../../assets/images/private-cloud/1password-vault.png) - Open FortiClient. - Click **Configure VPN**. - In the **New VPN Connection** window, enter the following settings: - **VPN**: Select **SSL-VPN**. - **Connection Name**: Enter a descriptive name for the VPN connection. - (Optional) **Description**: Enter a description for the VPN connection. - **Remote Gateway**: Enter the VPN URL from your 1Password vault. Omit the port number. - Select the **Customize port** checkbox and enter the VPN URL port number. - Clear the **Enable Single Sign On (SSO) for VPN Tunnel** checkbox. - **Client Certificate**: Select **None**. - **Authentication**: Select **Prompt on login**. - Clear the **Enable Dual-stack IPv4/IPv6 address** checkbox. 
Your configuration should look like this:

[![Screenshot of a FortiClient VPN connection configured for Private Cloud](../../assets/images/private-cloud/forticlient-new-vpn-connection.png)](../../assets/images/private-cloud/forticlient-new-vpn-connection.png)

- Click **Save**.
- In the main window, select the VPN connection you just created from the dropdown menu:

[![Screenshot of the FortiClient main window](../../assets/images/private-cloud/forticlient-main-window.png)](../../assets/images/private-cloud/forticlient-main-window.png)

- Enter the **Username** and **Password** provided in your 1Password vault.
- Click **Connect**. You're asked to confirm that you want to connect to the VPN:

[![Screenshot of prompt to accept or deny certificate](../../assets/images/private-cloud/forticlient-cert-confirmation.png)](../../assets/images/private-cloud/forticlient-cert-confirmation.png)

- Verify that the certificate fingerprint matches the fingerprint shown in your 1Password vault. Then, click **Accept**.

You're connected to your Private Cloud cluster when FortiClient shows the VPN connection is active:

[![Screenshot of FortiClient connect to Private Cloud VPN](../../assets/images/private-cloud/forticlient-vpn-connected.png)](../../assets/images/private-cloud/forticlient-vpn-connected.png)

## Next steps

- [Learn about Lambda Private Cloud clusters](https://lambda.ai/service/gpu-cloud/private-cloud).
- [Learn about Lambda 1-Click Clusters](https://lambda.ai/service/gpu-cloud/1-click-clusters).

---

# Overview

- Source: https://docs.lambda.ai/private-cloud/managed-kubernetes/

---

[managed kubernetes](../../tags/#tag:managed-kubernetes) [private cloud](../../tags/#tag:private-cloud)

# Overview

During the Private Cloud reservation process, you can choose to configure your cluster as a Managed Kubernetes cluster. In this configuration, Lambda manages your cluster's underlying environment, and you interact with the cluster through a browser-based Kubernetes administration UI and the Kubernetes API. This document outlines the standard configuration for a Managed Kubernetes cluster in Lambda Private Cloud.

## Hardware

Lambda Private Cloud provides single-tenant clusters that are isolated from other clusters. The hardware details for your specific cluster depend on what you chose when reserving your cluster. Each cluster includes at least three control (CPU) nodes for cluster administration and job scheduling.

## Software

Your Managed Kubernetes deployment is configured to use Rancher with Rancher Kubernetes Engine 2 (RKE2).

- [Rancher](https://ranchermanager.docs.rancher.com/) provides a web UI for monitoring and managing aspects of your Kubernetes cluster. Rancher also provides your cluster's Kubernetes API server.
- [RKE2](https://docs.rke2.io/) is a fully conformant Kubernetes distribution focused on security and compliance.

## Cluster management

### Rancher dashboard

The Rancher dashboard serves as the main UI for your Managed Kubernetes cluster. After you set up your SSL VPN connection, you can access your dashboard at [https://10.141.3.1](https://10.141.3.1). The login details for your dashboard can be found in your 1Password vault.

For details on setting up your SSL VPN connection, see [Getting started > Establishing a secure connection to your cluster](getting-started/#establishing-a-secure-connection).

### Kubernetes API

The Kubernetes API is available at `https://10.141.0.250:6443` through your SSL VPN connection. You can obtain your `kubeconfig` file from the Rancher dashboard.
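Once the VPN is connected and you've exported the kubeconfig from Rancher, pointing `kubectl` at it is enough to start inspecting the cluster. A minimal sketch; the kubeconfig path shown is a placeholder:

```bash
# Use the kubeconfig exported from the Rancher dashboard (path is a placeholder).
export KUBECONFIG=$HOME/Downloads/managed-k8s.yaml

# Confirm the API at https://10.141.0.250:6443 is reachable over the VPN.
kubectl cluster-info
kubectl get nodes
kubectl get storageclass
```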
For details, see [Getting started > Accessing the Kubernetes API](getting-started/#accessing-the-kubernetes-api). ## Storage In the default cluster configuration, your cluster comes with three types of storage, each with its own performance and access characteristics: - Common (Longhorn) - Shared workload (Intelliflash) - Scratch/HPC (directly attached local storage) Each type is mapped to a corresponding storage class in Kubernetes: - `longhorn` - `intelliflash` - `local-path` Note Each cluster also includes a `local-storage` storage class that Lambda uses to route monitoring metric replication. You can safely ignore this class— `local-path` is the dynamic provisioner routed to the fast NVMe arrays. --- # Getting started - Source: https://docs.lambda.ai/private-cloud/managed-kubernetes/getting-started/ --- [managed kubernetes](../../../tags/#tag:managed-kubernetes) [private cloud](../../../tags/#tag:private-cloud) # Getting started This document explains how to access your Private Cloud cluster's Managed Kubernetes dashboard and walks you through commonly used sections of your cluster dashboard. It also provides guidance for configuring and running workloads on your Private Cloud cluster. For an overview of your Managed Kubernetes cluster's default specifications and configuration, see the [Managed Kubernetes overview](../) for Private Cloud. ## Establishing a secure connection to your cluster Your cluster uses Fortigate, Fortinet's Next-Generation Firewall (NGFW), to protect your network and provide remote access services. To access your cluster from your local computer: Note These steps assume you're running Linux on your local machine. If you use FortiClient VPN on macOS and Windows, the steps might differ slightly. --- # Security posture - Source: https://docs.lambda.ai/private-cloud/security-posture/ --- [security](../../tags/#tag:security) # Security posture for Lambda Private Cloud ## Introduction This document outlines the physical and logical security posture of [Lambda Private Cloud](https://lambda.ai/service/gpu-cloud/private-cloud). [![Diagram of Private Cloud infrastructure](../../assets/images/private-cloud/private-cloud-infra-diagram.png)](../../assets/images/private-cloud/private-cloud-infra-diagram.png) ## Overview Lambda Private Cloud provides single-tenant AI compute infrastructure with fully dedicated hardware and network resources allocated exclusively to a single customer. This architecture eliminates shared infrastructure risks and potential resource contention through complete environmental separation. This isolation model significantly reduces third-party risk exposure while maintaining access to Lambda's infrastructure design expertise and operational support services. All clusters start with a common baseline reference design, which incorporates well-reasoned and sensible security decisions. On top of this design, various customizations are offered to tailor a cluster to specific customer requirements. Common customizations and their impact on the cluster's security posture are outlined below. ## Cluster Design A private cloud cluster contains several distinct types of server nodes. All nodes are single-tenant bare-metal systems which are physically isolated from other Lambda customers and dedicated to the customer's cluster. - **Compute (GPU) nodes: **Optimized for GPU workloads. These nodes connect to the cluster's in-band Ethernet network and InfiniBand fabric. - **Head nodes: **Used as control plane or CPU-only compute nodes. 
They connect solely to the cluster's in-band Ethernet network. - **Lambda Management nodes: **Used by Lambda to provide observability into the health of the cluster and allow Lambda to provide operational support. ### BIOS, BMC, and firmware All nodes are deployed with the latest validated BIOS and BMC firmware. All nodes have secure BMC passwords set. ### Operating system All compute and head nodes are provisioned with an up-to-date installation of an Ubuntu LTS (Long-Term Support) release. The customer is responsible for OS-level security and patch management, as well as monitoring logs and metrics. Lambda provisions the cluster nodes with an initial SSH authorized key provided by the customer. With this key, the customer receives administrator (root) access to all compute and head nodes. Once the cluster is handed over, the customer can install additional SSH keys, add or remove local user accounts, and perform any other desired OS-level configuration changes. ## Network A private cloud cluster primarily utilizes two network fabrics that operate in tandem: an **in-band Ethernet network **and an **InfiniBand fabric **. All network connections and supporting infrastructure (including firewalls, routers, and switches) are dedicated to the customer's cluster. The **in-band Ethernet **network is connected to all compute and management nodes, as well as the cluster's persistent storage. This network is the primary path that compute and management nodes use to access persistent storage. The in-band Ethernet network has internet connectivity provided through redundant dedicated internet access (DIA) links. A firewall is placed at the network perimeter and is initially configured with no internet-exposed ingress ports. The customer has full control over this firewall and can implement their own network policies. The **InfiniBand fabric **provides high-speed, low-latency connectivity between compute nodes in a spine-leaf topology, suitable for RDMA-enabled GPU communication. All compute nodes have unrestricted access to each other on this fabric. Depending on the size of the cluster, an **RoCE fabric **(RDMA over Converged Ethernet) may be used for RDMA traffic instead of InfiniBand. The security properties of this technology are similar to InfiniBand; however, it uses Ethernet as the underlying physical layer. Each cluster also has a separate **management network **. The management network connects to control plane systems, physical infrastructure (including smart PDUs), dedicated management interfaces on server nodes (including BMCs and DPUs), and dedicated management interfaces of network devices that support the in-band Ethernet network and InfiniBand fabric. It has connectivity to the in-band Ethernet network through a dedicated management firewall. Finally, there is a small **out-of-band (OOB) network **. This network is used for backup or emergency access to the dedicated management interfaces on network devices in the management network (i.e., core switches and firewalls). The OOB network has its own firewall that is connected to a backup low-bandwidth internet link, which can be utilized to manage core management network devices in an outage scenario. General routing is not permitted over the OOB backup link. ### Hardware Lambda will deploy all network hardware with the latest firmware provided by the respective vendors. 
Lambda can assist with ongoing firmware updates to these devices at the customer's request, but will not (and generally cannot) take any action on them without prior customer approval. ### VPN The cluster's perimeter firewall provides client VPN functionality, which allows for secure remote connection to the cluster from individual client endpoints. This is suitable either as a primary access path or a backup access path if the primary preference is to use a site-to-site VPN or dedicated private links. As part of the handoff process of a new cluster, Lambda will provision and share an initial set of administrative VPN credentials with the customer. The customer has the option to provision additional VPN accounts or integrate with their own identity provider for single sign-on (SSO). ## Data storage and handling Each cluster includes persistent network-attached storage. As with other aspects of a private cloud cluster, all hardware (including storage media) is dedicated to the customer and not shared with others. Data saved to persistent storage is encrypted at rest with 256-bit AES-XTS. A unique encryption key is generated when the cluster is built and is not shared with other customers. If a persistent storage drive is ever physically removed, its data is unreadable and irrecoverable without this key. Data accessed from persistent storage is not encrypted in-transit over the network, but the traffic is fully contained within the cluster's physical footprint. This design optimizes for I/O performance while still protecting the data through physical isolation and controlled datacenter access. Customers with stricter requirements can implement object-level encryption. Storage drives that fail or reach end-of-life are carefully removed and secured within the datacenter until they are destroyed. Lambda relies on a certified third-party vendor for this process and can provide proof of destruction upon request. At the conclusion of a Private Cloud contract, Lambda performs a secure wipe of all cluster drives. This sanitization process complies with NIST 800-88 "purge" requirements, ensuring that customer data is permanently removed. ## Physical site Lambda Private Cloud infrastructure is housed in secure datacenter facilities featuring: - Perimeter and internal CCTV surveillance, with a minimum 90-day retention - Multiple security checkpoints, minimally including: - Main lobby access that is restricted by a security door with a biometric reader, badge reader, and/or security personnel - Multi-factor authentication (PIN/badge + biometric) for entry into the data hall Physical access to the infrastructure is limited to Lambda employees and data center providers with a specific need for access. In the event of emergencies or planned maintenance, Lambda documents all access and reviews it to ensure compliance with these protocols. Authorized Lambda employees may access data centers for maintenance, upgrades, or other hardware-related work. Local authorities or authorized data center providers may access the hall space as required by local codes. ## Ongoing maintenance and support It is understood that every customer has unique infrastructure management requirements. Lambda Private Cloud offers different levels of maintenance options, allowing customers to select the level of support that best suits their needs. Regardless of the chosen model, the customer will have access to Lambda's expert support staff. 
- **Physical only **- After handoff, Lambda will not retain any logical access to the customer's cluster or data. The customer will be able to submit support and maintenance requests, and has the option to provide Lambda with logical access as-needed for assistance. This model provides the most robust level of access reduction for the cluster, while still retaining access to Lambda's support expertise. - **Managed Private Cloud **- Lambda will retain logical access to the cluster to provide best-in-class support at all layers of the stack. The customer can revoke Lambda's access at any time and will have access to robust audit logs that provide visibility into the actions Lambda takes on the cluster. Lambda will never take action on the cluster without prior notification and permission, and will work with the customer to establish requirements for maintenance windows. ## Customizations ### Managed Kubernetes A [fully-managed Kubernetes stack](https://lambda.ai/kubernetes)can be provisioned into the cluster. This Platform-as-a-Service (PaaS) offering adheres to Kubernetes best practices for cluster configuration and security, removing management overhead and enabling customers to leverage Kubernetes effectively. Lambda will handle ongoing cluster maintenance, including updates and patching (subject to the customer's requirements for maintenance window coordination). SSO authentication is supported for Managed Kubernetes out-of-the-box. Customers can authenticate against their own identity provider via OIDC or SAML. For clusters with Managed Kubernetes, customers will not have direct SSH access to cluster nodes; instead, Kubernetes tooling is used to manage workloads. ### Dedicated cage A cage can be constructed within the datacenter to physically isolate the cluster hardware. The cage will be exclusively dedicated to the customer and will not contain any hardware allocated to other customers. **Important: **If a dedicated cage is desired, Lambda needs to be informed early in the cluster build process so that space can be allocated and the cage can be physically constructed prior to cluster hardware installation. The cage is constructed with a tight mesh and a ceiling, and can extend below the raised floor to the underlying concrete. Customers can install their own badging infrastructure at the entrance to the cage and will have full control over who has access and when they have access. Additionally, customers have the option to install their own security cameras within the cage, with full control over placement to get any angle within the cage. The cluster will have external network links (including DIA links) that leave the cage. These links can optionally use armored cabling for increased protection against physical tampering. ### Site-to-site IPsec VPN The cluster can be securely connected to the customer's infrastructure via a site-to-site VPN tunnel over the internet (or alternatively, through dedicated private links). The cluster's perimeter firewall terminates an IPsec tunnel with the customer's on-premises or self-hosted firewall/router, establishing secure and encrypted connectivity between the two environments. The security of this tunnel relies heavily on the protection of the IPsec credentials (either a pre-shared key or certificates). The cluster firewall utilizes encrypted credential storage; however, the customer is responsible for secure storage of the credentials on their own infrastructure. 
If additional protection is required for tunnels running over the internet, an IP allowlist can be added to the outer tunnel connection to restrict it to traffic from specified source IPs.

### Private cloud connectivity

The cluster can be configured with dedicated private network links to major cloud providers, including AWS Direct Connect, Azure ExpressRoute, GCP Dedicated/Partner Interconnect, and OCI FastConnect. These links are terminated on the cluster's perimeter firewall, allowing the customer to implement desired network security policies.

### Custom hardware

Additional custom hardware (such as network appliances or server nodes) can be deployed into a cluster to meet specific needs if the baseline components are not suitable. This is useful, for example, when specific network appliances are required or when special-purpose functionality must run in close proximity to the rest of the cluster. Lambda is responsible for establishing physical network connectivity for custom hardware; the customer is responsible for its configuration and operation.

--- # Introduction - Source: https://docs.lambda.ai/public-cloud/ ---

# Introduction

Lambda's public cloud lets you launch individual virtual machines or clusters and turn them down on your schedule.

[On Demand Cloud](on-demand/)

--- # Introduction - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/ ---

[1-click clusters](../../tags/#tag:1-click-clusters) [distributed training](../../tags/#tag:distributed-training)

# Introduction

1-Click Clusters (1CC) are high-performance clusters composed of both GPU and CPU nodes, featuring 16 to 512 NVIDIA H100 or B200 SXM Tensor Core GPUs. GPU nodes are interconnected using an NVIDIA Quantum-2 400 Gb/s InfiniBand non-blocking fabric in a rail-optimized topology, enabling peer-to-peer GPUDirect RDMA at up to 3200 Gb/s. All nodes (GPU and CPU) are equipped with 2x100 Gb/s Ethernet for IP communication and 2x100 Gb/s Direct Internet Access (DIA) connections.

Each 1CC includes 3x CPU management (head) nodes for use as jump boxes (bastion hosts) and for cluster administration and job scheduling. These management nodes are assigned public IP addresses and are directly accessible over the Internet via SSH. All nodes can be directly accessed using [JupyterLab](../on-demand/getting-started/#how-do-i-open-jupyterlab-on-my-instance) from the Lambda Cloud console. 1CC nodes are in an isolated private network and can communicate freely with each other using private IP addresses.

Generic CPU nodes can optionally be launched in the same regions as 1CCs. These generic CPU nodes run independently of 1CCs and don't terminate when 1CC reservations end.

Each compute node includes 24 TB of usable local ephemeral NVMe storage. Each management node includes 208 GB of usable local ephemeral NVMe storage. [Lambda filesystems](../filesystems/) are automatically created and attached to each 1CC node, and can also be attached to Lambda On-Demand instances. Existing filesystems in the same region can additionally be attached. [You're billed only for the storage you actually use](../billing/#filesystems).
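If you want to confirm the interconnect and local storage described above from a node, a few standard commands are enough. A quick sketch (device names and output vary by node type; these commands are illustrative rather than part of any required setup):

```bash
# On a GPU (compute) node:
nvidia-smi                          # GPUs visible to the node
ibstat | grep -E "CA '|State|Rate"  # InfiniBand adapters, link state, and link rate
df -h /                             # local ephemeral NVMe capacity
```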
All 1CC nodes are preinstalled with Ubuntu 22.04 LTS and [Lambda Stack](https://lambda.ai/lambda-stack-deep-learning-software), including NCCL, Open MPI, PyTorch® with DDP and FSDP support, TensorFlow, OFED, and other popular libraries and frameworks for distributed ML workloads, allowing ML engineers and researchers to begin their large-scale experiments and other work immediately after launching a 1CC. Note It's highly recommended that you use Lambda Stack's Python packages. Lambda Stack packages are extensively tested for compatibility and reliability in 1CC environments. Packages installed outside of Lambda Stack might not work properly, especially with newer GPUs. See our documentation on [creating a Python virtual environment](../../education/programming/virtual-environments-containers/#creating-a-python-virtual-environment)to learn how to use Lambda Stack's Python packages. --- # Using Lambda's Managed Kubernetes - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes-legacy/ --- [1-click clusters](../../../tags/#tag:1-click-clusters)[kubernetes](../../../tags/#tag:kubernetes) # Using Lambda's Managed Kubernetes ## Introduction This guide walks you through getting started with [Lambda's Managed Kubernetes](https://lambda.ai/kubernetes)(MK8s) on a [1-Click Cluster](https://lambda.ai/service/gpu-cloud/1-click-clusters)(1CC). MK8s provides a Kubernetes environment with GPU support, InfiniBand (RDMA), and shared persistent storage across all nodes in a 1CC. Clusters are preconfigured so you can deploy workloads without additional setup. You'll learn how to: - Access MK8s using the Rancher Dashboard and `kubectl`. - Organize workloads using projects and namespaces. - Deploy and manage applications. - Expose services using Ingresses. - Use shared and node-local persistent storage. - Monitor GPU usage with the NVIDIA DCGM Grafana dashboard. The guide includes two examples. In the first, you'll deploy a vLLM server to serve the [NousResearch Hermes 3](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B)model: - Create a namespace for the examples. - Add a PersistentVolumeClaim (PVC) to cache model downloads. - Deploy the vLLM server. - Expose it with a Service. - Configure an Ingress to make it accessible externally. In the second, you'll evaluate the multiplication-solving accuracy of the [DeepSeek R1 Distill Llama 70B](https://lambda.ai/inference-models/deepseek-llama3.3-70b)model using vLLM: - Run a batch job that performs the evaluation. - Monitor GPU utilization during the run. ## Prerequisites You need the Kubernetes command-line tool, `kubectl`, to interact with the cluster. Refer to the Kubernetes documentation for [installation instructions](https://kubernetes.io/docs/tasks/tools/#kubectl). ## Accessing MK8s After your 1CC with MK8s is provisioned, you'll receive credentials to access MK8s. These include the Rancher Dashboard URL, username, and password. To access MK8s using either the Rancher Dashboard or `kubectl`, you must first configure a firewall rule: - In the Cloud dashboard, go to the [Firewall](https://cloud.lambda.ai/firewall)page. - Click **Edit **to modify the inbound firewall rules. - Click **Add rule **, then set up the following rule: - **Type **: Custom TCP - **Protocol **: TCP - **Port range **: `443` - **Source **: `0.0.0.0/0` - **Description **: `Managed Kubernetes dashboard` - Click **Update and save **. 
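After saving the rule, you can quickly check that the dashboard port is reachable from your machine before logging in. The URL below is a placeholder for the dashboard URL provided with your MK8s credentials:

```bash
# -k skips TLS verification in case the dashboard uses a self-signed certificate;
# any HTTP response (for example, 200 or a redirect) means the port is open.
curl -skI https://<rancher-dashboard-url>
```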
### Rancher Dashboard

To access the MK8s Rancher Dashboard:

- In your browser, go to the URL provided along with your MK8s credentials. You'll see a login screen.
- Enter your username and password, then click **Log in with Local User**.
- In the left sidebar, click the **Local Cluster** button: [![Screenshot of Local Cluster button](../../../assets/images/managed-kubernetes/local-cluster-button.png)](../../../assets/images/managed-kubernetes/local-cluster-button.png)

### kubectl

To access MK8s using `kubectl`:

- Open the Rancher Dashboard as described above.
- In the top-right corner, click the **Download KubeConfig** button: [![Screenshot of Download KubeConfig button](../../../assets/images/managed-kubernetes/download-kubeconfig-button.png)](../../../assets/images/managed-kubernetes/download-kubeconfig-button.png)
- Save the file to `~/.kube/config`. Alternatively, set the `KUBECONFIG` environment variable to the path of the file.
- (Optional) Restrict access to the file:

```bash
chmod 600 ~/.kube/config
```

--- # Using Lambda's Managed Kubernetes - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes/ ---

[1-click clusters](../../../tags/#tag:1-click-clusters) [kubernetes](../../../tags/#tag:kubernetes)

# Using Lambda's Managed Kubernetes

## Introduction

This guide walks you through getting started with [Lambda's Managed Kubernetes](https://lambda.ai/kubernetes) (MK8s) on a [1-Click Cluster](https://lambda.ai/service/gpu-cloud/1-click-clusters) (1CC). MK8s provides a Kubernetes environment with GPU and InfiniBand (RDMA) support, and shared persistent storage across all nodes in a 1CC. Clusters are preconfigured so you can deploy workloads without additional setup.

In this guide, you'll learn how to:

- Access MK8s using `kubectl`.
- Grant access to additional users.
- Organize workloads using namespaces.
- Deploy and manage applications.
- Expose services using Ingresses.
- Use shared and node-local persistent storage.
- Monitor GPU usage with the NVIDIA DCGM Grafana dashboard.

This guide includes two examples. In the first, you'll deploy a vLLM server to serve the [Nous Research Hermes 4](https://huggingface.co/NousResearch/Hermes-4-14B) model. You'll:

- Create a namespace for the examples.
- Add a PersistentVolumeClaim (PVC) to cache model downloads.
- Deploy the vLLM server.
- Expose it with a Service.
- Configure an Ingress to make it accessible externally.

In the second example, you'll evaluate the multiplication-solving accuracy of the [DeepSeek R1 Distill Qwen 7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) model using vLLM. You'll:

- Run a batch job that performs the evaluation.
- Monitor GPU utilization during the run.

## Prerequisites

You need the Kubernetes command-line tool, `kubectl`, to interact with MK8s. Refer to the Kubernetes documentation for [installation instructions](https://kubernetes.io/docs/tasks/tools/#kubectl).

You also need the `kubelogin` plugin for `kubectl` to authenticate to MK8s. Refer to the [kubelogin README for installation instructions](https://github.com/int128/kubelogin?tab=readme-ov-file#setup).

## Accessing MK8s

To access MK8s, you need to:

- Configure firewall rules to allow connections to MK8s.
- Configure `kubectl` to use the provided `kubeconfig` file.
- Authenticate to MK8s using your [Lambda Cloud account](https://cloud.lambda.ai).
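Before configuring access, it can be worth confirming that both prerequisites are installed. The kubelogin plugin is invoked as `kubectl oidc-login`, so a quick check looks like this:

```bash
kubectl version --client   # confirms kubectl is installed
kubectl oidc-login --help  # confirms the kubelogin plugin is on your PATH
```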
### Configure firewall rules

To access MK8s, you must first create firewall rules for the MK8s API server and Ingress Controller:

- Navigate to the **Global rules** tab on the [Firewall settings page](https://cloud.lambda.ai/firewall) in the Lambda Cloud console.
- In the **Rules** section, click **Edit rules** to begin creating a rule.
- Click **Add rule**, then set up the following rule:
    - **Type**: Custom TCP
    - **Protocol**: TCP
    - **Port range**: `6443`
    - **Source**: `0.0.0.0/0`
    - **Description**: `MK8s API server`
- Click **Add rule** again, then set up the following rule:
    - **Type**: Custom TCP
    - **Protocol**: TCP
    - **Port range**: `443`
    - **Source**: `0.0.0.0/0`
    - **Description**: `MK8s Ingress Controller`
- Click **Update firewall rules**.

### Configure `kubectl`

You're provided with a `kubeconfig` file when MK8s is provisioned. You need to set up `kubectl` to use this `kubeconfig` file:

- Save the file to `~/.kube/config`. Alternatively, set the `KUBECONFIG` environment variable to the path of the file.
- (Optional) Restrict access to the file:

```bash
chmod 600 ~/.kube/config
```

--- # Using Lambda's Managed Slurm - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/managed-slurm/ ---

[1-click clusters](../../../tags/#tag:1-click-clusters) [slurm](../../../tags/#tag:slurm)

# Using Lambda's Managed Slurm

[See our video guide on using Lambda's Managed Slurm.](https://youtu.be/G2yFiZroW3g)

## Introduction to Slurm

Slurm is a widely used open-source workload manager optimized for high-performance computing (HPC) and machine learning (ML) workloads. When deployed on a Lambda 1-Click Cluster (1CC), Slurm allows administrators to create user accounts with controlled access, enabling individual users to submit, monitor, and manage distributed ML training jobs. Slurm automatically schedules workloads across the 1CC, maximizing cluster utilization while preventing resource contention.

## Lambda's Slurm

The table below summarizes the key differences between Lambda's Managed and Unmanaged Slurm deployments on a 1CC:

| Feature | Managed Slurm | Unmanaged Slurm |
| --- | --- | --- |
| Access only through login node | ✓ | ✗ (all nodes accessible) |
| User `sudo`/`root` privileges | ✗ | ✓ |
| Lambda monitors Slurm daemons | ✓ | ✗ (customer is responsible) |
| Lambda applies patches and upgrades | ✓ (on request) | ✗ (customer is responsible) |
| Slurm support with SLAs | ✓ | ✗ |
| Lambda Slurm configuration | ✓ | ✓ |
| Slurm configured for high availability | ✓ | ✓ |
| Shared `/home` across all nodes | ✓ | ✓ |
| Shared `/data` across all nodes | ✓ | ✓ |

### Managed Slurm

When Lambda's Managed Slurm (MSlurm) is deployed on a 1CC:

- All interaction with the cluster happens through the login node. Access to other nodes is restricted to help ensure cluster integrity and reliability.
- Lambda monitors and maintains the health of Slurm daemons such as `slurmctld` and `slurmdbd`.
- Lambda coordinates with the customer to apply security patches and upgrade to new Slurm releases, if requested.
- Lambda provides support according to the service level agreements (SLAs) in place with the customer.

### Unmanaged Slurm

In contrast, on a 1CC with Unmanaged Slurm:

- All nodes are directly accessible, and users have system administrator privileges (`sudo` or `root`) across the cluster.

Warning Workloads that run outside of Slurm might interfere with the resources managed by Slurm. Additionally, users with administrator access can make changes that render the 1CC unrecoverable. In such cases, Lambda might need to "repave" the 1CC, fully wiping and reinstalling the system.
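In both deployment models, work is submitted with standard Slurm tooling (on Managed Slurm, from the login node). The following is a minimal sketch of a batch script; the job name, node count, GPU count, and time limit are illustrative and should be adjusted to your 1CC:

```bash
#!/bin/bash
#SBATCH --job-name=hello-1cc
#SBATCH --nodes=2                # number of GPU nodes to allocate (illustrative)
#SBATCH --gpus-per-node=8        # GPUs per node; match your node type
#SBATCH --time=00:10:00

# Print the hostname of each allocated node to confirm the job was scheduled.
srun hostname
```

Submit the script with `sbatch hello.sbatch` and check its status with `squeue`.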
--- # Security posture - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/security-posture/ ---

[1-click clusters](../../../tags/#tag:1-click-clusters) [security](../../../tags/#tag:security)

# Security posture

This document describes the physical and logical security properties of the Lambda 1-Click Clusters™ (1CC) product, including the default software configuration on cluster nodes. The following diagram illustrates the 1CC network architecture:

[![1cc-network-architecture](../../../assets/images/1cc-network-architecture.png)](../../../assets/images/1cc-network-architecture.png)

## Compute (GPU) nodes

1CC compute nodes run on single-tenant hardware with tenant isolation enforced using logical network segmentation. Underlying hardware resources, including GPUs, local storage, memory, and network interfaces, aren't shared with or accessible by any other customers.

Compute nodes live on a dedicated network segment with no inbound connectivity from the firewall. Compute nodes can be reached either by using a management node as a jump box or via a public reverse tunnel to a JupyterLab service running on each compute node. Each JupyterLab instance is configured with a unique, random authentication token shared via the Lambda Cloud console. For more information, see the [JupyterLab security documentation](https://jupyter-server.readthedocs.io/en/latest/operators/security.html).

Customers have full control over the configuration of their compute nodes and can reconfigure them at will.

## Management (Head) nodes

1CC management nodes run on multi-tenant hardware with tenant isolation enforced using hardware virtualization. Underlying resources, including local storage, memory, and network interfaces, are shared with other customers.

By default, management nodes are directly accessible over the internet via SSH and via a public reverse tunnel to a JupyterLab service running on each management node. Each JupyterLab instance is configured with a unique, random authentication token shared via the Lambda Cloud console. For more information, see [JupyterLab's security documentation](https://jupyter-server.readthedocs.io/en/latest/operators/security.html). Customers can configure their own inbound firewall rules to expand or reduce the exposure of their management nodes.

Customers have full control over the configuration of their management nodes and can reconfigure them at will.

## Ethernet interconnect

1CC compute and management nodes share a logically isolated Ethernet switching fabric. Logical isolation ensures that customers have no interaction with each other.

## InfiniBand interconnect

1CC compute nodes share an InfiniBand fabric that is isolated so that customer traffic only ever transits physical IB links dedicated to that customer, ensuring complete isolation of customer InfiniBand traffic.

## Persistent file storage

All 1CC compute and management nodes have pre-configured access to a customer-specific portion of a multi-tenant persistent file storage system. The storage system is on an isolated network accessible only to management and compute nodes. All data on the storage system is encrypted at rest using industry-standard algorithms and parameters.

## Lambda employee access

Logical and physical access to 1CC infrastructure, such as network and storage solutions, is limited to Lambda employees with a specific need for access. Underlying 1CC infrastructure is monitored for security, utilization, performance, and reliability.
Lambda employees do not access customer environments without customers' express authorization. Customers are responsible for all security instrumentation and monitoring of their management and compute nodes.

## Physical security

1CC infrastructure is located in secure data centers with the following access controls:

- Qualified in-house security personnel on site 24x7x365
- CCTV surveillance, with a minimum of 90 days of retention
- Multiple security checkpoints:
    - Controlled fenced access into data center property
    - Lobby mantraps in the data center hallway
    - 2FA access control (biometric and RFID badge) into the data hall
    - 2FA access control (biometric and RFID badge) into the secured cage

Authorized Lambda employees may access data centers for maintenance, upgrades, or other hardware infrastructure work.

--- # How to serve the Llama 3.1 405B model using a Lambda 1-Click Cluster - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/serving-llama-3_1-405b/ ---

[1-click clusters](../../../tags/#tag:1-click-clusters) [distributed training](../../../tags/#tag:distributed-training)

# How to serve the Llama 3.1 405B model using a Lambda 1-Click Cluster

In this tutorial, you'll learn how to use a 1-Click Cluster (1CC) to serve the [Meta Llama 3.1 405B model](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B) using [vLLM](https://docs.vllm.ai/en/latest/index.html) with pipeline parallelism.

Note You need a [Hugging Face](https://huggingface.co/) account to download the Llama 3.1 405B model. You also need a [User Access Token](https://huggingface.co/docs/hub/en/security-tokens) with the **Read** role. Before you can download the Llama 3.1 405B model, you need to review and accept the model's license agreement. Once you accept the agreement, a request to access the repository will be submitted. You can see the status of the request in your [Hugging Face account settings](https://huggingface.co/settings/gated-repos).

--- # Support - Source: https://docs.lambda.ai/public-cloud/1-click-clusters/support/ ---

[1-click-clusters](../../../tags/#tag:1-click-clusters) [support](../../../tags/#tag:support)

# Support

At Lambda, we recognize that exceptional support is critical to maximizing the value of your 1-Click Cluster (1CC) deployment. Our world-class support team is dedicated to ensuring your success at every stage, from deployment to daily operations.

## Support team

When you choose Lambda, you gain access to a dedicated support team of seasoned professionals with deep expertise in AI/ML infrastructure. This team includes:

- **Customer Success Manager (CSM)**: Your main point of contact post-sales, responsible for ensuring the delivery of your solution and your overall satisfaction.
- **Technical Account Manager (TAM)**: An expert specialist who understands the specifics of your deployment, provides ongoing technical guidance, and escalates any complex issues.
- **Machine Learning Expert (MLE)**: An AI/ML expert within Lambda who provides guidance on how to integrate and scale AI workloads.
- **Support Engineering**: Lambda's Support team, available 24/7 through our ticketing portal, is well-versed in 1CC and will be able to respond to any technical support request, escalating issues and incidents early and often.

## Support scope

Lambda classifies incoming support tickets into three categories:

- In scope
- Best effort
- Out of scope

### In scope

- **Hardware and Infrastructure:** Full support for CPU/GPU VMs, physical hosts, and networking components.
- **Software Environment:** Assistance with Lambda Stack, OFED drivers, JupyterLab, and essential ML tools like NCCL and Open MPI.
- **Networking and Storage:** Management and troubleshooting for Ethernet and InfiniBand networks, as well as persistent and local storage.
- **Slurm Installation:** Guidance on Slurm setup and basic troubleshooting to streamline your job scheduling processes.
- **Managed Kubernetes:** If purchased as an add-on, our team of Kubernetes experts will be there to help you.

### Best effort

Our Support team is dedicated to delivering world-class customer service and technical expertise. We empower our engineers to go the extra mile, even when it means stepping beyond the standard scope of our support and engineered products to provide innovative solutions. In these exceptional cases, we ensure our customers understand that while we strive to help, there may be no guaranteed outcome. Any solutions we provide under these circumstances won't be fully supported in the future, and Lambda cannot assume responsibility for any potential impacts.

### Out of scope

Some requests fall outside the scope of support, such as:

- Troubleshooting customer code
- Third-party applications/software installed after cluster handoff
- Network/VPN connections to your cluster

## SLA

Our focus on prompt and reliable service is supported by the clearly established response times in our agreements:

| Incident Level | Definition | Initial Response Time |
| --- | --- | --- |
| Severity 1 | A critical Services problem in which the Services (i) are down, inoperable, inaccessible, or unavailable, (ii) otherwise materially cease operation, or (iii) perform or fail to perform so as to prevent useful work from being done. | 4 hours |
| Severity 2 | A Services problem in which the Services (i) are severely limited or major functions are performing improperly, and the situation is significantly impacting certain portions of the Services users' operations or productivity, or (ii) have been interrupted but recovered, and there is high risk of recurrence. | 8 hours |
| Severity 3 | A minor or cosmetic Services problem that (i) is an irritant, affects non-essential functions, or has minimal business operations impact, (ii) is localized or has isolated impact, (iii) is an operational nuisance, (iv) results in documentation errors, or (v) is otherwise not Severity 1 or Severity 2, but represents a failure of services to conform to specifications provided. | 10 hours |
| Service Request | Requests for action or tasks that are not generated by an incident. | 24 hours |

--- # Access and security overview - Source: https://docs.lambda.ai/public-cloud/access-security/ ---

[identity and access management](../../tags/#tag:identity-and-access-management) [security and compliance](../../tags/#tag:security-and-compliance)

# Access and security

This page describes Lambda Cloud's access management and security features.

## Access management

Lambda provides lightweight access management mechanisms to ensure secure access while minimizing friction.

### API keys

The Lambda Cloud API uses API keys to authenticate incoming requests. You can generate a new API key or view your existing API keys by visiting the [API keys page](http://cloud.lambda.ai/api-keys) in the Lambda Cloud console. API keys have full access to all Lambda API operations.

### SSH keys

Before you launch an instance, you must add an SSH key to your Lambda Cloud account.
When you go through the process of launching an instance, you'll be prompted to supply this SSH key so you can securely connect to the instance after launching. You can import an existing key if you have one, or you can generate a new one in the Lambda Cloud console. For guidance on setting up an SSH key, see [Connecting to an instance > Setting up SSH access](../on-demand/connecting-instance/#setting-up-ssh-access).

### Teams

You can add new members to your Lambda account by inviting them to join your *Team*. Each Team member can be either an *Admin* or a *Member*:

- Both roles have full access to your Lambda resources. Each can create API keys, launch and terminate instances, and retrieve audit logs, for example.
- Admins can also invite or remove Team members, modify the team's payment information, and rename the team.

The invitee's email address must not already be associated with an existing Lambda account. If your team member already has a Lambda account, ask them to provide a different address or, if feasible, to close their existing account.

For details on creating and updating Teams, see [Teams](../teams/).

Important **Each role has full access to your Lambda resources.** Make sure to invite only trusted persons to your Team.

--- # Billing overview - Source: https://docs.lambda.ai/public-cloud/billing/ ---

[billing](../../tags/#tag:billing) [public cloud](../../tags/#tag:public-cloud)

# Billing

This page explains how Lambda bills each of its public cloud resources.

## Billable resources

Lambda Cloud charges for the following resources:

- On-Demand Cloud (ODC) instances
- 1-Click Clusters (1CCs)
- Filesystems

Charges include sales tax, which is based on the location provided in your billing information.

### On-Demand Cloud (ODC) instances

ODC prices instances by hourly usage and bills in one-minute increments. Billing begins the moment you launch an instance and the instance passes health checks, and ends the moment you terminate the instance. Instances are billed for as long as they're running, regardless of whether they're actively being used. You receive weekly invoices for the previous week's usage.

To view current ODC instance pricing, see the pricing table on the [On-Demand Cloud](https://lambda.ai/service/gpu-cloud#pricing) page.

### 1-Click Clusters (1CCs)

1CCs are priced per GPU per hour and billed in weekly increments according to the terms of your reservation. You receive a billing summary during the 1CC reservation process and an invoice by email when your reservation is approved. After receiving the invoice, you have ten days to pay for your reservation.

To view current 1CC pricing, see the pricing table on the [1-Click Clusters](https://lambda.ai/service/gpu-cloud/1-click-clusters#pricing) page.

### Filesystems

Filesystems are billed per GiB used per month in one-hour increments. For example, at a rate of $0.20 per GiB per month:

- If you use 1,000 GiB continuously for a full month (720 hours), you'll be billed $200.00.
- If you use 1,000 GiB continuously for a full day (24 hours), you'll be billed $6.67.

Important The rate above is used for example purposes and might not reflect current pricing. The actual price will be displayed when you create your filesystem.
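In other words, the examples above follow the formula: cost = GiB used × monthly rate × (hours used ÷ 720). A quick way to reproduce the $6.67 figure (using the illustrative rate above, not current pricing):

```bash
# 1,000 GiB at an example rate of $0.20/GiB/month, used for 24 of 720 hours
awk 'BEGIN { printf "$%.2f\n", 1000 * 0.20 * 24 / 720 }'   # prints $6.67
```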
--- # Cloud Console - Source: https://docs.lambda.ai/public-cloud/console/ --- [1-click clusters](../../tags/#tag:1-click-clusters) [on-demand cloud](../../tags/#tag:on-demand-cloud) # Lambda Cloud console You can use the [Lambda Cloud console](https://cloud.lambda.ai/instances) to manage your Lambda Cloud resources, Lambda account, and Lambda Teams. This doc provides an overview of the available features. ## Launch, restart, reboot, or terminate instances ### Launch instances To launch an instance: - Click [**Instances**](https://cloud.lambda.ai/instances) in the left sidebar of the console. Then, click **Launch instance** at the top-right of the console. - Click the instance type that you want to launch. - Click the region in which you want to launch the instance. - Select your preferred base image and then click **Next**. For a list of available images, see [Alternative images](../on-demand/#alternative-images). - Click the [Lambda filesystem](../filesystems/) that you want to attach to your instance. If you don't want to or can't attach a persistent storage file system to your instance, click **Don't attach a filesystem**. - Select the [SSH key](#add-generate-and-delete-ssh-keys) that you want to use for your instance. Then, click **Launch instance**. Tip You can [add additional SSH keys](../on-demand/getting-started/#is-it-possible-to-use-more-than-one-ssh-key) to your instance once your instance has launched. --- # Filesystems - Source: https://docs.lambda.ai/public-cloud/filesystems/ --- [storage](../../tags/#tag:storage) # Filesystems A *filesystem* is a high-capacity regional file store you can attach to your instance to store datasets and back up system state. In most regions, each filesystem has a capacity of 8 EB (8,000,000 TB), and you can create up to 24 total filesystems. In the Texas, USA (us-south-1) region, filesystems are currently limited to 10 TB of capacity. For information on how filesystems are billed, see the [Billing overview](../billing/#filesystems). ## Accessing your filesystem ### Accessing from another instance or a 1-Click Cluster To access a filesystem from within Lambda Cloud: - The filesystem must reside in the same region as the instance or cluster. - You must attach the filesystem to your instance or cluster at the time that the instance or cluster is launched. Note Filesystems cannot currently be transferred between regions. --- # Firewalls - Source: https://docs.lambda.ai/public-cloud/firewalls/ --- [1-click clusters](../../tags/#tag:1-click-clusters) [on-demand cloud](../../tags/#tag:on-demand-cloud) # Firewalls You can restrict incoming traffic to your instances, including [1-Click Cluster](../1-click-clusters/) management (head) nodes, by creating firewall rules on the [Firewall page](https://cloud.lambda.ai/firewall) in the Lambda Cloud console. You can create global rules that apply to all of your instances, or rulesets scoped to individual instances and their regions. By default, Lambda allows only incoming ICMP traffic or TCP traffic on port 22 (SSH). Note You can also use the Lambda Cloud API to manage your global firewall rules and per-instance rulesets programmatically. For details, see [Firewalls](https://docs.lambda.ai/api/cloud#Firewalls) in the Lambda Cloud API browser. 
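To see the default policy in practice, you can probe an instance from a machine outside Lambda Cloud using standard tools. A quick sketch, where `<instance-ip>` is a placeholder for your instance's public IP:

```bash
nc -vz <instance-ip> 22          # succeeds: inbound SSH is allowed by default
nc -vz -w 5 <instance-ip> 8080   # times out unless you add an inbound rule for port 8080
```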
--- # Guest Agent - Source: https://docs.lambda.ai/public-cloud/guest-agent/ ---

[1-click clusters](../../tags/#tag:1-click-clusters) [on-demand cloud](../../tags/#tag:on-demand-cloud)

# Guest Agent

## Introduction

The `lambda-guest-agent` (Guest Agent) is a service you can install on both Lambda Cloud on-demand instances and 1-Click Cluster (1CC) nodes. For simplicity, this documentation refers to both types of systems as *virtual machines* (VMs).

The Guest Agent collects system metrics, such as GPU and VRAM utilization, and sends them to Lambda's backend. You can then view these metrics in the [Lambda Cloud console](https://cloud.lambda.ai/).

[![](../../assets/images/guest-agent/view-metrics.png)](../../assets/images/guest-agent/view-metrics.png) [![](../../assets/images/guest-agent/graphs-1.png)](../../assets/images/guest-agent/graphs-1.png)

Note The metrics dashboard will be generally available in the Lambda Cloud console in Q2 of 2025. To request early access, submit a support ticket.

--- # Importing and exporting data - Source: https://docs.lambda.ai/public-cloud/importing-exporting-data/ ---

[1-click clusters](../../tags/#tag:1-click-clusters) [on-demand cloud](../../tags/#tag:on-demand-cloud)

# Importing and exporting data

This document outlines common solutions for importing data into your Lambda On-Demand Cloud (ODC) instances and 1-Click Clusters (1CCs). The document also provides guidance on backing up your data so that it persists beyond the life of your instance or 1CC.

## Importing data

You can use `rsync` to copy data to and from your Lambda instances and their attached filesystems. `rsync` allows you to copy files from your local environment to your ODC instance, between ODC instances, from instances to 1CCs, and more.

If you need to import data from AWS S3 or an S3-compatible object storage service like Cloudflare R2, Google Cloud Storage, or MinIO, you can use `s5cmd` or `rclone`.

### Importing data from your local environment

To copy files from your local environment to a Lambda Cloud instance or cluster, run the following `rsync` command from your local terminal. Replace the variables as follows:

- Replace `<source>` with the files or directories you want to copy to the remote instance. If you're copying multiple files or directories, separate them using spaces, for example, `foo.md bar/ baz/`.
- Replace `<username>` with your username on the remote instance.
- Replace `<instance-ip>` with the IP address of the remote instance.
- Replace `<destination-directory>` with the directory into which you want to copy files.

```bash
rsync -av --info=progress2 <source> <username>@<instance-ip>:<destination-directory>
```

--- # Managing billing - Source: https://docs.lambda.ai/public-cloud/manage-billing/ ---

[billing](../../tags/#tag:billing) [public cloud](../../tags/#tag:public-cloud)

# Managing billing

This page explains how to manage your Lambda Cloud billing and provides guidance for troubleshooting billing issues.

## Setting up billing

Before you can launch Lambda Cloud services, you must visit the [Settings > Billing page](https://cloud.lambda.ai/settings/billing) in the Lambda Cloud console and add a credit card to your account. Lambda makes a $10 pre-authorization charge to make sure the card is valid. The charge will be refunded in a few days.

## Paying an open invoice

To pay an open invoice, visit the [Settings > Billing page](https://cloud.lambda.ai/settings/billing) in the Lambda Cloud console, scroll to the **Payment History** section, and then click **Pay open invoice** to open the payment page.
Note If you don't have any open invoices, the **Pay open invoice** button does not appear. --- # Overview - Source: https://docs.lambda.ai/public-cloud/on-demand/ --- # Overview On-Demand Cloud (ODC) provides on-demand access to Linux-based, GPU-backed virtual machine instances. ## Instance types ODC offers a variety of predefined instance types to support different workload requirements. Available GPUs include the state-of-the-art NVIDIA HGX B200 GPU, NVIDIA GH200 Grace Hopper Superchip, and NVIDIA H100 Tensor Core GPU, as well as several earlier models. Each instance you create is tied to a specific geographical region. For a list of available regions, see the [Regions](#regions) section below. Select instance types are backed by GPUs that feature NVIDIA SXM. SXM offers improved bandwidth between the NVIDIA GPUs in a single physical server. Warning Lambda prohibits cryptocurrency mining on ODC instances. --- # Connecting to an instance - Source: https://docs.lambda.ai/public-cloud/on-demand/connecting-instance/ --- [on-demand cloud](../../../tags/#tag:on-demand-cloud) # Connecting to an instance You can connect to your On-Demand Cloud (ODC) instances directly through SSH or by using the preinstalled JupyterLab server. ## Setting up SSH access Before you launch an instance, you must add an SSH key to your Lambda Cloud account. When you go through the process of launching an instance, you'll be prompted to supply this SSH key so you can securely connect to the instance after launching. You can import an existing key if you have one, or you can generate a new one in the Lambda Cloud console. ### Adding an existing SSH key If you have an existing SSH key, you can add it to your Lambda Cloud account and use it to connect to your instances. Lambda Cloud accepts SSH keys in the following formats: - OpenSSH, the format `ssh-keygen` uses by default when generating keys. - RFC4716, the format PuTTYgen uses when you save a public key. - PKCS8 - PEM View examples of each key type OpenSSH keys look like: ```text ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIK5HIO+OQSyFjz0clkvg+48YAihYMo5J7AGKiq+9Alg8 foo@bar ``` --- # Creating and managing instances - Source: https://docs.lambda.ai/public-cloud/on-demand/creating-managing-instances/ --- [on-demand cloud](../../../tags/#tag:on-demand-cloud) # Creating and managing instances This doc outlines how to create and manage On-Demand Cloud (ODC) instances. For general guidance on managing your ODC instance's system environment, see [Managing your system environment](../managing-system-environment/). ## Viewing available instance types To view available instance types, navigate to the [Instances page](https://cloud.lambda.ai/instances) in the Lambda Cloud console and click **Launch instance** to start the instance creation wizard. The first page of the wizard dialog lists all of the instance types Lambda currently offers. You can also programmatically view available instance types by using the Lambda Cloud API. For details, see [List available instance types](https://docs.lambda.ai/api/cloud#listInstanceTypes) in the Lambda Cloud API browser. ## Launching instances To launch a new instance, navigate to the [Instances page](https://cloud.lambda.ai/instances) in the Lambda Cloud console, click **Launch instance**, and then follow the steps in the instance creation wizard. You can also launch instances programmatically by using the Lambda Cloud API. For details, see [Launch instances](https://docs.lambda.ai/api/cloud#launchInstance) in the Lambda Cloud API browser. 
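As a rough sketch of that API workflow: the exact base URL, authentication header, and request fields are documented in the API browser linked above, and every value below (region, instance type, SSH key name) is a placeholder.

```bash
export LAMBDA_API_KEY=<your-api-key>

# List the instance types currently offered (endpoint path per the API browser)
curl -s -H "Authorization: Bearer $LAMBDA_API_KEY" \
  https://cloud.lambda.ai/api/v1/instance-types

# Launch a single instance; adjust the fields to match your account, region, and SSH key
curl -s -X POST \
  -H "Authorization: Bearer $LAMBDA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"region_name": "<region>", "instance_type_name": "<instance-type>", "ssh_key_names": ["<ssh-key-name>"]}' \
  https://cloud.lambda.ai/api/v1/instance-operations/launch
```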
Instances might take several minutes to launch. Note New accounts have a limit on the number of instances you can launch. This quota helps prevent abuse and increases automatically as you pay your invoices. If you attempt to launch an instance that exceeds this limit, you will see a notification that your quota has been reached. --- # Getting started - Source: https://docs.lambda.ai/public-cloud/on-demand/getting-started/ --- [1-click clusters](../../../tags/#tag:1-click-clusters) [automation](../../../tags/#tag:automation) [on-demand cloud](../../../tags/#tag:on-demand-cloud) # Getting started ## Can my data be recovered once I've terminated my instance? Warning We cannot recover your data once you've terminated your instance! Before terminating an instance, make sure to back up all data that you want to keep. If you want to save data even after you terminate your instance, create a [persistent storage filesystem](https://lambda.ai/blog/persistent-storage-beta/). --- # Managing your system environment - Source: https://docs.lambda.ai/public-cloud/on-demand/managing-system-environment/ --- [on-demand cloud](../../../tags/#tag:on-demand-cloud) # Managing your system environment This document provides general guidance for managing your On-Demand Cloud (ODC) instance's system environment. ## Isolating environments on your instance You can use virtual environments to isolate different jobs or experiments from each other on the same machine. This section details a few ways you can create virtual environments on your ODC instance. Tip Because they centralize large parts of your working environment in a small number of locations, virtual environments can also help simplify the process of backing up and restoring your work. --- # Troubleshooting - Source: https://docs.lambda.ai/public-cloud/on-demand/troubleshooting/ --- [on-demand cloud](../../../tags/#tag:on-demand-cloud) # Troubleshooting ## apt full-upgrade fails on Lambda Stack 24.04 and GPU Base 24.04 images As of December 2025, running `sudo apt full-upgrade` or `sudo apt dist-upgrade` on the Lambda Stack 24.04 or GPU Base 24.04 base images produces an error: ```text Error! Bad return status for module build on kernel: 6.14.0-1013-nvidia (x86_64) Consult /var/lib/dkms/mlnx-ofed-kernel/24.10.OFED.24.10.3.2.5.1/build/make.log for more information. ``` --- # Filesystem S3 Adapter - Source: https://docs.lambda.ai/public-cloud/s3-adapter-filesystems/ --- [1-click clusters](../../tags/#tag:1-click-clusters) [filesystems](../../tags/#tag:filesystems) # Filesystem S3 Adapter [See our video guide on using the Filesystem S3 Adapter.](https://youtu.be/9ewdYXajuBc) The Filesystem S3 Adapter allows you to interact with your Lambda filesystems using `rclone`, `s5cmd`, `mc`, and other S3-compatible tools. Supported operations include: - Listing the files and folders on your filesystem. - Transferring or copying files and folders to and from your filesystem. - Deleting files and folders from your filesystem. As of April 2025, this feature is available in select regions only. For the current list of supported regions, see [API regions and endpoints](#api-regions-and-endpoints) below. Note While the Filesystem S3 Adapter is designed to be compatible with S3 tooling, it is not designed to be a replacement for a full-fledged object storage offering. 
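As an illustration, a filesystem can be listed and written to with `s5cmd` once you have adapter credentials. The endpoint URL, keys, and the bucket-style path below are placeholders; check the API regions and endpoints section and your adapter credentials for the actual values.

```bash
# Placeholders: endpoint, keys, and filesystem name depend on your region and setup.
export AWS_ACCESS_KEY_ID=<adapter-access-key>
export AWS_SECRET_ACCESS_KEY=<adapter-secret-key>

# List the contents of a filesystem through the S3-compatible endpoint
s5cmd --endpoint-url https://<region-s3-endpoint> ls s3://<filesystem-name>/

# Copy a local file onto the filesystem
s5cmd --endpoint-url https://<region-s3-endpoint> cp ./data.tar s3://<filesystem-name>/data.tar
```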
--- # Teams - Source: https://docs.lambda.ai/public-cloud/teams/ --- [1-click clusters](../../tags/#tag:1-click-clusters) [automation](../../tags/#tag:automation) [on-demand cloud](../../tags/#tag:on-demand-cloud) # Teams ## Create a team - In the Lambda Cloud console, in the sidebar, click **Team**. - At the top right of your **Team** page, click **Invite**. - Enter the email address of the person you want to invite to your team. Select their role in the team, either an **Admin** or a **Member**. - Click **Send invitation**. Warning **Be sure to invite only trusted persons to your team!** Currently, the only differences between the *Admin* and *Member* roles are that an *Admin* can: - Invite others to the team. - Remove others from the team. - Modify payment information. - Change the team name. This means that a person with a *Member* role can, for example: - Launch instances that will incur charges. - Terminate instances that should continue to run, including those launched by others on the team.