# Lepton AI

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/batch-jobs/configurations

# Batch Job Configurations

Learn how to configure your batch job in DGX Cloud Lepton.

When creating a batch job, you can configure the resource, the container, and additional options.

## Resource

* **Node Group**: Select one or more node groups to determine which resources this job can use.
* **Priority**: The priority of the job. Defaults to Medium (4). If the specified node group has limited resources, you can raise the priority to receive resource allocation ahead of lower-priority workloads.
* **Resource shape**: The instance type that the job will run on. Select from a variety of CPU and GPU shapes. Refer to [Node Group Shapes](/dgx-cloud/lepton/features/nodes/resource-shape/) for more details.
* **Nodes**: Defaults to no specific nodes, but you can specify the nodes you want to launch the job on.
* **Can preempt lower priority workload**: Whether the job can preempt lower-priority workloads. Defaults to false.
* **Can be preempted by higher priority workload**: Whether the job can be preempted by higher-priority workloads. Defaults to false.
* **Workers**: The number of workers to launch for the job. Defaults to 1.

## Container

* **Image**: The container image used to create the job. You can choose from the default image list or use your own custom image.
* **Private image registry auth (optional)**: If you are using a private image, [specify the image registry authentication](/dgx-cloud/lepton/features/workspace/registry/).
* **Run Command**: The command to run when the container starts.
* **Container Ports**: The ports that the container will listen on. You can add multiple ports, and each port can be specified with a protocol (TCP, UDP, or SCTP) and a port number. ![ports](/dgx-cloud/lepton/_next/static/media/ports.48f36ecf.png)
* **Log Collection**: Whether to collect logs from the container. Follows the workspace-level setting by default.

## Advanced

* **Environment Variables**: Key-value pairs passed to the job. They are automatically set as environment variables in the job container, so the runtime can refer to them as needed. Refer to [this guide](/dgx-cloud/lepton/features/batch-jobs/predefined-env-vars/) for more details. Your environment variable names should not start with the prefix `LEPTON_`, which is reserved for predefined environment variables. The following predefined environment variables are available in the job (see the sketch at the end of this section):
  * `LEPTON_JOB_NAME`: The name of the job
  * `LEPTON_RESOURCE_ACCELERATOR_TYPE`: The resource accelerator type of the job
* **Storage**: Mount storage for the job container. Refer to [this guide](/dgx-cloud/lepton/features/utilities/storage/) for more details.
* **Shared Memory**: The size of the shared memory that will be allocated to the container. The default amount of shared memory is based on the memory of the resource shape. For example, an instance with 1 TiB of memory will have 100 GiB of shared memory. The mapping is defined by the table below:

  Memory | Shared Memory
  ---|---
  4 GiB | 2 GiB
  32 GiB | 16 GiB
  128 GiB | 32 GiB
  512 GiB | 64 GiB
  1 TiB | 100 GiB

* **Max replica failure retry**: Maximum number of times to retry a failed replica. Defaults to zero.
* **Max job failure retry**: Maximum number of failure restarts of the entire job.
* **Disable retry when program error occurs**: If enabled, the job will not be retried if a program error is detected in the logs.
* **Archive time**: The time to keep the job's logs and artifacts after the job is completed. Defaults to 3 days.
* **Visibility**: Specifies the visibility of the job. If set to private, only the creator can access the job. If set to public, all users in the workspace can access the job.
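To illustrate how a job can consume the predefined environment variables listed above, the run command below simply echoes them. This is a minimal sketch; the command itself is illustrative, not a platform default:

```shell
# Minimal illustrative run command: read the variables injected by the platform
echo "job name: ${LEPTON_JOB_NAME}"
echo "accelerator type: ${LEPTON_RESOURCE_ACCELERATOR_TYPE}"
```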
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/compute/bring-your-own-compute

# Introduction to BYOC

Learn how to bring your own compute to DGX Cloud Lepton.

DGX Cloud Lepton allows you to seamlessly integrate your own compute infrastructure, providing enterprise-grade workload management capabilities.

## What is Bring Your Own Compute?

Bring Your Own Compute (BYOC) is a feature that enables you to use your existing hardware infrastructure with Lepton's platform. This approach offers several advantages:

* Leverage your existing hardware investments
* Maintain control over your physical infrastructure
* Unify your compute resources under Lepton's management interface
* Access Lepton's enterprise-grade orchestration capabilities

## Getting Started with BYOC

To start using your own compute with Lepton, follow these steps:

1. **Review Requirements**: Ensure your machines meet the BYOC requirements for hardware, software, network, and storage configurations.
2. **Create a Node Group**: Before importing machines, create a node group to organize your compute resources.
3. **Add Your Machines**: Use either SSH or GPUd to add machines to your node group.
4. **Manage Your Nodes**: After importing, manage your BYOC nodes through the node group management interface.

## Next Steps

* [Review BYOC Requirements](/dgx-cloud/lepton/compute/bring-your-own-compute/requirements/)
* [Create BYOC Node Group](/dgx-cloud/lepton/compute/bring-your-own-compute/create-byoc/)
* [Import Your Machines](/dgx-cloud/lepton/compute/bring-your-own-compute/add-machines/)

---

# Source: https://raw.githubusercontent.com/leptonai/leptonai/main/CONTRIBUTING.md

# Contributing to leptonai

First and foremost, thank you for considering contributing to leptonai! We appreciate the time and effort you're putting into helping improve our project. This guide outlines the process and standards we expect from contributors.

## Development

First, clone the source code from GitHub:

```shell
git clone https://github.com/leptonai/leptonai.git
```

Use `pip` to install `leptonai` from source:

```shell
cd leptonai
pip install -e .
```

`-e` means "editable mode" in pip. With editable mode, all changes to the Python code immediately take effect in the current environment.

## Testing

We highly recommend writing tests for new features or bug fixes, and ensuring all tests pass before submitting a PR. To run tests locally, first install the test dependencies:

```shell
pip install -e ".[test]"
```

To run all existing test cases together, simply run:

```shell
pytest
```

If you only want to run a specific test case, append the corresponding test file and test case name to the pytest command, e.g.:

```shell
pytest leptonai/photon/tests/test_photon.py::TestPhoton::test_batch_handler
```

## Coding Standards

Ensure your code is clean, readable, and well-commented. We use [black](https://github.com/psf/black) and [ruff](https://github.com/astral-sh/ruff) as code linters. To run lint locally, first install the linters:

```shell
pip install -e ".[lint]"
```

Then run the following to check the code:

```shell
black .
ruff .
```
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/endpoints/create-llm

# Create LLM Endpoints

Learn how to deploy LLM endpoints on DGX Cloud Lepton.

In this guide, we'll show you how to create a dedicated endpoint for LLMs with the vLLM and SGLang inference engines to serve models from Hugging Face.

## Create LLM Endpoints with vLLM

[vLLM](https://docs.vllm.ai/en/latest/) is a fast and easy-to-use library for LLM inference and serving. To create a dedicated endpoint with vLLM, follow these steps:

1. Go to the [Endpoints](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/deployments/list?creation=true) page and click **Create Endpoint**.
2. Select **Create LLM Endpoint**.
3. Select **vLLM** as the inference engine.
4. For **Endpoint name**, enter `vllm-endpoint` or any name you prefer.
5. For **Model**, click **Load from Hugging Face** and search by keyword, for example `meta-llama/Llama-3.1-8B-Instruct`. If the model is gated, provide a Hugging Face token: create a token in your [Hugging Face account](https://huggingface.co/settings/tokens) and save it as a secret in your workspace.
6. For **Resource**, choose an appropriate resource based on the model size.
7. For **Image configuration**, leave the defaults as is. To add arguments, expand the command-line arguments section and add your own. vLLM arguments are listed [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).
8. Leave other configurations at their defaults, or refer to [endpoint configurations](/dgx-cloud/lepton/features/endpoints/configurations/) for details. We recommend setting up an access token for your endpoint instead of making it public.

Once created, the endpoint appears on the [Endpoints](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/deployments/list) page. View logs for each replica by clicking the Logs button in the Replica section. Test the endpoint in the playground by clicking the endpoint you created. To access the endpoint via API, click the API tab on the endpoint detail page to find the API key and endpoint URL.
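For reference, a vLLM-backed endpoint exposes an OpenAI-compatible API, so once you have the endpoint URL and token from the API tab you can exercise it with a plain HTTP request. This is a minimal sketch; the URL, token, and model name are placeholders to replace with the values shown on your endpoint's API tab:

```shell
curl https://<your-endpoint-url>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-endpoint-token>" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```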
## Create LLM Endpoints with SGLang

[SGLang](https://docs.sglang.ai/) is a fast serving framework for large language models and vision language models. It enables faster, more controllable interactions by co-designing the backend runtime and frontend language. To create a dedicated endpoint with SGLang, follow these steps:

1. Go to the [Endpoints](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/deployments/list?creation=true) page and click **Create Endpoint**.
2. Select **Create LLM Endpoint**.
3. Select **SGLang** as the inference engine.
4. For **Endpoint name**, enter `sglang-endpoint` or any name you prefer.
5. For **Model**, click **Load from Hugging Face** and search by keyword, for example `meta-llama/Llama-3.1-8B-Instruct`. If the model is gated, provide a Hugging Face token: create a token in your [Hugging Face account](https://huggingface.co/settings/tokens) and save it as a secret in your workspace.
6. For **Resource**, choose an appropriate resource based on the model size.
7. For **Image configuration**, leave the defaults as is. To add arguments, expand the command-line arguments section and add your own. SGLang arguments are listed [here](https://docs.sglang.ai/backend/server_arguments.html).
8. Leave other configurations at their defaults, or refer to [endpoint configurations](/dgx-cloud/lepton/features/endpoints/configurations/) for details. We recommend setting up an access token for your endpoint instead of making it public.

Once created, the endpoint appears on the [Endpoints](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/deployments/list) page. View logs for each replica by clicking the Logs button in the Replica section. Test the endpoint in the playground by clicking the endpoint you created. To access the endpoint via API, click the API tab on the endpoint detail page to find the API key and endpoint URL.
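Because both vLLM and SGLang endpoints speak the OpenAI-compatible API, the standard `openai` Python client can also be pointed at them. This is a minimal sketch assuming the `openai` package is installed (`pip install openai`); the base URL, token, and model name are placeholders taken from your endpoint's API tab:

```python
from openai import OpenAI

# Placeholders: copy the endpoint URL and token from the endpoint's API tab.
client = OpenAI(
    base_url="https://<your-endpoint-url>/v1",
    api_key="<your-endpoint-token>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```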
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/endpoints/create-from-nim

# Create Endpoints from NVIDIA NIM

Learn how to create dedicated endpoints from NVIDIA NIM.

For enhanced performance and seamless compatibility, NVIDIA-optimized models from the NIM container registry are available on DGX Cloud Lepton.

## Prerequisites

These models require an NVIDIA account with access to the NIM container registry.

### NVIDIA Registry

You must have an NVIDIA account with access to the NIM container registry and configure the [registry auth key](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#launch-nvidia-nim-for-llms) on DGX Cloud Lepton. Refer to [this guide](/dgx-cloud/lepton/features/workspace/registry/#nvidia) for details.

Once the registry auth key is created, add a private registry via [Settings > Registries > New Registry Auth](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/settings/registries).

![Create registry auth](/dgx-cloud/lepton/_next/static/media/create-registry.3cb1ce7c.png)

Choose **NVIDIA** as the registry type and paste the registry auth key in the **API Key** field.

![Create registry auth](/dgx-cloud/lepton/_next/static/media/auth-key.856d395a.png)

### NGC API Key

Besides the registry auth key, you also need an [NGC API key](https://docs.nvidia.com/ngc/gpu-cloud/ngc-user-guide/index.html#ngc-api-keys). Navigate to the [NGC API key creation page](https://org.ngc.nvidia.com/setup/api-keys) and click **Generate Personal Key**. In the **Services Included** field, select **Public API Endpoints**.

![NGC API key](/dgx-cloud/lepton/_next/static/media/ngc-api-key.7bca824d.png)

Store the NGC API key on DGX Cloud Lepton as a [secret](/dgx-cloud/lepton/features/workspace/secret/).

## Create endpoint from NVIDIA NIM

Navigate to the [Create Endpoint](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/deployments/create/nim) page on the dashboard.

For **Endpoint name**, enter `nim-endpoint` or any name you prefer.

For **Resource**, choose an appropriate resource based on the model size.

For **NIM configuration**:

* Select a model image from the list of built-in models, or enter a custom model image.
* Select the NVIDIA registry auth you created (see [registry auth](/dgx-cloud/lepton/features/workspace/registry/#nvidia)).
* Select the NGC API key you saved as a [secret](/dgx-cloud/lepton/features/workspace/secret/) in your workspace.

For other endpoint-related configurations, refer to [this guide](/dgx-cloud/lepton/features/endpoints/configurations/). For NIM engine-related configurations, refer to [this guide](https://docs.nvidia.com/nim/large-language-models/latest/configuration.html#environment-variables); configure the NIM engine by setting the relevant environment variables.

When finished, click **Create Endpoint** to create the endpoint.
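As a concrete illustration of the environment-variable step above: NIM containers read their credentials and engine settings from environment variables. The sketch below is illustrative only; on DGX Cloud Lepton you set these through the endpoint's environment variable configuration (with the NGC key referenced from your workspace secret) rather than in a shell, and the full list of supported variables is in the NIM configuration guide linked above:

```shell
# Illustrative values only; set these through the endpoint's Environment Variables section.
export NGC_API_KEY="<value of your NGC API key secret>"  # required by NIM containers to pull model assets
# Additional NIM_* engine variables (see the NIM configuration guide) can be set the same way.
```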
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/dev-pods/configurations

# Dev Pod Configurations

Learn how to configure your dev pod in Lepton.

When creating a dev pod, you can configure resources, the container, and other options.

## Resource

* **Node Group**: First, select one or more node groups to determine which resources the pod can use.
* **Priority**: Set the priority of the pod (defaults to Medium (4)). If the specified node group has limited resources, you can raise the priority to get higher-priority allocation.
* **Resource shape**: The instance type the pod will run on. Select from a variety of CPU and GPU shapes. Refer to [Node Group Shapes](/dgx-cloud/lepton/features/nodes/resource-shape/) for more details.
* **Nodes**: No specific nodes are selected by default, but you can target particular nodes to launch the pod on.
* **Can preempt lower priority workload**: Whether the pod can preempt lower-priority workloads (defaults to false).
* **Can be preempted by higher priority workload**: Whether the pod can be preempted by higher-priority workloads (defaults to false).

## Container

* **Image**: The container image used to create the pod. Choose from default images or use your own custom image.
* **SSH Public Key**: The SSH public key used to access the pod. Available for **default images** with **Dev Pod entrypoint**. See [this guide](/dgx-cloud/lepton/features/dev-pods/create-from-container-image/ssh-access/) for details.
* **Enable JupyterLab**: Whether to enable JupyterLab in the pod (defaults to false). Available for **default images** with **Dev Pod entrypoint**. See [this guide](/dgx-cloud/lepton/features/dev-pods/create-from-container-image/enable-jupyter-lab/) for details.
* **Private image registry auth (optional)**: If you are using a private image, [specify image registry credentials](/dgx-cloud/lepton/features/workspace/registry/).
* **Entrypoint (Run Command)**: Entrypoint of the Dev Pod container. Choose from:
  * **Dev Pod entrypoint**: Automatically applies a run command based on the selected image.
  * **Image default entrypoint**: Uses the image's default run command.
  * **Custom entrypoint**: Specify a custom entrypoint for the container. This overrides the default entrypoint.
* **Run as**: When using a custom image, choose to run the container as root or as the image's default user (defaults to the image default user).
* **Container Ports**: Ports to expose from the container (maximum of 3). See [this guide](/dgx-cloud/lepton/features/dev-pods/container-ports/) for details.
* **Enable SSH Host Network**: Whether to enable SSH Host Network port configuration (defaults to false). Available for **default images** with **Dev Pod entrypoint**. See [this guide](/dgx-cloud/lepton/features/dev-pods/create-from-container-image/ssh-access/) for details.
* **Enable JupyterLab Proxy**: Whether to enable JupyterLab Proxy port configuration (defaults to false). Available for **default images** with **Dev Pod entrypoint**. See [this guide](/dgx-cloud/lepton/features/dev-pods/create-from-container-image/enable-jupyter-lab/) for details.

## Storage

Mount storage for the pod container. See [this guide](/dgx-cloud/lepton/features/utilities/storage/) for details.

## Advanced

* **Log Collection**: Whether to collect logs from the container (follows the workspace-level setting by default).
* **Shared Memory**: The size of shared memory allocated to the container.
* **Archive time**: How long to keep the pod's logs and artifacts after completion (defaults to 3 days).
* **Visibility**: If set to private, only the creator can access the pod. If set to public, all users in the workspace can access the pod.
* **Environment Variables**: Key-value pairs passed to the pod container for use at runtime.

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/endpoints/configurations

# Endpoint Configurations

Learn how to create and manage dedicated endpoints in Lepton for AI model deployment, including LLM, custom container image, and NVIDIA NIM, with various configuration options.

An endpoint is a running instance of an AI model that exposes an HTTP server. Any service can run as a dedicated endpoint. The most common use case is deploying an AI model exposed with an [OpenAPI](https://www.openapis.org/) interface.

## Create Endpoint

You can create an endpoint in several ways. Refer to the following guides for details:

1. [Create from NVIDIA NIM](/dgx-cloud/lepton/features/endpoints/create-from-nim/)
2. [Create Dedicated LLM Endpoint](/dgx-cloud/lepton/features/endpoints/create-llm/)
3. [Create from Container Image](/dgx-cloud/lepton/features/endpoints/create-from-container-image/)

### Autoscaling

By default, DGX Cloud Lepton creates your endpoints with a single replica and automatically scales down to zero after one hour of inactivity. You can override this behavior with three autoscaling options and related flags:

1. **Scale replicas to zero based on no-traffic timeout**: Specify the initial number of replicas and the no-traffic timeout (seconds).
2. **Autoscale replicas based on traffic QPM**: Specify the minimum and maximum number of replicas and the target queries per minute (QPM). You can also specify the HTTP methods and request paths to include in traffic metrics.
3. **Autoscale replicas based on GPU utilization**: Specify the minimum and maximum number of replicas and the target GPU utilization percentage.

![autoscaling](/dgx-cloud/lepton/_next/static/media/autoscaling.b18072ad.png)

We do not currently support scaling up from zero replicas. If a deployment is scaled down to zero replicas, it cannot serve any requests until it is scaled up again.

### Access Control

DGX Cloud Lepton provides a built-in access control system for your endpoints. You can create an endpoint with one of the following access control policies:

* **Public access**: Allow access to your endpoint from any IP address, with optional endpoint token authentication.
* **IP allowlist**: Allow access to your endpoint only from valid IP addresses or CIDR ranges.

#### Public Access

By default, the endpoint uses the Public access policy without token authentication, which means it is accessible from any IP address. To require token authentication, click **Add Endpoint Token** to create a new token. DGX Cloud Lepton generates the token automatically. You can add multiple tokens and modify generated token values.

![access tokens](/dgx-cloud/lepton/_next/static/media/access-tokens.51eee8ee.png)

#### IP Allowlist

Select **IP allowlist** to allow access to your endpoint only from specified IP addresses or CIDR ranges. Enter one IP address or CIDR range per line, or separate multiple entries with commas.

### Environment Variables and Secrets

Environment variables are key-value pairs passed to the deployment. All variables are injected into the deployment container and available at runtime. Secrets are similar to environment variables, but their values are pre-stored in the platform and not exposed in the development environment. Learn more about secrets [here](/dgx-cloud/lepton/features/workspace/secret/). Within the deployment, the secret value is available as an environment variable with the same name as the secret. Your environment variable names should not start with the prefix `LEPTON_`, which is reserved for predefined environment variables.

### Storage

Mount storage for the deployment container. Refer to [this guide](/dgx-cloud/lepton/features/utilities/storage/) for details.

### Advanced Configurations

#### Visibility

* Public: Visible to all team members in your workspace.
* Private: Visible only to the creator and administrators.

#### Shared Memory

The size of shared memory allocated to the container.

#### Health Check Initial Delay (seconds)

By default, two types of probes are configured:

* **Readiness probe**: Initial delay of 5 seconds; checks every 5 seconds. One success marks the container as ready; 10 consecutive failures mark it as not ready. Ensures the service is ready to accept traffic.
* **Liveness probe**: Initial delay of 600 seconds (10 minutes); checks every 5 seconds. One success marks the container as healthy; 12 consecutive failures mark it as unhealthy. Ensures the service remains healthy during operation.

Some endpoints may need more time to start the container and initialize the model. Specify a custom delay by selecting **Custom** and entering the delay in seconds.

#### Require Approval to Make Replicas Ready

Specify whether approval is required before replicas become ready. By default, replicas are ready immediately after deployment.

#### Pulling Metrics from Replicas

Specify whether to pull metrics from replicas. By default, metrics are pulled from all replicas.

#### Header-based Replica Routing

Configure header-based replica routing. By default, requests are load-balanced across all replicas. When enabled, you can direct a request to a specific replica by including the `X-Lepton-Replica-Target` header with the replica ID, as in the sketch below.
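The original example for this header is not included in this extract; the following is a minimal sketch, with the endpoint URL, API path, token, and replica ID as placeholders:

```shell
# Route this request to a specific replica instead of letting the load balancer choose.
curl https://<your-endpoint-url>/<your-api-path> \
  -H "Authorization: Bearer <your-endpoint-token>" \
  -H "X-Lepton-Replica-Target: <replica-id>"
```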
#### Log Collection

Whether to collect logs from replicas. By default, this follows the workspace setting.

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/get-started/batch-job

# Batch Job

Learn about Batch Jobs on DGX Cloud Lepton.

A **Batch Job** is a one-off task that runs to completion, such as training a model or processing data. DGX Cloud Lepton provides a simple way to run and manage batch jobs in the cloud.

## Create Batch Job

Navigate to the [create job page](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/jobs/create). In the **Resource** section, select the node group and your desired resource shape. For example, select `H100-80GB-HBM3` x 1 from node group `h100sxm-0`, and leave the worker and priority settings at their default values.

![create](/dgx-cloud/lepton/_next/static/media/create.e4111fb5.png)

In the **Container** section, use the default image. In the **Run Command** field, specify a command to run in the container, such as the simple counter sketched below.
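The exact command from the original page is not reproduced here; the following is a plausible stand-in that matches the description below (a counter that processes 10 items with a 2-second delay between each):

```shell
# Count through 10 items, pausing 2 seconds between each, then exit successfully.
for i in $(seq 1 10); do
  echo "Processing item ${i}/10"
  sleep 2
done
echo "Done."
```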
By configuring the above settings, you can create a batch job that:

* Uses one H100 GPU from node group `h100sxm-0`
* Uses the default image and runs a simple counter that processes 10 items with a 2-second delay between each
* Completes after approximately 20 seconds and is archived after 3 days

You need a [node group](/dgx-cloud/lepton/get-started/node-group/) with available nodes in your workspace first.

## View the Batch Job

After the job is created, you can check its status and results on the details page. Click **Logs** under the **Replicas** tab to see the logs for each replica. Because the worker count is one, there will be only one replica, and the logs should display the following:

![logs](/dgx-cloud/lepton/_next/static/media/logs.f7d72907.png)

## Next Steps

For more information about batch jobs, refer to the following:

* [Configure your Batch Job](/dgx-cloud/lepton/features/batch-jobs/configurations/)
* [Job templates](/dgx-cloud/lepton/features/batch-jobs/templates/)
* [Predefined environment variables](/dgx-cloud/lepton/features/batch-jobs/predefined-env-vars/)

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/get-started/dev-pod

# Dev Pod

Learn how to create and use a development pod on DGX Cloud Lepton.

For AI application developers, having a convenient development environment is essential for building and testing applications. DGX Cloud Lepton provides a solution called **Dev Pods**: lightweight, container-based AI development environments.

There are two ways to launch a Dev Pod: start from a blank container image, or launch a Starter Kit, which contains a predefined notebook for various AI tasks and workflows.
## Create a Dev Pod from a Container Image

Navigate to the [create pod page](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/pods/create/image) on the dashboard, where you can configure and launch a Dev Pod using a custom container image. Enter a name, select a node group and machine type in the **Resource** section, then click **Create**. The Dev Pod starts shortly.

![create](/dgx-cloud/lepton/_next/static/media/create.2f5b6352.png)

You need a [node group](/dgx-cloud/lepton/get-started/node-group/) with available nodes in the workspace first.

## Create from Notebooks

To launch a Dev Pod with a connected Jupyter session, go to the [Starter Kits page](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/pods/create/notebook/list) to view curated notebooks. Select a notebook to open a read-only preview. To run it, click **Create pod from this notebook** at the top of the page and complete the Dev Pod settings. For more information on using Starter Kits, refer to the [Starter Kits on Dev Pods](/dgx-cloud/lepton/features/dev-pods/create-from-starter-kits/) guide.

## Access via Web Terminal

After creation, visit the details page to check the Dev Pod's status, connection information, metrics, and more. Switch to the **Terminal** tab. DGX Cloud Lepton automatically establishes a connection to the Dev Pod, allowing you to control it through a full web terminal.

![details and web terminal](/dgx-cloud/lepton/_next/static/media/details-and-web-terminal.7f5811ca.png)

## Next Steps

With these steps, you can launch a Dev Pod and start developing. Refer to the following guides to learn more about Dev Pods:

* [Configure your Dev Pod](/dgx-cloud/lepton/features/dev-pods/configurations/)
* [Use SSH to access your Dev Pod](/dgx-cloud/lepton/features/dev-pods/create-from-container-image/ssh-access/)
* [Enable JupyterLab in a Dev Pod](/dgx-cloud/lepton/features/dev-pods/create-from-container-image/enable-jupyter-lab/)
* [Run NCCL test](/dgx-cloud/lepton/examples/dev-pod/nccl-test/)
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/get-started/endpoint

# Endpoint

Learn how to create and use endpoints on DGX Cloud Lepton for AI model deployment.

An **Endpoint** is a running instance of an AI model that exposes an HTTP server. DGX Cloud Lepton lets you deploy AI models as endpoints, making them accessible via high-performance, scalable REST APIs.

## Create an Endpoint

Navigate to the [create LLM endpoint page](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/compute/deployments/create/vllm). Select vLLM as the LLM engine, and load a model from Hugging Face in the **Model** section. In this case, we will use the `nvidia/Nemotron-Research-Reasoning-Qwen-1.5B` model. Then, in the **Resource** section, select the node group and your desired resource shape. In this case, use `H100-80GB-HBM3` x 1 from node group `h100`.

![create endpoint](/dgx-cloud/lepton/_next/static/media/create.8b91d829.png)

Click **Create** to deploy an endpoint that:

* Uses one H100 GPU from node group `h100`
* Deploys the `nvidia/Nemotron-Research-Reasoning-Qwen-1.5B` model with vLLM

You need a [node group](/dgx-cloud/lepton/get-started/node-group/) with available nodes in your workspace first.

## Use the Endpoint

By default, the endpoint is public and can be accessed by anyone with the URL. Refer to the [endpoint configurations](/dgx-cloud/lepton/features/endpoints/configurations/) for managing endpoint access control.

### Playground

After the endpoint is created, the endpoint details page shows a chat playground where you can interact with the deployed model.

![endpoint playground](/dgx-cloud/lepton/_next/static/media/endpoint-playground.330bf8c8.png)

### API Request

You can also use the endpoint URL to make API requests; go to the **API** tab on the endpoint details page for details. For example, you can list the models available in the endpoint with a command like the one below.
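The exact command from the original page is not included in this extract; the sketch below assumes a vLLM-backed, OpenAI-compatible endpoint, with the URL (and token, if you configured one) taken from the API tab:

```shell
# List the models served by the endpoint; include the Authorization header only if
# the endpoint requires a token.
curl https://<your-endpoint-url>/v1/models \
  -H "Authorization: Bearer <your-endpoint-token>"
```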
## Next Steps

For more information about endpoints, refer to the following:

* [Endpoint configurations](/dgx-cloud/lepton/features/endpoints/configurations/)
* [Create endpoints from NVIDIA NIM](/dgx-cloud/lepton/features/endpoints/create-from-nim/)
* [Create LLM endpoints](/dgx-cloud/lepton/features/endpoints/create-llm/)
* [Create endpoints from container image](/dgx-cloud/lepton/features/endpoints/create-from-container-image/)

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/get-started/node-group

# Node Group

Learn how to create and manage node groups on DGX Cloud Lepton.

A node group is a dedicated set of nodes of a specific GPU type. Node groups let you group multiple nodes for a given workload. There are two types of node groups:

* **BYOC Node Group**: Uses your own compute resources.
* **Lepton Managed Node Group**: All resources are managed by DGX Cloud Lepton. Lepton Managed Node Groups are created for you by your Technical Account Manager (TAM).

## Create a BYOC Node Group

Bring Your Own Compute (BYOC) allows you to bring your own compute resources to DGX Cloud Lepton. This is useful if you have existing resources you want to use with DGX Cloud Lepton. Follow the steps on the [Create BYOC Node Group page](/dgx-cloud/lepton/compute/bring-your-own-compute/create-byoc/) to get started.

## Use a Node Group

Once the node group is available, you can view its details on the [node group list page](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/node-groups/list). By default, a new node group is empty. To add capacity, [bring your own compute](/dgx-cloud/lepton/compute/bring-your-own-compute/add-machines/) by clicking **Add Machines** under the **Nodes** tab.

### Create Workloads

After adding machines to the node group, you can create workloads that use its available nodes. When creating a workload, select the node group in the **Resource** pane to view the GPU types available in that group.
![use-node-group](/dgx-cloud/lepton/_next/static/media/use-node-group.69ecc77b.png)

## Next Steps

* [Bring Your Own Compute](/dgx-cloud/lepton/compute/bring-your-own-compute/)

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/get-started/workspace

# Workspace

Learn what a workspace is and how to create and manage a workspace on DGX Cloud Lepton.

A workspace is a virtual environment that provides a set of tools and resources for managing your infrastructure and workloads.

## Create Workspace

### NVIDIA Cloud Account

Before creating a workspace on DGX Cloud Lepton, you need a valid [NVIDIA Cloud Account (NCA)](https://cloudaccounts.nvidia.com/) and access to the DGX Cloud Lepton platform. After logging into DGX Cloud Lepton with your NCA, a workspace is generated automatically.

### Activate Workspace

Given that DGX Cloud Lepton is in early access, your workspace must be activated after it is created. A banner will appear indicating that your workspace is not yet activated. Follow the instructions to apply for access on the [NVIDIA Developer](https://developer.nvidia.com/dgx-cloud/get-lepton) website. After you apply, NVIDIA will review your application and contact you.

### External SSO Integration

DGX Cloud Lepton can integrate with an external SSO/IdP to centralize user authentication and manage access to NVIDIA cloud services. To get started, refer to [NGC External SSO](https://docs.nvidia.com/ngc/latest/ngc-user-guide.html#using-an-external-sso-for-ngc-org-authentication). Contact your NVIDIA TAM if you have questions.

## Next Steps

DGX Cloud Lepton provides many features to help you manage your workspace.
Learn more:

* [Members](/dgx-cloud/lepton/features/workspace/members/)
* [Token](/dgx-cloud/lepton/features/workspace/token/)
* [Secrets](/dgx-cloud/lepton/features/workspace/secret/)
* [Registry](/dgx-cloud/lepton/features/workspace/registry/)
* [Templates](/dgx-cloud/lepton/features/workspace/templates/)

---

# Source: https://raw.githubusercontent.com/leptonai/leptonai/main/README.md

# Lepton AI

**A Pythonic framework to simplify AI service building**

The LeptonAI Python library allows you to build an AI service from Python code with ease. Key features include:

- A Pythonic abstraction `Photon`, allowing you to convert research and modeling code into a service with a few lines of code.
- Simple abstractions to launch models like those on [HuggingFace](https://huggingface.co) in a few lines of code.
- Prebuilt examples for common models such as Llama, SDXL, Whisper, and others.
- AI-tailored batteries included, such as autobatching, background jobs, etc.
- A client to automatically call your service like native Python functions.
- Pythonic configuration specs to be readily shipped in a cloud environment.

## Getting started with one-liner

Install the library with:

```shell
pip install -U leptonai
```

This installs the `leptonai` Python library, as well as the command-line interface `lep`. You can then launch a HuggingFace model, say `gpt2`, in one line of code:

```shell
lep photon runlocal --name gpt2 --model hf:gpt2
```

If you have access to the Llama2 model ([apply for access here](https://huggingface.co/meta-llama/Llama-2-7b)) and you have a reasonably sized GPU, you can launch it with:

```shell
# hint: you can also write `-n` and `-m` for short
lep photon runlocal -n llama2 -m hf:meta-llama/Llama-2-7b-chat-hf
```

(Be sure to use the `-hf` version for Llama2, which is compatible with huggingface pipelines.)

You can then access the service with:

```python
from leptonai.client import Client, local

c = Client(local(port=8080))
# Use the following to print the doc
print(c.run.__doc__)
print(c.run(inputs="I enjoy walking with my cute dog"))
```

Not all HuggingFace models are supported, as many of them contain custom code and are not standard pipelines. If you find a popular model you would like to support, please [open an issue or a PR](https://github.com/leptonai/leptonai/issues/new).

## Checking out more examples

You can find more examples in the [examples repository](https://github.com/leptonai/examples).
For example, launch the Stable Diffusion XL model with:

```shell
git clone git@github.com:leptonai/examples.git
cd examples
```

```shell
lep photon runlocal -n sdxl -m advanced/sdxl/sdxl.py
```

Once the service is running, you can access it with:

```python
from leptonai.client import Client, local

c = Client(local(port=8080))

img_content = c.run(prompt="a cat launching rocket", seed=1234)
with open("cat.png", "wb") as fid:
    fid.write(img_content)
```

or access the mounted Gradio UI at [http://localhost:8080/ui](http://localhost:8080/ui). Check the [README file](https://github.com/leptonai/examples/blob/main/advanced/sdxl/README.md) for more details.

## Writing your own photons

Writing your own photon is simple: write a Python Photon class and decorate functions with `@Photon.handler`. As long as your input and output are JSON serializable, you are good to go. For example, the following code launches a simple echo service:

```python
# my_photon.py
from leptonai.photon import Photon


class Echo(Photon):
    @Photon.handler
    def echo(self, inputs: str) -> str:
        """
        A simple example to return the original input.
        """
        return inputs
```

You can then launch the service with:

```shell
lep photon runlocal -n echo -m my_photon.py
```

Then, you can use your service as follows:

```python
from leptonai.client import Client, local

c = Client(local(port=8080))

# will print available paths
print(c.paths())
# will print the doc for c.echo. You can also use `c.echo?` in Jupyter.
print(c.echo.__doc__)
# will actually call echo.
c.echo(inputs="hello world")
```

For more details, check out the [documentation](https://docs.nvidia.com/dgx-cloud/lepton) and the [examples](https://github.com/leptonai/examples).

## Contributing

Contributions and collaborations are welcome and highly appreciated. Please check out the [contributor guide](https://github.com/leptonai/leptonai/blob/main/CONTRIBUTING.md) for how to get involved.

## License

The Lepton AI Python library is released under the Apache 2.0 license.

Developer Note: early development of LeptonAI was in a separate mono-repo, which is why you may see commits from the `leptonai/lepton` repo. We intend to use this open source repo as the source of truth going forward.

---

# Lepton AI Documentation Index

## Overview

Lepton AI is a containerized inference deployment platform with autoscaling capabilities.
**Official Documentation:** https://docs.nvidia.com/dgx-cloud/lepton/

**GitHub Repository:** https://github.com/leptonai/leptonai

## Documentation Sections

### Getting Started

- **Introduction** - Overview of DGX Cloud Lepton
- **Endpoint** - Deploy and manage AI model endpoints
- **Dev Pod** - Lightweight AI development environments
- **Batch Job** - Run one-off tasks and jobs
- **Node Group** - Manage compute node groups
- **Workspace** - Workspace setup and management

### Features

- **Endpoint Configurations** - Configure endpoint settings, autoscaling, access control
- **Create LLM Endpoint** - Deploy Large Language Models
- **Create NIM Endpoint** - Deploy NVIDIA NIM endpoints
- **Dev Pod Configurations** - Configure development pod settings
- **Batch Job Configurations** - Configure batch job parameters
- **Workspace Members** - Manage workspace access and permissions
- **Workspace Tokens** - Authentication tokens
- **Workspace Secrets** - Secret management

### Compute

- **Bring Your Own Compute** - Use your own infrastructure with Lepton

### Reference

- **Python SDK Reference** - API documentation and examples

### Repository

- **GitHub README** - Project overview and getting started
- **CONTRIBUTING** - Contributing guidelines

## Key Concepts

### Endpoint

A running instance of an AI model that exposes an HTTP server. Accessible via REST APIs with support for:

- Autoscaling based on traffic (QPM) or GPU utilization
- Access control (IP allowlist, tokens)
- Health checks and monitoring

### Dev Pod

Lightweight, container-based AI development environments for building and testing applications.

- Full web terminal access
- SSH support
- JupyterLab integration
- Starter kits for common workflows

### Batch Job

One-off tasks like model training or data processing that run to completion.

- Simple configuration
- Status tracking
- Automatic archiving after completion

### Bring Your Own Compute (BYOC)

Integrate your existing hardware infrastructure with Lepton's management platform.

## Quick Links

- **Main Documentation:** https://docs.nvidia.com/dgx-cloud/lepton/
- **GitHub:** https://github.com/leptonai/leptonai
- **Examples:** https://github.com/leptonai/examples

---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/reference/api

# Python SDK Reference

Tutorial on using the Python SDK to interact with DGX Cloud Lepton.

DGX Cloud Lepton supports the REST API protocol and includes a Python SDK for interacting with workspaces. Common tasks include monitoring and launching batch jobs and endpoints. This document provides an overview of how to interact with the Python SDK for DGX Cloud Lepton.

# Installation and authentication

First, install the Python SDK and authenticate with your workspace.
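The install and authentication commands themselves are not preserved in this extract. A minimal sketch follows, assuming the `leptonai` package from PyPI (as in the project README) and the `lep` CLI's login command; verify the exact commands against the CLI reference:

```shell
# Install the leptonai Python SDK, which also provides the `lep` CLI
pip install -U leptonai

# Authenticate with your DGX Cloud Lepton workspace; when prompted, paste the
# credential string from the dashboard (format: <workspace-id>:<token>)
lep login
```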
Running the login command prompts you to authenticate with your DGX Cloud Lepton workspace. If you're in a GUI-supported environment such as a desktop, a browser will open to the credentials page in your workspace. Otherwise, a URL will be displayed; open this URL in a browser.

On the credentials page, create an authentication token by following the prompts. The page will display a secret token which is used for authentication. Copy the workspace ID and token shown in the second field and paste them back into your terminal. The format should look like `xxxxxx:**************************`.

You should now be authenticated with DGX Cloud Lepton. You only need to authenticate once locally as long as your credentials remain valid.

## Validate installation

After authentication, validate the installation by listing your available workspaces with the [`lep workspace`](/dgx-cloud/lepton/reference/cli/lep_workspace/) CLI command group. If authentication was successful, your workspace should appear in the output.

## Basic Python SDK flow

Nearly all workflows using the Python SDK follow the same basic flow:

1. Initialize a client
2. Define the task to perform
3. Execute the task

The following sections break down these steps and provide a complete example.

## Initialize a client

Initializing the client is straightforward: import the Lepton API client module and instantiate the client (the batch job sketch later in this section includes this step). The `client` variable can be reused for multiple tasks.

## Define the task to perform

Most tasks available to users on DGX Cloud Lepton are supported via the SDK. The following API resources are accessible:

1. Batch Jobs
2. Dev Pods
3. Endpoints
4. Events
5. Health Checks
6. Ingress
7. Logs
8. Monitoring
9. Node Groups
10. Queue
11. Readiness
12. Replicas
13. Secrets
14. Templates
15. Workspaces

Each of these resources has a specific template it expects for the API request. For example, the Batch Jobs API expects a job to be submitted as a `leptonai.api.v1.types.job.LeptonJob` object. Similarly, Endpoints (also known as "Deployments" in the SDK) expect a `leptonai.api.v1.types.deployment.LeptonDeployment` object for submission, as do Dev Pods, which are handled similarly to Endpoints by the backend. The list of API specs can be found [here](https://github.com/leptonai/leptonai/tree/main/leptonai/api/v1/types). Open the file for the specific task you need and review its specification.

All jobs require a resource shape to be specified. This tells the platform what resources should be allocated to the container, such as the number and type of GPUs, the CPU cores and memory, and local storage. To view available shapes, run `lep node resource-shape` in CLI version 0.26.4 or later. This returns a table of all available resource shapes in your node groups. The **Shapes** column shows the name of each shape. Use the desired name for the `resource_shape` field in the job specification.

For a batch job, you need a `LeptonJob` object with a `LeptonJobUserSpec`. Review the `LeptonJobUserSpec` in the [Python script](https://github.com/leptonai/leptonai/blob/main/leptonai/api/v1/types/job.py#L22) for the list of settings required for launching a job.

The following is a quick example of defining a batch job spec, including the client initialization described above:
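The sketch below is illustrative rather than authoritative: the import paths, client attribute names (`client.nodegroup`, the `metadata.id_` accessor), and spec field names are assumptions based on the layout of the public `leptonai` repository, so verify them against the type definitions linked above before use.

```python
from leptonai.api.v2.client import APIClient  # assumed client location
from leptonai.api.v1.types.job import LeptonJob, LeptonJobUserSpec
from leptonai.api.v1.types.common import Metadata  # assumed module for Metadata

# 1. Initialize a client; it can be reused for multiple tasks.
client = APIClient()

# 2. Find the specified node group -- replace the name with your own.
node_group_name = "my-node-group"  # hypothetical node group name
node_group = next(
    ng for ng in client.nodegroup.list_all()  # assumed helper
    if ng.metadata.name == node_group_name
)

# 3. Get the node IDs of all nodes in the node group; the job may only be
#    scheduled on these nodes.
valid_node_ids = [
    node.metadata.id_ for node in client.nodegroup.list_nodes(node_group)  # assumed helper
]

# 4. Specify the job spec: resource shape, affinity, container, and worker count.
#    Field names are assumed from leptonai/api/v1/types/job.py.
job_spec = LeptonJobUserSpec(
    resource_shape="gpu.1xh100",  # pick a name from `lep node resource-shape`
    affinity={
        "allowed_dedicated_node_groups": [node_group.metadata.id_],
        "allowed_nodes_in_node_group": valid_node_ids,
    },
    container={
        "image": "ubuntu:22.04",
        "command": ["/bin/bash", "-c", "echo 'Hello from DGX Cloud Lepton'"],
    },
    completions=1,   # assumed fields controlling the number of workers
    parallelism=1,
)

# 5. Define the job by wrapping the spec in a LeptonJob and giving it a name.
job = LeptonJob(metadata=Metadata(id="my-test-job"), spec=job_spec)
```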
The example above does the following:

1. Imports all required modules
2. Finds the specified node group (update the node group name for your specific needs)
3. Gets the list of node IDs for all nodes in the node group; this specifies which nodes the job can be scheduled on
4. Specifies the job spec, including the resource shape, container, command, and number of workers
5. Defines the job by passing in the job spec and giving it a name

## Mounting shared storage

If shared storage is available in your node group, you can mount it in a job by pointing to the storage address and indicating which directory to mount from storage and where to mount it in the container.

To find your shared storage name, navigate to **Nodes** in the UI and select your desired node group. Next, click **Actions > Config > Storage** in the top-right corner of the page to list the available storage in the node group. If you have shared storage available, you should see something similar to the following image:

![shared-storage](/dgx-cloud/lepton/_next/static/media/shared-storage.c4493fbf.png)

As indicated in the image, `lepton-shared-fs` is the name of the shared storage which is mountable in containers. With the storage name captured, we can define the mount specification. To mount the volume, create a Python dictionary with the keys `path`, `mount_path`, and `from`:

* `path`: The directory from your shared storage to be mounted. Setting this to `/` mounts the root directory of the shared storage inside your container.
* `mount_path`: The directory inside the container where the `path` directory from storage will be mounted. Setting this to `/my-mount` mounts the `path` directory from storage at `/my-mount` inside the container. Note that this value cannot be `/`.
* `from`: The name of the storage to mount. When using shared storage, you must first specify the storage type, followed by a colon and your storage name. The storage type is most commonly `node-nfs`. Following the example above, this should be `node-nfs:lepton-shared-fs`.

You can also specify multiple mounts by providing a list of such dictionaries instead of a single dictionary. After defining your mount points, add them to the job spec's list of mounts (the self-contained example at the end of this page shows a mount attached to a job spec); if you have a single dictionary rather than a list, wrap it in a list.

## Authenticating with private container registries

If you are using a private container registry that requires authentication, you need to specify the private container secret in the job definition. To create a new secret, navigate to the **Settings > Registries** page in the UI and click the green **+ New registry auth** button in the top-right corner of the page. Follow the prompts to add your private image token.

To add your secret to the job specification, enter it in the `image_pull_secrets` field. Note that `image_pull_secrets` expects a list of strings, allowing multiple secrets to be added.

## Running in a node reservation

If you have an existing node reservation for scheduling jobs on dedicated resources, you can add the reservation to your request using the `node_reservation` field of the job spec.

## Execute the task

After the job has been defined in the previous step, it can be launched using the client. Since we are launching a job, we call the client's job `create` function and pass in the `LeptonJob` object. This adds the job to the queue and schedules it when resources become available. The job should appear in the UI after the `create` function runs.

# Example job submission via SDK

The following is a self-contained example of launching a batch job using the Python SDK, following the flow outlined earlier. Save the script to a file such as `run.py` and launch it with `python run.py`. Once the script completes, the launched job should be viewable in the UI.
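As with the earlier snippet, this script is a sketch rather than a verified reference: the import paths, client attributes, and spec field names (including where `mounts` and `image_pull_secrets` attach) are assumptions based on the public `leptonai` repository, and the node group, resource shape, image, secret, and storage names are placeholders you must replace.

```python
# run.py -- a sketch of submitting a batch job end to end.
from leptonai.api.v2.client import APIClient  # assumed client location
from leptonai.api.v1.types.job import LeptonJob, LeptonJobUserSpec
from leptonai.api.v1.types.common import Metadata  # assumed module for Metadata

NODE_GROUP_NAME = "my-node-group"   # hypothetical; use your node group's name
RESOURCE_SHAPE = "gpu.1xh100"       # pick a name from `lep node resource-shape`
JOB_NAME = "sdk-example-job"        # hypothetical job name


def main() -> None:
    # Uses the credentials stored locally by the login step.
    client = APIClient()

    # Resolve the node group and the nodes the job is allowed to run on.
    node_group = next(
        ng for ng in client.nodegroup.list_all()  # assumed helper
        if ng.metadata.name == NODE_GROUP_NAME
    )
    valid_node_ids = [
        node.metadata.id_ for node in client.nodegroup.list_nodes(node_group)  # assumed helper
    ]

    # Build the job spec; field names are assumed from leptonai/api/v1/types/job.py.
    job_spec = LeptonJobUserSpec(
        resource_shape=RESOURCE_SHAPE,
        affinity={
            "allowed_dedicated_node_groups": [node_group.metadata.id_],
            "allowed_nodes_in_node_group": valid_node_ids,
        },
        container={
            "image": "ubuntu:22.04",
            "command": ["/bin/bash", "-c", "echo 'Hello from DGX Cloud Lepton'"],
        },
        completions=1,   # assumed fields controlling the number of workers
        parallelism=1,
        # Optional: shared storage mounts, using the dictionary format described above.
        # mounts=[{"path": "/", "mount_path": "/my-mount", "from": "node-nfs:lepton-shared-fs"}],
        # Optional: private registry secrets created under Settings > Registries.
        # image_pull_secrets=["my-registry-secret"],
    )

    # Wrap the spec in a LeptonJob and submit it.
    job = LeptonJob(metadata=Metadata(id=JOB_NAME), spec=job_spec)
    client.job.create(job)  # assumed attribute path; `create` queues the job for scheduling
    print(f"Submitted batch job '{JOB_NAME}'")


if __name__ == "__main__":
    main()
```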
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/workspace/members

# Members

Manage workspace collaboration in DGX Cloud Lepton by inviting team members, assigning roles, and controlling access permissions for your projects.

As a workspace administrator, you can manage your workspace members by inviting them, removing them, and assigning roles to them. This is an advanced feature that allows you to collaborate with your team members and control their access to the workspace.

Even if your Enterprise SSO has been integrated, members still need to be individually added to DGX Cloud Lepton. See the [Workspace](/dgx-cloud/lepton/get-started/workspace/) section for more information on SSO integrations.

## Managing Members

Navigate to the **Members** tab on the left-hand side of the workspace settings page. You will see a list of all members in the current workspace. If you have an admin role, you can invite, remove, and assign roles to members from this page. Once a member is invited, they will receive an email notification with an invitation link to join the workspace.

![Members](/dgx-cloud/lepton/_next/static/media/members.5bce9f5e.png)

All of your workspace members should belong to the same NVIDIA Cloud Account (NCA). You can verify this from the [NCA dashboard](https://cloudaccounts.nvidia.com/).
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/workspace/secret

# Secrets

Securely store and manage sensitive information like API keys and credentials in DGX Cloud Lepton, and use them in your deployments and jobs.

Secrets are a secure and reusable way to add credentials and other sensitive information to your workspace. You can use secrets to store sensitive data, such as passwords, API keys, and tokens, and then reference them in your deployments, jobs, and pods. Secrets are similar to environment variables, but the actual value can no longer be edited or revealed once the secret is created.

## Managing Secrets

Navigate to the **Secrets** tab on the left-hand side of the workspace settings page. You will see a list of all the secrets you have created. You can add, edit, or delete secrets from this page.

![secrets list](/dgx-cloud/lepton/_next/static/media/secrets.4c557d1c.png)

Quick options are provided for adding SSH public keys, GitHub, Hugging Face, OpenAI, and Datadog tokens, NGC API keys, and custom secrets to your workspace. All secrets are stored and associated with the workspace in which they are created.

### Visibility

For every secret, you can set the visibility to **Public** or **Private**.

* **Public**: The secret is visible to all users in the workspace.
* **Private**: The secret is only visible to you, the user who created it.

By default, all secrets are created as private.

## Using Secrets

On the dashboard, you can add secrets as environment variables to your deployments, jobs, and pods. Configure them under advanced settings when creating or editing a deployment, job, or pod. To access the secret value inside the deployment, use the `os` module:
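For example, if a secret is attached to the workload as an environment variable named `MY_API_TOKEN` (a hypothetical name; use whatever variable name you configured), it can be read like any other environment variable:

```python
import os

# Read the secret that was injected as an environment variable.
api_token = os.environ.get("MY_API_TOKEN")
if api_token is None:
    raise RuntimeError("MY_API_TOKEN is not set in this container")
```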
---

# Source: https://docs.nvidia.com/dgx-cloud/lepton/features/workspace/token

# Token

Learn about Lepton's authentication tokens, including User and Workspace tokens, and how to use them securely with the API, CLI, and SDKs.

Tokens are used to authenticate and authorize requests to DGX Cloud Lepton. They are essential for logging in via the CLI, API, or SDKs. Keep your tokens secure and do not share them with others.

## Creating Tokens

You can create a new token on the [**Settings - Tokens**](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/settings/api-tokens) tab on the left-hand side of the workspace settings page. When creating a new token, the following fields can be configured based on your needs:

* **Token Name**: The name of the token.
* **Expiration**: The expiration time of the token. Defaults to 1 day.

![Create token 0.6x](/dgx-cloud/lepton/_next/static/media/create.74d59d5e.png)

## Viewing Tokens

You can view your tokens on the **Token** tab on the left-hand side of the workspace settings page.

![Tokens](/dgx-cloud/lepton/_next/static/media/tokens.46c118c1.png)

Only the tokens you created appear in the list.

## Using Tokens

Tokens can be used to authenticate requests to the Lepton API. Simply include the token in the `Authorization` header of your request. Your workspace ID can be found on the [**Settings - General**](https://dashboard.dgxc-lepton.nvidia.com/workspace-redirect/settings/workspace) page.
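A minimal sketch of such a request with `curl`; the Bearer scheme and the placeholder URL are assumptions, so substitute your workspace's actual API endpoint and resource path:

```shell
# <workspace-api-endpoint> and <api-path> are placeholders for your workspace's
# API base URL and the resource you want to query; the Bearer scheme is assumed.
curl -H "Authorization: Bearer $LEPTON_API_TOKEN" \
  "https://<workspace-api-endpoint>/<api-path>"
```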