# Nvidia Developer > [ACE](https://developer.nvidia.com/ace)

---

# Source: https://developer.nvidia.com/ace-for-games.md

# NVIDIA ACE for Games

NVIDIA ACE is a suite of AI technologies - spanning models and developer tools - designed to help middleware and game developers build knowledgeable, actionable, and conversational in-game characters. ACE provides ready-to-integrate cloud and on-device AI models for each aspect of in-game characters—from speech to intelligence to animation.

[Get Started](#section-getting-started "Get Started") [Notify Me](https://developer.nvidia.com/ace-for-games/notify-me "Notify Me")

![A digital human powered by NVIDIA ACE in Unreal Engine 5](https://developer.download.nvidia.com/images/ace-for-gamedev/Hiroki1.png)

* * *

## Key Benefits

### Small Models for Gaming

ACE offers small AI models, from speech to vision, built to enhance game characters.

### Optimized for On-Device Inference

AI models, fine-tuned and optimized for gaming hardware, provide high accuracy and low latency within a small memory footprint.

### Inference Alongside Graphics

[NVIDIA In-Game Inferencing](https://developer.nvidia.com/rtx/in-game-inferencing) (NVIGI) plugins schedule AI inference for different models and inference backends across complex graphics workloads to maximize performance and the user experience.

* * *

## Partner Experiences Powered by AI

NVIDIA ACE is being used by industry-leading game developers and ISVs to build autonomous game characters that inhabit living, breathing worlds, and AI assistants that provide tips and guidance to gamers and creators.

### Assistants

The Total War: PHARAOH AI advisor helps players learn the game's many systems and mechanics and is powered by an on-device small language model connected to Total War's extensive game data.
https://www.youtube-nocookie.com/embed/6es-Zvh5bu4?&rel=0

### Sidekicks

Streamlabs and Inworld AI introduce an intelligent streaming assistant that serves as a producer, technical assistant, and 3D sidekick.

https://www.youtube-nocookie.com/embed/909h0VZeOMM?&rel=0

### Teammates

KRAFTON's PUBG introduces Co-Player Characters (CPC), AI-driven allies that communicate in natural language and act autonomously like a human teammate.

https://www.youtube-nocookie.com/embed/OdtFL1CDDwc?&rel=0

### Enemies

Wemade Next's MIR5 introduces AI-powered bosses that continuously learn from previous player tactics to adapt and provide unique fights every run.

https://www.youtube-nocookie.com/embed/-8XeiZ4djKw?&rel=0

### Citizens

KRAFTON's inZOI features Smart Zois, AI-driven agents that plan, act, and reflect on their decisions for unique character dynamics.

https://www.youtube-nocookie.com/embed/Pk9z1upOj3U?&rel=0

### Characters

Dead Meat is a first-of-its-kind murder-mystery interrogation game where players can ask the suspect absolutely anything using their own words.

https://www.youtube-nocookie.com/embed/PFwIVnDU8KM?&rel=0

* * *

## Get Started with NVIDIA ACE

The [NVIDIA In-Game Inferencing](https://developer.nvidia.com/blog/bring-nvidia-ace-ai-characters-to-games-with-the-new-in-game-inference-sdk/?ncid=so-nvsh-920422) (NVIGI) SDK offers a streamlined, high-performance path to integrating locally run AI models into games and applications via in-process (C++) execution and CUDA in Graphics. NVIGI supports all major inference backends across different hardware accelerators (GPU, NPU, CPU), so developers can take advantage of the full range of available system resources on a user's PC.
[Download NVIGI SDK](https://developer.nvidia.com/rtx/in-game-inferencing) [Documentation](https://github.com/NVIDIA-RTX/NVIGI-Core) [Compatibility Matrix](https://forums.developer.nvidia.com/t/ace-compatibility-table/334024) [Archive](/rtx/in-game-inferencing#section-archive)

### NVIDIA® Riva ASR

Takes an audio stream as input and returns a text transcript in real time. It's NVIDIA GPU-accelerated for maximum performance and accuracy.

[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/RIVA_Conformer_ASR_English.pdf) [Download On-Device Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/riva_pack_ga_1_0_0.zip) [Access Cloud Model](https://build.nvidia.com/nvidia/parakeet-ctc-0_6b-asr/deploy) [Documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html)

### Whisper ASR

Takes an audio stream as input and returns a text transcript in real time. It's compatible with NVIDIA GPUs and any CPU.

[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/Whisper_ASR.pdf) [Download On-Device Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/whisper_asr_gguf_v1.0.7z) [Access Cloud Model](https://build.nvidia.com/openai/whisper-large-v3) [Documentation](https://github.com/NVIDIA-RTX/NVIGI-Plugins/blob/main/docs/ProgrammingGuideASRWhisper.md)

### Riva TTS

Takes text as input and converts it into natural and expressive voices in multiple languages in real time. Built for agentic workflows and compatible with multi-vendor GPUs and CPUs. FP16 quantization offers higher accuracy at the cost of higher VRAM usage.
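Both ASR models above take an audio stream and return transcripts in real time. Client code typically feeds the model fixed-size, slightly overlapping windows of PCM samples so words at chunk boundaries are not lost. Below is a minimal, dependency-free sketch of that chunking step; the window, overlap, and sample-rate values are illustrative assumptions, not NVIGI API parameters.

```python
# Sketch: splitting a PCM audio stream into overlapping windows for a
# streaming ASR model such as Riva ASR or Whisper. The constants below are
# illustrative assumptions, not settings from the NVIGI plugins.

SAMPLE_RATE = 16_000   # 16 kHz mono PCM, a common ASR input format
WINDOW_S = 5.0         # seconds of audio per inference call
OVERLAP_S = 0.5        # overlap so words at chunk edges aren't cut off

def chunk_pcm(samples, window_s=WINDOW_S, overlap_s=OVERLAP_S, rate=SAMPLE_RATE):
    """Yield overlapping windows of samples for incremental transcription."""
    window = int(window_s * rate)
    stride = window - int(overlap_s * rate)
    for start in range(0, len(samples), stride):
        yield samples[start:start + window]

# Twelve seconds of silence stands in for a real microphone stream.
chunks = list(chunk_pcm([0] * (12 * SAMPLE_RATE)))
print(len(chunks), len(chunks[0]))  # prints: 3 80000
```

In a real integration, each yielded chunk would be handed to the ASR plugin for transcription; the overlap length is a latency/accuracy trade-off.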
[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_zip/riva-magpie-tts-flow-1p0.zip) [Download On-Device Model (FP16)](https://developer.nvidia.com/downloads/rtx/In-Game-Inference-SDK/riva-magpie-tts-flow-ggml-1p5-fp16.zip) [Download On-Device Model (Q4)](https://developer.nvidia.com/downloads/rtx/In-Game-Inference-SDK/riva-magpie-tts-flow-ggml-1p5-q4.zip) [Access Cloud Model](https://build.nvidia.com/nvidia/magpie-tts-flow) [Documentation](https://github.com/NVIDIA-RTX/NVIGI-Plugins/blob/main/docs/ProgrammingGuideTTSASqFlow.md)

### Nemotron Nano 9B v2

Class-leading small language model for reasoning accuracy, providing higher-quality final responses to tasks. Optimized for memory efficiency.

[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/nemotron-nano-9b-v2.pdf) [Download On-Device Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/nvidia_nvidia-nemotron-nano-9b-v2-q4_k_m.gguf.zip) [Documentation](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2)

### Qwen3 Family

Open-source dense models optimized for on-device inference. Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support. It's compatible with multi-vendor GPUs and CPUs.

[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/qwen3-8b-instruct.pdf) [Download On-Device 0.6B Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/qwen_qwen3-0.6b-q4_k_m.gguf.zip) [Download On-Device 4B Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/qwen3-4b-q4_k_m.gguf.zip) [Download On-Device 8B Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/qwen3-8b-q4_k_m.gguf.zip) [Documentation](https://huggingface.co/Qwen/Qwen3-8B)

### Mistral-Nemo-Minitron Family

Agentic small language models that enable better role-play, retrieval-augmented generation (RAG), and function-calling capabilities.
They come in 8B, 4B, and 2B parameter versions to fit your VRAM and performance requirements. The on-device models are compatible with multi-vendor GPUs and CPUs.

[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/Mistral-NeMo-Minitron-8B-128K-Instruct.pdf) [Download On-Device 2B Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/mistral-nemo-minitron-2b-128k-instruct_v1.0.0.7z) [Download On-Device 4B Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/mistral-nemo-minitron-4b-128k-instruct_v1.0.0.7z) [Download On-Device 8B Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/mistral-nemo-minitron-8b-128k-instruct_v1.0.0.7z) [Documentation](https://github.com/NVIDIA-RTX/NVIGI-Plugins/blob/main/docs/ProgrammingGuideGPT.md)

### Llama3.2-3B-Instruct

Agentic small language model that enables better role-play, retrieval-augmented generation (RAG), and function-calling capabilities. This model is compatible with multi-vendor GPUs and CPUs.

[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/llama-3.2-3b_for_Nv_IGI_SDK.pdf) [Download On-Device Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/llama-3.2-3b_v1.0.1.7z) [Access Cloud Model](https://build.nvidia.com/meta/llama-3.2-3b-instruct/deploy) [Documentation](https://github.com/NVIDIA-RTX/NVIGI-Plugins/blob/main/docs/ProgrammingGuideGPT.md)

### Nemovision-4B-Instruct

Agentic vision-language model that combines visual understanding of on-screen elements and actions with reasoning for better context-aware responses. The on-device model is compatible with multi-vendor GPUs and CPUs.
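The language models above ship in several parameter sizes and quantizations (0.6B/4B/8B Q4 variants for Qwen3, 2B/4B/8B for Mistral-Nemo-Minitron, FP16 versus Q4 elsewhere) precisely so you can trade accuracy against VRAM. A rough back-of-the-envelope sketch for sizing that choice follows; the bytes-per-weight figures are approximations, and real deployments also need KV-cache and runtime memory not counted here.

```python
# Rough sketch: estimating the weight footprint of quantized GGUF models to
# choose between the 0.6B / 4B / 8B variants mentioned above. Bytes-per-weight
# values are approximations (Q4_K_M averages ~4.5 bits/weight with scales,
# FP16 is 16 bits); KV cache and runtime overhead are not counted.

BYTES_PER_WEIGHT = {"q4_k_m": 0.56, "fp16": 2.0}  # approximate averages

def weight_footprint_gb(params_billions, quant="q4_k_m"):
    """Approximate size of the model weights in GiB."""
    return params_billions * 1e9 * BYTES_PER_WEIGHT[quant] / 1024**3

def largest_variant_that_fits(vram_gb, variants=(0.6, 4.0, 8.0), headroom_gb=2.0):
    """Pick the biggest Q4 variant whose weights leave `headroom_gb` free."""
    fitting = [v for v in variants if weight_footprint_gb(v) + headroom_gb <= vram_gb]
    return max(fitting) if fitting else None

print(round(weight_footprint_gb(8.0), 1))  # ~4.2 GiB of weights for an 8B Q4 model
print(largest_variant_that_fits(8.0))      # on a GPU with 8 GB of VRAM
```

The headroom constant is a stand-in for context/KV-cache memory, which in practice depends on context length and batch size.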
[Access Model Card](https://developer.nvidia.com/downloads/assets/ace/model_card/mistral-nemotron-vision-4b-instruct_modelcard.pdf) [Download On-Device Model](https://developer.nvidia.com/downloads/assets/ace/model_zip/mistral-nemotron-vision-4b-instruct_vv1.7z) [Documentation](https://github.com/NVIDIA-RTX/NVIGI-Plugins/blob/main/docs/ProgrammingGuideGPT.md#90-vlm-visual-lanuage-models)

### Audio2Face-3D SDK

Use AI to convert streaming audio into facial blendshapes for real-time lip-syncing and facial animation, on-device or in the cloud. The SDK contains C++ and Python source code under the MIT license.

[Download Audio2Face-3D SDK](https://github.com/NVIDIA/Audio2Face-3D-SDK) [Documentation](https://github.com/NVIDIA/Audio2Face-3D-SDK/tree/main/docs)

### Audio2Face-3D Models

Audio2Face-3D regression (2.3) and diffusion (3.0) models to generate lip-sync. Open weights in ONNX-TRT format are available under the NVIDIA Open Model License. Unreal Engine models require the ACE Unreal Engine Plugin.

[Download Audio2Face 3.0 Unreal Engine Models](https://developer.nvidia.com/downloads/assets/ace/ace_3.0_a2f_models.zip) [Download Audio2Face 2.3 Unreal Engine Models](https://developer.nvidia.com/downloads/assets/ace/ace_2.5_v2.3_a2f_models.zip) [Download Audio2Face-3D 3.0 Open Source Models](https://huggingface.co/collections/nvidia/audio2face-3d-6865d22d6daec4ac85887b17)

### Audio2Emotion-3D Models

Audio2Emotion production (2.2) and experimental (3.0) models to infer emotional state from audio. Open weights in ONNX-TRT format are available under a custom license.

[Download Audio2Emotion 3.0 Models](https://huggingface.co/nvidia/Audio2Emotion-v3.0) [Download Audio2Emotion 2.2 Models](https://huggingface.co/nvidia/Audio2Emotion-v2.2)

### Audio2Face-3D Plugins

The Audio2Face-3D plug-in for Unreal Engine 5, alongside a configuration sample, enhances your MetaHuman experience. The Autodesk Maya ACE plugin generates high-quality, audio-driven facial animation offline.
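As background on what the Audio2Face-3D models produce: each animation frame is a set of blendshape weights, and the engine deforms the face mesh by adding each shape's vertex deltas, scaled by its weight, to the neutral pose. The toy sketch below illustrates that final step; the two-vertex "mesh" and the shape names are illustrative, not part of the Audio2Face-3D SDK.

```python
# Sketch of how audio-driven blendshape weights (like those produced by
# Audio2Face-3D) drive a face mesh: deformed = neutral + sum_i(w_i * delta_i).
# The tiny two-vertex mesh and shape names here are illustrative only.

NEUTRAL = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]           # neutral vertex positions
DELTAS = {                                              # per-shape vertex offsets
    "jawOpen":    [(0.0, -0.3, 0.0), (0.0, -0.3, 0.0)],
    "mouthSmile": [(0.1,  0.1, 0.0), (-0.1, 0.1, 0.0)],
}

def apply_blendshapes(weights):
    """Return deformed vertices: neutral plus the weighted sum of shape deltas."""
    verts = [list(v) for v in NEUTRAL]
    for shape, w in weights.items():
        for vi, (dx, dy, dz) in enumerate(DELTAS[shape]):
            verts[vi][0] += w * dx
            verts[vi][1] += w * dy
            verts[vi][2] += w * dz
    return [tuple(v) for v in verts]

# One animation frame: mouth half open, slight smile.
frame = apply_blendshapes({"jawOpen": 0.5, "mouthSmile": 0.2})
print(frame[0])  # first vertex of the deformed pose
```

In a real pipeline the model emits a weight vector per audio frame (one per tracked blendshape), and the engine applies this same weighted-sum deformation on the GPU each frame.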
Both plugins are available under the MIT license.

[Download Unreal Engine Gaming Sample](https://developer.nvidia.com/downloads/assets/ace/aceunrealsample-1.0.0.7z) [Download Unreal Engine 5.6 Plugin](https://developer.nvidia.com/downloads/assets/ace/nv_ace_reference-ue5.6-v2.5.0rc3.zip) [Download Unreal Engine 5.5 Plugin](https://developer.nvidia.com/downloads/assets/ace/nv_ace_reference-ue5.5-v2.5.0rc3.zip) [Download Unreal Engine 5.4 Plugin](https://developer.nvidia.com/downloads/assets/ace/nv_ace_reference-ue5.4-v2.4.0.zip) [Documentation](https://docs.nvidia.com/ace/ace-unreal-plugin/2.5/) [Download Maya ACE Plugin and Documentation](https://github.com/NVIDIA/Maya-ACE)

### Audio2Face-3D Training

The Audio2Face-3D training framework allows developers to create Audio2Face-3D models with their own data. Source code is available in Python under the Apache license. Leverage audio files, blendshape data, animated geometry caches, geometry files, and transform files to get started with the training framework. The sample data is available under a custom license for evaluation only.

[Download Training Framework](https://github.com/NVIDIA/Audio2Face-3D-training-framework) [Download Training Sample Data](https://huggingface.co/datasets/nvidia/Audio2Face-3D-Dataset-v1.0.0-claire)

* * *

## More Resources

![A decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-people-group.svg)

### Developer Discord

![A decorative image representing the developer newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg)

### Sign up for Developer Newsletter

* * *

## On-Demand Sessions

* * *

## Ethical AI

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications.
When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

Ready to try NVIDIA ACE? [Get Started](#section-getting-started "Get Started")

---

# Source: https://developer.nvidia.com/drive/agx.md

# DRIVE AGX Developer Kits

The NVIDIA DRIVE AGX™ Thor and NVIDIA DRIVE AGX Orin™ Developer Kits give you the hardware, software, and sample applications you need to develop production-level autonomous vehicle applications.

* * *

## DRIVE AGX Thor Developer Kit

DRIVE AGX Thor Developer Kit SKU 10 is available for bench development. DRIVE AGX Thor Developer Kit SKU 12 is available for in-vehicle development.

[Buy Now](#section-where-to-buy)

### Features

With the NVIDIA DRIVE® platform, developers can build, extend, and leverage one development investment across an entire fleet.

### Scalable AV Platform

- Up to 2,000 FP4 (1,000 INT8) TOPS for multiple concurrent AI inference pipelines
- Built on production, auto-grade silicon
- Architected for safety and security
- Streamlines software development with a fully assembled developer kit
- Increases performance with the ability to easily connect multiple DRIVE AGX Thor systems
### Rich Automotive I/O

- 16x GMSL® 2 + 2x GMSL® 3
- 3x 100/1000/10G-T1 (quad and 6-port H-MTD)
- 1x DisplayPort up to 4K@60Hz
- 6x GMSL3 (MATE-AX x4)

### DRIVE SDK Support

- Foundational software stack, NVIDIA DriveOS with DriveWorks
- Suite of developer tools, including NvMedia, NvStreams, CUDA®, cuDNN, and TensorRT™

### Specifications

[DRIVE AGX Thor Platform Overview](https://developer.nvidia.com/downloads/drive/docs/nvidia-drive-agx-thor-platform-for-developers.pdf)

DRIVE AGX Thor Hardware Quick Start Guides: [English](https://developer.nvidia.com/downloads/drive-agx-thor-hardware-quick-start-guide) | [Traditional Chinese](https://developer.nvidia.com/downloads/drive-agx-thor-devkit-qsg-traditional-chinese.pdf) | [Simplified Chinese](https://developer.nvidia.com/downloads/drive-agx-thor-devkit-qsg-simplified-chinese.pdf)

| **Component** | **Description** | **Details** |
| --- | --- | --- |
| Single AGX Thor SoC | NVIDIA integrated GPU | Blackwell architecture class—up to 1,000 INT8 TOPS / 2,000 FP4 TOPS |
| | CPU | ARM® Neoverse V3AE |
| | Programmable Vision Accelerators (PVAs) | Up to 2,048 INT8 GMAC/s / 512 INT16 GMAC/s |
| | Image Signal Processor (ISP) | 3.5 Gigapixels/s |
| | Video encoder | Up to 3.1 Gigapixels/s (H.265) / up to 3.0 Gigapixels/s (H.264) |
| | Video decoder | Up to 2.9 Gigapixels/s (H.265) / up to 2.6 Gigapixels/s (H.264) |
| | System RAM | 64 GB LPDDR5X |
| | Memory bandwidth (256-bit LPDDR5) | Up to 273 GB/s |
| | UFS | 256 GB |
| DRIVE AGX System I/O | Cameras | 16x GMSL® 2 + 2x GMSL® 3 |
| | Ethernet | Up to 76 Gb/s for data transmission (including for lidar/radar) |
| | Vehicle IO | 4 CAN interfaces |

### Hardware Accessories

| **Accessory** | **Included in DRIVE AGX Thor Developer Kit SKU 10 for bench (Y/N)** | **Included in DRIVE AGX Thor Developer Kit SKU 12 for in-vehicle (Y/N)** |
| --- | --- | --- |
| **DRIVE AGX Power Cord, China** 930-63960-1434-000 **Compatible: SKU 10** | N | N |
| **DRIVE AGX Power Cord, Taiwan** 930-63960-1438-000 **Compatible: SKU 10** | N | N |
| **DRIVE AGX Power Cord, US and Japan** 930-63960-1440-000 **Compatible: SKU 10** | N | N |
| **DRIVE AGX Power Cord, EU and South Korea** 930-63960-1436-000 **Compatible: SKU 10** | N | N |
| **DRIVE AGX Power Cord, UK** 930-63960-1437-000 **Compatible: SKU 10** | N | N |
| **DRIVE AGX Power Cord, India** 930-63960-1892-000 **Compatible: SKU 10** | N | N |
| **DRIVE AGX Thor Vehicle Harness 1** NVPN 030-1789-000 **Compatible: SKU 10 & SKU 12** | N | Y |
| **DRIVE AGX Thor Vehicle Harness 2** NVPN 030-1788-000 **Compatible: SKU 10 & SKU 12** | N | Y |
| **DRIVE AGX 1GbE H-MTD to RJ45 Adapter** NVPN 930-13751-0000-100 **Compatible: SKU 10 & SKU 12** | Y | Y |
| **DRIVE AGX 10GbE H-MTD to RJ45 Adapter** NVPN 930-13584-0000-200 **Compatible: SKU 10 & SKU 12** | N | N |

* * *

## DRIVE AGX Orin Developer Kit

The DRIVE AGX Orin Developer Kit is available for bench development. For in-vehicle development, a vehicle accessory kit may be purchased separately.

[Buy Now](#section-where-to-buy)

![DRIVE AGX Orin](https://developer.download.nvidia.com/images/drive/agx-drive-850x480.jpg)

### Features

The highly scalable NVIDIA DRIVE platform allows developers to build, extend, and leverage one development investment across an entire fleet.
### Scalable AV Platform

- Up to 254 INT8 TOPS for multiple concurrent AI inference pipelines
- Built on production, auto-grade silicon
- Architected for safety and security
- Streamlines software development with a fully assembled developer kit
- Increases performance with the ability to easily connect multiple DRIVE AGX Orin systems

### Rich Automotive I/O

- 16x GMSL cameras
- 2x 10 GbE, 10x 1 GbE, 6x 100 MbE
- 1x DisplayPort

### DRIVE SDK Support

- Foundational software stack, NVIDIA DriveOS with DriveWorks
- Suite of developer tools, including NvMedia, NvStreams, CUDA®, cuDNN, and TensorRT™

### Specifications

[DRIVE AGX Orin Platform Overview](https://developer.nvidia.com/downloads/drive/docs/nvidia-drive-agx-orin-platform-for-developers.pdf)

DRIVE AGX Orin Hardware Quick Start Guides: [English](https://developer.nvidia.com/downloads/drive-agx-orin-hardware-quick-start-guide.pdf) | [Traditional Chinese](https://developer.nvidia.com/downloads/drive-agx-orin-devkit-qsg-traditional-chinese.pdf) | [Simplified Chinese](https://developer.nvidia.com/downloads/drive-agx-orin-devkit-qsg-simple-chinese.pdf)

| **Component** | **Description** | **Details** |
| --- | --- | --- |
| Single AGX Orin SoC | NVIDIA integrated GPU | Ampere architecture class—up to 254 INT8 TOPS, including a Deep Learning Accelerator (DLA) with 87 INT8 TOPS / 5.2 FP32 TOPS |
| | CPU | 12-core ARM® Cortex®-A78AE |
| | Programmable Vision Accelerators (PVAs) | Up to 2,048 INT8 GMAC/s / 512 INT16 GMAC/s |
| | Image Signal Processor (ISP) | 1.85 Gigapixels/s |
| | Video encoder | Up to 1.0 Gigapixels/s (H.265) |
| | Video decoder | Up to 1.9 Gigapixels/s |
| | Memory bandwidth (256-bit LPDDR5) | Up to 200 GB/s |
| | UFS | 256 GB |
| DRIVE AGX System I/O | Cameras | 90 Gb/s over 16x GMSL® ports |
| | Ethernet | Up to 30 Gb/s for data transmission (including for lidar/radar) |
| | Vehicle IO | 6 CAN interfaces |

### Hardware Accessories

| **Accessory** | **Included in DRIVE AGX Orin Developer Kit? (Y/N)** |
| --- | --- |
| **DRIVE AGX Power Cord, China** 930-63960-1434-000 | Y (version 200) |
| **DRIVE AGX Power Cord, Taiwan** 930-63960-1438-000 | Y (version 200) |
| **DRIVE AGX Power Cord, US and Japan** 930-63960-1440-000 | Y (version 200) |
| **DRIVE AGX Power Cord, EU and South Korea** 930-63960-1436-000 | Y (version 300) |
| **DRIVE AGX Power Cord, UK** 930-63960-1437-000 | Y (version 300) |
| **DRIVE AGX Power Cord, India** 930-63960-1892-000 | N |
| **Vehicle Accessory Kit** NVPN 930-62482-0000-100 | N |
| **Vehicle Harness H1B** NVPN 030-1592-000 | N |
| **Vehicle Harness H2A** NVPN 030-1463-000 | N |
| **DRIVE AGX Orin 1GbE H-MTD to RJ45 Adapter** NVPN 930-13751-0000-100 | Y |
| **DRIVE AGX 10GbE H-MTD to RJ45 Adapter** NVPN 930-13584-0000-200 | N |

* * *

## Comparison

| **Component** | **Description** | **DRIVE AGX Orin DevKit** | **DRIVE AGX Thor DevKit** |
| --- | --- | --- | --- |
| Single SoC | NVIDIA integrated GPU | Ampere architecture class—up to 254 INT8 TOPS | Blackwell architecture class—up to 1,000 INT8 TOPS |
| | CPU | ARM® Cortex®-A78AE | ARM® Neoverse V3AE |
| | Programmable Vision Accelerators (PVAs) | Up to 2,048 INT8 GMAC/s / 512 INT16 GMAC/s | Up to 2,048 INT8 GMAC/s / 512 INT16 GMAC/s |
| | Image Signal Processor (ISP) | 1.85 Gigapixels/s | 3.5 Gigapixels/s |
| | Video encoder | Up to 1.0 Gigapixels/s (H.265) | Up to 3.1 Gigapixels/s (H.265) / 3.0 Gigapixels/s (H.264) |
| | Video decoder | Up to 1.9 Gigapixels/s | Up to 2.9 Gigapixels/s |
| | System RAM | 32 GB LPDDR5 | 64 GB LPDDR5X |
| | Memory bandwidth (256-bit LPDDR5) | Up to 200 GB/s | Up to 273 GB/s |
| | UFS | 256 GB | 256 GB |
| DRIVE AGX System I/O | Cameras | 16x GMSL® 2 | 16x GMSL® 2 + 2x GMSL® 3 |
| | Ethernet | Up to 30 Gb/s for data transmission (including for lidar/radar) | Up to 76 Gb/s for data transmission (including for lidar/radar) |
| | Vehicle IO | 6 CAN interfaces | 4 CAN interfaces |

* * *

## Purchase a DRIVE AGX Developer Kit

![Arrow
logo](https://developer.download.nvidia.com/images/drive/arrow-logo.svg)

# Buy from Arrow

Contact Arrow, our NVIDIA DRIVE worldwide distributor.

[Contact Arrow](https://arrow.tfaforms.net/5113032)

![Macnica logo](https://developer.download.nvidia.com/images/logos/logo-macnica.svg)

# Buy from Macnica

Contact Macnica, our NVIDIA DRIVE Japan distributor.

[Contact Macnica](https://go.macnica.co.jp/maas_contact.html)

![Nexty Electronics logo](https://developer.download.nvidia.com/images/logos/logo-nexty.jpg)

# Buy from NEXTY

Contact NEXTY, our NVIDIA DRIVE Japan distributor.

[Contact NEXTY](https://www.nexty-ele.com/contact/)

![Edom logo](https://developer.download.nvidia.com/images/drive/logo-edom.svg)

# Buy from EDOM

Contact EDOM, our NVIDIA DRIVE APAC and India distributor.

[Contact EDOM](mailto:sales@edom-tech.com?subject=NVIDIA%20DRIVE%20AGX%20DevKit&body=I%20am%20interested%20in%20the%20NVIDIA%20DRIVE%20AGX%20Developer%20Kit.%20Please%20provide%20a%20quote.)

![Leadtek logo](https://developer.download.nvidia.com/images/drive/logo-leadtek.svg)

# Buy from Leadtek

Contact Leadtek, an NVIDIA DRIVE APAC distributor.

[Contact Leadtek](mailto:service@leadtek.com.cn?subject=NVIDIA%20DRIVE%20AGX%20DevKit&body=I%20am%20interested%20in%20the%20NVIDIA%20DRIVE%20AGX%20Developer%20Kit.%20Please%20provide%20a%20quote.)

![MDS Tech logo](https://developer.download.nvidia.com/images/drive/MDS-Tech-CI-Logo.png)

# Buy from MDS Tech

Contact MDS Tech, our NVIDIA DRIVE Korea distributor.

[Contact MDS Tech](mailto:ndrive@mdstech.co.kr)

---

# Source: https://developer.nvidia.com/ai-apps-for-rtx-pcs.md

# AI on NVIDIA RTX PCs

Experiment, develop, and optimize with the latest AI capabilities on over 100M RTX PCs worldwide.
![A developer is powering AI apps on her NVIDIA RTX PC laptop](https://developer.download.nvidia.com/images/rtx/rtx_ai_pc.png)

## Benefits

![Decorative](https://developer.download.nvidia.com/icons/m48-monitor-performance.svg)

### Maximum Performance

Powered by dedicated AI Tensor Cores with up to 3,352 TOPS, NVIDIA GeForce RTX™ GPUs offer the highest throughput and lowest latency available on PCs.

![Decorative](https://developer.download.nvidia.com/icons/m48-neuralnetwork-1.svg)

### Broadest Ecosystem

NVIDIA's AI development platform offers the broadest selection of AI developer tools, SDKs, models, and frameworks to build the next generation of AI applications on Windows.

![Decorative](https://developer.download.nvidia.com/icons/m48-people-growth.svg)

### 100M+ Installed Base

Deploy advanced AI capabilities to the largest installed base of powerful, dedicated AI accelerators, with more than 100M RTX AI PCs and workstations worldwide.

![Decorative](https://developer.download.nvidia.com/icons/m48-microservices-2.svg)

### Unified Development Experience

Develop and deploy anywhere—across data center, cloud, and RTX AI PCs and workstations—with the same software stack from models to SDKs.

![Decorative](https://developer.download.nvidia.com/icons/m48-it-security-2.svg)

### Privacy and Confidentiality

Running AI locally keeps workloads on-device at all times, preserving privacy and confidentiality.

![Decorative](https://developer.download.nvidia.com/icons/m48-laptop.svg)

### Local, Always Available

Let users run as many inferences as they want, with no wait times and no internet connectivity required.

## Develop and Deploy AI Applications for RTX PCs

Discover ways to add AI to your applications for peak performance, and get access to exclusive tools for building AI features.

### Accelerate AI With NVIDIA RTX GPUs

Discover how to get the most performance out of your application across popular inference backends, including TensorRT, Windows ML, and Llama.cpp.
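A common pattern behind supporting several inference backends is a runtime preference list with fallback. The sketch below is hypothetical: the backend names and the hard-coded availability set are stand-ins, and a real application would probe drivers and hardware instead.

```python
# Sketch of a backend-preference fallback, in the spirit of choosing among
# TensorRT, Windows ML, and llama.cpp at runtime. The names and the availability
# set below are illustrative stand-ins, not real detection code.

PREFERENCE = ["tensorrt", "windows_ml", "llama_cpp_cpu"]  # fastest first

def pick_backend(available):
    """Return the most-preferred backend present on this machine."""
    for backend in PREFERENCE:
        if backend in available:
            return backend
    raise RuntimeError("no supported inference backend found")

# e.g. a machine without an NVIDIA GPU but with a Windows ML-capable NPU:
print(pick_backend({"windows_ml", "llama_cpp_cpu"}))  # prints: windows_ml
```

Keeping the preference list in one place makes it easy to ship the same binary across GPU, NPU, and CPU-only machines.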
[Learn About AI Acceleration on RTX PCs](/ai-apps-for-rtx-pcs/inference-backends) ### Optimize Applications for Maximum Performance Analyze, profile, and optimize your end-to-end application pipeline for maximum performance using NVIDIA tools. [Learn How to Optimize Your Application](https://developer.nvidia.com/nsight-systems) ### Add AI Features to Your App With SDKs and Models Jump-start your development with curated and pre-optimized AI SDKs and models for NVIDIA RTX GPUs. [See AI SDKs and Models](https://developer.nvidia.com/ai-apps-for-rtx-pcs/sdks-models) ### Tailor AI Models to the Needs of Your App Customize AI models for application-specific needs, and optimize them for deployment with RTX AI Toolkit. [Learn About RTX AI Toolkit](/rtx/ai-toolkit) ## Quick Getting Started Guides Have questions or looking for additional recommendations? We’ve created quick-start guides to help answer common questions. [See Recommendations for Deploying LLMs on RTX PCs](https://forums.developer.nvidia.com/t/how-to-deploy-llms-on-rtx-pcs/317354)[See Our Software Migration Guide for NVIDIA Blackwell RTX GPUs](https://forums.developer.nvidia.com/t/software-migration-guide-for-nvidia-blackwell-rtx-gpus-a-guide-to-cuda-12-8-pytorch-tensorrt-and-llama-cpp/321330) ## Proven Technology Over 600 AI Apps and Games have integrated RTX AI optimizations and SDKs [See Full List](https://www.nvidia.com/en-us/geforce/news/nvidia-rtx-games-engines-apps/) ## Developer Tool Integrations [ ![Automatic1111 logo](https://developer.download.nvidia.com/images/rtx/logos/automatic1111.svg) ](https://github.com/AUTOMATIC1111/stable-diffusion-webui) [ ![ComfyUI logo](https://developer.download.nvidia.com/images/rtx/comfy-ui-logo.svg) ](https://www.comfy.org/) [ ![crewai logo](https://developer.download.nvidia.com/images/rtx/crewai-logo.svg) ](https://www.crewai.com/) [ ![flowise logo](https://developer.download.nvidia.com/images/rtx/flowise-ai-logo.svg) ](https://flowiseai.com/) [ ![HuggingFace 
logo](https://developer.download.nvidia.com/images/rtx/logos/hugging-face-no-text-logo.svg) ](https://huggingface.co/) [ ![JanAI logo](https://developer.download.nvidia.com/images/rtx/jan-ai-logo.svg) ](https://jan.ai/) [ ![LangChain logo](https://developer.download.nvidia.com/images/rtx/logos/langchain-logo.svg) ](https://www.langchain.com/) [ ![langflow logo](https://developer.download.nvidia.com/images/rtx/langflow-logo.svg) ](https://www.langflow.org/) [ ![LlamaIndex logo](https://developer.download.nvidia.com/images/rtx/logos/llamaindex-logo.svg) ](https://www.llamaindex.ai/) [ ![OobaBooga logo](https://developer.download.nvidia.com/images/rtx/logos/oobabooga.svg) ](https://github.com/oobabooga/text-generation-webui) ## NVIDIA Powers the World’s AI. And Yours. Upgrade to advanced AI with NVIDIA GeForce RTX™ GPUs and accelerate gaming, creating, productivity, and development. Thanks to specialized built-in AI processors, you get world-leading AI technology and performance powering everything you do—plus, your data always stays secure on your Windows PC. [Learn More](https://www.nvidia.com/en-us/ai-on-rtx/) ![A content creator powers her video development using NVIDIA GeForce RTX GPUs with advanced AI](https://developer.download.nvidia.com/images/rtx/gforce-rtx-1920-1080.jpg) ## Build Innovative AI PC Applications Across Use Cases ### AI for Gaming Apps NVIDIA-built tools, SDKs, and partner engines work together to produce the next generation of gaming, including AI-powered NPCs, graphics workloads, and ray tracing. [Learn More](https://developer.nvidia.com/industries/game-development) ![AI tools for developing creative apps](https://developer.download.nvidia.com/images/rtx/nvidia-studio-1920-1080.jpg) ### AI for Creative Apps The suite of NVIDIA Studio tools accelerates Audio, Video, 3D, and Broadcasting applications to empower the artistic process on GPUs. 
[Learn More](https://developer.nvidia.com/ai-for-creative-applications/resources)

![A programmer uses AI tools to develop productivity apps](https://developer.download.nvidia.com/images/rtx/ai-productivity-apps-1920-1080.jpg)

### AI for Productivity Apps

NVIDIA's ecosystem of generative AI tools is powering the next generation of productivity and everyday applications, from copilots to chatbots to search engines and beyond.

## Latest News

## Access Exclusive NVIDIA Resources

The NVIDIA Developer Program gives you free access to the latest AI models for development with NVIDIA NIM™, along with access to training, documentation, how-to guides, expert forums, support from peers and domain experts, and information on the right hardware to tackle the biggest challenges.

[Join the NVIDIA Developer Program](/developer-program)

![Explore NVIDIA gen AI training and certification](https://developer.download.nvidia.com/images/products/dli-individuals-850x480.jpg)

### Get Generative AI Training and Certification

Elevate your technical skills in generative AI and LLMs with NVIDIA Training's comprehensive learning paths, covering fundamental to advanced topics, featuring hands-on training, and delivered by NVIDIA experts. Showcase your skills and advance your career by **getting certified** by NVIDIA.

[Explore Training](https://www.nvidia.com/en-us/learn/learning-path/generative-ai-llm/)

![NVIDIA experts discuss a new development project](https://developer.download.nvidia.com/images/products/onnect-with-experts-850x480.jpg)

### Connect With NVIDIA Experts

Have questions as you're getting started? Explore our NVIDIA Developer Forum for AI to get your questions answered or explore insights from other developers.
[Visit Forums](https://forums.developer.nvidia.com/c/ai-data-science/86)

![NVIDIA Inception helps you build custom gen AI with partners](https://developer.download.nvidia.com/images/products/inference-850x480.jpg)

### Build Your Custom Generative AI With NVIDIA Partners

For generative AI startups, **NVIDIA Inception** provides access to the latest developer resources, preferred pricing on NVIDIA software and hardware, and exposure to the venture capital community. The program is free and available to tech startups of all stages.

[Learn More About NVIDIA Inception](https://www.nvidia.com/en-us/startups/)

Stay up to date on the latest NVIDIA generative AI news. [Subscribe](/email-signup)

---

# Source: https://developer.nvidia.com/dgx-cloud/benchmarking.md

# NVIDIA DGX Cloud Benchmarking

NVIDIA DGX™ Cloud Benchmarking gauges training and inference performance across AI workloads and platforms, accounting for chips, cloud platforms, and application configurations.

[Try Now](/dgx-cloud/benchmarking/join) [Documentation](https://github.com/NVIDIA/dgxc-benchmarking)

* * *

## How NVIDIA DGX Cloud Benchmarking Works

NVIDIA DGX Cloud Benchmarking analyzes real-time metrics across GPU configurations and environments, with ready-to-use benchmarking templates and on-demand benchmarking for custom workloads. Baseline performance results for comparison are provided via interactive dashboards.

Specifications:

- Scaling analysis from 8 to 2,048 GPUs
- Precision comparisons: FP8 versus BF16
- Support for popular AI frameworks and models
- Performance data across NVIDIA NeMo™ framework versions

![A chart showing DGX Cloud Benchmarking results](https://developer.download.nvidia.com/images/dgx-cloud-services/diagram-cloud-benchmarking-service.jpg)

Example of improved metrics gained over time, derived from NVIDIA DGX Cloud Benchmarking results.
### Introductory Blog NVIDIA DGX Cloud introduces ready-to-use templates and recipes to benchmark AI platform performance. [Read Blog](/blog/nvidia-dgx-cloud-introduces-ready-to-use-templates-to-benchmark-ai-platform-performance/) ### Benchmark Recipes The Benchmarking Collection provides an easy path to reproduce the latest performance results for deep learning workloads. [View Recipes](https://github.com/NVIDIA/dgxc-benchmarking) ### Explainer Blog Discover how NVIDIA DGX Cloud Benchmarking accurately measures performance in real-world environments and identifies optimization opportunities in AI training and inference workloads. [Read Blog](/blog/measure-and-improve-ai-workload-performance-with-nvidia-dgx-cloud-benchmarking) * * * ## Get Started With NVIDIA DGX Cloud Benchmarking ![](https://developer.download.nvidia.com/icons/m48-monitor-performance.svg) ### Benchmark AI Workloads Understand end-to-end platform performance and learn best practices for cluster sizing and workload optimization with NVIDIA DGX Cloud Benchmarking. [Try Now](/dgx-cloud/benchmarking/join) ![](https://developer.download.nvidia.com/icons/m48-guideline-document.svg) ### Benchmarking Recipes Deploy ready-to-use benchmarking templates across any cloud platform. Leverage NVIDIA’s performance baselines to compare expected performance across precisions and scales. [View Recipes](https://github.com/NVIDIA/dgxc-benchmarking) * * * ## NVIDIA DGX Cloud Benchmarking Starter Kits See how well your own environment performs: log into NGC; select a model; download the container, benchmarking recipes, and dataset scripts; and launch to obtain throughput results. ### Benchmark Recipe for Meta Llama 3.1 70B Understand end-to-end platform performance and learn best practices for cluster sizing and workload optimization with DGX Cloud Benchmarking. 
- [Review DGX Cloud Benchmarking Artifacts](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/collections/dgxc-benchmarking/artifacts) - [Select DGX Cloud Benchmarking Entity for Meta Llama 3.1 70B](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/llama31-70b-dgxc-benchmarking-b) - [Access DGX Cloud Benchmarking Service](/dgx-cloud/benchmarking/join) ### Benchmark Recipe for NVIDIA NeMo Megatron This recipe contains information and scripts to produce training performance results for the NVIDIA NeMo Megatron workload. - [Review DGX Cloud Benchmarking Artifacts](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/collections/dgxc-benchmarking/artifacts) - [Select DGX Cloud Benchmarking Entity for NVIDIA NeMo Megatron](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/nemo_megatron175b-dgxc-benchmarking-b) - [Access DGX Cloud Benchmarking Service](/dgx-cloud/benchmarking/join) ### Benchmark Recipe for xAI Grok-1 314B This recipe contains information and scripts to produce performance results for the Grok-1 314B training workload. - [Review DGX Cloud Benchmarking Artifacts](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/collections/dgxc-benchmarking/artifacts) - [Select DGX Cloud Benchmarking Entity for Grok-1 314B](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/grok1-314b-dgxc-benchmarking-b) - [Access DGX Cloud Benchmarking Service](/dgx-cloud/benchmarking/join) * * * ## NVIDIA DGX Cloud Benchmarking Learning Library Techblog ### NVIDIA-Optimized Code for Popular LLMs **NVIDIA AI Foundation Models and Endpoints** Learn tips to generate code, answer queries, and translate text on Llama, Kosmos-2, and SeamlessM4T with NVIDIA AI Foundation Models.
Tutorial ### How to Deploy a NIM in 5 Minutes **NVIDIA NIM** NVIDIA NIM™ is a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center. Model ### Try NVIDIA NIM APIs **NVIDIA Build** Explore leading open models built by the community, optimized and accelerated by NVIDIA’s enterprise-ready inference runtime. Video ### NVIDIA DGX Cloud Create **NVIDIA DGX Cloud** NVIDIA DGX Cloud Create is a high-performance, fully managed AI training platform that provides optimized accelerated computing clusters on any leading cloud and access to NVIDIA experts. Techblog ### NVIDIA DGX Cloud Serverless Inference **NVIDIA DGX Cloud** NVIDIA DGX Cloud Serverless Inference simplifies deploying AI workloads across multiple regions with seamless auto-scaling, load balancing, and event-driven execution. * * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) ### NVIDIA DGX Cloud Benchmarking FAQ ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). 
**Get started with DGX Cloud Benchmarking today** [Try Now](/dgx-cloud/benchmarking/join) --- # Source: https://developer.nvidia.com/clara-guardian.md # NVIDIA Clara Guardian NVIDIA Clara™ Guardian is an application framework and partner ecosystem that simplifies the development and deployment of smart sensors with multimodal AI, anywhere in a healthcare facility. With a diverse set of pre-trained [models](/ai-models), reference applications, and fleet management solutions, developers can build solutions faster—bringing AI to healthcare facilities and improving patient care. [Get Started](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/clara_guardian) ![clara software stack](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-clara-guardian-stack.svg) Clara Guardian’s key components include healthcare pre-trained models for computer vision and speech, training tools, deployment SDKs, and NVIDIA Fleet Command. NVIDIA Fleet Command is a hybrid-cloud platform for securely managing and scaling AI deployments across millions of servers or edge devices at hospitals. This makes it easy for ecosystem partners to add AI capabilities to common sensors that can monitor crowds for safe social distancing, measure body temperature, detect the absence of protective gear such as masks, or interact remotely with high-risk patients so that everyone in the healthcare facility stays safe and informed. Applications and services can run on a wide range of hardware, allowing developers to securely deploy anywhere, from the edge to the cloud. * * * ### Time to Solution Leverage high-performance, pre-trained [models](/ai-models) to build accurate AI in healthcare. ### Cloud-Native, Edge First Scale software quickly and deploy applications easily at the edge. ### Secure Management Securely manage and scale AI deployments across dozens or up to millions of servers or edge devices. 
* * * ## Healthcare-Specific, Pre-Trained Models ### Clara Guardian For Speech Clara Guardian for speech is a healthcare domain-specific version of Riva conversational AI capabilities. - For **automated speech recognition (ASR)**, models perform offline and streaming recognition to automatically add punctuation, output word timestamps, and return top-n transcripts. CitriNet is the recommended end-to-end convolutional Connectionist Temporal Classification (CTC)-based ASR model. CitriNet models take in audio segments and transcribe them to letter, byte-pair, or word-piece sequences. CitriNet has been trained on ASR datasets and, without any external LM, reaches a Word Error Rate (WER) of 6.22% on LibriSpeech test-other. It can run efficiently on a variety of hardware/GPUs as shown [here](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-performance.html). The Conformer-CTC model is a non-autoregressive variant of the Conformer model for automatic speech recognition that uses CTC loss/decoding instead of a Transducer. - For **natural language understanding (NLU)**, deep learning models understand context via encoded vectors and provide appropriate outputs for specific language tasks like next-word prediction and text summarization. - For **text-to-speech (TTS)**, speech synthesis is based on FastPitchHifiGanE2E, an end-to-end, non-autoregressive model that generates audio from text. It combines FastPitch and HiFiGan into one model and is trained jointly in an end-to-end manner. Speech models (ASR, NLP, and TTS) can be used to capture, process, and respond to common requests that a patient might make in a healthcare setting.
![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/clara/images/clara-guadian-speech%402x.jpg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/clara/images/clara-guadian-computer-vision%402x.jpg) ### Clara Guardian For Computer Vision Clara Guardian for computer vision is a healthcare domain-specific version of DeepStream and Riva computer vision capabilities. Clara Guardian contains pre-trained models for applications such as gesture recognition, heart rate monitoring, mask detection, and body pose estimation. - Body pose estimation can be used to detect the positions of key joints and facial landmarks (eyes, ears, elbows, shoulders, wrists, hips, knees, ankles, nose, neck, etc.) to build patient monitoring AI models. - Gesture recognition models can recognize a set of common gestures (wave, okay, thumbs-up, stop, etc.). - Heart rate estimation can be used to obtain a person’s heart rate just by observing a video stream of their face. Pre-compiled NVIDIA TensorRT engines are optimized for NVIDIA GPUs. ### Secure Management with Fleet Command NVIDIA Fleet Command is a hybrid-cloud platform for securely and remotely deploying, managing, and scaling AI across anywhere from dozens to millions of servers or edge devices. Instead of spending weeks planning and executing deployments, administrators can scale AI to hospitals in minutes. With the capability of an entire IT division in a single control plane, administrators can manage the lifecycle of AI applications, update system software over the air, and remotely monitor and access systems.
See how our customers are using it: https://www.youtube-nocookie.com/embed/48VSHYeGtk4 Watch the GTC talk for more [technical details](https://www.nvidia.com/en-us/gtc/session-catalog/?tab.catalogtabfields=1600209910618001TWM3&search=%22Cristiana%20Dinea%22&search.language=1594320459782001LCjF). * * * ## An End-to-End AI Solution ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/clara-guardian-workflow-diagram.png) Clara Guardian includes GPU-optimized components that can accelerate every stage of your application development. ### Training - A collection of healthcare-specific, pre-trained computer vision and conversational AI models for a variety of use cases. - [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) to build conversational AI models for ASR, NLP, and TTS - [TAO Toolkit](/tao-toolkit) to create highly accurate computer vision models with zero coding ### Deployment - NVIDIA Riva for deploying conversational AI models that fuse vision, speech, and other sensor data - [NVIDIA DeepStream SDK](/deepstream-sdk) for a multi-platform, scalable video analytics framework with Transport Layer Security (TLS) that can deploy on the edge and connect to any cloud * * * ## Testimonials > _“Our AI-powered IOT platform, running on NVIDIA Clara Guardian, is used by leading hospitals, such as Northwestern Medicine, to screen hundreds of thousands of people for elevated temperatures and help front-line providers safely care for patients during the pandemic. Clara Guardian made smart hospitals at the edge possible, enabling our customers to increase staff productivity by over six-fold, saving millions of dollars in staffing costs while improving patient care.”_ > > Andrew Gostine, MD, CEO of Whiteboard Coordinator ![Whiteboard](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/clara/images/whiteboard.png) > _“We have been using NVIDIA GPUs in Ouva solutions from day one.
With our new solution, we are aiming to allow nurses to monitor hundreds of patients in real time. The Clara Guardian framework allows us to build a scalable and efficient solution in no time, allowing our team to focus on our core competency—developing algorithms that unlock the potential of remote care.”_ > > Dogan Demir, CEO of Ouva ![ouva](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/clara/images/ouva.png) * * * ## Resources **Webinars** - [Improve Patient Care with Everyday Sensors and Multimodal AI](https://www.nvidia.com/en-us/gtc/session-catalog/?search.industrysegment=option_1559593230294&search=a21257) - [Cutting Edge AI Tools Aid in Safe Patient Care in Smart Hospital](https://www.nvidia.com/en-us/gtc/session-catalog/?tab.catalogtabfields=1600209910618001TWM3&search=%22Cristiana%20Dinea%22&search.language=1594320459782001LCjF) - [TensorRT and Triton Deep Dive](https://event.on24.com/eventRegistration/EventLobbyServlet?target=reg30.jsp&referrer=&eventid=2355625&sessionid=1&key=7FEA91B2550844603D8D23097ED564D8&regTag=&sourcepage=register) - [NVIDIA Pre-trained Vision Models with DeepStream SDK](https://info.nvidia.com/iva-occupancy-webinar-reg-page.html) **Latest News** - Learn more about the latest [developer news](https://developer.nvidia.com/blog/?tags=healthcare-and-lifesciences&categories=) - Take a technical deep dive on [Fleet Command](https://www.nvidia.com/FleetCommand/) **Deep Learning Institute Training** - [Optimization and Deployment of TensorFlow Models with TensorRT](https://courses.nvidia.com/courses/course-v1:DLI+L-FX-18+V2/about) - [Fundamentals of Deep Learning for Computer Vision](https://courses.nvidia.com/courses/course-v1:DLI+C-FX-01+V2/about) - [Getting Started with AI on Jetson Nano](https://courses.nvidia.com/courses/course-v1:DLI+C-RX-02+V1/about) **Intelligent Video Analytics** - [NVIDIA DeepStream SDK](/deepstream-sdk) for AI-based multi-sensor processing and video and image understanding - [TAO 
Toolkit](/tao-toolkit) to create highly accurate AI models with zero coding **Speech and NLP** - [NVIDIA NeMo](https://github.com/NVIDIA/NeMo), an open-source toolkit for building conversational AI models - [NVIDIA Riva SDK](/riva) for deploying conversational AI models that fuse vision, speech, and other sensor data **Edge Hardware** Explore Fleet Command to securely manage and scale AI deployments. [Learn More](https://www.nvidia.com/en-us/data-center/products/egx-edge-computing/) _Disclaimer: Clara SDKs and samples are for development purposes only and cannot be used directly for clinical procedures._ --- # Source: https://developer.nvidia.com/cloudxr-sdk.md # NVIDIA CloudXR Suite NVIDIA CloudXR™ is designed to provide seamless, high-fidelity immersive streaming to extended reality (XR) devices over any network. The CloudXR Suite is a set of tools that enables developers to stream XR applications. [Apply for Access](https://developer.nvidia.com/nvidia-cloudxr-sdk-early-access-program) ![](https://developer.download.nvidia.com/images/cloudxr-suite-diagram.svg) ## How It Works With the CloudXR Suite, extended reality (XR) content can be **streamed over wired** or **wireless networks** to any **tethered** or **untethered thin client** device. The CloudXR Suite contains the following components: - **Core:** Provides the streaming layer, complete with new improvements such as L4S optimizations, QoS algorithms, and enhanced logging tools. - **Client Additions:** Provides a Unity plug-in that lets developers build custom CloudXR client applications with Unity development tools. - **Server Additions:** Includes a compilable OpenXR runtime based on the Monado Open Source OpenXR runtime.
## See CloudXR in Action - [ ### **Humanoid Robotics** ](#humanoid) - [ ### **Manufacturing and Design** ](#manufacturing) - [ ### **Architecture, Engineering, Construction, and Operations** ](#aeco) - [ ### **Media and Entertainment** ](#media) ## NVIDIA Isaac GR00T ![](https://developer.download.nvidia.com/images/nvidia-cloudxr-gr00t-1920x1080-2.jpg) [Streamline Data Collection With NVIDIA Isaac GR00T](https://www.youtube.com/watch?v=waZ08Z3uimk&ab_channel=NVIDIA) ## Fourier ![](https://developer.download.nvidia.com/images/nvidia-cloudxr-humanoid-1920x1080.jpg) [Humanoid Development With Data Capture and Generation](https://www.youtube.com/watch?v=sygmkSXuYDQ&t=1s&ab_channel=NVIDIA) ## Katana ![](https://developer.download.nvidia.com/images/nvidia-cloudxr-katana-1920x1080.jpg) [Enhance Product Design and Customer Experiences](https://youtu.be/LAVXuDK83iA?si=J_gTdPRUW62x9M75&t=68) ## ESI Group IC.IDO ESI Group’s IC.IDO collaborative VR environment ## Innoactive https://player.vimeo.com/video/525702109?h=d9f827d9a4&title=0&byline=0&portrait=0 Why SAP uses Innoactive Creator and Portal ## Luxion KeyVR One Click from Luxion’s KeyShot to KeyVR streaming AR ## Theia Interactive [![Man wearing gray shirt inside a room](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/theia-interactive-630x353.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/theia-interactive-630x353.jpg) Image courtesy of Theia Interactive ## Masterpiece VR https://www.youtube-nocookie.com/embed/kx6-IjnuJXk Streaming immersive 3D design and character creation ## Dreamscape Immersive
5G use case with AT&T, WEVR, Ericsson, and Dreamscape Immersive ## Heavenue https://player.vimeo.com/video/662158278?h=2e81c63403&title=0&byline=0&portrait=0 Actors Theatre of Louisville’s Lab Production of A Christmas Carol: Scrooge’s Ghost Encounter Live in VR * * * ## Benefits ### Untethered XR Device Experience High-fidelity, immersive VR applications typically require a restrictive tethered HMD. NVIDIA CloudXR allows VR users to move to wireless HMDs without any noticeable degradation in the VR experience. ### Growing Wireless Ecosystem Beyond the multitude of cellular phones and tablets, several OEMs are working on 5G- and Wi-Fi-enabled all-in-one VR HMDs. ### Streaming of Any OpenVR Application The CloudXR streaming stack provides developers with a flexible, high-performance cloud instance to stream even the most demanding immersive experiences. ### AR Support The CloudXR server application also supports streaming VR content to AR phones/tablets by adding an alpha channel to blend rendered content with live camera streams.
## Supported Devices ### Head Mounted Displays **OpenVR-capable HMDs tethered to NVIDIA Pascal™, Turing™, or Ampere™-based GPU:** **HTC VIVE** - HTC VIVE - HTC VIVE Pro - HTC VIVE Pro Eye **Valve** - Valve Index **Untethered Android-based devices:** **HTC VIVE** - HTC VIVE Focus 3 **Meta Quest** - Meta Quest - Meta Quest 2 - Meta Quest 3 - Meta Quest Pro **Pico** - Pico Neo 3 - Pico Neo 3 Pro Eye - Pico Neo 4 - Pico Neo 4 Enterprise **XREAL** - XREAL Air * * * ### Mobile Devices - [Google ARCore supported devices](https://developers.google.com/ar/discover/supported-devices) - [Apple Augmented Reality supported devices](https://www.apple.com/augmented-reality/) ## Device-Specific CloudXR Client Application Packages (APKs) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/vive-focus.svg) ### [HTC VIVE Focus 3](https://github.com/ViveSoftware/Wave-CloudXR-Sample) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/ic-pico-logo.svg) ### [Pico Neo 3](https://github.com/picoxr/CloudXR_Client_Demo) HMD partners wishing to support CloudXR can use the CloudXR Client SDK to create a device-specific APK. 
## Supported Cloud Service Providers **CloudXR is available on the following cloud service provider marketplaces:** [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/aws-logo.svg)](https://aws.amazon.com/marketplace/pp/prodview-galujeez5ljra) ### [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-galujeez5ljra) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/google-cloud-logo.svg)](https://console.cloud.google.com/marketplace/product/nvidia/nvidia-cloudxr-introductory-offer-winserver2019?project=deft-waters-328817) ### [Google Cloud](https://console.cloud.google.com/marketplace/product/nvidia/nvidia-cloudxr-introductory-offer-winserver2019?project=deft-waters-328817) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/microsoft-azure-logo.svg)](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/nvidia.nvidia-cloudxr-win2019?tab=Overview) ### [Microsoft Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/nvidia.nvidia-cloudxr-win2019?tab=Overview) **NVIDIA Partner Network cloud service providers:** ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/alibaba-cloud-logo.svg) ### Alibaba Cloud ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/ic-core-weave.svg) ### CoreWeave ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/tencent-cloud-logo.svg) ### Tencent Cloud Learn More About Availability From NVIDIA Partner Network Service Providers ## Additional Resources ![](https://developer.download.nvidia.com/images/cloudxr/nvidia-cloudxr-spatial-streaming-digital-twins-1920x1080.jpg) Foxconn ### Spatial Streaming for Omniverse Digital Twins Learn more about streaming immersive OpenUSD-based NVIDIA Omniverse™ digital twins to the Apple Vision Pro with this reference workflow.
[Explore the Workflow Guide](/blog/experience-digital-twins-in-xr-with-nvidia-omniverse-spatial-streaming/) ![](https://developer.download.nvidia.com/images/cloudxr/nvidia-cloudxr-humanoid-robot-learning-1920x1080.jpg) ### Humanoid Robot Learning With Apple Vision Pro Capture teleoperated data from Apple Vision Pro to generate large synthetic trajectory datasets from just a few human demonstrations using NVIDIA Isaac™ GR00T. [Read the Blog](/blog/building-a-synthetic-motion-generation-pipeline-for-humanoid-robot-learning/) [Apply for Early Access](/humanoid-robot-program) ![](https://developer.download.nvidia.com/images/cloudxr/xr_pavilion_gtc25_1920x1080.jpg) ### Experience XR at GTC Explore the latest in the spatial computing ecosystem at GTC. Learn more about the latest technical advancements from our partners first-hand. [Watch on Demand](https://www.nvidia.com/en-us/on-demand/playlist/playList-faa749e0-49ce-4e6f-92a4-717f17a23e07/) Ready to get started with CloudXR? [Apply For Access](https://developer.nvidia.com/nvidia-cloudxr-sdk-early-access-program) --- # Source: https://developer.nvidia.com/compute-sanitizer.md # NVIDIA Compute Sanitizer ## Compute Sanitizer Tools & API **Compute Sanitizer** is a functional correctness checking suite. This suite contains multiple tools that can perform different types of checks. Tool features are described below. The **Compute Sanitizer API** enables the creation of sanitizing and tracing tools that target CUDA applications. Examples of such tools are memory and race condition checkers. The Compute Sanitizer API is composed of three APIs: the callback API, the patching API, and the memory API. It is delivered as a dynamic library on supported platforms. ### [Memcheck](https://docs.nvidia.com/cuda/sanitizer-docs/ComputeSanitizer/index.html#memcheck-tool) The _memcheck_ tool is a runtime error detection tool for CUDA applications.
The tool can precisely detect and report out-of-bounds and misaligned memory accesses to global, local, and shared memory in CUDA applications. It can also detect and report hardware-reported error information. In addition, the memcheck tool can detect and report memory leaks in the user application. ### [Racecheck](https://docs.nvidia.com/cuda/sanitizer-docs/ComputeSanitizer/index.html#racecheck-tool) The _racecheck_ tool is a runtime shared memory data access hazard detector. The primary use of this tool is to help identify memory access race conditions in CUDA applications that use shared memory. In CUDA applications, storage declared with the _\_\_shared\_\__ qualifier is placed in on-chip _shared memory_. All threads in a thread block can access this per-block shared memory. Shared memory goes out of scope when the thread block completes execution. As shared memory is on chip, it is frequently used for inter-thread communication and as a temporary buffer to hold data being processed. As this data is being accessed by multiple threads in parallel, incorrect program assumptions may result in data races. Racecheck is a tool built to identify these hazards and help users write programs free of shared-memory races. Currently, this tool only supports detecting accesses to on-chip shared memory. ### [Initcheck](https://docs.nvidia.com/cuda/sanitizer-docs/ComputeSanitizer/index.html#initcheck-tool) The _initcheck_ tool is a runtime uninitialized device global memory access detector. This tool can identify when device global memory is accessed without being initialized via device-side writes or via CUDA memcpy and memset API calls. Currently, this tool only supports detecting accesses to device global memory.
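Each of these checkers is launched through the same `compute-sanitizer` front end via its `--tool` option, which makes it easy to script correctness runs, for example in CI. A minimal sketch in Python (the target binary path is a placeholder, and the summary-line parsing assumes the tool's usual `ERROR SUMMARY: N errors` footer):

```python
import re
import subprocess

SUMMARY_RE = re.compile(r"ERROR SUMMARY: (\d+) error")

def parse_error_count(report: str) -> int:
    """Extract N from an 'ERROR SUMMARY: N errors' footer (an assumed
    format); returns -1 if no summary line is present."""
    m = SUMMARY_RE.search(report)
    return int(m.group(1)) if m else -1

def run_tool(tool: str, app: str) -> int:
    """Run one Compute Sanitizer tool against a CUDA binary and
    return its reported error count."""
    proc = subprocess.run(
        ["compute-sanitizer", "--tool", tool, app],
        capture_output=True, text=True,
    )
    return parse_error_count(proc.stdout)

if __name__ == "__main__":
    # "./my_cuda_app" is a hypothetical CUDA binary.
    for tool in ("memcheck", "racecheck", "initcheck", "synccheck"):
        print(tool, run_tool(tool, "./my_cuda_app"))
```

A nonzero count from any tool would then fail the build; consult the Sanitizer Tools documentation linked below for the exact report format of each tool.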
### [Synccheck](https://docs.nvidia.com/cuda/sanitizer-docs/ComputeSanitizer/index.html#synccheck-tool) The _synccheck_ tool is a runtime tool that can identify whether a CUDA application is correctly using synchronization primitives, specifically the _\_\_syncthreads()_ and _\_\_syncwarp()_ intrinsics and their Cooperative Groups API counterparts. ### Developer Benefits **Compute Sanitizer** : Provides the ability to ensure the code correctness of your GPU-accelerated kernels **Compute Sanitizer API** : Provides the ability to incorporate GPU code correctness checking into your own tools **Both the Compute Sanitizer suite and API are available in the CUDA Toolkit. You may obtain the latest version of Compute Sanitizer by downloading the CUDA Toolkit.** [ Download the CUDA Toolkit Now ](https://developer.nvidia.com/cuda-downloads) ##### [Sanitizer Tools documentation](https://docs.nvidia.com/cuda/sanitizer-docs/ComputeSanitizer/index.html) | [Sanitizer API documentation](https://docs.nvidia.com/cuda/sanitizer-docs/SanitizerApiGuide/index.html) | [Revision History](https://docs.nvidia.com/cuda/sanitizer-docs/ReleaseNotes/index.html) **NVIDIA® Compute Sanitizer** is freely offered through the [NVIDIA Registered Developer Program](/sign-up-gameworks-registered-developer-program) and as part of the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) --- # Source: https://developer.nvidia.com/cosmos.md
# NVIDIA Cosmos for Developers [NVIDIA Cosmos™](http://www.nvidia.com/en-us/ai/cosmos) is a platform purpose-built for physical AI, featuring state-of-the-art generative [world foundation models](https://www.nvidia.com/en-us/glossary/world-models/) (WFMs), guardrails, and an accelerated data processing and curation pipeline for [autonomous vehicle (AV)](https://www.nvidia.com/en-us/use-cases/autonomous-vehicle-simulation/), [robotics](https://www.nvidia.com/en-us/solutions/robotics-and-edge-computing/), and [AI agent developers](https://www.nvidia.com/en-us/use-cases/video-analytics-ai-agents/). Build, evaluate, deploy, and simulate [physical AI](https://www.nvidia.com/en-us/glossary/physical-ai/) models faster while minimizing testing and validation risks in the real world. [Download from GitHub](https://github.com/nvidia-cosmos)[Documentation ](http://docs.nvidia.com/cosmos)[Cookbook](https://nvidia-cosmos.github.io/cosmos-cookbook/) * * * ## How It Works ![Diagram showing the application and Omniverse Cloud using USD framework](https://developer.download.nvidia.com/images/cosmos/nvidia-cosmos-how-it-works-diagram-ari.jpg) Cosmos WFMs accelerate physical AI development, helping developers augment datasets and post-train downstream world models for robots and autonomous vehicles. [Cosmos Predict](https://huggingface.co/collections/nvidia/cosmos-predict1-67c9d1b97678dbf7669c89a7) generates next frames based on input to build datasets predicting various edge cases and serves as the foundation for all world models. [Cosmos Reason](https://huggingface.co/collections/nvidia/cosmos-reason1-67c9e926206426008f1da1b7) acts as a critic, using chain-of-thought reasoning to evaluate synthetic visuals and reward outcomes. It can also generate captions to speed up data curation.
[Cosmos Transfer](https://huggingface.co/collections/nvidia/cosmos-reason1-67c9e926206426008f1da1b7) amplifies structured video across various environments and lighting conditions. Developers can use the available PyTorch inference and post-training scripts along with model checkpoints. Cosmos NIM microservices are in development—Cosmos Predict NIM microservices are available [here](https://build.nvidia.com/search?q=%22cosmos%22+-nemotron). * * * ## NVIDIA Cosmos World Foundation Models A family of pretrained models that generate worlds as video and provide world understanding, accelerating physical AI development. Available openly to developers on NGC, Hugging Face, and [GitHub](https://github.com/nvidia-cosmos). ### Cosmos Predict For future world state generation or as a base for custom world models. Input: Text or Image Output: Video [Get Started on GitHub](https://github.com/nvidia-cosmos/cosmos-predict2.5) [Try Model Checkpoint on Hugging Face](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B) ### Cosmos Transfer Multi-ControlNet for fast, photorealistic video data augmentation. Input: Segmentation maps, depth signals, HD maps, or CG simulation videos. Pair with: NVIDIA Omniverse Output: Photorealistic world scenes [Get Started on GitHub](https://github.com/nvidia-cosmos/cosmos-transfer2.5) [Try Model Checkpoint on Hugging Face](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B) ### Cosmos Reason World reasoning for synthetic data curation, robot decision-making, and runtime video analytics for AI agents. Input: Image or Video Output: Chain-of-thought reasoning and text [Experience Model as NVIDIA NIM API](https://build.nvidia.com/nvidia/cosmos-reason1-7b)[Get Started on GitHub](https://github.com/nvidia-cosmos/cosmos-reason1) ### Cosmos Curator Filter, annotate, and deduplicate large datasets for physical AI development using advanced AI models and distributed computing.
[Get Started](https://github.com/nvidia-cosmos/cosmos-curate) ### Cosmos Dataset Search Instantly search and retrieve scenarios from massive training datasets for targeted post-training using the Cosmos Dataset Search vector-based search workflow. [Try Now](https://build.nvidia.com/nvidia/cosmos-dataset-search) ### Cosmos Guardrails A set of guardrails, including a pre-guard to block harmful inputs and a post-guard to ensure safety and consistency in generations. [Download Cosmos Guardrail](https://huggingface.co/nvidia/Cosmos-1.0-Guardrail) ### Cosmos Prompt Upsampler Transform original input prompts into more detailed and enriched versions for higher-quality outputs from Cosmos WFMs. [Download Cosmos Prompt Upsampler](https://huggingface.co/nvidia/Cosmos-1.0-Prompt-Upsampler-12B-Text2World) ### Sample AV Models Try sample post-trained Cosmos models specialized for autonomous vehicle development, including multi-view and lidar generation. [Try Now](https://github.com/nv-tlabs/Cosmos-Drive-Dreams) ## Introductory Resources ### Cosmos Cookbook: A Practical Guide to Physical AI Models The Cosmos Cookbook is an open-source guide with step-by-step workflows and examples for deploying and customizing NVIDIA Cosmos world foundation models in real-world applications. [Read Cookbook](https://nvidia-cosmos.github.io/cosmos-cookbook/) ### Updated Versions of Cosmos Predict 2.5 & Transfer 2.5 Cosmos Predict 2.5 and Transfer 2.5 advance NVIDIA’s world foundation models by unifying multimodal world generation and improving spatially controlled world transformation, driving scalable, efficient physical AI. 
[Read Hugging Face Blog](https://huggingface.co/blog/nvidia/cosmos-predict-and-transfer2-5) ### World Simulation With Video Foundation Models This NVIDIA Cosmos white paper presents an open platform of world foundation models, video curation tools, and tokenizers to help developers efficiently customize, generate, and simulate physics-based data for advancing physical AI applications like robotics and autonomous vehicles. [Read White Paper](https://research.nvidia.com/publication/2025-09_world-simulation-video-foundation-models-physical-ai) * * * ## Starter Kits Start solving physical AI challenges by developing custom world models with Cosmos or using Cosmos WFMs for downstream use cases. Explore implementation scripts, explainer blogs, and more how-to documentation for various stages of physical AI development. ### Post-Training Cosmos WFMs Cosmos WFMs are purpose-built for post-training. Use domain-specific datasets to build world models, or post-train for different types of output, such as action generation for policy models. - [Post-Training Cookbook](https://github.com/nvidia-cosmos/cosmos-cookbook) - [Cosmos Tokenizer Documentation](https://research.nvidia.com/labs/dir/cosmos-tokenizer/) and [Script](https://docs.nvidia.com/cosmos/latest/predict1/tokenizer/index.html) - [NVIDIA NeMo™ Data Curator Early Access](/nemo-microservices) - [Physical AI Open Dataset](https://huggingface.co/collections/nvidia/physicalai-67c643edbb024053dcbcd6d8) ### Synthetic Data Generation Build and deploy world models for infinite domain-specific synthetic data. Use NVIDIA Omniverse for physics-based conditioning. 
- [Inference Script](https://github.com/nvidia-cosmos/cosmos-transfer2.5/blob/main/docs/inference.md) - [Workflow Guide and Documentation](https://docs.omniverse.nvidia.com/guide-sdg/latest/index.html) - [Cosmos NVIDIA NIM™ Microservice](https://build.nvidia.com/search?q=%22cosmos%22+-nemotron) - [Intro to Cosmos for Physical AI Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-42+V1) - [Cosmos Synthetic Dataset Augmentation](https://docs.omniverse.nvidia.com/guide-sdg/latest/case-studies/case4.html) ### Vision-Language Models Vision-language models (VLMs) are multimodal, generative AI models that can understand and process video, images, and text. - [Product Education Tech Blog](https://developer.nvidia.com/blog/maximize-robotics-performance-by-post-training-nvidia-cosmos-reason/) - [GitHub Resources](https://github.com/nvidia-cosmos/cosmos-reason1) - [Post-Training Script](https://github.com/nvidia-cosmos/cosmos-reason1/blob/main/examples/post_training/README.md) - [Cosmos Cookbook](https://nvidia-cosmos.github.io/cosmos-cookbook/) - [Start Prototyping](https://build.nvidia.com/nvidia/cosmos-reason1-7b/deploy) - [Documentation](https://docs.nvidia.com/cosmos/latest/reason1/quickstart_guide.html#) ### Video Analytics AI Agent Build a video analytics AI agent using NVIDIA Cosmos Reason with the NVIDIA Blueprint for video search and summarization (VSS).
- [Product Education Tech Blog](https://developer.nvidia.com/blog/how-to-integrate-computer-vision-pipelines-with-generative-ai-and-reasoning/) - [Try the Blueprint](https://build.nvidia.com/nvidia/video-search-and-summarization) - [GitHub Resources](https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization) - [Deploy Launchable](https://brev.nvidia.com/launchable/deploy?launchableID=env-2tYIjRXL4eMCbH9Az8mJC5WPAI4) - [Documentation](https://docs.nvidia.com/vss/latest/index.html#) * * * ## Cosmos Learning Library * * * ## More Resources ![NVIDIA Developer Forums](https://developer.download.nvidia.com/icons/m48-communication-chat.svg) ### GitHub Forums ![NVIDIA Training and Certification](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) ### Read Cosmos FAQ ![NVIDIA Inception Program for Startups](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for the Developer Newsletter * * * ## Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. NVIDIA has collaborated with Google DeepMind to watermark generated videos from the NVIDIA API catalog. For more detailed information on ethical considerations for this model, please see the [System Card](https://nvdam.widen.net/s/knnqs6ghqn/nvidia-cosmos-system-card), Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
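For the hosted Cosmos Reason endpoint linked above, the NVIDIA API catalog exposes an OpenAI-compatible chat-completions interface. Here is a minimal request sketch in plain Python using only the standard library; the endpoint URL and model identifier follow the API catalog's conventions but should be verified on the model page, and the prompt and key are placeholders:

```python
import json
import urllib.request

# Assumed values -- verify against the Cosmos Reason page on build.nvidia.com.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "nvidia/cosmos-reason1-7b"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions request for Cosmos Reason."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Is it safe for the forklift to reverse now?", "$NVIDIA_API_KEY")
print(req.get_full_url())  # https://integrate.api.nvidia.com/v1/chat/completions
```

The sketch only assembles the request; actually sending it (for example with `urllib.request.urlopen`) requires a valid API key from build.nvidia.com.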
**Get Started With NVIDIA Cosmos Today** [Try Now ](https://github.com/nvidia-cosmos) --- # Source: https://developer.nvidia.com/cublas.md # cuBLAS * * * ## Basic Linear Algebra on NVIDIA GPUs [Download](/hpc-sdk)[Documentation](https://docs.nvidia.com/cuda/cublas/index.html)[Samples](http://github.com/nvidia/cudalibrarysamples)[Support](https://forums.developer.nvidia.com/tags/c/accelerated-computing/gpu-accelerated-libraries/12/cublas)[Feedback](mailto:Math-Libs-Feedback@nvidia.com?subject=cuBLAS%20Feedback&body=Thank%20you%20for%20using%20our%20software%20and%20taking%20time%20to%20provide%20your%0D%0A%0D%0Afeedback.%0D%0A%0D%0A%0D%0A%0D%0AIf%20you%20have%20a%20question%20or%20issue,%20please%20post%20it%20on%20the%20NVIDIA%20%20%0D%0A%0D%0AGPU-ACCELERATED%20LIBRARIES%20Developer%20Forum%20at:%0D%0A%0D%0Ahttps://forums.developer.nvidia.com/c/accelerated-computing/gpu-accelerated-libraries/12%0D%0A%0D%0A%0D%0A%0D%0ATo%20submit%20a%20bug%20report,%20please%20follow%20the%20instructions%20at:%20%0D%0A%0D%0Ahttps://forums.developer.nvidia.com/t/how-to-report-a-bug/67911%0D%0A%0D%0A%0D%0A%0D%0AProvide%20as%20much%20information%20and%20reproducer%20code%20as%20possible.%0D%0A%0D%0A%0D%0A%0D%0AFor%20feedback%20or%20feature%20requests,%20please%20reply%20to%20this%20email%20and%20provide%20%0D%0A%0D%0Athe%20following:%0D%0A%0D%0A%0D%0A%0D%0A%20%201.%20Target%20Library%20(e.g.%20cuBLAS,%20cuFFT,%20etc.)%0D%0A%0D%0A%20%202.%20Target%20Hardware%20(e.g.%20V100,%20A100,%20etc.)%0D%0A%0D%0A%20%203.%20Target%20OS%20(e.g.%20RHEL%208.1,%20Ubuntu%2020.04,%20Windows%2010,%20etc.)%0D%0A%0D%0A%20%204.%20Use%20cases%20(the%20more%20detail%20the%20better)%0D%0A%0D%0A%20%20%20%20%20a.%20Data%20type(s)%0D%0A%0D%0A%20%20%20%20%20b.%20Data%20size(s)%0D%0A%0D%0A%20%20%20%20%20c.%20Format(s)%0D%0A%0D%0A%20%205.%20Limiting%20factors%20by%20not%20having%20the%20desired%20functionality.) NVIDIA cuBLAS is a GPU-accelerated library for accelerating AI and HPC applications. 
It provides drop-in, industry-standard BLAS and GEMM APIs, along with API extensions supporting fusions that are highly optimized for NVIDIA GPUs. The cuBLAS library also contains extensions for batched operations, execution across multiple GPUs, and mixed- and low-precision execution, with additional tuning for the best performance. The cuBLAS library is included in both the [NVIDIA HPC SDK](https://developer.nvidia.com/hpc-sdk) and the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads). [Explore what’s new in the latest release.](https://developer.nvidia.com/cuda-toolkit/whatsnew) * * * ## cuBLAS API Extensions ## cuBLAS Host API cuBLAS Host APIs provide CUDA-accelerated **BLAS** for **Level 1** (vector-vector), **Level 2** (matrix-vector), and **Level 3** (matrix-matrix) operations. cuBLAS also includes custom GEMM extension APIs that are simple to use for drop-in hardware acceleration. _cuBLAS APIs are available in the cuBLAS library._ ## cuBLASLt Host API cuBLASLt Host APIs are **multi-stage GEMM APIs** that are highly expressive, allowing applications to leverage the latest NVIDIA architecture features for the best performance, with support for **fusions** and performance-tuning options. _cuBLASLt APIs are available in the cuBLAS library._ ## cuBLASXt Single-Process Multi-GPU Host API The cuBLASXt Host API exposes a multi-GPU-capable interface for efficiently dispatching Level 3 workloads across one or more GPUs in a **single node**. _cuBLASXt APIs are available in the cuBLAS library._ ## cuBLASMp Multi-Node Multi-GPU Host API (Preview) cuBLASMp (Preview) is a high-performance, **multi-process**, GPU-accelerated library for **distributed** basic dense linear algebra. cuBLASMp is available for standalone download and as part of the [HPC SDK](https://developer.nvidia.com/hpc-sdk).
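The three BLAS levels listed above map to vector-vector, matrix-vector, and matrix-matrix work. As a frame of reference, this plain-Python sketch shows what one representative routine from each level computes; the actual cuBLAS entry points are C APIs such as `cublasSaxpy`, `cublasSgemv`, and `cublasSgemm`, so this is the math, not the library interface:

```python
# Reference semantics of one routine from each BLAS level, in plain Python.
# cuBLAS provides GPU-accelerated implementations of these (and ~150 more).

def axpy(alpha, x, y):
    """Level 1 (vector-vector): y <- alpha*x + y"""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x, beta, y):
    """Level 2 (matrix-vector): y <- alpha*(A @ x) + beta*y"""
    return [
        alpha * sum(a * xi for a, xi in zip(row, x)) + beta * yi
        for row, yi in zip(A, y)
    ]

def gemm(alpha, A, B, beta, C):
    """Level 3 (matrix-matrix): C <- alpha*(A @ B) + beta*C"""
    cols = len(B[0])
    return [
        [
            alpha * sum(A[i][k] * B[k][j] for k in range(len(B))) + beta * C[i][j]
            for j in range(cols)
        ]
        for i in range(len(A))
    ]

print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))  # [12.0, 24.0]
```

Level 3 routines like GEMM dominate deep learning workloads, which is why the GEMM extensions and fusion support above focus there.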
[Download cuBLASMp](http://developer.nvidia.com/cublasmp-downloads) ## cuBLASDx Device API (Preview) cuBLASDx (Preview) is a **device side** API extension to cuBLAS for performing BLAS calculations inside your CUDA kernel. **Fusing** numerical operations decreases the latency and improves the performance of your application. [Download cuBLASDx](https://developer.nvidia.com/cublasdx-downloads) ## cuBLAS Key Features - Complete support for all 152 standard BLAS routines - Support for half-precision and integer matrix multiplication - GEMM and GEMM extensions with fusion optimized for Tensor Cores - GEMM performance tuned for sizes used in various Deep Learning models - Supports CUDA streams for concurrent operations ## cuBLAS Performance The cuBLAS library is highly optimized for performance on NVIDIA GPUs, and leverages tensor cores for acceleration of low- and mixed-precision matrix multiplication. [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cublas-11-mixed.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cublas-11-mixed.png) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cublas-11-fp64.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cublas-11-fp64.png) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cublas-11-int.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cublas-11-int.png) _cuBLAS Matrix Multiply performance on Datacenter GPUs for various precisions_ ## cuBLASMp Key Features - Multi-node multi-GPU basic linear algebra functionality - 2D Block Cyclic data layout - Fortran wrappers available through nvfortran ## cuBLASMp Performance cuBLASMp harnesses tensor core acceleration, while efficiently communicating between GPUs and synchronizing their processes. 
[![](https://developer.download.nvidia.com/images/WeakScaling_Cropped.png)](https://developer.download.nvidia.com/images/WeakScaling_Cropped.png) Weak scaling of cuBLASMp distributed double-precision GEMM. M,N,K = 55k per GPU [![](https://developer.download.nvidia.com/images/NewStrongScaling_Cut.png)](https://developer.download.nvidia.com/images/NewStrongScaling_Cut.png) Strong scaling of cuBLASMp distributed double-precision GEMM. M,N,K = 55k ## cuBLASLt Performance [![](https://developer.download.nvidia.com/images/llama-2-training.png)](https://developer.download.nvidia.com/images/llama-2-training.png) cuBLASLt performance in Llama 2 training --- # Source: https://developer.nvidia.com/cuda-gdb.md # CUDA-GDB When developing massively parallel applications on the GPU, you need a debugger capable of handling thousands of threads running simultaneously on each GPU in the system. CUDA-GDB delivers a seamless debugging experience that allows you to debug both the CPU and GPU portions of your application simultaneously. If you already use GDB to debug your CPU applications, getting started with CUDA-GDB involves learning just a few additional debugger commands. Just like GDB, CUDA-GDB provides a console-based debugging interface you can use from the command line on your local system or any remote system to which you have Telnet or SSH access. If you prefer debugging with a GUI frontend, CUDA-GDB also supports integration with [DDD](https://www.gnu.org/software/ddd), [EMACS](https://www.gnu.org/software/emacs), [Nsight Eclipse Edition](/nsight-eclipse-edition), or the newer [Nsight Visual Studio Code Edition](/nsight-visual-studio-code-edition). Note that [NVIDIA® CUDA Toolkit 11.0](https://developer.nvidia.com/cuda-toolkit) (and later) no longer supports developing or running applications on macOS.
While no current tools use macOS as a target environment, NVIDIA made the [macOS host version of cuda-gdb available up to CUDA 12.4](/nvidia-cuda-toolkit-12_4_0-developer-tools-mac-hosts); the macOS host versions were dropped as of CUDA 12.5. [Download CUDA Toolkit](/cuda-downloads) | [Learn More About CUDA Toolkit](/cuda-toolkit) | [CUDA-GDB Developer Preview](/nvidia-cuda-toolkit-cuda-gdb-developer-preview) ### CUDA-GDB Feature Set Overview CUDA-GDB supports debugging of both 32- and 64-bit CUDA C/C++ applications. It provides full control over the execution of the CUDA application, including breakpoints and single-stepping. You can examine variables, read/write memory and registers, and inspect the GPU state when the application is suspended.
Third-party developers can leverage CUDA-GDB's powerful features directly; see the **CUDA Samples**, which can be downloaded with the latest [CUDA Toolkit](/cuda-downloads). ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/toolsscreenshots/cuda-gdb-screenshot-1.png "Click to zoom/shrink") **CUDA-GDB on Linux**

- Debug CUDA C and C++ applications directly on the GPU
- Simultaneously debug on the CPU and one or more GPUs
- Debug at either high-level C/C++ source or low-level GPU assembly
- Use conditional breakpoints to identify and correct errors in CUDA code
- Identify memory access violations
- Automatically break on every kernel launch
- Use the autostep mode to detect errors more precisely
- Debug multiple applications simultaneously using multiple sessions
- Selectively assert in CUDA code
- Dynamic parallelism support
- Inlined subroutine support
- Run [CUDA-MEMCHECK](/cuda-memcheck) in integrated mode to detect precise exceptions

![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/toolsscreenshots/cuda-gdb-screenshot-2.png "Click to zoom/shrink")

- Kernel launch stack information
- Single-GPU debugging
- Long-running kernel debugging
- Remote debugging for x86 and ARM

Developers should also check out NVIDIA Nsight for integrated debugging and profiling: [Nsight Eclipse Edition](/nsight-eclipse-edition) and [Nsight Visual Studio Code Edition](/nsight-visual-studio-code-edition) for Linux support, and [Nsight Visual Studio Edition](/nvidia-nsight-visual-studio-edition) for Windows.
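To give a feel for the workflow behind the feature list above, here is an illustrative cuda-gdb session. The commands follow the CUDA-GDB documentation; the application name, kernel name, variable, and focus coordinates are hypothetical, and actual output varies by program:

```
$ cuda-gdb ./myapp                          # myapp: a hypothetical CUDA binary
(cuda-gdb) break myKernel                   # pause at the first line of the kernel
(cuda-gdb) run
(cuda-gdb) info cuda kernels                # list kernels resident on the GPU
(cuda-gdb) cuda kernel 0 block 1 thread 3   # switch focus to one GPU thread
(cuda-gdb) print partialSum                 # inspect a variable in the focused thread
(cuda-gdb) continue
```

Everything else works as in plain GDB: `next`, `step`, `backtrace`, and watchpoints apply to the thread currently in focus.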
### CUDA Debugging on Clusters NVIDIA partners with the following vendors to provide cluster-class debugging solutions for CUDA applications: | [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/product_logos/ddt340.png)](/allinea-ddt) | [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/product_logos/RogueWave_Logo_RGB.png)](/totalview-debugger) | #### [CUDA-GDB Documentation](https://docs.nvidia.com/cuda/cuda-gdb/index.html) ### Questions on CUDA Tools If you encounter difficulty with any of the CUDA Tools or have more questions, please contact the NVIDIA tools team at [cudatools@nvidia.com](mailto:cudatools@nvidia.com). --- # Source: https://developer.nvidia.com/cuda-q.md # NVIDIA CUDA-Q NVIDIA CUDA-Q™ is the quantum processing unit (QPU)-agnostic platform for [accelerated quantum supercomputing](https://developer.nvidia.com/blog/an-introduction-to-quantum-accelerated-supercomputing/). [Get Started](https://nvidia.github.io/cuda-quantum/latest/using/quick_start.html) * * * ## How CUDA-Q Works CUDA-Q is an open-source quantum development platform that orchestrates the hardware and software needed to run useful, large-scale [quantum computing](https://www.nvidia.com/en-us/glossary/quantum-computing/) applications. The platform’s hybrid programming model allows computation on GPU, CPU, and QPU resources in tandem from within a single quantum program. CUDA-Q is “qubit-agnostic”—seamlessly integrating with all QPUs and qubit modalities and offering GPU-accelerated simulations when adequate quantum hardware isn’t available. CUDA-Q extends far beyond the NISQ era, charting a course to large-scale, error-corrected quantum supercomputing with libraries, tools, infrastructure, and a hybrid programming model built for the future of quantum computing. Under the hood, CUDA-Q can be interchangeably powered by industry-leading simulators or actual quantum processors from a growing list of vendors.
Both of these engines can leverage AI supercomputing, whether to GPU-accelerate simulations or to control and enhance QPU operations. ![A diagram showing how CUDA-Q works](https://developer.download.nvidia.com/images/cuda-q/cuda-q-diagram.jpg) * * * ## Key Features ![Decorative icon](https://developer.download.nvidia.com/icons/m48-binaries.svg) ### Simplify Development of Hybrid Quantum-Classical Applications The kernel-based programming model makes it easy to write a hybrid application once and run it on multiple QPU and simulation backends. ![Decorative icon](https://developer.download.nvidia.com/icons/m48-scalability-up-sample.svg) ### Run Quantum Simulations at Scale Powerful state vector, tensor network, and noisy simulators can accelerate your applications with GPUs. ![Decorative icon](https://developer.download.nvidia.com/icons/m48-simulations.svg) ### Simulate Quantum Systems Accelerated simulation of the time evolution of dynamic systems, noise modeling, and quantum error correction (QEC) tools allow QPU builders to design fault-tolerant systems. ![Decorative icon](https://developer.download.nvidia.com/icons/m48-app-development-cycle.svg) ### Write Once, Run Everywhere CUDA-Q is QPU-agnostic and integrates with 75% of publicly available QPUs. Write your code once and run on all qubit modalities. ![Decorative icon](https://developer.download.nvidia.com/icons/m48-scaling-cuda-c++.svg) ### Use Familiar Tools Use Python or C++ to describe your algorithm in a high-level language. The CUDA-Q compiler will lower and optimize the code based on the backend, using industry tools such as Multi-Level Intermediate Representation (MLIR), Low Level Virtual Machine (LLVM), and Quantum Intermediate Representation (QIR). ![Decorative icon](https://developer.download.nvidia.com/icons/m48-well-being-crs-collabration.svg) ### Be Part of the Community CUDA-Q is an open-source project and is part of the quantum community.
It interoperates with AI and high-performance computing (HPC) libraries and visualization tools. * * * ## Built for Performance NVIDIA CUDA-Q enables the straightforward execution of hybrid code on many different types of quantum processors, simulated or physical. Researchers can use the cuQuantum-accelerated simulation backends or QPUs from our partners, or connect their own simulator or quantum processor. ### GPU Advantage CUDA-Q quantum algorithm simulations can achieve a speedup of up to 180x over a leading CPU and can scale the number of qubits with low overhead in GPU time. ![A chart showing CUDA-Q GPU speedup performance over CPU](https://developer.download.nvidia.com/images/cuda-q/gpu-advantage.jpg) ### Multiple GPU Scaling Multiple GPUs can scale the performance of quantum algorithm simulations by more than 300x. ![Multiple GPUs can scale a quantum algorithm beyond today’s quantum devices](https://developer.download.nvidia.com/images/cuda-q/quantum-computing-cuda-q-multiple-gpu-scaling-chart.jpg) * * * ## Starter Kits ### Optimization Understand and solve the Max-Cut optimization problem with the Quantum Approximate Optimization Algorithm (QAOA). - [Read the QAOA paper](https://arxiv.org/abs/1411.4028). - Learn about CUDA-Q [optimizers](https://nvidia.github.io/cuda-quantum/latest/examples/python/optimizers_gradients.html) and use the [observe](https://nvidia.github.io/cuda-quantum/latest/examples/python/executing_kernels.html#Observe) function. - [Run the Max-Cut notebook](https://nvidia.github.io/cuda-quantum/latest/applications/python/qaoa.html). ### Quantum Error Correction Learn how to do quantum error correction with CUDA-Q. - [Read the blog about CUDA-Q QEC.](https://developer.nvidia.com/blog/accelerating-quantum-error-correction-research-with-nvidia-quantum/) - [Explore CUDA-Q QEC documentation](https://nvidia.github.io/cudaqx/components/qec/introduction.html#).
- [Run the QEC examples](https://nvidia.github.io/cudaqx/examples_rst/qec/examples.html). ### Dynamic Simulation Learn about the dynamics capabilities in CUDA-Q. - [Read the blog that introduces the dynamics capabilities.](https://developer.nvidia.com/blog/accelerating-googles-qpu-development-with-new-quantum-dynamics-capabilities/) - [Read CUDA-Q dynamics documentation](https://nvidia.github.io/cuda-quantum/latest/using/backends/dynamics.html#). - [Explore examples of system time evolution for different qubit modalities](https://github.com/NVIDIA/cuda-quantum/tree/742a31dee48f7fa6a9d274528f6f2875c6312f7b/docs/sphinx/examples/python/dynamics). * * * ## Use Cases ### Fault-Tolerant Qubits Infleqtion demonstrated error-corrected logical qubits using neutral atoms. - [Read the “NVIDIA CUDA-Q Runs Breakthrough Logical Qubit Application on Infleqtion QPU” blog](https://developer.nvidia.com/blog/nvidia-cuda-q-runs-breakthrough-logical-qubit-application-on-infleqtion-qpu/). - [Read the paper](https://arxiv.org/pdf/2412.07670). - [View the implementation](https://nvidia.github.io/cuda-quantum/latest/applications/python/logical_aim_sqale.html). ### AI for Algorithm Design The University of Toronto developed the Generative Quantum Eigensolver—a new class of quantum algorithms that uses AI to improve performance. - [Read the “Advancing Quantum Algorithm Design With GPTs” blog](https://developer.nvidia.com/blog/advancing-quantum-algorithm-design-with-gpt/). - Read the paper. ### Solar Energy Prediction Chung Yuan Christian University developed a quantum neural network model for solar irradiance forecasting, showing faster training and improved performance. - [Read the “Accelerating Quantum Algorithms for Solar Energy Prediction With NVIDIA CUDA-Q and NVIDIA cuDNN” blog](https://developer.nvidia.com/blog/accelerating-quantum-algorithms-for-solar-energy-prediction-with-nvidia-cuda-q-and-nvidia-cudnn/). - Read the paper.
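As background for the Optimization starter kit above: QAOA prepares quantum states whose measured bitstrings maximize the classical Max-Cut objective. That objective itself is simple, as this plain-Python sketch on a toy four-node ring shows (illustrative only; this is not the CUDA-Q API):

```python
# Classical Max-Cut objective: count edges crossing a two-set partition.
# QAOA searches for the bitstring (partition) that maximizes this value.

def cut_value(edges, assignment):
    """assignment[i] in {0, 1} places node i in one of two sets."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Toy 4-node ring graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Brute force over all 2^4 assignments (feasible only for tiny graphs;
# the exponential blowup is exactly why quantum heuristics are studied).
best = max(
    (tuple((i >> n) & 1 for n in range(4)) for i in range(16)),
    key=lambda a: cut_value(edges, a),
)
print(cut_value(edges, best))  # 4: an alternating assignment cuts every edge
```

The Max-Cut notebook linked above builds the corresponding QAOA circuit in CUDA-Q and optimizes its parameters against this same objective.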
### Divisive Clustering The University of Edinburgh developed a method of finding data patterns and clustering big data so it can be used in quantum computers. - [Read the “CUDA-Q Enabled Resource Reduction for Quantum Clustering Algorithms” blog](https://developer.nvidia.com/blog/cuda-q-enabled-resource-reduction-for-quantum-clustering-algorithms/). - [View the implementation](https://nvidia.github.io/cuda-quantum/latest/applications/python/divisive_clustering_coresets.html). - [Read the paper](https://arxiv.org/pdf/2402.01529). ### Molecular Generation Yale University developed a hybrid transformer with a quantized self-attention mechanism applied to molecular generation. - [View the implementation](https://nvidia.github.io/cuda-quantum/latest/applications/python/quantum_transformer.html). - [Read the paper](https://arxiv.org/abs/2502.19214). ### Circuit Synthesis The University of Innsbruck used diffusion models to synthesize arbitrary unitaries into CUDA-Q kernels. - [View the implementation](https://nvidia.github.io/cuda-quantum/latest/applications/python/unitary_compilation_diffusion_models.html). - [Read the paper](https://www.nature.com/articles/s42256-024-00831-9). * * * ## CUDA-Q Learning Resources ### CUDA-Q Documentation Browse [documentation](https://nvidia.github.io/cuda-quantum/latest/index.html) for the latest version of CUDA-Q. ### CUDA-Q Application Hub Run Python notebooks of real-life [applications](https://nvidia.github.io/cuda-quantum/latest/using/tutorials.html) showing the power of CUDA-Q. ### CUDA-Q Repo Visit the CUDA-Q GitHub [repository](https://github.com/NVIDIA/cuda-quantum) to contribute code and create issues. ### CUDA-QX Libraries Explore domain-specific CUDA-Q [libraries](https://nvidia.github.io/cudaqx/index.html) for QEC and solvers. 
### CUDA-Q Academic Explore [CUDA-Q Academic](https://github.com/NVIDIA/cuda-q-academic) materials, including self-paced Jupyter notebook modules for building and optimizing hybrid quantum-classical algorithms using CUDA-Q. ### Quick-Start to Accelerated Quantum Supercomputing Watch a [hands-on session](https://www.nvidia.com/en-us/on-demand/session/gtcdc25-dct51159/?playlistId=gtcdc25-quantum-computing-and-hpc) and explore [the code](https://github.com/NVIDIA/cuda-q-academic/tree/2025-GTC-DC/workshops/2025-GTC-DC) to learn how to use CUDA-Q to bring together quantum algorithms with machine learning and generative AI to elevate quantum computing. * * * ## Latest CUDA-Q News * * * ## CUDA-Q Ecosystem CUDA-Q is accelerating work across the quantum computing ecosystem, including partner integrations that range from building and controlling better quantum hardware to developing the first useful quantum algorithms. ![Quantum Computing Partner - Agnostiq](https://developer.download.nvidia.com/images/logos/agnostic.png) ![Quantum Computing Partner - Alice & Bob](https://developer.download.nvidia.com/images/cuda/alice-bob-logo.svg) ![Quantum Computing Partner - Anyon Technologies](https://developer.download.nvidia.com/images/logos/anyon-technologies.png) ![Quantum Computing Partner - Aqarios](https://developer.download.nvidia.com/images/logos/aqarios-logo.svg) ![Quantum Computing Partner - Atlantic Quantum](https://developer.download.nvidia.com/images/logos/atlantic-quantum.png) ![Quantum Computing Partner - Atom Computing](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/atom-computing-logo.svg) ![Quantum Computing Partner - Diraq](https://developer.download.nvidia.com/images/logos/diraq.png) ![Quantum Computing Partner - Equal1](https://developer.download.nvidia.com/images/logos/equal1.png) ![Quantum Computing Partner - Fermioniq](https://developer.download.nvidia.com/images/logos/ferminiq.png) ![Quantum Computing Partner - 
Infleqtion](https://developer.download.nvidia.com/images/logos/logo-infleqtion.svg) ![Quantum Computing Partner - IonQ](https://developer.download.nvidia.com/images/cuda/ionq-logo.svg) ![Quantum Computing Partner - IQM](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/iqm-logo@2x.png) ![Quantum Computing Partner - QuEra Computing](https://developer.download.nvidia.com/images/cuda/quera-logo.svg) ![Quantum Computing Partner - Orca Computing](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/orca-computing-logo.svg) ![Quantum Computing Partner - Oxford Quantum Circuits](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/oqc-logo.svg) ![Quantum Computing Partner - Pasqal](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/pasqal-logo@2x.png) ![Quantum Computing Partner - PlanQC](https://developer.download.nvidia.com/images/logos/planqc-logo.svg) ![Quantum Computing Partner - qBraid](https://developer.download.nvidia.com/images/cuda/qbraid-logo.svg) ![Quantum Computing Partner - Quantum Circuits Inc](https://developer.download.nvidia.com/images/logos/qci.svg) ![Quantum Computing Partner - QC Ware](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/qc-ware-logo@2x.png) ![Quantum Computing Partner - QPerfect](https://developer.download.nvidia.com/images/logos/qperfect-logo.svg) ![Quantum Computing Partner - Quandela](https://developer.download.nvidia.com/images/logos/quandela-logo.svg) ![Quantum Computing Partner - Quantinuum](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/quantinuum-logo@2x.png) ![Quantum Computing Partner - Quantum Art](https://developer.download.nvidia.com/images/logos/quantum-art%20-logo.svg) ![Quantum Computing Partner - Quantum Brilliance](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/quantum-brilliance-logo@2x.png) ![Quantum Computing Partner - Quantum Machines](https://developer.download.nvidia.com/images/cuda/quantum-machines-logo.svg)
![Quantum Computing Partner - Qudora](https://developer.download.nvidia.com/images/logos/qudora-logo.svg) ![Quantum Computing Partner - Quobly](https://developer.download.nvidia.com/images/logos/quobly-logo.svg) ![Quantum Computing Partner - Rigetti](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/rigetti-logo@2x.png) ![Quantum Computing Partner - SEEQC](https://developer.download.nvidia.com/images/cuda/seeqc-logo.svg) ![Quantum Computing Partner - Terra Quantum](https://developer.download.nvidia.com/images/cuda/terra-quantum-logo.svg) * * * ## More Resources ![](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Accelerate Your Startup ![](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for our Developer Newsletter * * * ## Get started with CUDA-Q today. [Get Started](https://nvidia.github.io/cuda-quantum/latest/using/quick_start.html) --- # Source: https://developer.nvidia.com/cuda-qx.md # NVIDIA CUDA-QX Quantum researchers and developers across a wide range of domains, from quantum error correction to hybrid solvers, use GPU programming to accelerate their applications. This demands highly optimized, domain-specific libraries. NVIDIA CUDA-QX, built on top of CUDA-Q™, is a collection of libraries and tools for accelerating research and development toward useful, accelerated quantum supercomputing. [Get Started](http://www.github.com/NVIDIA/cudaqx) ![Quantum Computing Stacked Diagram](https://developer.download.nvidia.com/images/cuda/quantum-computing-diagram-sc24.jpg) * * * ## CUDA-QX Libraries CUDA-QX libraries are built on top of [CUDA-Q](https://developer.nvidia.com/cuda-q), NVIDIA’s open-source, hardware-agnostic platform for accelerated quantum supercomputing, and are also released open-source on GitHub.
The CUDA-QX libraries provide optimized implementations of key quantum primitives—from quantum error correction to hybrid algorithms—enabling developers to easily leverage the CUDA-Q platform. ### CUDA-Q QEC CUDA-Q QEC, which includes GPU-accelerated decoding primitives, Google’s Stim stabilizer simulator, and extension points in CUDA-Q for custom decoders, is the foundational toolkit for any error-correction researcher. [Learn More About CUDA-Q QEC](https://nvidia.github.io/cudaqx/components/qec/introduction.html) ### CUDA-Q Solvers Run prebuilt optimized kernels for VQE, ADAPT-VQE, QAOA, GQE, and more to get the most performance out of today’s hardware. [Learn More About CUDA-Q Solvers](https://nvidia.github.io/cudaqx/components/solvers/introduction.html) * * * ## Performance ### Single Syndrome Decoding Latency ![](https://developer.download.nvidia.com/images/cuda-qx/single-syndrome-decoding-latency.svg) ### Batched Decoding Throughput ![](https://developer.download.nvidia.com/images/cuda-qx/batched-decoding-throughput.svg) Belief propagation with ordered statistics decoding (BP-OSD) is one of the most promising approaches for scalable quantum error correction. CUDA-Q QEC accelerates BP-OSD with state-of-the-art latency and throughput, offering a 29–35x speedup over industry-standard decoders for a single shot, as well as an additional speedup of up to 42x for high-throughput use cases where many syndromes need to be decoded at once. * * * ## Latest Product News ### Introducing NVIDIA CUDA-QX Libraries for Accelerated Quantum Supercomputing Learn how the CUDA-QX libraries can accelerate the development of hybrid applications, ranging from quantum error correction to chemical simulation.
[Learn More About CUDA-QX Libraries](https://developer.nvidia.com/blog/introducing-nvidia-cuda-qx-libraries-for-accelerated-quantum-supercomputing/)

### NVIDIA and QuEra Decode Quantum Errors with AI

Learn how researchers will boost magic state fidelity and accelerate QEC research with NVIDIA AI Decoder, a transformer-based AI decoder.

[Learn More About AI Decoder](https://developer.nvidia.com/blog/?p=97001&preview=1&_ppp=e793b2b0ef)

### Accelerating Quantum Error Correction Research With NVIDIA Quantum

Learn how CUDA-Q QEC features like an advanced BP+OSD decoder implementation, integration with Infleqtion’s qLDPC code library, and CUDA-Q’s accelerated noisy simulator help streamline quantum error correction research.

[Learn More About QEC](https://developer.nvidia.com/blog/accelerating-quantum-error-correction-research-with-nvidia-quantum/)

* * *

## Get Started With CUDA-QX Today.

[Get Started](http://www.github.com/NVIDIA/cudaqx)

---

# Source: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries.md

1. [Topics](https://developer.nvidia.com/topics/) [AI](https://developer.nvidia.com/topics/ai)
2. [Data Science](/topics/ai/data-science) CUDA-X Data Science Libraries

# CUDA-X Data Science

CUDA-X™ Data Science is a collection of open-source libraries that accelerate popular data science libraries and platforms. It is part of the CUDA-X collection of highly optimized, domain-specific libraries built on CUDA®. CUDA-X Data Science includes zero-code-change APIs to accelerate popular PyData tools like pandas and scikit-learn, as well as distributed computing frameworks like Apache Spark. With 100+ integrations with open-source libraries and tools in the data science and data processing ecosystem, CUDA-X Data Science democratizes access to accelerated data science.
[Download Now](https://docs.rapids.ai/install) [Documentation](https://docs.rapids.ai/)

![NVIDIA CUDA-X Data Science open-source libraries](https://developer.download.nvidia.com/images/cuda-x/cuda-diagram-data-science-and-ai-application-stack.png)

* * *

## CUDA-X Data Science Libraries

Accelerate data analytics, machine learning, and graphs, as well as data-intensive applications such as vector search, to get the highest performance possible on single GPUs or scale up to distributed systems using simple zero-code-change interfaces.

### cuDF: 50x Faster pandas

cuDF is a GPU-accelerated library that optimizes fundamental DataFrame operations. It includes drop-in accelerators for popular DataFrame tools like pandas, Polars, and Apache Spark with no code changes required.

[Learn More About cuDF](/cudf) [Run the Benchmark](https://github.com/rapidsai/cudf/blob/branch-25.06/docs/cudf/source/user_guide/performance-comparisons/performance-comparisons.ipynb) [View Docs](https://docs.rapids.ai/api/cudf/stable/) [Install Now](https://docs.rapids.ai/install)

**TAGS: pandas, DataFrame, Python, C++**

### cuML: 50x Faster scikit-learn

cuML is a GPU-accelerated machine learning library that optimizes machine learning algorithms for execution on GPUs. It includes accelerators that run machine learning algorithms in scikit-learn, UMAP, and HDBSCAN with no code changes required.

[Learn More About cuML](/cuml) [Run the Benchmark](https://github.com/rapidsai/cuml/tree/branch-25.06/python/cuml/cuml/benchmark) [View Docs](https://docs.rapids.ai/api/cuml/stable/) [Install Now](https://docs.rapids.ai/install)
**TAGS: scikit-learn, machine learning, Python, C++**

### cuGraph: 48x Faster NetworkX

cuGraph is a GPU-accelerated graph analytics library that optimizes graph algorithms for execution on GPUs to process millions of nodes without specialized software. It includes a zero-code-change accelerator for NetworkX.

[Run the Benchmark](https://github.com/rapidsai/nx-cugraph/blob/branch-25.06/benchmarks/pytest-based) [View Docs](https://docs.rapids.ai/api/cugraph/stable/) [Install Now](https://docs.rapids.ai/install)

**TAGS: NetworkX, graph, Python, C++**

### Apache Spark Accelerated with cuDF

Learn more about our accelerator plug-in for Apache Spark workflows.

[Learn More About GPU-Accelerated Spark](https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/apache-spark-3/)

**TAGS: machine learning, data processing, distributed computing, Scala, Python**

### Dask-RAPIDS

Scale out GPU-accelerated data science pipelines to multiple nodes on Dask.

[Get Started on GitHub](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf)

**TAGS: distributed computing, Python**

### cuxfilter

Create interactive data visuals with multidimensional filtering of over 100-million-row tabular datasets.

[Get Started With cuxfilter](https://docs.rapids.ai/api/cuxfilter/stable/)

**TAGS: dashboards, visualization, Python**

### cuCIM

cuCIM mirrors the scikit-image API for image manipulation and the OpenSlide API for image loading.

[Get Started With cuCIM](https://docs.rapids.ai/api/cucim/stable/)

**TAGS: computer vision, vision processing, Python**

### cuVS

Apply cuVS algorithms to accelerate vector search, including world-class performance from CAGRA.
[Get Started With cuVS](/cuvs)

**TAGS: vector search, Python, C++, C, Rust**

### RAFT

Use RAFT’s CUDA-accelerated primitives to rapidly compose analytics.

[Get Started With RAFT](https://docs.rapids.ai/api/raft/stable)

**TAGS: primitives, algorithms, CUDA, Python, C++**

### KvikIO

Take full advantage of NVIDIA® GPUDirect® Storage (GDS) through powerful bindings to cuFile.

[Get Started With KvikIO](https://docs.rapids.ai/api/kvikio/stable)

**TAGS: file I/O, GPUDirect Storage, Python, C++**

### Other CUDA-X Data Science and Processing Libraries

See a complete list of libraries and tools.

[Check out GitHub](https://github.com/rapidsai)

* * *

## Get Started

Hands-On Tutorials Training

### Starter Kit: Accelerated Data Analytics With pandas Code

This kit demonstrates how to create responsive dashboards on large-scale data using pandas code and PyViz libraries, leveraging cuDF for accelerated exploratory data analytics with zero code changes.

- Video: [Accelerated Exploratory Data Analysis With pandas on NVIDIA GPUs](https://www.youtube.com/watch?v=PJpCJsqcfOk) (16:06)
- Notebook: [Build an Interactive Dashboard Notebook](https://colab.research.google.com/gist/will-hill/aa24c3ffe1428c005af3793fcacf9bd2/cudf_pandas_opencellid_demo.ipynb)

### Starter Kit: Accelerated Machine Learning on XGBoost

XGBoost is the most popular Python library for gradient-boosted decision trees. It supercharges machine learning models for classification, regression, and ranking workflows.
- Video: [Accelerated Machine Learning with XGBoost on NVIDIA GPUs](https://www.youtube.com/watch?v=lhraJRaDkOA) (20:10)
- Notebook: [Get Started With Accelerating XGBoost Workflows on GPUs](https://colab.research.google.com/gist/will-hill/2edd85e351e62e52fccd43da9b027434/xgboost_rapids_taxi.ipynb)

### Starter Kit: Accelerated Machine Learning With cuML Code

cuML accelerates popular machine learning algorithms, including Random Forest, UMAP, and HDBSCAN.

- Video: [cuML Accelerates Machine Learning by 50x With Zero Code Change](https://www.youtube.com/watch?v=cIJsVq8CPys) (00:55)
- Blog: [NVIDIA cuML Brings Zero-Code-Change Acceleration to scikit-learn](https://developer.nvidia.com/blog/nvidia-cuml-brings-zero-code-change-acceleration-to-scikit-learn/)
- Notebook: [Get Started With Accelerating Popular Machine Learning Libraries](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cuml_sklearn_colab_demo.ipynb)

### Starter Kit: Accelerated Data Analytics With Apache Spark

The NVIDIA RAPIDS™ Accelerator for Apache Spark accelerates enterprise-level data workloads to drive cost savings.

- Video: [Accelerate Data Analytics on GPUs With the RAPIDS Accelerator for Apache Spark](https://www.nvidia.com/en-us/on-demand/session/gtc25-dlit71528/) (1:27:34)
- Blog: [Predicting Performance on Apache Spark with GPUs](https://developer.nvidia.com/blog/predicting-performance-on-apache-spark-with-gpus/)
- User Guide: [RAPIDS Accelerator for Apache Spark User Guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/index.html)

### Starter Kit: Accelerated Data Analytics With Polars Code

Polars is known for its high performance and memory optimizations. Experience even faster execution when you call the GPU engine powered by cuDF.
- Video: [Processing 100 Million Rows of Data in Under 2 Seconds With Polars](https://www.youtube.com/watch?v=AoKeit2Fbmw) (00:28)
- Blog: [Get Started with Accelerating Polars](https://developer.nvidia.com/blog/polars-gpu-engine-powered-by-rapids-cudf-now-available-in-open-beta/)
- Notebook: [Accelerate Polars Data Processing Workflows Notebook](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/accelerated_data_processing_examples/polars_gpu_engine_demo.ipynb)

### Starter Kit: Accelerated Graph Analytics With NetworkX Code

cuGraph accelerates popular NetworkX graph algorithms, including Louvain, Betweenness Centrality, and PageRank.

- Video: [Achieve up to 500x Faster NetworkX With Zero Code Changes Using NVIDIA cuGraph](https://www.youtube.com/watch?v=3EsbU1gcH5c) (00:42)
- Blog: [NetworkX Introduces Zero-Code-Change Acceleration Using NVIDIA cuGraph](https://developer.nvidia.com/blog/networkx-introduces-zero-code-change-acceleration-using-nvidia-cugraph/)
- Notebook: [Accelerated Graph Analytics Notebook](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/accelerated_networkx_demo.ipynb)

### Data Science Learning Path

Get an overview of everything DLI offers for upskilling in accelerated data science.

[Learn More](https://www.nvidia.com/en-us/learn/learning-path/accelerated-data-science/)

### Accelerated Data Science Workflows With Zero Code Change

Take our free self-paced course to learn how to transform your workflow with zero-code-change acceleration.

[Learn More](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+T-DS-03+V1)

### Get Certified in Accelerated Data Science

Gain a deeper understanding of accelerated data science in our certification course.
[Learn More](https://www.nvidia.com/en-us/learn/certification/accelerated-data-science-professional/)

* * *

## Install and Deploy in Your Environment

Quick Install Deployment Guides

### Quick Install With conda

1. If not installed, download and run the install script. This will install the latest miniforge:

```shell
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```

2. Then install with:

```shell
conda create -n rapids-25.12 -c rapidsai -c conda-forge rapids=25.12 python=3.13 cuda-version=13.0
```

### Quick Install With pip

Install via the NVIDIA PyPI index:

```shell
pip install \
  --extra-index-url=https://pypi.nvidia.com \
  "cudf-cu13==25.12.*" \
  "dask-cudf-cu13==25.12.*" \
  "cuml-cu13==25.12.*" \
  "cugraph-cu13==25.12.*"
```

### Deploy Locally

Use this guide to install and build with conda, pip, Docker, or WSL2 on your local machine.

[Read the Local Deployment Guide](https://docs.rapids.ai/deployment/stable/local/)

### Deploy on Platforms

Deploy CUDA-X Data Science libraries on your platform of choice, including Kubernetes, Databricks, and Google Colab.

[Read the Platforms Guide](https://docs.rapids.ai/deployment/stable/platforms/)

### Deploy in the Cloud

Run CUDA-X Data Science libraries in AWS, Azure, GCP, and more.

[Read the Cloud Deployment Guide](https://docs.rapids.ai/deployment/stable/cloud/)

* * *

## The Accelerated Data Science Ecosystem

Data practitioners in open-source libraries, commercial software, and industries are driving innovation with CUDA-X Data Science.
Open-Source Libraries Platforms Industry Adoption ![Data Science Open-Source Library - Apache Arrow](https://developer.download.nvidia.com/images/logos/apache-arrow-logo.svg) ![Data Science Open-Source Library - Apache Spark](https://developer.download.nvidia.com/images/logos/apache-spark-logo.svg) ![Data Science Open-Source Library - CuPy](https://developer.download.nvidia.com/images/logos/cupy-logo.svg) ![Data Science Open-Source Library - Dask](https://developer.download.nvidia.com/images/logos/dask-logo.svg) ![Data Science Open-Source Library - Dmlc XGBoost](https://developer.download.nvidia.com/images/logos/dmlc-xgboost-logo.svg) ![Data Science Open-Source Library - HoloViz](https://developer.download.nvidia.com/images/logos/holoviz-logo.svg) ![Data Science Open-Source Library - NetworkX](https://developer.download.nvidia.com/images/logos/networkx-logo.svg) ![Data Science Open-Source Library - Numba](https://developer.download.nvidia.com/images/logos/numba-logo.svg) ![Data Science Open-Source Library - Polars](https://developer.download.nvidia.com/images/logos/polars-logo.svg) ![Data Science Open-Source Library - PyG](https://developer.download.nvidia.com/images/logos/pyg-logo.svg) ![Data Science Open-Source Library - PyTorch](https://developer.download.nvidia.com/images/logos/pytorch-logo.svg) ![Data Science Open-Source Library - Scikit Learn](https://developer.download.nvidia.com/images/logos/scikit-learn-logo.svg) ![Data Science Open-Source Library - scverse](https://developer.download.nvidia.com/images/logos/scverse-logo.svg) ![Data Science Platform - Amazon SageMaker](https://developer.download.nvidia.com/images/logos/amazon-sagemaker-logo.svg) ![Data Science Platform - Anaconda](https://developer.download.nvidia.com/images/logos/anaconda-logo.svg) ![Data Science Platform - Azure Machine Learning](https://developer.download.nvidia.com/images/logos/azure-machine-learning-logo.svg) ![Data Science Platform - 
Cloudera](https://developer.download.nvidia.com/images/logos/cloudera-logo.svg) ![Data Science Platform - Databricks](https://developer.download.nvidia.com/images/logos/databricks-logo.svg) ![Data Science Platform - Google Cloud Dataproc](https://developer.download.nvidia.com/images/logos/dataproc-logo.svg) ![Data Science Platform - Determined AI](https://developer.download.nvidia.com/images/logos/determined-ai-logo.svg) ![Data Science Platform - Domino](https://developer.download.nvidia.com/images/logos/domino-logo.svg) ![Data Science Platform - Google Colab](https://developer.download.nvidia.com/images/logos/google-colab-logo.svg) ![Data Science Platform - Iguazio](https://developer.download.nvidia.com/images/logos/iguazio-logo.svg) ![Data Science Platform - Snowflake](https://developer.download.nvidia.com/images/logos/snowflake-logo.svg) ![Data Science Industry Adoption - AT&T](https://developer.download.nvidia.com/images/logos/att-logo.svg) AT&T applied the RAPIDS Accelerator for Apache Spark on GPU clusters in their data-to-AI pipeline. [Read Blog](https://developer.nvidia.com/blog/scaling-data-pipelines-att-optimizes-speed-cost-and-efficiency-with-gpus/) ![Data Science Industry Adoption - bunq](https://developer.download.nvidia.com/images/logos/bunq-logo.svg) bunq improved fraud detection accuracy by accelerating model training 100x and data processing 5x using NVIDIA CUDA-X libraries. [Read Blog](https://blogs.nvidia.com/blog/europe-financial-services-ai/) ![Data Science Industry Adoption - CapitalOne](https://developer.download.nvidia.com/images/logos/capital-one-logo.svg) Capital One accelerated their financial and credit analysis pipelines, improving model training by 100x. 
[Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtcsj20-s22136/) ![Data Science Industry Adoption - Checkout.com](https://developer.download.nvidia.com/images/logos/checkout-logo.svg) Checkout.com accelerated their data analysis workflows from minutes to seconds with NVIDIA cuDF. [Read Blog](https://blogs.nvidia.com/blog/europe-financial-services-ai/) ![Data Science Industry Adoption - Cloudera](https://developer.download.nvidia.com/images/logos/cloudera-logo.svg) The IRS team uncovered fraud with the RAPIDS Accelerator for Apache Spark on the Cloudera Data Platform. [Read Blog](https://blogs.nvidia.com/blog/2021/09/07/cloudera-spark-irs-gpus/) ![Data Science Industry Adoption - Linkedin](https://developer.download.nvidia.com/images/logos/linkedin-logo.svg) LinkedIn developed DARWIN to enable faster data analysis on NVIDIA cuDF. [Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51399/) ![Data Science Industry Adoption - NASA](https://developer.download.nvidia.com/images/logos/nasa-logo.svg) NASA used CUDA-X Data Science to detect and quantify air pollution anomalies and build a bias-correction model. [Read Blog: Part 1](https://developer.nvidia.com/blog/nasa-and-nvidia-collaborate-to-accelerate-scientific-data-science-use-cases-part-1/)[Read Blog: Part 2](https://developer.nvidia.com/blog/nasa-and-nvidia-collaborate-to-accelerate-scientific-data-science-use-cases-part-2/) ![Data Science Industry Adoption - PayPal](https://developer.download.nvidia.com/images/logos/paypal-logo.svg) PayPal reduced cloud costs by up to 70% with the RAPIDS Accelerator for Apache Spark. [Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-s62506/) ![Data Science Industry Adoption - Taboola](https://developer.download.nvidia.com/images/logos/taboola-logo.svg) Taboola, an advertising platform, processes terabytes of hourly data with the RAPIDS Accelerator for Apache Spark. 
[Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-s62130/) ![Data Science Industry Adoption - Tgen](https://developer.download.nvidia.com/images/logos/tgen-logo.svg) TGen cut analysis time on 4 million-cell datasets from 10 hours to three minutes with RAPIDS-singlecell, built on CUDA-X Data Science. [Read Customer Story](https://www.nvidia.com/en-us/customer-stories/reduce-single-cell-spatial-analysis-from-hours-to-minutes/) ![Data Science Industry Adoption - TCS](https://developer.download.nvidia.com/images/logos/tcs-logo.svg) TCS Optumera accelerated their demand forecasting pipeline with the RAPIDS Accelerator for Apache Spark. [Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s42508/) ![Data Science Industry Adoption - Uber](https://developer.download.nvidia.com/images/logos/uber-logo.svg) Uber developed Horovod with support for Spark 3.x with GPU scheduling. [Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtcsj20-s21300/) ![Data Science Industry Adoption - Walmart](https://developer.download.nvidia.com/images/logos/walmart-logo.svg) Walmart solved scalability issues with their product-substitution algorithm. [Watch On-Demand Session](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s42259/) * * * ## Join the Community ![](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Join the Accelerated Data Science Community on Slack ![](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for the Data Science Newsletter * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. 
When downloaded or used in accordance with our terms of service, developers should work with their supporting team to ensure their application meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

### Download CUDA-X Data Science libraries today.

[Download](https://docs.rapids.ai/install/)

---

# Source: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf.md

1. [AI](/topics/ai) [Data Science](/topics/ai/data-science)
2. [CUDA-X Data Science Libraries](/topics/ai/data-science/cuda-x-data-science-libraries) cuDF

# NVIDIA cuDF: GPU-Accelerated DataFrames

NVIDIA cuDF is an open-source CUDA-X™ Data Science library that accelerates popular libraries like pandas, Polars, and Apache Spark on NVIDIA GPUs—delivering massive speedups for DataFrame operations with no code changes required. Built on highly optimized CUDA® primitives, cuDF taps into GPU parallelism and memory bandwidth to accelerate data processing and analytics workflows.

Get Started: [Accelerate pandas](#accel-pandas) [Accelerate Polars](#accel-polars) [Accelerate Apache Spark](#accel-apache)

* * *

## Key Features

### Maximize Performance With NVIDIA GPUs

cuDF is purpose-built to maximize GPU performance, optimizing core DataFrame operations with low-level CUDA primitives that fully leverage the parallelism and memory bandwidth of NVIDIA GPUs.

### Accelerate With Zero Code Changes

GPU acceleration can be seamlessly integrated into existing pandas, Polars, or Apache Spark workflows using API-compatible layers like cudf.pandas and the Polars GPU engine—no code changes required.
### Integrate With the Python Data Science Ecosystem cuDF interoperates effortlessly with popular Python data science libraries like cuPy, Numba, and scikit-learn, allowing you to build end-to-end GPU-accelerated workflows. #### Handle Memory Efficiently Built on the Apache Arrow format, cuDF utilizes highly efficient columnar data structures, vector processing, and zero-copy interfaces with other accelerated libraries, minimizing data movement overhead. #### Process Larger Datasets With Unified Virtual Memory (UVM) UVM allows cuDF to transparently manage data transfers between system RAM and GPU memory, enabling you to process datasets that exceed the VRAM of a single GPU without explicit memory management. #### Scale to Distributed Architectures DataFrame workflows can be scaled across multiple GPUs and compute nodes with cuDF—built for distributed processing in environments like Apache Spark. * * * ## Turn cuDF On for Massive GPU Performance Gains https://www.youtube-nocookie.com/embed/eThOYTJrbtA?&rel=0 * * * ## Test Drive cuDF ### pandas pandas offers a flexible API for data manipulation. cuDF makes pandas more usable for medium to large datasets by accelerating operations on the GPU, with no code changes required. [Launch on Colab](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cudf_pandas_colab_demo.ipynb) ### Polars Polars is known for its performance and memory optimizations. Experience even faster execution when you call the GPU engine powered by cuDF. [Launch on Colab](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/accelerated_data_processing_examples/polars_gpu_engine_demo.ipynb?utm_source=nvidia+mktg&utm_medium=web&utm_campaign=polars+launch) ### Apache Spark Apache Spark is a powerful engine for large-scale data processing. Learn how to use GPUs to significantly boost performance and cost efficiency. 
[View Quick-Start Guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/index.html)

* * *

## pandas Accelerated With cuDF

This section details how cuDF seamlessly integrates with and accelerates your existing pandas workflows.

Get Started How It Works

Integrate cuDF directly into your environment to accelerate pandas. Follow these steps to get started.

### Install cuDF for pandas

#### **Quick Install With conda**

1. If not installed, download and run the install script. This will install the latest miniforge:

```shell
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```

2. Then install with:

```shell
conda create -n rapids-25.12 -c rapidsai -c conda-forge \
  cudf=25.12 python=3.13 cuda-version=13.0
```

#### **Quick Install With pip**

Install via the NVIDIA PyPI index:

```shell
pip install "cudf-cu13==25.12.*"
```

See the complete install selector for docker, WSL2, and individual libraries.

[Install Selector](https://docs.rapids.ai/install "Download Workflows")

### Enable cuDF Acceleration of pandas With Zero Code Changes

cuDF implements zero-code-change acceleration through the cudf.pandas module, which is part of the cuDF package. Load it to seamlessly accelerate your existing pandas code. Enable it using one of these methods before importing or using pandas. For a step-by-step visual guide, watch [this video](https://www.youtube.com/watch?v=jFi60VQVmF4).

#### To accelerate IPython or Jupyter notebooks, use the magic command:

```python
%load_ext cudf.pandas
import pandas as pd
...
```

#### To accelerate a Python script, use the Python module flag on the command line:

```shell
python -m cudf.pandas script.py
```

```python
# script.py
import pandas as pd
...
```

#### If you can't use command-line flags, explicitly enable cudf.pandas via import:

```python
import cudf.pandas
cudf.pandas.install()
import pandas as pd
...
```
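All of these methods leave the pandas code itself untouched. As a minimal sketch (the column names and data below are illustrative, not from any of the kits), the following script runs as ordinary CPU pandas, and the same unmodified code executes on the GPU when launched via `python -m cudf.pandas`:

```python
import pandas as pd

# A typical DataFrame workload: filter plus group-by aggregation.
# Run directly for CPU pandas, or as `python -m cudf.pandas script.py`
# to execute the same unmodified code on the GPU (cuDF installed).
df = pd.DataFrame({
    "key":   ["a", "b", "a", "b", "c"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})
means = df[df["value"] > 1.0].groupby("key")["value"].mean()
print(means.to_dict())  # {'a': 3.0, 'b': 3.0, 'c': 5.0}
```

Because the acceleration is transparent, no branch in the script needs to know whether it is running on CPU or GPU.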
#### From Minutes to Seconds: cuDF Accelerates pandas Standard DuckDB data benchmark (5 GB) performance comparison between cudf.pandas and traditional pandas v2.2 (lower is better). ![A chart showing how cuDF accelerates pandas from minutes to seconds](https://developer.download.nvidia.com/images/cudf/cuda-diagram-data-science-charts-pandas.png) Specs: HW: NVIDIA L4, CPU: Intel Xeon 8480CL | SW: pandas v2.2.1, NVIDIA RAPIDS™ cuDF 24.02 [Run the benchmark here.](https://github.com/rapidsai/cudf/blob/branch-25.06/docs/cudf/source/user_guide/performance-comparisons/performance-comparisons.ipynb) ### Hands-On Tutorials: Accelerate pandas on Colab cuDF comes preinstalled in Google Colab, making it incredibly easy to get started. Simply switch to a GPU runtime and enable cudf.pandas at the top of your notebook to instantly accelerate your pandas workflows. Explore these starter kits for hands-on examples: #### Starter Kit: Build an Interactive Data Analytics Dashboard This kit demonstrates how to create responsive dashboards using pandas code and PyViz libraries, directly addressing the lag when exploring large-scale data (e.g., 7.3M+ rows of geospatial data). See how GPU acceleration with cuDF makes interactive filters update near instantly with zero code changes. - Video: [Accelerated Exploratory Data Analysis With pandas on NVIDIA GPUs](https://www.youtube.com/watch?v=PJpCJsqcfOk) (16:07) - Notebook: [Build an Interactive Dashboard Notebook](https://colab.research.google.com/gist/will-hill/aa24c3ffe1428c005af3793fcacf9bd2/cudf_pandas_opencellid_demo.ipynb) #### Starter Kit: Process 18 Million Rows of Stock Data This kit demonstrates how cuDF handles processing large volumes of time-series data in pandas, tackling bottlenecks like calculating simple moving averages with groupby().rolling(). See how GPU acceleration turns minutes of processing into seconds. 
- Video: [Processing 18M Rows of Stock Data 20x Faster in cuDF pandas Accelerator Mode](https://www.youtube.com/watch?v=eThOYTJrbtA) (00:29)
- Notebook: [Get Started With Accelerating Stock Data](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cudf_pandas_stocks_demo.ipynb#scrollTo=WmOguzNUcw4F)

#### Starter Kit: Process 8 GB of Text Data

This kit addresses the slowdowns pandas experiences from large string fields and memory-intensive operations like reading files, calculating string length, and merging DataFrames, providing massive end-to-end speedups.

- Video: [RAPIDS cuDF Accelerates pandas up to 30x on an 8 GB Text Dataset](https://youtu.be/AgFVwqDcXCs) (00:35)
- Notebook: [Get Started With Accelerating Text Data](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/accelerated_data_processing_examples/cudf_pandas_large_string.ipynb)

#### How cuDF Accelerates pandas

When cudf.pandas is enabled, it transforms how your pandas operations are executed, acting as a proxy that transparently intercepts your pandas calls. It attempts to execute these operations on the GPU first, utilizing cuDF’s highly optimized, GPU-accelerated functions. If a particular operation isn’t supported on the GPU, it falls back to the CPU, ensuring your code continues to run without errors. This seamless “GPU-first, CPU-fallback” mechanism allows you to continue using familiar pandas syntax and functions while benefiting from significant performance improvements. This approach ensures that your existing pandas code, including operations within third-party libraries that use pandas, generally works as is, without requiring manual modifications to specifically target the GPU.
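The "GPU-first, CPU-fallback" dispatch idea can be illustrated with a small, self-contained sketch. This is only an illustration of the pattern, not cuDF's actual proxy machinery; all the function names here are hypothetical:

```python
# Illustrative sketch of GPU-first, CPU-fallback dispatch (hypothetical
# helpers -- not the real cudf.pandas implementation).
def gpu_first(fast, slow):
    """Try the fast (GPU) path; fall back to the slow (CPU) path."""
    def dispatch(*args, **kwargs):
        try:
            return fast(*args, **kwargs)
        except NotImplementedError:
            return slow(*args, **kwargs)
    return dispatch

def gpu_reduce(op, xs):
    # Pretend the GPU backend only implements "sum".
    if op != "sum":
        raise NotImplementedError(op)
    return sum(xs)

def cpu_reduce(op, xs):
    return {"sum": sum, "max": max}[op](xs)

reduce_op = gpu_first(gpu_reduce, cpu_reduce)
print(reduce_op("sum", [1, 2, 3]))  # 6 -- handled by the fast path
print(reduce_op("max", [1, 2, 3]))  # 3 -- falls back to the slow path
```

The caller sees one uniform interface either way, which is why existing pandas code keeps working even when an operation is not yet supported on the GPU.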
For a more granular explanation of the underlying architecture and how cudf.pandas achieves this seamless acceleration, refer to [our documentation](https://docs.rapids.ai/api/cudf/stable/cudf_pandas/how-it-works/).

![A flowchart of how cuDF accelerates pandas](https://developer.download.nvidia.com/images/cudf/cuda-diagram-data-science-flow.png)

* * *

## Polars Accelerated With cuDF

This section details how the Polars GPU engine, powered by cuDF, seamlessly integrates and accelerates your Polars workflows.

Get Started How It Works

Integrate the Polars GPU engine directly into your environment to accelerate Polars. Follow these steps to get started.

### Install Polars With GPU Support

Simply pip install Polars with the GPU feature flag:

```shell
pip install polars[gpu]
```

### Configure Polars to Use the GPU Engine

Activate GPU acceleration for your Polars operations by configuring the engine. You can use the default GPU engine or customize it for finer control.

#### Default GPU Engine

Materialize the Polars LazyFrame into a DataFrame with the default GPU engine configuration by calling .collect(engine="gpu"):

```python
import polars as pl

ldf = pl.LazyFrame({"a": [1.242, 1.535]})
print(
    ldf.select(
        pl.col("a").round(1)
    ).collect(engine="gpu")
)
```

#### Custom GPU Engine

For finer control of the engine, you can pass a GPUEngine object with additional configurations like device details and verbosity to the engine= parameter:

```python
import polars as pl

ldf = pl.LazyFrame({"a": [1.242, 1.535]})
gpu_engine = pl.GPUEngine(
    device=0,            # This is the default
    raise_on_fail=True,  # Fail loudly if the query can't execute on the GPU
)
print(
    ldf.select(
        pl.col("a").round(1)
    ).collect(engine=gpu_engine)
)
```

#### Accelerate Polars Workflows Up to 13x

Top-performing compute-heavy queries with complex groupby and join operations (PDS-H benchmark | 80 GB dataset)

![A chart showing how cuDF accelerates Polars workflows up to 13x](https://developer.download.nvidia.com/images/cudf/cuda-diagram-july-data-science-webpages-chart-charts-polars2.png)

#### Performance Improves as Data Size Grows

Query processing time across a range of dataset sizes

![A chart showing how cuDF improves Polars performance as data size grows](https://developer.download.nvidia.com/images/cudf/cuda-diagram-july-data-science-webpages-chart-charts-polars1.png)

Specs: [PDS-H benchmark](https://github.com/pola-rs/polars-benchmark) | GPU: NVIDIA H100 | CPU: Intel Xeon W9-3495X (Sapphire Rapids) | Storage: Local NVMe

Note: PDS-H is derived from TPC-H but these results are not comparable to TPC-H results.

[Run the benchmark here.](https://github.com/pola-rs/polars-benchmark)

### Hands-On Tutorials: Accelerate Polars on Colab

The Polars GPU engine is preinstalled in Google Colab, making it incredibly easy to get started. Simply switch to a GPU runtime and enable the GPU engine to instantly accelerate your Polars workflows. Explore these hands-on resources:

#### Starter Kit: Process 100 Million Rows of Transaction Data in Seconds

This kit demonstrates how the Polars GPU engine can process 100 million rows in under two seconds. Learn how, with it, you can tackle massive transaction datasets that typically cause slowdowns.

- Video: [Processing 100M Rows of Data in Under Two Seconds With the Polars GPU Engine](https://www.youtube.com/watch?v=AoKeit2Fbmw) (00:28)
- Notebook: [Intro to the Polars GPU Engine](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/accelerated_data_processing_examples/polars_gpu_engine_demo.ipynb#scrollTo=TabS5fiHG0dG)

### How cuDF Accelerates Polars

Polars is already a highly performant DataFrame library, designed with multi-threaded execution, advanced memory optimizations, and lazy evaluation.
These features allow Polars to efficiently handle medium- to large-scale data out of the box. The Polars GPU engine takes these strengths further: by adding the speed of cuDF to the efficiency of Polars, you can achieve even faster execution, enabling the processing of hundreds of millions of rows in seconds.

Built directly into the Polars Lazy API, the GPU engine works by attempting to execute operations on the GPU first and falling back to the CPU when necessary. This approach ensures you can continue to use familiar Polars syntax and functions while benefiting from GPU acceleration.

For a deep dive into the architecture and underlying mechanisms of the Polars GPU engine, check out the [official Polars blog post](https://pola.rs/posts/gpu-engine-release/).

* * *

## Apache Spark Accelerated With cuDF

This section details how cuDF leverages NVIDIA GPUs to seamlessly integrate with and significantly enhance the performance and cost efficiency of your Apache Spark workloads.

Get Started How It Works

Acceleration of Apache Spark extract, transform, load (ETL) and machine learning workloads is designed to be straightforward, whether you're deploying on premises or in the cloud.

### Launch Path

Follow our guide [here](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/overview.html).
See below for a high-level overview. Launch Spark with the RAPIDS Accelerator for Apache Spark plug-in jar and enable it with a configuration setting:

```python
spark.conf.set('spark.rapids.sql.enabled', 'true')
```

The following is an example of a physical plan with operators running on the GPU:

```
== Physical Plan ==
GpuColumnarToRow false
+- GpuProject [cast(c_customer_sk#0 as string) AS c_customer_sk#40]
   +- GpuFileGpuScan parquet [c_customer_sk#0] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/tmp/customer], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c_customer_sk:int>
```

#### Lightning Performance at a Fraction of the Cost

[NVIDIA Decision Support Benchmark](https://github.com/NVIDIA/spark-rapids-benchmarks/tree/dev/nds) 3 TB | AWS EC2 | Apache Spark 3.4.1 | Spark RAPIDS release 24.04

![A chart showing how cuDF leverages NVIDIA GPUs to enhance performance of Apache Spark workloads](https://developer.download.nvidia.com/images/cudf/cuda-diagram-july-data-science-webpages-chart-charts-sparks1-r2.png)

![A chart showing how cuDF leverages NVIDIA GPUs to reduce costs of Apache Spark workloads](https://developer.download.nvidia.com/images/cudf/cuda-diagram-july-data-science-webpages-chart-charts-sparks2.png)

Cluster of 8x nodes in each setup:

- CPU: r6id.8xlarge - 32 vCPU, 256 GB, 1x1900 NVMe SSD, 12.5 Gb/s network
- GPU: g6.8xlarge - 32 vCPU, 1 L4 GPU, 128 GB, NVMe SSD, 25 Gb/s network

\* Includes NVAIE at $1/GPU/hr. [Run the benchmark here.](https://github.com/NVIDIA/spark-rapids-benchmarks/tree/dev/nds)

### Getting Started With cuDF Acceleration of Apache Spark

Learn how you can significantly accelerate enterprise-level data workloads, leading to substantial cost savings.

#### Scale Data Analytics With Apache Spark

Learn how GPUs accelerate enterprise-scale Apache Spark workflows to drive cost savings.
- On-Demand GTC Session: [Accelerate Big Data Analytics on GPUs With the RAPIDS Accelerator for Apache Spark](https://www.nvidia.com/en-us/on-demand/session/gtc25-dlit71528/) (01:27:34)
- Blog: [Predicting Performance on Apache Spark With GPUs](https://developer.nvidia.com/blog/predicting-performance-on-apache-spark-with-gpus/)
- User Guide: [RAPIDS Accelerator for Apache Spark](https://docs.nvidia.com/spark-rapids/user-guide/latest/index.html)

#### How cuDF Accelerates Apache Spark

NVIDIA cuDF enhances your Apache Spark workflows by integrating as a Spark plug-in. Once installed and configured, it automatically detects available GPUs and leverages them for supported operations. It works by replacing the backend for Spark SQL and DataFrame operations with GPU-accelerated versions. If an operation isn’t supported on GPUs, the accelerator falls back to the standard Spark CPU implementation. This plug-in-based approach makes it easy to integrate with major Spark platforms, including AWS EMR, GCP Dataproc, OCI, and Databricks. For a deeper explanation of the underlying architecture, supported operations, and integration, refer to the [user guide](https://docs.nvidia.com/spark-rapids/index.html).
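The plug-in flow described above can be sketched in a few lines of PySpark. This is a minimal sketch, not the official launch procedure: it assumes the `rapids-4-spark` jar is already on the driver and executor classpath (see the user guide above for the exact launch command), and the app name is illustrative. The two configuration keys are the ones the plug-in documents: `spark.plugins` loads the RAPIDS SQL plug-in, and `spark.rapids.sql.enabled` turns GPU execution on.

```python
# Sketch: enabling the RAPIDS Accelerator on a SparkSession.
# Assumes the rapids-4-spark jar is already on the classpath; the
# session-creation step is guarded so the snippet also runs without Spark.
rapids_conf = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",  # load the RAPIDS plug-in
    "spark.rapids.sql.enabled": "true",             # route supported ops to the GPU
}

try:
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("gpu-etl-sketch")
    for key, value in rapids_conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    # Supported operators now appear as Gpu* nodes in df.explain() output,
    # as in the physical plan shown earlier.
except Exception:
    pass  # pyspark or the jar not available; the configuration is the takeaway
```

With this in place, no query code changes are needed; unsupported operations simply fall back to the CPU implementation as described above.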
![A flowchart of how cuDF accelerates Apache Spark](https://developer.download.nvidia.com/images/cudf/cuda-diagram-data-science-flow.png) * * * ## Data Science Training From NVIDIA ![Data Science Learning Path From the NVIDIA DLI](https://developer.download.nvidia.com/icons/m48-map-search-256px-blk.png) ### Data Science Learning Path From the NVIDIA Deep Learning Institute ![Self-Paced Course: Accelerate Data Science Workflows With Zero Code Changes](https://developer.download.nvidia.com/icons/m48-deep-learning-institute-usd.svg) ### Self-Paced Course: Accelerate Data Science Workflows With Zero Code Changes ![Get Certified in Accelerated Data Science](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Certified in Accelerated Data Science ## Join the Community ![Join NVIDIA CUDA-X Data Science Libraries Slack Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Join the Accelerated Data Science Community on Slack ![Sign Up for NVIDIA Data Science Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for the Data Science Newsletter * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting team to ensure their application meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ## Get started with NVIDIA cuDF today. [Documentation](https://docs.rapids.ai/api/cudf/stable/) --- # Source: https://developer.nvidia.com/cudnn.md 1. 
[Home ](/) [CUDA-X Libraries](/gpu-accelerated-libraries) cuDNN

# NVIDIA cuDNN

NVIDIA® CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines, such as forward and backward convolution, attention, matmul, pooling, and normalization.

* * *

## Download cuDNN

[Download cuDNN Library](/rdp/cudnn-download)[Download cuDNN Frontend (GitHub)](https://github.com/NVIDIA/cudnn-frontend)

cuDNN is also available to download via one of the package managers below.

### Quick Install with conda

```shell
conda install nvidia::cudnn cuda-version=12
```

Installs the cuDNN library

### Quick Pull with Docker

```shell
docker pull nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04
```

Pulls a CUDA container image that includes the cuDNN library

### Quick Install with pip

```shell
pip install nvidia-cudnn
```

Installs the cuDNN library

```shell
pip install nvidia-cudnn-frontend
```

Installs the cuDNN Frontend API

[Documentation](https://docs.nvidia.com/deeplearning/cudnn/latest/) [Forum](https://forums.developer.nvidia.com/c/ai-data-science/deep-learning/cudnn/90)

* * *

## How cuDNN Works

- **Accelerated Learning:** cuDNN provides kernels, targeting Tensor Cores whenever it makes sense, to deliver best-available performance on compute-bound operations. It offers heuristics for choosing the right kernel for a given problem size.
- **Fusion Support:** cuDNN supports fusion of compute-bound and memory-bound operations. Common generic fusion patterns are typically implemented by runtime kernel generation. Specialized fusion patterns are optimized with pre-written kernels.
- **Expressive Op Graph API:** The user defines computations as a graph of operations on tensors. The cuDNN library has both a direct C API and an open-source C++ frontend for convenience.
Most users choose the frontend as their entry point to cuDNN.

### cuDNN API Code Sample

The code performs a batched matrix multiplication with bias using the cuDNN Python frontend with PyTorch tensors:

```python
import torch
import cudnn

# Prepare sample input data as PyTorch tensors resident on the GPU.
b, m, n, k = 1, 1024, 1024, 512
A = torch.randn(b, m, k, dtype=torch.float32, device="cuda")
B = torch.randn(b, k, n, dtype=torch.float32, device="cuda")
bias = torch.randn(b, m, 1, dtype=torch.float32, device="cuda")
result = torch.empty(b, m, n, dtype=torch.float32, device="cuda")

# Use the stateful Graph object in order to perform multiple matrix multiplications
# without replanning. The cudnn API allows us to fine-tune our operations by, for
# example, selecting a mixed-precision compute type.
graph = cudnn.pygraph(
    intermediate_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

a_cudnn_tensor = graph.tensor_like(A)
b_cudnn_tensor = graph.tensor_like(B)
bias_cudnn_tensor = graph.tensor_like(bias)

c_cudnn_tensor = graph.matmul(name="matmul", A=a_cudnn_tensor, B=b_cudnn_tensor)
d_cudnn_tensor = graph.bias(name="bias", input=c_cudnn_tensor, bias=bias_cudnn_tensor)
d_cudnn_tensor.set_output(True)  # mark the final tensor as a graph output

# Build the matrix multiplication. Building returns a sequence of algorithms that can
# be configured. Each algorithm is a JIT-generated function that can be executed on
# the GPU.
graph.build([cudnn.heur_mode.A])
workspace = torch.empty(graph.get_workspace_size(), device="cuda", dtype=torch.uint8)

# Execute the matrix multiplication.
graph.execute(
    {
        a_cudnn_tensor: A,
        b_cudnn_tensor: B,
        bias_cudnn_tensor: bias,
        d_cudnn_tensor: result,
    },
    workspace,
)
```

### Sample Operation Graphs Described by the cuDNN Graph API

![](https://developer.download.nvidia.com/images/conv_bias_relu.png)

_ConvolutionFwd followed by a DAG with two operations_

### Documentation

Complete guides on installing and using the cuDNN frontend and cuDNN backend.
[Read Docs](https://docs.nvidia.com/deeplearning/cudnn/latest/)

### Frontend Samples

Samples illustrate usage of the Python and C++ frontend APIs.

[View Samples](https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/samples.html)

### Latest Release Blog

Learn how to accelerate transformers with scaled dot product attention (SDPA) in cuDNN 9.

[Read Blog](https://developer.nvidia.com/blog/accelerating-transformers-with-nvidia-cudnn-9/)

### cuDNN on NVIDIA Blackwell

Learn about new and updated cuDNN APIs for NVIDIA Blackwell’s microscaling format and how to program against them.

[Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc25-s73071/)

* * *

## Key Features

### Deep Neural Networks

Deep learning neural networks span computer vision, conversational AI, and recommendation systems and have led to breakthroughs like autonomous vehicles and intelligent voice assistants. NVIDIA's GPU-accelerated deep learning frameworks speed up training time for these technologies, reducing multi-day sessions to just a few hours. cuDNN supplies foundational libraries for high-performance, low-latency inference for deep neural networks in the cloud, on embedded devices, and in self-driving cars.

- Accelerated compute-bound operations like attention training/prefill, convolution, and matmul
- Optimized memory-bound operations like attention decode, pooling, softmax, normalization, activation, pointwise, and tensor transformation
- Fusions of compute-bound and memory-bound operations
- Runtime fusion engine to generate kernels at runtime for common fusion patterns
- Optimizations for important specialized patterns like fused attention
- Heuristics to choose the right implementation for a given problem size

### cuDNN Graph API and Fusion

The cuDNN Graph API is designed to express common computation patterns in deep learning.
A cuDNN graph represents operations as nodes and tensors as edges, similar to a dataflow graph in a typical deep learning framework. Access to the cuDNN Graph API is conveniently available through the [Python/C++ Frontend API](https://github.com/NVIDIA/cudnn-frontend) (recommended) as well as the lower-level C Backend API (for legacy use cases or special cases where Python/C++ isn’t appropriate).

- Flexible fusions of memory-limited operations into the input and output of matmul and convolution
- Specialized fusions for patterns like attention and convolution with normalization
- Support for both forward and backward propagation
- Heuristics for predicting the best implementation for a given problem size
- Open-source Python/C++ Frontend API
- Serialization and deserialization support

* * *

## cuDNN Accelerated Frameworks

cuDNN accelerates widely used deep learning frameworks, including PyTorch, JAX, Caffe2, Chainer, Keras, MATLAB, MXNet, PaddlePaddle, and TensorFlow.

![cuDNN Accelerated Framework - Caffe2](https://developer.download.nvidia.com/images/caffe2-logo.svg) ![cuDNN Accelerated Framework - Chainer](https://developer.download.nvidia.com/images/chainer-logo.svg) ![cuDNN Accelerated Framework - JAX](https://developer.download.nvidia.com/images/jax-logo.svg) ![cuDNN Accelerated Framework - MATLAB](https://developer.download.nvidia.com/images/matlab-logo.svg) ![cuDNN Accelerated Framework - Microsoft Cognitive Toolkit](https://developer.download.nvidia.com/images/microsoft-cognitive-toolkit-logo.svg) ![cuDNN Accelerated Framework - MXNet](https://developer.download.nvidia.com/images/mxnet-logo.svg) ![cuDNN Accelerated Framework - PaddlePaddle](https://developer.download.nvidia.com/images/paddle-paddle-logo.svg) ![cuDNN Accelerated Framework - PyTorch](https://developer.download.nvidia.com/images/pytorch-logo.svg) ![cuDNN Accelerated Framework - TensorFlow](https://developer.download.nvidia.com/images/tensorflow-logo.svg) ![cuDNN Accelerated Framework - Wolfram 
Language](https://developer.download.nvidia.com/images/wolfram-language-logo.svg) ![cuDNN Accelerated Framework - XLA](https://developer.download.nvidia.com/images/xla-logo.svg)

* * *

## Related Libraries and Software

### NVIDIA NeMo™

NeMo is an end-to-end cloud-native framework for developers to build, customize, and deploy generative AI models with billions of parameters.

[Learn More](/nemo)

### NVIDIA TensorRT™

TensorRT is a software development kit for high-performance deep learning inference.

[Learn More](/tensorrt)

### NVIDIA Optimized Frameworks

Deep learning frameworks offer building blocks for designing, training, and validating deep neural networks through a high-level programming interface.

[Learn More](https://docs.nvidia.com/deeplearning/frameworks/index.html)

### NVIDIA Collective Communication Library

NCCL is a communication library for high-bandwidth, low-latency, GPU-accelerated networking.

[Learn More](/nccl)

* * *

## More Resources

![Join the Developer Community](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg)

### Join the Community

![Join the NVIDIA Developer Program](https://developer.download.nvidia.com/icons/m48-developer-1.svg)

### Join the NVIDIA Developer Program

![NVIDIA Inception Program](https://developer.download.nvidia.com/icons/m48-ai-startup.svg)

### Accelerate Your Startup

* * *

## Ethical AI

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using a model in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
## Get Started With cuDNN Today

[Download cuDNN Library](/rdp/cudnn-download)[Download cuDNN Frontend (GitHub)](https://github.com/NVIDIA/cudnn-frontend)

---

# Source: https://developer.nvidia.com/cudss.md

1. [Home](/)
2. NVIDIA cuDSS

# NVIDIA cuDSS

NVIDIA cuDSS (Preview) is an optimized, first-generation GPU-accelerated Direct Sparse Solver library for solving linear systems with very sparse matrices. Direct Sparse Solvers are an important part of numerical computing for real-time applications like autonomous driving and process simulation, where increasing complexity and high throughput demand a robust direct solver.

[Download](http://developer.nvidia.com/cudss-downloads)[Documentation](https://docs.nvidia.com/cuda/cudss/index.html)[Feedback](mailto:cuDSS-EXTERNAL-Group@nvidia.com)

## Key Features

### GPU-Accelerated Solver

Capitalizing on the CPU’s strength in sequential computing and the GPU’s strength in parallel computing, cuDSS leverages both to solve sparse matrices with only a few non-zero elements per row. The result is significantly higher performance than CPU-only solvers.

### Core Functionality Support

cuDSS solves sparse linear systems on single-GPU, multi-GPU, and multi-node platforms, including support for refactorization in cases with multiple systems, and different reorderings and types of matrices. cuDSS is also built to be stable, regardless of matrix size.

### Optimized for NVIDIA GPUs

cuDSS supports all NVIDIA GPUs, Pascal and newer, allowing you to integrate direct sparse solvers across a variety of NVIDIA-powered platforms. cuDSS also benefits from the [Grace Hopper Superchip](https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/) architecture.

* * *

## cuDSS Performance

cuDSS is able to achieve significant performance gains compared to CPU-based Direct Sparse Solvers.
[![](https://developer.download.nvidia.com/images/cuDSS_Redo_1260.png)](https://developer.download.nvidia.com/images/cuDSS_Redo_1260.png) * * * ## Resources - [cuDSS Documentation](https://docs.nvidia.com/cuda/cudss/index.html) - [cuDSS Samples](https://github.com/NVIDIA/CUDALibrarySamples) - [cuDSS Feedback](mailto:cuDSS-EXTERNAL-Group@nvidia.com) * * * Ready to get started with cuDSS? [Download](http://developer.nvidia.com/cudss-downloads) --- # Source: https://developer.nvidia.com/cuequivariance.md 1. [Topics](/topics) NVIDIA cuEquivariance # NVIDIA cuEquivariance cuEquivariance is a CUDA-X™ library specifically designed to tackle the demanding computational requirements of geometry-aware neural networks, which are essential for tasks involving 3D data. cuEquivariance provides optimized NVIDIA CUDA® kernels and comprehensive APIs, including those for triangle attention and triangle multiplication, to accelerate such processes across various scientific domains, including drug and material discovery. [Download Now](https://github.com/NVIDIA/cuEquivariance?tab=readme-ov-file)[Documentation](https://docs.nvidia.com/cuda/cuequivariance/) * * * ## Key Features ### Flexible API Alternative equivariance libraries are bound to a specific choice of SO(3) [irreps basis](https://docs.nvidia.com/cuda/cuequivariance/api/generated/cuequivariance.Irrep.html) and data layout. With cuEquivariance, you can specify your own irreps basis tensor product by creating a [segmented tensor product](https://docs.nvidia.com/cuda/cuequivariance/tutorials/stp.html), and generalize such operations beyond irreps to build equivariant neural networks. 
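"Equivariant" here has a precise meaning: applying a symmetry operation (such as a 3D rotation R) to the input and then the map gives the same result as applying the map first and rotating its output, f(Rx) = R·f(x). cuEquivariance's segmented tensor products are built to preserve this property; the toy NumPy check below only illustrates the definition, using a hypothetical map that scales a vector by a function of its rotation-invariant norm (it is not a cuEquivariance API call).

```python
import numpy as np

def f(x):
    # Toy SO(3)-equivariant map: scale a 3-vector by a function of its norm.
    # Norms are rotation-invariant, so f(R @ x) == R @ f(x).
    return np.tanh(np.linalg.norm(x)) * x

rng = np.random.default_rng(0)
x = rng.standard_normal(3)

# Build a random rotation matrix via QR decomposition.
q, r = np.linalg.qr(rng.standard_normal((3, 3)))
q = q * np.sign(np.diag(r))  # standardize column signs; q stays orthogonal
if np.linalg.det(q) < 0:
    q[:, 0] *= -1            # ensure det = +1 (a rotation, not a reflection)

# Rotating the input and then mapping equals mapping and then rotating.
assert np.allclose(f(q @ x), q @ f(x))
```

Equivariant layers in MACE-style networks maintain this commutation property at every layer, which is why the geometry of the input (bond angles, orientations) is handled consistently for free.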
### CUDA-Accelerated Performance

Achieve up to:

- 10x speedup for end-to-end MACE performance
- 200x speedup for symmetric contraction operation performance
- 100,000 atoms per GPU simulated with MACE
- 3.5x speedup for triangle operations performance

_For more information on the performance noted above, please view the Performance section below._

### Expansive MLIP Support and Acceleration

- Leading equivariant machine-learning interatomic potential (MLIP) models, including MACE, Allegro, NequIP, and DiffDock
- Protein models with triangle kernels, including Boltz, Neo-1, and OpenFold

* * *

## Get Started With NVIDIA cuEquivariance

### Quick Install With Conda

```shell
conda install conda-forge::cuequivariance
```

### Quick Install With pip

```shell
# Choose the frontend you want to use
pip install cuequivariance-jax
pip install cuequivariance-torch
pip install cuequivariance  # Installs only the core non-ML components

# CUDA kernels
pip install cuequivariance-ops-torch-cu11
pip install cuequivariance-ops-torch-cu12
pip install cuequivariance-ops-jax-cu12
```

* * *

## Performance

* * *

## More Resources

![NVIDIA Developer Newsletter](https://brand-assets.cne.ngc.nvidia.com/assets/marketing-icons/2.1.0/email-settings.svg)

### Sign Up for the Developer Newsletter

![Get Training and Certification](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png)

### Get Training and Certification

![NVIDIA Developer Program](https://brand-assets.cne.ngc.nvidia.com/assets/marketing-icons/2.1.0/developer-1.svg)

### Join the NVIDIA Developer Program

## Ethical AI

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

**Get started with cuEquivariance today.**

[Install Now ](https://github.com/NVIDIA/cuEquivariance/tree/main/docs/tutorials)

---

# Source: https://developer.nvidia.com/cufft.md

1. [Home](/)
2. NVIDIA cuFFT

![](https://developer.download.nvidia.com/images/cuFFT-flat.svg)

# NVIDIA cuFFT

NVIDIA cuFFT, a library that provides GPU-accelerated Fast Fourier Transform (FFT) implementations, is used for building applications across disciplines, such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging.

# Available in the [CUDA Toolkit](/cuda-toolkit)

### cuFFT

Divide-and-conquer algorithms for computing discrete Fourier transforms. Multi-GPU support for FFT calculations on up to 16 GPUs in a single node. [Learn More](#section-cufft)

# Available in the [HPC SDK](/hpc-sdk)

### cuFFT

Divide-and-conquer algorithms for computing discrete Fourier transforms. Multi-GPU support for FFT calculations on up to 16 GPUs in a single node. [Learn More](#section-cufft)

### cuFFTMp

Multi-node support for FFTs in exascale problems. [Learn More](#section-cufftmp)

# Available as Standalone

### cuFFTDx Device APIs

cuFFT Device Extensions for performing FFT calculations inside a CUDA kernel. [Learn More and Download](#section-cufftdx)

* * *

## cuFFT

The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued datasets. It’s one of the most important and widely used numerical algorithms in computational physics and general signal processing.
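The divide-and-conquer idea is easy to see in miniature: a radix-2 Cooley-Tukey step splits an n-point DFT into two n/2-point DFTs over the even- and odd-indexed samples, giving O(n log n) work instead of O(n²). This pure-Python sketch is for intuition only; cuFFT's production implementations are heavily optimized and support arbitrary sizes, precisions, and batching.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # n/2-point DFT of even-indexed samples
    odd = fft(x[1::2])   # n/2-point DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x):
    """Direct O(n^2) DFT from the definition, used as a cross-check."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

signal = [complex(v) for v in (1, 2, 3, 4, 5, 6, 7, 8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(signal), dft(signal)))
```

The recursion depth is log₂(n), and each level does O(n) twiddle-factor work, which is exactly the structure GPU FFT libraries parallelize across thousands of threads.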
The cuFFT library provides a simple interface for computing FFTs on an NVIDIA GPU, which allows users to quickly leverage the GPU’s floating-point power and parallelism in a highly optimized and tested FFT library. When calculations are distributed across GPUs, cuFFT supports using up to 16 GPUs connected to a CPU to perform Fourier Transforms through its cuFFTXt APIs. Performance is a function of the bandwidth between the GPUs, the computational ability of the individual GPUs, and the type and number of FFTs to be performed. [HPC SDK](https://developer.nvidia.com/nvidia-hpc-sdk-downloads)[CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) Key Features Single-GPU Performance Multi-GPU Performance - 1D, 2D, and 3D transforms of complex and real data types - Familiar APIs similar to the advanced interface of the Fastest Fourier Transform in the West (FFTW) - Flexible data layouts allowing arbitrary strides between individual elements and array dimensions - Streamed asynchronous execution - Half-, single-, and double-precision transforms - Batch execution - In-place and out-of-place transforms - Support for up to 16-GPU systems - Thread-safe and callable from multiple host threads The cuFFT library is highly optimized for performance on NVIDIA GPUs. The chart below displays the performance boost achieved by moving to newer hardware—with zero code changes. # 1D Single-Precision FFT ![A line chart displays performance boost achieved by moving to newer hardware with no code changes ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuFFT/1-d-single-precision-fft-850x480.svg) The chart below compares the performance of 16 NVIDIA Volta™ GV100 Tensor Core GPUs to the performance of eight NVIDIA Ampere architecture GA100 Tensor Core GPUs for 3D C2C FP32 FFTs. * * * ## cuFFTDx Device Extensions cuFFT Device Extensions (cuFFTDx) enable users to perform FFT calculations inside their CUDA kernel. 
Fusing numerical operations can decrease latency and improve the performance of your application.

[Download cuFFTDx](https://developer.nvidia.com/cufftdx-downloads)

Key Features Performance

- FFT embeddable into a CUDA kernel
- High performance, with no unnecessary data movement to and from global memory
- Customizable, with options to adjust selection of FFT routine for different needs (size, precision, batches, etc.)
- Ability to fuse FFT kernels with other operations, saving global memory trips
- Compatible with future versions of the CUDA Toolkit
- Support for Windows

The chart below shows how cuFFTDx can provide over a 2X performance boost compared with cuFFT host calls when executing convolution with 1D FFTs.

* * *

## cuFFTMp Multi-Node Support

The multi-node FFT functionality, available through the cuFFTMp API, enables scientists and engineers to solve distributed 2D and 3D FFTs in exascale problems. The library handles all the communications between machines, allowing users to focus on other aspects of their problems.

[Download cuFFTMp](https://developer.nvidia.com/cufftmp-downloads?target_os=Linux)[Download HPC-SDK](https://developer.nvidia.com/nvidia-hpc-sdk-downloads)

Key Features Performance

- 2D and 3D distributed-memory FFTs
- Slabs (1D) and pencils (2D) data decomposition, with arbitrary block sizes
- Message Passing Interface (MPI) compatible
- Low-latency implementation using NVSHMEM, optimized for single-node and multi-node FFTs

The chart below compares multi-node weak-scaling performance for distributed 3D FFT by precision, as the problem size and number of GPUs increase. The benchmark was run on the NVIDIA Selene supercomputer. Note that, for FP64 at size 16,384³, the data didn’t fit on the system.
![The chart compares multi-node weak scaling performance for distributed 3D FFT by precision](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuFFT/multi-node-support-850x480.svg) * * * ## cuFFTDx + cuFFT LTO EA Preview This early-access version of [cuFFT](https://developer.nvidia.com/cufft) and [cuFFTDx](https://developer.nvidia.com/cufftdx-downloads) previews an innovative way of expanding features of the device library, cuFFTDx, through the host library, cuFFT. It leverages device Link-Time Optimizations (LTO) features of the CUDA Toolkit to combine code segments and achieve optimal performance. [Download Now](https://developer.nvidia.com/cufftea) Key Features Performance - A new way of enhancing your cuFFTDx project via our cuFFT host library. - Over 1000 additional sizes supported with improved performance and without workspace requirement, via code sharing across our libraries enabled by LTO. - Supporting both offline builds (using NVCC) and runtime builds (using NVRTC / nvJitLink). - Additional link time optimization in cuFFTDx applications. The chart below shows the performance improvements of the additional functionality of cuFFTDx enabled by cuFFT using LTO. 
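The single-GPU cuFFT APIs described above are also reachable from Python: CuPy's `cupy.fft` module is backed by cuFFT on NVIDIA GPUs, which makes it a quick way to try the library before writing C/C++ against the native API. A minimal sketch of a 1D single-precision transform (assumes CuPy and a CUDA-capable GPU for the accelerated path; the NumPy reference runs anywhere):

```python
import numpy as np

# 1D single-precision (complex64) FFT, the workload in the chart earlier.
x = np.random.default_rng(0).standard_normal(2**16).astype(np.complex64)
cpu_result = np.fft.fft(x)  # CPU reference

try:
    import cupy as cp

    # cupy.fft dispatches to cuFFT under the hood.
    gpu_result = cp.asnumpy(cp.fft.fft(cp.asarray(x)))
    assert np.allclose(gpu_result, cpu_result, rtol=1e-3, atol=1e-1)
except ImportError:
    gpu_result = None  # No CuPy/GPU in this environment; CPU reference stands
```

For production code paths, batching many transforms per call and keeping data resident on the device between transforms (as the cuFFTXt and cuFFTDx sections above describe) is what recovers the performance shown in the charts.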
* * *

## Resources

**Documentation:**

- [cuFFT/cuFFTXt](https://docs.nvidia.com/cuda/cufft/index.html)
- [cuFFTMp](https://docs.nvidia.com/hpc-sdk/cufftmp/index.html)
- [cuFFTDx](https://docs.nvidia.com/cuda/cufftdx/index.html)

**Presentations:**

- [Just-In-Time Link-Time Optimization Adoption in cuSPARSE/cuFFT](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31155/)
- [New FFT Library With Flexible C++ API](https://www.nvidia.com/en-us/on-demand/session/gtcsiliconvalley2019-s9257/)

**Samples:**

- [Math Library GitHub](https://github.com/NVIDIA/CUDALibrarySamples)
- [CUDA Toolkit GitHub](https://github.com/NVIDIA/cuda-samples)

**Blogs:**

- [Accelerating GPU Applications with NVIDIA Math Libraries](https://developer.nvidia.com/blog/accelerating-gpu-applications-with-nvidia-math-libraries/)
- [Using NVIDIA cuFFTMp FFTs at Scale](https://developer.nvidia.com/blog/multinode-multi-gpu-using-nvidia-cufftmp-ffts-at-scale/)

**Downloads:**

- [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
- [cuFFTDx + cuFFT LTO EA](https://developer.nvidia.com/cufftea)
- [NVIDIA HPC SDK](https://developer.nvidia.com/nvidia-hpc-sdk-downloads)
- [cuFFTDx](https://developer.nvidia.com/mathdx)

* * *

![Decorative image representing Developer Forums](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuFFT/m48-communication-chat-256px-blk.png)

#### Visit the Forums

![Decorative image representing contact NVIDIA](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuFFT/m48-information-notification-256px-blk.png)

#### Contact Us

Quick Links

- [HPC SDK](https://developer.nvidia.com/hpc-sdk)
- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
- [Download cuFFTDx](https://developer.nvidia.com/cufftdx-downloads)

* * *

---

# Source: https://developer.nvidia.com/culitho.md

# NVIDIA cuLitho

## Accelerate computational lithography

NVIDIA cuLitho is a library with optimized tools and algorithms for GPU-accelerating computational lithography and the 
manufacturing process of semiconductors by orders of magnitude over current CPU-based methods.

Manufacturing computer chips requires a critical step called computational lithography, a complex computation involving electromagnetic physics, photochemistry, computational geometry, iterative optimization, and distributed computing. This step is already one of the largest compute workloads in semiconductor production, necessitating massive data centers, and ongoing silicon miniaturization exponentially amplifies its computation requirements over time.

[Watch GTC Session](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s52510/)

* * *

## The Challenge

As silicon feature sizes shrink, the impacts of optical diffraction must be offset by proactively manipulating mask patterns with optical proximity correction (OPC) or inverse lithography technology (ILT) to accurately image wafers. This requires advanced computational lithography techniques and increased compute efficiency. Every industry needs to accelerate every workload so that we can reclaim power and do more with less.

![Advanced computational lithography (cuLitho) techniques for computer chips](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/culitho-1ccw-t-2x.jpg "Advanced computational lithography (cuLitho) techniques for computer chips")

## Benefits

Acceleration libraries are at the core of accelerated computing. NVIDIA cuLitho is a library that targets the emerging challenges of nanoscale computational lithography. With GPUs, it accelerates inverse lithography by 40X, helping create new solutions to make future semiconductor technologies cheaper and more predictable. It introduces new innovations and leverages decades of investment in CUDA® infrastructure.
![Improved performance from faster computational lithography ILT](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-speed-256px-blk.png " Improved performance from faster computational lithography ILT") ### Performance Faster inverse lithography technology (ILT) powers a 40X performance speedup, generating accurate photomasks faster. ![Increased productivity with cuLitho](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-clock-time-256px-blk.png "Increased productivity with cuLitho") ### Productivity Photomasks that took two weeks can be processed overnight. 3X to 5X more masks can be generated per day. ![cuLitho helps with cost savings by using less data center power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-lower-cost-256px-blk_0.png " cuLitho helps with cost savings by using less data center power") ### Cost Savings 500 NVIDIA Hopper™ GPU systems running cuLitho can do the work of 40,000 CPU systems. That's 1/9 the power and 1/8 the space. ![New lithography innovations can support future silicon scaling](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/icons/m48-scalability-down-sample-256px-blk.png " New lithography innovations can support future silicon scaling") ### Future Silicon Scaling Faster OPC enables new lithography innovations to address miniaturization of next-generation semiconductors, including subatomic modeling, curvilinear OPC, and high-numerical aperture extreme-ultraviolet (high-NA EUV) lithography. ## Collaborators Semiconductor and electronic design automation (EDA) industry leaders are leveraging NVIDIA cuLitho to drive silicon scaling and enable their end users to reduce costs and accelerate technology advancements. - - - > ![NVIDIA collaborator - ASML](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/asml.svg "NVIDIA collaborator - ASML") > > “We are planning to integrate support for GPUs into all of our computational lithography software products. 
Our collaboration with NVIDIA on GPUs and cuLitho should result in tremendous benefit to computational lithography and therefore to semiconductor scaling. This will be especially true in the era of High-NA EUV lithography.” > > > > — Peter Wennink, CEO, ASML > ![NVIDIA collaborator - Synopsys](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/synopsys.svg "NVIDIA collaborator - Synopsys") > > “For more than two decades Synopsys Proteus mask synthesis software products have been the production-proven choice for accelerating computational lithography — the most demanding workload in semiconductor manufacturing. With the move to advanced nodes, computational lithography has dramatically increased in complexity and compute cost. Our collaboration with TSMC and NVIDIA is critical to enabling angstrom-level scaling as we pioneer advanced technologies to reduce turnaround time by orders of magnitude through the power of accelerated computing.” > > > > — Sassine Ghazi, President and CEO, Synopsys > ![NVIDIA collaborator - TSMC](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/tsmc.svg "NVIDIA collaborator - TSMC") > > “Our work with NVIDIA to integrate GPU-accelerated computing in the TSMC workflow has resulted in great leaps in performance, dramatic throughput improvement, shortened cycle time and reduced power requirements. We are moving NVIDIA cuLitho into production at TSMC, leveraging this computational lithography technology to drive a critical component of semiconductor scaling.” > > > > — Dr. C.C. Wei, CEO, TSMC ## Press Release ### Read 2023 Press Release NVIDIA, ASML, TSMC, and Synopsys collaborate to advance next-generation chip manufacturing by adopting NVIDIA's groundbreaking computational lithography, thereby pushing the boundaries of physics in the industry.
Read the Press Release ### Read 2024 Press Release NVIDIA introduces Gen AI to computational lithography, and integration partners go into production with cuLitho. Read the Press Release ## Learn More About NVIDIA cuLitho at GTC [See Announcement](https://nvidianews.nvidia.com/news/tsmc-synopsys-nvidia-culitho) [Watch GTC Session](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s52510/) --- # Source: https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cuml.md 1. [AI](/topics/ai) [Data Science](/topics/ai/data-science) 2. [CUDA-X Data Science Libraries](/topics/ai/data-science/cuda-x-data-science-libraries) cuML # NVIDIA cuML: GPU-Accelerated Machine Learning NVIDIA cuML is an open-source CUDA-X™ Data Science library that accelerates scikit-learn, UMAP, and HDBSCAN on GPUs—supercharging machine learning workflows with no code changes required. [Demo Notebook](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cuml_sklearn_colab_demo.ipynb)[Documentation](https://docs.rapids.ai/api/cuml/stable/zero-code-change/) * * * ## Key Features ### Maximizes Performance on NVIDIA GPUs cuML optimizes fundamental machine learning operations for execution on GPUs. This significantly speeds up model development and training times with quicker testing and parameter-tuning iterations. ### Zero-Code-Change Acceleration cuML includes an API (cuml.accel) that can run your existing scikit-learn, UMAP, or HDBSCAN code on GPUs with no code modifications. ### CPU Fallback cuML’s zero-code-change API (cuml.accel) ensures your scikit-learn, UMAP, and HDBSCAN code won’t fail to execute by automatically pushing code to the GPU or CPU based on coverage. Read more in [How It Works](#howitworks). ### Flexibility cuML includes two interfaces: a zero-code-change API for popular machine learning algorithms and a Python GPU-only machine learning library similar to scikit-learn with comprehensive coverage. 
[Learn more in the docs](https://docs.rapids.ai/api/cuml/stable/zero-code-change/#faqs). ### Scalability cuML efficiently utilizes single-GPU systems to process large datasets that overwhelm CPU-based implementations of core machine learning libraries. ### Distributed Computing cuML accelerates distributed machine learning applications at scale, with real-world examples of up to 6 TB datasets on multi-node, multi-GPU clusters via the popular Apache Spark MLlib API. * * * ## Turn cuML On to Accelerate scikit-learn by 50x https://www.youtube-nocookie.com/embed/cIJsVq8CPys?&loop=1&playlist=cIJsVq8CPys NVIDIA cuML runs popular machine learning algorithms like scikit-learn Random Forest, UMAP, and HDBSCAN on GPUs with zero code changes. * * * ## Test Drive cuML ### Intro Blog: cuML Accelerator NVIDIA cuML brings zero-code-change GPU acceleration with massive speedups to scikit-learn, UMAP, and HDBSCAN. [Read the Blog](https://developer.nvidia.com/blog/nvidia-cuml-brings-zero-code-change-acceleration-to-scikit-learn/) ### Colab Quickstart: Hands-On cuML Tutorial cuML comes preinstalled in Google Colab, making it incredibly easy to get started. Simply switch to a GPU runtime and use this notebook to try cuml.accel for scikit-learn, UMAP, and HDBSCAN. [Launch on Colab](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cuml_sklearn_colab_demo.ipynb) * * * ## Install cuML To get started, install cuML using the code snippets below. ### Quick Install With conda 1. If not installed, download and run the install script. This will install the latest Miniforge:

```shell
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```

2.
Then install with:

```shell
conda create -n rapids-25.12 -c rapidsai -c conda-forge \
    cuml=25.12 python=3.13 cuda-version=13.0
```

### Quick Install With pip

Install via the NVIDIA PyPI index:

```shell
pip install \
    cudf-cu13==25.12.* \
    cuml-cu13==25.12.*
```

See the complete install selector for Docker, WSL2, and individual libraries. [Install Selector](https://docs.rapids.ai/install "Download Workflows") * * * ## Enable cuML Acceleration of scikit-learn, UMAP, and HDBSCAN With Zero Code Changes Once cuML is installed, you can access and enable the cuml.accel module to accelerate scikit-learn, UMAP, and HDBSCAN workflows with no code changes. Note that not all cuML estimators are supported in cuml.accel (open beta) today. Read about the [known limitations](https://docs.rapids.ai/api/cuml/stable/zero-code-change-limitations/) to understand what is and what is not covered. To use cuml.accel, enable it using one of these methods before importing scikit-learn, UMAP, or HDBSCAN:

#### To accelerate IPython or Jupyter notebooks, use the magic command:

```python
%load_ext cuml.accel
import sklearn
...
```

#### To accelerate a Python script, use the Python module flag on the command line:

```shell
python -m cuml.accel script.py
```

#### If you can't use command-line flags, explicitly enable cuml.accel via import:

```python
import cuml.accel
cuml.accel.install()
import sklearn
...
```

#### 50x Faster scikit-learn Speedup of average training performance for traditional machine learning algorithms running on cuml.accel and scikit-learn code on GPU vs scikit-learn on CPU. Specs: NVIDIA cuML 25.02 on NVIDIA H100 80GB HBM3, scikit-learn v1.5.2 on Intel Xeon Platinum 8480CL #### 60x Faster UMAP, 175x Faster HDBSCAN Speedup of average training performance for traditional machine learning algorithms running on cuml.accel and UMAP/HDBSCAN code on GPU vs UMAP/HDBSCAN on CPU.
Specs: NVIDIA cuML 25.02 on NVIDIA H100 80GB HBM3, umap-learn v0.5.7, hdbscan v0.8.40 on Intel Xeon Platinum 8480CL [Run this benchmark.](https://github.com/rapidsai/cuml/tree/branch-25.06/python/cuml/cuml/benchmark) * * * ## Hands-On Tutorials: Accelerate scikit-learn, UMAP, and HDBSCAN Dive into these resources to accelerate your machine learning workflows with cuML, including hands-on examples of advanced ML techniques, specialized applications, and deployment optimizations. #### Starter Kit: Accelerate Portfolio Optimization This kit demonstrates an end-to-end machine learning and optimization workflow in portfolio use cases. It leverages CUDA-X libraries such as cuML for KDE fitting and sampling yields for significant acceleration on GPU compared to CPU, especially as dataset sizes and the scenarios to sample increase. - Blog: [Accelerating Real-Time Financial Decisions with Quantitative Portfolio Optimization](https://developer.nvidia.com/blog/accelerating-real-time-financial-decisions-with-quantitative-portfolio-optimization/) - Notebook: [Quantitative Portfolio Optimization Developer Example](https://github.com/NVIDIA-AI-Blueprints/quantitative-portfolio-optimization) #### Starter Kit: Accelerate Single-Cell Genomics This kit demonstrates techniques to measure and analyze single-cell data at scale, accelerating analysis cycles and saving significant time by leveraging GPUs for genomics workflows.
- [Blog: Driving Toward Billion-Cell Analysis and Biological Breakthroughs With RAPIDS-singlecell](https://developer.nvidia.com/blog/driving-toward-billion-cell-analysis-and-biological-breakthroughs-with-rapids-singlecell/) - [Blueprint: Single-Cell Analysis With RAPIDS-singlecell, Powered by CUDA-X Data Science](https://build.nvidia.com/nvidia/single-cell-analysis) #### Starter Kit: Accelerate Topic Modeling This kit demonstrates how to significantly improve performance for topic modeling by minimizing noise clusters and leveraging a rewards-guided, GPU-accelerated method with BERTopic and cuml.accel. - [Video: Minimizing Noise Cluster for Topic Modeling (13:47)](https://www.youtube.com/watch?v=8TBaLWvJBuE&t=12s) - [Notebook: Minimizing Noise Cluster for Topic Modeling](https://github.com/rapidsai-community/showcase/blob/main/blogs_notebooks/video_notebook_for_Minimizing_Noise_Cluster_for_Topic_Modeling.ipynb) - [Blog: Deep dive into UMAP, the technique behind accelerated topic modeling](https://developer.nvidia.com/blog/even-faster-and-more-scalable-umap-on-the-gpu-with-rapids-cuml/) #### Accelerate Time-Series Forecasting This blog demonstrates how cuML accelerates time-series forecasting, enabling you to work with larger datasets and forecast windows using skforecast for faster iteration. - [Blog: Accelerating Time-Series Forecasting With RAPIDS cuML](https://developer.nvidia.com/blog/accelerating-time-series-forecasting-with-rapids-cuml/) #### Starter Kit: Stacking Using cuML This kit shows how to achieve high-performance stacking by using the speed of GPUs to efficiently train and combine numerous diverse models, maximizing accuracy in complex tabular data challenges. 
- [Blog: Winning First Place in a Kaggle Competition With Stacking Using cuML](https://developer.nvidia.com/blog/grandmaster-pro-tip-winning-first-place-in-a-kaggle-competition-with-stacking-using-cuml/) - [Blog: Stacking Generalization With HPO: Maximize Accuracy in 15 Minutes With cuML](https://developer.nvidia.com/blog/stacking-generalization-with-hpo-maximize-accuracy-in-15-minutes-with-nvidia-cuml/) #### Supercharge Tree Model Inference With FIL This blog highlights how Forest Inference Library (FIL) delivers blazing-fast inference for tree models within cuML. Explore new capabilities, performance gains, and features to optimize your model deployment. - [Blog: Supercharge Tree-Based Model Inference with Forest Inference Library in NVIDIA cuML](https://developer.nvidia.com/blog/supercharge-tree-based-model-inference-with-forest-inference-library-in-nvidia-cuml/) * * * ## How cuML Accelerates scikit-learn, UMAP, and HDBSCAN cuML introduced zero-code-change acceleration in open beta with the cuml.accel module. When you load this module, importing scikit-learn, umap-learn, or hdbscan allows cuML to "intercept" estimators from these CPU modules. This makes every scikit-learn estimator a proxy to either a GPU or a CPU estimator at any given time. When you use an estimator, your code will use cuML’s GPU-accelerated implementation under the hood if it can. If it can’t, it will fall back to standard CPU scikit-learn. This works in reverse as well. If you’ve already trained a model on the GPU and a particular method isn’t supported, cuML will reconstruct the trained model on the CPU and use the scikit-learn version. Read more about the rapidly growing [list of algorithms and parameters](https://docs.rapids.ai/api/cuml/stable/zero-code-change-limitations/) that the zero-code-change interface covers.
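The proxy-with-fallback idea described above can be illustrated with a small pure-Python sketch. Everything here is a stand-in for illustration only — `GpuKMeans`, `CpuKMeans`, and `ProxyEstimator` are hypothetical names, not cuML's actual internals or the scikit-learn API:

```python
# Minimal sketch of a proxy that tries an accelerated backend first and
# silently falls back to a CPU backend when the configuration is unsupported.
# All class names are hypothetical illustrations, not cuML internals.

class CpuKMeans:
    def fit(self, X):
        self.backend = "cpu"
        return self

class GpuKMeans:
    SUPPORTED_PARAMS = {"n_clusters"}   # pretend only this parameter is accelerated

    def __init__(self, **params):
        if set(params) - self.SUPPORTED_PARAMS:
            raise NotImplementedError("unsupported parameter for the GPU path")
        self.params = params

    def fit(self, X):
        self.backend = "gpu"
        return self

class ProxyEstimator:
    """Dispatch fit() to the GPU implementation, falling back to CPU."""

    def __init__(self, **params):
        self.params = params

    def fit(self, X):
        try:
            impl = GpuKMeans(**self.params)   # try the accelerated path first
        except NotImplementedError:
            impl = CpuKMeans()                # unsupported config: CPU fallback
        return impl.fit(X)

fast = ProxyEstimator(n_clusters=3).fit([[0], [1]])
slow = ProxyEstimator(n_clusters=3, algorithm="elkan").fit([[0], [1]])
print(fast.backend, slow.backend)  # gpu cpu
```

The user-facing object is the same in both cases; only the backend that actually does the work differs, which is why no calling code has to change.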
cuML also provides an API that mirrors scikit-learn, supports a much wider set of algorithms, and is suitable for users looking to maximize performance for their bespoke applications. Read about it in [cuML's documentation](https://docs.rapids.ai/api/cuml/stable/). * * * ## Data Science Training From NVIDIA ![Data Science Learning Path From the NVIDIA DLI](https://developer.download.nvidia.com/icons/m48-certification-ribbon-1-256px-blk.png) ### Data Science Learning Path From the NVIDIA Deep Learning Institute ![Self-Paced Course: Accelerate Data Science Workflows With Zero Code Changes](https://developer.download.nvidia.com/icons/m48-deep-learning-institute-usd.svg) ### Self-Paced Course: Accelerate Data Science Workflows With Zero Code Changes ![Get Certified in Accelerated Data Science](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Certified in Accelerated Data Science * * * ## Join the Community ![Join NVIDIA CUDA-X Data Science Libraries Slack Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Join the CUDA-X Data Science Libraries Slack Community ![Sign Up for NVIDIA Data Science Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for the Data Science Newsletter * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting team to ensure their application meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ### Get Started with NVIDIA cuML today. 
[Documentation](https://docs.rapids.ai/api/cuml/stable/zero-code-change/) --- # Source: https://developer.nvidia.com/cupqc.md 1. [Home](/) NVIDIA cuPQC # NVIDIA cuPQC **NVIDIA cuPQC** is an SDK of optimized libraries for implementing GPU-accelerated Post-Quantum Cryptography (PQC) workflows—especially crucial in high-throughput data environments. [Download Now](/cupqc-download/) * * * ## Key Features ### Crypto Agility With NIST Algorithms cuPQC achieves breakthrough performance on the NIST finalist algorithms ML-KEM and ML-DSA, with the flexibility to easily add future algorithms. ### High Performance Accelerates NIST ML-KEM Keygen, Encapsulation and Decapsulation by 143x, 99x, and 84x, respectively over a state-of-the-art CPU. ### Comprehensive Cryptographic Primitives Offers a robust set of cryptographic primitives, enabling developers to implement advanced cryptographic schemes with flexibility and performance. ### Broad GPU Platform Support cuPQC offers a high-performance transition to PQC, optimized for diverse GPU platforms—from embedded platforms like NVIDIA Jetson™ to data center-scale platforms. ### Adopted by the Global Cryptography Ecosystem Used by leading cryptographic developers, cloud service providers, and leading security startups. Available through the open-source library liboqs. ### Side-Channel Secure GPU-tailored code is secured against state-of-the-art microarchitectural and timing attacks. * * * ## Primitives cuPQC offers a suite of sub-libraries (primitives) designed to accelerate cryptographic schemes. ### Cryptographic Hash The Cryptographic Hash library accelerates cryptographic hash functions using GPU technology, providing implementations of widely used algorithms such as SHA-2, SHA-3, SHAKE, and Poseidon 2. Additionally, the library offers the capability to efficiently calculate Merkle Trees. 
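To make concrete what the Merkle-tree computation involves, here is a small CPU reference using only Python's standard `hashlib`. This is illustrative, not the cuPQC API — cuPQC performs these hashes batched on the GPU, and the helper names below are hypothetical:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Hash each leaf, then pairwise-hash levels upward until one root remains."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3", b"tx4"])
print(root.hex())
```

Each interior node commits to its two children, so the single root digest commits to every leaf — the property that makes Merkle trees useful for proofs of membership and hash-based signatures.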
With the Cryptographic Hash library, you can achieve significant performance improvements, making it ideal for applications that require fast and secure cryptographic protocols. Use cases include data integrity checks, digital signatures, proofs of membership, and hash-based signatures. * * * ## Performance—Accelerating Leading PQC Algorithms ML-KEM is the key-encapsulation algorithm standardized by NIST in August 2024. cuPQC achieves throughputs of up to 13.3 million keygen/s, 9.3 million encapsulations/s, and 8 million decapsulations/s for batched ML-KEM-768 on a single H100 SXM5 GPU, increases of 143x, 99x, and 84x, respectively, over a state-of-the-art CPU. ![A chart showing up to 140X speedup on ML-KEM-768 primitives](https://developer.download.nvidia.com/images/cuPQC-algorithms-performance.svg) _Performed on an NVIDIA H100. Measured in terms of throughput and operations per second, using a batch size of 1,000,000._ * * * ## ![Post-Quantum Cryptography (PQC) Alliance logo](https://developer.download.nvidia.com/images/pqca.svg) NVIDIA is a founding member of the PQC Alliance, a Linux Foundation consortium that aims to advance the adoption of PQC by producing high-assurance software implementations of standardized algorithms.
* * * ## Latest Product News * * * ## Partners Adopting NVIDIA cuPQC “cuPQC’s safe and high-performance algorithms make transitioning to post-quantum cryptography achievable for enterprises with high-throughput security applications” - Hart Montgomery, Linux Foundation ![NVIDIA cuPQC Partner - Evolution](https://developer.download.nvidia.com/images/cupqc/evolutionq-logo.svg) ![NVIDIA cuPQC Partner - Open Quantum Safe](https://developer.download.nvidia.com/images/cupqc/open-quantum-safe-logo.svg) ![NVIDIA cuPQC Partner - PQShield](https://developer.download.nvidia.com/images/cupqc/pq-shield-logo.svg) ![NVIDIA cuPQC Partner - QuSecure](https://developer.download.nvidia.com/images/cupqc/qusecure-logo.svg) ![NVIDIA cuPQC Partner - Sandbox AQ](https://developer.download.nvidia.com/images/cupqc/sandbox-aq-logo.svg) * * * ## Resources - [Documentation: Library](https://docs.nvidia.com/cuda/cupqc/index.html) - [Make It So: Software Speeds Journey to Post-Quantum Cryptography](https://blogs.nvidia.com/blog/cupqc-quantum-cryptography/) - [Feedback](mailto:cuPQC-Libs-Feedback@nvidia.com) - [Quantum Computing Glossary Page](https://www.nvidia.com/en-us/glossary/quantum-computing/) --- # Source: https://developer.nvidia.com/cupti.md # NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit The **NVIDIA CUDA Profiling Tools Interface (CUPTI)** is a library that enables the creation of profiling and tracing tools that target CUDA applications. 
CUPTI provides a set of APIs targeted at ISVs creating profilers and other performance optimization tools: - the Activity API, - the Callback API, - the Host Profiling API, - the Range Profiling API, - the PC Sampling API, - the SASS Metric API, - the PM Sampling API, - the Checkpoint API, - the Profiling API, - the Python API (available [separately](https://pypi.org/project/cupti-python/)) Using these CUPTI APIs, independent software developers can create profiling tools that provide low and deterministic profiling overhead on the target system, while giving insight into the CPU and GPU behavior of CUDA applications. Normally packaged with the CUDA Toolkit, NVIDIA occasionally uses this page to provide CUPTI improvements and bug fixes between toolkit releases. **There is currently no CUPTI update to the CUDA Toolkit 13.1 Update 1. You may obtain the latest version of CUPTI by Downloading the CUDA Toolkit 13.1.1** [Download the CUDA Toolkit 13.1 Update 1 Now](https://developer.nvidia.com/cuda-13-1-1-download-archive)[Download the CUDA Toolkit 13.1 Now](https://developer.nvidia.com/cuda-13-1-0-download-archive) [Download the CUPTI Python API 13.0.0 Now](https://pypi.org/project/cupti-python/13.0.0/) ##### [Revision History](/cupti-history) #### Key Features - Trace CUDA API by registering callbacks for API calls of interest - Full support for entry and exit points in the CUDA C Runtime (CUDART) and CUDA Driver - GPU workload trace for the activities happening on the GPU, which includes kernel executions, memory operations (e.g., Host-to-Device memory copies) and memset operations. - CUDA Unified Memory trace for transfers from host to device, device to host, device to device and page faults on CPU and GPU etc. 
- Normalized timestamps for CPU and GPU trace - Profile hardware and software event counters, including: - Utilization metrics for various hardware units - Instruction count and throughput - Memory load/store events and throughput - Cache hits/misses - Branches and divergent branches - Many more - Enables automated bottleneck identification based on metrics such as instruction throughput, memory throughput, and more - Range profiling to enable metric collection over concurrent kernel launches within a range - Metrics attribution at the high-level source code and the executed assembly instructions. - Device-wide sampling of the program counter (PC). The PC Sampling gives the number of samples for each source and assembly line with various stall reasons. See the CUPTI User Guide for a complete listing of hardware and software event counters available for performance analysis tools. #### Updates in CUDA Toolkit 13.1 Update 1 ##### Resolved Issues - Fixed incorrect correlation IDs for `cudaGraphLaunch` API calls and their associated kernel launches in profiling sessions after the first session. - Fixed issue where CUPTI graph creation callbacks were not triggered for CUDA device graphs under green context. - Fixed linker error when linking the static CUPTI library from CUDA 13.0 and 13.1 GA releases. - Fixed issue where CUPTI limited activity buffer records to 2GB regardless of the configured activity buffer size. #### Updates in CUDA Toolkit 13.1 ##### New Features - Added support for Compute Engine context switch events. For more details, refer to the section [Compute Engine Context Switch](https://docs.nvidia.com/cupti/main/main.html#activity-ce-context-switch). - Added tracing for host execution nodes in CUDA Graphs i.e. nodes of type `CU_GRAPH_NODE_TYPE_HOST`. Enable with the activity kind `CUPTI_ACTIVITY_KIND_GRAPH_HOST_NODE`; records are reported as `CUpti_ActivityGraphHostNode`. - Added tracing for host launches done by CUDA through `cudaLaunchHostFunc()` API. 
This is important in understanding the device bubbles in the stream timeline. Enable with the activity kind `CUPTI_ACTIVITY_KIND_HOST_LAUNCH`; records are reported as `CUpti_ActivityHostLaunch`. - Users can query the collection scope for any metric. The `CUpti_MetricCollectionScope` enum lists the possible scopes: context or device. A parameter, `metricCollectionScope`, is added to `CUpti_Profiler_Host_GetMetricProperties_Params` to return the collection scope for a metric. - By default, the counter availability image stores only context-level metrics in the binary blob. A new parameter, `bAllowDeviceLevelCounters` , has been added to the counter availability API struct `CUpti_Profiler_GetCounterAvailability_Params` to include device-level metrics in the image. - Added a new parameter `sku` in `CUpti_Profiler_DeviceSupported_Params` for checking profiling support for a GPU SKU. - Added a new parameter `priority` in the kernel record to provide the launch priority of the kernel. The `CUpti_ActivityKernel10` activity record is deprecated and replaced by the new `CUpti_ActivityKernel11` activity record. ##### Resolved Issues - Fixed an issue that could cause invalid (zero) kernel timestamps when using the function `cuptiActivityRegisterTimestampCallback` to register a timestamp callback. This issue was introduced in the CUDA 12.6 Update 2 release. 
#### Requirements ##### Supported platforms - Linux x86\_64[1] - Windows x86\_64[1] - Linux aarch64 SBSA[1] - DRIVE OS QNX aarch64[2] - DRIVE OS Linux aarch64[2] [1] available in the [CUDA Desktop Toolkit](https://developer.nvidia.com/cuda-downloads) only [2] available in the [Embedded](https://developer.nvidia.com/embedded/develop/tools) or [Drive](https://developer.nvidia.com/drive/drive-sdk) toolkits only ##### Supported NVIDIA GPU architectures - Activity and Callback APIs - All architectures supported by CUDA Toolkit - Profiling and PC Sampling APIs - Blackwell: B100, GB10x, GB11x - Hopper: GH100 - Ada: AD10x - Ampere: A100 with Multi-Instance GPU, GA10x - Turing ##### CUDA Toolkit - CUPTI can be found in the [CUDA Toolkit 13.1 Update 1](https://developer.nvidia.com/cuda-downloads) production release ##### Drivers Please use the following drivers - 591.59 (Windows) available at the [NVIDIA Driver Download page](https://www.nvidia.com/Download/index.aspx). - 590.48.01 (Linux) provided with [CUDA Toolkit 13.1 Update 1](https://developer.nvidia.com/cuda-downloads) production release. #### Documentation - [Online Product Documentation](https://docs.nvidia.com/cupti/index.html) #### Support To provide feedback, request additional features, or report issues, please use the [Developer Forums](https://forums.developer.nvidia.com/c/developer-tools/cuda-profiler-tools-interface-cupti/109). #### Installation Overview When installing [CUDA Toolkit 13.1 Update 1](https://developer.nvidia.com/cuda-downloads) and specifying options, be sure to select CUDA \> Development \> Tools \> CUPTI. --- # Source: https://developer.nvidia.com/cupynumeric.md 1. [Home](https://developer.nvidia.com/) NVIDIA cuPyNumeric # NVIDIA cuPyNumeric **NumPy and SciPy on Multi-Node Multi-GPU Systems** cuPyNumeric is a multi-node, multi-GPU array computing library that implements the NumPy API on top of Legate, bringing accelerated computing to the Python scientific community. 
**Python** is a powerful and user-friendly programming language widely adopted by researchers and scientists for data science, machine learning (ML), and productive numerical computing. **NumPy** is the de facto standard for math and matrix libraries, providing a simple and easy-to-use programming model with interfaces that correspond closely to the mathematical needs of scientific applications. As data sizes and computational complexities grow, CPU-based Python and NumPy programs struggle to meet the speed and scale demanded by cutting-edge research. Distributed accelerated computing offers the infrastructure to efficiently solve and test hypotheses in data-driven problems. Whether analyzing data from high-energy electron beams, solving complex computational fluid dynamics problems, or building ML models, researchers are increasingly seeking ways to scale their programs. With cuPyNumeric, you can take your existing NumPy workflows and seamlessly scale them from a single CPU to a single GPU, and up to thousands of GPUs across a multi-node, multi-GPU cluster, without changing your code. This powerful scaling enables you to focus on your research and discovery, not on complex code modifications for different hardware environments. Download the latest beta release of cuPyNumeric today. The cuPyNumeric project is independent of the CuPy project. CuPy is a trademark of Preferred Networks, Inc., and the name ‘cuPyNumeric’ is used with their permission. [Download Now ](https://github.com/nv-legate/cunumeric "Download Now on GitHub") * * * ## Legate Legate is an abstraction layer that runs on top of the CUDA® runtime system; together they provide scalable implementations of popular domain-specific APIs. NVIDIA cuPyNumeric layers on top of Legate, like many other libraries. Legate democratizes computing by making it possible for all programmers to run the same code that runs on a desktop or a laptop at the scale of large clusters of CPUs and GPUs.
Using this technology, scientists and researchers can develop and test programs on moderately sized datasets on local machines and then immediately scale up to larger datasets deployed on many nodes in the cloud or on a supercomputer without any code modifications. [Getting Started With Legate](https://developer.nvidia.com/legate) * * * ## Key Benefits The NVIDIA cuPyNumeric library on Legate: - Supports native Python language and NumPy interface without constraints - Transparently accelerates and scales existing NumPy workflows - Provides a seamless drop-in replacement for NumPy - Provides automatic parallelism and acceleration for multiple nodes across CPUs and GPUs - Scales from one CPU up to thousands of GPUs optimally - Requires little to no code changes, allowing faster completion of scientific tasks - Is freely available. Get started with the [installation guide](https://docs.nvidia.com/cupynumeric/latest/installation.html) and [tutorial](https://github.com/NVIDIA/accelerated-computing-hub/blob/main/Accelerated_Python_User_Guide/notebooks_v1/Chapter_11_Distributed_Computing_cuPyNumeric.ipynb). * * * ## cuPyNumeric Performance ![Weak Scaling of Richardson-Lucy Deconvolution on NVIDIA DGX SuperPOD](https://developer.download.nvidia.com/images/nvidia-cunumeric-vs-cupy-2c50-d.svg) _Weak Scaling of Richardson-Lucy Deconvolution on DGX SuperPOD_ #### Processing 10TB Microscopy Image Data as a Single NumPy Array This multi-view lattice light-sheet microscopy example produces tens of terabytes (TB) of raw image data per day. Up until now, all processing has happened offline, after all the data has been collected. By moving all the preprocessing and reconstruction operations to GPUs and using cuPyNumeric on Legate, the data can be visualized in real time as it’s processed. Get started with cuPyNumeric today.
[Download Now ](https://github.com/nv-legate/cunumeric#installation "Download Now on GitHub") --- # Source: https://developer.nvidia.com/cuquantum-sdk.md # cuQuantum NVIDIA cuQuantum is an SDK of optimized libraries and tools that accelerate quantum computing emulations at both the circuit and device level by orders of magnitude. ![NVIDIA cuQuantum icon](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-cuquantum-icon.svg) [Download Now](https://developer.nvidia.com/cuquantum-downloads "Github Repo") [Install With Conda](https://anaconda.org/conda-forge/cuquantum-python "Download Workflows") * * * ## Quick Links NGC ### CUDA-Q A quantum framework with simulators powered by cuQuantum in a container. Documentation ### Documentation Documentation for cuQuantum and the cuQuantum Appliance OSS (Github) ### GitHub The cuQuantum public repository, including cuQuantum Python bindings and examples Documentation ### Latest Notes The cuQuantum release notes, including the latest and greatest features * * * ## Features and Benefits ![Flexibility](https://developer.download.nvidia.com/icons/m48-fleet-command.svg) ### Flexible Choose the best approach for your work from algorithm-agnostic accelerated quantum circuit simulation methods. 1. State vector method features include optimized memory management and gate application kernels. 2. Tensor network method features include accelerated tensor network contraction, order optimization, and approximate contractions. 3. Density matrix method features include arbitrary operator action on the state. ![Scalability](https://developer.download.nvidia.com/icons/m48-scalability-up-sample.svg) ### Scalable Leverage the power of multi-node, multi-GPU clusters using the latest GPUs on premises or in the cloud. 1. Low-level C++ APIs provide increased control and flexibility for a single GPU and single-node multi-GPU clusters. 2. The high-level Python API supports drop-in multi-node execution.
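As a plain-Python reference for what a state vector simulator's gate application kernel computes, the sketch below applies a single-qubit gate to an n-qubit state vector. This is illustrative only — cuQuantum implements this update with optimized GPU kernels, and the helper name here is hypothetical, not a cuQuantum API:

```python
import math

def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to the `target` qubit of an n-qubit state vector.

    Amplitudes are indexed so that bit `target` of the index selects the
    target qubit's basis state (little-endian convention).
    """
    new_state = list(state)
    stride = 1 << target
    for i in range(1 << n_qubits):
        if not (i & stride):                # pair index i (bit=0) with i|stride (bit=1)
            a, b = state[i], state[i | stride]
            new_state[i] = gate[0][0] * a + gate[0][1] * b
            new_state[i | stride] = gate[1][0] * a + gate[1][1] * b
    return new_state

h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]                       # Hadamard gate

state = [1, 0, 0, 0]                        # two-qubit state |00>
state = apply_single_qubit_gate(state, H, target=0, n_qubits=2)
print(state)                                # amplitudes ~ [0.707, 0.707, 0, 0]
```

The same strided pairing underlies GPU implementations, where each amplitude pair can be updated by an independent thread — which is why the state vector method parallelizes so well.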
![Fast speed](https://developer.download.nvidia.com/icons/m48-speed.svg) ### Fast Simulate bigger problems faster, and get more work done sooner. 1. Using an NVIDIA H200 Tensor Core GPU over CPU implementations delivers orders-of-magnitude speedups on key quantum problems, including random quantum circuits, Shor’s algorithm, and the Variational Quantum Eigensolver. 2. Leveraging the NVIDIA Eos supercomputer, cuQuantum generated a sample from a full-circuit simulation of the Google Sycamore processor in less than five minutes. * * * ## cuQuantum Framework Integrations cuQuantum is integrated with leading quantum simulation frameworks. Download cuQuantum to dramatically accelerate performance using your framework of choice, with zero code changes. ![NVIDIA cuQuantum is integrated with Amazon Web Services (AWS)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/aws-logo.svg) ![NVIDIA cuQuantum is integrated with Cirq](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/ic-cirq-logo.svg) ![NVIDIA cuQuantum is integrated with Guppy](https://developer.download.nvidia.com/images/cuquantum/guppy-logo.svg) ![NVIDIA cuQuantum is integrated with ExaTN](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/exatn-logo.svg) ![NVIDIA cuQuantum is integrated with PennyLane](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/pennylane-logo.svg) ![NVIDIA cuQuantum is integrated with Qibo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/ic-qibo-logo.svg) ![NVIDIA cuQuantum is integrated with Qiskit](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/ic-qiskit-logo.svg) ![NVIDIA cuQuantum is integrated with QuEST](https://developer.download.nvidia.com/images/cuquantum/quest-logo.svg) ![NVIDIA cuQuantum is integrated with QURI Parts](https://developer.download.nvidia.com/images/cuquantum/quriparts-logo.svg) ![NVIDIA cuQuantum is integrated with 
QuTIP](https://developer.download.nvidia.com/images/cuquantum/qutip-logo.svg) ![NVIDIA cuQuantum is integrated with TKET](https://developer.download.nvidia.com/images/cuquantum/quantinuum-tket-logo.svg) ![NVIDIA cuQuantum is integrated with Torch Quantum](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/torch-quantum-logo.svg) ![NVIDIA cuQuantum is integrated with XACC](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuquantum/ic-xacc-logo.svg) * * * ## Components Tools to accelerate quantum emulations on NVIDIA hardware. cuDensityMat cuPauliProp cuStabilizer cuStateVec cuTensorNet ### Largest Scale Dynamics Designing quantum computers and devices has always been challenging. Simulations for these problems can be slow and limited in their ability to scale. cuQuantum now includes time dynamics functionality, which enables users to accelerate analog Hamiltonian dynamics to unprecedented scales. Users can now better understand how to optimize device design where quantum phenomena occur faster than before. By distributing the state and operators across multi-GPU, multi-node systems, cuQuantum allows phase space exploration larger than ever before, only limited by the number of GPUs a user has access to. [Learn More About cuQuantum Dynamics](https://developer.nvidia.com/blog/accelerating-googles-qpu-development-with-new-quantum-dynamics-capabilities/) ![](https://developer.download.nvidia.com/images/cuquantum/36q-multi-node-quantum-dynamics-simulation.png) Google was able to scale simulations of analog dynamics on its processors to 40 qubits with 1024 GPUs using NVIDIA’s Eos supercomputer. This enables QPU builders like Google to understand long-range effects on their devices, perform validation, and design more effectively than ever before, ushering in a new age for QPU design. 
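As a small-scale illustration of the analog Hamiltonian dynamics workload described above (not the cuQuantum API), the sketch below evolves a driven two-level system with plain NumPy; cuQuantum's contribution is distributing the same kind of computation over Hilbert spaces with millions of levels across many GPUs. The Rabi frequency and timing are illustrative values:

```python
import numpy as np

# Minimal sketch of analog dynamics: evolve a state under a (here:
# two-level, time-independent) drive Hamiltonian. Real device models
# have vastly larger Hilbert spaces; cuQuantum's value is distributing
# that state and operator action across many GPUs.
omega = 2.0 * np.pi                      # Rabi frequency (illustrative)
H = 0.5 * omega * np.array([[0, 1], [1, 0]], dtype=complex)  # drive term

def evolve(psi, H, t):
    """Apply U(t) = exp(-iHt) via eigendecomposition of Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    return U @ psi

psi0 = np.array([1.0, 0.0], dtype=complex)   # start in |0>
psi = evolve(psi0, H, t=0.5)                 # half a Rabi period

print(abs(psi[1]) ** 2)   # population has fully transferred to |1>
```

Time-dependent Hamiltonians replace the single matrix exponential with a stepped integrator, but the dominant cost, applying operators to an exponentially large state, is the same one cuQuantum accelerates.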
### Fastest GPU Implementations The core operator action API gives developers of custom solvers the flexibility to apply arbitrary time-dependent operators to the quantum state more efficiently than was previously possible. Our advanced algorithms allow us to scale further with the same hardware memory. This enables users to design better quantum systems more quickly. With multi-GPU memory, developers can drastically accelerate their QPU design cycle by simulating 473 different quantum systems in the time it formerly took to do just one. Strong scaling shows that these APIs can speed up a range of Hamiltonians and operator terms, further accelerating the hardware development cycle. [Learn More About cuQuantum cuDensityMat](https://docs.nvidia.com/cuda/cuquantum/latest/cudensitymat/index.html) ![A graph showing a transmon coupler resonator noisy device quantum dynamics](https://developer.download.nvidia.com/images/cuquantum/transmon-coupler-resonator-graph.svg) cuDensityMat speeds up and scales simulations beyond what was previously possible with the next-best alternatives. Simulating a qudit coupler resonator is multiple orders of magnitude faster, and simulations can scale to an arbitrary number of GPUs. This shows scaling to 1024 GPUs with GB200 NVL72 for a Hilbert space with 1.44 million levels. Users can now study much more complex systems and larger unit cells of quantum devices. ### Multi-GPU Speedups State vector simulation tracks the entire state of the system over time, through each gate operation. It’s an excellent tool for simulating deep or highly entangled quantum circuits, and for simulating noisy qubits. Recent software updates to our offering have enabled a 5.53x speedup over previously reported numbers. When combined with ~2.5x speedups offered by NVIDIA Blackwell GPUs, users see even greater speedups over CPU implementations, despite CPU hardware and software improvements.
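To make "tracks the entire state through each gate operation" concrete, here is a plain-NumPy sketch of the state-vector method (an illustration of the workload cuStateVec accelerates, not its API). The state holds all 2^n amplitudes, and each gate is a small tensor contraction against it:

```python
import numpy as np

# State-vector simulation: keep all 2**n amplitudes and update them
# gate by gate. This is the workload cuStateVec accelerates on GPUs.
n = 3                                   # qubits -> 2**n amplitudes
state = np.zeros(2 ** n, dtype=complex)
state[0] = 1.0                          # |000>

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard

def apply_1q(state, gate, target, n):
    """Apply a single-qubit gate by reshaping the state into a tensor."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, target, 0)            # bring target axis front
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    psi = np.moveaxis(psi, 0, target)            # restore axis order
    return psi.reshape(-1)

for q in range(n):                      # Hadamard on every qubit
    state = apply_1q(state, H, q, n)

print(np.round(np.abs(state) ** 2, 3))  # uniform superposition: all 1/8
```

Memory doubles with every added qubit, which is why distributing the state vector across multi-GPU, multi-node systems is what makes 40-qubit runs feasible.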
![A graph showing 34Q QFT state vector simulation with noise using NVIDIA cuStateVec](https://developer.download.nvidia.com/images/cuquantum/34q-qft-state-vector-simulation-custatevec-graph.svg) cuStateVec speeds up noisy simulations of critical quantum algorithms, such as the quantum Fourier transform, by 25x on NVIDIA GB200 NVL72 over a single GPU, achieving nearly linear scaling. This enables extremely fast noisy quantum simulations on the NVIDIA stack. ### Multi-Node Speedups This multi-node capability enables users of the NVIDIA Quantum platform to achieve the most performant quantum circuit simulations at supercomputer scales. On key problems like quantum phase estimation, quantum approximate optimization algorithm (QAOA), quantum volume, and more, the newest cuQuantum Appliance is over two orders of magnitude faster than previous implementations and seamlessly scales from a single GPU to a supercomputer. [Learn More About cuStateVec](https://docs.nvidia.com/cuda/cuquantum/latest/custatevec/index.html) ![A graph showing weak scaling state vector simulation on GB200 NVL72 using NVIDIA cuStateVec](https://developer.download.nvidia.com/images/cuquantum/weak-scaling-state-vector-simulations-graph.svg) Performance is benchmarked leveraging quantum volume with a depth of 30, along with QAOA and a small quantum phase estimation, run on NVIDIA GB200 NVL72 up to 40 qubits in single precision. On average, cuQuantum with B200 GPUs is 3x faster than H100s at 40 qubits. ### Pathfinding Performance Tensor network methods are rapidly gaining popularity for simulating hundreds or thousands of qubits for near-term quantum algorithms. Tensor networks scale with the number of quantum gates rather than the number of qubits. This makes it possible to simulate very large qubit counts with smaller gate counts on large supercomputers. Tensor contractions dramatically reduce the memory requirement for running a circuit on a tensor network simulator.
The research community is investing heavily in improving pathfinding methods for quickly finding near-optimal tensor contractions before running a simulation. cuTensorNet provides state-of-the-art performance for both the pathfinding and contraction stages of tensor network simulation. ![A graph showing time to find an optimized contraction path using single core](https://developer.download.nvidia.com/images/cuquantum/time-to-find-and-optimized-contraction-path-using-single-core-100.jpg) Performance for cuTensorNet pathfinding compared to Cotengra in terms of seconds per sample. Both runs are leveraging a single-core Xeon Platinum 8480+. Sycamore refers to 53-qubit random quantum circuits of depth 10, and 20 from Arute et al., Quantum supremacy using a programmable superconducting processor. [www.nature.com/articles/s41586-019-1666-5](https://www.nature.com/articles/s41586-019-1666-5) Cotengra: Gray & Kourtis, Hyper-optimized Tensor Network Contraction, 2021. [quantum-journal.org/papers/q-2021-03-15-410](https://quantum-journal.org/papers/q-2021-03-15-410/) ### Less Contraction Time, More Efficiency When comparing contraction performance for cuTensorNet against Torch, cuPy, and NumPy, all runs leverage the same best contraction path. cuTensorNet, cuPy, and Torch all ran on one NVIDIA B200 GPU, and NumPy was run on a single-socket Emerald Rapids CPU. Using cuQuantum, NVIDIA researchers simulated a quantum support vector machine for a wide range of classification problems using 65,000 qubits on the NVIDIA GB200 NVL72—orders of magnitude larger than the largest problem run on quantum hardware to date. 
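The idea behind pathfinding can be seen at small scale with NumPy's `einsum_path`: the order in which a chain of tensors is contracted changes the arithmetic cost, sometimes by orders of magnitude, and an optimizer searches for a cheap order before any contraction runs. This is a toy analogue of what cuTensorNet's pathfinder does at state-of-the-art quality and scale:

```python
import numpy as np

# "Pathfinding" = choosing the contraction order for a tensor network.
# np.einsum_path searches for a cheap order before the contraction runs;
# cuTensorNet does the same for circuit-sized networks, much faster.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16))
b = rng.standard_normal((16, 16))
c = rng.standard_normal((16, 4))

# (a @ b) @ c and a @ (b @ c) give the same result at different cost;
# the optimizer picks the cheaper order for these shapes.
path, info = np.einsum_path("ij,jk,kl->il", a, b, c, optimize="optimal")
result = np.einsum("ij,jk,kl->il", a, b, c, optimize=path)

print(result.shape)   # (8, 4)
```

For circuit tensor networks the search space of orders grows explosively with gate count, which is why fast, high-quality pathfinding is a performance feature in its own right.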
[Learn More About cuTensorNet](https://docs.nvidia.com/cuda/cuquantum/latest/cutensornet/index.html) ![A graph showing state-of-the-art contraction time performance for NVIDIA cuTensorNet against NumPy](https://developer.download.nvidia.com/images/cuquantum/state-of-the-art-contraction-time-performance-numpy-graph.svg) ![A chart showing MPS gate split performance on GPU](https://developer.download.nvidia.com/images/cuquantum/state-of-the-art-contraction-time-performance-torch-graph.svg) Sycamore Circuit: 53 qubits depth 10 Quantum Fourier Transform: 34 qubits Inverse Quantum Fourier Transform: 36 qubits Quantum Volume: 26 and 30 qubits with depth 30 QAOA: 36 qubits with one and four parameters ### Approximate Tensor Network Methods As quantum problems of interest can greatly vary in both size and complexity, researchers have developed highly customized approximate tensor network algorithms to address the gamut of possibilities. To enable easy integration with these frameworks and libraries, cuTensorNet provides a set of APIs to cover the following common use cases: tensor QR, tensor SVD, and gate split. These primitives enable users to accelerate and scale different types of quantum circuit simulators. A common approach to simulating quantum computers, which takes advantage of these methods, is Matrix Product States (MPS, also known as tensor train). Users can leverage these new cuTensorNet APIs to accelerate MPS-based quantum circuit simulators. The gate split and tensor SVD APIs enable nearly an order of magnitude speedup over state-of-the-art CPU implementations. Tensor QR is the most efficient, with nearly two orders-of-magnitude speedup over the same Xeon 8480+ CPU.
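The tensor-SVD primitive at the heart of MPS simulation is simple to sketch: a two-site tensor is factored and only the largest `chi` singular values (the bond dimension) are kept, bounding memory at the cost of a controlled truncation error. The NumPy version below illustrates the operation cuTensorNet's tensor SVD and gate-split APIs accelerate; shapes and the bond dimension are illustrative:

```python
import numpy as np

# Approximate tensor-network methods (e.g. MPS) keep memory bounded by
# truncating an SVD to a fixed bond dimension chi. This is the primitive
# cuTensorNet's tensor SVD / gate-split APIs accelerate on GPU.
rng = np.random.default_rng(0)
theta = rng.standard_normal((32, 32))   # two-site tensor, matricized

chi = 8                                  # bond dimension to keep
u, s, vh = np.linalg.svd(theta, full_matrices=False)
u, s, vh = u[:, :chi], s[:chi], vh[:chi, :]   # keep chi singular values

approx = u @ np.diag(s) @ vh             # rank-chi approximation
err = np.linalg.norm(theta - approx) / np.linalg.norm(theta)
print(f"relative truncation error at chi={chi}: {err:.3f}")
```

In an MPS simulator this factor-and-truncate step runs after every two-qubit gate, so its throughput largely sets the simulator's throughput.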
[Learn More About cuTensorNet](https://docs.nvidia.com/cuda/cuquantum/latest/cutensornet/index.html) ![A graph showing Matrix Product States (MPS) simulation tensor singular value decomposition (tensor SVD) performance on GPU](https://developer.download.nvidia.com/images/cuquantum/mps-simulation-svd-performance-gpu-graph.svg) MPS Singular Value Decomposition performance is measured in execution time as a function of bond dimension. We execute this on an NVIDIA B200 140 GB GPU and compare it to NumPy running on an EPYC 9124 data center CPU. ### Fast Expectation Values Simulating large-scale quantum systems and calculating the observables has been expensive with other approaches, such as Matrix Product State methods. Pauli propagation techniques express observables in terms of an expansion in products of Pauli operators and allow tracking of only the most important ones as the circuit is applied in reverse to the observable, leading to very efficient expectation value calculations. While this effort has largely been restricted to CPU hardware, cuQuantum offers primitives to drastically accelerate and scale this workload on GPUs today. This will enable cutting-edge researchers and quantum computer builders to validate and verify their results in regimes not previously possible. [Learn More About cuPauliProp](https://docs.nvidia.com/cuda/cuquantum/latest/cupauliprop/index.html) ![A graph showing GPU speedup over Qiskit PauliProp for 127 qubit utility circuit](https://developer.download.nvidia.com/images/cuquantum/gpu-speedup-qiskit-pauliprop-graph.svg) cuQuantum GPU simulations show multiple orders of magnitude speedups for a range of truncation schemes on NVIDIA GB200 compared to Qiskit PauliProp on an Emerald Rapids Data Center CPU. ### High Sampling Rates Understanding the effect of noise on Clifford circuits is a key for Quantum Error Correction (QEC) and hardware development. 
This involves simulating how rare noise events affect quantum measurements, which makes such simulation a critical workload for developing error correction codes and decoders to fix these errors. Users who leverage GPU acceleration can increase the rates at which they perform offline and real-time decoding, with decoders and simulators in the same loop. Similarly, generating synthetic data to train AI-based decoders is often bottlenecked by CPU simulation. By bringing the emulation of Clifford circuits onto GPUs, this tool can add value to researchers on the cutting edge. [Learn More About cuStabilizer](https://docs.nvidia.com/cuda/cuquantum/latest/custabilizer/index.html) ![A graph showing throughput and speedup for surface code distance of 31 using NVIDIA cuQuantum](https://developer.download.nvidia.com/images/cuquantum/throughput-speedup-cuquantum-graph.svg) cuQuantum drastically improves the sampling rate for noisy effects on measurements. At a surface code distance of 31, runtime per sample scales much better on NVIDIA GB200 than off-the-shelf Stim, even with a custom multithreaded wrapper on an Emerald Rapids data center CPU. * * * ## Resources Whether you’re a researcher, developer, or enthusiast, our resources are tailored to help you get the most out of your quantum simulations and algorithms. Explore these resources to unlock the full potential of cuQuantum and accelerate your quantum research and development.
### Watch GTC Sessions - [Watch the Latest NVIDIA SC24 Special Address](https://www.nvidia.com/en-us/events/supercomputing/) - [Watch the Latest NVIDIA GTC keynote](https://www.nvidia.com/gtc/keynote/) - [NVIDIA GTC23 Quantum Computing Sessions](https://www.nvidia.com/gtc/session-catalog/?tab.catalogallsessionstab=16566177511100015Kus&search=&search.topic=16246413645860352Wfy&ncid=so-twit-862889#/) - [Introducing cuQuantum: Accelerating State Vector and Tensor Network-Based Quantum Circuit Simulation](https://events.rainfocus.com/widget/nvidia/nvidiagtc/sessioncatalog?search=A31093) - [A Deep Dive on the Latest HPC Software](https://gtc21.event.nvidia.com/media/A%20Deep%20Dive%20into%20the%20Latest%20HPC%20Software%20%5BS31286%5D/1_d2i9gz18) - [Benchmarking GPU Clusters With Universal Quantum Computing Simulations](https://gtc21.event.nvidia.com/media/Benchmarking%20GPU%20Clusters%20with%20the%20J%C3%BClich%20Universal%20Quantum%20Computer%20Simulator%20%5BE31941%5D/1_qfpd6x8r) ### Read NVIDIA Blog Posts - [Accelerating Google’s QPU Development With New Quantum Dynamics Capabilities](https://developer.nvidia.com/blog/accelerating-googles-qpu-development-with-new-quantum-dynamics-capabilities/) - [NVIDIA, Rolls-Royce, and Classiq Announce Quantum Computing Breakthrough for Computational Fluid Dynamics in Jet Engines](https://nvidianews.nvidia.com/news/nvidia-rolls-royce-and-classiq-announce-quantum-computing-breakthrough-for-computational-fluid-dynamics-in-jet-engines) - [Enabling Matrix Product State–Based Quantum Circuit Simulation With NVIDIA cuQuantum](https://developer.nvidia.com/blog/enabling-matrix-product-state-based-quantum-circuit-simulation-with-nvidia-cuquantum/) - [Best-in-Class Quantum Circuit Simulation at Scale With NVIDIA cuQuantum Appliance](https://developer.nvidia.com/blog/best-in-class-quantum-circuit-simulation-at-scale-with-nvidia-cuquantum-appliance/) - [Achieving Supercomputing-Scale Quantum Circuit Simulation With the cuQuantum Appliance](https://developer.nvidia.com/blog/achieving-supercomputing-scale-quantum-circuit-simulation-with-the-dgx-cuquantum-appliance/) - [Growing Range of Researchers, Scientists Adopt NVIDIA cuQuantum and CUDA Quantum](https://blogs.nvidia.com/blog/2022/09/20/cuquantum-qoda-adoption-accelerates/) - [NVIDIA Teams With Google Quantum AI, IBM, and Other Leaders to Speed Research in Quantum Computing](https://blogs.nvidia.com/blog/2021/11/09/cuquantum-public-beta/) - [NVIDIA Sets World Record for Quantum Computing Simulation With cuQuantum Running on DGX SuperPOD](https://blogs.nvidia.com/blog/2021/11/09/cuquantum-world-record/) - [What Is Quantum Computing?](https://blogs.nvidia.com/blog/2021/04/12/what-is-quantum-computing/) - [Accelerating Quantum Circuit Simulation With NVIDIA cuStateVec](https://developer.nvidia.com/blog/accelerating-quantum-circuit-simulation-with-nvidia-custatevec) - [Scaling Quantum Circuit Simulation With NVIDIA cuTensorNet](https://developer.nvidia.com/blog/scaling-quantum-circuit-simulation-with-cutensornet/) - [What Is a QPU?](https://blogs.nvidia.com/blog/2022/07/29/what-is-a-qpu/) * * * ## More Resources ![A decorative image representing Developer 
Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Community ![img-alt-text](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign up for Developer Newsletter ![](https://developer.download.nvidia.com/images/isaac/lab/m48-ai-startup-256px-blk.png) ### Inception for Startups * * * ## Get started with cuQuantum today. [Download Now](https://developer.nvidia.com/cuquantum-downloads) --- # Source: https://developer.nvidia.com/curand.md # cuRAND ## Random Number Generation on NVIDIA GPUs [DOWNLOAD](/hpc-sdk)[DOCUMENTATION](http://docs.nvidia.com/cuda/curand/index.html)[SAMPLES](http://github.com/nvidia/cudalibrarysamples)[SUPPORT](https://forums.developer.nvidia.com/c/accelerated-computing/gpu-accelerated-libraries/12)[FEEDBACK](mailto:Math-Libs-Feedback@nvidia.com?subject=cuRand%20Feedback) The NVIDIA CUDA Random Number Generation library (cuRAND) delivers high-performance, GPU-accelerated random number generation (RNG). The cuRAND library delivers high-quality random numbers up to 8x faster than CPU-only alternatives by using the hundreds of processor cores available in NVIDIA GPUs. The cuRAND library is included in both the [NVIDIA HPC SDK](/hpc-sdk) and the [CUDA Toolkit](/cuda-downloads). [Explore what’s new in the latest release...](/cuda-toolkit/whatsnew) Review the latest [CUDA performance report](http://developer.download.nvidia.com/compute/cuda/compute-docs/cuda-performance-report.pdf) to learn how much you could accelerate your code.
### cuRAND Key Features - **Flexible usage model** - Host API for generating random numbers in bulk on the GPU - Inline implementation allows use inside GPU functions/kernels, or in your host code - **Four high-quality RNG algorithms** - MRG32k3a - MTGP Mersenne Twister - XORWOW pseudo-random generation - Sobol’ quasi-random number generators, including support for scrambled and 64-bit RNG - **Multiple RNG distribution options** - Uniform distribution - Normal distribution - Log-normal distribution - Single-precision or double-precision - Poisson distribution [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/curandcallout.JPG)](/nvidia-gpu-computing-documentation#cuRAND) The random number generators and statistical distributions provided in the cuRAND library have been tested against well-known statistical test batteries, including TestU01. Please see the cuRAND documentation for selected test results. ### cuRAND Availability The cuRAND library is freely available as part of the [NVIDIA HPC SDK](https://developer.nvidia.com/hpc-sdk). It is also included with the [CUDA Toolkit](http://www.nvidia.com/getcuda).
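cuRAND itself is a C API, but the distribution options listed above map onto familiar host-side calls. The NumPy sketch below shows the same four distributions on CPU; CuPy's `cupy.random` module (which is backed by cuRAND on NVIDIA GPUs) exposes an equivalent interface, so the calls carry over largely unchanged:

```python
import numpy as np

# CPU-side sketch of the distribution options cuRAND provides.
# With CuPy, `import cupy; rng-style calls on cupy.random` run the
# equivalent generation on the GPU, backed by cuRAND.
rng = np.random.default_rng(42)

uniform   = rng.uniform(0.0, 1.0, size=100_000)    # uniform distribution
normal    = rng.normal(0.0, 1.0, size=100_000)     # normal distribution
lognormal = rng.lognormal(0.0, 1.0, size=100_000)  # log-normal distribution
poisson   = rng.poisson(4.0, size=100_000)         # Poisson, lambda = 4

print(uniform.mean(), normal.std(), poisson.mean())
```

The bulk-generation pattern shown here mirrors cuRAND's host API; the inline device API instead draws numbers one at a time inside a CUDA kernel.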
For more information on cuRAND and other CUDA math libraries: - Source code examples demonstrating how to use the cuRAND library: - [CUDA C Monte Carlo: Single Asian Option](http://docs.nvidia.com/cuda/cuda-samples/index.html#monte-carlo-single-asian-option) - [CUDA C Monte Carlo Estimation of Pi (batch QRNG)](http://docs.nvidia.com/cuda/cuda-samples/index.html#monte-carlo-estimation-of-pi--batch-qrng-) - [CUDA C Monte Carlo Estimation of Pi (batch PRNG)](http://docs.nvidia.com/cuda/cuda-samples/index.html#monte-carlo-estimation-of-pi--inline-prng-) - [CUDA C Monte Carlo Estimation of Pi (batch inline QRNG)](http://docs.nvidia.com/cuda/cuda-samples/index.html#monte-carlo-estimation-of-pi--batch-inline-qrng--) - [CUDA C Monte Carlo Estimation of Pi (inline PRNG)](http://docs.nvidia.com/cuda/cuda-samples/index.html#monte-carlo-estimation-of-pi--inline-prng-) - [Additional GPU-accelerated libraries](/gpu-accelerated-libraries) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/curand_perf_A100-PCIE-40GB_455.23.05_11.3.27_10.2.4.27_f64_light.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/curand_perf_A100-PCIE-40GB_455.23.05_11.3.27_10.2.4.27_f32_light.svg) --- # Source: https://developer.nvidia.com/cusolver.md # cuSOLVER ## Direct Linear Solvers on NVIDIA GPUs [Download](/hpc-sdk)[Documentation](https://docs.nvidia.com/cuda-libraries/index.html)[Samples](http://github.com/nvidia/cudalibrarysamples)[Support](https://forums.developer.nvidia.com/c/accelerated-computing/gpu-accelerated-libraries/12)[Feedback](mailto:Math-Libs-Feedback@nvidia.com?subject=cuSOLVER%20Feedback) The NVIDIA cuSOLVER library provides a collection of dense and sparse direct linear solvers and Eigen solvers which deliver significant acceleration for Computer Vision, CFD, Computational Chemistry, and Linear Optimization applications. 
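The dense factorizations cuSOLVER accelerates (Cholesky, LU, SVD, QR) are the standard LAPACK ones, so the solve pattern can be sketched with NumPy's LAPACK bindings. Below is the Cholesky path for a symmetric positive-definite system, the same decomposition `cusolverDn` exposes on GPU (sizes are illustrative):

```python
import numpy as np

# Cholesky solve of an SPD system A x = b, the pattern cuSOLVER's dense
# API (cusolverDn) accelerates on GPU. NumPy/LAPACK version for clarity.
rng = np.random.default_rng(1)
m = rng.standard_normal((50, 50))
A = m @ m.T + 50 * np.eye(50)        # make A symmetric positive-definite
b = rng.standard_normal(50)

L = np.linalg.cholesky(A)            # factor:  A = L @ L.T
y = np.linalg.solve(L, b)            # forward solve   L y = b
x = np.linalg.solve(L.T, y)          # backward solve  L.T x = y

print(np.allclose(A @ x, b))         # True: x solves the system
```

Factoring once and reusing the triangular factors for many right-hand sides is what makes this pattern, and its GPU counterpart, efficient in optimization and CFD workloads.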
The cuSOLVER library is included in both the [NVIDIA HPC SDK](/hpc-sdk) and the [CUDA Toolkit](/cuda-downloads). ## cuSOLVERMp Multi-Node Multi-GPU Host API The NVIDIA cuSOLVERMp library is a high-performance, distributed-memory, GPU-accelerated library that provides tools for solving dense linear systems and eigenvalue problems. The library is available as a standalone download and is also included in the [NVIDIA HPC SDK](https://developer.nvidia.com/hpc-sdk). [Download](http://developer.nvidia.com/cusolvermp-downloads) ## cuSOLVER Performance cuSOLVER 11 leverages DMMA Tensor Cores automatically. DGX A100 is over 2x faster than DGX-2 despite having half the number of GPUs, thanks to A100 and third-generation NVLink and NVSwitch. ## cuSOLVER Key Features - cusolverDN: Key LAPACK dense solvers 3-6x faster than MKL. - Dense Cholesky, LU, SVD, QR - Applications include: optimization, Computer Vision, CFD - cusolverSP - Sparse direct solvers - Symmetric & generalized symmetric eigensolvers - Applications include: Newton's method, Chemical Kinetics - cusolverRF - Sparse refactorization solver - Applications include: Chemistry, ODEs, Circuit simulation [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cusolver-11_0.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/CUDALibs/cusolver-11_0.png) --- # Source: https://developer.nvidia.com/cusparse.md # cuSPARSE GPU library APIs for sparse computation.
cuSPARSE host APIs provide GPU-accelerated basic linear algebra routines, and cuSPARSELt host APIs provide structured sparsity support that leverages sparse tensor cores for GEMM. Sparsity is widely applicable in machine learning, AI, computational fluid dynamics, seismic exploration and computational sciences. ## cuSPARSE Host API [Download](https://developer.nvidia.com/hpc-sdk)[Documentation](https://docs.nvidia.com/cuda/cusparse/index.html) The cuSPARSE APIs provide GPU-accelerated basic linear algebra subroutines for sparse matrix computations for unstructured sparsity. cuSPARSE is widely used by engineers and scientists working on applications in machine learning, AI, computational fluid dynamics, seismic exploration, and computational sciences. cuSPARSE is included in both the [NVIDIA HPC SDK](https://developer.nvidia.com/hpc-sdk) and the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads). ## cuSPARSELt Host API [Download](https://developer.nvidia.com/cusparselt-downloads)[Documentation](https://docs.nvidia.com/cuda/cusparselt/index.html) cuSPARSELt APIs offer 2:4 structured sparsity support via Sparse Tensor Cores in Ampere and later architectures for GEneral Matrix Multiplications (GEMMs). cuSPARSELt APIs provide options for pruning and compression of sparse matrices, activation functions, bias vectors, and output scaling for AI and deep learning use cases. * * * ## cuSPARSE Performance The cuSPARSE library is highly optimized for performance on NVIDIA GPUs, with SpMM performance 30-150X faster than CPU-only alternatives. ## cuSPARSE Key Features - Support for dense, COO, CSR, CSC, and Blocked CSR sparse matrix formats - Full suite of sparse routines covering sparse vector x dense vector operations, sparse matrix x dense vector operations, and sparse matrix x dense matrix operations.
- Routines for sparse matrix x sparse matrix addition and multiplication - Generic high-performance APIs for sparse-dense vector multiplication (SpVV), sparse matrix-dense vector multiplication (SpMV), and sparse matrix-dense matrix multiplication (SpMM) - ILU0 and IC0 preconditioners [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/cusparse9_2.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/cusparse9_2.png) * * * ## Resources **Documentation:** - [cuSPARSE](https://docs.nvidia.com/cuda/cusparse/index.html) - [cuSPARSELt](https://docs.nvidia.com/cuda/cusparselt/index.html) **Samples:** - [cuSPARSE](https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSE) - [cuSPARSELt](https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt) **[Forums](https://forums.developer.nvidia.com/tags/c/accelerated-computing/gpu-accelerated-libraries/12/cusparse)** **[Feedback](mailto:Math-Libs-Feedback@nvidia.com)** --- # Source: https://developer.nvidia.com/cutensor.md # cuTENSOR ## Tensor Linear Algebra on NVIDIA GPUs NVIDIA cuTENSOR is a GPU-accelerated tensor linear algebra library for tensor contraction, reduction, and elementwise operations. With cuTENSOR, applications can harness the specialized tensor cores on NVIDIA GPUs to perform high-performance tensor computations and accelerate deep learning training and inference, computer vision, quantum chemistry, and computational physics workloads. [Download](https://developer.nvidia.com/cutensor-downloads) ## Resources: - [Documentation](https://docs.nvidia.com/cuda/cutensor/index.html) - [Samples](http://github.com/nvidia/cudalibrarysamples) - [Feedback](mailto:Math-Libs-Feedback@nvidia.com) - [Support](https://forums.developer.nvidia.com/c/accelerated-computing/gpu-accelerated-libraries/12) ## cuTENSORMp Available Now cuTENSORMp offers new capabilities for contracting large tensors across multiple GPUs and multiple nodes. 
cuTENSORMp scales nearly linearly on NVIDIA Grace Blackwell NVL72 systems, enabling tensor math researchers to scale their contractions with more options for trading off performance and scale in both quantum chemistry and quantum computing simulations. ![A chart showing strong scaling for 2x 32 GB contractions with cuTENSORMp across NVL72](https://developer.download.nvidia.com/images/cuda-charts-cutensor-webpage-charts-sc25-4538751-v7-chart2.png "A chart showing strong scaling for 2x 32 GB contractions with cuTENSORMp across NVL72") Users get excellent performance both for tensors that exceed the memory of a single GPU and for fixed-size tensor contractions, which have been optimized for Grace Blackwell NVL72. [Read the cuTENSORMp Documentation](https://docs.nvidia.com/cuda/cutensor/latest/user_guide_cutensorMp.html) * * * ## cuTENSOR Performance The cuTENSOR library is highly optimized for performance on NVIDIA GPUs with support for DMMA, TF32, and now 3xTF32. ![A graph showing performance enhancements for various precisions with sparse and dense contractions](https://developer.download.nvidia.com/images/cutensor-performance-ari.png "A graph showing performance enhancements for various precisions with sparse and dense contractions") The latest version of cuTENSOR offers big speedups for workloads like Block-Sparse tensor contractions, driving significant performance improvements within the same hardware generation.
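The core operation all of this accelerates is a general tensor contraction of the form D = alpha * contract(A, B) + beta * C over arbitrary index modes. `np.einsum` expresses the same contraction on CPU; the mode labels and shapes below are illustrative, not a cuTENSOR code sample:

```python
import numpy as np

# General tensor contraction D = alpha * A·B + beta * C, the operation
# cuTENSOR (and cuTENSORMp, distributed) accelerates. CPU illustration.
rng = np.random.default_rng(7)
A = rng.standard_normal((4, 5, 6))        # modes a, b, c
B = rng.standard_normal((6, 5, 3))        # modes c, b, d
C = rng.standard_normal((4, 3))           # modes a, d
alpha, beta = 1.0, 0.5

# Contract over the shared modes b and c, leaving free modes a and d:
D = alpha * np.einsum("abc,cbd->ad", A, B) + beta * C

print(D.shape)   # (4, 3)
```

Because the mode labels are arbitrary, the same single API call covers matrix multiply, batched multiply, and higher-order contractions, which is what lets cuTENSOR map them all onto Tensor Cores.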
[Read the cuTENSOR Documentation](https://docs.nvidia.com/cuda/cutensor/latest/index.html) ### cuTENSOR Key Features - Just-in-time compiled kernels for tensor contraction - Plan-based multi-stage APIs for all operations - Support for arbitrarily dimensional tensor descriptors - Support for 3xTF32 compute type - Support for block-sparse tensor contractions - Support for multi-process tensor contractions - Support for int64 extents - Tensor contraction, reduction, and elementwise operations - Mixed precision support - Expressive API allowing elementwise operation fusion ### Ready to get started with cuTENSOR? [Download](https://developer.nvidia.com/cutensor-downloads) --- # Source: https://developer.nvidia.com/cuvs.md 1. [AI](https://developer.nvidia.com/topics/ai) 2. [Generative AI](https://developer.nvidia.com/generative-ai) cuVS # NVIDIA cuVS NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering that enables faster vector searches and index builds. It supports scalable data analysis, enhances semantic search efficiency, and helps developers accelerate existing systems or compose new ones from the ground up. Integrated with key libraries and databases, cuVS also manages complex code updates as new NVIDIA architectures and NVIDIA® CUDA® versions are released, ensuring peak performance and seamless scalability. [Download Now](https://github.com/rapidsai/cuvs "Try Now")[Integrations](https://docs.rapids.ai/api/cuvs/nightly/integrations/ "Request License")[Documentation](https://docs.nvidia.com/cuvs/index.html#product-documentationBYzNyK3Q "Request License") ## How NVIDIA cuVS Works NVIDIA cuVS is designed to accelerate and optimize vector index builds and vector search for existing [databases](https://www.nvidia.com/en-us/glossary/vector-database/) and vector search libraries. 
It enables developers to enhance data mining and semantic search workloads, such as recommender systems and [retrieval-augmented generation](/topics/ai/retrieval-augmented-generation) (RAG). Built on top of the [NVIDIA CUDA software stack](https://docs.rapids.ai/api/cuvs/nightly/#cuvs-technology-stack), it contains many building blocks for composing vector search systems and exposes easy-to-use APIs for C, C++, Rust, Java, Python, and Go. ![NVIDIA cuVS accelerates and optimizes vector index builds and vector search](https://developer.download.nvidia.com/images/how-nvidia-cu-vs-works.svg) ### Introductory Blog Get an intro into accelerating vector search with cuVS, popular applications, and performance comparison of GPU-accelerated vector search indexes vs. CPU. [Read the Blog](/blog/accelerating-vector-search-using-gpu-powered-indexes-with-rapids-raft/ "Get Started") ### Getting Started Guide Understand the differences between vector search indexes and fully-fledged vector databases. [Get the Primer](https://docs.rapids.ai/api/cuvs/nightly/getting_started/ "Get Started") ### Notebooks Build [IVF-PQ](https://github.com/rapidsai/cuvs/blob/branch-25.02/notebooks/tutorial_ivf_pq.ipynb) index and use it to search approximate nearest neighbors (ANN) or learn how to run approximate nearest neighbor search using [cuVS IVF-Flat](https://github.com/rapidsai/cuvs/blob/branch-25.02/notebooks/ivf_flat_example.ipynb) algorithm. [Get Started on GitHub](https://github.com/rapidsai/cuvs/tree/branch-25.02/notebooks "Get Started") ### Examples Get access to drop-in samples to build a new application with cuVS, or use it in an existing project. See cuVS [installation docs](https://docs.rapids.ai/api/cuvs/stable/build.html#cuda-gpu-requirements). [Check Out on GitHub](https://github.com/rapidsai/cuvs/tree/branch-24.12/examples "Get Started") * * * ## Key Features ### GPU-Accelerated Indexing Algorithms Optimized GPU indexing enables high-quality index builds and low-latency search. 
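The IVF-family indexes mentioned above (such as IVF-Flat and IVF-PQ) cut search cost by assigning vectors to coarse clusters at build time and scanning only the few closest clusters at query time. The following CPU-only sketch illustrates that idea; it is not the cuVS API, and the vectors and centroids are toy values:

```python
# Minimal CPU sketch of the IVF ("inverted file") idea behind indexes like
# cuVS IVF-Flat: assign vectors to coarse centroids at build time, then search
# only the n_probes closest partitions at query time. Not the cuVS API.

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists

def search_ivf(query, vectors, centroids, lists, n_probes=1, k=1):
    # Rank partitions by centroid distance, then scan only the probed lists.
    probes = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:n_probes]
    candidates = [vid for c in probes for vid in lists[c]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]

vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centroids = [(0.1, 0.05), (5.1, 5.0)]  # toy coarse cluster centers
lists = build_ivf(vectors, centroids)
print(search_ivf((5.0, 5.0), vectors, centroids, lists, n_probes=1, k=2))  # -> [2, 3]
```

Raising `n_probes` trades speed for recall, the same knob the real GPU indexes expose; cuVS runs both the clustering and the probed scans as massively parallel GPU kernels.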
cuVS delivers advanced algorithms for indexing vector embeddings, including exact, tree-based, and graph-based indexes. - ### **Real-Time Updates for Large Language Models (LLMs)** cuVS enables real-time updates to search indexes by dynamically integrating new embeddings and data without rebuilding the entire index. By integrating cuVS with LLMs, search results remain fresh and relevant. - ### **High-Efficiency Indexing** GPU indexing lowers cost compared to CPU-only workflows while maintaining quality at scale. Additionally, the ability to build large indexes out-of-core enables more flexible GPU selection and ultimately lower costs per gigabyte. - ### **Scalable Index Building** For real-time applications and large-scale deployments, cuVS enables both scale-up and scale-out for index creation and search at a fraction of the time it takes on a CPU without compromising quality. ### GPU-Accelerated Search Algorithms cuVS transforms vector search by integrating optimized CUDA-based algorithms for approximate nearest neighbors and clustering, ideal for large-scale, time-sensitive workloads. - ### **Low-Latency Performance** cuVS provides ultra-fast response times for applications such as semantic search, where speed and accuracy are critical. Furthermore, support for binary, 8-, 16-, and 32-bit types means memory use is optimized for high-throughput applications. - ### **High-Throughput Processing** GPUs handle hundreds of thousands of queries per second, making cuVS perfect for demanding use cases like machine learning, data mining, and real-time analytics. ## Get Started Select the right path to get started using cuVS.
Integrate it into your existing vector search systems, pipelines, or applications and accelerate your semantic search for data mining use cases in production. ![Evaluate with cuVS Bench](https://developer.download.nvidia.com/icons/containerized-model.svg) ### Evaluate Start using cuVS as a benchmarking tool designed for reproducible comparisons of ANN search implementations, especially between GPU and CPU, by optimizing index configurations and analyzing performance across different hardware environments. [Start Evaluating With cuVS Bench](https://docs.rapids.ai/api/cuvs/nightly/cuvs_bench/) ![Download Library (GitHub)](https://developer.download.nvidia.com/icons/m48-speech-recognition.svg) ### Develop NVIDIA cuVS is available on GitHub with end-to-end examples and an automated tuning guide. Access the source code to get started. [Download Library (GitHub)](https://github.com/rapidsai/cuvs) ![Launch Through Integrations](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Launch cuVS can be used as a standalone library or deployed through a number of SDK and vector database integrations like FAISS, Milvus, Lucene, Kinetica, and more. [Launch Through Integrations](https://docs.rapids.ai/api/cuvs/nightly/integrations/) * * * ## Performance—World's Fastest Vector Search NVIDIA cuVS exploits the parallel architecture of NVIDIA GPUs, allowing for easy deployment of popular and performance-critical algorithms. GPU-acceleration of vector similarity search [sets benchmark records](https://github.com/harsha-simhadri/big-ann-benchmarks/blob/main/neurips21/t3/LEADERBOARDS.md#public-dataset-leaderboards-and-winners) for large-scale, high-performance solutions. ### 21x Faster Indexing Lower is Better. 
![A chart showing 21X faster indexing on GPU vs CPU in the cloud](https://developer.download.nvidia.com/images/cuvs/index-build-performance-3691350.svg) _Time to build an index on GPU (8x A10g) vs CPU (Intel Ice Lake) in the cloud (AWS), reducing from hours to minutes._ ### 12.5x Lower Cost Lower is Better. ![A chart showing 12.5X lower cost indexing on GPU vs CPU in the cloud](https://developer.download.nvidia.com/images/cuvs/index-build-cost-3691350.svg) _Cost to build an index on the GPU (8x A10g) vs CPU (Intel Ice Lake) in the cloud (AWS)._ ### 29x Higher Throughput Higher is Better. ![A chart showing 34X higher throughput on GPU vs CPU](https://developer.download.nvidia.com/images/cuvs/search-throughput-3691350.svg) _Number of vectors that can be queried per second on a GPU (H100) vs CPU (Intel Xeon Platinum 8470Q) when submitted 10,000 at a time._ ### 11x Lower Latency Lower is Better. ![A chart showing 11X lower latency on GPU vs CPU](https://developer.download.nvidia.com/images/cuvs/search-latency-3691350.svg) _Average time to process each query on a GPU (H100) vs CPU (Intel Xeon Platinum 8470Q) when submitted one at a time._ * * * ## Starter Kits for NVIDIA cuVS Start accelerating your libraries, databases, and applications with cuVS by accessing tutorials, notebooks, forums, release notes, and comprehensive documentation. ### For Library Development cuVS provides easy-to-use Python APIs, which enable straightforward integration into libraries for data mining and analysis. cuVS is also integrated into the popular FAISS library for CPU and GPU interoperability. 
- [Explore Example Python Notebooks](https://github.com/rapidsai/cuvs/tree/branch-24.12/notebooks) - [Read the Getting Started Guide](https://docs.rapids.ai/api/cuvs/nightly/getting_started/) - [Read API Documentation](https://docs.rapids.ai/api/cuvs/nightly/api_docs/) ### For Database Development cuVS building blocks are built in C++ and wrapped in popular languages like C, Python, Rust, Java, and Go, making them easy to integrate into existing databases and vector indexing tools. - [Explore Example Projects and Code](https://github.com/rapidsai/cuvs/tree/branch-24.12/examples) - [Read the Getting Started Guide](https://docs.rapids.ai/api/cuvs/nightly/getting_started/) - [Try the Reproducible Benchmarking Tool](https://docs.rapids.ai/api/cuvs/nightly/cuvs_bench/) ### For Application Development cuVS can be used directly or through several database and library integrations to supercharge your applications and workflows with GPU acceleration. - [Try the NVIDIA AI Blueprint for RAG](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) - [Read the Getting Started Guide](https://docs.rapids.ai/api/cuvs/nightly/getting_started/) - [Explore Library and Database Integrations](https://docs.rapids.ai/api/cuvs/nightly/integrations/) - [Learn About Vector Databases vs. 
Databases](https://docs.rapids.ai/api/cuvs/nightly/vector_databases_vs_vector_search/) * * * ## Learning Library * * * ## Ecosystem [ ![NVIDIA cuVS Ecosystem Partner - Datastax](https://developer.download.nvidia.com/images/cuvs/logo-datastax.svg) ](https://www.datastax.com/blog/datastax-ai-paas-integrated-with-nvidia-nemo-retriever) [ ![NVIDIA cuVS Ecosystem Partner - Elastic](https://developer.download.nvidia.com/images/cuvs/logo-elastic.svg) ](https://www.nvidia.com/gtc/session-catalog/?tab.catalogallsessionstab=16566177511100015Kus&search=lucene#/session/1725488451726001Mz27) [ ![NVIDIA cuVS Ecosystem Partner -](https://developer.download.nvidia.com/images/cuvs/logo-faiss.svg) ](https://docs.rapids.ai/api/cuvs/nightly/integrations/faiss/) [ ![NVIDIA cuVS Ecosystem Partner - Kinetica](https://developer.download.nvidia.com/images/cuvs/logo-kinetica.svg) ](https://docs.rapids.ai/api/cuvs/nightly/integrations/kinetica/) [ ![NVIDIA cuVS Ecosystem Partner - Lucene](https://developer.download.nvidia.com/images/cuvs/logo-apache-lucene.svg) ](https://docs.rapids.ai/api/cuvs/nightly/integrations/lucene/) [ ![NVIDIA cuVS Ecosystem Partner - Milvus](https://developer.download.nvidia.com/images/cuvs/logo-milvus.svg) ](https://docs.rapids.ai/api/cuvs/nightly/integrations/milvus/) [ ![NVIDIA cuVS Ecosystem Partner - OpenSearch](https://developer.download.nvidia.com/images/cuvs/logo-open-search.svg) ](https://opensearch.org/blog/GPU-Accelerated-Vector-Search-OpenSearch-New-Frontier/) [ ![NVIDIA cuVS Ecosystem Partner - Solr](https://developer.download.nvidia.com/images/cuvs/logo-solr.svg) ](https://www.nvidia.com/gtc/session-catalog/?tab.catalogallsessionstab=16566177511100015Kus&search=lucene#/session/1725488451726001Mz27) [ ![NVIDIA cuVS Ecosystem Partner - Weaviate](https://developer.download.nvidia.com/images/cuvs/logo-weaviate.svg) ](https://www.nvidia.com/gtc/session-catalog/?tab.catalogallsessionstab=16566177511100015Kus&search=weaviate#/session/1725741930792001U8IR) * * 
* ## More Resources ![NVIDIA Developer Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for Developer Newsletter ![NVIDIA Training and Certification](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Training and Certification ![NVIDIA Developer Program](https://developer.download.nvidia.com/images/cuvs/developer-1.svg) ### Join the NVIDIA Developer Program ### Get Started With NVIDIA cuVS Today [Download Now](https://github.com/rapidsai/cuvs) --- # Source: https://developer.nvidia.com/cv-cuda.md Learn from the best in the field with our exclusive computer vision speaker series. [Register now](https://www.nvidia.com/en-us/events/computer-vision-speaker-series/?nvid=nv-int-unbr-911558) for free! # CV-CUDA CV-CUDA™ is an open-source library that enables building high-performance, GPU-accelerated pre- and post-processing for [AI computer vision](/computer-vision) applications in the cloud at reduced cost and energy. [Download on GitHub](https://github.com/CVCUDA/CV-CUDA) [Share Your Use Case](/cv-cuda/early-access) * * * ![AI Computer Vision](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/cv-cuda-850x480.jpg "AI Computer Vision") ## Use Cases Explore common use cases with AI imaging and computer vision workloads deployed at scale in the cloud. ### Image Understanding Image understanding involves AI algorithms interpreting and processing visual data to recognize patterns, objects, and context, paving the way for applications like facial recognition, medical imaging, and scene understanding. [Learn More](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/collections/tao_computervision) ### Generative AI Generative AI algorithms produce new content or data, imitating the patterns they learn from existing data, enabling tasks like image creation, text generation, and style transfer.
[Learn More](https://www.nvidia.com/en-us/research/ai-playground/) ### 3D Worlds 3D worlds are digital environments that represent space in three dimensions, offering immersive experiences for users in gaming, simulations, and virtual reality platforms. [Learn More](https://www.nvidia.com/en-us/omniverse/) ### HD Mapping HD mapping creates highly detailed digital representations of the physical world, essential for the precision and decision-making processes of autonomous vehicles. [Learn More](https://www.nvidia.com/en-us/self-driving-cars/hd-mapping/) * * * ## Key Features CV-CUDA provides a specialized set of 45+ highly performant computer vision and image processing operators. CV-CUDA also offers: - C, C++, and Python APIs - Batching support, with variable shape images - Zero-copy interfaces to deep learning frameworks like [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) - An NVIDIA Triton™ Inference Server example using CV-CUDA and NVIDIA® [TensorRT™](https://developer.nvidia.com/tensorrt) - End-to-end GPU-accelerated object detection, segmentation, and classification examples View a full list of the operators in the [CV-CUDA documentation](https://cvcuda.github.io/). 
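To illustrate the kind of work these operators offload, the sketch below performs a typical pre-processing stage (nearest-neighbor resize followed by normalization) in plain Python. CV-CUDA runs equivalent operators as batched GPU kernels over whole image batches; this toy code on a 2x2 grayscale "image" is illustration only:

```python
# CPU sketch of a typical pre-processing stage that CV-CUDA operators
# (resize, normalize, etc.) execute batched on the GPU. Illustration only.

def resize_nearest(img, out_h, out_w):
    # Nearest-neighbor resize: map each output pixel back to an input pixel.
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(img, mean, std):
    # Shift/scale pixel values, as done before feeding an inference model.
    return [[(p - mean) / std for p in row] for row in img]

img = [[0, 64], [128, 255]]                         # tiny 2x2 grayscale "image"
resized = resize_nearest(img, 4, 4)                 # upscale to 4x4
tensor = normalize(resized, mean=127.5, std=127.5)  # map to roughly [-1, 1]
print(tensor[0][0], tensor[3][3])  # -> -1.0 1.0
```

In a real pipeline these steps dominate CPU time once inference itself moves to the GPU, which is exactly the bottleneck CV-CUDA removes.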
![CV-CUDA Library](https://developer.download.nvidia.com/images/cv-cuda-stack-630x600.jpg "CV-CUDA Library") * * * ## CV-CUDA Benefits ![Computer Vision Cloud Applications](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/m48-edge-computing(1).svg "Computer Vision Cloud Applications") ### Specialized Set of Kernels for Cloud-Based Use Cases ![Computer Vision Operators](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/m48-workflow-complex.svg "Computer Vision Operators") ### Efficient, Hand-Optimized Kernels That Save Cost and Energy ![CV-CUDA Integration](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/m48-fleet-command.svg "CV-CUDA Integration") ### Lightweight and Flexible for Integrating Into Frameworks * * * ## Up to 49X End-to-End Throughput Improvement CV-CUDA lets you move your bottlenecked pre- and post-processing pipelines from the CPU to the GPU, boosting throughput for complex workflows. For a typical video segmentation pipeline, CV-CUDA enabled an end-to-end 49X speedup using NVIDIA [L4 Tensor Core GPUs](https://www.nvidia.com/en-us/data-center/l4/). With the latest and most efficient NVIDIA GPUs and CV-CUDA, developers of cloud-scale applications can save tens to hundreds of millions in compute costs and eliminate thousands of tons in carbon emissions. ### Video Segmentation Pipeline (End-to-End) 1080p, 30fps * * * ## Interoperability CV-CUDA is interoperable with the following libraries, SDKs, and frameworks. 
- [nvJPEG libraries](https://developer.nvidia.com/nvjpeg) - [Video Codec](https://developer.nvidia.com/video-codec-sdk) - [Video Processing Framework (VPF)](https://github.com/NVIDIA/VideoProcessingFramework) - [TAO Toolkit](https://developer.nvidia.com/tao-toolkit) - [TensorRT](https://developer.nvidia.com/tensorrt#resources) - [Triton Inference Server](https://developer.nvidia.com/triton-inference-server) - [PyTorch](https://pytorch.org/) - [TensorFlow](https://www.tensorflow.org/) ## Global Industry Adoption From content understanding to visual search and generative AI, customers are adopting CV-CUDA for their AI computer vision use cases. ![NVIDIA Partner for CV-CUDA](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/baidu-logo.svg "NVIDIA Partner for CV-CUDA") ![Microsoft- NVIDIA Partner for CV-CUDA](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/logo-bing.svg "NVIDIA Partner for CV-CUDA") ![Runway- NVIDIA Partner for CV-CUDA](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/runway-logo(2).svg "NVIDIA Partner for CV-CUDA") ![NVIDIA Partner for CV-CUDA](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/tencent-cloud-logo.svg "NVIDIA Partner for CV-CUDA") * * * ## In the News ![Computer Vision Segmentation Pipeline](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/reducing-cost-for-ai-based-computer-vision-630x354.jpg "Computer Vision Segmentation Pipeline") ### Increasing Throughput and Reducing Cost for AI-Based Computer Vision With CV-CUDA CV-CUDA enables real-time, high-performance cloud-scale applications with demands for lower latency and higher throughput. 
[Read how CV-CUDA increases throughput while also reducing both cost and energy consumption](/blog/increasing-throughput-and-reducing-costs-for-computer-vision-with-cv-cuda/) ![CV-CUDA for Visual Search](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/adopting-cv-cuda-630x354.jpg "CV-CUDA for Visual Search") ### NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI CV-CUDA is helping customers build and scale AI-based imaging and computer vision pipelines. [Read how industry leaders are adopting CV-CUDA](https://blogs.nvidia.com/blog/2023/03/21/cv-cuda-ai-computer-vision/) ![CV-CUDA Video Application](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/cv-cuda/computer-vision-cloud-applications-630x354.jpg "CV-CUDA Video Application") ### NVIDIA Introduces Open-Source Project to Accelerate Computer Vision Cloud Applications CV-CUDA combines accelerated image pre- and post-processing algorithms and tools to process higher image throughput and lower cloud computing cost. [Read how CV-CUDA can accelerate pre- and post-processing pipelines](https://blogs.nvidia.com/blog/2022/09/20/computer-vision-cloud/) * * * ## Videos and Webinars ## Additional Resources - [Watch Webinar: Overcoming Pre- and Post-Processing Bottlenecks in AI Computer Vision Pipelines](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51182/)(42:27 Minutes) - [See the Difference CV-CUDA Makes: Runway Optimizes AI Image and Video Generation Tools With CV-CUDA](https://www.youtube.com/watch?v=cxS4pT16_XQ)(01:18 Minutes) - [Review Documentation: CV-CUDA Developer Guide](https://github.com/CVCUDA/CV-CUDA/blob/release_v0.2.x/DEVELOPER_GUIDE.md) Help make CV-CUDA better by sharing your feedback and how you’re using it. We may follow up with you to continue the conversation. 
[Share Your Use Case](/cv-cuda/early-access) You must be a member of the [NVIDIA Developer Program](/developer-program) and be logged in with your organization’s email address. We don’t accept applications from personal email accounts. --- # Source: https://developer.nvidia.com/dcgm.md 1. [Home](/) NVIDIA DCGM # NVIDIA DCGM NVIDIA Data Center GPU Manager (DCGM) is a suite of tools for managing and monitoring NVIDIA data center GPUs in cluster environments. It includes active health monitoring, comprehensive diagnostics, system alerts, and governance policies including power and clock management. Infrastructure teams can use it standalone or easily integrate it into cluster management tools, resource scheduling, and monitoring products from NVIDIA partners. DCGM simplifies GPU administration in the data center, improves resource reliability and uptime, automates administrative tasks, and helps drive overall infrastructure efficiency. DCGM supports Linux operating systems on x86\_64 and aarch64 (sbsa) platforms. The installer packages include libraries, binaries, and source examples for using the API (C and Python). In addition, [Go bindings](https://github.com/NVIDIA/go-dcgm) are available via the open-source GitHub repository. Please refer to the documentation for additional details and instructions. DCGM also integrates into the Kubernetes ecosystem using DCGM-Exporter to provide rich GPU telemetry in containerized environments. DCGM has an open-core architecture: the foundational libraries and building blocks are available as open source on GitHub, while certain components, such as diagnostics and tests, remain proprietary.
[Documentation](https://docs.nvidia.com/datacenter/dcgm/latest/index.html "Documentation")[DCGM GitHub](https://github.com/NVIDIA/DCGM "Github Repo")[DCGM-Exporter GitHub](https://github.com/NVIDIA/gpu-monitoring-tools "Github Repo")[DCGM Go Binding](https://github.com/NVIDIA/go-dcgm "Github Repo") * * * ## How NVIDIA DCGM Works NVIDIA Data Center GPU Manager (DCGM) is a powerful tool designed to manage and monitor NVIDIA GPUs in data centers, ensuring optimal performance and reliability. By utilizing DCGM, administrators can easily track the health, performance, and utilization of their GPU resources. The process begins with installing DCGM on each server node that houses NVIDIA GPUs. At the core of DCGM is the libdcgm.so library. DCGM can be used either by starting a service through the HostEngine, which functions as a wrapper around the library, or by developing a standalone application that directly incorporates the library. When operated as a service, DCGM offers two interfaces for user interaction: dcgmi and the DCGM Exporter. The dcgmi interface is equipped with commands designed to manage and monitor GPU performance and health, making it an ideal tool for administrators who prefer a command-line approach. In contrast, the DCGM Exporter is tailored for cluster-level monitoring within native Kubernetes environments. It exports GPU metrics and health data for real-time monitoring and alerting, thus providing a comprehensive overview of the GPU cluster's status. DCGM also includes active and passive diagnostics for NVIDIA hardware. Administrators can access this data through a user-friendly interface or via a command-line tool, enabling them to set up alerts for any irregularities or performance issues. By proactively identifying potential problems and optimizing GPU performance, NVIDIA DCGM plays a crucial role in maintaining the efficiency and reliability of data center operations.
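As a concrete illustration of the exporter path described above, DCGM-Exporter publishes GPU telemetry in the Prometheus text exposition format that monitoring stacks scrape. The sketch below emits that format in plain Python; the field name mirrors a real DCGM field identifier, but the GPU labels and values are invented for illustration:

```python
# Sketch of the Prometheus text exposition format that DCGM-Exporter uses to
# publish GPU telemetry. The metric name mirrors a real DCGM field identifier;
# the sample labels and values below are invented for illustration.

def to_prometheus(name, help_text, samples):
    lines = ["# HELP %s %s" % (name, help_text), "# TYPE %s gauge" % name]
    for labels, value in samples:
        label_str = ",".join('%s="%s"' % (k, v) for k, v in sorted(labels.items()))
        lines.append("%s{%s} %s" % (name, label_str, value))
    return "\n".join(lines)

page = to_prometheus(
    "DCGM_FI_DEV_GPU_UTIL", "GPU utilization (in %).",
    [({"gpu": "0", "UUID": "GPU-aaaa"}, 93),
     ({"gpu": "1", "UUID": "GPU-bbbb"}, 17)],
)
print(page)
```

A Prometheus server scraping such a page can then alert on per-GPU utilization, temperature, or health fields across the whole cluster.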
![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/datacenter/dcgm-icon.png) * * * ## Key Benefits ### GPU Diagnostics and System Validation Effectively identify failures, performance degradations, power inefficiencies, and their root causes. ### GPU Telemetry Gather a rich set of GPU telemetry to explain job behavior, identify opportunities to drive utilization and efficiency, and determine root causes of potential application performance issues. ### Active GPU Health Monitoring Use low-overhead, non-invasive health monitoring while jobs run, without impacting application behavior and performance. ### Integration with Management Ecosystem Easily deploy a DCGM-based monitoring solution in a Kubernetes cluster environment. Out-of-the-box integration with ISV solutions such as Bright Cluster Manager and IBM Spectrum LSF, and open-source tools such as Prometheus and collectd. * * * ## NVIDIA DCGM Resources ### Blog Posts - [Monitoring GPUs in Kubernetes with DCGM](https://developer.nvidia.com/blog/monitoring-gpus-in-kubernetes-with-dcgm/) - [Job Statistics with NVIDIA Data Center GPU Manager and Slurm](https://devblogs.nvidia.com/job-statistics-nvidia-data-center-gpu-manager-slurm/) - [Setting Up GPU Telemetry with NVIDIA Data Center GPU Manager](https://devblogs.nvidia.com/gpu-telemetry-nvidia-dcgm/) - [NVIDIA Data Center GPU Manager Simplifies Cluster Administration](https://devblogs.nvidia.com/nvidia-data-center-gpu-manager-cluster-administration/) ### Documentation - [DCGM Documentation](https://docs.nvidia.com/datacenter/dcgm) ### Recorded Talks - GTC 2018 Talk: [GPU Monitoring and Management with NVIDIA Data Center GPU Manager](http://on-demand.gputechconf.com/gtc/2018/presentation/s8505-gpu-monitoring-and-management-with-nvidia-data-center-gpu-manager-dcgm-v2.pdf) * * * ## Get started with NVIDIA DCGM today [Get Started](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html) --- # Source:
https://developer.nvidia.com/deepstream-sdk.md 1. [Home](/) [Deep Learning](https://developer.nvidia.com/deep-learning) [Deep Learning Software](https://developer.nvidia.com/deep-learning-software) DeepStream SDK Quick Links - [Get Started](/deepstream-sdk) - [Download DeepStream](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/deepstream_sdk) * * * # NVIDIA DeepStream SDK NVIDIA DeepStream’s multi-platform support gives you a faster, easier way to develop and deploy real-time video streaming pipelines for generative AI agents and applications. You can even deploy them on premises, at the edge, and in the cloud with just the click of a button. [Get Started](https://developer.nvidia.com/deepstream-getting-started "Get Started")[Download DeepStream](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/deepstream_sdk "Download DeepStream") * * * ## What is NVIDIA DeepStream? The NVIDIA DeepStream SDK is a comprehensive real-time streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It’s ideal for developers, software partners, startups, and OEMs building vision AI agents, applications, and services for a wide range of industries like smart cities, retail, manufacturing, and more. You can now create and deploy stream-processing pipelines that incorporate generative AI and other complex processing tasks like multi-camera tracking in minutes. To further accelerate development, DeepStream is also part of the [NVIDIA Metropolis Blueprint for Video Search and Summarization (VSS)](https://build.nvidia.com/nvidia/video-search-and-summarization/nim). This sample architecture for building visual AI agents can extract valuable insights from massive volumes of industrial video sensor data in real time.
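To give a flavor of what declarative, config-driven pipeline definition can look like, here is a hypothetical sketch in the spirit of DeepStream's tooling. Every key, value, stream URI, and endpoint below is invented for illustration; consult the DeepStream documentation for the real configuration schema.

```yaml
# Hypothetical pipeline definition sketch -- NOT the real DeepStream schema.
# All names, paths, and endpoints here are invented for illustration.
name: people-detection-service
input:
  streams:
    - rtsp://camera-01.example.com/stream   # example source URI
preprocess:
  resize: [1280, 720]
model:
  engine: detector_fp16.plan               # placeholder TensorRT engine name
  precision: fp16
tracker: nvdcf                             # DeepStream's NvDCF tracker
output:
  api: /v1/detections                      # hypothetical REST endpoint
```

The appeal of this style is that the toolkit, not the developer, wires up decoding, batching, inference, and the serving endpoint from a single description.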
DeepStream is an integral part of [NVIDIA Metropolis](https://www.nvidia.com/en-us/autonomous-machines/intelligent-video-analytics-platform/), the platform for building end-to-end vision AI agents and applications that transform pixel and sensor data into actionable insights. ![The DeepStream software stack within NVIDIA Metropolis](https://developer.download.nvidia.com/images/deepstream/metropolis-deepstream-vision-ai-edge.jpg) _DeepStream is an integral part of NVIDIA Metropolis, the platform for building end-to-end services and solutions that transform pixel and sensor data to actionable insights._ * * * ## Benefits ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/m48-configuration-sdk-256px-blk.png) ### Rapidly Deploy AI From the Cloud to the Edge The DeepStream SDK provides a complete video stream ingestion, processing, and multi-camera tracking pipeline that’s 100% NVIDIA GPU-accelerated. It’s ideal for a wide range of use cases across industries such as manufacturing, logistics, retail, and more. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/m48-microservices-2-256x-blk.png) ### Reduce Development Time to Minutes DeepStream Inference Builder simplifies the development process with declarative application definitions and easy addition of API endpoints for your application. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/m48-speed-256px-blk.png) ### Real-Time Insights Extract rich metadata in real time from sensor data such as images, video, and lidar. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/m48-edge-computing-256px-blk.png) ### Achieve the Lowest Total Cost of Ownership With NVIDIA GPUs Increase stream density, maximize performance, and minimize TCO by deploying AI models with DeepStream on NVIDIA hardware.
![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/m48-complexity-1-256px-blk.png) ### Multiple Programming Options Create powerful vision AI applications using C/C++ and Python. * * * ## Unique Capabilities # Accelerate Vision AI Development With 40+ GPU-Accelerated Plug-Ins DeepStream kickstarts the development of seamless real-time streaming pipelines for AI-based video, audio, and image analytics. It ships with 40+ hardware-accelerated plug-ins and 30+ sample applications and extensions to optimize pre/post processing, inference, multi-camera tracking, message brokers, and more. **Inference Builder** turns AI model ideas into production-ready services with ease. Developers can now go from YAML config to deployment in minutes—automating data flow, preprocessing, model execution, and API integration. **DeepStream Service Maker** simplifies the development process by abstracting the complexities of GStreamer to easily build C++ object-oriented applications. Use Service Maker to build complete DeepStream pipelines with a few lines of code. **DeepStream Libraries** powered by NVIDIA® CV-CUDA™, NvImageCodec, and PyNvVideoCodec offer low-level GPU-accelerated operations to optimize pre- and post-processing stages of vision AI pipelines. [Learn More](https://github.com/NVIDIA-AI-IOT/inference_builder) [![A diagram showing how DeepStream inference builder works to accelerate vision AI.](https://developer.download.nvidia.com/images/deepstream/deepstream-diagram-ari.jpg)](https://developer.download.nvidia.com/images/deepstream/deepstream-diagram-ari.jpg) _Click to Enlarge_ #### Enable Multi-Camera Tracking Across a Range of Cameras Multiview 3D tracking (MV3DT), an extension of DeepStream NvTracker, enables distributed, real-time 3D tracking across networks of cameras. It works seamlessly with both 2D and 3D detectors, supporting a wide range of use cases.
DeepStream automatically assigns unique IDs for new objects, preserving identity through occlusions and handovers. For precise multi-camera tracking, DeepStream includes a new calibration tool that aligns multiple cameras to the deployment floor plan simultaneously. This reduces manual effort and ensures consistent, accurate results. [Learn More](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_MV3DT.html) # Build End-to-End AI Solutions Speed up overall development efforts and unlock greater real-time performance by building end-to-end vision AI applications with NVIDIA Metropolis. Start with production-quality vision AI models, adapt and optimize them with the NVIDIA TAO Toolkit, and deploy using DeepStream. Use the Metropolis VSS Blueprint to build visual AI agents that can process thousands of live videos simultaneously to drive insights and automation. Get incredible flexibility—from rapid prototyping to full production-level solutions—and choose your inference path. With native integration to NVIDIA [Triton™ Inference Server](/nvidia-triton-inference-server), you can deploy models in native frameworks such as PyTorch and TensorFlow for inference. For high-throughput inference, use NVIDIA TensorRT to achieve the best possible performance. 
[Learn More](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_TAO_integration.html) [![DeepStream helps developers build seamless streaming pipeline for AI based video analytics](https://developer.download.nvidia.com/images/deepstream/end-to-end-vision-ai-development-ari.jpg)](https://developer.download.nvidia.com/images/deepstream/end-to-end-vision-ai-development-ari.jpg) _Click to Expand_ [![DeepStream helps developers build seamless streaming pipeline for AI based video analytics](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/embedded-deepstream-sdk3.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deepstream/embedded-deepstream-sdk3.jpg) _Click to Expand_ #### Enjoy Seamless Development From Edge to Cloud DeepStream’s off-the-shelf containers let you build once and deploy anywhere—on clouds, workstations with NVIDIA GPUs, or NVIDIA Jetson™ devices. With the DeepStream Container Builder and NGC containers, you can easily create scalable, high-performance AI applications managed with Kubernetes and Helm. DeepStream REST-APIs also let you manage multiple parameters at run-time, simplifying the creation of SaaS solutions. With a standard REST-API interface, you can build web portals for control and configuration or integrate into your existing applications. [Learn More](https://docs.nvidia.com/metropolis/deepstream/dev-guide/) # Get Production-Ready DeepStream is available as a part of NVIDIA AI Enterprise, an end-to-end, secure, cloud-native AI software platform optimized to accelerate enterprises to the leading edge of AI. NVIDIA AI Enterprise delivers validation and integration for NVIDIA AI open-source software, access to AI solution workflows to speed time to production, certifications to deploy AI everywhere, and enterprise-grade support, security, and API stability to mitigate the potential risks of open-source software. 
[Learn More](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/) ![A collage of images showing DeepStream as a part of NVIDIA AI Enterprise to help deploy AI anywhere](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-ai-enterprise-630x354_1.jpg) * * * ## Explore Multiple Programming Options ### Inference Builder Streamline the entire workflow with a simple YAML configuration—automating data flow, preprocessing, model execution, and API integration. [Learn More About Inference Builder](https://github.com/NVIDIA-AI-IOT/inference_builder) ### Python Construct DeepStream pipelines using Gst Python, the GStreamer framework's Python bindings. The source code for the binding and Python sample applications are available on GitHub. [Learn More About Python](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_service_maker_python.html) ### C/C++ Create applications in C/C++, interact directly with GStreamer and DeepStream plug-ins, and use reference applications and templates. [Learn More About C/C++](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_C_Sample_Apps.html) * * * ## Improve Accuracy and Real-Time Performance DeepStream offers exceptional throughput for a wide variety of object detection, image processing, and instance segmentation AI models. The following table shows the end-to-end application performance from data ingestion, decoding, and image processing to inference. It takes multiple 1080p/30fps streams as input. Note that running on the DLAs for Jetson devices frees up the GPU for other tasks. For performance best practices, [watch this video tutorial](https://www.youtube.com/watch?v=Or8vfydL69s&feature=youtu.be). 
| Foundation Model | Tracker | Precision | Jetson Thor | L40S | A6000 | B200 | RTX PRO WS | RTX PRO SE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-RADIO-Base | No Tracker | FP16 | 1242 | 2880 | 2568 | 8352 | 3025 | 3864 |
| NV-DinoV2-Large | No Tracker | FP16 | 412 | 797 | 616 | 3552 | 895 | 1330 |
| RT-DETR + C-RADIO-Base | No Tracker | FP16 | 194 | 609 | 546 | 1888 | 630 | 920 |
| RT-DETR + C-RADIO-Base | NvDCF | FP16 | 161 | 618 | 534 | 1824 | 627 | 848 |
| RT-DETR + C-RADIO-Base | MV3DT | FP16 | 183 | 343 | 497 | 1128 | 598 | 624 |
| TrafficCamNet Transformer Lite | NvDCF | FP16 | 157 | 685 | 584 | 1200 | 785 | 928 |
| PeopleNet (2.6.3) | MV3DT | FP16 | 617 | 486 | 552 | 4320 | 860 | 852 |
| Grounding-DINO | No Tracker | FP16 | 24 | 98 | 101 | 208 | 132 | 158 |
| SegFormer + C-RADIO-Base | No Tracker | FP16 | 135 | 870 | 884 | 1508 | 157 | 1060 |
| Mask2Former + SWIN | No Tracker | FP16 | 26 | 94 | 63 | 173 | 68 | 102 |

The DeepStream SDK lets you apply AI to streaming video while simultaneously optimizing video decode/encode, image scaling, conversion, and edge-to-cloud connectivity for complete end-to-end performance. To learn more about DeepStream performance, see the [documentation](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html). * * * ## Read Customer Stories ![Industry.AI Customer Story](https://developer.download.nvidia.com/images/deepstream/bengaluru-airport-ai-ari.jpg) ### Optimizing Operations at Bengaluru Airport Industry.AI used the NVIDIA Metropolis stack, including DeepStream, to increase the safety and efficiency of the airport. Using vision AI, it was able to track abandoned baggage, flag long passenger queues, and alert security teams to potential issues.
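One practical way to read the throughput table above: the numbers are total end-to-end frames per second, so dividing by the 30 fps of each input stream gives a rough ceiling on how many concurrent 1080p/30fps streams each platform can sustain (ignoring overheads that appear at very high stream counts).

```python
# Rough stream-capacity estimate from the table's end-to-end throughput.
# Each input stream is 1080p at 30 fps, so total_fps // 30 approximates the
# number of concurrent streams a platform can sustain for that model.
STREAM_FPS = 30

def max_streams(total_fps: int, stream_fps: int = STREAM_FPS) -> int:
    return total_fps // stream_fps

# C-RADIO-Base, FP16, no tracker (first row of the table):
print(max_streams(1242))  # Jetson Thor -> 41 streams
print(max_streams(8352))  # B200 -> 278 streams
```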
[Read the Blog](https://blogs.nvidia.com/blog/bengaluru-airport-vision-ai/) ![KoiReader Customer Story](https://developer.download.nvidia.com/images/deepstream/koireader-distribution-center-operation-ari.jpg) ### Enhancing Distribution Center Operations KoiReader developed an AI-powered machine vision solution using NVIDIA developer tools, including the DeepStream SDK, to help PepsiCo achieve precision and efficiency in dynamic distribution environments. [Learn More About KoiReader](https://blogs.nvidia.com/blog/pepsi-koivision/) ![FYMA Customer Story](https://developer.download.nvidia.com/images/deepstream/ai-smart-spaces-ari.jpg) ### Scaling AI-Powered Smart Spaces FYMA used NVIDIA DeepStream and NVIDIA Triton™ to improve AI-powered space analytics, exceeding previous benchmarks by 10x on frame rate and 3x on accuracy. [Learn More](https://www.youtube.com/watch?v=GPsQAKq02lc) * * * ## General FAQ

**Is DeepStream open source?** DeepStream is a closed-source SDK, though the sources for all reference applications and several plugins are available. DeepStream Inference Builder will be open source and available on [GitHub](https://github.com/NVIDIA-AI-IOT/inference_builder).

**What can I build with DeepStream?** The DeepStream SDK can be used to build end-to-end AI-powered applications that analyze video and sensor data. Popular use cases include retail analytics, parking management, logistics, optical inspection, robotics, and sports analytics.

**Which platforms and operating systems are supported?** See the [Platforms and OS compatibility table](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Quickstart.html#platform-and-os-compatibility).

**Can I deploy models in native frameworks?** Yes, that’s now possible with the integration of Triton Inference Server. Since DeepStream 6.1.1, applications can also communicate with independent/remote instances of Triton Inference Server using gRPC.

**Which models are supported out of the box?** DeepStream supports several popular models, including all NVIDIA TAO models, and ships with an example for running YOLO models.
**Does DeepStream support the NVIDIA Blackwell architecture?** Yes, DeepStream 8.0 or later supports the NVIDIA Blackwell architecture.

**Does DeepStream support audio?** Yes, audio is supported with the DeepStream SDK. To get started, download the software and review the reference audio and Automatic Speech Recognition (ASR) applications. Learn more by reading the [ASR DeepStream Plugin documentation](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvdsasr.html).

Build high-performance vision AI apps and services using the DeepStream SDK. [Get Started](/deepstream-getting-started "Get Started with DeepStream SDK") --- # Source: https://developer.nvidia.com/industries/manufacturing/developer-resources-robotics-and-edge-ai-applications.md # Developer Resources for Robotics and Edge AI Applications **A hub of news, SDKs, technical resources, and more.** ## App Frameworks and SDKs #### Robotics and Automation [NVIDIA Isaac™ Sim](/isaac-sim), built on [NVIDIA Omniverse™](/nvidia-omniverse-platform), provides access to NVIDIA PhysX and RTX ray-tracing technologies for photorealistic robotic simulations for training and testing in virtual industrial environments. It includes enhanced synthetic data generation capabilities and supports ROS/ROS2 applications. [Get Started with ISAAC SIM](/isaac-sim) [NVIDIA Isaac SDK™](/isaac-sdk) enables you to build and deploy commercial-grade, AI-powered robots. The toolkit includes building blocks and tools that accelerate robot development requiring the increased perception, navigation, and manipulation capabilities enabled by AI. [Explore ISAAC SDK](/isaac-sdk) #### Edge AI and IoT NVIDIA JetPack™ SDK is a comprehensive solution for building edge AI applications. All NVIDIA® Jetson™ hardware modules and developer kits are supported by JetPack SDK. Jetson is a complete System on Module (SOM) and comes in various power, performance, and form-factor options to help you build autonomous and intelligent machines according to your needs.
[Get Started with JETPACK ](/embedded/jetpack) [NVIDIA Fleet Command](https://www.nvidia.com/en-us/data-center/products/fleet-command/) is a hybrid-cloud platform for centralized management at scale at the edge. From one control plane, anyone with a browser and internet connection can deploy applications, update software over the air, and monitor location health. [Learn About Fleet Command ](https://www.nvidia.com/en-us/data-center/products/fleet-command/) #### AI-Powered Computer Vision Pre-trained models for computer vision and the [TAO Toolkit](/transfer-learning-toolkit) enable quick development and adaptation of highly accurate AI models for robotics and autonomous machine use cases. [Access Pre-Trained Models](/transfer-learning-toolkit) [NVIDIA DeepStream](/deepstream-sdk) lets you build and deploy AI-powered intelligent video analytics apps and services across a broad range of automation applications. [Learn About the DEEPSTREAM SDK ](/deepstream-sdk) #### TensorRT NVIDIA® TensorRT™ is a platform for high-performance deep learning inference that includes an inference optimizer and runtime delivering low latency and high throughput for your applications. [Learn About TensorRT](/tensorrt) #### High-Performance Computing The [NVIDIA HPC Software Development Kit (SDK)](https://developer.nvidia.com/hpc-sdk) includes the proven compilers, libraries, and software tools essential to maximizing developer productivity and the performance and portability of HPC applications. [Learn About The HPC SDK](/hpc-sdk) #### NVIDIA® Triton Inference Server NVIDIA Triton™ Inference Server simplifies the deployment of AI models at scale in production. It is open-source inference-serving software that lets teams deploy trained AI models from any framework on any GPU- or CPU-based infrastructure (cloud, data center, or edge).
[Learn More](/nvidia-triton-inference-server) #### Real-Time Collaboration [NVIDIA Omniverse™](/nvidia-omniverse-platform) powers real-time collaboration in a single, interactive, simulated environment, with instant photorealistic output through ultra-high-fidelity, ray-traced, and path-traced rendering in one click. [Get Started with OMNIVERSE](/nvidia-omniverse-platform) #### Conversational AI, Pose and Gaze Estimation NVIDIA Riva is an SDK for building and deploying AI applications that fuse vision, speech, and other sensors. It offers a complete workflow to build, train, and deploy GPU-accelerated AI systems that can use visual cues, such as gestures and gaze, together with speech in context. Pre-trained models and scripts used in Riva are freely available in the [NVIDIA® NGC™ catalog](https://ngc.nvidia.com/catalog). [Apply For Early Access](/riva) * * * ## On Demand Talks #### Sim-to-Real in Isaac Sim Train and test robots in virtual environments with Isaac Sim on Omniverse, then transfer the results to a physical Jetson-powered robot. This webinar, presented at GTC 2021, reviews the latest features offered in Isaac Sim. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31824/?playlistId=playList-7415cf83-1f9c-4297-a90d-c821c4d2d478) #### Building Robotics Applications Using NVIDIA Isaac SDK This session covers the NVIDIA Isaac SDK robotics platform and its benefits for developers, including the engine, perception GEMs (packages with high-performance robotics algorithms), and workflows, plus what’s coming in the next release. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a21856/) #### Create Gesture-Based Interactions with a Robot Learn how to train your own gesture recognition deep learning pipeline. We’ll start with a pre-trained detection model, repurpose it for hand detection, and use it together with the purpose-built gesture recognition model.
[Watch Now](https://developer.nvidia.com/gtc/2020/video/s21690-vid) #### Accelerating Vision AI Applications Using NVIDIA TAO Toolkit and Pre-trained Models Learn how to make the process of creating vision AI models quicker and easier with the NVIDIA TAO Toolkit and pre-trained models. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a21333/) #### How to Train Your Robot: A Simulation-First Approach This session covers BMW Group’s research and development on one of their most recent applications, the Smart Transport Robot (STR), focusing on the simulation environment and how it’s involved in the STR's development process. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a21825/) #### Optimizing for Edge AI on NVIDIA Jetson A deep-dive session on workflows for optimizing deep learning models for inference at the edge with NVIDIA Jetson. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a22252/) [View all GTC talks](https://www.nvidia.com/en-us/on-demand/playlist/playList-bf5494bd-c99d-449d-8a4d-9699c47c3c59/) * * * ## NVIDIA Deep Learning Institute ![Nvidia DLI](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-dli.jpg) [The NVIDIA Deep Learning Institute (DLI)](https://www.nvidia.com/en-us/deep-learning-ai/education/) offers hands-on training in AI and accelerated computing to solve real-world problems. Training is available as self-paced, online courses or in-person, instructor-led workshops. * * * #### Applications of AI for Predictive Maintenance Learn how to identify anomalies and failures in time-series data, estimate the remaining useful life of the corresponding parts, and use this information to map anomalies to failure conditions, minimizing unplanned downtime.
[Learn More](https://www.nvidia.com/content/dam/en-zz/Solutions/deep-learning/deep-learning-education/DLI-Workshop-Applications-of-AI-for-Predictive-Maintenance-2.pdf) #### Getting Started with DeepStream for Video Analytics on Jetson Nano In this course, you’ll learn to extract meaningful insights from video streams through deep learning video analytics using JupyterLab notebooks on an NVIDIA Jetson Nano™. [Learn More](https://courses.nvidia.com/courses/course-v1:DLI+C-IV-02+V1/about) #### Deep Learning for Industrial Inspection Explore how convolutional and recurrent neural networks can be combined to generate effective descriptions of content within images and video clips, bringing together computer vision and natural language processing. Solve deep learning problems that require multiple types of data inputs. [Learn More](https://www.nvidia.com/content/dam/en-zz/Solutions/deep-learning/deep-learning-education/Deep-Learning-for-Industrial-Inspection.pdf) #### Deep Learning for Robotics Follow along to create robotics solutions on NVIDIA Jetson™ for embedded applications and train a robot to actuate the correct output based on visual input. [Learn More](https://www.nvidia.com/content/dam/en-zz/Solutions/deep-learning/deep-learning-education/Deep-Learning-for-Robotics.pdf) * * * ## NVIDIA Robotics News ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-robotics-applications.jpg) #### Implementing Robotics Applications with ROS 2 and AI on the NVIDIA Jetson Platform Deep learning is being adopted in robotics to accurately navigate indoor environments, detect and follow objects of interest, and maneuver without collisions. [**Read Blog**](https://developer.nvidia.com/blog/implementing-robotics-applications-with-ros-2-and-ai-on-jetson-platform-2/) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/isaac-sim-social-synth-data-glass-physics-scaled-down.png) #### What Is Synthetic Data?
Synthetic data, generated from computer simulations or algorithms, provides an inexpensive alternative to real-world data and is increasingly used to create accurate AI models. [**Read Blog**](https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/) * * * #### Automatic Defect Inspection Using the NVIDIA End-to-End Deep Learning Platform Learn about the problems with traditional quality inspection and how deep learning can address and solve these industrial inspection tasks. [Read Blog](/blog/automatic-defect-inspection-using-the-nvidia-end-to-end-deep-learning-platform/) #### Rapid Prototyping on NVIDIA Jetson Platforms with MATLAB Create a prototype and deploy deep learning algorithms on hardware like the [NVIDIA Jetson Nano™ Developer Kit](https://developer.nvidia.com/embedded/jetson-nano-developer-kit) with MATLAB. [Read Blog](/blog/rapid-prototyping-on-nvidia-jetson-platforms-with-matlab/) #### Building a Multi-Camera Media Server for AI Processing on the NVIDIA Jetson Platform Build a simple, real-time multi-camera media server for AI processing on the NVIDIA Jetson™ platform. By using [GStreamer Daemon (GstD) and GstInterpipe](https://developer.ridgerun.com/wiki/index.php?title=GStreamer_Daemon) with the [NVIDIA DeepStream SDK](/deepstream-sdk), you can develop a scalable and robust prototype that captures video from several different sources. [Read Blog](/building-multi-camera-media-server-ai-processing-jetson/) #### Introducing Jetson Xavier NX, the World’s Smallest AI Supercomputer for Autonomous Systems Explore how [NVIDIA Jetson Xavier™ NX](/embedded/jetson-xavier-nx) allows you to deploy next-generation autonomous systems and intelligent edge devices that require high-performance AI and complex deep neural networks (DNNs) in a small, low-power footprint: mobile robots, drones, smart cameras, portable medical equipment, embedded Internet of Things (IoT) systems, and more.
[Read Blog](/blog/jetson-xavier-nx-the-worlds-smallest-ai-supercomputer/) Sign up for the latest developer news from NVIDIA [Subscribe](https://www.nvidia.com/en-us/preferences/email-signup/) --- # Source: https://developer.nvidia.com/dgx-cloud.md # NVIDIA DGX Cloud for Developers NVIDIA DGX Cloud accelerates AI workloads in the cloud, supporting pretraining, fine-tuning, inference, and the deployment of physical and industrial AI applications. [Get Started](#section-get-started "Download Now")[Documentation](https://docs.nvidia.com/dgx-cloud/ "Go to Documentation") * * * ## See a Part of DGX Cloud in Action [![Watch a video of how NVIDIA DGX Cloud Lepton works](https://developer.download.nvidia.com/images/dgx-cloud-services/demo-virtual-global-ai-factory-ari.jpg)](https://www.youtube.com/watch?v=ZnZumsCZ4pE&t=4s "Watch a video of how NVIDIA DGX Cloud Lepton works") NVIDIA DGX Cloud Lepton, now available for early access, provides an integrated platform for development (SSH, Jupyter, VS Code), training, fine-tuning (running batch jobs), and scalable inference (NVIDIA NIM™ endpoints). * * * ## How NVIDIA DGX Cloud Works DGX Cloud offers a comprehensive suite of cloud-native solutions that includes: - **NVIDIA DGX Cloud Lepton**: Bring or discover compute from NVIDIA Cloud Partners (NCPs) and Cloud Service Providers (CSPs). Deploy across multiple clouds, decoupled from infrastructure, and focus on building apps from a single UI, using DGX Cloud Lepton’s integrated platform for development, training, fine-tuning, and scalable inference. - **NVIDIA DGX Cloud for CSPs**: Build and fine-tune AI foundation models on scalable, optimized GPU infrastructure co-engineered with leading cloud providers, with flexible access to contiguous, preconfigured, high-performance clusters and the latest GPUs.
- **NVIDIA Cloud Functions (NVCFs)**: Scale deployment of AI workloads on a serverless AI inference platform that offers fully managed, auto-scaling, event-driven deployment across multiple clouds, on-prem, and existing compute environments. - **DGX Cloud Benchmarking**: Get detailed metrics on end-to-end AI workload performance, scaling efficiency, precision-format impact, and multi-GPU behavior to help gauge which platform can deliver the fastest time-to-train and what GPU scale is required to achieve an outcome within a given time period. - **NVIDIA Omniverse™ on DGX Cloud**: Deploy streaming applications for industrial digitalization and physical AI simulations on a fully managed platform, utilizing optimized NVIDIA L40 GPUs that deliver NVIDIA RTX™ rendering and low-latency streaming directly to Chromium-based browsers or custom web-based applications. - **NVIDIA Cosmos™ Curator on DGX Cloud:** Fine-tune NVIDIA world foundation models with proprietary data, then manage the entire video data pipeline on DGX Cloud with fully GPU-accelerated pipelines for 89x faster processing and 11x higher throughput for captioning—accelerating AI development for robotics, AV, and video AI applications. ![A diagram showing NVIDIA DGX Cloud solutions](https://developer.download.nvidia.com/images/dgx-cloud-services/how-nvidia-dgx-cloud-work.svg) ### NVIDIA DGX Cloud Lepton Build or deploy AI applications across multi-cloud environments through a unified experience. [Read Blog](https://developer.nvidia.com/blog/introducing-nvidia-dgx-cloud-lepton-a-unified-ai-platform-built-for-developers/) ### NVIDIA DGX Cloud on CSPs Build and fine-tune models with a turnkey, full-stack optimized platform on leading cloud providers with flexible term lengths.
[Watch Video](https://www.youtube.com/watch?v=Ua6xoP4Ign4&t=2s) ### NVIDIA Cloud Functions (NVCF) Deploy AI workloads with auto-scaling, cost-efficient GPU utilization, and multi-cloud flexibility. [Read Blog](https://developer.nvidia.com/blog/seamlessly-scale-ai-across-cloud-environments-with-nvidia-dgx-cloud-serverless-inference/) ### NVIDIA DGX Cloud Benchmarking Follow evolving performance optimizations and workload-specific recipes to get the most from your AI infrastructure. [Read Blog](https://developer.nvidia.com/blog/nvidia-dgx-cloud-introduces-ready-to-use-templates-to-benchmark-ai-platform-performance/) ### NVIDIA Omniverse on DGX Cloud Scale deployment of streaming applications for industrial digitalization and physical AI simulation. [Learn More About Omniverse on DGX Cloud](https://www.nvidia.com/en-us/data-center/omniverse-dgx-cloud/) ### NVIDIA Cosmos Curator on DGX Cloud Efficiently process, fine-tune, and deploy video and world foundation models with DGX Cloud managed services. [Read Blog](https://developer.nvidia.com/blog/petabyte-scale-video-processing-with-nvidia-nemo-curator-on-nvidia-dgx-cloud/) * * * ## Get Started With NVIDIA DGX Cloud Provision and operate environments optimized for AI training, fine-tuning, and inference on NVIDIA DGX Cloud. ![NVIDIA DGX Cloud Lepton](https://developer.download.nvidia.com/icons/m48-cloud-ai.svg) ### NVIDIA DGX Cloud Lepton Tap into global GPU compute to discover, procure, develop, customize, and deploy AI applications across multiple cloud providers. [Sign Up](https://developer.nvidia.com/dgx-cloud/get-lepton?ncid=no-ncid) ![NVIDIA DGX Cloud for CSPs](https://developer.download.nvidia.com/icons/m48-optimized-training.svg) ### NVIDIA DGX Cloud on CSPs Access optimized accelerated computing clusters for AI training and fine-tuning on any leading cloud.
[Learn More About DGX Cloud on CSPs](https://www.nvidia.com/en-us/data-center/dgx-cloud-on-csps/?ncid=no-ncid) ![NVIDIA Cloud Functions](https://developer.download.nvidia.com/images/dgx-cloud-services/scalable-ai.svg) ### NVIDIA Cloud Functions Easily package, deploy, and scale inference pipelines or data preprocessing workflows in containers optimized for NVIDIA GPUs. [Learn More About NVCF](https://developer.nvidia.com/dgx-cloud/serverless-inference) ![NVIDIA DGX Cloud Benchmarking](https://developer.download.nvidia.com/icons/m48-cloud-performance.svg) ### NVIDIA DGX Cloud Benchmarking Access benchmarking recipes, tools, and services to identify AI workload performance gaps and optimize any NVIDIA AI infrastructure. [Learn More About DGX Cloud Benchmarking](https://developer.nvidia.com/dgx-cloud/benchmarking) ![NVIDIA Omniverse on DGX Cloud](https://developer.download.nvidia.com/icons/m48-foundational-model-dexterity-256px-blk.svg) ### NVIDIA Omniverse on DGX Cloud Scale deployment of streaming applications for industrial digitalization and physical AI simulations on one fully managed platform. [Learn More About Omniverse on DGX Cloud](https://www.nvidia.com/en-us/data-center/omniverse-dgx-cloud/) ![NVIDIA Cosmos Curator on NVIDIA DGX Cloud](https://developer.download.nvidia.com/icons/model-customization.svg) ### NVIDIA Cosmos Curator on NVIDIA DGX Cloud Efficiently process, fine-tune, and deploy world foundation models on a managed platform for large-scale video curation and model customization. [Explore GitHub](https://github.com/nvidia-cosmos/cosmos-curate) * * * ## NVIDIA DGX Cloud Starter Kits Start experimenting on [build.nvidia.com](http://build.nvidia.com), accelerated by DGX Cloud, or download [DGX Cloud Benchmarking recipes](https://github.com/NVIDIA/dgxc-benchmarking) to optimize your workload on NVIDIA GPUs. 
### AI Development, Customization, and Deployment Across Clouds DGX Cloud Lepton enables GPU provisioning in specific regions and supports the full AI development lifecycle, including training, fine-tuning, and inference across multi-cloud environments. - Read [Introducing NVIDIA DGX Cloud Lepton: A Unified AI Platform Built for Developers](https://developer.nvidia.com/blog/introducing-nvidia-dgx-cloud-lepton-a-unified-ai-platform-built-for-developers/) - Read [How to Build AI Systems In House With Outerbounds and DGX Cloud Lepton](https://developer.nvidia.com/blog/how-to-build-ai-systems-in-house-with-outerbounds-and-dgx-cloud-lepton/) - [Request DGX Cloud Lepton Trial](https://www.developer.nvidia.com/dgx-cloud/get-lepton) ### Multi-Node AI Training and Fine-Tuning Platform With DGX Cloud Create Take cloud-native AI training to leading clouds with the latest NVIDIA AI architecture and software. - [Watch NVIDIA Run:ai on DGX Cloud Demo](https://resources.nvidia.com/en-us-dgx-cloud/dgx-cloud-with-run-ai-demo) - [Explore Technical Overview](https://docs.nvidia.com/dgx-cloud/run-ai/latest/overview.html) - [Request DGX Cloud on CSP Marketplaces](https://www.nvidia.com/en-us/data-center/dgx-cloud/trial/) ### Auto-Scaled Deployment on NVIDIA GPUs Package and deploy inference pipelines or data preprocessing workflows in containers optimized for NVIDIA GPUs, without worrying about underlying infrastructure. - Read [NVCF Quickstart Guide](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/quickstart.html) - Watch [NVCF Cluster Registration and NVIDIA NIM Deployment](https://youtu.be/RGHxx9h0zNM?si=2687xqitZYm-_Z2W) - [Request NVCF Trial](https://developer.nvidia.com/get-nvcf) ### NVIDIA DGX Cloud Benchmarking for AI Workloads Produce AI training and inference performance results for a range of AI models, including Llama and DeepSeek, with recipes provided by DGX Cloud Benchmarking.
- Read [NVIDIA DGX Cloud Introduces Ready-To-Use Templates to Benchmark AI Platform Performance](https://developer.nvidia.com/blog/nvidia-dgx-cloud-introduces-ready-to-use-templates-to-benchmark-ai-platform-performance/) - [Review Recipe on GitHub](https://github.com/NVIDIA/dgxc-benchmarking) - [Access NVIDIA DGX Cloud Benchmarking](https://developer.nvidia.com/dgx-cloud/benchmarking/join) ### Scalable Deployment of Streamed Applications for Physical AI Stream OpenUSD applications and digital twins directly on NVIDIA Omniverse from a fully managed platform. - Read [Deploying Your Omniverse Kit Apps at Scale](https://developer.nvidia.com/blog/deploying-your-omniverse-kit-apps-at-scale/) - Access [Omniverse on DGX Cloud via Microsoft Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/nvidia.dgx-cloud?tab=Overview) - Learn [Building AI-Powered Material Generation for Omniverse With DGX Cloud](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-53+V1) ### Large-Scale Video Curation Efficiently process, fine-tune, and deploy video and [world foundation models](https://www.nvidia.com/en-us/glossary/world-models/) with NVIDIA Cosmos Curator, accelerated by NVIDIA DGX Cloud.
- Read [Petabyte-Scale Video Processing With NVIDIA Cosmos Curator On NVIDIA DGX Cloud](https://developer.nvidia.com/blog/petabyte-scale-video-processing-with-nvidia-nemo-curator-on-nvidia-dgx-cloud/) - Explore [Physical AI With NVIDIA Cosmos: Post-Training for Autonomous Vehicles With Uber's Large-Scale Driving Data](https://resources.nvidia.com/en-us-dgx-cloud/physical-ai-post-training-for-autonomous-vehicles-with-ubers-large-scale-driving-data-using-nvidia-cosmos) - [Access Cosmos Repository](https://github.com/nvidia-cosmos/cosmos-curate) * * * ## DGX Cloud Learning Library Video ### AI Model Building on Orchestrated GPU Clusters **NVIDIA DGX Cloud for CSPs** Learn how DGX Cloud for CSPs provides the resources and orchestration for developing and building AI pipelines and managing the AI lifecycle for model training and fine-tuning. Tech Blog ### How to Build Large AI Systems In House **NVIDIA DGX Cloud Lepton ** See how a large-scale Reddit Agent recommender application demonstrates the benefits of operating a production-grade AI system stack in house. Research ### Post-Training for Autonomous Vehicles With Large-Scale Data **NVIDIA Cosmos Curator on DGX Cloud** Explore how NVIDIA and Uber built a system and process to produce post-trained Cosmos-AV models for autonomous driving development using NVIDIA Cosmos Curator on DGX Cloud. Training ### Build AI-Powered Material Generation **NVIDIA Omniverse on DGX Cloud** Learn how to create an NVIDIA Omniverse extension on DGX Cloud that uses AI to generate realistic 3D materials from natural language descriptions. Video ### NVCF Cluster Registration and NVIDIA NIM Deployment **NVIDIA Cloud Functions (NVCF) ** See NVCF in action by registering a Kubernetes cluster into NVCF, deploying an NVIDIA NIM onto the provisioned cluster and then running inference against it. 
Tech Blog ### Ensuring Reliable Model Training on NVIDIA DGX Cloud **NVIDIA DGX Cloud for CSPs ** Learn resiliency training techniques applied to NVIDIA Llama Nemotron and other foundation models that achieved \<1% hardware downtime for training runs at the 2K–10K GPU scale. * * * ## More Resources ![NVIDIA DGX Cloud Serverless Inference FAQ](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) ### Read NVIDIA Cloud Functions FAQ ![NVIDIA Training and Certification](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![NVIDIA Inception Program for Startups](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program * * * ## Ethical AI NVIDIA believes trustworthy AI is a shared responsibility, and we have established policies and practices to support the development of AI across a wide array of applications. When downloading or using this model in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ## Get started with NVIDIA DGX Cloud today. [Get Started](https://www.nvidia.com/en-us/data-center/dgx-cloud/trial/) --- # Source: https://developer.nvidia.com/rtx/dlss.md # NVIDIA DLSS NVIDIA DLSS is a suite of neural rendering technologies powered by NVIDIA RTX™ Tensor Cores that boosts frame rates while delivering crisp, high-quality images that rival native resolution. DLSS 4 introduced Multi Frame Generation (MFG) and a transformer model for high quality super resolution. 
DLSS 4.5 introduces Dynamic MFG and a second-generation transformer model for super resolution. [Download Now](#getstarted)[Forum](https://forums.developer.nvidia.com/c/gaming-and-visualization-technologies/dlss/493)[Notify Me](https://developer.nvidia.com/rtx/dlss/notify-me) * * * ## DLSS AI Technologies DLSS neural rendering technologies include: - **DLSS Multi Frame Generation** boosts frame rates by using AI to generate up to three frames per rendered frame (with up to five frames coming soon), powered by NVIDIA GeForce RTX 50 Series and RTX PRO Blackwell Generation GPUs with fifth-generation Tensor Cores. - **DLSS Dynamic Multi Frame Generation** dynamically adjusts the frame generation multiplier to maximize smoothness across different games and scenes (coming in spring). - **DLSS Ray Reconstruction** enhances image quality by using AI to generate additional pixels for intensive ray-traced scenes. DLSS replaces hand-tuned denoisers with an NVIDIA supercomputer-trained AI network that generates higher-quality pixels between sampled rays. - **DLSS Super Resolution** boosts performance by using AI to output higher-resolution frames from a lower-resolution input. DLSS samples multiple lower-resolution images and uses motion data and feedback from prior frames to construct high-quality images. A new second-generation transformer AI model further improves stability, anti-aliasing, and visual clarity. - **Deep Learning Anti-Aliasing** provides higher image quality with an AI-based anti-aliasing technique. DLAA uses the same Super Resolution technology developed for DLSS, constructing a more realistic, high-quality image at native resolution. DLSS Super Resolution, Ray Reconstruction, and DLAA use cutting-edge transformer AI models trained on NVIDIA’s supercomputers for the best image quality.
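The multipliers above compound: Super Resolution reduces per-frame render cost, raising rendered frame rate, and Multi Frame Generation then inserts AI-generated frames between rendered ones. A back-of-envelope sketch, with an assumed 2x Super Resolution gain used purely for illustration:

```python
# Back-of-envelope DLSS frame-rate arithmetic. Illustrative numbers only;
# real gains vary by game, GPU, resolution, and quality mode.
def effective_fps(native_fps: float, sr_speedup: float,
                  generated_per_rendered: int) -> float:
    rendered = native_fps * sr_speedup              # Super Resolution raises rendered fps
    return rendered * (1 + generated_per_rendered)  # MFG adds N frames per rendered frame

# 30 fps native, assumed 2x from Super Resolution, 3 generated frames (4x MFG):
print(effective_fps(30, 2.0, 3))  # 240.0 -> an 8x multiplier over native
```

This is how the technologies, working together, can reach the "up to 8X" figure over brute-force rendering, though displayed frame rate says nothing about input latency, which is why Reflex is paired with frame generation.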
https://www.youtube-nocookie.com/embed/u4cfzFd9HzU?&rel=0 * * * ## See DLSS in Action ### Borderlands 4 https://www.youtube-nocookie.com/embed/jaFfHMZmXV4?&rel=0 ### ARC Raiders https://www.youtube-nocookie.com/embed/ub2OrMkdpYc?&rel=0 ### Battlefield 6 https://www.youtube-nocookie.com/embed/ICJajlj_ujo?&rel=0 * * * ## Key Benefits ### Performance Multiplier DLSS 4 featuring DLSS Multi Frame Generation—working in unison with the complete suite of DLSS technologies, including [NVIDIA Reflex](https://developer.nvidia.com/performance-rendering-tools/reflex)—multiplies frame rates by up to 8X over traditional brute-force rendering. ### Maximize Quality With AI DLSS takes advantage of AI models that are continuously improved through ongoing training on NVIDIA supercomputers, providing better image quality and performance across more games and applications. The latest transformer-based model offers improved stability, enhanced lighting detail, and more detail in motion. ### Enhanced Ray Tracing DLSS Ray Reconstruction replaces the need for multiple hand-tuned denoisers and generates higher-quality pixels between sampled rays. * * * ## Get Started With DLSS Through Streamline Streamline is an open-source cross-IHV solution that simplifies integration of the latest NVIDIA and other super resolution technologies into applications and games. This framework allows developers to easily implement a single integration and enable multiple super resolution technologies and other graphics effects supported by different hardware vendors. For additional help, refer to the [FAQ](https://forums.developer.nvidia.com/t/dlss-4-faq/321939). ### DLSS Multi Frame Generation Streamline Plugin Create up to three additional frames, providing a huge performance boost versus traditional brute-force rendering. Dynamic Multi Frame Generation Coming Spring. Sign up to be notified when it's available [here](https://developer.nvidia.com/rtx/dlss/notify-me). 
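As rough intuition for frame generation (not the DLSS algorithm or its API), the sketch below produces n intermediate frames between two rendered frames by plain linear blending; real Multi Frame Generation uses an AI model with motion data instead of this naive crossfade:

```python
def generate_frames(prev, nxt, n):
    """Generate n intermediate frames between two rendered frames
    (each a flat list of pixel values) by linear blending. DLSS
    Multi Frame Generation uses an AI model plus motion data
    instead of this naive crossfade."""
    frames = []
    for k in range(1, n + 1):
        t = k / (n + 1)  # position of frame k between prev and nxt
        frames.append([(1 - t) * a + t * b for a, b in zip(prev, nxt)])
    return frames

# Three generated frames per rendered frame, as in 4x mode.
mid = generate_frames([0.0, 0.0], [1.0, 2.0], 3)
print(mid)  # [[0.25, 0.5], [0.5, 1.0], [0.75, 1.5]]
```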
### DLSS Ray Reconstruction Streamline Plugin Replace hand-tuned denoisers with AI that generates higher-quality pixels in between sampled rays. Includes latest transformer architecture for improved image quality. ### DLSS Super Resolution Streamline Plugin Output higher-resolution frames from a lower-resolution input. DLAA mode uses AI for anti-aliasing. Includes the latest transformer model in Beta. [Access Streamline](https://github.com/NVIDIAGameWorks/Streamline) * * * ## Get Started With DLSS in Game Engines ### DLSS in Unreal Engine 5 DLSS 4 is in Unreal Engine through our official DLSS Unreal Engine plugin. The plugin provides access to Multi Frame Generation, Frame Generation, Ray Reconstruction, Super Resolution, DLAA, Reflex Low Latency, and NVIDIA Image Scaling. ### DLSS 4 Plugin for Unreal Engine - [Download UE 5.7 DLSS Plugin](https://developer.nvidia.com/downloads/assets/gameworks/downloads/secure/dlss/2025.12.09_ue5.7_dlss4plugin_v8.4.0.zip) - [Download UE 5.6 DLSS Plugin](https://developer.nvidia.com/downloads/assets/gameworks/downloads/secure/dlss/2025.12.09_ue5.6_dlss4plugin_v8.4.0.zip) - [Download UE 5.5 DLSS Plugin](https://developer.nvidia.com/downloads/assets/gameworks/downloads/secure/dlss/2025.12.09_ue5.5_dlss4plugin_v8.4.0.zip) - [Download UE 5.4 DLSS Plugin](https://developer.nvidia.com/downloads/assets/gameworks/downloads/secure/dlss/2025.12.09_ue5.4_dlss4plugin_v8.4.0.zip) ### DLSS in Unity NVIDIA DLSS Super Resolution is now available through Unity 2021.2 Beta versions or newer. 
### Unity Plugin - [Download Unity](https://unity.com/download) * * * ## More Resources ![A decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Join the NVIDIA Developer Program ![A decorative image representing Inception for Startups](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) ### DLSS Developer Forums ![img-alt-text](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign up for Developer Newsletter * * * Get Started with DLSS today [Download](#getstarted) --- # Source: https://developer.nvidia.com/networking/doca.md 1. [Home](/) [Networking](/networking) NVIDIA DOCA # NVIDIA DOCA Software Framework # Accelerate application development for NVIDIA BlueField and ConnectX networking devices. NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers. DOCA-Host includes all needed host drivers and tools for your NVIDIA BlueField and ConnectX® devices. [Download DOCA](http://developer.nvidia.com/doca-downloads "Github Repo")[Get Started](https://developer.nvidia.com/networking/doca/getting-started "Download Workflows") ![NVIDIA DOCA software framework diagram](https://developer.download.nvidia.com/images/doca/doca-3.0-stack-diagram.svg) Together, DOCA and the BlueField networking platform enable the development of applications that deliver breakthrough networking, security, and storage performance. 
BlueField isolates the infrastructure service domain from the workload domain to offer significant improvements in application and server performance, security, and efficiency, giving developers all the tools they need to realize optimal, secure, accelerated data centers and AI clouds. DOCA software consists of an SDK and a runtime environment. The DOCA runtime, included by default with the BlueField networking platform, has tools for provisioning, deploying, and orchestrating containerized services on hundreds or thousands of DPUs and SuperNICs across the data center. The DOCA SDK provides industry-standard open APIs and software frameworks. The SDK supports a range of operating systems and distributions and includes drivers, libraries, tools, documentation, and example applications. DOCA-Host is the DOCA package for host installation and includes several installation profiles to best fit your data center workflows. DOCA-Host provides the needed interfaces for NVIDIA networking platforms, including both BlueField and ConnectX devices. [Read the DOCA User Manual](https://docs.nvidia.com/doca/sdk/) * * * ## Platform and Host Deployments ![NVIDIA DOCA software applications running on NVIDIA BlueField DPU](https://developer.download.nvidia.com/images/doca/doca-blue-field-networking-platform.jpg) ### DOCA on the BlueField Networking Platform The NVIDIA BlueField networking platform, powered by the DOCA software framework, is an advanced computing platform for data center infrastructure, delivering accelerated software-defined networking, storage, security, and management services at massive scale. 
[Learn About BlueField](https://www.nvidia.com/en-us/networking/products/data-processing-unit/) ![NVIDIA DOCA software applications running on a host server box](https://developer.download.nvidia.com/images/doca/doca-host.jpg) ### DOCA on the Host NVIDIA BlueField and NVIDIA ConnectX are paired with DOCA to deliver Ethernet and InfiniBand connectivity solutions at speeds up to 800 gigabits per second (Gb/s). Built on an open foundation, the DOCA-Host package includes essential drivers and tools to enhance networking performance and enable advanced functionality. DOCA software is available on every leading operating system as a standalone package (without a bundled OS) for Arm® and x86 architectures. [Learn About DOCA-Host Installation and Profiles](https://docs.nvidia.com/doca/sdk/nvidia+doca+profiles/index.html) * * * ## Unpack the Stack # BlueField Software Bundle - The BlueField software bundle includes the bootloader, OS kernel, necessary network interface card (NIC) firmware, NVIDIA drivers, sample filesystem, and toolchain—all certified as part of the NVIDIA NGC™ catalog. - The BlueField bundle includes Ubuntu 22.04 as a commercial-grade Linux distribution with continuous OS and security updates. 
![A stack diagram of NVIDIA DOCA software apps running on Arm core of NVIDIA BlueField DPU](https://developer.download.nvidia.com/images/doca/blue-field-software-bundle.jpg) ![A stack diagram of NVIDIA DOCA business apps isolated from NVIDIA DOCA on infrastructure services domain](https://developer.download.nvidia.com/images/doca/sdk-key-components.jpg) # SDK Key Components - DOCA RDMA (Remote direct-memory access) acceleration SDK: unified communications and collaboration (UCC) and Unified Communication X (UCX), RDMA verbs, GPUDirect® - Network acceleration SDK: NVIDIA Accelerated Switching and Packet Processing (ASAP2)™ software-defined networking (SDN), emulated VirtIO, P4, 5T for 5G technology, Firefly time synchronization - Security acceleration SDK: inline cryptography, App Shield runtime security - Storage acceleration SDK: storage emulation and virtualization, crypto and compression - Data path acceleration (DPA) SDK: accelerate workloads requiring high-performance access to NIC engines - Management SDK: deployment, provisioning, service orchestration - Industry-standard APIs: DPDK, SPDK, P4, Linux Netlink - User space and kernel ![A decorative image showing an interapplication workflow on a monitor](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/devzone-networking-doca-sdk-icon-infrastrucure.svg) ### Forward and Backward Compatibility DOCA provides multi-generational support to ensure that applications developed today will consistently run with added performance benefits on all future generations of BlueField. ![A decorative image of a cloud networking telemetry](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/devzone-networking-doca-sdk-icon-doca.svg) ### Offload, Accelerate, Isolate Infrastructure Network, storage, and security services are offloaded, accelerated, and isolated on BlueField while data is securely delivered to workloads at wire speed. 
![A decorative image of a neural network](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/devzone-networking-doca-sdk-icon-sol.svg) ### Open Ecosystem DOCA offers a software application framework to accelerate ecosystem development. * * * ## DOCA Developer Resources # DOCA-Host and BlueField Bundle Runtime Downloads Download DOCA-Host and the BlueField DPU and SuperNIC runtime image. [Download DOCA](http://developer.nvidia.com/doca-downloads "Github Repo")[Get Started](https://developer.nvidia.com/networking/doca/getting-started "Download Workflows") Quick Links - [Download DOCA](http://developer.nvidia.com/doca-downloads) - [Getting Started](https://developer.nvidia.com/networking/doca/getting-started) - [DOCA Documentation](https://docs.nvidia.com/doca/sdk/) - [BlueField Hardware Manual](https://docs.nvidia.com/networking/dpu-doca/index.html#dpu-hw) - [BlueField Platform Software Manuals](https://docs.nvidia.com/networking/dpu-doca/index.html#dpu-os) * * * --- # Source: https://developer.nvidia.com/drive.md 1. [Topics](/topics/) 2. Automotive # Automotive Discover NVIDIA’s complete suite of developer solutions for autonomous vehicles, spanning AI training, simulation, and in-vehicle compute. 
Explore more: [Build AI Models for Auto](http://build.nvidia.com/explore/automotive) Quick Links: - [NGC Automotive Catalog](https://catalog.ngc.nvidia.com/search?filters=industry%7CAutomotive+%2F+Transportation%7Cindus_automotive_transportation&orderBy=weightPopularDESC) - [DRIVE SDK Downloads](https://developer.nvidia.com/drive/downloads) - [NVIDIA Halos](https://www.nvidia.com/en-us/ai-trust-center/halos/autonomous-vehicles/) ## Explore Autonomous Vehicle Solutions [Explore All Topics](https://developer.nvidia.com/topics/) ![NVIDIA’s automotive AI platform](https://developer.download.nvidia.com/images/drive/automotive-ai-platform-ari.jpg) ### AI Training The NVIDIA automotive AI platform combines powerful NVIDIA DGX™ Cloud hardware with enterprise software for scalable AV data curation, labeling, and training. This enables efficient, end-to-end autonomous vehicle development and deployment. [Learn More](/drive/infrastructure) ![Autonomous vehicle simulation](https://developer.download.nvidia.com/images/drive/autonomous-vehicle-simulation-ari.jpg) ### Simulation Accelerate AV development with high-fidelity, scalable simulation workflows. Reconstruct real-world data into interactive simulation with NVIDIA Omniverse™ NuRec and generate synthetic data with NVIDIA Cosmos™ world foundation models for enhanced testing and validation. [Learn More](/drive/simulation) ![NVIDIA DRIVE AGX Thor Developer Kit](https://developer.download.nvidia.com/images/drive/drive-agx-ari.jpg) ### DRIVE AGX NVIDIA DRIVE AGX™ is the brain of the car—a complete hardware and software platform delivering industry-leading performance. By combining raw computing power with trusted [DRIVE AGX Orin](https://developer.nvidia.com/drive/ecosystem-orin) and [DRIVE AGX Thor](https://developer.nvidia.com/drive/ecosystem-thor) ecosystem partners, we provide a seamless transition to production. 
DRIVE AGX is powered by the NVIDIA [DriveOS](https://developer.nvidia.com/drive/os)™ SDK featuring [DriveWorks](https://developer.nvidia.com/drive/driveworks), CUDA®, TensorRT™, NvMedia, and NvStreams. This full-stack platform accelerates the development of scalable, reliable AI-powered vehicle applications. [Learn More](/drive/agx) --- # Source: https://developer.nvidia.com/drive/driveworks.md # NVIDIA DriveWorks SDK The NVIDIA® DriveWorks Software Development Kit (SDK) provides a suite of accelerated algorithms and versatile tools to bootstrap software development for Autonomous Vehicles. ## DriveWorks at a Glance The DriveWorks SDK contains a comprehensive set of modules, tools and samples that solve typical tasks and workloads for AV development. Developers can use DriveWorks for their applications and leverage the computing power of NVIDIA DRIVE AGX™ SoCs. ![DriveWorks acceleration libraries and tools](https://developer.nvidia.com/downloads/drive/images/driveworks.png) ## Featured Modules ### Sensor Abstraction Layer NVIDIA DriveWorks provides a Sensor Abstraction Layer that supports capturing of data from various sources. It is designed to provide the following features: - Abstraction between physical sensor models and software applications - A unified, compact sensor interface definition - Raw sensor serialization for recording - Virtual sensors to enable replay - Abstraction over DriveOS core components: NVIDIA CUDA®, NvMedia, and NvStreams ![DriveWorks comes with Compute Graph Framework and STM Scheduler](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/revamp-driveworks-sensor-abstraction-layer-630x354.jpg) * * * ### Image Processing ![DriveWorks Image Processing modules](https://developer-blogs.nvidia.com/wp-content/uploads/2019/08/featuretracking-624x404.png) The Image Processing Library provides structures and algorithms to efficiently process image data. 
It features the following algorithms: - Image Preprocessing: Rectification, Color Correction - Image Features: Extraction and Feature History - Image Filtering: Recursive Gaussian filter, Box filter, and Convolution filter - Area Tracking: Templates, 2D Bounding Boxes - Stereo: Rectification and Disparity Estimation * * * ### Point Cloud Processing The DriveWorks Point Cloud Processing Library is built specifically for Point Cloud data as transmitted by LiDAR sensors. It provides the following GPU-accelerated algorithms: - Accumulation of Point Clouds over time - Stitching to combine Point Clouds - Range Image (Depth Map) creation - (Ground) Plane extraction via RANSAC algorithm - Point Cloud filter by attribute ![DriveWorks provides image and point cloud processing modules](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/revamp-driveworks-image-point-cloud-processing-630x354.jpg) * * * ### Dynamic Calibration ![Static calibration tools and self-calibration](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/revamp-driveworks-calibration-630x354.jpg) The Calibration Module supports Dynamic Calibration for Camera, Radar, LiDAR, and IMU sensors that are compatible with the DriveWorks Sensor Abstraction Layer. Dynamic Calibration is a process where sensor parameters are re-estimated at runtime based on sensor measurements and vehicle motion. This process compensates for the effects of environmental changes or mechanical stress—such as changes in road gradient, tire pressure, or vehicle loading—which can affect the extrinsic parameters (position and orientation) of sensors during a vehicle's operation. * * * ### Egomotion The Egomotion module uses a motion model to track and predict the vehicle’s pose based on sensor inputs. It supports two types of motion models: an odometry-only model and, if an IMU is available, a model based on IMU and odometry. 
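The odometry-only motion model can be pictured with a minimal 2D unicycle integrator. This is a generic textbook model, not DriveWorks code; the function name and state layout are invented for illustration:

```python
import math

def integrate_odometry(pose, v, omega, dt):
    """Advance a 2D pose (x, y, heading) with a unicycle odometry
    model: forward speed v and yaw rate omega over timestep dt.
    A textbook stand-in for an odometry-only motion model."""
    x, y, theta = pose
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return (x, y, theta)

# Drive straight for 1 s at 10 m/s in 0.1 s steps.
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = integrate_odometry(pose, v=10.0, omega=0.0, dt=0.1)
print(pose)  # (10.0, 0.0, 0.0)
```

An IMU-based model would fuse measured accelerations and angular rates into the same kind of state update rather than relying on wheel odometry alone.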
During run-time, the module takes measurements as input and internally updates the current estimation of the vehicle pose. The module can be queried for vehicle motion between any two points in time. ![DriveWorks Egomotion module uses a motion model to track and predict a vehicle’s pose](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/revamp-driveworks-egomotion-630x354.jpg) * * * ## Resources [![DRIVE SDK Downloads](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/m48-download-96px-grn.svg)](https://developer.nvidia.com/drive/downloads) [Downloads](https://developer.nvidia.com/drive/downloads) [![Developer Forums](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/m48-question-support-96px-grn.svg)](https://forums.developer.nvidia.com/drive-agx) [Forums](https://forums.developer.nvidia.com/drive-agx) [![Developer Blog](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/m48-communication-news-flash-share-96px-grn.svg)](https://developer.nvidia.com/blog/tag/nvidia-drive/) [Blog](https://developer.nvidia.com/blog/tag/nvidia-drive/) [![NVIDIA Webinars](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/drive/images/m48-announcement-megaphone-96px-grn.svg)](https://developer.nvidia.com/drive/training) [Webinars](https://developer.nvidia.com/drive/training) Peek under the hood to experience NVIDIA’s latest autonomous driving innovations via DRIVE Labs and DRIVE Dispatch. [View DRIVE Videos](https://www.nvidia.com/en-us/self-driving-cars/drive-videos/) --- # Source: https://developer.nvidia.com/dynamo.md # NVIDIA Dynamo [NVIDIA Dynamo](https://www.nvidia.com/en-us/ai-data-science/products/triton-inference-server/) is an open-source, low-latency, modular inference framework for serving generative AI models in distributed environments. 
It enables seamless scaling of inference workloads across large GPU fleets with intelligent resource scheduling and request routing, optimized memory management, and seamless data transfer. NVIDIA Dynamo supports all major AI inference backends and features large language model (LLM)-specific optimizations, such as disaggregated serving. When serving the open-source DeepSeek-R1 671B reasoning model on [NVIDIA GB200 NVL72](https://www.nvidia.com/en-us/data-center/gb200-nvl72/), NVIDIA Dynamo increased throughput—measured in tokens per second per GPU—by up to 30x. Serving the Llama 70B model on NVIDIA Hopper™, it increased throughput by more than 2x. NVIDIA Dynamo is the ideal solution for developers looking to accelerate and scale generative AI models with the highest efficiency at the lowest cost. NVIDIA Dynamo builds on the successes of the [NVIDIA Triton Inference Server](https://github.com/triton-inference-server/server), an open-source software that standardizes AI model deployment and execution across every workload. [Get Started](https://github.com/ai-dynamo/dynamo)[Documentation](https://docs.nvidia.com/dynamo/latest/) * * * ## See NVIDIA Dynamo in Action https://www.youtube-nocookie.com/embed/1bRmskFCnqY? ### See How to Quickly Set up and Deploy NVIDIA Dynamo [Watch Video](https://youtu.be/1bRmskFCnqY) https://www.youtube-nocookie.com/embed/PRCZZKQirN8? ### Watch KV Cache-Aware Smart Router With NVIDIA Dynamo [Watch Video](https://youtu.be/PRCZZKQirN8) https://www.youtube-nocookie.com/embed/PRCZZKQirN8? ### Learn How NVIDIA Dynamo Enables Disaggregated Serving [Watch Video](https://www.youtube.com/watch?v=PRCZZKQirN8) * * * ## How NVIDIA Dynamo Works Models are becoming larger and more integrated into AI workflows that require interaction with multiple models. Deploying these models at scale involves distributing them across multiple nodes, requiring careful coordination across GPUs. 
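The disaggregated-serving idea mentioned above, splitting the prefill and decode phases across different workers and handing the KV cache between them, can be sketched in a few lines. Everything here (the worker functions, dict-based cache, and toy next-token rule) is invented for illustration and is not the NVIDIA Dynamo API:

```python
# Toy split of the two LLM inference phases across separate "workers":
# a prefill worker builds the KV cache, a decode worker extends the
# sequence from that handed-off cache. The dict-based cache and token
# rule are illustrative only; Dynamo transfers real KV tensors
# between GPUs over fast interconnects.

def prefill_worker(prompt_tokens):
    """Process the whole prompt once and return a fake KV cache."""
    return {"tokens": list(prompt_tokens),
            "kv": [t % 97 for t in prompt_tokens]}

def decode_worker(kv_cache, n_new):
    """Autoregressively append n_new tokens using the received cache."""
    out = list(kv_cache["tokens"])
    for _ in range(n_new):
        nxt = (sum(kv_cache["kv"]) + len(out)) % 97  # toy "next token"
        out.append(nxt)
        kv_cache["kv"].append(nxt)
    return out

cache = prefill_worker([3, 14, 15])   # would run on the prefill GPU pool
seq = decode_worker(cache, n_new=2)   # would run on the decode GPU pool
print(seq)  # [3, 14, 15, 35, 71]
```

Separating the compute-heavy prefill from the memory-bound decode lets each pool be sized and scheduled independently, which is the motivation for the GPU Resource Planner described below.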
The complexity increases with inference optimization methods, like disaggregated serving, which splits responses across different GPUs, adding challenges in collaboration and data transfer. **NVIDIA Dynamo addresses the challenges of distributed and disaggregated inference serving. It includes four key components:** - **GPU Resource Planner:** A planning and scheduling engine that monitors capacity and prefill activity in multi-node deployments to adjust GPU resources and allocate them across prefill and decode. - **Smart Router:** A KV-cache-aware routing engine that efficiently directs incoming traffic across large GPU fleets in multi-node deployments to minimize costly re-computations. - **Low Latency Communication Library:** State-of-the-art inference data transfer library that accelerates the transfer of KV cache between GPUs and across heterogeneous memory and storage types. - **KV Cache Manager:** A cost-aware KV cache offloading engine designed to transfer KV cache across various memory hierarchies, freeing up valuable GPU memory while maintaining user experience. ![A flowchart of how NVIDIA Dynamo works](https://developer.download.nvidia.com/images/how-it-works.jpg) * * * https://www.youtube-nocookie.com/embed/3C-6STonTLU? #### NVIDIA Dynamo Key Moments From GTC25 Watch the recording to learn about NVIDIA Dynamo’s key components and architecture and how they enable seamless scaling and optimized inference in distributed environments. [Watch Now](https://www.nvidia.com/gtc/session-catalog/?search=S73042&tab.catalogallsessionstab=16566177511100015Kus#/) * * * ## Get Started With NVIDIA Dynamo [Find](https://www.nvidia.com/en-us/ai-data-science/products/triton-inference-server/get-started/) the right license to deploy, run, and scale AI inference for any application on any platform. 
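The KV-cache-aware routing idea above can be reduced to a minimal sketch: send each request to the worker whose cached token prefix overlaps it most, so prefill work is not recomputed. Worker names and the cache layout are invented for illustration and do not reflect Dynamo's Smart Router implementation:

```python
def shared_prefix(a, b):
    """Length of the common token prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, worker_caches):
    """Pick the worker whose cached prefix best matches the request,
    minimizing costly prefill recomputation."""
    return max(worker_caches,
               key=lambda w: shared_prefix(request_tokens, worker_caches[w]))

# Hypothetical fleet state: gpu-0 already served this conversation.
caches = {
    "gpu-0": [1, 2, 3, 4],  # cached tokens from an earlier turn
    "gpu-1": [9, 9],        # unrelated cache contents
}
print(route([1, 2, 3, 4, 5], caches))  # gpu-0
```

A production router must also weigh load and cache eviction, but prefix overlap is the core signal that makes routing "KV-cache-aware."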
![](https://developer.download.nvidia.com/icons/m48-download.svg) ### Download Code for Development NVIDIA Dynamo and NVIDIA Dynamo-Triton are available as open-source software on GitHub with end-to-end examples. [Go to NVIDIA Dynamo Repository (Github)](https://github.com/ai-dynamo/dynamo) [Go to NVIDIA Dynamo-Triton Repository (Github](https://github.com/triton-inference-server/server) ![](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Purchase NVIDIA AI Enterprise NVIDIA Dynamo-Triton is available with enterprise-grade support, security, stability, and manageability with [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/ai-data-science/products/triton-inference-server/get-started/). NVIDIA Dynamo will be included in NVIDIA AI Enterprise for production inference in a future release. Get a free license to try NVIDIA AI Enterprise in production for 90 days using your existing infrastructure. [Request a 90-Day License](https://enterpriseproductregistration.nvidia.com/?LicType=EVAL&ProductFamily=NVAIEnterprise) [View NVIDIA Dynamo-Triton Licensing Options](https://www.nvidia.com/en-us/ai-data-science/products/triton-inference-server/get-started/#nv-accordion-d76f4815d2-item-cc46c5bf45) [Contact Us to Learn More About NVIDIA Dynamo](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/contact-sales/) ### Quick-Start Guide Learn the basics for getting started with NVIDIA Dynamo, including how to deploy a model in a disaggregated server setup and how to launch the smart router. [Get Started](https://github.com/ai-dynamo/dynamo) ### Introductory Blog Read about how NVIDIA Dynamo helps simplify AI inference in production, the tools that help with deployments, and ecosystem integrations. 
[Read Blog](/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/) ### Deploy LLM Inference With NVIDIA Dynamo and vLLM NVIDIA Dynamo supports all major backends, including vLLM. Check out the tutorial to learn how to deploy with vLLM. [Read Docs](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/) ### Multi-Node Deployment With NVIDIA Dynamo and Grove on Kubernetes Learn how to deploy multi-node models using NVIDIA Dynamo with Grove API, which enables efficient scaling and declarative startup ordering of interdependent AI inference components across multiple nodes. Get Started * * * ## Starter Kits Access technical content on inference topics like prefill optimizations, decode optimizations, and multi-GPU inference. #### Multi-GPU Inference Models have grown in size and can no longer fit on a single GPU. Deploying these models involves distributing them across multiple GPUs and nodes. This kit shares key optimization techniques for multi-GPU inference. - [MultiShot communication protocol](https://developer.nvidia.com/blog/3x-faster-allreduce-with-nvswitch-and-tensorrt-llm-multishot/) - [Pipeline parallelism for high-concurrency efficiency](https://developer.nvidia.com/blog/boosting-llama-3-1-405b-throughput-by-another-1-5x-on-nvidia-h200-tensor-core-gpus-and-nvlink-switch/) - [Large NVIDIA NVLink™ domains](https://developer.nvidia.com/blog/low-latency-inference-chapter-2-blackwell-is-coming-nvidia-gh200-nvl32-with-nvlink-switch-gives-signs-of-big-leap-in-time-to-first-token-performance/) #### Prefill Optimizations When a user submits a request to a large language model, it generates a KV cache to compute a contextual understanding of the request. This process is computationally intensive and requires specialized optimizations. This kit presents essential KV cache optimization techniques for inference. 
- [Key-value (KV) cache early reuse](https://developer.nvidia.com/blog/5x-faster-time-to-first-token-with-nvidia-tensorrt-llm-kv-cache-early-reuse/) - [Chunked prefill](https://developer.nvidia.com/blog/streamlining-ai-inference-performance-and-deployment-with-nvidia-tensorrt-llm-chunked-prefill/) - [Supercharging multiturn interactions](https://developer.nvidia.com/blog/nvidia-gh200-superchip-accelerates-inference-by-2x-in-multiturn-interactions-with-llama-models/) #### Decode Optimizations Once the LLM generates the KV cache and the first token, it moves into the decode phase, where it autoregressively generates the remaining output tokens. This kit highlights key optimization techniques for the decoding process. - [Multiblock attention for long sequences](https://developer.nvidia.com/blog/nvidia-tensorrt-llm-multiblock-attention-boosts-throughput-by-more-than-3x-for-long-sequence-lengths-on-nvidia-hgx-h200/) - [Speculative decoding for accelerated throughput](https://developer.nvidia.com/blog/tensorrt-llm-speculative-decoding-boosts-inference-throughput-by-up-to-3-6x/) - [Speculative decoding with Medusa](https://developer.nvidia.com/blog/low-latency-inference-chapter-1-up-to-1-9x-higher-llama-3-1-performance-with-medusa-on-nvidia-hgx-h200-with-nvlink-switch/) #### Topology Optimized Serving on Kubernetes AI workloads have evolved into complex multi-component systems spanning multiple nodes. Grove bridges AI inference frameworks and Kubernetes scheduling, enabling efficient scaling and declarative startup ordering of interdependent components through unified custom resources. This kit introduces Grove's capabilities and guides you through topology-optimized model deployment on Kubernetes. 
- Optimizing the Deployment of Interdependent AI Inference Components - Developer Workflow of Grove API - [NVIDIA Grove Github Repository](https://github.com/ai-dynamo/grove) * * * #### NVIDIA Blackwell Delivers Unmatched Performance and ROI for AI Inference The NVIDIA Blackwell platform—including NVFP4 low precision format, fifth-generation NVIDIA NVLink and NVLink Switch, and the NVIDIA TensorRT-LLM and NVIDIA Dynamo inference frameworks—enables the highest AI factory revenue: A $5M investment in GB200 NVL72 generates $75 million in token revenue—a 15x return on investment. This includes development with community frameworks such as SGLang, vLLM, and more. [Explore technical results](https://developer.nvidia.com/blog/nvidia-blackwell-leads-on-new-semianalysis-inferencemax-benchmarks/) ![NVIDIA Rivermax provides real-time streaming for the Las Vegas Sphere, world’s largest LED display](https://developer.download.nvidia.com/images/tensorrt/inference-tech-blog-sa-external-think-smart-1920x1080.png) * * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore Developer Discord ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Watch Dynamo Office Hours On-Demand ![Decorative image representing forums](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for Inference-Related Developer News ![](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) ### Read NVIDIA Dynamo FAQ ![](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program * * * ## Ethical AI NVIDIA believes trustworthy AI is a shared responsibility, and we have established policies and practices to support the development of AI across a wide array of applications. 
When downloading or using this model in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ## Get Started With NVIDIA Dynamo Today [Download Now](https://github.com/ai-dynamo/dynamo) --- # Source: https://developer.nvidia.com/industries/energy.md # Energy Industry Resources for Developers A hub of SDKs, technical resources, and more for developers working in the energy industry. ## App Frameworks and SDKs ### NVIDIA Energy SDK Explore open-source and NDA samples in a collection of GPU-accelerated algorithms and techniques for the Energy industry, including reverse time migration (RTM), Kirchhoff Migration, SRME, seismic processing, reservoir simulation, Marchenko, and more. **[Explore Energy SDK](https://github.com/NVIDIA/energy-sdk)** ### HPC SDK The [NVIDIA HPC Software Development Kit](/hpc-sdk) (SDK) includes the proven compilers, libraries, and software tools essential to maximizing developer productivity and the performance and portability of high-performance computing (HPC) applications. 
**[Explore HPC SDK](/hpc-sdk)** ### NVIDIA CUDA-X [NVIDIA® CUDA-X](/gpu-accelerated-libraries), built on top of [NVIDIA CUDA®](/cuda-zone), is a collection of libraries, tools, and technologies that deliver dramatically higher performance - compared to CPU-only alternatives - across multiple application domains, from artificial intelligence (AI) to high performance computing (HPC). **[Explore CUDA-X](/gpu-accelerated-libraries)** ### NVIDIA AI Enterprise AI researchers, developers, and data scientists are using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/) to rapidly deploy, manage, and scale AI workloads in the modern hybrid cloud running on VMware vSphere with NVIDIA-Certified Systems™. **[Explore AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/)** ### NVIDIA Omniverse [NVIDIA Omniverse™](https://www.nvidia.com/en-us/omniverse/) is an extensible, open platform built for 3D virtual collaboration and real-time physically accurate simulation. Omniverse combined with [NVIDIA PhysicsNeMo](/modulus), a framework for developing physics machine learning neural network models, enables digital twins for autonomous vehicles, smart factories, and more. **[Explore Omniverse](https://www.nvidia.com/en-us/omniverse/)** ### NVIDIA Jetson [NVIDIA® Jetson™](/embedded-computing) brings accelerated AI performance to the edge in a power-efficient and compact form factor. Together with NVIDIA JetPack™ SDK, these Jetson modules open the door for you to develop and deploy innovative products across all industries. **[Explore Jetson](/embedded-computing)** ### NGC With [NGC™](https://ngc.nvidia.com/), you can quickly deploy AI frameworks with containers, get a head start with pre-trained models or model training scripts, and use domain-specific workflows and Helm charts for the fastest AI implementations, giving you faster time to solution. 
**[Explore NGC](https://ngc.nvidia.com/)** ### Metropolis NVIDIA Metropolis is an end-to-end application framework built on the NVIDIA EGX platform that simplifies the development, deployment, and scaling of AI-enabled intelligent video analytics applications, such as retail analytics, traffic management, and automated factory inspections. **[Explore Metropolis](/metropolis)** ### RAPIDS The NVIDIA RAPIDS™ suite of open-source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA and based on extensive hardware and data science experience. **[Explore RAPIDS](/rapids)** ## Browse by Resource Type - [ Webinars ](#web) - [ GTC On-Demand ](#gtc) - [ Blogs ](#blogs) ### Personalizing Customer Experiences in Energy with AI Learn how Minerva CQ, an NVIDIA Inception member, is leveraging NVIDIA Riva Enterprise to augment customer service agents with real-time AI assist for each customer conversation. **[Watch Now (56:01)](https://resources.nvidia.com/en-us-energy-utilities/webinar-energy-miner)** ### Powering Energy Startup Success with NVIDIA Inception Learn how Skycatch, Utilidata, and VoltaGrid accelerated growth by joining the NVIDIA Inception Program, and utilized NVIDIA’s resources and technical expertise to better manage distributed energy resources at the grid-edge and use 3D drone mapping to decarbonize mining operations. **[Watch Now (53:42)](https://resources.nvidia.com/en-us-energy-utilities/webinar-recording-powering)** ### Accelerating Plastic Recycling using Virtual Reactors Learn how Encina utilized CPFD Barracuda Virtual Reactor simulation software, powered by NVIDIA and running on Microsoft Azure, to model trillions of fluid particles over 500X faster than serialized CPU runs.
**[Register Now](https://resources.nvidia.com/en-us-energy-surfaces/accelerating-plastic)** ### Demystifying AI for Petroleum Engineers Get AI fundamentals, techniques, and tips to help petroleum engineers build scalable models and powerful applications, from the SPE Digital Energy Technical Section and NVIDIA. **[Watch Now (1:07:21)](https://resources.nvidia.com/en-us-upstream-energies/demystifying-ai-petr)** ### Advancing the Future of Energy with High-Performance AI Beyond Limits and NVIDIA share insights on transparent Cognitive AI and the implementation of novel upstream and downstream solutions using GPU-accelerated, model-free deep reinforcement learning. **[Watch Now (1:01:08)](https://resources.nvidia.com/en-us-upstream-energy/beyond-limits-nvidia?lx=d-5uUJ)** ### Expediting Seismic Facies Analysis Learn how to use NVIDIA DGX A100 and RAPIDS for high-performance machine learning to expedite unsupervised seismic facies analysis. **[Watch Now (59:28)](https://resources.nvidia.com/en-us-upstream-energy/webinar-energy-exped?lx=d-5uUJ)** **[View all on-demand energy sessions](https://www.nvidia.com/en-us/on-demand/search/?facet.event_sessionIndustry%5B%5D=Energy&facet.mimetype%5B%5D=event%20session&headerText=All%20Sessions&layout=list&page=1&q=-&sort=date)** ### Exelon Uses Synthetic Data Generation of Grid Infrastructure to Automate Drone Inspection Exelon, the largest regulated electric utility in the US, is using NVIDIA Omniverse to automatically create thousands of grid-asset defect examples for real-time drone inspection models—enhancing grid reliability and resiliency.
**[Read Blog](https://resources.nvidia.com/en-us-energy-utilities/exelon-uses-syntheti)** ### Microsoft and TempoQuest Accelerate Wind Energy Forecasts with AceCast Learn how TempoQuest leveraged accelerated computing from NVIDIA on Microsoft Azure to move the traditional Weather Research and Forecasting (WRF) software to GPUs, enabling faster predictions of power generated by renewable wind and solar resources. **[Read Blog](https://resources.nvidia.com/en-us-energy-utilities/microsoft-and-tempo)** ### How AI is Powering the Future of Clean Energy To manage renewable energy at scale, NVIDIA and its ecosystem partners are using AI to optimize solar and wind farms, simulate climate and weather, maintain power grids, advance carbon capture, and power fusion breakthroughs. **[Read Blog](https://blogs.nvidia.com/blog/2023/07/27/i-am-ai-clean-energy/)** ### Forged in Flames: Startup Fuses Gen AI, Computer Vision to Fight Wildfires Chooch, an NVIDIA Inception Premier member, improved wildfire detection for utilities by 250x using computer vision and generative AI models powered by NVIDIA’s edge-to-cloud accelerated computing. **[Read Blog](https://resources.nvidia.com/en-us-energy-utilities/generative-ai-wildfire)** ### Sun in Their AIs: Nonprofit Forecasts Solar Energy for UK Grid Open Climate Fix, a member of NVIDIA Inception, built transformer-based AI models trained on terabytes of satellite data to improve solar energy generation predictions by 3x. Learn how the nonprofit is helping decarbonize the UK's electric grid. **[Read Blog](https://resources.nvidia.com/en-us-energy-utilities/ai-forecasts-solar-e)** ### Startup Enhances Safety of Earth’s Energy Infrastructure From Outer Space Processing hyperspectral imagery with NVIDIA edge AI technology, NVIDIA Inception member Orbital Sidekick has detected hundreds of suspected gas and hydrocarbon leaks. 
**[Read Blog](https://resources.nvidia.com/en-us-upstream-energies/orbital-sidekick)** **[View all NVIDIA energy blogs](https://blogs.nvidia.com/blog/tag/energy/)** ### Sign up for the latest developer news from NVIDIA [Subscribe](https://www.nvidia.com/en-us/preferences/email-signup/) --- # Source: https://developer.nvidia.com/networking/ethernet-switch-sdk.md # NVIDIA Ethernet Switch SDK and SAI The optimized SDK for high-performance Ethernet switches. The NVIDIA® Ethernet Switch SDK provides the flexibility to implement any switching and routing functionality, with sophisticated programmability that doesn’t compromise performance in packet rate, bandwidth, or latency. With the SDK, server and networking OEMs and network operating system (NOS) vendors can take advantage of **the advanced networking features** of the Ethernet switch family of integrated circuits (ICs) and build flexible, innovative, and cost-optimized switching solutions. SAI (Switch Abstraction Interface) defines a community-standard API, providing a vendor-independent way of controlling an Ethernet Switch ASIC. NVIDIA is a founding member of SAI and has been part of the governance board since 2014. NVIDIA's SAI implementation offers a subset of the functionalities available in the NVIDIA SDK. Apply for access to NVIDIA Spectrum Ethernet Switch SDK and SAI documentation and code. Please log in or create an account using a business email address. Applications using personal accounts (e.g., @gmail, @yahoo) won’t be accepted. Access will only be approved after signing an SDK license agreement. For more information, please reach out to an NVIDIA representative. [Join now](/networking/ethernet-switch-sdk/join) * * * ## Typical Use Cases for the Ethernet Switch SDK **The Ethernet Switch SDK offers a flexible business model with multiple development paths:** - From the application-specific integrated circuit (ASIC) up, build your own switch hardware, PCB, and system and develop a NOS. 
- Start from a wide selection of white-box options, and develop your own NOS running on production-ready hardware. NVIDIA offers hardware customization and rebranding. - Run on top of SONiC to benefit from an existing NOS and develop your own containerized application with access to the switch ASIC. This path is typically for applications not part of a mainstream switch or router NOS. **The Ethernet Switch SDK is an open, flexible, high-performance infrastructure for building private clouds to develop powerful applications. The typical users are:** - Cloud service providers (CSPs) who build their own gear for self-consumption - Original equipment manufacturers (OEMs) for applications such as Ethernet Bunch of Flash (eBOF), embedded connectivity (5G for telecommunications), and tactical equipment - OEMs for applications such as TAP aggregation, packet brokering, security appliances such as firewalls, and networking test equipment - Original design manufacturers (ODMs) developing tailored solutions for deployments that off-the-shelf switches cannot support, such as analytics services * * * ## Powerful and Flexible SDK - Single, consistent API model across the entire Ethernet switch family of switch ASICs - Proven with 10+ years in production - Multi-threaded on multiple CPU architectures - Container with SDK access on an existing NOS - Easily portable code base for fast time to market - Linux development environment - API library written in ANSI C - Python libraries for fast development and prototyping ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/networking/nvidia-diagram-spectrum-asic.svg) * * * ## Development Environment ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/networking/nvidia-icon-sdk-dvs-os_0.svg) **Rapid Feature Development** NVIDIA development systems (DVS) and DVS-OS with native Linux are used as the development environment: - Open Network Linux based on Debian 12 (Kernel 6.1.123) base OS - Open Network 
Installation Environment (ONIE) - Straightforward, practical feature usage by example, supported on every Ethernet platform. Code examples in Python and C - User-friendly development environment, including everything needed to build and debug the SDK, switch abstraction interface (SAI), and NVIDIA What Just Happened® (WJH) * * * ## Key Features - **Unified forwarding capacity** – Up to 102.4 terabits per second (Tb/s) with up to 512 low-latency ports running from 1GbE to 800GbE - **Layer 1** – Port interfaces, speeds, and form factors. Cables (including split) and transceivers. - **Layer 2 switching** – 802.1Q, 802.1D, LAG, STP, MSTP, PVRST, QinQ, IGMP snooping with (\*,G) and (S,G) support. SFLOW and port mirroring (SPAN, RSPAN, ERSPAN), port isolation, 802.1X. - **Layer 3 routing** – VLAN and physical port router interface, VRF, IPv4 and IPv6 unicast and multicast routing. Router interface and router counters. Shared ECMP containers. Tunneling, IPinIP, VxLAN, and QinVXLAN. PIM-EVPN and EVPN multihoming. MPLS, BFD. - **Quality of Service (QoS)** – RDMA over Converged Ethernet (RoCE) support. Dynamic shared buffers configuration and monitoring. Bridge-level storm control policers. RED, ECN. Flow control and priority flow control (PFC). ETS. Port and TC-based policers and shapers. HLL and SLL monitoring and configuration. - **Lossless traffic and performance** – Facilities to set priorities per port, per virtual port, and per traffic flow. Efficient traffic for latency and bandwidth, buffer management, and QoS resources. - **Flexible programmability and ACL** – Predefined L2, L3, L4, and above match keys. Flexible and user-defined match keys. Parsing and lookup on the first 256 bytes of the packet’s header. Multiple ingress and egress binding points for ACL, pipeline overwrite actions, packet flexible modifier actions, policy and classification actions, redirect actions, and tunnel and mirror actions. 
- **Telemetry and visibility** – What Just Happened (WJH) event-based visibility with raw and aggregated data streaming. Buffer utilization visibility, histograms, and thresholds. Unit-based counters and generic flow flexible counters. - **In-service software upgrade (ISSU)** – Zero-hit traffic loss for zero downtime of the data plane. Fast boot for minimizing traffic hits. - **Precision Time Protocol (PTP)** – 1588v2 with as low as 5ns accuracy. - **AI switching** – 200GbE, adaptive routing, congestion control with dynamic resource allocation. - **Resource management** – Centralized resource management module with utilization and prediction capabilities. The SDK provides access to the maximum available capacity of the configuration and forwarding tables. - **Debuggability** – Logging facilities, SDK dump, and API recorder and player. Interactive Python binding and example scripts. - **Host** – Flexible control packets trap to host CPU. Policing and rate limiting to host CPU. Netdev per port/LAG and bridge Linux kernel integration. User space file descriptor-based packet send and receive. Iterators for every allocable SDK object. Convenient logging for debugging and troubleshooting. System debug information collection script. Precompiled packages for CentOS- and Debian-based systems. --- # Source: https://developer.nvidia.com/industries/financial-services.md 1. [Industries](/industries/overview) 2. Financial Services # Developer Resources for Financial Services NVIDIA technologies for use cases across capital markets, banking, payments, and fintech.
Use Cases: - Fraud detection - Financial crime (AML/KYC) - Customer service - Algorithmic trading - Intelligent document processing ### Starter Kits by Use Case ### Detect Anomalies With Fraud Prevention AI-enabled applications using deep learning techniques such as [graph neural networks](https://blogs.nvidia.com/blog/what-are-graph-neural-networks/) (GNNs) can reduce false positives in transaction fraud detection, enhance identity verification accuracy for know-your-customer (KYC) requirements, and make anti-money laundering (AML) efforts more effective, improving both the customer experience and your company’s financial health. - [Learn About the Use Case](https://www.nvidia.com/en-us/use-cases/ai-for-fraud-detection/) - [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/financial-fraud-detection) - [View Self-Paced Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+C-TC-01+V1) ### Scale Real-Time Decisions With Accelerated Portfolio Optimization Traditional CPU-based workflows for portfolio optimization are slow, limited in scale, and unable to support real-time analysis. With AI acceleration, financial firms can model advanced risk measures, process large-scale portfolios in real time, and improve operational efficiency, enabling faster, more informed investment decisions. - [Read the Technical Blog](https://developer.nvidia.com/blog/accelerating-real-time-financial-decisions-with-quantitative-portfolio-optimization/) - [Explore the AI Developer Example](https://build.nvidia.com/nvidia/quantitative-portfolio-optimization) - [View Self-Paced Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-DS-09+V1) ### Accelerate Feature Engineering From Unstructured Data Financial institutions consume massive volumes of unstructured data in their alpha research, feature generation, and strategy creation processes. 
By combining real-world financial data streams (news, filings, calls) with automated fine-tuning and evaluation, AI accelerates signal backtesting and the time from research to actionable models. - [Read the Technical Blog](https://developer.nvidia.com/blog/build-efficient-financial-data-workflows-with-ai-model-distillation) - [Explore the AI Developer Example](http://build.nvidia.com/nvidia/ai-model-distillation-for-financial-data) ### Explore Tools and Technologies for Financial Services **NVIDIA HPC SDK** ### Optimize High-Performance Computing for Trading **Segments** : Trading The NVIDIA HPC SDK includes the proven compilers, libraries, and software tools essential for maximizing developer productivity and the performance and portability of HPC applications. [Get Started With the HPC SDK](/hpc-sdk) **NVIDIA NeMo** ### Build, Customize, and Deploy Generative AI Models **Segments** : Banking, Payments, Trading NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI—including large language models (LLMs) and speech AI—anywhere. [Get Started With NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) **NVIDIA Dynamo** ### Distribute and Disaggregate Generative AI Serving **Segments** : Banking, Payments, Trading NVIDIA Dynamo is an open-source, low-latency inference framework for serving generative AI models in distributed environments. It scales inference workloads across large GPU fleets with optimized resource scheduling, memory management, and data transfer, and it supports all major AI inference backends. [Get Started With NVIDIA Dynamo](/triton-inference-server) **NVIDIA RAPIDS** ### Accelerate Data Preparation **Segments** : Banking, Payments, Trading NVIDIA RAPIDS™ is an open-source suite of data science libraries that accelerates data processing. It comes with simple integration options for the most popular data science tools. 
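The shape of the pipeline RAPIDS accelerates (filter, group by, aggregate) can be shown with a stdlib-only stand-in. cuDF exposes a pandas-like API that runs these same steps on the GPU; this toy, with made-up transaction data, only sketches the workflow:

```python
from collections import defaultdict

# Toy transactions: (merchant_category, amount). A RAPIDS pipeline would
# run the same filter -> group-by -> aggregate steps on the GPU with cuDF;
# the categories and amounts here are purely illustrative.
rows = [
    ("grocery", 42.0), ("travel", 900.0), ("grocery", 13.5),
    ("travel", 1200.0), ("dining", 60.0), ("dining", 45.0),
]

# Filter out small amounts, then group by category and average.
groups = defaultdict(list)
for category, amount in rows:
    if amount > 20.0:
        groups[category].append(amount)

avg_by_category = {c: sum(v) / len(v) for c, v in groups.items()}
```

On real workloads the win comes from running every stage of this pipeline on the GPU without round-trips to host memory.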
[Get Started With RAPIDS](/rapids) [Try the RAPIDS Accelerator for Apache Spark](https://docs.nvidia.com/spark-rapids/index.html) **GNNs** ### Detect Anomalies for Fraud Prevention **Segments** : Payments Graph neural network (GNN) frameworks are easy-to-use Python packages that provide building blocks for constructing GNNs on top of existing deep learning frameworks. These can be used for a wide range of applications, such as enhancing accuracy for transaction fraud detection. [Get Started With GNNs](/gnn-frameworks#gnn-frameworks) **NVIDIA FLARE** ### Train Machine Learning Models with Privacy and Security **Segments** : Banking, Payments NVIDIA FLARE™ (Federated Learning Application Runtime Environment) is a domain-agnostic, open-source, and extensible SDK for federated learning. [Get Started With FLARE](/flare) ## Browse by Resource Type GTC Sessions Webinars Blogs [View All](https://www.nvidia.com/en-us/on-demand/playlist/playList-e332507f-4ab8-46d8-8827-756cd25aba13/ "View All") ### Accelerating End-to-End Data Science Workflows Learn how to build and execute end-to-end, GPU-accelerated data science workflows that let you quickly explore, iterate, and move your work into production. In this self-paced lab, you’ll learn how to use RAPIDS-accelerated data science libraries to perform data analysis at scale with a wide variety of GPU-accelerated algorithms. [Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+S-DS-01+V1/) ### Generative AI Explained Explore generative AI, which has recently taken the world by storm. Using neural networks to identify patterns and structures within existing data, it generates new content based on a variety of inputs. In this course, you’ll learn generative AI concepts, applications, and the challenges and opportunities of this exciting field.
[Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+S-FX-07+V1/) ### Synthetic Data Generation for Training Computer Vision Models Streamline synthetic data generation (SDG) using 3D assets into a single application—and modify the appearance and format of the data—with NVIDIA Omniverse Replicator. This lab highlights one of the ways deep learning tools and Omniverse can be used together to streamline deep learning workloads. [Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+S-OV-10+V1/) [View All Courses](https://www.nvidia.com/en-us/training/ "View All Courses") ![An analyst reviews real-time market data and financial portfolio graphs on monitors](https://developer.download.nvidia.com/images/fsi-visual-1920x1080(1).jpg) ### Accelerating Real-Time Financial Decisions with Quantitative Portfolio Optimization Learn how the Quantitative Portfolio Optimization developer example accelerates strategy testing and time to decision in financial services. [Read Blog](https://developer.nvidia.com/blog/accelerating-real-time-financial-decisions-with-quantitative-portfolio-optimization/) ![Efficient Financial Data Workflows with AI Model Distillation](https://developer.download.nvidia.com/images/financial-data-workflows-1920x1080.jpg) ### Build Efficient Financial Data Workflows With AI Model Distillation Learn how the AI Model Distillation for Financial Data developer example enables feature engineering and faster backtesting for research in capital markets.
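The model-distillation idea behind that example can be illustrated with a stdlib-only toy: a small "student" model is trained against the temperature-softened outputs of a larger "teacher" (Hinton-style distillation). All logits, labels, and hyperparameters below are made up for illustration and are not taken from the NVIDIA example:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence from the
    temperature-softened teacher distribution (classic distillation)."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    kl = sum(ti * math.log(ti / si) for ti, si in zip(t, s))
    ce = -math.log(softmax(student_logits)[hard_label])
    # The T^2 factor keeps soft-target gradients on the same scale.
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl

# A confident "teacher" vs. an untrained "student" on a 3-class task.
loss = distillation_loss([8.0, 2.0, 0.5], [1.0, 0.8, 0.6], hard_label=0)
```

Minimizing this loss over many labeled examples is what lets a small, fast model inherit behavior from a large one for production backtesting workloads.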
[Read Blog](https://developer.nvidia.com/blog/build-efficient-financial-data-workflows-with-ai-model-distillation) ![Fraud Detection in Financial Services with Graph Neural Networks](https://developer.download.nvidia.com/images/ai-workflow-fraud-1920x1080.jpg) ### Supercharging Fraud Detection in Financial Services With Graph Neural Networks Learn how you can leverage the NVIDIA AI Blueprint for financial fraud detection to get started with model building and inference to detect payment transaction fraud. [Read Blog](https://resources.nvidia.com/en-us-financial-services-industry/supercharging-fraud-detection-in-financial-services-with-graph-neural-networks) ![Algorithmic Trading Simulations](https://developer.download.nvidia.com/images/algorithmic-trading-simulations-1920x1080.jpg) ### GPU-Accelerate Algorithmic Trading Simulations by Over 100x With Numba Read how Numba was used on NVIDIA H200 Tensor Core GPUs to boost the performance of computational simulations of financial markets by up to 114x, compared to CPU-only approaches. [Read Blog](https://resources.nvidia.com/en-us-financial-services-industry/gpu-accelerate-algorithmic) ![](https://developer.download.nvidia.com/images/industry/financial-service-industry.jpg) ## Accelerate Financial Services With Enterprise AI Massive datasets. Perpetual market fluctuations. Customer inquiries. Intelligent technology can address critical challenges within the modern financial services industry. With NVIDIA’s AI solutions—for generative AI, LLMs, data analytics, and more—institutions can optimize trading, detect transaction fraud, and enhance customer experiences.
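As a rough sketch of the graph-based fraud detection idea mentioned above, the toy below runs a single GNN-style mean-aggregation step over a tiny transaction graph. The accounts, edges, and risk scores are invented, and the real blueprint uses learned weights rather than a fixed average; this only shows how suspicion propagates along transaction links:

```python
# Accounts are nodes, transactions are edges; each node carries an
# illustrative "risk" feature (all values made up).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b")]
risk = {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.7}

neighbors = {n: set() for n in risk}
for u, v in edges:                      # treat the graph as undirected
    neighbors[u].add(v)
    neighbors[v].add(u)

def aggregate(features):
    """One message-passing step: blend each node with its neighbor mean."""
    out = {}
    for node, own in features.items():
        ns = neighbors[node]
        mean = sum(features[n] for n in ns) / len(ns) if ns else 0.0
        out[node] = 0.5 * own + 0.5 * mean
    return out

scores = aggregate(risk)
flagged = {n for n, s in scores.items() if s > 0.6}
```

Stacking several such layers, with learned weights instead of the fixed 0.5 blend, is what lets a real GNN catch fraud rings that look benign account by account.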
[Learn About AI for Financial Services](https://www.nvidia.com/en-us/industries/finance/) --- # Source: https://developer.nvidia.com/fleet-command.md # NVIDIA Fleet Command Enterprise Support Services NVIDIA Fleet Command™, the turnkey solution for edge AI, comes with 24/7 NVIDIA Enterprise Support for business-critical issues, giving organizations hands-on help with troubleshooting and resources for finding answers to their questions. First-project-implementation support, administrative training, and a technical account manager are optionally available to help manage support issues. For more details, please refer to the [end customer terms and conditions.](http://docs.nvidia.com/hpc-sdk/compilers/hcss-terms-and-conditions/index.html) ## Get Started - Interested in getting started with Fleet Command? [Contact us.](https://www.nvidia.com/en-us/data-center/products/fleet-command/?nvmid=get-started) - Already a Fleet Command user? Log in to the [NVIDIA support portal.](https://nvid.nvidia.com/dashboard) - Learn more about [NVIDIA Enterprise Support.](https://www.nvidia.com/en-us/support/enterprise/) --- # Source: https://developer.nvidia.com/gpudirect-storage.md # Magnum IO GPUDirect Storage ## A Direct Path Between Storage and GPU Memory As datasets increase in size, the time spent loading data can impact application performance. GPUDirect® Storage creates a direct data path between local or remote storage, such as NVMe or NVMe over Fabrics (NVMe-oF), and GPU memory. By enabling a direct-memory access (DMA) engine near the network adapter or storage, it moves data into or out of GPU memory—without burdening the CPU.
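As a back-of-the-envelope illustration of the two data paths just described, this toy accounting model (conceptual only; the real interface is the cuFile API, which is not modeled here) counts buffer copies and CPU-touched bytes in each path:

```python
def bounce_buffer_path(nbytes):
    """Traditional path: storage -> CPU bounce buffer -> GPU memory.
    Two copies, and the CPU touches every byte of the payload once."""
    copies = 2
    cpu_bytes_touched = nbytes
    return copies, cpu_bytes_touched

def gpudirect_path(nbytes):
    """Direct path: storage -> GPU memory via a DMA engine near the
    adapter or storage. One copy, no CPU bounce buffer in the data path."""
    copies = 1
    cpu_bytes_touched = 0
    return copies, cpu_bytes_touched

payload = 64 * 1024 * 1024  # a 64 MiB read
copies_direct, cpu_direct = gpudirect_path(payload)
copies_bounce, cpu_bounce = bounce_buffer_path(payload)
```

Halving the copy count and removing the CPU from the data path is where the latency and CPU-utilization savings come from as dataset sizes grow.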
[Download](#Download)[Technical overview](https://nvdam.widen.net/s/k8vrp9xkft/tech-overview-magnum-io-1790750-r5-web) [Read blog](/blog/accelerating-io-in-the-modern-data-center-magnum-io-storage-partnerships/) ![GPU direct storage](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/magnum-io-gpudirect-storage-l.svg) _GPUDirect Storage enables a direct data path between storage and GPU memory and avoids extra copies through a bounce buffer in the CPU’s memory._ ## Partner Ecosystem ### GA NVIDIA GPUDirect Storage integrated solution in production. [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/ddn-logo-190w-107h.svg)](https://www.ddn.com/) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/dellemc-logo-190w-107h.svg)](https://www.delltechnologies.com/en-us/storage/powerscale.html) [![](https://developer.download.nvidia.com/images/HS_Horiz_Tag_Blk@300x-8.png)](https://hammerspace.com/) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/hpe-logo-190w-107h.svg)](https://www.hpe.com) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/hitachi-inspire-the-next-logo-1902-107h.svg)](https://www.hitachivantara.com/en-us/home.html) [![IBM logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Logo-IBM-Azul.png)](https://www.ibm.com/us-en/) [![Kioxia logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/kioxia-logo-190w-107h.svg)](https://www.kioxia.com/en-us/top.html) [![Liqid logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/liqid-logo-190w-107h.svg)](https://www.liqid.com/) [![Micron logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/micron-logo-190w-107h.svg)](https://www.micron.com/about/blog/2020/july/maximize-your-investment-in-micron-ssds-for-ai-ml-workloads-with-nvidia-gpudirect-storage) [![Netapp 
logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/netapp-logo-190w-107h.svg)](http://www.netapp.com/) [![Nutanix logo](https://developer.download.nvidia.com/images/Nutanix-Logo-Charcoal-Gray-Digital.svg)](https://www.nutanix.com/) [![Pure storage logo](https://developer.download.nvidia.com/images/pure-storage-logo.svg)](https://www.purestorage.com/) [![Samsung logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/190w-107h-samsung-logo.svg)](https://www.samsung.com/us/) [![Scaleflux logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/scaleflux-logo-190w-107h.svg)](https://www.scaleflux.com/) [![Supermicro logo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Supermicro_GreenC_NewLogo_WhiteBackground.png)](https://www.supermicro.com/en/) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/vast-logo-190w-107h.svg)](https://vastdata.com/) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/weka-logo-190w-107h.svg)](https://www.weka.io/) ## Key Features of v1.14 The following features have been added in v1.14: - Updated GDS user-level stats to show P2PDMA stats - Updated GDS kernel mode driver (nvidia-fs.ko) to support the 6.12 kernel - Added Get/Put API support for GDS Parameters, enabling read/write access for size\_t, bool, and string types. - Assorted bug fixes ## Software Download **GPUDirect Storage v1.14 Release** NVIDIA Magnum IO GPUDirect® Storage (GDS) is now part of CUDA. See [https://docs.nvidia.com/gpudirect-storage/index.html](https://docs.nvidia.com/gpudirect-storage/index.html) for more information. GDS is currently supported on Linux x86-64 distributions; it is not supported on Windows. When choosing which CUDA packages to download, please select Linux, then x86-64, then either RHEL or Ubuntu distributions, along with the desired packaging format(s).
[Download](/cuda-downloads) ## Resources - [Read the blog: Accelerating IO in the modern data center - Magnum IO storage partnerships](/blog/accelerating-io-in-the-modern-data-center-magnum-io-storage-partnerships/) - [NVIDIA Magnum IO™ SDK](https://developer.nvidia.com/magnum-io) - [Read the blog: Optimizing data movement in GPU applications with the NVIDIA Magnum IO developer environment](https://developer.nvidia.com/blog/optimizing-data-movement-in-gpu-apps-with-magnum-io-developer-environment/) - [Read the blog: Accelerating IO in the modern data center: Magnum IO Architecture](https://developer.nvidia.com/blog/accelerating-io-in-the-modern-data-center-magnum-io-architecture) - [Watch the webinar: NVIDIA GPUDirect Storage: Accelerating the data path to the GPU](https://info.nvidia.com/gpudirect-storage-webinar-reg-page.html?ondemandrgt=yes) - [NVIDIA-Certified Systems configuration guide](https://docs.nvidia.com/ngc/ngc-deploy-on-premises/nvidia-certified-configuration-guide/index.html) - [NVIDIA-Certified Systems](https://www.nvidia.com/en-us/data-center/products/certified-systems/) - Contact us at [gpudirectstorageext@nvidia.com](mailto:gpudirectstorageext@nvidia.com) --- # Source: https://developer.nvidia.com/isaac/gr00t.md 1. [Home](https://developer.nvidia.com/) 2. [Isaac](/isaac) 3. Isaac GR00T # NVIDIA Isaac GR00T ## Generalist Robot 00 Technology NVIDIA Isaac™ GR00T is a research initiative and development platform for general-purpose robot foundation models and data pipelines that accelerate humanoid robotics research and development. [Download Isaac GR00T N1.6 Model](https://github.com/NVIDIA/Isaac-GR00T) [Experience the GR00T-Mimic Blueprint](https://build.nvidia.com/nvidia/isaac-gr00t-synthetic-manipulation) * * * ## See Isaac GR00T in Action https://www.youtube-nocookie.com/embed/Coy2TyBcT4g?
* * * ## How Isaac GR00T Works Isaac GR00T features robot foundation models for cognition and control, simulation frameworks built on [NVIDIA Omniverse](https://developer.nvidia.com/omniverse)™ and [Cosmos](https://developer.nvidia.com/cosmos)™, data pipelines for generating synthetic data and environments, and a computer in the robot—Jetson AGX Thor™—to run the entire robot stack. ![A diagram showing how NVIDIA Isaac Gr00T works](https://developer.download.nvidia.com/images/products/gr00t-1920-1080.jpg) ### Introductory Video Watch how Isaac GR00T helps provide the building blocks for the future of AI-powered robotics. [Watch Video (2:21)](https://www.youtube.com/watch?v=m1CH-mgpdYg) ### Introductory Blog Explore the world’s first open foundation model for generalized humanoid robot reasoning and skills. [Read Blog](https://developer.nvidia.com/blog/accelerate-generalist-humanoid-robot-development-with-nvidia-isaac-gr00t-n1/) ### Train Humanoids With Isaac GR00T-Dreams Watch how GR00T-Dreams and Isaac GR00T are enabling robots to learn new tasks and generalize across environments. [Watch Video](https://www.youtube.com/watch?v=pMWL1MEI-gE) ### Developer Tutorial Learn how to post-train Isaac GR00T N1.5, enabling robot training with just 20-40 demonstrations. [Watch Tutorial](https://www.youtube.com/watch?v=L-WriIVOyYE&t=4s) * * * ## Isaac GR00T N Robot Foundation Models Isaac GR00T N open foundation models are ideal for generalized humanoid robot reasoning and skills. This cross-embodiment solution takes multimodal input—including language and images—to perform manipulation tasks in diverse environments. These models are trained on an expansive humanoid dataset consisting of real captured data, synthetic data generated using the components of the [Isaac GR00T-Mimic blueprint](https://build.nvidia.com/nvidia/isaac-gr00t-synthetic-manipulation), [Isaac GR00T-Dreams blueprint](http://github.com/nvidia/gr00t-dreams), and internet-scale video data.
They’re also adaptable through post-training for specific embodiments, tasks, and environments. Isaac GR00T N models can easily generalize across common tasks—such as grasping, moving objects with one or both arms, and transferring items from one arm to another—or perform multi-step tasks that require long context and combinations of general skills. These capabilities can be applied across a variety of use cases, including material handling, packaging, and inspection. ### Isaac GR00T N1.6 3B Model [Download From Hugging Face](https://huggingface.co/nvidia/GR00T-N1.6-3B) * * * ## Isaac GR00T Workflows Discover powerful tools for accelerating the development of advanced capabilities for humanoid robots and bridging the simulation-to-reality gap. ### GR00T-Teleop Collect high-quality human demonstrations via teleoperation. [Learn More About GR00T-Teleop](https://developer.nvidia.com/blog/building-a-synthetic-motion-generation-pipeline-for-humanoid-robot-learning/) ![A diagram showing how NVIDIA Isaac Gr00T works](https://developer.download.nvidia.com/icons/m48-robot-mimic-r4.svg) ### GR00T-Mimic Generate a large number of synthetic motion trajectories from a small number of human demonstrations. [Learn More About the GR00T-Mimic Blueprint](https://build.nvidia.com/nvidia/isaac-gr00t-synthetic-manipulation) ![Placeholder](https://developer.download.nvidia.com/icons/m48-robot-reinforcement-learning-alt-256px-blk.png) ### GR00T-Gen Teach robots to generalize and adapt to new situations using diverse, realistic simulation environments. [Learn More About GR00T-Gen](https://developer.nvidia.com/blog/advancing-humanoid-robot-sight-and-skill-development-with-nvidia-project-gr00t/#gr00t-gen_for_diverse_environment_generation) ![Placeholder](https://developer.download.nvidia.com/icons/m48-robot-hand-256px-blk.png) ### GR00T-Dexterity Simplify complex grasping tasks with an end-to-end, pixels-to-action grasping system for human-like dexterity.
[Learn More About GR00T-Dexterity](https://developer.nvidia.com/blog/advancing-humanoid-robot-sight-and-skill-development-with-nvidia-project-gr00t/#gr00t-dexterity_for_fine-grained_and_dexterous_manipulation%C2%A0) ![Placeholder](https://developer.download.nvidia.com/icons/m48-foundational-model-dexterity-256px-blk.png) ### GR00T-Mobility Enhance the robot’s ability to adapt to new, unseen environments using reinforcement learning and imitation learning techniques. [Learn More About GR00T-Mobility](https://developer.nvidia.com/blog/advancing-humanoid-robot-sight-and-skill-development-with-nvidia-project-gr00t/#gr00t-mobility_for_locomotion_and_navigation%C2%A0) ![Placeholder](https://developer.download.nvidia.com/icons/m48-robot-humanoid-256px-blk.png) ### GR00T-Control Achieve responsive and precise humanoid robot control with a suite of whole-body control libraries, models, and policies. [Learn More About GR00T-Control](https://developer.nvidia.com/blog/advancing-humanoid-robot-sight-and-skill-development-with-nvidia-project-gr00t/#gr00t-control_for_whole-body_control) ![Placeholder](https://developer.download.nvidia.com/icons/m48-foundational-model-cognition-256px-blk.png) ### GR00T-Perception Integrate vision language models, LLMs, and retrieval-augmented memory to boost perception, cognition, and adaptability. [Learn More About GR00T-Perception](https://developer.nvidia.com/blog/advancing-humanoid-robot-sight-and-skill-development-with-nvidia-project-gr00t/#gr00t-perception_for_multimodal_sensing) * * * ## Get Started Developing Humanoid Robots Get the software and infrastructure you need to advance your humanoid robot development with GR00T foundational technologies. ### Software and Services #### Data Generation Pipelines Generate large synthetic trajectory data sets from just a few human demonstrations with GR00T-Mimic and Cosmos. Then, train your robot motion policy model in Isaac Lab.
[Try It Now](https://github.com/NVIDIA-Omniverse-blueprints/synthetic-motion-generation) #### GR00T-Dreams Generate vast amounts of synthetic trajectory data using NVIDIA Cosmos world foundation models to teach robots how to perform new actions in novel environments. [Try It Now](http://github.com/nvidia/gr00t-dreams) #### Train Robot Policies NVIDIA Isaac Lab enables scalable, adaptable policy training in physically accurate scenes, bridging the sim-to-real gap. [Learn More About Isaac Lab](https://developer.nvidia.com/isaac/lab) #### Simulate and Validate The trained robot policies can be validated in physically accurate environments using NVIDIA Isaac Sim™ before deployment. [Learn More About Isaac Sim](https://developer.nvidia.com/isaac/sim) ### Compute Infrastructure #### Train The NVIDIA DGX™ Cloud end-to-end AI platform for training robotics foundation models gives you scalable capacity built on the latest NVIDIA architecture. It’s co-engineered with the world’s leading cloud service providers. [Try DGX Now](https://www.nvidia.com/en-us/data-center/get-dgx/) #### Simulate NVIDIA Omniverse and Cosmos systems deliver the simulation platforms on industry-leading NVIDIA RTX™ PRO 6000 Blackwell Workstation and Server GPUs. This lets you accelerate the next generation of robotics simulation and learning workloads. [Explore the NVIDIA RTX PRO 6000 Blackwell Series](https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000-family/) #### Deploy Accelerate the development of advanced humanoid robots and run multimodal AI models. Jetson AGX Thor is based on the NVIDIA Blackwell architecture and comes with integrated functional safety, a high-performance CPU, and 100 Gb/s of Ethernet bandwidth.
[Learn More](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/) * * * ## Humanoid Developer Learning Library [ Whitepaper ### An Open Foundation Model for Generalist Humanoid Robots A robot foundation model, trained on massive and diverse data sources, is essential for enabling robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. ](https://arxiv.org/abs/2503.14734) [ Tech Blog ### Building a Synthetic Motion Generation Pipeline for Humanoid Robot Learning Learn how imitation learning enables developers to generate exponentially large datasets from a small number of human demonstrations. ](https://developer.nvidia.com/blog/building-a-synthetic-motion-generation-pipeline-for-humanoid-robot-learning/) [ Tech Blog ### Advancing Physical AI with NVIDIA Cosmos World Foundation Model Platform NVIDIA Cosmos is a platform that helps developers build custom world models for physical AI systems at scale. It offers open world foundation models and tools for every stage of development, from data curation to training to customization. ](https://developer.nvidia.com/blog/advancing-physical-ai-with-nvidia-cosmos-world-foundation-model-platform/) * * * ## Humanoid Robotics Ecosystem NVIDIA builds foundation models, acceleration libraries, and blueprints to accelerate the world’s ecosystem of humanoid developers.
![NVIDIA’s ecosystem of humanoid developers - Agility Robotics](https://developer.download.nvidia.com/images/logos/agility-robotics-logo.svg) ![NVIDIA’s ecosystem of humanoid developers - Apptronik](https://developer.download.nvidia.com/images/logos/logo-apptronik.svg) ![NVIDIA’s ecosystem of humanoid developers - Boston Dynamics](https://developer.download.nvidia.com/images/logos/boston-dynamics-logo.svg) ![NVIDIA’s ecosystem of humanoid developers - Neura Robotics](https://developer.download.nvidia.com/images/logos/neura-robotics-logo-black.svg) ![NVIDIA’s ecosystem of humanoid developers - Sanctuary AI](https://developer.download.nvidia.com/images/logos/sanctuary-ai-logo.svg) ![NVIDIA’s ecosystem of humanoid developers - Unitree Robotics](https://developer.download.nvidia.com/images/logos/unitree-logo.svg) * * * ## More Resources ![A decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Explore the Community ![A decorative image representing Training and Certification](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Training and Certification ![A decorative image representing Inception for Startups](https://developer.download.nvidia.com/icons/m48-ai-startup.svg) ### Join the Program for Startups ## Accelerate Your Humanoid Robot Development [Download Isaac GR00T N Models](https://github.com/NVIDIA/Isaac-GR00T)[Get Started with Isaac Lab](https://developer.nvidia.com/isaac/lab) --- # Source: https://developer.nvidia.com/grace-cpu.md # NVIDIA Grace CPU ## Meet NVIDIA Grace, a breakthrough in data center performance and efficiency. NVIDIA Grace, our first data center CPU, is designed to deliver flagship server performance in a power-efficient design for enterprise and hyperscale computing applications, scientific computing, and data analytics.
The NVIDIA Grace CPU is a standards-based design that builds on and is compatible with the same Arm™ ecosystem found in mobile, edge, data center, and on-premises systems. View Resources[Explore Documentation](https://docs.nvidia.com/grace/index.html) * * * ## Developer Tools As part of our commitment to the broader Arm™ ecosystem, existing AArch64 binaries, tools, compilers, and operating systems are compatible with NVIDIA Grace—delivering incredible performance thanks to its streamlined architecture and intelligent on-chip fabric. Applications not yet distributed for Arm can gain a dramatic boost in performance and efficiency with a simple recompile. NVIDIA distributes upstream Arm optimizations to open-source tools, like Clang, for developers who don't want to wait for regular releases but do want to build code that performs optimally not only on NVIDIA Grace but on any Arm server CPU. ![NVIDIA Grace CPU](https://developer.download.nvidia.com/images/grace/nvidia-grace-cpu.jpg) ### Compilers - [HPC SDK](https://developer.nvidia.com/hpc-sdk) - [Clang](https://developer.nvidia.com/grace/clang) ### Libraries - [NVIDIA Performance Libraries (NVPL)](https://developer-qa.nvidia.com/nv/admin/pages/3443/builder?locale=en) - [CUDA](https://developer.nvidia.com/cuda-zone) ### Tools - [NVIDIA Developer Tools](https://developer-qa.nvidia.com/nv/admin/pages/3443/builder?locale=en) - [NVIDIA Nsight™ Systems](https://developer.nvidia.com/nsight-systems) * * * ## Performance Tuning Guides These guides for end users and application developers include best practices for setting up and running NVIDIA Grace CPU-based systems and guidance on achieving optimal performance for key benchmarks and applications.
![NVIDIA Grace Platform](https://developer.download.nvidia.com/images/grace/grace-platform-tuning-guide.jpg) ### NVIDIA Grace Platform Tuning Guide Find out how to set up your NVIDIA Grace system for optimal performance. [Learn More](https://docs.nvidia.com/grace-performance-tuning-guide.pdf) ![NVIDIA Grace CPU](https://developer.download.nvidia.com/images/grace/grace-cpu-benchmarking-guide.jpg) ### NVIDIA Grace CPU Benchmarking Guide See procedures and sample code for benchmarking your NVIDIA Grace system. [Learn More](https://nvidia.github.io/grace-cpu-benchmarking-guide/) ![NVIDIA GH200 Grace Hopper](https://developer.download.nvidia.com/images/grace/nvidia-grace-cpu.jpg) ### NVIDIA GH200 Grace Hopper Benchmarking Guide See procedures and guidance for benchmarking your NVIDIA GH200 Grace Hopper system. [Learn More](https://docs.nvidia.com/gh200-superchip-benchmark-guide.pdf) ![NVIDIA MGX GH200 server](https://developer.download.nvidia.com/images/grace/mgx-gh200-quick-start-guide.jpg) ### NVIDIA MGX GH200 Quick Start Guide Get guidance on unboxing and setting up your NVIDIA MGX GH200 server. [Learn More](https://docs.nvidia.com/gh200-superchip-mgx-system-quick-start-guide.pdf) --- # Source: https://developer.nvidia.com/industries/healthcare.md # Source: https://developer.nvidia.com/isaac/healthcare.md 1. [Home ](/) [Isaac](/isaac/) Isaac for Healthcare # NVIDIA Isaac for Healthcare NVIDIA Isaac™ for Healthcare is a platform purpose-built for developing healthcare robots. Built on NVIDIA’s three-computer framework for [physical AI](https://www.nvidia.com/en-us/glossary/generative-physical-ai/), it features pre-trained models, physics-based simulation, synthetic data generation pipelines, and accelerated runtime libraries. Isaac for Healthcare supports developers across the entire workflow—from collecting and curating data to building and testing AI models in realistic simulated environments, and deploying intelligent, low-latency robotic applications at the edge.
[Get Started](https://isaac-for-healthcare.github.io/i4h-docs/workflows/)[GitHub](https://github.com/isaac-for-healthcare) * * * ## What Isaac for Healthcare Includes Whether you're building surgical robots, AI-guided imaging systems, or intelligent diagnostic tools, Isaac for Healthcare empowers you to design, test, and deploy with confidence. ### Sensor Simulation Physics-based medical sensor emulation for AI training. Generate photorealistic synthetic data with GPU-accelerated performance. - RGB Camera Sim - Ultrasound Sensor Sim [View Sensors](https://isaac-for-healthcare.github.io/i4h-docs/sensor-simulation/) ### Models Ready-to-use AI models and robotic policies for medical applications. Accelerate development with domain-specific neural networks. - Post-trained Pi0 - Post-trained GR00T N1 - Surgical Control Policies [View Models](https://isaac-for-healthcare.github.io/i4h-docs/models/) ### Workflows End-to-end blueprints for building healthcare robotics—combining simulation, training, and deployment. - Robotic Surgery, Robotic Ultrasound, and Telesurgery - Custom Hardware and Asset Tutorials - GR00T and Pi-Zero Model Training Guides [Explore Workflows](https://isaac-for-healthcare.github.io/i4h-docs/workflows/) ### Synthetic Data Generation Synthetic data generation capabilities for training robust AI models. Create unlimited, diverse datasets for medical robotics validation. - MAISI for BYO Anatomy - COSMOS-Transfer - COSMOS-Predict (Surgery) [Learn More](https://isaac-for-healthcare.github.io/i4h-docs/sdg/) ### Assets and Tutorials Sim-ready medical assets and comprehensive tutorials for rapid prototyping. Pre-validated 3D models and step-by-step guides.
- Medical Equipment and Hospital Environment Assets - Anatomical Assets and BYO Anatomy Tutorial - Sim-ready Robot Assets and BYO Robot Tutorial [Browse Assets](https://isaac-for-healthcare.github.io/i4h-docs/asset-catalog/) * * * ## What Developers Can Do With Isaac for Healthcare Isaac for Healthcare brings the combined power of digital twins and physical AI for: - Digital prototyping of next-gen healthcare robotic systems, sensors, and instruments - Training AI models with real and synthetic data generated by high-fidelity simulation environments - Evaluating AI models in a digital twin environment with hardware-in-the-loop (HIL) - Collecting data for training robotic policies through imitation learning by enabling extended reality (XR)- and/or haptics-enabled teleoperation of robotic systems in digital twins - Training robotic policies for augmented dexterity (for example, for use in robot-assisted surgery) and using GPU parallelization to train reinforcement and imitation learning algorithms - Continuous testing of robotic systems through HIL digital twin systems - Creating deployment applications to bridge simulation and deployment on a physical surgical robot * * * ## Get Started With Isaac for Healthcare Workflows Choose from one of many workflows that contain production-ready implementations integrating all aspects of Isaac for Healthcare, helping you quickly begin robotics development. ![Robotic Surgery](https://developer.download.nvidia.com/icons/m48-medical-surgical-robot.svg) ### Explore the Workflows Accelerate innovation in robotic-assisted devices and procedural development. These workflows include pre-built, modular robotics applications that demonstrate the full development journey from simulation to deployment.
[Visit GitHub](https://isaac-for-healthcare.github.io/i4h-docs/workflows/#available-workflows) * * * ## More Resources ![Decorative image representing Developer Community](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore Developer Forums ![Get Training and Certification](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Training and Certification ![NVIDIA Developer Program](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their supporting model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). Get started with Isaac for Healthcare today. [Access GitHub](https://github.com/isaac-for-healthcare) --- # Source: https://developer.nvidia.com/higher-education-and-research.md # Higher Education and Research Developer Resources A hub of resources and news for researchers, educators, and students. Academic institutions are at the forefront of nurturing the next generation in the emerging technologies of accelerated computing, data science, and AI.
To equip researchers, educators, and students in this community, NVIDIA has developed a diverse set of resources—including hardware grants, hands-on workshops, certifications, teaching materials, self-paced courses, webinars, events, and more. ## Resources We invite you to participate in a wide range of custom resources. Navigate through the Researchers, Educators, or Students tabs and browse the custom program offerings, training opportunities, blogs, and webinars. - [Researchers](#researchers) - [Educators](#educators) - [Students](#students) ## Programs ### Publications Gain insight into some of our leading-edge research. [Learn More About NVIDIA Research](https://research.nvidia.com/publications) ### 6G Developer Program Join to access platforms, documentation, and software releases for advancing cutting-edge 6G research. [Apply Today](/6g-program) ### NVIDIA Academic Grant Program Respond to calls for proposals to advance your research in generative AI, large language models, and simulation and modeling. [Learn More About Academic Grant Program](https://www.nvidia.com/en-us/industries/higher-education-research/academic-grant-program/) ## Training ### Self-Paced Courses Access online courses to advance your knowledge in AI, data science, graphics, and more. [Explore Courses](https://www.nvidia.com/en-us/training/online/) ### Learn With NVIDIA Visit our one-stop shop for education and training solutions. [Sign Up Today](https://www.nvidia.com/en-us/learn/) ### Open Hackathons and Bootcamps Accelerate and optimize research applications with mentors by your side. [Learn More About Open Hackathons](https://www.openhackathons.org/s/upcoming-events) ## Additional Learning Opportunities ### NVIDIA Research Explore the work of NVIDIA's world-class researchers and interns, who work in areas such as AI, deep learning, parallel computing, and more.
[Learn More About NVIDIA Research](https://www.nvidia.com/en-us/research/) ### NVIDIA GTC Join a global experience that brings together thousands of brilliant innovators, researchers, creators, thought leaders, and decision-makers who are shaping our world with the power of AI. [Learn More About GTC](https://www.nvidia.com/gtc/) ### Developer Forums Network with like-minded developers, engage with GPU experts, and contribute to the discussions. [Learn More About Developer Forums](https://forums.developer.nvidia.com/) ## Blogs ![Generative AI CVPR A cat astronaut image as shown in visual generative AI application at CVPR event](https://developer.download.nvidia.com/images/cvpr-1920-1080.jpg "Generative AI CVPR A cat astronaut image as shown in visual generative AI application at CVPR event") ### NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR NVIDIA Research presented 50 papers, introducing AI software with potential applications in creative industries, autonomous vehicle development, healthcare, and robotics. [Read More](https://blogs.nvidia.com/blog/visual-generative-ai-cvpr-research/) ![AI brain implant restores bilingual communication for a stroke survivor](https://developer.download.nvidia.com/images/neuroprosthesis-1_12-1920-1080.jpg "AI brain implant restores bilingual communication for a stroke survivor") ### AI Brain Implant Restores Bilingual Communication for Stroke Survivor Scientists have enabled a stroke survivor, who is unable to speak, to communicate in both Spanish and English by training a neuroprosthesis implant to decode his bilingual brain activity. [Read More](https://developer.nvidia.com/blog/ai-brain-implant-restores-bilingual-communication-for-stroke-survivor/) ## On-Demand Sessions ### Navigating the Path to Global AI Research Explore how to become a successful global AI researcher through the lens of Europe’s research ecosystem. 
[Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-se62803?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ### Use Grace Hopper to Accelerate Deep Learning and HPC Pipelines Oregon State University (OSU) and Mark III share tips, key learnings, and results achieved from optimizing and deploying existing deep learning pipelines on NVIDIA Grace Hopper™ for the first time. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-expt63133/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ### Fireside Chat With Fei-Fei Li and Bill Dally: The High-Speed Revolution in AI Hear from Fei-Fei Li and Bill Dally on the intersection of academia and enterprise, as well as large language models, in this fireside chat. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-se62721/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ## Webinars ### Building Intelligent AI Chatbots Using RAG See how organizations can build highly customized AI chatbots that can seamlessly access, process, and present data from current, domain-specific, trusted sources. [Watch Now](https://info.nvidia.com/building-intelligent-ai-chatbots-using-rag-webinar.html?ondemandrgt=yes#) ### Deploying Generative AI in Production Explore key considerations for deploying and scaling generative AI in production, including the critical role of AI inference. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/other2024-prodgenai/) ## Programs ### Teaching Kits Eliminate the barrier of incorporating AI and GPU computing in coursework through access to the latest downloadable materials and online courses. [Request A Kit](https://www.nvidia.com/en-us/training/teaching-kits) ### Get Certified by NVIDIA Advance your research project or university curriculum with access to world-class computing resources, from GPUs and DPUs to developer kits.
[Learn More About Certification Programs](https://www.nvidia.com/en-us/learn/certification/) ### DLI University Ambassador Program Get certified to deliver hands-on NVIDIA Deep Learning Institute (DLI) workshops to university faculty, students, and researchers. [Apply Now](https://www.nvidia.com/en-gb/deep-learning-ai/education/ambassador-program/) ### Deep Learning Institute Visit our one-stop shop for education and training solutions. [Learn More About DLI](https://www.nvidia.com/en-us/training/) ### Jetson AI Ambassador Certification Receive your certificate to teach AI in your classroom with NVIDIA® Jetson Nano™. [Get Certified](https://developer.nvidia.com/embedded/learn/jetson-ai-certification-programs) ### Open Hackathons and Bootcamps Accelerate and optimize research applications with mentors by your side. [Learn More About Open Hackathons](https://www.openhackathons.org/s/upcoming-events) ## Blogs ![Oregon State University is building a research complex with NVIDIA supercomputers](https://developer.download.nvidia.com/images/oregon-state-1920-1080.jpg "Oregon State University is building a research complex with NVIDIA supercomputers") ### OSU Breaks Ground on a $213 Million Research Complex Oregon State University is building a research complex with powerful NVIDIA supercomputers. It will unite faculty and students to tackle critical global challenges in climate science, clean energy, and water resources. [Read More](https://blogs.nvidia.com/blog/oregon-state-higher-ed/) ![Logo of NVIDIA Deep Learning Institute Science and Engineering Teaching Kit](https://developer.download.nvidia.com/images/dli-graphics-science-1920-1080.jpg "Logo of NVIDIA Deep Learning Institute Science and Engineering Teaching Kit") ### NVIDIA Deep Learning Institute Launches Science and Engineering Teaching Kit This new teaching kit will enable the next generation of engineers and scientists to leverage AI for innovation in the field of engineering and science.
It was created with leading academics, including George Karniadakis, professor of applied mathematics and engineering at Brown University, and his team. [Read More](https://developer.nvidia.com/blog/nvidia-deep-learning-institute-launches-science-and-engineering-teaching-kit/) ## On-Demand Sessions ### Priming Researchers and Students for AI and Accelerated Computing Enable university instructors and research computing teams to train students and researchers with NVIDIA’s developer training programs. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-s61961?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ### Integrate Generative AI, Simulation, and Design Into Curricula Hear from professors on how to integrate generative AI into the classroom. Bring innovative workflows and 3D collaboration to the broader educational community through DLI Teaching Kits. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-s62100/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ### Exploring AI-Assisted Developer Tools for Accelerated Computing Learn from academic and research leaders in commercial solutions about AI-assisted code development, how accelerated computing is currently supported, and the challenges associated with future developments. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-se62128/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ## Webinars ### Empowering Future Engineers and Scientists With AI and NVIDIA PhysicsNeMo Watch this webinar to learn how NVIDIA has collaborated with pioneers at the intersection of science, engineering, and AI to develop resources to help professionals in these fields. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/other2024-nvmodulus/) ### Accelerating Gene Variant Detection With Deep Learning Hear from Dr.
Tychele Turner from the Washington University School of Medicine as she shares the utility of GPU-based acceleration in genomics, including her lab’s use of NVIDIA Parabricks®, a suite of GPU-accelerated, deep learning-enabled, industry-standard genomics analysis tools for next-generation sequencing. [Watch Now](https://info.nvidia.com/parabricks-academic-webinar.html) ## Programs ### AI Essentials Check out our getting-started resources to explore the fundamentals of today’s hottest technologies. [Learn More About AI Essentials](https://www.nvidia.com/en-us/learn/ai-learning-essentials/) ### Deep Learning Institute Visit our one-stop shop for education and training solutions. [Learn More About DLI](https://www.nvidia.com/en-us/training/) ### NVIDIA Student Network Exclusive AI training and events for student clubs and organizations around the globe. [Apply Today](https://developer.nvidia.com/student-network) ### Open Hackathons and Bootcamps Accelerate and optimize research applications with mentors by your side. [Learn More About Open Hackathons](https://www.openhackathons.org/s/upcoming-events) ### Jetson Specialist Certification Showcase your skills and earn certificates to demonstrate your understanding of NVIDIA Jetson™ and AI with free, open-source courses. [Get Certified](https://developer.nvidia.com/embedded/learn/jetson-ai-certification-programs) ### Graduate Fellowship Program Apply to receive funding for your PhD research in accelerated computing, AI, and other related fields.
Application period: August–September [Learn More About Graduate Fellowship Program](https://research.nvidia.com/graduate-fellowships) ## Blogs ![Developers attend a class in AI development for career growth](https://developer.download.nvidia.com/images/dli-workshop-1920-1080.jpg "Developers attend a class in AI development for career growth") ### Byte-Sized Courses: NVIDIA Offers Self-Paced Career Development in AI and Data Science Industry experts gather to share advice on starting a career in AI, highlighting technical training and certifications for career growth. [Read More](https://blogs.nvidia.com/blog/ai-data-science-career-development/) ![NVIDIA Founder and CEO Jensen Huang's commencement address at Caltech](https://developer.download.nvidia.com/images/jhh-caltech-1920-1080.jpg "NVIDIA Founder and CEO Jensen Huang's commencement address at Caltech") ### “Believe in Something Unconventional, Something Unexplored,” NVIDIA CEO Tells Caltech Grads Jensen Huang's commencement address at Caltech highlighted the importance of dedication to innovative ideas, viewing setbacks as opportunities, and the significance of resilience. [Read More](https://blogs.nvidia.com/blog/jensen-huang-caltech-commencement-address/) ## On-Demand Sessions ### Transforming AI Join the authors of Attention Is All You Need, the groundbreaking paper that introduced the transformer neural network architecture, in a session hosted by NVIDIA Founder and CEO Jensen Huang. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-s63046/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ### Visualizing New Narrative Forms Through AI Learn how artists and scientists can inspire each other to bring to life the mysteries of existence. 
Explore invisible worlds at the quantum level and how each discipline can inform the other. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-s62035/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ### Navigating Careers in AI Recent advancements in AI models have opened unprecedented opportunities for businesses and the scientific community. In this panel session, learn how to take the next step in your career and explore ways to ensure you have the skills you need to tackle new workloads. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc24-se62721/?playlistId=playList-d2142539-0155-44b0-97ed-215714db0f2a) ## Webinars ### Essential Training and Tips to Accelerate Your Career in AI Join us for a panel discussion on fostering career growth and learning in the fields of AI and other advanced technologies. Our panelists will share their unique career journeys, valuable insights, and tips and tricks for success. [Watch Now](https://www.nvidia.com/en-us/on-demand/session/other2024-aicareer/) ### Bringing Generative AI to Life With NVIDIA Jetson Experience next-gen applications emerging in robotics and computer vision by deploying foundational large language models (LLMs) and vision transformers into real-world embedded systems at the edge.
[Watch Now](https://gateway.on24.com/wcc/eh/1407606/lp/4384777/bringing-generative-ai-to-life-with-nvidia-jetson/) ## Additional Resources - [Deep Learning Institute](https://www.nvidia.com/en-us/training/) - [Career opportunities](https://www.nvidia.com/en-us/about-nvidia/careers/) - [NVIDIA GTC](https://www.nvidia.com/gtc/) - [NVIDIA On-Demand](https://www.nvidia.com/en-us/on-demand/) - [Open Bootcamps and Hackathons](https://www.gpuhackathons.org/) - [AI Essentials Learning Series](https://www.nvidia.com/en-us/learn/ai-learning-essentials/) - [NVIDIA Inception](https://www.nvidia.com/en-us/startups/) - [Follow us on Twitter](https://twitter.com/NVIDIAAIDev) Sign up for the latest developer news from NVIDIA. [Subscribe](https://www.nvidia.com/en-us/preferences/email-signup/) --- # Source: https://developer.nvidia.com/holoscan-sdk.md 1. [Home ](/) [Robotics and Edge AI](/embedded-computing) Holoscan # NVIDIA Holoscan SDK NVIDIA Holoscan is a domain-agnostic, multimodal AI sensor processing platform that provides the accelerated, full-stack infrastructure needed for real-time processing of streaming data at the edge or in the cloud. [Download Now](/holoscan-downloads)[Documentation](https://docs.nvidia.com/holoscan/sdk-user-guide/index.html)[Forum](https://forums.developer.nvidia.com/c/healthcare/holoscan-sdk/320/none) * * * ## Download the NVIDIA Holoscan SDK To compile the open-source Holoscan SDK yourself, download it from [GitHub](https://github.com/nvidia-holoscan/holoscan-sdk).
[GitHub](https://github.com/nvidia-holoscan/holoscan-sdk) | [SDK Documentation](https://docs.nvidia.com/clara-holoscan/sdk-user-guide/index.html) | [Release Notes](https://github.com/nvidia-holoscan/holoscan-sdk/releases) ### PyPI [Pip Install](https://pypi.org/project/holoscan/) NVIDIA Holoscan SDK ### Ubuntu Package Download and install the [Debian Package](https://developer.nvidia.com/holoscan-downloads) ### NGC Container Run within the [Holoscan Container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara-holoscan/containers/holoscan) ### Conda Now available on [Conda](https://anaconda.org/conda-forge) * * * ## How Holoscan SDK Works ### Getting-Started Guide Install, set up, and start learning how to use the Holoscan SDK with guided examples. [Read Guide](https://docs.nvidia.com/holoscan/sdk-user-guide/getting_started.html) ### Holoscan GitHub Learn more about the Holoscan AI sensor processing platform, or compile the open-source Holoscan SDK yourself. [Go to GitHub](https://github.com/nvidia-holoscan) ### Holoscan Reference Applications Check out and share reference applications, operators, tutorials, and benchmarks on HoloHub. [Explore HoloHub](https://nvidia-holoscan.github.io/holohub/) ### Learn About the Latest Release Jump into the latest features and improvements for the Holoscan SDK. [View Release Notes](https://github.com/nvidia-holoscan/holoscan-sdk/releases) * * * ## Get Started With Holoscan ![Pick Your Platform](https://developer.download.nvidia.com/icons/m48-orin-256px-blk.svg) ### 1. Pick Your Platform Optimize experiences for edge workloads by running Holoscan on industrial-grade hardware like NVIDIA IGX™. The SDK is also compatible with multiple hardware platforms, including aarch64 and x86.
[Learn More](https://www.nvidia.com/en-us/edge-computing/products/igx/) ![Install the SDK](https://developer.download.nvidia.com/icons/m48-software-stack-container.svg) ### 2. Install the SDK Get the specifications you need to run Holoscan locally. Understand the prerequisites and more with this install guide. [Get the Installation Guide](https://docs.nvidia.com/holoscan/sdk-user-guide/sdk_installation.html) ![Explore Examples](https://developer.download.nvidia.com/icons/m48-demo-topics.svg) ### 3. Explore Examples Learn how to use the Holoscan SDK to build applications through a series of core examples. [Start Learning](https://docs.nvidia.com/holoscan/sdk-user-guide/holoscan_core.html) * * * ![Holoscan Sensor Bridge processes low-latency sensor data from diverse sensors over Ethernet](https://developer.download.nvidia.com/images/holoscan/holoscan-sensor-bridge-ari.jpg) #### Process Low-Latency Sensor Data Over Ethernet With Holoscan Sensor Bridge Holoscan Sensor Bridge seamlessly handles high-bandwidth data from diverse sensors over Ethernet, enabling real-time, high-performance AI processing. It serves as a flexible field-programmable gate array (FPGA) interface with a standard API and open software. [Learn More About Holoscan Sensor Bridge](https://www.nvidia.com/en-us/technologies/holoscan-sensor-bridge/) * * * ## Holoscan Reference Applications Access an open repository of [Holoscan reference applications and operators](https://nvidia-holoscan.github.io/holohub/), including prebuilt examples of end-to-end applications and guidance on how to use, customize, and scale them. ![](https://developer.download.nvidia.com/images/holoscan/holoscan-surgical-workflow.jpg) ### Reference Application: End-to-End Surgical Video NVIDIA Holoscan’s Surgical Video Workflow enables rapid, low-latency processing of surgical video feeds with advanced AI models for tool detection and segmentation. 
With a modular, customizable pipeline, it’s designed to streamline the development of AI-driven medical imaging solutions. [Get Started](https://nvidia-holoscan.github.io/holohub/workflows/ai_surgical_video/) ![](https://developer.download.nvidia.com/images/holoscan/3d-slicer-ari.jpg) OpenIGTLink ### Real-Time Visualization With 3D Slicer This reference application enables images to be easily sent from 3D Slicer to Holoscan for running medical AI inference at the edge for guiding surgical planning, medical research, and advanced diagnostics—leveraging the strength of both platforms within a single reference application. [Learn More About Using 3D Slicer With Holoscan on GitHub](https://github.com/nvidia-holoscan/holohub/tree/main/applications/openigtlink_3dslicer) ![](https://developer.download.nvidia.com/images/holoscan/xr-volume-rendering-ari.jpg) Magic Leap ### XR: Volume Rendering The XR Volume Rendering application combines Holoscan’s real-time AI processing with Magic Leap’s AR display and spatial tracking, showing how XR in medical imaging can help with surgical planning, medical education, and patient consultations. [Learn More About Volume Rendering With Holoscan on GitHub](https://github.com/nvidia-holoscan/holohub/tree/main/applications/volume_rendering) * * * ## Holoscan Reference Applications Once you’ve set up Holoscan and [learned the basics](https://docs.nvidia.com/holoscan/sdk-user-guide/holoscan_core.html), check out prebuilt examples of end-to-end applications and guidance on how to use, customize, and scale them.
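All of these applications follow the same pattern: operators composed into a dataflow graph, each consuming the output of the one upstream. The shape of that model can be sketched in plain Python (a conceptual illustration only, not the Holoscan API; all class and method names here are invented for the sketch):

```python
# Conceptual sketch of a Holoscan-style operator pipeline in plain Python.
# This is NOT the Holoscan SDK API; the names are illustrative only.

class Operator:
    """An operator transforms the message it receives and emits a result."""
    def __init__(self, name):
        self.name = name

    def compute(self, message):
        raise NotImplementedError


class SourceOp(Operator):
    """Emits one 'frame' per pipeline tick."""
    def __init__(self, name, frames):
        super().__init__(name)
        self.frames = frames

    def compute(self, _):
        return self.frames.pop(0)


class ScaleOp(Operator):
    """Stand-in for preprocessing (e.g., normalizing pixel values)."""
    def compute(self, message):
        return [x / 255.0 for x in message]


class SinkOp(Operator):
    """Collects whatever reaches the end of the pipeline."""
    def __init__(self, name):
        super().__init__(name)
        self.received = []

    def compute(self, message):
        self.received.append(message)
        return message


class Application:
    """Connects operators into a linear pipeline and drives execution."""
    def __init__(self):
        self.pipeline = []

    def add_flow(self, upstream, downstream):
        if not self.pipeline:
            self.pipeline.append(upstream)
        self.pipeline.append(downstream)

    def run(self, ticks):
        for _ in range(ticks):
            message = None
            for op in self.pipeline:
                message = op.compute(message)


source = SourceOp("source", frames=[[0, 128, 255], [255, 255, 0]])
scale = ScaleOp("scale")
sink = SinkOp("sink")

app = Application()
app.add_flow(source, scale)
app.add_flow(scale, sink)
app.run(ticks=2)   # sink.received now holds two normalized frames
```

The real SDK generalizes this to arbitrary graphs, typed ports, and schedulers that overlap operator execution with GPU work.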
### Body Pose Estimation This computer vision task recognizes specific points on the human body in images or videos. - [View Reference Application: Body Pose Estimation](https://nvidia-holoscan.github.io/holohub/applications/body_pose_estimation/) - [Go to GitHub Repo](https://github.com/nvidia-holoscan/holohub/tree/main/applications/body_pose_estimation) - [Run Instructions](https://nvidia-holoscan.github.io/holohub/applications/body_pose_estimation/#run-instructions) ### Speech-to-Text and Large Language Models This application shows how to transcribe audio files with a speech-to-text model (STT) and how to use a large language model (LLM) to summarize and generate new, relevant information. - [View Reference Application: STT and LLMs](https://nvidia-holoscan.github.io/holohub/applications/speech_to_text_llm/#yaml-configuration) - [Go to GitHub Repo](https://github.com/nvidia-holoscan/holohub/tree/main/applications/speech_to_text_llm) - [Run Instructions](https://nvidia-holoscan.github.io/holohub/applications/speech_to_text_llm/#run-instructions) ### SAM2: Segment Anything in Images and Videos This application demonstrates how to run SAM2 models on a live video feed with the possibility of changing query points in real time. - [View Reference Application: SAM](https://nvidia-holoscan.github.io/holohub/applications/sam2/) - [Read Setup Instructions](https://nvidia-holoscan.github.io/holohub/applications/sam2/#setup-instructions) ### Sensor IO: GPUDirect Data Ingestion Reference applications for sensor input and output (IO) with Holoscan demonstrate the advanced data-ingestion capabilities of NVIDIA® GPUDirect®. Partners like Yuan and Deltacast provide state-of-the-art IO capture cards that integrate seamlessly with Holoscan, enabling rapid data transfer directly to GPU memory. 
- [View Reference Application: High-Speed Endoscopy (C++)](https://nvidia-holoscan.github.io/holohub/applications/high_speed_endoscopy/cpp/) - [View Reference Application: High-Speed Endoscopy (Python)](https://nvidia-holoscan.github.io/holohub/applications/high_speed_endoscopy/python/) ### DDS-Based Interoperability The Data Distribution Service (DDS) video application demonstrates how video frames can be written to or read from a DDS databus to provide flexible integration between Holoscan applications and other applications (using Holoscan or not). - [View Reference Application: Video Streaming](https://nvidia-holoscan.github.io/holohub/applications/dds/dds_video/) - [Read Quick-Start Guide](https://nvidia-holoscan.github.io/holohub/applications/dds/dds_video/#quick-start) - [Go to GitHub Repo](https://github.com/nvidia-holoscan/holohub/tree/main/applications/dds/dds_video) ### Multi-AI AR-Assisted Surgery This application shows how to enhance robotic surgery by combining 3D anatomical models with real-time AI segmentation to get improved depth perception, better navigation, and enhanced patient safety. 
- [View Reference Application: Tool Segmentation With AI Overlay](https://nvidia-holoscan.github.io/holohub/applications/orsi/orsi_segmentation_ar/) - [Go to GitHub Repo](https://github.com/nvidia-holoscan/holohub/tree/main/applications/orsi/orsi_segmentation_ar) * * * ## Holoscan Developer Blogs * * * ## More Resources ![NVIDIA Developer Forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore Developer Forums ![NVIDIA Inception for Startups](https://developer.download.nvidia.com/icons/m48-ai-startup.svg) ### Accelerate Your Startup ![NVIDIA Developer Program](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using a model in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). Get started with Holoscan today. [Visit HoloHub](https://nvidia-holoscan.github.io/holohub/ "Visit HoloHub") --- # Source: https://developer.nvidia.com/hpc-sdk.md # NVIDIA HPC SDK ## A Comprehensive Suite of Compilers, Libraries and Tools for HPC The NVIDIA HPC Software Development Kit (SDK) includes the proven compilers, libraries and software tools essential to maximizing developer productivity and the performance and portability of HPC applications.
![NVIDIA HPC SDK Development and Analysis](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/hpc-sdk-explainer-diagram-update-r1.png "NVIDIA HPC SDK Development and Analysis") The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC® directives, and CUDA®. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming. Performance profiling and debugging tools simplify porting and optimization of HPC applications, and containerization tools enable easy deployment on-premises or in the cloud. With support for NVIDIA GPUs and Arm or x86-64 CPUs running Linux, the HPC SDK provides the tools you need to build NVIDIA GPU-accelerated HPC applications. [Download Now](/hpc-sdk/downloads)[Get Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nvhpc) * * * ## Why Use the NVIDIA HPC SDK? ### Performance Widely used HPC applications, including VASP, Gaussian, ANSYS Fluent, GROMACS, and NAMD, use CUDA, OpenACC, and GPU-accelerated math libraries to deliver breakthrough performance to their users. You can use these same software tools to GPU-accelerate your applications and achieve dramatic speedups and power efficiency using NVIDIA GPUs. ### Portability Build and optimize applications for over 99 percent of today's Top500 systems, including those based on NVIDIA GPUs, x86-64, or Arm. You can use drop-in libraries, C++17 parallel algorithms and OpenACC directives to GPU accelerate your code and ensure your applications are fully portable to other compilers and systems. 
### Productivity Maximize science and engineering throughput and minimize coding time with a single integrated suite that allows you to quickly port, parallelize and optimize for GPU acceleration, including industry-standard communication libraries for multi-GPU and scalable computing, and profiling and debugging tools for analysis. * * * ## Support for Your Favorite Programming Languages ### C++17 Parallel Algorithms C++17 parallel algorithms enable portable parallel programming using the Standard Template Library (STL). The NVIDIA HPC SDK C++ compiler supports full C++17 on CPUs and offloading of parallel algorithms to NVIDIA GPUs, enabling GPU programming with no directives, pragmas, or annotations. Programs that use C++17 parallel algorithms are readily portable to most C++ implementations for Linux, Windows, and macOS. ### Fortran 2003 Compiler The NVIDIA Fortran compiler supports Fortran 2003 and many features of Fortran 2008. With support for OpenACC and [CUDA Fortran](/cuda-fortran) on NVIDIA GPUs, and SIMD vectorization, OpenACC and OpenMP for multicore x86-64 and Arm, it has the features you need to port and optimize your Fortran applications on today’s heterogeneous GPU-accelerated HPC systems. ### OpenACC Directives NVIDIA Fortran, C, and C++ compilers support OpenACC directive-based parallel programming for NVIDIA GPUs and multicore CPUs. Over 200 HPC application ports have been initiated or enabled using OpenACC, including production applications like VASP, Gaussian, ANSYS Fluent, WRF, and MPAS. OpenACC is the proven performance-portable directives solution for GPUs and multicore CPUs.
* * * ## Key Features ![NVIDIA HPC SDK includes a suite of GPU-accelerated math libraries](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-x-libraries.svg " NVIDIA HPC SDK includes a suite of GPU-accelerated math libraries") ### GPU Math Libraries The NVIDIA HPC SDK includes a suite of [GPU-accelerated math libraries](https://developer.nvidia.com/gpu-accelerated-libraries#linear-algebra) for compute-intensive applications. The cuBLAS and cuSOLVER libraries provide GPU-optimized and multi-GPU implementations of all BLAS routines and core routines from LAPACK, automatically using NVIDIA GPU Tensor Cores where possible. cuFFT includes GPU-accelerated 1D, 2D, and 3D FFT routines for real and complex data, and cuSPARSE provides basic linear algebra subroutines for sparse matrices. These libraries are callable from CUDA and OpenACC programs written in C, C++ and Fortran. ### Optimized for Tensor Cores NVIDIA GPU Tensor Cores enable scientists and engineers to dramatically accelerate suitable algorithms using mixed precision or double precision. The NVIDIA HPC SDK math libraries are optimized for Tensor Cores and multi-GPU nodes to deliver the full performance potential of your system with minimal coding effort. Using the NVIDIA Fortran compiler, you can leverage Tensor Cores through automatic mapping of transformational array intrinsics to the cuTENSOR library. 
Technical Blog: [Bringing Tensor Cores to Standard Fortran](https://developer.nvidia.com/blog/bringing-tensor-cores-to-standard-fortran/) ![NVIDIA A100 Tensor Core FP64](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/HPC_SDK/A100_TensorCore_FP64.jpg " NVIDIA A100 Tensor Core FP64") ![NVIDIA HPC compilers and tools are optimized for your CPU](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/optimized-for-cpu.svg "NVIDIA HPC compilers and tools are optimized for your CPU") ### Optimized for Your CPU Heterogeneous HPC servers use GPUs for accelerated computing and multicore CPUs based on the x86-64 or Arm instruction set architectures. [NVIDIA HPC compilers](/hpc-compilers) and tools are supported on all of these CPUs, and all compiler optimizations are fully enabled on any CPU that supports them. With uniform features, command-line options, language implementations, programming models, and tool and library user interfaces across all supported systems, the NVIDIA HPC SDK simplifies the developer experience in diverse HPC environments. ### Multi-GPU Programming The NVIDIA Collective Communications Library (NCCL) implements highly optimized multi-GPU and multi-node collective communication primitives using MPI-compatible all-gather, all-reduce, broadcast, reduce, and reduce-scatter routines to take advantage of all available GPUs within and across your HPC server nodes. NVSHMEM implements the OpenSHMEM standard for GPU memory and provides multi-GPU and multi-node communication primitives that can be initiated from a host CPU or GPU and called from within a CUDA kernel. 
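The collective primitives described above can be illustrated with a pure-Python sketch of the classic ring all-reduce pattern (a didactic simplification only; NCCL implements this with GPU buffers over NVLink and InfiniBand transports):

```python
# Pure-Python sketch of a ring all-reduce (sum), the collective pattern
# NCCL accelerates across GPUs. Here "ranks" are just lists.

def ring_allreduce(vectors):
    """Sum equal-length vectors across n simulated ranks.

    Every rank ends up holding the elementwise sum, using the classic
    two-phase ring: reduce-scatter, then all-gather.
    """
    n = len(vectors)
    chunks = [list(v) for v in vectors]        # per-rank working buffers
    size = len(chunks[0]) // n                 # assume length % n == 0

    def seg(r, c):
        return chunks[r][c * size:(c + 1) * size]

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n
    # to its right neighbor, which accumulates it. After n-1 steps,
    # rank r holds the fully reduced chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, seg(r, (r - step) % n))
                 for r in range(n)]            # snapshot simultaneous sends
        for dst, c, data in sends:
            for i, val in enumerate(data):
                chunks[dst][c * size + i] += val

    # Phase 2: all-gather. Rotate the reduced chunks around the ring so
    # every rank ends up with all of them.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, seg(r, (r + 1 - step) % n))
                 for r in range(n)]
        for dst, c, data in sends:
            chunks[dst][c * size:(c + 1) * size] = data

    return chunks

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
assert result == [[12, 15, 18]] * 3            # every rank has the full sum
```

The appeal of the ring is that each rank sends roughly 2× the vector size in total, independent of the number of ranks, which is why it scales well across large nodes.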
![NVIDIA Collective Communications Library (NCCL) implements multi-GPU programming](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/multi-gpu-programming.svg "NVIDIA Collective Communications Library (NCCL) implements multi-GPU programming") ![NVIDIA HPC SDK includes an MPI library for scalable systems programming](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/scalable-system-programming.jpg "NVIDIA HPC SDK includes an MPI library for scalable systems programming") ### Scalable Systems Programming MPI is the standard for programming distributed-memory scalable systems. The NVIDIA HPC SDK includes a CUDA-aware MPI library based on Open MPI with support for GPUDirect™ so you can send and receive GPU buffers directly using remote direct memory access (RDMA), including buffers allocated in CUDA Unified Memory. CUDA-aware Open MPI is fully compatible with CUDA C/C++, CUDA Fortran and the NVIDIA OpenACC compilers. ### Nsight Performance Profiling Nsight™ Systems provides system-wide visualization of application performance on HPC servers and enables you to optimize away bottlenecks and scale parallel applications across multicore CPUs and GPUs. Nsight Compute is an interactive kernel profiler for GPU-accelerated applications that lets you dive deep into GPU kernels via a graphical or command-line interface and pinpoint performance bottlenecks, using the NVTX API to directly instrument regions of your source code.
![Nsight systems provides visualization of app performance on HPC servers](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight-performance-profiling.jpg "Nsight systems provides visualization of app performance on HPC servers") ![NVIDIA HPC SDK can deploy software anywhere using the HPC Container Maker](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-gpu-cloud-hpc-apps-kv-16x9.jpg "NVIDIA HPC SDK can deploy software anywhere using the HPC Container Maker") ### Deploy Anywhere [Containers](https://developer.nvidia.com/ai-hpc-containers) simplify software deployment by bundling applications and their dependencies into portable virtual environments. The NVIDIA HPC SDK includes instructions for developing, profiling, and deploying software using the HPC Container Maker to simplify the creation of container images. The NVIDIA Container Runtime enables seamless GPU support in virtually all container frameworks, including Docker and Singularity. Technical blog: [Building and Deploying HPC Applications using NVIDIA HPC SDK from the NVIDIA NGC Catalog](https://developer.nvidia.com/blog/building-and-deploying-hpc-applications-using-hpc-sdk-from-ngc-catalog/). * * * ## What Users are Saying _“On Perlmutter, we need Fortran, C and C++ compilers that support all the programming models our users need and expect on NVIDIA GPUs and AMD EPYC CPUs — MPI, OpenMP, OpenACC, CUDA and optimized math libraries. The NVIDIA HPC SDK checks all of those boxes.”_ **– Nicholas Wright, NERSC Chief Architect** * * * ## HPC Compilers Support Services HPC Compiler Support Services provide access to NVIDIA technical experts, including: - Paid technical support for the NVFORTRAN, NVC++ and NVC compilers (NVCC excluded). - Help with installation and usage of NVFORTRAN, NVC++ and NVC compilers. - Confirmation of bug reports, prioritization of bug fixes above those from non-paid users. 
- Where possible, help with temporary workarounds for confirmed compiler bugs. - Access to release archives including both HPC SDK and legacy PGI packages. - For more details please refer to [End Customer Terms & Conditions.](http://docs.nvidia.com/hpc-sdk/compilers/hcss-terms-and-conditions/index.html) ### Get Started - Interested in purchasing the support offer? [Contact us](mailto:enterpriseservices@nvidia.com). - Already have an active support contract and already registered for support? Log in to the [NVIDIA support portal](https://nvid.nvidia.com/dashboard). - Existing customer and want to renew your contract? [Contact us](mailto:renewalsales@nvidia.com?subject=Renewing%20HPC%20Compilers%20Support%20Services). - Questions? Learn more by sending email to [enterpriseservices@nvidia.com](mailto:enterpriseservices@nvidia.com?subject=HPC%20Compilers%20Support%20Services%20Question). * * * ## Featured Content ## Resources - [HPC SDK Documentation](https://docs.nvidia.com/hpc-sdk/index.html) - [Developer Forums](https://forums.developer.nvidia.com/) - Training: - [Portable Acceleration of HPC Applications using ISO C++ - Part 1: Fundamentals](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51170/) - [Portable Acceleration of HPC Applications using ISO C++ - Part 2: Fundamentals](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51170/) - [Scaling GPU-Accelerated Applications with the C++ Standard Library](https://courses.nvidia.com/courses/course-v1:DLI+S-AC-09+V1/) - [GPU Hackathons and Bootcamps](https://www.openhackathons.org/s/) - Industry Articles: - [Why Standards-Based Parallel Programming Should be in Your HPC Toolbox](https://www.hpcwire.com/2022/09/05/why-standards-based-parallel-programming-should-be-in-your-hpc-toolbox/?=&linkId=100000147318917) - [Leveraging Standards-Based Parallel Programming in HPC Applications](https://www.hpcwire.com/2022/10/03/leveraging-standards-based-parallel-programming-in-hpc-applications/) - [New 
C++ Sender Library Enables Portable Asynchrony](https://www.hpcwire.com/2022/12/05/new-c-sender-library-enables-portable-asynchrony/) - Technical Blogs: - [Developing Accelerated Code with Standard Language Parallelism](/blog/developing-accelerated-code-with-standard-language-parallelism/) - [Multi-GPU Programming with Standard Parallel C++: Part One](/blog/multi-gpu-programming-with-standard-parallel-c-part-1/) - [Multi-GPU Programming with Standard Parallel C++: Part Two](/blog/multi-gpu-programming-with-standard-parallel-c-part-2/) - [Using Fortran Standard Parallel Programming for GPU Acceleration](/blog/using-fortran-standard-parallel-programming-for-gpu-acceleration/) - [N Ways to SAXPY: Demonstrating the Breadth of GPU Programming Options](/blog/n-ways-to-saxpy-demonstrating-the-breadth-of-gpu-programming-options) - [Accelerating Standard C++ with GPUs Using stdpar](/blog/accelerating-standard-c-with-gpus-using-stdpar/) - [Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK](/blog/accelerating-fortran-do-concurrent-with-gpus-and-the-nvidia-hpc-sdk/) - [Bringing Tensor Cores to Standard Fortran](/blog/bringing-tensor-cores-to-standard-fortran/) - [Building and Deploying HPC Applications Using NVIDIA HPC SDK from the NVIDIA NGC Catalog](/blog/building-and-deploying-hpc-applications-using-hpc-sdk-from-ngc-catalog/) - [Accelerating Python on GPUs with nvc++ and Cython](/blog/accelerating-python-on-gpus-with-nvc-and-cython/) - [Multinode Multi-GPU: Using NVIDIA cuFFTMp FFTs at Scale](/blog/multinode-multi-gpu-using-nvidia-cufftmp-ffts-at-scale/) - [Extending Block-Cyclic Tensors for Multi-GPU with NVIDIA cuTensorMg](/blog/extending-block-cyclic-tensors-for-multi-gpu-with-nvidia-cutensormg/) - [Accelerating GPU Applications with NVIDIA Math Libraries](/blog/accelerating-gpu-applications-with-nvidia-math-libraries/) - [Accelerating NVIDIA HPC Software with SVE on AWS 
Graviton3](/blog/accelerating-nvidia-hpc-software-with-sve-on-aws-graviton3/?ncid=so-twit-562255#cid=hpc06_so-twit_en-us) - Presentations: - [Standard Fortran on GPUs and its Utility in Quantum Chemistry Codes](https://www.youtube.com/watch?v=DrvI2gw3tnI) - Related libraries and software: - [NVIDIA GPUDirect®](/gpudirect) - [Magnum IO](https://www.nvidia.com/en-us/data-center/magnum-io/) ## Get Started [Download](/hpc-sdk/downloads) --- # Source: https://developer.nvidia.com/networking/hpc-x.md # NVIDIA HPC-X ## Increase scalability and performance of messaging communications. NVIDIA® HPC-X® is a comprehensive software package that includes Message Passing Interface (MPI), Symmetrical Hierarchical Memory (SHMEM) and Partitioned Global Address Space (PGAS) communications libraries, and various acceleration packages. This full-featured, tested, and packaged toolkit enables MPI and SHMEM/PGAS programming languages to achieve high performance, scalability, and efficiency and ensures that communication libraries are fully optimized by NVIDIA Quantum InfiniBand networking solutions. ### Performance at Any Scale HPC-X takes advantage of NVIDIA Quantum InfiniBand hardware-based networking acceleration engines to maximize application performance. It dramatically reduces MPI operation time, freeing up valuable CPU resources, and decreases the amount of data traversing the network, allowing unprecedented scale to reach evolving performance demands. ![NVIDIA In-Network Computing](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/networking/hpc-x/nvidia-mellanox-in-network-2c50-d%402x.jpg) ## Software and Acceleration Packages ### HPC-X MPI MPI is a standardized, language-independent specification for writing message-passing programs. 
NVIDIA HPC-X MPI is a high-performance, optimized implementation of Open MPI that takes advantage of NVIDIA’s additional acceleration capabilities, while providing seamless integration with industry-leading commercial and open-source application software packages. [Learn More](https://docs.mellanox.com/category/hpcx) ### HPC-X OpenSHMEM The HPC-X OpenSHMEM programming library is a one-sided communications library that supports a unique set of parallel programming features, including point-to-point and collective routines, synchronizations, atomic operations, and a shared memory paradigm used between the processes of a parallel programming application. [Learn More](https://docs.mellanox.com/category/hpcx) ### In-Network Computing NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP™) improves the performance of MPI operations by offloading them from the CPU to the switch network and eliminating the need to send data multiple times, decreasing the amount of data traversing the network and dramatically reducing MPI operation time. [Learn More](https://docs.mellanox.com/category/hpcx) ### Unified Communication X Unified Communication X (UCX®) is an open-source communication framework for data-centric and high-performance applications. UCX provides a low-overhead, point-to-point communication path for near-native, hardware-level performance with cross-platform support. [Learn More](https://docs.mellanox.com/category/hpcx) ### Unified Collective Communication Unified Collective Communication (UCC) is an open-source communication framework for high-performance applications. UCC provides low-overhead collective communications leveraging InfiniBand In-Network Computing acceleration engines. [Learn More](https://docs.mellanox.com/category/hpcx) ### NCCL/SHARP and UCX Support NCCL-RDMA plug-ins enable remote direct-memory access (RDMA) and switch-based collectives (SHARP) with the NVIDIA Collective Communication Library (NCCL).
The NCCL UCX plug-in replaces the default NCCL verbs-based inter-node communication routines with UCX-based communication routines for enhanced performance. [Learn More](https://docs.mellanox.com/category/hpcx) ### ClusterKit ClusterKit is a multifaceted node-assessment tool for high-performance clusters. ClusterKit employs well-known techniques and tests to ensure the health and performance of a cluster. [Learn More](https://docs.mellanox.com/category/hpcx) ### Key Features - Offloads collectives communications from MPI onto NVIDIA Quantum InfiniBand networking hardware - Multiple transport support, including Reliable Connection (RC), Dynamic Connected (DC), and Unreliable Datagram (UD) - Intra-node shared memory communication - Receive-side tag matching - Native support for MPI-3 - Multi-rail support with message striping - NVIDIA GPUDirect® with CUDA® support - NCCL-RDMA-SHARP plug-in support ### Benefits - Increases CPU availability, application scalability, and system efficiency for improved application performance - Ensures node-level and system-level health and performance - Maximizes application performance with underlying hardware architecture - Fully optimized for NVIDIA Quantum InfiniBand networking solutions - Supports any interconnect based on InfiniBand or Ethernet standards ## Resources - [ Datasheet ](#product-briefs) - [ Reference Guide ](#reference-guide) - [ Download ](#download) - [NVIDIA HPC-X](https://nvdam.widen.net/s/8xqvk2vprd/infiniband-hpcx-datasheet-web) - [NVIDIA HPC-X Product Documentation](https://docs.mellanox.com/category/hpcx) - [NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Product Documentation](https://docs.mellanox.com/category/mlnxsharp) - [NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) API Guide](https://www.mellanox.com/files/related-docs/prod_acceleration_software/Mellanox_SHARP_SW_API_Guide.pdf) 
### See how you can build the most efficient, high-performance network. [Request a Demo](https://mellanox.secure.force.com/VF_Lead_Demo_Form) ### Configure Your Cluster [GET STARTED](https://www.mellanox.com/solutions/configuration-tools) ### Take Networking Courses [LEARN MORE](https://academy.mellanox.com) ### Ready to Purchase? [HOW TO BUY](https://store.mellanox.com/) --- # Source: https://developer.nvidia.com/igx-downloads.md # IGX Download Center NVIDIA IGX Orin™ is a high-performance AI platform featuring industrial-grade hardware and enterprise software and support. Purpose-built for industrial and medical environments, IGX delivers industry-leading performance, security, and functional safety, and has a 10-year lifecycle and support. --- # Source: https://developer.nvidia.com/ai-apps-for-rtx-pcs/inference-backends.md 1. [Home](https://developer.nvidia.com/) 2. [RTX AI PCs](/ai-apps-for-rtx-pcs/) NVIDIA AI Inference Backends # Accelerate AI On NVIDIA RTX PCs NVIDIA RTX™ PCs accelerate your AI features for maximum performance and lowest latency. NVIDIA offers broad support for all major AI inference backends to meet every developer’s needs. * * * ## Overview of AI Inference Backends Developers need to consider several factors before choosing a deployment ecosystem and path for their application. Each inference backend offers specific model optimization tools and deployment mechanisms for efficient application integration. Inference backends map model execution to hardware, with top options optimized for NVIDIA RTX GPUs. Achieving peak AI performance requires model optimization techniques like quantization and pruning. Higher-level interfaces streamline application packaging, installation, and integration, enhancing efficiency. ### Who Is Windows ML with TensorRT for RTX For?
For developers who want to deploy performant, cross-vendor apps across Windows OS. ### Inferencing Backends Windows ML Runtime, built on ONNX Runtime, allows developers to run ONNX models locally across the entirety of PC hardware—including CPUs, NPUs, and GPUs. Windows ML automatically picks which execution provider to use, depending on the hardware available on the user’s PC, then downloads all the files necessary for that hardware. Windows ML is powered by NVIDIA TensorRT™ for RTX on NVIDIA GPUs for maximum performance. [Get Started With Windows ML](https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview) [See Windows ML Example Samples](https://github.com/microsoft/WindowsAppSDK-Samples/tree/feature/winml-stable/Samples/WindowsML) [See ONNX Runtime API Samples](https://github.com/microsoft/onnxruntime-inference-examples/tree/main) ### Model Optimization The Olive optimization toolkit offers hardware-aware quantization across CPUs, NPUs, and NVIDIA RTX GPUs—with easy integration into the Windows ML inferencing backend. [Get Started With Olive](https://microsoft.github.io/Olive/) ### Deployment Mechanisms Packaging and deploying Windows ML-based apps on PCs is simple. Your application and Windows ML are decoupled, allowing for over-the-air (OTA) updates. Just add a reference to Windows ML within your project, and Windows ML will manage downloading and installing everything else—including versioning, execution providers, runtime, and all the dependencies. [Get Started With Windows ML](https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview) ### Introduction to ONNX Runtime https://www.youtube-nocookie.com/embed/Wp5PaRpudlk? [Watch Video (8:12)](https://www.youtube.com/watch?v=Wp5PaRpudlk) ### ONNXRuntime-GenAI Installation and Inference Walkthrough https://www.youtube-nocookie.com/embed/tNq9MP9wlBg? [Watch Video (6:00)](https://www.youtube.com/watch?v=tNq9MP9wlBg) ### Who Is Ollama and Llama.cpp For?
For large language model (LLM) developers who want wide reach with cross-vendor and cross-OS support. ### Inferencing Backends Ollama and Llama.cpp enable LLM-only inferencing across a variety of devices and platforms with unified APIs. They require minimal setup, deliver good performance, and come as a lightweight package. Ollama and Llama.cpp are powered by the GGML runtime and compatible with GGUF model formats. [Get Started With Ollama](https://github.com/ollama/ollama)[Get Started With Llama.cpp](https://github.com/ggml-org/llama.cpp) ### Model Optimization Ollama leverages optimized model formats such as GGUF, both within and outside of Llama.cpp tooling. This format allows for optimal model performance and lightweight deployment. It uses quantization techniques to reduce the size and computational requirements of the model to run across a variety of platforms. [Get Started With Ollama / Llama.cpp Model Quantization](https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md) ### Deployment Mechanisms With Ollama, you can deploy in an out-of-process format, with a server running on localhost. Apps communicate with this server using a REST API. [Get Started With Ollama](https://github.com/ollama/ollama)[Get Started With Llama.cpp](https://github.com/ggml-org/llama.cpp)[See How to Deploy LLMs on RTX PCs](https://forums.developer.nvidia.com/t/how-to-deploy-llms-on-rtx-pcs/317354) ### Who Is NVIDIA TensorRT for RTX SDK For? For developers looking for full behavior control on NVIDIA RTX GPUs. ### Inferencing Backends NVIDIA TensorRT™ for RTX offers full behavior control on RTX PCs, is lightweight for easy packaging into applications, and can generate optimized engines in just seconds on device.
[Get Started With TensorRT for RTX SDK ](https://developer.nvidia.com/tensorrt-rtx) ### Optimize Your Models TensorRT for RTX uses a just-in-time (JIT) engine builder to compile any ONNX model with optimizations that take full advantage of the user’s specific GPU configuration. Compilation happens transparently to the user, taking less than 30 seconds on first setup. [Get Started With TensorRT for RTX SDK](https://developer.nvidia.com/tensorrt-rtx) ### Deployment Mechanisms With TensorRT for RTX, deploying AI apps is easier. Developers can include both the model and the lightweight TensorRT runtime (under 200 MB) inside their applications. When a user installs the app, or on first run, TensorRT for RTX quickly compiles the model for their specific hardware in under 30 seconds, ensuring peak performance. [Get Started with TensorRT for RTX SDK](https://developer.nvidia.com/tensorrt-rtx) ### Who Is PyTorch For? For developers looking to experiment with and evaluate AI while maintaining cohesion with model training pipelines. ### Inferencing Backends PyTorch is a popular open-source machine learning library that offers cross-platform and cross-device inferencing options. [Get Started With PyTorch](https://pytorch.org/get-started/locally/) ### Model Optimization PyTorch offers several leading algorithms for model quantization, ranging from quantization-aware training (QAT) to post-training quantization (PTQ), as well as sparsity for in-framework model optimization. [Get Started With torchao](https://github.com/pytorch/ao/) ### Deployment Mechanisms To serve models in production applications with PyTorch, developers often deploy in an out-of-process format. This requires building Python packages, generating model files, and standing up a localhost server. This can be streamlined with frameworks such as torchserve and HuggingFace Accelerate.
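To make the PyTorch path concrete, here is a minimal sketch combining in-framework inference with post-training dynamic quantization, one of the PTQ options mentioned above. The model, layer sizes, and input shape are arbitrary placeholders for illustration, not part of any NVIDIA workflow:

```python
import torch
import torch.nn as nn

# A toy model standing in for a real checkpoint (hypothetical sizes).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Post-training dynamic quantization: Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference without gradient tracking, as in a serving loop.
with torch.no_grad():
    out = quantized(torch.randn(1, 64))

print(out.shape)  # torch.Size([1, 10])
```

In an out-of-process deployment, a loop like this would typically sit behind a localhost REST endpoint managed by a serving framework such as torchserve.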
[Get Started With PyTorch ](https://github.com/pytorch/pytorch?tab=readme-ov-file#nvidia-cuda-support)[Get Started With torchserve](https://github.com/pytorch/serve/tree/master)[Get Started With HuggingFace Accelerate](https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_accelerate)[Recommendations on How to Deploy Diffusion Models on NVIDIA RTX PCs](https://forums.developer.nvidia.com/t/recommendations-on-how-to-deploy-diffusion-models-on-nvidia-rtx-pcs/341946) * * * ## Choosing an Inferencing Backend | | Windows ML with NVIDIA TensorRT for RTX | TensorRT for RTX | Ollama / Llama.cpp | PyTorch | | --- | --- | --- | --- | --- | | **For** | Windows application developers building AI features for Windows PC | Windows application developers who want maximum control and flexibility of AI behavior on NVIDIA RTX GPUs | LLM developers who want wide reach with cross-vendor and cross-OS support | Developers experimenting with and evaluating AI while maintaining cohesion with model training pipelines | | **Performance** | Fastest | Fastest | Fast | Good | | **OS Support** | Windows | Windows and Linux | Windows, Linux, and Mac | Windows and Linux | | **Hardware Support** | Any GPU or CPU | NVIDIA RTX GPUs | Any GPU or CPU | Any GPU or CPU | | **Model Checkpoint Format** | ONNX | ONNX | GGUF or GGML | PyT | | **Installation Process** | Pre-installed on Windows | Install SDK and Python bindings | Installation of Python packages required | Installation of Python packages required | | **LLM Support** | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | | **CNN Support** | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | - | ![](https://developer.download.nvidia.com/icons/check.svg) | | 
**Model Optimizations** | Microsoft Olive | TensorRT-Model Optimizer | Llama.cpp | - | | **Python** | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | | **C/C++** | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | ![](https://developer.download.nvidia.com/icons/check.svg) | | **C#/.NET** | ![](https://developer.download.nvidia.com/icons/check.svg) | - | ![](https://developer.download.nvidia.com/icons/check.svg) | - | | **JavaScript** | ![](https://developer.download.nvidia.com/icons/check.svg) | - | ![](https://developer.download.nvidia.com/icons/check.svg) | - | * * * Stay up to date on how to power your AI apps with NVIDIA RTX PCs. [Learn More](https://developer.nvidia.com/ai-apps-for-rtx-pcs) --- # Source: https://developer.nvidia.com/networking/infiniband-software.md # InfiniBand Software and Drivers NVIDIA® InfiniBand drivers, protocol software, and tools are supported inbox by the respective major OS vendors and distributions and/or by NVIDIA where noted. NVIDIA software also supports all major processor architectures.
**Software/Drivers Quick Links** | Description | Release Notes | | --- | --- | | [MLNX\_OFED\*](https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/) | [v24.10-0.7.0.0](https://docs.nvidia.com/networking/display/mlnxofedv24100700/release+notes) | | [DOCA-OFED](https://developer.nvidia.com/networking/doca) | [v3.2.0](https://docs.nvidia.com/networking/display/nvidiawinof2documentationv251050020/release+notes) | | [WinOF-2](https://docs.nvidia.com/networking/display/nvidiawinof2documentationv251050020/release+notes) | [v25.10.50020](https://docs.nvidia.com/networking/display/nvidiawinof2documentationv251050020/release+notes) | | [WinOF](https://network.nvidia.com/products/adapter-software/ethernet/windows/winof-2/) | [v5.50.54000](https://docs.mellanox.com/category/winof) | | [FlexBoot](https://network.nvidia.com/products/adapter-software/flexboot/) | [v3.8.201](https://docs.nvidia.com/networking/display/flexboot38201uefi144010/release-notes) | | [UEFI](https://network.nvidia.com/products/adapter-ethernet-sw/UEFI/) | [v14.10.10](https://docs.nvidia.com/networking/display/flexboot38201uefi144010/release-notes) | | [MFT](https://network.nvidia.com/products/adapter-software/firmware-tools/) | [v4.34.0-145](https://docs.nvidia.com/networking/display/mftv4340/release+notes) | | [HPC-X](https://developer.nvidia.com/networking/hpc-x) | [v2.25](https://docs.nvidia.com/networking/display/hpcxv225/release+notes) | | [VMware® ESX Server](https://network.nvidia.com/products/ethernet-drivers/vmware/esxi-server/) | [ESXi 7.0 U3: v4.22.73.1004](https://docs.mellanox.com/display/VMwarev416711)[ESXi 7.0 U2: v4.21.71.101](https://docs.mellanox.com/display/VMwareRNv42171101)[ESXi 7.0: v4.19.71.1](https://docs.mellanox.com/display/VMwareRNv419711)[ESXi 6.7: v4.17.71.1](https://docs.mellanox.com/display/VMwarev417711)[ESXi 6.5: v4.16.71.1](https://docs.mellanox.com/display/VMwarev416711) | | [FreeBSD
Driver](https://network.nvidia.com/products/drivers/freebsd/) | [v3.7.1 ConnectX-4 and above](https://docs.nvidia.com/networking/display/FREEBSDv371/Release+Notes)[v2.1.6 ConnectX-3](https://www.mellanox.com/related-docs/prod_software/Mellanox_FreeBSD_Release_Notes_v2.1.6.pdf) | | [NVIDIA GPUDirect](https://network.nvidia.com/products/GPUDirect-RDMA/) | [v1.1](https://docs.mellanox.com/category/gpudirect) | \* MLNX\_OFED has transitioned into DOCA-Host, and is now available as DOCA-OFED (learn about DOCA-Host profiles [here](https://docs.nvidia.com/doca/sdk/doca+profiles/index.html)). ### Configure a Cluster [Get Started](https://www.nvidia.com/en-us/networking/infiniband-configurator/) ### Ready to Purchase [How to Buy](https://marketplace.nvidia.com/en-us/enterprise/networking/?limit=15) --- # Source: https://developer.nvidia.com/embedded/jetpack.md 1. [Home ](/) [Embedded Computing](/embedded/develop/software) JetPack # NVIDIA JetPack NVIDIA JetPack™ is the official software stack for the NVIDIA Jetson™ platform, giving you a comprehensive suite of tools and libraries for building AI-powered edge applications. JetPack 7, the latest evolution in the series, is the most advanced software stack yet, purpose-built to enable cutting-edge robotics and generative AI at the edge. With full support for NVIDIA Jetson platforms, JetPack 7 provides ultra-low latency, deterministic performance, and scalable deployment for machines that interact with the physical world. 
[JetPack Downloads and Notes](/embedded/jetpack/downloads)[Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/)[Developer Forum](https://forums.developer.nvidia.com/c/robotics-edge-computing/jetson-embedded-systems/jetson-thor/740) * * * ## JetPack 7 Overview JetPack 7 gives you full support for the NVIDIA® Jetson [Thor](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-thor/)™ platform, a preemptible real-time kernel, Multi-Instance GPU (MIG), and integrated [Holoscan Sensor Bridge](https://www.nvidia.com/en-us/technologies/holoscan-sensor-bridge/). Built on Linux Kernel 6.8 and Ubuntu 24.04 LTS, and designed with a modular, cloud-native architecture, it incorporates the latest NVIDIA AI compute stack and integrates seamlessly with NVIDIA AI workflows. Whether you’re developing humanoid robots or building an application with the most demanding generative AI workload, JetPack 7 delivers the software foundation to bring them to life. **JetPack 7 is built on the SBSA architecture** With JetPack 7, Jetson software aligns with the Server Base System Architecture (SBSA), positioning Jetson Thor alongside industry-standard Arm server designs. SBSA standardizes critical hardware and firmware interfaces, delivering stronger OS support, simpler software portability, and smoother enterprise integration. Building on this foundation, Jetson Thor now supports a unified CUDA 13.0 installation across all Arm targets, streamlining development, reducing fragmentation, and ensuring consistency from server-class systems to Jetson Thor.
[![NVIDIA Jetson software stack](https://developer.download.nvidia.com/images/jetson/jetson-software-stack-diagram-r1-01(1).svg "NVIDIA Jetson software stack")](https://developer.download.nvidia.com/images/jetson/jetson-software-stack-diagram-r1-01(1).svg) Click Image to Enlarge * * * ## Components of the JetPack SDK ### AI Compute Stack Note: Jetson Thor is based on the SBSA stack. Please use the SBSA installer when installing from the links below. #### CUDA The NVIDIA® CUDA® Toolkit provides a powerful development environment for creating GPU-accelerated applications, including a compiler, math libraries, and debugging tools. [Explore CUDA](/cuda-toolkit) #### cuDNN The NVIDIA cuDNN (CUDA Deep Neural Network) library offers high-performance primitives for deep learning, with optimized implementations for convolution, pooling, normalization, and activation layers. [Explore cuDNN](/cudnn) #### TensorRT NVIDIA TensorRT™ is a high-performance inference runtime that optimizes and accelerates deep learning models, delivering low latency and high throughput across major frameworks. [Explore TensorRT](/tensorrt) ### AI Frameworks #### PyTorch PyTorch is a fast, flexible deep learning framework with NGC containers for easy deployment across AI tasks like NLP, computer vision, and recommendation systems. [Explore PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) #### vLLM vLLM is a fast and easy-to-use library for LLM inference and serving. Coming Soon #### SGLang SGLang is a fast serving framework for large language models and vision language models. Coming Soon #### Triton Inference Server NVIDIA Triton Inference Server™ enables seamless AI deployment across cloud and edge environments, ensuring consistency and performance optimization. 
[See Triton Inference Server](https://github.com/triton-inference-server) ### Jetson Linux Components and Libraries #### Flashing Jetson devices can be flashed with Jetson Linux through multiple methods, from command-line tools to automated scripts, with NVIDIA SDK Manager offering the most user-friendly option. [Explore Flashing](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/FlashingSupportJetsonThor.html) #### Security Jetson Linux delivers a comprehensive suite of security features spanning edge to cloud, including secure boot, disk encryption, runtime integrity, fTPM, and secure OTA updates. [Explore Security](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/Security.html) #### OTA Over-the-air (OTA) updates on Jetson enable seamless, remote delivery of software and security upgrades, keeping devices up to date without manual intervention. [Explore OTA Updates](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/SoftwarePackagesAndTheUpdateMechanism.html#over-the-air-update) #### Graphics Libraries Jetson supports various graphics APIs, including OpenGL, Vulkan, and EGL, enabling GPU-accelerated rendering and compute for advanced 3D graphics and UI rendering. [Explore Graphics APIs](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/Graphics.html) #### Multimedia APIs Jetson Linux Multimedia APIs provide low-level access to camera and video processing hardware. This lets you create high-performance applications with fine-grained control over multimedia pipelines. [Explore Multimedia APIs](https://docs.nvidia.com/jetson/archives/r38.4/DeveloperGuide/SD/Multimedia.html) #### Computer Vision Libraries JetPack includes optimized computer vision libraries like OpenCV and VisionWorks that accelerate image processing and vision tasks on Jetson platforms using GPUs and dedicated hardware.
[See Computer Vision Libraries](/computer-vision-sdk) ### Other JetPack Components #### Jetson Platform Services Jetson Platform Services is a modular software suite that accelerates AI development on Jetson and helps with rapid deployment of edge applications. [See Jetson Platform Services](/embedded/jetpack/jetson-platform-services-get-started) #### Cloud-Native Design Cloud-native design on Jetson helps you create scalable AI applications at the edge with containerized development, Kubernetes, and microservices, bridging cloud and edge development. [Explore Cloud-Native](https://developer.nvidia.com/embedded/jetson-cloud-native)[Explore NGC Containers](https://catalog.ngc.nvidia.com/containers?filters=&orderBy=weightPopularDESC&query=&page=&pageSize=) #### Nsight Developer Tools NVIDIA Nsight™ developer tools provide powerful profiling, debugging, and performance analysis for optimizing GPU-accelerated applications across AI, graphics, and compute workloads. [Explore Nsight Developer Tool](/nsight-systems) ### Supported SDKs #### NVIDIA DeepStream SDK This SDK gives you a powerful toolkit for building AI-powered vision applications, enabling real-time video analytics with accelerated inference and object tracking. [Explore DeepStream SDK](/deepstream-sdk) #### NVIDIA Isaac ROS NVIDIA Isaac ROS is a collection of hardware-accelerated ROS 2 packages for NVIDIA Jetson. It’s ideal for high-performance perception, localization, and AI in robotics applications. [Explore Isaac ROS](/isaac/ros) #### NVIDIA Holoscan SDK NVIDIA Holoscan SDK is a streaming AI framework for building real-time sensor-processing applications at the edge. It enables high-performance pipelines for healthcare, robotics, and industrial use cases. 
[Explore Holoscan SDK](/holoscan-sdk) ### Community Support #### Jetson AI Lab Jetson AI Lab is an interactive platform for learning and experimenting with AI on NVIDIA Jetson, offering hands-on projects, tutorials, and tools tailored for edge AI development. [Explore Jetson AI Lab](https://www.jetson-ai-lab.com/) #### Developer Forums NVIDIA Developer Forums are a community hub for developers to ask questions, share knowledge, and get support on NVIDIA technologies, platforms, and SDKs. [Explore Developer Forums](https://forums.developer.nvidia.com/c/robotics-edge-computing/jetson-embedded-systems/70) * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ## Get started with JetPack today. [Download JetPack](/embedded/jetpack/downloads) --- # Source: https://developer.nvidia.com/embedded/jetson-developer-kits.md # NVIDIA Jetson Developer Kits NVIDIA® Jetson™ developer kits enable professionals, students, and enthusiasts to develop and test software for Jetson-based products. Each kit includes a Jetson module on a reference carrier board with standard interfaces for flexible development and prototyping.
[Documentation](/embedded/learn/getting-started-jetson#documentation)[Tutorials](/embedded/learn/getting-started-jetson#tutorials)[Forum](https://forums.developer.nvidia.com/c/robotics-edge-computing/jetson-embedded-systems/70) * * * ## Accelerate Edge AI and Robotics Innovation #### Jetson AGX Thor Developer Kit The NVIDIA® Jetson AGX Thor™ Developer Kit gives you unmatched performance and scalability. It’s powered by the NVIDIA Blackwell GPU and 128 GB of memory, delivering up to 2070 FP4 TFLOPS of AI compute to effortlessly run the latest generative AI models—all within a 130 W power envelope. Compared to NVIDIA Jetson AGX Orin™, it provides up to 7.5x higher AI compute and 3.5x better energy efficiency. [Discover Jetson Thor](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-thor/) ![A picture of the Jetson AGX Thor Developer Kit, shown at a three-quarter angle from the side.](https://developer.download.nvidia.com/images/jetson/Jetson_AGX_Thor_Devkit__white.png) ![A picture of the Jetson Orin Nano Developer Kit, shown at a slight angle from the top.](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/embedded/images/jetsonNano/jetson_orin_nano-devkit-front_top-right-trimmed.jpg) #### Jetson Orin Nano Super Developer Kit The NVIDIA Jetson Orin Nano™ Super Developer Kit is a compact, yet powerful computer that redefines generative AI for small edge devices. At just $249, it provides developers, students, and makers with the most affordable and accessible platform, backed by the support of NVIDIA AI software and a broad AI software ecosystem.
[Learn About Jetson Orin Nano](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/) #### Jetson AGX Orin Developer Kit The NVIDIA Jetson AGX Orin™ 64GB Developer Kit and all Jetson Orin modules share one SoC architecture, enabling the developer kit to emulate any of the modules and making it easy for you to start developing your next product. Compact size, lots of connectors, and up to 275 TOPS of AI performance make this developer kit perfect for prototyping advanced AI-powered robots and other edge AI devices. [Explore Jetson AGX Orin](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin) ![Jetson AGX Orin Developer Kit](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/embedded/images/jetsonXavier/Jetson-AGX-Orin-Dev-Kit-3qtr-front-right_reverse.jpg) * * * ## Explore NVIDIA Jetson Software and Resources ### Jetson Software Jetson Software is the leading platform for real-time AI and robotics applications, delivering unmatched edge intelligence across industries. [Explore Jetson Software](/embedded/develop/software) ### JetPack SDK JetPack SDK is the complete software suite for developing and deploying AI-powered edge applications on NVIDIA Jetson platforms. [Explore JetPack SDK](/embedded/jetpack) ### Jetson Modules If you’re creating a Jetson-based product, see our detailed Jetson module collateral, product design guides, and more. [Discover Jetson Modules](/embedded/jetson-modules) ### Jetson Download Center The NVIDIA Jetson Download Center provides documentation, software, and resources for the Jetson embedded platform, including tools, drivers, and support files. [Learn More](/embedded/downloads) ### Jetson Partners Get to know Jetson Partners and find the latest products and services made to work with the Jetson platform.
[Connect with Partners](/embedded/ecosystem?sortBy=jetson_ecosystem_partners%2Fsort%2Fcompany_name%3Aasc) ### Jetson Community Jetson developers actively contribute videos, how-tos, and open-source projects in addition to the free and comprehensive tutorials offered by NVIDIA. [See Jetson Projects](/embedded/community/jetson-projects) Discover Jetson software for real-time AI and robotics [Get the JetPack SDK](https://developer.nvidia.com/embedded/develop/software) --- # Source: https://developer.nvidia.com/isaac/lab.md 1. [Home](/) [Isaac](https://developer.nvidia.com/isaac) Isaac Lab # NVIDIA Isaac Lab NVIDIA Isaac™ Lab is an open-source, unified framework for [robot learning](https://www.nvidia.com/en-us/glossary/robot-learning/) designed to help train robot policies. It’s built on [NVIDIA Isaac Sim™](https://developer.nvidia.com/isaac/sim), delivering high-fidelity physics simulation using [NVIDIA PhysX®](https://developer.nvidia.com/physx-sdk) and physically based rendering with NVIDIA RTX™. This bridges the gap between high-fidelity simulation and perception-based robot training, helping developers and researchers build more robots, more efficiently. Built on Isaac Lab, [Isaac Lab-Arena](https://developer.nvidia.com/isaac/lab-arena) is an open-source framework for scalable policy evaluation in simulation that gives you streamlined APIs to simplify task curation and diversification. [Download Now from GitHub ](https://github.com/isaac-sim/IsaacLab "Download on GitHub")[Documentation ](https://isaac-sim.github.io/IsaacLab/main/index.html "Documentation") * * * ## How Isaac Lab Works Isaac Lab’s modular architecture and NVIDIA GPU-based parallelization make it ideal for building robot policies that cover a wide range of embodiments, including [humanoid robots](https://www.nvidia.com/en-us/glossary/humanoid-robot/), manipulators, and autonomous mobile robots (AMRs). 
This gives you a comprehensive framework for robot learning, covering everything from environment setup to policy training. It supports both [imitation](https://www.nvidia.com/en-us/glossary/imitation-learning/) and [reinforcement learning](https://www.nvidia.com/en-us/glossary/reinforcement-learning/) methods. Plus, you can further customize and extend its capabilities with a variety of physics engines, such as PhysX, [NVIDIA Warp](https://developer.nvidia.com/warp-python), and MuJoCo. Isaac Lab is also the foundational robot learning framework of the [NVIDIA Isaac GR00T platform](https://developer.nvidia.com/isaac/gr00t). ![Isaac Lab’s comprehensive platform for robot learning and robot policy building](https://developer.download.nvidia.com/images/isaac/lab/how-nvidia-isaac-lab-works.jpg) ## Introductory Resources ### ### A Simulation Framework for Multi-Modal Robot Learning See how Isaac Lab’s combination of advanced simulation capabilities and data-center scale execution will help unlock breakthroughs in [robotics research](https://www.nvidia.com/en-us/research/robotics/). [Read Whitepaper](https://research.nvidia.com/publication/2025-09_isaac-lab-gpu-accelerated-simulation-framework-multi-modal-robot-learning) ### NVIDIA Isaac Lab-Arena Built on Isaac Lab, Isaac Lab-Arena is an open-source framework for scalable policy evaluation in simulation. [Read Blog](https://developer.nvidia.com/blog/simplify-generalist-robot-policy-evaluation-in-simulation-with-nvidia-isaac-lab-arena/) ### Isaac Lab Courses Explore the fundamentals of robot learning and Isaac Lab, a powerful tool for developing robotic applications. [Take the Introductory Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-36+V1) ### Isaac Lab Office Hours Stay informed with our [recurring office hours](https://addevent.com/calendar/ae483892) that cover in-depth topics with experts answering questions about Isaac Lab. 
[Watch the Livestreams ](https://www.youtube.com/watch?v=12vqcFywtWQ&list=PL3jK4xNnlCVcnMqm4Lnqa5Bok4_iP5NsK) * * * ## Key Features ### Flexible Robot Learning Customize workflows with robot training environments, tasks, learning techniques, and the ability to integrate custom libraries (e.g., skrl, RLlib, rl\_games, and more). ### Reduced Sim-to-Real Gap The GPU-accelerated PhysX version provides accurate, high-fidelity physics simulations. This includes support for deformables, allowing for more realistic modeling of robot interactions with the environment. ### Unified Representation Discover easy customization and addition of new environments, robots, and sensors with OpenUSD through Isaac Lab’s modular design. * * * ## Get Started With Isaac Lab ![](https://developer.download.nvidia.com/images/icons/m48-coding-256px-blk.png) ### Download Get started with the latest version of Isaac Lab by following the installation guides on GitHub. [Installation Guide (GitHub)](https://isaac-sim.github.io/IsaacLab/main/source/setup/installation/index.html#local-installation)[Documentation](https://isaac-sim.github.io/IsaacLab/) ### Tutorials Access step-by-step guides to help you understand and use various features of the framework. [Developer Guide](https://github.com/isaac-sim/IsaacLab/discussions) ![](https://developer.download.nvidia.com/icons/m48-speech-recognition.svg) ### Start Your Learning Path Explore advanced concepts in robot learning, gain practical skills, and learn how you can streamline your development processes with Isaac Lab.
[Learn More](https://www.nvidia.com/en-us/learn/learning-path/robotics/) * * * #### Newton, the Next-Generation Open-Source Physics Simulation Engine Newton is an open-source, GPU-accelerated, and extensible physics engine, co-developed by Google DeepMind and Disney Research, and [managed by the Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-announces-contribution-of-newton-by-disney-research-google-deepmind-and-nvidia-to-accelerate-open-robot-learning). Built on NVIDIA Warp and OpenUSD, Newton is optimized for robotics and compatible with learning frameworks such as MuJoCo Playground or Isaac Lab. [Newton Beta](https://github.com/newton-physics) is now available to use. [Get Started on Newton](https://developer.nvidia.com/newton-physics) ![Newton Physics Engine logo](https://developer.download.nvidia.com/images/isaac/newton-ari.jpg) * * * ## Starter Kits View more tutorials and how-to guides in the [documentation](https://isaac-sim.github.io/IsaacLab/main/source/tutorials/index.html). ### Accelerate Robot Learning Choose from reinforcement learning and imitation learning to train AI robots. Easily bring your custom libraries and use the direct agent-environment or hierarchical-manager development workflows. - [Read Use Case: Robot Learning](https://www.nvidia.com/en-us/use-cases/robot-learning/?) - [Isaac GR00T-Mimic Blueprint for Synthetic Manipulation Motion Generation](https://github.com/NVIDIA-Omniverse-blueprints/synthetic-manipulation-motion-generation) - [Physical AI Dataset](https://huggingface.co/collections/nvidia/physicalai-67c643edbb024053dcbcd6d8) ### Enable Perception in the Loop Tiled rendering reduces rendering time by consolidating input from multiple cameras into a single large image. With a streamlined API for handling vision data, the rendered output directly serves as observational data for simulation learning. 
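The tiled-rendering idea described above can be sketched in plain NumPy. Isaac Lab performs this on the GPU as part of its rendering pipeline; the camera count and resolution below are arbitrary placeholders:

```python
import numpy as np

def tile_cameras(frames: np.ndarray, cols: int) -> np.ndarray:
    """Pack N camera frames of shape (N, H, W, C) into one (rows*H, cols*W, C) image."""
    n, h, w, c = frames.shape
    rows = -(-n // cols)  # ceiling division
    canvas = np.zeros((rows * h, cols * w, c), dtype=frames.dtype)
    for i in range(n):
        r, col = divmod(i, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = frames[i]
    return canvas

# 16 simulated 64x64 RGB cameras tiled into a single 4x4 grid image,
# so one render/copy pass serves every environment's observation.
frames = np.random.rand(16, 64, 64, 3).astype(np.float32)
tiled = tile_cameras(frames, cols=4)
print(tiled.shape)  # (256, 256, 3)
```

Downstream learning code can then slice fixed windows back out of the tiled image to recover per-environment observations.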
- [Read Guide: Tiled Rendering](https://isaac-sim.github.io/IsaacLab/main/source/overview/sensors/camera.html#tiled-rendering) ### Scale With Multi-GPU and Multi-Node Training Scale up training of cross-embodied models for complex reinforcement learning environments across multiple GPUs and nodes. Deploy locally and on the cloud (AWS, GCP, Azure, and Alibaba Cloud) by integrating with NVIDIA OSMO. - [Read Guide: Multi-GPU and Multi-Node Training](https://isaac-sim.github.io/IsaacLab/main/source/features/multi_gpu.html) ### Accurate High-Fidelity Physics Simulation and Rendering in Omniverse Tap into the latest GPU-accelerated PhysX version through Isaac Lab, including support for deformables, ensuring quick and accurate physics simulations augmented by domain randomization. - [Read Guide: Mastering Omniverse for Robotics](https://isaac-sim.github.io/IsaacLab/main/source/how-to/master_omniverse.html#) * * * ### RTX PRO Server—the Best Platform for Industrial and Physical AI NVIDIA RTX PRO Server accelerates every industrial digitalization, robot simulation, and synthetic data generation workload. [Learn More](https://www.nvidia.com/en-us/data-center/rtx-pro-6000-blackwell-server-edition/) * * * ## Isaac Lab Learning Library Research A GPU Accelerated Simulation Framework For Multi-Modal Robot Learning **NVIDIA Isaac Lab** We present Isaac Lab, the natural successor to Isaac Gym, which extends the paradigm of GPU-native robotics simulation into the era of large-scale multi-modal learning. Tech Blog Streamline Robot Learning with Whole-Body Control and Enhanced Teleoperation in NVIDIA Isaac Lab 2.3 **NVIDIA Isaac Lab** The latest version of Isaac Lab 2.3, in early developer preview, improves humanoid robot capabilities with advanced whole-body control, enhanced imitation learning, and better locomotion.
Tech Blog Quadruped Robot Locomotion and Multiphysics Simulation Using Newton in NVIDIA Isaac Lab **NVIDIA Isaac Lab** Walks through how to train a quadruped robot to move from one point to another and how to set up a multiphysics simulation with an industrial manipulator to fold clothes. This tutorial uses Newton within NVIDIA Isaac Lab. * * * ## Ecosystem Our industry partners and collaborators are integrating NVIDIA Isaac Lab and accelerated computing into their platforms and solutions. [![ NVIDIA industry partner - 1X](https://developer.download.nvidia.com/images/isaac/lab/logo-1x.png)](https://www.1x.tech/) ![NVIDIA industry partner - AgiBot](https://developer.download.nvidia.com/images/logos/agibot-logo.svg) [![NVIDIA industry partner - Agility](https://developer.download.nvidia.com/images/isaac/lab/logo-agility-robotics.png)](https://agilityrobotics.com/) ![NVIDIA industry partner - Boston Dynamics](https://developer.download.nvidia.com/images/isaac/lab/logo-boston-dynamics.png) ![NVIDIA industry partner - Field AI](https://developer.download.nvidia.com/images/isaac/lab/logo-field-ai.png) ![NVIDIA industry partner - Fourier](https://developer.download.nvidia.com/images/isaac/lab/logo-fourier.png) ![NVIDIA industry partner - Galbot](https://developer.download.nvidia.com/images/isaac/lab/logo-galbot.png) ![NVIDIA industry partner - General Robotics](https://developer.download.nvidia.com/images/logos/general-robotics-logo.svg) ![](https://developer.download.nvidia.com/images/isaac/lab/logo-mentee-robotics.png) ![NVIDIA industry partner - RAI Institute](https://developer.download.nvidia.com/images/logos/rai-logo.svg) [![ NVIDIA industry partner - Skild AI](https://developer.download.nvidia.com/images/isaac/lab/logo-skild-ai.png)](https://www.skild.ai/) ![NVIDIA industry partner - UCR](https://developer.download.nvidia.com/images/isaac/lab/logo-ucr.svg) ![NVIDIA industry partner - X-Humanoid](https://developer.download.nvidia.com/images/logos/x-humanoid-logo.svg) 
* * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Join the Program for Startups * * * ## Latest Isaac Lab News ## Get started with NVIDIA Isaac Lab today. [Download Now](https://github.com/isaac-sim/IsaacLab "Download Now")[Documentation](https://isaac-sim.github.io/IsaacLab/ "Documentation") ## FAQs The Isaac Lab framework is open-sourced under the [BSD-3-Clause license](https://opensource.org/licenses/BSD-3-Clause). Isaac Sim is a comprehensive robotics simulation platform built on NVIDIA Omniverse™ that provides high-fidelity simulation with advanced physics and photorealistic rendering. It focuses on synthetic data generation (SDG) and testing and validation (SIL/HIL), and is a reference template for custom robotics simulators. In contrast, Isaac Lab is a lightweight, open-source framework built on top of Isaac Sim, specifically optimized for robot learning workflows and designed to simplify common tasks in robotics research like reinforcement learning, imitation learning, and motion planning. If you’re an existing NVIDIA Isaac Gym (predecessor of Isaac Lab) user, we recommend migrating to Isaac Lab to ensure you have access to the latest advancements in robot learning and a powerful development environment to accelerate your robot training efforts. Check out the [migration guide](https://isaac-sim.github.io/IsaacLab/main/source/migration/migrating_from_isaacgymenvs.html) from Isaac Gym environments to Isaac Lab. Yes, Isaac Lab and MuJoCo are complementary. 
MuJoCo's ease of use and lightweight design allow for rapid prototyping and deployment of policies, while Isaac Lab complements it when you want to create more complex scenes, scale massively parallel environments on GPUs, and run high-fidelity sensor simulation with RTX rendering. NVIDIA and MuJoCo are actively exploring deeper technical collaboration; stay tuned for future announcements. --- # Source: https://developer.nvidia.com/magnum-io.md # NVIDIA MAGNUM IO SDK ## The IO Subsystem for the Modern, GPU-Accelerated Data Center The NVIDIA MAGNUM IO™ software development kit (SDK) enables developers to remove input/output (IO) bottlenecks in AI, high-performance computing (HPC), data science, and visualization applications, reducing the end-to-end time of their workflows. Magnum IO covers all aspects of data movement between CPUs, GPUs, DPUs, and storage subsystems in virtualized, containerized, and bare-metal environments. [Get Magnum IO Container](https://ngc.nvidia.com/catalog/containers/nvidia:magnum-io:magnum-io) ## Latest Magnum IO News Magnum IO for Cloud-Native Supercomputing Architecture Magnum IO, the IO subsystem for data centers, introduces the enhancements necessary to accelerate IO and the communications supporting multi-tenant data centers, known as Magnum IO for Cloud-Native Supercomputing.
[Read More](/blog/accelerating-cloud-native-supercomputing-with-magnum-io/) [![Magnum IO Cloud Native Architecture](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/magnum-io/magnum-io-cloud-native-architecture-cut-graphics-2c50-d%402x_1.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/magnum-io/magnum-io-cloud-native-architecture-cut-graphics-2c50-d%402x_1.jpg) [![Volumetric Video Leveraging Magnum IO and Verizon 5G](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/magnum-io/magnum-demo-screen-grab-2c50-d@2x.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/magnum-io/magnum-demo-screen-grab-2c50-d@2x.jpg) Volumetric Video Leveraging Magnum IO and Verizon 5G Magnum IO GPUDirect over an InfiniBand network enables Verizon's breakthrough distributed volumetric video architecture. By placing its technology in edge computing centers located at sports venues around the United States and in Verizon facilities, Verizon can bring 3D experiences to media and serve up new options for putting you in the game. [Watch Video](https://www.youtube.com/watch?v=YF1dsFjMkdw) ## Magnum IO Ecosystem [![Magnum IO Web Diagram](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-magnum-io-web-diagram.svg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-magnum-io-web-diagram.svg) _The Magnum IO stack contains the libraries developers need to create and optimize application IO across the whole stack: Networking across NVIDIA® NVLink®, Ethernet, and InfiniBand. Local and remote direct storage APIs. In-Network Computing to accelerate multi-node operations.
And IO management of networking hardware._ ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-abstractions-icon.svg) ### Flexible Abstractions Magnum IO enables AI, data analytics, visualization, and HPC developers to innovate and accelerate applications built using common high-level abstractions and APIs. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-scalability-icon.svg) ### Architected for Scale Magnum IO technologies allow for scaling up computation to multiple GPUs via NVLink and PCIe and across multiple nodes on InfiniBand and Ethernet at data center scale. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-management-icon.svg) ### Advanced IO Management Advanced telemetry and monitoring built with NVIDIA NetQ™ and NVIDIA UFM® help users to configure, troubleshoot, and fine-tune the interconnect infrastructure for peak performance. ## Magnum IO Components ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-network-icon.svg) ### Network IO - [GPUDirect® RDMA](/gpudirect) - [MOFED](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) - [NCCL](/nccl) - [NVIDIA ASAP2—Accelerated Switch and Packet Processing®](https://www.nvidia.com/en-us/networking/products/ethernet/) - [NVIDIA HPC-X®](/networking/hpc-x) - [NVSHMEM](/nvshmem) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-storage-icon.svg) ### Storage IO - [GPUDirect Storage](https://developer.nvidia.com/gpudirect-storage) - SNAP ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-inference-icon.svg) ### In-Network Computing - Hardware tag matching - [NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™](https://docs.mellanox.com/display/SHARPv200/Introduction) 
![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Magnum_IO/nvidia-management-icon.svg) ### IO Management - [NVIDIA NetQ](https://www.nvidia.com/en-us/networking/ethernet-switching/netq/), [WJH](https://www.mellanox.com/products/what-just-happened) - [NVIDIA UFM](https://www.nvidia.com/en-us/networking/infiniband/ufm/) ## Accelerating IO Across Applications ### Deep Learning Magnum IO networking provides both point-to-point functions like send and receive, and collectives like AllReduce for deep learning training at scale. The collective APIs hide low-level optimizations like topology detection, peer-to-peer copy, and multi-threading to simplify deep learning training. Send/receive lets users accelerate giant deep learning models too big to fit in a single GPU's memory. GPUDirect Storage can also help alleviate IO bottlenecks from local or remote storage by bypassing bounce buffers on the CPU host. ### High-Performance Computing To unlock next-generation discoveries, scientists rely on simulation to better understand complex molecules for drug discovery, physics for new sources of energy, and atmospheric data to better predict extreme weather patterns. Magnum IO exposes hardware-level acceleration engines and smart offloads, such as RDMA, GPUDirect, and NVIDIA SHARP, while leveraging the 400Gb/s bandwidth and ultra-low latency of NVIDIA Quantum-2 InfiniBand networking. In multi-tenant environments, user applications can suffer indiscriminate interference from neighboring application traffic. Magnum IO, on the latest NVIDIA Quantum-2 InfiniBand platform, features new and improved capabilities for mitigating this impact on a user's performance. This delivers optimal results, as well as the most efficient high-performance computing (HPC) and machine learning deployments at any scale. ### Data Analytics Data science and machine learning are the world's largest compute segments.
Modest improvements in the accuracy of predictive machine learning models can translate into billions of dollars. To enhance accuracy, the RAPIDS™ Accelerator for Apache Spark library has a built-in shuffle based on NVIDIA UCX® that can leverage GPU-to-GPU communication and RDMA capabilities. Combined with NVIDIA networking, Magnum IO, GPU-accelerated Spark 3.0, and RAPIDS, the NVIDIA data center platform can speed up these huge workloads at unprecedented levels of performance and efficiency. ## Resources - [Magnum IO Developer Environment Documentation](https://github.com/NVIDIA/MagnumIO/blob/main/dev-env/README.md) - [GPUDirect Storage: A Direct Path Between Storage and GPU Memory](https://developer.nvidia.com/blog/gpudirect-storage/) - [Accelerating IO in the Modern Data Center: Network IO](https://developer.nvidia.com/blog/accelerating-io-in-the-modern-data-center-network-io/) - [Accelerating NVSHMEM 2.0 Team-Based Collectives Using NCCL](https://developer.nvidia.com/blog/accelerating-nvshmem-2-0-team-based-collectives-using-nccl/) - [Optimizing Data Movement in GPU Applications with the NVIDIA Magnum IO Developer Environment](https://developer.nvidia.com/blog/optimizing-data-movement-in-gpu-apps-with-magnum-io-developer-environment/) - [Access MOFED](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed) Get Started Using the Magnum IO Developer Environment The Magnum IO Developer Environment is available as a container with the latest versions of all libraries, development tools, and profiling tools needed to begin development and optimization. The optimized applications can then be run in virtualized, containerized, or bare-metal environments. 
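The AllReduce collective described under "Accelerating IO Across Applications" is typically implemented as a ring: a reduce-scatter phase followed by an allgather phase. The following toy, single-process Python sketch illustrates that pattern; it is purely illustrative — NCCL's real implementation is a C API (e.g., `ncclAllReduce`) that operates on GPU buffers over NVLink and InfiniBand, and the function and variable names below are our own.

```python
# Toy, single-process sketch of ring AllReduce: each "rank" starts with its
# own vector and ends with the elementwise sum across all ranks. NCCL runs
# the same reduce-scatter + allgather pattern on GPU buffers; this
# pure-Python version only demonstrates the data movement.

def ring_allreduce(buffers):
    n = len(buffers)
    assert n > 1 and len(buffers[0]) % n == 0, "vector length must divide by rank count"
    chunk = len(buffers[0]) // n
    bufs = [list(b) for b in buffers]

    def sl(c):
        # Slice covering chunk index c of a rank's buffer.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: at step s, rank r forwards its partially
    # reduced chunk (r - s) % n to neighbor r + 1. After n - 1 steps, rank r
    # holds the fully reduced chunk (r + 1) % n. Messages are snapshotted
    # per step to mimic simultaneous sends.
    for step in range(n - 1):
        msgs = [(r, (r - step) % n, bufs[r][sl((r - step) % n)]) for r in range(n)]
        for r, c, data in msgs:
            dst = (r + 1) % n
            s = sl(c)
            bufs[dst][s] = [a + b for a, b in zip(bufs[dst][s], data)]

    # Phase 2, allgather: circulate each completed chunk around the ring so
    # every rank ends up with the full reduced vector.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n, bufs[r][sl((r + 1 - step) % n)]) for r in range(n)]
        for r, c, data in msgs:
            bufs[(r + 1) % n][sl(c)] = data

    return bufs
```

Note that each of the 2(n − 1) steps moves only 1/n of the vector per link, which is why ring AllReduce keeps every link busy and scales well as the number of GPUs grows.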
[Download on GitHub](https://github.com/NVIDIA/MagnumIO) [Download on NGC](https://ngc.nvidia.com/catalog/containers/nvidia:magnum-io:magnum-io) --- # Source: https://developer.nvidia.com/management-library-nvml.md # NVIDIA Management Library (NVML) The NVIDIA Management Library (NVML) is a C-based programmatic interface for monitoring and managing various states within [Data Center GPUs](https://www.nvidia.com/en-us/data-center/). It is intended to be a platform for building third-party applications, and is also the underlying library for the NVIDIA-supported [nvidia-smi tool](https://docs.nvidia.com/deploy/nvidia-smi/index.html). NVML is thread-safe, so it is safe to make simultaneous NVML calls from multiple threads. ### Queryable states include: - Identification: Various dynamic and static information is reported, including board serial numbers, PCI device IDs, VBIOS/InfoROM version numbers, and product names. - GPU utilization: Current utilization rates are reported for both the compute resources of the GPU and the memory interface. - ECC error counts: Both correctable single-bit and detectable double-bit errors are reported. Error counts are provided for both the current boot cycle and for the lifetime of the GPU. - Temperature and fan speed: The current core GPU temperature is reported, along with fan speeds for non-passive products. - Power management: For supported products, the current board power draw and power limits are reported. - Active compute processes: The list of active processes running on the GPU is reported, along with the corresponding process name/id and allocated GPU memory. - Clocks and P-states: Max and current clock rates are reported for several important clock domains, as well as the current GPU performance state. ### Modifiable state includes: - ECC mode: Enable and disable ECC. - ECC reset: Clear single- and double-bit ECC error counts.
- Compute mode: Control whether compute processes can run on the GPU and whether they run exclusively or concurrently with other compute processes. - Persistence mode: Control whether the NVIDIA driver stays loaded when no active clients are connected to the GPU. The NVIDIA Management Library can be downloaded as part of the [NVIDIA GPU Driver](https://www.nvidia.com/en-us/drivers/) for Linux and Windows. ### Additional References - [NVML API Reference](https://docs.nvidia.com/deploy/nvml-api/index.html) - [NVML Python Bindings](https://pypi.org/project/nvidia-ml-py/) --- # Source: https://developer.nvidia.com/industries/manufacturing.md # Developer Resources for Manufacturing and Industrial Applications A hub of news, SDKs, technical resources, and more. ## App Frameworks and SDKs ### High-Performance Computing * * * #### NVIDIA HPC SDK The NVIDIA HPC SDK includes the compilers, libraries, and software tools essential to maximizing developer productivity and the performance and portability of HPC applications. [Learn About the HPC SDK](/hpc-sdk) #### NVIDIA PhysicsNeMo The NVIDIA PhysicsNeMo toolkit addresses the challenges associated with AI-driven physics simulations and complex non-linear physics problems. [Access PhysicsNeMo](/modulus) ### Robotics and Automation * * * #### NVIDIA Isaac Industrial and commercial robotics development can be complex, time-consuming, and expensive. The NVIDIA Isaac™ robotics platform is an end-to-end solution that helps decrease costs, simplify development, and accelerate time to market. [Explore Isaac](/isaac) #### NVIDIA Isaac Sim NVIDIA Isaac Sim™, built on NVIDIA Omniverse™, provides access to NVIDIA® PhysX® and RTX™ ray-tracing technologies to create photorealistic robotic simulations for training and testing in virtual industrial environments.
[Learn About Isaac Sim](/isaac-sim) ### Data Science and Analytics * * * #### NVIDIA RAPIDS The NVIDIA RAPIDS™ suite of open-source software libraries, built on CUDA-X AI™, makes it possible to execute end-to-end data science and analytics pipelines entirely on GPUs. [Get Started With RAPIDS](/rapids) ### Edge AI Deployment * * * #### NVIDIA Fleet Command NVIDIA Fleet Command™ is a hybrid-cloud platform for managing and scaling AI at the edge. From one control plane, anyone with a browser and an internet connection can deploy applications, update software over the air, and monitor location health. [Learn About Fleet Command](https://www.nvidia.com/en-us/data-center/products/fleet-command/#fleet-command) ### AI-Powered Video Analytics * * * #### NVIDIA Metropolis NVIDIA Metropolis is an application framework, a set of developer tools, and a partner ecosystem that brings visual data and AI together to improve operational efficiency and safety across a broad range of industries. Metropolis simplifies the development, deployment, and scaling of AI-enabled video analytics applications from edge to cloud. [Explore Metropolis](/metropolis) ### Real-Time Collaboration * * * #### NVIDIA Omniverse NVIDIA Omniverse powers real-time collaboration in a single, interactive, simulated environment and delivers instant photorealistic output with ultra-high-fidelity, ray-traced, and path-traced rendering in one click. [Get Started With Omniverse](/nvidia-omniverse-platform) ### Rendering and Video * * * #### NVIDIA PhysX NVIDIA PhysX is an open-source, scalable, multi-platform physics simulation solution that brings high-performance and precision accuracy to industrial use cases such as high-fidelity robotics and scientific visualization.
[Access PhysX](/designworks) #### NVIDIA OptiX Ray Tracing Engine For ray-tracing applications, the NVIDIA OptiX™ application framework provides a simple, recursive, and flexible pipeline for optimal ray-tracing performance on GPUs. [Learn About OptiX](/optix) ### Augmented and Virtual Reality * * * #### NVIDIA CloudXR Suite NVIDIA CloudXR™ is NVIDIA's solution for streaming virtual reality (VR), augmented reality (AR), and mixed reality (MR) content from any OpenVR XR application on a remote server—cloud, data center, or edge. [Explore CloudXR](/nvidia-cloudxr-sdk) #### NVIDIA VRWorks Graphics NVIDIA VRWorks™ is a comprehensive suite of APIs, libraries, and engines that enable application and headset developers to create amazing VR experiences. [Access VRWorks](/vrworks) ## Browse by Resource Type - [**On-Demand Sessions**](#gtc) - [**Training**](#dli) [View all GTC talks](https://www.nvidia.com/en-us/on-demand/playlist/playList-252abb4e-6e00-4af0-8b40-440a93dba1e5/) #### Applications of AI for Predictive Maintenance Learn how to identify anomalies and failures in time-series data, estimate the remaining useful life of the corresponding parts, and use this information to map anomalies to failure conditions, minimizing unplanned downtime. [**Learn More**](https://www.nvidia.com/content/dam/en-zz/Solutions/deep-learning/deep-learning-education/DLI-Workshop-Applications-of-AI-for-Predictive-Maintenance-2.pdf) #### Getting Started with DeepStream for Video Analytics on Jetson Nano In this course, you'll learn to extract meaningful insights from video streams through deep learning video analytics using JupyterLab notebooks on an [NVIDIA Jetson Nano™](/embedded/jetson-nano-developer-kit).
[**Learn More**](https://courses.nvidia.com/courses/course-v1:DLI+S-IV-02+V2/) #### Deep Learning for Industrial Inspection Explore how convolutional and recurrent neural networks can be combined to generate effective descriptions of content within images and video clips, bringing together computer vision and natural language processing. Solve deep learning problems that require multiple types of data inputs. [**Learn More**](https://www.nvidia.com/content/dam/en-zz/Solutions/deep-learning/deep-learning-education/Deep-Learning-for-Industrial-Inspection.pdf) #### Deep Learning for Robotics Explore how to create robotics solutions on NVIDIA Jetson™ for embedded applications and train a robot to actuate the correct output based on the visual input. [**Learn More**](https://www.nvidia.com/content/dam/en-zz/Solutions/deep-learning/deep-learning-education/Deep-Learning-for-Robotics.pdf) **[View all courses](https://www.nvidia.com/en-us/deep-learning-ai/education/)** * * * ## Programs For You ### Developer Resources The NVIDIA Developer Program provides the advanced tools and training needed to successfully build applications on all NVIDIA technology platforms. This includes access to hundreds of SDKs, a network of like-minded developers through our community forums, and more. [Join Today](/developer-program) ### Technical Training NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI, accelerated computing, and accelerated data science to solve real-world problems. Powered by GPUs in the cloud, training is available as self-paced, online courses or live, instructor-led workshops. [View Courses](https://www.nvidia.com/en-us/training/) ### Accelerate Your Startup NVIDIA Inception—an acceleration platform for AI, data science, and HPC startups—supports over 7,000 startups worldwide with go-to-market support, expertise, and technology. Startups get access to training through the DLI, preferred pricing on hardware, and invitations to exclusive networking events.
[Learn More](https://www.nvidia.com/en-us/deep-learning-ai/startups/) * * * ## NVIDIA Manufacturing News **[View all manufacturing news](/blog/tag/manufacturing/)** Sign up for the latest developer news from NVIDIA [Learn More](https://www.nvidia.com/en-us/preferences/email-signup/) --- # Source: https://developer.nvidia.com/industries/media-and-entertainment.md # Developer Resources for Media and Entertainment Explore a hub of SDKs, resources, and news for developers working in the M&E industry. Photo by Diamond View ## Platforms ![Holoscan for media](https://developer.download.nvidia.com/images/industry/m-e-platforms-holoscan-1920x1080.jpg) ### NVIDIA Holoscan for Media NVIDIA Holoscan for Media is the real-time AI platform for companies in broadcast, streaming, and live sports. It enables container orchestration for multi-vendor live production and AI inference on video and audio streams. Build and deploy applications that connect to uncompressed media feeds with minimal latency on NVIDIA-accelerated hardware. [Learn More](/holoscan-for-media) ![Explore NVIDIA AI Enterprise suite of AI and data analytics software ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-ai-enterprise-630x354.jpg) ### NVIDIA AI Enterprise NVIDIA AI Enterprise is a cloud-native suite of software tools, libraries, and frameworks, including NVIDIA NIM™ and NeMo™ microservices, that accelerate and simplify the development, deployment, and scaling of AI applications. Deploy agentic AI systems anywhere—across clouds, data centers, or at the edge—leveraging the extensive partner ecosystem.
[Learn More](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/) ![Explore NVIDIA Jetson platform for autonomous machines and other embedded applications ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-jetson-630x354.jpg) ### NVIDIA NeMo NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI—including large language models (LLMs), vision language models (VLMs), retrieval models, video models, and speech AI—anywhere. [Learn More](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) ![Explore the NVIDIA RTX platform for visualization.](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-rtx-410x230.jpg) ### NVIDIA RTX PRO NVIDIA RTX PRO™ GPUs are the backbone of modern media production, delivering real-time ray tracing, AI-accelerated workflows, and scalable performance for creators and studios. From pre-visualization and virtual production to VFX and final rendering, RTX PRO solutions enable seamless collaboration across mobile workstations, data centers, and hybrid cloud environments. [Learn More](https://www.nvidia.com/en-us/products/workstations/) ## NVIDIA Blueprints ### Build a Video Search and Summarization (VSS) Agent The NVIDIA AI Blueprint for Video Search and Summarization (VSS) makes it easy to start building and customizing video analytics AI agents. These insightful, accurate, and interactive agents are powered by generative AI, vision language models (VLMs), large language models (LLMs), and NVIDIA NIM microservices. They can be given tasks through natural language and perform complex operations like video summarization and visual question-answering, unlocking entirely new application possibilities. [Try Now](https://build.nvidia.com/nvidia/video-search-and-summarization/blueprintcard) ### PDF to Podcast Unlock the power of on-the-go learning and tackle the challenge of information overload with generative AI-powered audio read-outs. 
Use this blueprint to build a generative AI application that transforms PDF data—such as training documents, technical research, or documentation—into personalized audio content. [Try Now](https://build.nvidia.com/nvidia/pdf-to-podcast) ### Build an Enterprise RAG Pipeline The NVIDIA AI Blueprint for RAG gives developers a foundational starting point for using NVIDIA NeMo Retriever models to build scalable, customizable data extraction and retrieval pipelines that deliver high accuracy and throughput. Use this blueprint to build RAG applications that provide context-aware responses by connecting LLMs to extensive multimodal enterprise data, including text, tables, charts, and infographics from millions of PDFs. [Try Now](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) [View All NVIDIA Blueprints](https://build.nvidia.com/) ## Applications and SDKs - [ View All ](#view-all) - [ **AI** ](#ai) - [ **3D Animation / Physics** ](#animation) - [ **Rendering** ](#rendering) - [ **Immersive** ](#immersive) - [ **Video Processing** ](#video-processing) - [ **Image Processing** ](#image-processing) - [ **Infrastructure** ](#infrastructure) ![NVIDIA CUDA helps developers speed up computing applications by harnessing GPU power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-410x230.jpg) #### CUDA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. [Learn More](/cuda-zone)[Download SDK](/cuda-downloads) ![ NVIDIA DLSS boosts frame rates and generates sharp images ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/DLSS.jpg) #### Deep Learning Super Sampling (DLSS) NVIDIA Deep Learning Super Sampling (DLSS) is a deep learning neural network that boosts frame rates and generates sharp images. 
[Learn More](/dlss) ![Flowchart showing NVIDIA TensorRT SDK for deep learning inference](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-tensor-rt-410x230.jpg) #### NVIDIA TensorRT NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. [Learn More](/tensorrt)[Download SDK](/tensorrt-getting-started) ![NVIDIA Maxine SDK helps developers build virtual collaboration and content creation apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-maxine-410x230.jpg) #### NVIDIA Maxine NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. [Learn More](/maxine)[Download SDK](/maxine-getting-started) ![Flowchart showing NVIDIA DeepStream SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deep-stream-sdk-410x230.jpg) #### DeepStream SDK NVIDIA's DeepStream SDK delivers a complete streaming analytics toolkit for AI-based multi-sensor processing, video, audio, and image understanding. DeepStream is for vision AI developers, software partners, startups, and OEMs building IVA apps and services. [Learn More](/deepstream-sdk)[Download SDK](/deepstream-getting-started) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-optical-flow-sdk-410x230.jpg) #### NVIDIA Optical Flow SDK The Optical Flow SDK exposes the latest hardware capability of Turing and Ampere GPUs dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors that are robust to frame-to-frame intensity variations and track true object motion.
[Learn More](/opticalflow-sdk)[Download SDK](/opticalflow/download) ![NVIDIA Triton Inference Server delivers fast and scalable AI in production ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-triton-inference-server-410x230.jpg) #### NVIDIA Triton Inference Server NVIDIA Triton™ Inference Server delivers fast and scalable AI in production. An open-source inference serving software, Triton Inference Server streamlines AI inference by enabling teams to deploy trained AI models from any framework and on any GPU- or CPU-based infrastructure (cloud, data center, or edge). [Learn More](/nvidia-triton-inference-server)[Download SDK](https://github.com/triton-inference-server/server) ![Flowchart showing NVIDIA Video Codec SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-video-codec-sdk-410x230.jpg) #### NVIDIA Video Codec SDK Video Codec SDK is a comprehensive API set, including high-performance tools, samples, and documentation for hardware-accelerated video encode and decode on Windows and Linux. [Learn More](/nvidia-video-codec-sdk)[Download SDK](/nvidia-video-codec-sdk/download) ![NVIDIA Riva SDK helps build speech AI apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-riva-410x230.jpg) #### NVIDIA Riva NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications customized for your use case and delivering real-time performance. [Learn More](/riva)[Download SDK](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) ![NVIDIA OptiX helps to achieve optimal ray tracing performance on GPU](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-opti-x-ray-tracing-engine-410x230.jpg) #### NVIDIA OptiX Ray Tracing Engine OptiX is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms.
Bring the power of NVIDIA GPUs to your ray tracing applications with programmable intersection, ray generation, and shading. [Learn More](/optix)[Download SDK](/designworks/optix/download) ![NanoVDB adds real-time rendering GPU support for OpenVDB](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nanovdb-410x230.jpg) #### NanoVDB NanoVDB adds real-time rendering GPU support for OpenVDB. OpenVDB is the Academy Award-winning industry-standard data structure and toolset used for manipulating volumetric effects. [Learn More](/nanovdb)[Download SDK](https://github.com/AcademySoftwareFoundation/openvdb/tree/feature/nanovdb) ![Pixar Universal Scene Description (USD) for content creation](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/pixar-universal-scene-description-usd-410x230.jpg) #### Pixar Universal Scene Description (USD) Universal Scene Description (USD) is an easily extensible, open-source 3D scene description and file format developed by Pixar for content creation and interchange among different tools. [Learn More](/usd) ![Rendering MDL sample cloth](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/material-definition-language-410x230.jpg) #### Material Definition Language (MDL) The NVIDIA MDL SDK is a set of tools to integrate MDL support into rendering applications. It contains components for loading, inspecting, and editing material definitions, as well as compiling MDL functions to GLSL, HLSL, native x86, PTX, and LLVM-IR. With the NVIDIA MDL SDK, any physically based renderer can easily add support for MDL and join the MDL ecosystem. [Learn More](/mdl-sdk)[Download SDK](/mdl-sdk-get-started) ![Nvdiffrast is used for rasterization-based differentiable rendering](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvdiffrast-410x230.jpg) #### Nvdiffrast Nvdiffrast is a PyTorch/TensorFlow library that provides high-performance primitive operations for rasterization-based differentiable rendering.
[Learn More](https://nvlabs.github.io/nvdiffrast/) ![NVIDIA HairWorks effects in The Witcher 3: Wild Hunt](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/HairWorks_1600_899.jpg) #### NVIDIA HairWorks NVIDIA HairWorks enables advanced simulation and rendering techniques for richer hair and fur. [Learn More](/hairworks)[Download SDK](/gameworks-source-github) #### NVIDIA PhysX System Software NVIDIA® PhysX® is a scalable multi-platform physics simulation solution that brings high-performance and precision accuracy to industrial simulation. [Learn More](/physx-sdk) Download SDK ![NVIDIA WaveWorks featured in Just Cause 3](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/WaveWorks_1600_899.jpg) #### NVIDIA WaveWorks NVIDIA WaveWorks delivers cinematic-quality ocean simulation for interactive applications. Learn More Download SDK ![NVIDIA Texture Tools Exporter standalone showcasing a cube map reflection](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Texture_Tools_1600_899.jpg) #### NVIDIA Texture Tools Exporter NVIDIA Texture Tools Exporter creates highly compressed texture files directly from image sources. [Learn More](/nvidia-texture-tools-exporter)[Download SDK](/2021.1.1/nvidia_texture_tools_exporter) ![NVIDIA Quadro Sync enables video wall synchronization](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-quadro-sync-410x230_0.jpg) #### NVIDIA Quadro Sync NVIDIA® Quadro® Sync solutions enable the creation of dazzling ultra-high-resolution, perfectly synchronized displays to meet the visualization and presentation needs across industries.
[Learn More](https://www.nvidia.com/en-us/design-visualization/solutions/quadro-sync/) ![NVIDIA Reflex for low latency gaming](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-reflex-410x230.jpg) #### NVIDIA Reflex NVIDIA Reflex is a suite of technologies that measures, analyzes, and reduces system latency. [Learn More](/reflex)[Download SDK](/reflex/get-started) ![NVIDIA NRD highlighted in Techland's Dying Light 2 Stay Human](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-real-time-denoiser-410x230.jpg) #### NVIDIA Real-Time Denoiser (NRD) NVIDIA Real-Time Denoisers (NRD) is a noise removal technology for image noise generated from lighting, shadow, and ray-traced sources. [Learn More](/nvidia-rt-denoiser)[Download SDK](/nvidia-rt-denoiser/get-started) ![NVIDIA RTXDI off and on comparison](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/RTXDI_1600_899.jpg) #### RTX Direct Illumination (RTXDI) NVIDIA RTX™ Direct Illumination (RTXDI) automatically renders millions of realistic dynamic lights and shadows. ![NVIDIA RTXGI off and on comparison](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/featured_image_rtxgi_1080-rsz.jpg) #### RTX Global Illumination (RTXGI) NVIDIA RTX Global Illumination (RTXGI) renders multi-bounce lighting instantly with accurate indirect lighting. [Learn More](/rtxgi)[Download SDK](/rtxgi-sdk-get-started) ![NVIDIA HBAO+ adds richer, more detailed shadows around objects](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-hbao-technology-410x230.jpg) #### NVIDIA HBAO+ Technology HBAO+ improves upon existing Ambient Occlusion techniques to add richer, more detailed, and realistic shadows around objects that occlude rays of light. 
[Learn More](https://www.nvidia.com/en-gb/geforce/technologies/hbao-plus/) ![NVIDIA VRWorks helps developers create amazing VR experiences](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-vr-works-410x230.jpg) #### NVIDIA VRWorks VRWorks is a comprehensive suite of APIs, libraries, and engines that enable application and headset developers to create amazing virtual reality experiences. VRWorks enables a new level of presence by bringing physically realistic visuals, sound, touch interactions, and simulated environments to virtual reality. [Learn More](/vrworks)[Download SDK](/vrworks-graphics-348) ![Stream VR, AR, and MR content with NVIDIA Cloud XR SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-cloud-xr-sdk-410x230.jpg) #### NVIDIA CloudXR SDK CloudXR is NVIDIA's solution for streaming virtual reality (VR), augmented reality (AR), and mixed reality (MR) content from any OpenVR XR application on a remote server—cloud, data center, or edge. The CloudXR streaming solution includes NVIDIA RTX™ hardware, NVIDIA RTX Virtual Workstation (vWS) drivers, and the CloudXR software development kit (SDK). [Learn More](/nvidia-cloudxr-sdk) ![GPUDirect RDMA enables a direct data path between local or remote storage](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gpu-direct-rdma-410x230.jpg) #### GPUDirect RDMA Designed specifically for the needs of GPU acceleration, GPUDirect RDMA provides direct communication between NVIDIA GPUs in remote systems. This eliminates the system CPUs and the required buffer copies of data via the system memory, resulting in 10X better performance. 
[Learn More](https://www.mellanox.com/products/GPUDirect-RDMA) ![NVIDIA Rivermax networking SDK for media and data streaming apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-rivermax-410x230.jpg) #### NVIDIA Rivermax NVIDIA® Rivermax® offers a unique IP-based solution for any media and data streaming use case. Rivermax leverages NVIDIA ConnectX® and BlueField DPU hardware streaming acceleration technology that enables direct data transfers to and from the GPU, delivering best-in-class throughput and latency with minimal CPU utilization for streaming workloads. Rivermax is the only fully virtualized streaming solution that complies with the stringent timing and traffic flow requirements of the SMPTE ST 2110-21 specification. [Learn More](/networking/rivermax)[Download SDK](/networking/rivermax-getting-started) ![NVIDIA Mosaic enables multiple displays on video wall](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-quadro-sync-410x230.jpg) #### NVIDIA Mosaic Whether you want to see your work across multiple displays or project your ideas in 4K, you can with NVIDIA Mosaic™ multi-display technology. With NVIDIA Mosaic, you can easily span any application across up to 16 high-resolution panels or projectors from a single system, conveniently treating the multiple displays as a single desktop, without application software changes or visual artifacts. [Learn More](https://www.nvidia.com/en-us/design-visualization/solutions/nvidia-mosaic-technology/) ![Achieve a seamless, blended picture from two projectors on a curved screen with Warp and Blend](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/warp-and-blend-410x230.jpg) #### Warp and Blend Warp and Blend are interfaces exposed in NVAPI for warping (image geometry corrections) and blending (intensity and black level adjustment) a single display output or multiple display outputs. 
[Learn More](/warp-and-blend) ![Euclidean Distance Transform (EDT) as an example of NPP](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-performance-primitives-410x230.jpg) #### NVIDIA Performance Primitives (NPP) The NVIDIA Performance Primitives (NPP) library provides GPU-accelerated image, video, and signal processing functions that perform up to 30x faster than CPU-only implementations. With over 5,000 primitives for image and signal processing, you can easily perform tasks such as color conversion, image compression, filtering, thresholding and image manipulation. Learn More ![NVIDIA GPUDirect enhances data movement and access for GPUs](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gpu-direct-rdma-410x230.jpg) #### NVIDIA GPUDirect NVIDIA GPUDirect® is a family of technologies, part of Magnum IO, that enhances data movement and access for NVIDIA data center GPUs. Using GPUDirect, network adapters and storage drives can directly read and write to/from GPU memory, eliminating unnecessary memory copies, decreasing CPU overheads and reducing latency, resulting in significant performance improvements. [Learn More](/gpudirect) ![NVIDIA vGPU software for every workload across any device, anywhere.](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/vgpu-kv-web-refresh-vws-8k-v3.jpg) #### Virtual GPU (vGPU) NVIDIA virtual GPU (vGPU) software enables powerful GPU performance for workloads ranging from graphics-rich virtual workstations to data science and AI, enabling IT to leverage the management and security benefits of virtualization as well as the performance of NVIDIA GPUs required for modern workloads. Installed on a physical GPU in a cloud or enterprise data center server, NVIDIA vGPU software creates virtual GPUs that can be shared across multiple virtual machines, accessed by any device, anywhere. 
[Learn More](https://www.nvidia.com/en-us/data-center/virtual-solutions/) ![NVIDIA DOCA SDK helps to accelerate and secure next gen data center](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-doca-sdk-410x230.jpg) #### NVIDIA DOCA SDK NVIDIA® DOCA™ is the key to unlocking the potential of the NVIDIA BlueField® data processing unit (DPU) to offload, accelerate, and isolate data center workloads. With DOCA, developers can program the data center infrastructure of tomorrow by creating software-defined, cloud-native, DPU-accelerated services with zero-trust protection to address the increasing performance and security demands of modern data centers. [Learn More](/networking/doca) ![NVIDIA CUDA helps developers speed up computing applications by harnessing GPU power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-410x230.jpg) #### CUDA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. [Learn More](/cuda-zone)[Download SDK](/cuda-downloads) ![NVIDIA DLSS boosts frame rates and generates sharp images](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/DLSS.jpg) #### Deep Learning Super Sampling (DLSS) NVIDIA Deep Learning Super Sampling (DLSS) is a deep learning neural network that boosts frame rates and generates sharp images. [Learn More](/dlss) ![Flowchart showing NVIDIA TensorRT SDK for deep learning inference](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-tensor-rt-410x230.jpg) #### NVIDIA TensorRT NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. 
[Learn More](/tensorrt)[Download SDK](/tensorrt-getting-started) ![NVIDIA Maxine SDK helps developers build virtual collaboration and content creation apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-maxine-410x230.jpg) #### NVIDIA Maxine NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. [Learn More](/maxine)[Download SDK](/maxine-getting-started) ![Flowchart showing NVIDIA DeepStream SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deep-stream-sdk-410x230.jpg) #### DeepStream SDK NVIDIA’s DeepStream SDK delivers a complete streaming analytics toolkit for AI-based multi-sensor processing, video, audio, and image understanding. DeepStream is for vision AI developers, software partners, startups, and OEMs building IVA apps and services. [Learn More](/deepstream-sdk)[Download SDK](/deepstream-getting-started) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-optical-flow-sdk-410x230.jpg) #### NVIDIA Optical Flow SDK Optical Flow SDK exposes the latest hardware capability of Turing and Ampere GPUs dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors that are robust to frame-to-frame intensity variations and track the true object motion. [Learn More](/opticalflow-sdk)[Download SDK](/opticalflow/download) ![NVIDIA Triton Inference Server delivers fast and scalable AI in production](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-triton-inference-server-410x230.jpg) #### NVIDIA Triton Inference Server NVIDIA Triton™ Inference Server delivers fast and scalable AI in production. 
Open-source inference serving software, Triton Inference Server, streamlines AI inference by enabling teams to deploy trained AI models from any framework and on any GPU- or CPU-based infrastructure (cloud, data center, or edge). [Learn More](/nvidia-triton-inference-server)[Download SDK](https://github.com/triton-inference-server/server) ![Flowchart showing NVIDIA Video Codec SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-video-codec-sdk-410x230.jpg) #### NVIDIA Video Codec SDK Video Codec SDK is a comprehensive API set, including high-performance tools, samples, and documentation for hardware-accelerated video encode and decode on Windows and Linux. [Learn More](/nvidia-video-codec-sdk)[Download SDK](/nvidia-video-codec-sdk/download) ![NVIDIA Riva SDK helps build speech AI apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-riva-410x230.jpg) #### NVIDIA Riva NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications customized for your use case and delivering real-time performance. [Learn More](/riva)[Download SDK](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) 
[Learn More](/dlss) ![NVIDIA Triton Inference Server simplifies deployment of scalable AI in production](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-triton-inference-server-410x230.jpg) #### NVIDIA Triton Inference Server NVIDIA Triton™ Inference Server delivers fast and scalable AI in production. Open-source inference serving software, Triton Inference Server streamlines AI inference by enabling teams to deploy trained AI models from any framework and on any GPU- or CPU-based infrastructure (cloud, data center, or edge). [Learn More](/nvidia-triton-inference-server)[Download SDK](https://github.com/triton-inference-server/server) ![NVIDIA OptiX helps to achieve optimal ray tracing performance on GPU](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-opti-x-ray-tracing-engine-410x230.jpg) #### NVIDIA OptiX Ray Tracing Engine OptiX is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms. Bring the power of NVIDIA GPUs to your ray tracing applications with programmable intersection, ray generation, and shading. [Learn More](/optix)[Download SDK](/designworks/optix/download) ![NanoVDB adds real-time rendering GPU support for OpenVDB](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nanovdb-410x230.jpg) #### NanoVDB NanoVDB adds real-time rendering GPU support for OpenVDB. OpenVDB is the Academy Award-winning industry-standard data structure and toolset used for manipulating volumetric effects. 
[Learn More](/nanovdb)[Download SDK](https://github.com/AcademySoftwareFoundation/openvdb/tree/feature/nanovdb) ![Pixar Universal Scene Description (USD) for content creation](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/pixar-universal-scene-description-usd-410x230.jpg) #### Pixar Universal Scene Description (USD) Universal Scene Description (USD) is an easily extensible, open-source 3D scene description and file format developed by Pixar for content creation and interchange among different tools. [Learn More](/usd) ![Rendering MDL sample cloth](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/material-definition-language-410x230.jpg) #### Material Definition Language (MDL) The NVIDIA MDL SDK is a set of tools to integrate MDL support into rendering applications. It contains components for loading, inspecting, editing of material definitions as well as compiling MDL functions to GLSL, HLSL, Native x86, PTX and LLVM-IR. With the NVIDIA MDL SDK, any physically based renderer can easily add support for MDL and join the MDL eco-system. [Learn More](/mdl-sdk)[Download SDK](/mdl-sdk-get-started) ![Nvdiffrast is used for rasterization-based differentiable rendering](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvdiffrast-410x230.jpg) #### Nvdiffrast Nvdiffrast is a PyTorch/TensorFlow library that provides high-performance primitive operations for rasterization-based differentiable rendering. [Learn More](https://nvlabs.github.io/nvdiffrast/) ![NVIDIA HairWorks effects in The Witcher 3: Wild Hunt](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/HairWorks_1600_899.jpg) #### NVIDIA HairWorks NVIDIA HairWorks enables advanced simulation and rendering techniques for richer hair and fur. [Learn More](/hairworks)[Download SDK](/gameworks-source-github) To view this video please enable JavaScript, and consider upgrading to a web browser that supports HTML5 video. 
#### NVIDIA PhysX System Software NVIDIA® PhysX® is a scalable multi-platform physics simulation solution that brings high-performance and precision accuracy to industrial simulation. [Learn More](/physx-sdk)Download SDK ![NVIDIA WaveWorks featured in Just Cause 3](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/WaveWorks_1600_899.jpg) #### NVIDIA WaveWorks NVIDIA WaveWorks delivers cinematic-quality ocean simulation for interactive applications. Learn MoreDownload SDK ![NVIDIA Texture Tools Exporter standalone showcasing a cube map reflection](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Texture_Tools_1600_899.jpg) #### NVIDIA Texture Tools Exporter NVIDIA Texture Tools Exporter creates highly compressed texture files directly from image sources. [Learn More](/nvidia-texture-tools-exporter)[TOOLS EXPORTER](/nvidia-texture-tools-exporter)[Download SDK](/2021.1.1/nvidia_texture_tools_exporter) ![NVIDIA CUDA helps developers speed up computing applications by harnessing GPU power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-410x230.jpg) #### CUDA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. [Learn More](/cuda-zone)[Download SDK](/cuda-downloads) ![ NVIDIA DLSS boosts frame rates and generates sharp images ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/DLSS.jpg) #### Deep Learning Super Sampling (DLSS) NVIDIA Deep Learning Super Sampling (DLSS) is a deep learning neural network that boosts frame rates and generates sharp images. 
[Learn More](/dlss) ![NVIDIA OptiX helps to achieve optimal ray tracing performance on GPU](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-opti-x-ray-tracing-engine-410x230.jpg) #### NVIDIA OptiX Ray Tracing Engine OptiX is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms. Bring the power of NVIDIA GPUs to your ray tracing applications with programmable intersection, ray generation, and shading. [Learn More](/optix)[Download SDK](/designworks/optix/download) ![NanoVDB adds real-time rendering GPU support for OpenVDB](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nanovdb-410x230.jpg) #### NanoVDB NanoVDB adds real-time rendering GPU support for OpenVDB. OpenVDB is the Academy Award-winning industry-standard data structure and toolset used for manipulating volumetric effects. [Learn More](/nanovdb)[Download SDK](https://github.com/AcademySoftwareFoundation/openvdb/tree/feature/nanovdb) ![NVIDIA Quadro Sync enables video wall synchronization](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-quadro-sync-410x230_0.jpg) #### NVIDIA Quadro Sync NVIDIA® Quadro® Sync solutions enable the creation of dazzling ultra-high resolution, perfectly synchronized displays to meet the visualization and presentation needs across industries. [Learn More](https://www.nvidia.com/en-us/design-visualization/solutions/quadro-sync/) ![NVIDIA Reflex for low latency gaming](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-reflex-410x230.jpg) #### NVIDIA Reflex NVIDIA Reflex is a suite of technologies that measures, analyzes, and reduces system latency. 
[Learn More](/reflex)[Download SDK](/reflex/get-started) ![Rendering MDL sample cloth](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/material-definition-language-410x230.jpg) #### Material Definition Language (MDL) The NVIDIA MDL SDK is a set of tools to integrate MDL support into rendering applications. It contains components for loading, inspecting, editing of material definitions as well as compiling MDL functions to GLSL, HLSL, Native x86, PTX and LLVM-IR. With the NVIDIA MDL SDK, any physically based renderer can easily add support for MDL and join the MDL eco-system. [Learn More](/mdl-sdk)[Download SDK](/mdl-sdk-get-started) ![NVIDIA NRD highlighted in Techland's Dying Light 2 Stay Human](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-real-time-denoiser-410x230.jpg) #### NVIDIA Real-Time Denoiser (NRD) NVIDIA Real-Time Denoisers (NRD) is a noise removal technology for image noise generated from lighting, shadow, and ray-traced sources. [Learn More](/nvidia-rt-denoiser)[Download SDK](/nvidia-rt-denoiser/get-started) ![Nvdiffrast is used for rasterization-based differentiable rendering](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvdiffrast-410x230.jpg) #### Nvdiffrast Nvdiffrast is a PyTorch/TensorFlow library that provides high-performance primitive operations for rasterization-based differentiable rendering. [Learn More](https://nvlabs.github.io/nvdiffrast/) ![NVIDIA RTXDI off and on comparison](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/RTXDI_1600_899.jpg) #### RTX Direct Illumination (RTXDI) NVIDIA RTX™ Direct Illumination (RTXDI) automatically renders millions of realistic dynamic lights and shadows. 
Learn MoreDownload SDK ![NVIDIA RTXGI off and on comparison](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/featured_image_rtxgi_1080-rsz.jpg) #### RTX Global Illumination (RTXGI) NVIDIA RTX Global Illumination (RTXGI) renders multi-bounce lighting instantly with accurate indirect lighting. [Learn More](/rtxgi)[Download SDK](/rtxgi-sdk-get-started) ![NVIDIA HBAO+ adds richer, more detailed shadows around objects](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-hbao-technology-410x230.jpg) #### NVIDIA HBAO+ Technology HBAO+ improves upon existing Ambient Occlusion techniques to add richer, more detailed and realistic shadows around objects that occlude rays of light. [Learn More](https://www.nvidia.com/en-gb/geforce/technologies/hbao-plus/) ![NVIDIA CUDA helps developers speed up computing applications by harnessing GPU power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-410x230.jpg) #### CUDA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. [Learn More](/cuda-zone)[Download SDK](/cuda-downloads) ![ NVIDIA DLSS boosts frame rates and generates sharp images ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/DLSS.jpg) #### Deep Learning Super Sampling (DLSS) NVIDIA Deep Learning Super Sampling (DLSS) is a deep learning neural network that boosts frame rates and generates sharp images. [Learn More](/dlss) ![NVIDIA VRWorks helps developers create amazing VR experiences](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-vr-works-410x230.jpg) #### NVIDIA VRWorks VRWorks is a comprehensive suite of APIs, libraries, and engines that enable application and headset developers to create amazing virtual reality experiences. 
VRWorks enables a new level of presence by bringing physically realistic visuals, sound, touch interactions, and simulated environments to virtual reality. [Learn More](/vrworks)[Download SDK](/vrworks-graphics-348) ![NVIDIA Maxine SDK helps developers build virtual collaboration and content creation apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-maxine-410x230.jpg) #### NVIDIA Maxine NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. [Learn More](/maxine)[Download SDK](/maxine-getting-started) ![Pixar Universal Scene Description (USD) for content creation](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/pixar-universal-scene-description-usd-410x230.jpg) #### Pixar Universal Scene Description (USD) Universal Scene Description (USD) is an easily extensible, open-source 3D scene description and file format developed by Pixar for content creation and interchange among different tools. [Learn More](/usd) ![NVIDIA Riva SDK helps build speech AI apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-riva-410x230-1.jpg) #### NVIDIA Riva NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance. [Learn More](/riva)[Download SDK](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) ![NVIDIA NRD highlighted in Techland's Dying Light 2 Stay Human](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-real-time-denoiser-410x230.jpg) #### NVIDIA Real-Time Denoiser (NRD) NVIDIA Real-Time Denoisers (NRD) is a noise removal technology for image noise generated from lighting, shadow, and ray-traced sources. 
[Learn More](/nvidia-rt-denoiser)[Download SDK](/nvidia-rt-denoiser/get-started) ![Stream VR, AR, and MR content with NVIDIA Cloud XR SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-cloud-xr-sdk-410x230.jpg) #### NVIDIA CloudXR SDK CloudXR is NVIDIA's solution for streaming virtual reality (VR), augmented reality (AR), and mixed reality (MR) content from any OpenVR XR application on a remote server—cloud, data center, or edge. The CloudXR streaming solution includes NVIDIA RTX™ hardware, NVIDIA RTX Virtual Workstation (vWS) drivers, and the CloudXR software development kit (SDK). [Learn More](nvidia-cloudxr-sdk) ![NVIDIA CUDA helps developers speed up computing applications by harnessing GPU power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-410x230.jpg) #### CUDA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. [Learn More](/cuda-zone)[Download SDK](/cuda-downloads) ![GPUDirect RDMA enables a direct data path between local or remote storage](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gpu-direct-rdma-410x230.jpg) #### GPUDirect RDMA Designed specifically for the needs of GPU acceleration, GPUDirect RDMA provides direct communication between NVIDIA GPUs in remote systems. This eliminates the system CPUs and the required buffer copies of data via the system memory, resulting in 10X better performance. 
[Learn More](https://www.mellanox.com/products/GPUDirect-RDMA) ![NVIDIA Maxine SDK helps developers build virtual collaboration and content creation apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-maxine-410x230.jpg) #### NVIDIA Maxine NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. [Learn More](/maxine)[Download SDK](/maxine-getting-started) ![NVIDIA DLSS boosts frame rates and generates sharp images](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/DLSS.jpg) #### Deep Learning Super Sampling (DLSS) NVIDIA Deep Learning Super Sampling (DLSS) is a deep learning neural network that boosts frame rates and generates sharp images. [Learn More](/dlss) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-optical-flow-sdk-410x230.jpg) #### NVIDIA Optical Flow SDK Optical Flow SDK exposes the latest hardware capability of Turing and Ampere GPUs dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors, robust to frame-to-frame intensity variations, and track the true object motion. [Learn More](/opticalflow-sdk)[Download SDK](/opticalflow/download) ![NVIDIA Triton Inference Server delivers fast and scalable AI in production ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-triton-inference-server-410x230.jpg) #### NVIDIA Triton Inference Server NVIDIA Triton™ Inference Server delivers fast and scalable AI in production. Open-source inference serving software, Triton Inference Server, streamlines AI inference by enabling teams to deploy trained AI models from any framework and on any GPU- or CPU-based infrastructure (cloud, data center, or edge). 
[Learn More](/nvidia-triton-inference-server)[Download SDK](https://github.com/triton-inference-server/server) ![NVIDIA Rivermax networking SDK for media and data streaming apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-rivermax-410x230.jpg) #### NVIDIA Rivermax NVIDIA® Rivermax® offers a unique IP-based solution for any media and data streaming use case. Rivermax leverages NVIDIA ConnectX® and BlueField DPU hardware streaming acceleration technology that enables direct data transfers to and from the GPU, delivering best-in-class throughput and latency with minimal CPU utilization for streaming workloads. Rivermax is the only fully-virtualized streaming solution that complies with the stringent timing and traffic flow requirement of the SMPTE ST 2110-21 specification. [Learn More](/networking/rivermax)[Download SDK](/networking/rivermax-getting-started) ![NVIDIA Mosaic enables multiple displays on video wall](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-quadro-sync-410x230.jpg) #### NVIDIA Mosaic Whether you want to see your work across multiple displays or project your ideas in 4K, you can with NVIDIA Mosaic™ multi-display technology. With NVIDIA Mosaic, you can easily span any application across up to 16 high-resolution panels or projectors from a single system, conveniently treating the multiple displays as a single desktop, without application software changes or visual artifacts. [Learn More](https://www.nvidia.com/en-us/design-visualization/solutions/nvidia-mosaic-technology/) ![Achieve a seamless, blended picture from two projectors on a curved screen with Warp and Blend](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/warp-and-blend-410x230.jpg) #### Warp and Blend Warp and Blend are interfaces exposed in NVAPI for warping (image geometry corrections) and blending (intensity and black level adjustment) a single display output or multiple display outputs. 
[Learn More](/warp-and-blend) ![NVIDIA Quadro Sync enables video wall synchronization](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-quadro-sync-410x230_0.jpg) #### NVIDIA Quadro Sync NVIDIA® Quadro® Sync solutions enable the creation of dazzling ultra-high resolution, perfectly synchronized displays to meet the visualization and presentation needs across industries. [Learn More](https://www.nvidia.com/en-us/design-visualization/solutions/quadro-sync/) ![NVIDIA Video Codec SDK optimizes video memory usage](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-capture/proviz-video-codec-developer-diagram-850x480.png) #### NVIDIA Video Codec SDK A comprehensive set of API including high-performance tools, samples and documentation for hardware accelerated video encode and decode on Windows and Linux. [Learn More](/nvidia-video-codec-sdk)[LDownload SDK](/nvidia-video-codec-sdk/download) ![Euclidean Distance Transform (EDT) as an example of NPP](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-performance-primitives-410x230.jpg) #### NVIDIA Performance Primitives (NPP) The NVIDIA Performance Primitives (NPP) library provides GPU-accelerated image, video, and signal processing functions that perform up to 30x faster than CPU-only implementations. With over 5,000 primitives for image and signal processing, you can easily perform tasks such as color conversion, image compression, filtering, thresholding and image manipulation. Learn More ![NVIDIA GPUDirect enhances data movement and access for GPUs](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gpu-direct-rdma-410x230.jpg) #### NVIDIA GPUDirect NVIDIA GPUDirect® is a family of technologies, part of Magnum IO, that enhances data movement and access for NVIDIA data center GPUs. 
Using GPUDirect, network adapters and storage drives can directly read and write to/from GPU memory, eliminating unnecessary memory copies, decreasing CPU overheads and reducing latency, resulting in significant performance improvements. [Learn More](/gpudirect) ![NVIDIA CUDA helps developers speed up computing applications by harnessing GPU power](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda-410x230.jpg) #### CUDA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. [Learn More](/cuda-zone)[Download SDK](/cuda-downloads) ![NVIDIA DLSS boosts frame rates and generates sharp images](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/DLSS.jpg) #### Deep Learning Super Sampling (DLSS) NVIDIA Deep Learning Super Sampling (DLSS) is a deep learning neural network that boosts frame rates and generates sharp images. [Learn More](/dlss)
![NVIDIA Maxine SDK helps developers build virtual collaboration and content creation apps](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-maxine-410x230.jpg) #### NVIDIA Maxine NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. [Learn More](/maxine)[Download SDK](/maxine-getting-started) ![Flowchart showing NVIDIA DeepStream SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deep-stream-sdk-410x230.jpg) #### DeepStream SDK NVIDIA’s DeepStream SDK delivers a complete streaming analytics toolkit for AI-based multi-sensor processing, video, audio, and image understanding. DeepStream is for vision AI developers, software partners, startups, and OEMs building IVA apps and services. [Learn More](/deepstream-sdk)[Download SDK](/deepstream-getting-started) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-optical-flow-sdk-410x230.jpg) #### NVIDIA Optical Flow SDK Optical Flow SDK exposes the latest hardware capability of Turing and Ampere GPUs dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors, robust to frame-to-frame intensity variations, and track the true object motion. [Learn More](/opticalflow-sdk)[Download SDK](/opticalflow/download)
![NVIDIA OptiX helps to achieve optimal ray tracing performance on GPU](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-opti-x-ray-tracing-engine-410x230.jpg) #### NVIDIA OptiX Ray Tracing Engine OptiX is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms. Bring the power of NVIDIA GPUs to your ray tracing applications with programmable intersection, ray generation, and shading. [Learn More](/optix)[Download SDK](/designworks/optix/download) ![NanoVDB adds real-time rendering GPU support for OpenVDB](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nanovdb-410x230.jpg) #### NanoVDB NanoVDB adds real-time rendering GPU support for OpenVDB. OpenVDB is the Academy Award-winning industry-standard data structure and toolset used for manipulating volumetric effects. [Learn More](/nanovdb)[Download SDK](https://github.com/AcademySoftwareFoundation/openvdb/tree/feature/nanovdb)
![NVIDIA vGPU software for every workload across any device, anywhere.](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/vgpu-kv-web-refresh-vws-8k-v3.jpg) #### Virtual GPU (vGPU) NVIDIA virtual GPU (vGPU) software enables powerful GPU performance for workloads ranging from graphics-rich virtual workstations to data science and AI, letting IT leverage the management and security benefits of virtualization alongside the performance of NVIDIA GPUs required for modern workloads. Installed on a physical GPU in a cloud or enterprise data center server, NVIDIA vGPU software creates virtual GPUs that can be shared across multiple virtual machines, accessed by any device, anywhere. [Learn More](https://www.nvidia.com/en-us/data-center/virtual-solutions/)
![NVIDIA DOCA SDK helps to accelerate and secure next gen data center](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-doca-sdk-410x230.jpg) #### NVIDIA DOCA SDK NVIDIA® DOCA™ is the key to unlocking the potential of the NVIDIA BlueField® data processing unit (DPU) to offload, accelerate, and isolate data center workloads. With DOCA, developers can program the data center infrastructure of tomorrow by creating software-defined, cloud-native, DPU-accelerated services with zero-trust protection to address the increasing performance and security demands of modern data centers. [Learn More](/networking/doca)
![NVIDIA GPUDirect enhances data movement and access for GPUs](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvapi-410x230.jpg) #### GPUDirect RDMA Designed specifically for the needs of GPU acceleration, GPUDirect RDMA provides direct communication between NVIDIA GPUs in remote systems. This removes the system CPUs from the data path and eliminates the required buffer copies of data via system memory, resulting in 10X better performance. [Learn More](https://network.nvidia.com/products/GPUDirect-RDMA/) ## Developer Tools ![NVAPI allows direct access to NVIDIA GPUs and drivers on all windows platforms](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gpu-direct-rdma-410x230_2.jpg) ### NVAPI NVAPI is NVIDIA's core software development kit that allows direct access to NVIDIA GPUs and drivers on all Windows platforms. In addition, NVAPI provides support for categories of operations that range beyond the scope of those found in familiar graphics APIs such as DirectX and OpenGL. [Learn More](/nvapi) ![Flowchart showing NVIDIA Nsight tools for performance optimization](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Nsight-Diagram.png) ### Nsight NVIDIA Nsight™ is a collection of applications for building, debugging, profiling, and developing software that utilizes NVIDIA hardware.
[Learn More About Nsight](/tools-overview) ![Design and develop deep neural networks for in-app inference with Nsight DL Designer](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight-deep-learning-designer-410x230.jpg) ### Nsight Deep Learning Designer Nsight DL Designer is an integrated development environment that helps developers efficiently design and develop deep neural networks for in-app inference. [Learn More](/nsight-dl-designer) https://www.youtube-nocookie.com/embed/Xjqsl4rW8xc ### NVIDIA Nsight Aftermath SDK NVIDIA® Nsight™ Aftermath SDK is a simple library you integrate into your D3D12 or Vulkan game’s crash reporter to generate GPU "mini-dumps" when a TDR or exception occurs. [Learn More](/nsight-aftermath) ## Browse Resources ## Additional Programs And Resources ### Access Developer Resources The NVIDIA Developer Program provides the advanced tools and training needed to successfully build applications on all NVIDIA technology platforms. This includes access to hundreds of SDKs, a network of like-minded developers in our community forums, and more. **[Join Today](https://developer.nvidia.com/developer-program)** ### Get Hands-On Training The NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI, accelerated computing, and accelerated data science to solve real-world problems. Powered by GPUs in the cloud, training is available as self-paced, online courses or live, instructor-led workshops. **[View Courses](https://www.nvidia.com/en-us/training/)** ### Accelerate Your Startup NVIDIA Inception—an acceleration platform for AI, data science, and high-performance computing (HPC) startups—supports over 9,000 startups worldwide with go-to-market support, expertise, and technology. Startups get access to training through the DLI, preferred pricing on hardware, and invitations to exclusive networking events. 
**[Learn More](https://www.nvidia.com/en-us/startups/)** Sign up for the latest developer news from NVIDIA [Subscribe](https://www.nvidia.com/en-us/preferences/email-signup/) --- # Source: https://developer.nvidia.com/metropolis.md # NVIDIA Metropolis for Developers Discover an advanced collection of developer blueprints, AI models, and tools that deliver exceptional scale, throughput, cost-effectiveness, and faster time to production. NVIDIA Metropolis provides everything you need to build, deploy, and scale vision AI agents and applications, from the edge to the cloud. [Get Started](https://build.nvidia.com/explore/vision) [![A diagram showing NVIDIA Metropolis with a host of SDKs and developer tools](https://developer.download.nvidia.com/images/metropolis/metropolis-stack.png "A diagram showing NVIDIA Metropolis with a host of SDKs and developer tools")](https://developer.download.nvidia.com/images/metropolis/metropolis-stack.png) ## Explore All the Benefits ### Faster Builds Use and tune high-performance vision language models and vision foundation models to streamline AI training for your unique industry. NVIDIA Blueprints and cloud-native modular microservices are designed to help you accelerate development. ### Lower Cost Powerful SDKs—including NVIDIA TensorRT™, DeepStream, and TAO—reduce overall solution cost. Generate synthetic data, boost accuracy with model customization, and maximize inference throughput on NVIDIA infrastructure. ### More Flexible Deployments Deploy with flexibility using NVIDIA Inference Microservices (NIM™), cloud-native Metropolis microservices, and containerized applications offering options for on-premises, cloud, or hybrid deployments. ## Powerful Tools for AI-Enabled Video Analytics The Metropolis suite of SDKs provides a variety of starting points for AI application development and deployment.
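The catalog entries that follow describe NVIDIA NIM microservices as serving models through industry-standard APIs, with Cosmos Reason among the hosted vision models. As a minimal, non-authoritative sketch of what such a client request could look like (the endpoint URL, response shape, and the use of a base64 data URL for the image are assumptions, not details taken from this page; check build.nvidia.com for the actual values):

```python
import json
import os
import urllib.request

# Hypothetical endpoint; the model slug matches the Cosmos Reason catalog
# link on this page, but verify both against build.nvidia.com.
ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "nvidia/cosmos-reason1-7b"


def build_payload(prompt, image_b64=None):
    """Assemble an OpenAI-style chat payload; an optional image is passed
    inline as a base64 data URL (a common VLM convention)."""
    content = [{"type": "text", "text": prompt}]
    if image_b64:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_b64}"},
        })
    return {"model": MODEL, "messages": [{"role": "user", "content": content}]}


def send(payload):
    """POST the payload; expects an API key in the NVIDIA_API_KEY env var."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The payload-building step is separated from the network call so the request structure can be inspected or reused against a locally deployed microservice.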
### State-of-the-Art Vision Language Models and Vision Foundation Models [Vision language models](https://www.nvidia.com/en-us/glossary/vision-language-models/) (VLMs) are multimodal, generative AI models that can understand and process video, images, and text. Computer vision foundation models, including vision transformers (ViTs), analyze and interpret visual data to create embeddings or perform tasks like object detection, segmentation, and classification. Cosmos Reason offers you an open and fully customizable world foundation model designed for video reasoning. It enables efficient training data curation for robotics and autonomous vehicles (AVs), and powers spatio-temporal understanding to accelerate automation across smart cities and industrial environments. Explore: - [Explore NVIDIA NIM for Vision](https://build.nvidia.com/explore/vision) - [Cosmos Reason](https://build.nvidia.com/nvidia/cosmos-reason1-7b) - [Vision Foundation Models](https://docs.nvidia.com/tao/tao-toolkit/text/model_zoo/overview.html#foundation-models) ![A vision language model that generates text output for video insights](https://developer.download.nvidia.com/images/metropolis/metropolis-and-iva-ngc-visual-blog-1920x1080%20(2).jpg "A vision language model that generates text output for video insights") ![The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit](https://developer.download.nvidia.com/images/tao-tool-kit-key-visual-update-5.0-render-2686010.jpg "The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit") ### TAO The Train, Adapt, and Optimize (TAO) toolkit is a low-code AI model development solution for developers. It lets you use the power of transfer learning to fine-tune NVIDIA computer vision models and vision foundation models with your own data and optimize for inference—without AI expertise or a large training dataset.
[Learn More About TAO](/transfer-learning-toolkit) ### AI Agent Blueprints The [NVIDIA AI Blueprint for video search and summarization (VSS)](https://build.nvidia.com/nvidia/video-search-and-summarization) makes it easy to build and customize video analytics AI agents using generative AI, VLMs, LLMs, and NVIDIA NIM. The [video analytics AI agents](https://www.nvidia.com/en-us/use-cases/video-analytics-ai-agents/) are given tasks through natural language and can analyze, interpret, and process vast amounts of video data to provide critical insights that help a range of industries optimize processes, improve safety, and cut costs. VSS enables seamless integration of generative AI into existing computer vision pipelines—enhancing inspection, search, and analytics with multimodal understanding and zero-shot reasoning. Easily deploy from the edge to the cloud on platforms including NVIDIA RTX PRO™ 6000, DGX™ Spark, and Jetson Thor™. [Explore NVIDIA AI Blueprint for Video Search and Summarization](https://build.nvidia.com/nvidia/video-search-and-summarization) ![Use NVIDIA AI Blueprint for video search and summarization](https://developer.download.nvidia.com/images/ai-agent-blueprints-1902x1080.jpg "Use NVIDIA AI Blueprint for video search and summarization") ![NVIDIA NIM are easy to use, cloud-native microservices for inferencing](https://developer.download.nvidia.com/images/metropolis/practitioner-nim-1920x1080.png "NVIDIA NIM are easy to use, cloud-native microservices for inferencing") ### NVIDIA NIM NVIDIA NIM is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across the cloud, data center, and workstations. Supporting a wide range of AI models—including foundation models, LLMs, VLMs, and more—NIM ensures seamless, scalable AI inferencing, on-premises or in the cloud, using industry-standard APIs. 
[Explore NVIDIA NIM for Vision](https://build.nvidia.com/explore/vision) ### Metropolis Microservices Metropolis microservices provide powerful, customizable, cloud-native building blocks for developing vision AI agents, applications, and solutions. They’re built to run on NVIDIA cloud and data center GPUs, as well as the NVIDIA Jetson Orin™ edge AI platform. [Learn More](/metropolis-microservices) ![Use Metropolis microservices to develop vision AI applications.](https://developer.download.nvidia.com/images/metropolis/metropolis-key-visual-microservices-ref-app.jpg "Use Metropolis microservices to develop vision AI applications.") ![NVIDIA DeepStream SDK is a complete streaming analytics toolkit](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deep-stream-sdk-630x354.jpg "NVIDIA DeepStream SDK is a complete streaming analytics toolkit") ### DeepStream SDK NVIDIA DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It’s ideal for vision AI developers, software partners, startups, and OEMs building IVA apps and services. DeepStream 8.0 will include multi-camera tracking and a low-code inference builder. [Learn More About DeepStream SDK](/deepstream-sdk) ### NVIDIA Omniverse NVIDIA Omniverse™ helps you integrate [OpenUSD](https://www.nvidia.com/en-us/omniverse/usd/), NVIDIA RTX™ rendering technologies, and generative [physical AI](https://www.nvidia.com/en-us/glossary/generative-physical-ai/) into existing software tools and simulation workflows to develop and test [digital twins](https://www.nvidia.com/en-us/omniverse/solutions/digital-twins/). You can pair it with your own software to continuously develop, test, and optimize the AI that powers robot brains, Metropolis camera-based perception, equipment, and more.
Omniverse Replicator makes it easier to generate physically accurate 3D synthetic data at scale, or build your own [synthetic data](https://nvidia.com/en-us/use-cases/synthetic-data/) tools and frameworks. Bootstrap perception AI model training and achieve accurate Sim2Real performance without having to manually curate and label real-world data. [Learn More About Omniverse Replicator](/nvidia-omniverse-platform/replicator) ![An autonomous mobile robot in action in a warehouse](https://developer.download.nvidia.com/images/metropolis/replicator-isaac-sim-630x354.jpg "An autonomous mobile robot in action in a warehouse") ![Use NVIDIA Cosmos generative world foundation models to build physical AI systems](https://developer.download.nvidia.com/images/metropolis/physical-ai-dataset-1902x1080.jpg "Use NVIDIA Cosmos generative world foundation models to build physical AI systems") ### NVIDIA Cosmos NVIDIA Cosmos™ is a platform of state-of-the-art generative [world foundation models](https://www.nvidia.com/en-us/glossary/world-models/?ncid=ref-dev-171762-vlm-jan-25) (WFMs), advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline. It's purpose-built to accelerate the development of [physical AI](https://www.nvidia.com/en-us/glossary/physical-ai/?ncid=ref-dev-171762-vlm-jan-25) systems. [Learn More About NVIDIA Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) ### NVIDIA Physical AI Dataset Unblock data bottlenecks with this open-source dataset for training vision AI applications that understand industrial facilities and smart cities, and for robotics and autonomous vehicle development. The unified collection is composed of validated data used to build NVIDIA physical AI solutions—now available for free to developers on Hugging Face.
[Explore the NVIDIA Physical AI Dataset](https://huggingface.co/collections/nvidia/physicalai-67c643edbb024053dcbcd6d8) ![NVIDIA Physical AI dataset for smart spaces, robot, and autonomous vehicle development](https://developer.download.nvidia.com/images/metropolis/largest-synthetic-dataset-sfg.jpg "NVIDIA Physical AI dataset for smart spaces, robot, and autonomous vehicle development") ![NVIDIA Isaac SIM mimics industrial facilities](https://developer.download.nvidia.com/images/metropolis/nvidia-isaac-sim.jpg "NVIDIA Isaac SIM mimics industrial facilities") ### NVIDIA Isaac SIM Developers need training data that mimics what cameras would capture in complex, dynamic 3D spaces such as industrial facilities and smart cities. Action and Event Data Generation is a reference application on NVIDIA Isaac Sim™. It lets developers generate synthetic image and video data in a physically accurate virtual environment to train custom vision AI models. These include tools to simulate actors like humans and robots, create objects with domain randomization, and generate incident-based scenarios for various vision AI models. Use the VLM scene captioning tool to automatically generate image-caption pairs and accelerate the annotation process [Get started With Event and Actor Generation on Isaac SIM](https://docs.isaacsim.omniverse.nvidia.com/latest/action_and_event_data_generation/index.html) ## Use and Fine-Tune Optimized AI Models ### State-of-the-Art Vision Language Models and Vision Foundation Models ​ [Vision language models](https://www.nvidia.com/en-us/glossary/vision-language-models/) (VLMs) are multimodal, generative AI models that can understand and process video, images, and text. Computer vision foundation models, including vision transformers (ViTs), analyze and interpret visual data to create embeddings or perform tasks like object detection, segmentation, and classification. 
Cosmos Reason offers you an open and fully customizable world foundation model designed for video reasoning. It enables efficient training data curation for robotics and autonomous vehicles (AVs), and powers spatio-temporal understanding to accelerate automation across smart cities and industrial environments. Explore - [Explore NVIDIA NIM for Vision](https://build.nvidia.com/explore/vision) - [Cosmos Reason](https://build.nvidia.com/nvidia/cosmos-reason1-7b) - [Vision Foundation Models](https://docs.nvidia.com/tao/tao-toolkit/text/model_zoo/overview.html#foundation-models) ![A vision language model that generates text output for video insights](https://developer.download.nvidia.com/images/metropolis/metropolis-and-iva-ngc-visual-blog-1920x1080%20(2).jpg "A vision language model that generates text output for video insights") ![The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit](https://developer.download.nvidia.com/images/tao-tool-kit-key-visual-update-5.0-render-2686010.jpg "The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit") ### TAO The Train, Adapt, and Optimize (TAO) toolkit is a low-code AI model development solution for developers. It lets you use the power of transfer learning to fine-tune NVIDIA computer vision models and vision foundation models with your own data and optimize for inference—without AI expertise or a large training dataset. [Learn More About TAO](/transfer-learning-toolkit) ## Build Powerful AI Applications ### AI Agent Blueprints The [NVIDIA AI Blueprint for video search and summarization (VSS)](https://build.nvidia.com/nvidia/video-search-and-summarization) makes it easy to build and customize video analytics AI agents using generative AI, VLMs, LLMs, and NVIDIA NIM. 
The [video analytics AI agents](https://www.nvidia.com/en-us/use-cases/video-analytics-ai-agents/) are given tasks through natural language and can analyze, interpret, and process vast amounts of video data to provide critical insights that help a range of industries optimize processes, improve safety, and cut costs. VSS enables seamless integration of generative AI into existing computer vision pipelines—enhancing inspection, search, and analytics with multimodal understanding and zero-shot reasoning. Easily deploy from the edge to the cloud on platforms including NVIDIA RTX PRO™ 6000, DGX™ Spark, and Jetson Thor™. [Explore NVIDIA AI Blueprint for Video Search and Summarization](https://build.nvidia.com/nvidia/video-search-and-summarization) ![Use NVIDIA AI Blueprint for video search and summarization](https://developer.download.nvidia.com/images/ai-agent-blueprints-1902x1080.jpg "Use NVIDIA AI Blueprint for video search and summarization") ![NVIDIA NIM are easy to use, cloud-native microservices for inferencing](https://developer.download.nvidia.com/images/metropolis/practitioner-nim-1920x1080.png "NVIDIA NIM are easy to use, cloud-native microservices for inferencing") ### NVIDIA NIM NVIDIA NIM is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across the cloud, data center, and workstations. Supporting a wide range of AI models—including foundation models, LLMs, VLMs, and more—NIM ensures seamless, scalable AI inferencing, on-premises or in the cloud, using industry-standard APIs. [Explore NVIDIA NIM for Vision](https://build.nvidia.com/explore/vision) ### Metropolis Microservices Metropolis microservices provide powerful, customizable, cloud-native building blocks for developing vision AI agents, applications, and solutions. They’re built to run on NVIDIA cloud and data center GPUs, as well as the NVIDIA Jetson Orin™ edge AI platform. 
[Learn More](/metropolis-microservices) ![Use Metropolis microservices to develop vision AI applications.](https://developer.download.nvidia.com/images/metropolis/metropolis-key-visual-microservices-ref-app.jpg "Use Metropolis microservices to develop vision AI applications.") ![NVIDIA DeepStream SDK is a complete streaming analytics toolkit](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/deep-stream-sdk-630x354.jpg "NVIDIA DeepStream SDK is a complete streaming analytics toolkit") ### DeepStream SDK NVIDIA DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding. It’s ideal for vision AI developers, software partners, startups, and OEMs building IVA apps and services. DeepStream 8.0 will include multi-camera tracking and a low-code inference builder. [Learn More About DeepStream SDK](/deepstream-sdk) ## Augment Training With Simulation and Synthetic Data ### NVIDIA Omniverse NVIDIA Omniverse™ helps you integrate [OpenUSD](https://www.nvidia.com/en-us/omniverse/usd/), NVIDIA RTX™ rendering technologies, and generative [physical AI](https://www.nvidia.com/en-us/glossary/generative-physical-ai/) into existing software tools and simulation workflows to develop and test [digital twins](https://www.nvidia.com/en-us/omniverse/solutions/digital-twins/). You can use it with your own software for building AI-powered robot brains that drive robots, Metropolis perception from cameras, equipment, and more for continuous development, testing, and optimization. Omniverse Replicator makes it easier to generate physically accurate 3D synthetic data at scale, or build your own [synthetic data](https://nvidia.com/en-us/use-cases/synthetic-data/) tools and frameworks. Bootstrap perception AI model training and achieve accurate Sim2Real performance without having to manually curate and label real-world data. 
[Learn More About Omniverse Replicator](/nvidia-omniverse-platform/replicator) ![An autonomous mobile robot in action in a warehouse](https://developer.download.nvidia.com/images/metropolis/replicator-isaac-sim-630x354.jpg "An autonomous mobile robot in action in a warehouse") ![Use NVIDIA Cosmos generative world foundation models to build physical AI systems](https://developer.download.nvidia.com/images/metropolis/physical-ai-dataset-1902x1080.jpg "Use NVIDIA Cosmos generative world foundation models to build physical AI systems") ### NVIDIA Cosmos NVIDIA Cosmos™ is a platform of state-of-the-art generative [world foundation models](https://www.nvidia.com/en-us/glossary/world-models/?ncid=ref-dev-171762-vlm-jan-25) (WFMs), advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline. It's purpose-built to accelerate the development of [physical AI](https://www.nvidia.com/en-us/glossary/physical-ai/?ncid=ref-dev-171762-vlm-jan-25) systems. [Learn More About NVIDIA Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) ### NVIDIA Physical AI Dataset Unblock data bottlenecks with this open-source dataset for training vision AI applications to understand industrial facilities, smart cities, robots, and autonomous vehicle development. The unified collection is composed of validated data used to build NVIDIA physical AI solutions—now available for free to developers on Hugging Face. 
[Explore the NVIDIA Physical AI Dataset](https://huggingface.co/collections/nvidia/physicalai-67c643edbb024053dcbcd6d8) ![NVIDIA Physical AI dataset for smart spaces, robot, and autonomous vehicle development](https://developer.download.nvidia.com/images/metropolis/largest-synthetic-dataset-sfg.jpg "NVIDIA Physical AI dataset for smart spaces, robot, and autonomous vehicle development") ![NVIDIA Isaac SIM mimics industrial facilities](https://developer.download.nvidia.com/images/metropolis/nvidia-isaac-sim.jpg "NVIDIA Isaac SIM mimics industrial facilities") ### NVIDIA Isaac Sim Developers need training data that mimics what cameras would capture in complex, dynamic 3D spaces such as industrial facilities and smart cities. Action and Event Data Generation is a reference application on NVIDIA Isaac Sim™. It lets developers generate synthetic image and video data in a physically accurate virtual environment to train custom vision AI models. The application includes tools to simulate actors like humans and robots, create objects with domain randomization, and generate incident-based scenarios for various vision AI models. Use the VLM scene captioning tool to automatically generate image-caption pairs and accelerate the annotation process. [Get Started With Event and Actor Generation on Isaac Sim](https://docs.isaacsim.omniverse.nvidia.com/latest/action_and_event_data_generation/index.html) * * * ## Developer Resources ![Learn how to build a video search and summarization agent](https://developer.download.nvidia.com/images/metropolis/build-video-search%201902x1080.jpg "Learn how to build a video search and summarization agent") ### Build a Video Search and Summarization Agent Learn how to seamlessly build a video analytics AI agent using the NVIDIA AI Blueprint for video search and summarization (VSS).
Read the Blog: [Part 1](https://developer.nvidia.com/blog/build-a-video-search-and-summarization-agent-with-nvidia-ai-blueprint/) | [Part 2](https://developer.nvidia.com/blog/advance-video-analytics-ai-agents-using-the-nvidia-ai-blueprint-for-video-search-and-summarization/) ![A vision language model workflow](https://developer.download.nvidia.com/images/metropolis/vlm-reference-workflows-1902x1080.jpg "A vision language model workflow") ### VLM Reference Workflows Check out advanced workflows for building multimodal visual AI agents. [Read the Blog](/blog/build-multimodal-visual-ai-agents-powered-by-nvidia-nim/) ![Security alert on detecting a person without wearing a helmet](https://developer.download.nvidia.com/images/metropolis/vlm-prompt-guide-1902x1080.jpg "Security alert on detecting a person without wearing a helmet") ### VLM Prompt Guide Learn how to effectively prompt a VLM for single-image, multi-image, and video-understanding use cases. [Read the Blog](/blog/vision-language-model-prompt-engineering-guide-for-image-and-video-understanding/) ![Learn how to fine-tune NVIDIA Cosmos Reason](https://developer.download.nvidia.com/images/metropolis/nvidia-cosmos.jpg " Learn how to fine-tune NVIDIA Cosmos Reason ") ### Post-train NVIDIA Cosmos Reason Learn how to fine-tune NVIDIA Cosmos Reason VLM for physical AI and robotics. [Read the Blog](/blog/maximize-robotics-performance-by-post-training-nvidia-cosmos-reason) [View all Metropolis technical blogs](/blog/tag/metropolis/) ## Explore NVIDIA GTC Talks On-Demand Develop, deploy, and scale AI-enabled video analytics applications with NVIDIA Metropolis. 
[Get Started](https://build.nvidia.com/explore/vision) --- # Source: https://developer.nvidia.com/morpheus-cybersecurity.md # NVIDIA Morpheus [NVIDIA Morpheus](https://www.nvidia.com/en-us/ai-data-science/products/morpheus/ "NVIDIA Morpheus") is a GPU-accelerated, end-to-end AI framework that enables developers to create optimized applications for filtering, processing, and classifying large volumes of streaming cybersecurity data. Morpheus incorporates AI to reduce the time and cost associated with identifying, capturing, and acting on threats, bringing a new level of security to the data center, cloud, and edge. Morpheus also extends human analysts’ capabilities with generative AI by automating real-time analysis and responses, producing synthetic data to train AI models that identify risks accurately, and running what-if scenarios. [Access GitHub Repo](https://github.com/nv-morpheus/Morpheus "GitHub Repo")[Download Workflows](https://catalog.ngc.nvidia.com/collections?filters=platform%7CMorpheus%7Cpltfm_morpheus&orderBy=weightPopularDESC&query=&page=&pageSize= "Download Workflows")[Documentation](https://docs.nvidia.com/morpheus/index.html "Documentation")[Forum](https://forums.developer.nvidia.com/c/ai-data-science/cybersecurity/455 "Forum") ## Ways to Get Started With NVIDIA Morpheus [Use](https://www.nvidia.com/en-us/ai-data-science/products/morpheus/get-started/ "Tools to build and deploy") the right tools to build and deploy AI-based cybersecurity solutions. ![Decorative image representing development access code](https://developer.download.nvidia.com/images/morpheus/m48-accelerate-computing-with-cuda-c-c-256px-blk.png) ### Access Code for Development Morpheus is available as open-source software on GitHub for developers who want to use the latest pre-release features and build from source.
[Access Code on GitHub](https://github.com/nv-morpheus/Morpheus "Access code on GitHub") ![Decorative image representing development workflows for download](https://developer.download.nvidia.com/images/morpheus/m48-download-256px-blk.png) ### Download Workflows for Development The [digital fingerprinting](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/digital-fingerprinting/ "Digital Fingerprinting") threat detection and [spear phishing](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/spear-phishing/ "Spear phishing") detection AI workflows include AI frameworks and resources, such as Helm charts, Jupyter Notebooks, and documentation to give developers a jump-start when building AI solutions. [Download Workflows from NGC](https://catalog.ngc.nvidia.com/collections?filters=platform|pltfm_morpheus|Morpheus&orderBy=weightPopularDESC&query= "Download Workflows NGC") [Try the Workflows on LaunchPad](https://www.nvidia.com/en-us/launchpad/cybersecurity/ "Try the Workflows") ![Decorative image representing NVIDIA AI Enterprise for production on cloud](https://developer.download.nvidia.com/images/morpheus/m48-configuration-sdk-256px-blk.png) ### Purchase NVIDIA AI Enterprise for Production Get unlimited usage on all clouds, access to NVIDIA AI experts, and long-term support for production deployments with a purchase of NVIDIA AI Enterprise. [Contact Us About Purchasing Morpheus](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/contact-sales/ "Contact Us") [Apply for a 90-Day NVIDIA AI Enterprise Evaluation License](https://enterpriseproductregistration.nvidia.com/?LicType=EVAL&ProductFamily=NVAIEnterprise "90-Day NVIDIA AI License") * * * ## Introductory Resources ### Getting Started Guide Learn three ways to get started with Morpheus, including how to use a pre-built Docker container, how to build the Morpheus Docker container, and how to build Morpheus from source.
[Get Started](https://docs.nvidia.com/morpheus/getting_started.html "Get Started") ### Technical Briefs Explore the training and inference pipelines, components, prerequisites, and more for working with the cybersecurity AI workflows. [Read Now](https://docs.nvidia.com/ai-enterprise/workflows-cybersecurity-ai/0.1.0/index.html "Get Started") ### Examples Take a deeper dive into some of the concepts in Morpheus, including anomalous behavior profiling with the [RAPIDS Forest Inference Library (FIL)](https://developer.nvidia.com/blog/sparse-forests-with-fil/ "RAPIDS FIL"), abnormal behavioral profiling detection, graph neural networks (GNN) fraud detection, and more. [Get Started](https://docs.nvidia.com/morpheus/examples.html "Get Started") ### Introductory Blog Explore three use cases that demonstrate how generative AI and large language models (LLMs) improve cybersecurity, as well as three examples for applying AI foundation models for cybersecurity. [Read Blog](https://developer.nvidia.com/blog/?p=73728 "Read Blog") ### Technical Session Watch NVIDIA Director of Cybersecurity Engineering, Bartley Richardson, as he discusses the power of generative AI in helping businesses detect threats faster, fill data gaps, and enrich models with synthetic training data. [Watch Now](https://resources.nvidia.com/en-us-morpheus-developer/ai-powered-cybersecurity?lx=Wr-Hxo "Watch Now") ### GTC Session Explore the latest developments in Morpheus and see how you can take advantage of the tools, frameworks, pre-trained models, and infrastructure optimization software for building solutions that tackle cybersecurity challenges. [Watch Now ](https://resources.nvidia.com/l/en-us-gtcf22-morpheus?contentType=session "Watch Now") * * * ## Content Kits Access technical content on cybersecurity topics such as spear phishing detection and digital fingerprinting. ### Spear Phishing Detection Spear phishing is one of the largest and most costly cyber threats to organizations. 
Customized for individuals and usually very convincing, spear phishing emails are difficult to defend against due to a lack of training data. This kit will take you through spear phishing detection built around Morpheus and demonstrate how to use the content. - [Technical Brief](https://docs.nvidia.com/ai-enterprise/workflows-cybersecurity-ai/0.1.0/spear-phishing/technical-brief.html "Technical Brief") - [Example](https://github.com/nv-morpheus/Morpheus/tree/branch-24.03/docs/source/modules/examples/spear_phishing "Example") - [Modules](https://docs.nvidia.com/morpheus/modules/index.html#spear-phishing-modules "Modules") - [Developer Guide](https://github.com/nv-morpheus/Morpheus/tree/branch-24.03/examples/developer_guide/2_1_real_world_phishing "Developer Guide") - [Demo](https://resources.nvidia.com/en-us-morpheus-developer/detection-demo?lx=Wr-Hxo "Demo") ### Digital Fingerprinting As the number of connected users and devices expands, enterprises are generating more data than they can collect and analyze, creating a cybersecurity challenge. Use this kit to build a fingerprint of every user on the network to better detect anomalies. 
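The idea behind digital fingerprinting can be sketched in a few lines of plain Python (a toy illustration, not the Morpheus API or its models): learn each user's baseline behavior from historical activity, then flag events that deviate sharply from that user's own norm.

```python
import statistics

# Toy illustration of digital fingerprinting (not the Morpheus API):
# learn a per-user baseline from historical activity, then flag events
# that deviate strongly from that user's own norm.
def fit_fingerprint(history):
    """history: list of per-event feature values (e.g. bytes transferred)."""
    return {"mean": statistics.mean(history),
            "stdev": statistics.pstdev(history) or 1.0}

def is_anomalous(fingerprint, value, threshold=3.0):
    z = abs(value - fingerprint["mean"]) / fingerprint["stdev"]
    return z > threshold

fp = fit_fingerprint([100, 120, 110, 90, 105])  # a user's normal activity
normal = is_anomalous(fp, 115)      # within the user's norm
suspicious = is_anomalous(fp, 900)  # far outside it
```

The production workflow replaces this toy z-score with learned models per user, but the principle is the same: anomalies are defined relative to each individual's fingerprint rather than a global threshold.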
- [Technical Brief](https://resources.nvidia.com/en-us-digital-fingerprinting-workflow/cybersecurity-technical-brief "Technical Brief") - [Example](https://github.com/nv-morpheus/Morpheus/tree/branch-24.03/examples/digital_fingerprinting "Example") - [Modules](https://docs.nvidia.com/morpheus/modules/index.html#digital-fingerprinting-modules "Modules") - [Reference Guide](https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html "Reference Guide") - [Developer Guide](https://docs.nvidia.com/morpheus/developer_guide/guides/5_digital_fingerprinting.html "Developer Guide") - [GTC Session](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51406/ "GTC Session") - [Demo](https://resources.nvidia.com/en-us-morpheus-developer/cyberscurity-pinpoints-threats-faster-demo?lx=Wr-Hxo "Demo") * * * ## Hands-On Labs, Self-Paced Training, and Instructor-Led Workshops Are you looking to test, prototype, and deploy your own applications and models? NVIDIA LaunchPad provides free access to NVIDIA hardware and software, so you can experience the power of cybersecurity through guided, hands-on labs. Plus, if you’re looking to grow your skills or accelerate your team’s development, NVIDIA Training offers instructor-led workshops and self-paced courses to get you started. ![Learn how to build AI-based cybersecurity solutions](https://developer.download.nvidia.com/images/morpheus/build-ai-based-cybersecurity-solutions-launch-pad-lab.jpg) ### Build AI-Based Cybersecurity Solutions LaunchPad Lab In this free hands-on lab, you’ll find use case examples for digital fingerprinting, analyzing the behavior of every user and machine across the network, detecting and classifying phishing emails, profiling anomalous behavior, and more.
[Apply Now](https://www.nvidia.com/en-us/launchpad/ai/build-ai-based-cybersecurity-solutions/ "Apply Now") ![Learn how to detect spear phishing using Generative AI](https://developer.download.nvidia.com/images/morpheus/spear-phishing-detection.jpg) ### Spear Phishing Detection Using Generative AI LaunchPad Lab Have an upcoming cybersecurity AI project? Apply to get hands-on experience using the NVIDIA AI workflow so you can build a faster, more accurate spear phishing detection solution. [Apply Now](https://www.nvidia.com/en-us/launchpad/ai/spear-phishing-detection/ "Apply Now") ![Learn how to detect cyber threats using digital fingerprinting](https://developer.download.nvidia.com/images/morpheus/morpheus-dfp-gif.gif) ### Digital Fingerprinting to Detect Cyber Threats LaunchPad Lab Identify and react to cybersecurity threats faster with AI-based threat detection and alert prioritization. Access a step-by-step guided lab for insider threat detection with ready-to-use software, sample data, and applications. [Apply Now](https://www.nvidia.com/en-us/launchpad/ai/digital-fingerprinting-to-detect-cyber-threats/ "Apply Now") ![Learn how to develop and deploy NVIDIA digital fingerprinting with Morpheus](https://developer.download.nvidia.com/images/morpheus/digital-fingerprinting-with-morpheus-dli-course-2.jpg) ### Digital Fingerprinting with Morpheus DLI Course In this course, you’ll get hands-on experience developing and deploying the NVIDIA digital fingerprinting AI workflow that enables 100 percent data visibility and drastically reduces the time to detect threats. You’ll also hear from cybersecurity experts from a variety of institutions about how to use NVIDIA AI frameworks and tools to architect cybersecurity solutions. 
[Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+T-DS-02+V2/ "Enroll Now") ![Learn how to build AI-based cybersecurity pipelines](https://developer.download.nvidia.com/images/morpheus/building-ai-based-cybersecurity-pipelines.jpg) ### Building AI-Based Cybersecurity Pipelines Instructor Led Workshop Build Morpheus pipelines to process and perform AI-based inference on massive amounts of data for cybersecurity use cases—in real time. Utilize several AI models with a variety of data input types for tasks like sensitive information detection, anomalous behavior profiling, and digital fingerprinting. [Enroll Now](https://www.nvidia.com/en-us/training/instructor-led-workshops/building-ai-based-cybersecurity-pipelines/ "Enroll Now") ![Learn how to customize cybersecurity AI workflows](https://developer.download.nvidia.com/images/morpheus/customize-cybersecurity-workflows-launch-pad-lab.jpg) ### Customize Cybersecurity Workflows LaunchPad Lab In this hands-on lab, design and optimize end-to-end cybersecurity AI workflows for production deployments of custom use cases using the NVIDIA Morpheus SDK. Simplify deployment of cybersecurity AI models using NVIDIA Triton™ Inference Server and its optimization utilities for accelerated inference. [Apply Now](https://www.nvidia.com/en-us/launchpad/ai/customized-cybersecurity-ai-applications/ "Enroll Now") Explore the online community for NVIDIA Morpheus, browse how-to questions and best practices, engage with other developers, and report bugs. [Join Now](https://forums.developer.nvidia.com/c/ai-data-science/cybersecurity/455 "Join Now") --- # Source: https://developer.nvidia.com/nccl.md 1. [Cloud and Data Center](/aerial) [Networking](/networking) NCCL # NVIDIA Collective Communications Library (NCCL) The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. 
[Download NCCL](https://developer.nvidia.com/nccl/nccl-download "Download")[Documentation](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/overview.html# "Documentation")[Release Notes ](https://docs.nvidia.com/deeplearning/nccl/release-notes/index.html "Release Notes")[GitHub](https://github.com/NVIDIA/nccl "GitHub")[NCCL API Guide](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api.html "NCCL API Guide") * * * ## How NCCL Works NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter, and point-to-point send and receive. These routines are optimized to achieve high bandwidth and low latency over PCIe, NVIDIA NVLink™, and other high-speed interconnects within a node and over NVIDIA networking across nodes. With its single-kernel implementation of communication and computation, NCCL ensures low-latency synchronization, making it ideal for both distributed training and real-time inference scenarios. Developers can scale across nodes without tuning for specific hardware configurations, thanks to NCCL's dynamic topology detection and streamlined C-based API. NCCL can be built and installed through [GitHub](https://github.com/NVIDIA/nccl). NCCL is also available for download as part of the [NVIDIA HPC SDK](/hpc-sdk) and through binaries on the [NVIDIA developer zone](/nccl/nccl-download). ![This is how NVIDIA Collective Communication Library (NCCL) works](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/NCCL_1GPU_multiGPU.png) ### Performance NCCL removes the need for developers to optimize their applications for specific machines. NCCL provides fast collectives over multiple GPUs both within and across nodes. ### Ease of Programming NCCL uses a simple C API that can be easily accessed from a variety of programming languages. NCCL closely follows the popular collectives API defined by Message Passing Interface (MPI).
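The collective semantics are easy to state precisely: after an all-reduce, every rank holds the elementwise reduction (here, a sum) of all ranks' input buffers. A pure-Python sketch of that contract (illustrative only; NCCL itself performs the reduction on GPU buffers through its C API):

```python
# Pure-Python sketch of what an all-reduce computes: every rank ends up
# with the elementwise sum of all ranks' input buffers. NCCL performs the
# same reduction on GPU memory over NVLink/PCIe/network links.
def all_reduce_sum(rank_buffers):
    """rank_buffers: list of equal-length lists, one per rank."""
    reduced = [sum(vals) for vals in zip(*rank_buffers)]
    # Every rank receives a copy of the reduced result.
    return [list(reduced) for _ in rank_buffers]

# Example: 3 "ranks", 4 elements each.
buffers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
result = all_reduce_sum(buffers)
# Each rank now holds [111, 222, 333, 444].
```

In a real deployment this reduction is what a call like `ncclAllReduce` performs in a single kernel, overlapped with transfers across the interconnect topology NCCL detects.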
### Compatibility NCCL is compatible with any multi-GPU parallelization model, including single-threaded, multi-threaded (using one thread per GPU), and multi-process (MPI combined with multi-threaded operation on GPUs). * * * ## Key Features - High-performance collective and point-to-point communication for faster multi-GPU and multi-node training - Device APIs that enable communication directly from CUDA kernels, unlocking lower latency and better compute–communication overlap - Automatic topology detection across PCIe, NVLink™, NVSwitch™, InfiniBand, RoCE, and other networks to maximize performance - Advanced graph search algorithms that build the most efficient rings and trees for peak bandwidth and minimal latency - Flexible plugin framework that extends NCCL to custom transports and next-generation interconnects - Full support for multi-threaded, multi-process, and MPI-driven distributed applications - Integrated profiling, reliability, and observability tools like NCCL RAS and NCCL Inspector to accelerate debugging and performance tuning * * * ## NCCL Blogs * * * ## More Resources Check out the following videos presented by our NCCL team to learn more. - [GTC Webinar](https://www.nvidia.com/en-us/on-demand/session/gtc25-s72583/) - [Multi-GPU Programming in NCCL and NVSHMEM](https://www.youtube.com/live/2xMzQ1Z2Qe0) - [GPU Communication Libraries Tutorial](https://www.youtube.com/watch?v=rlA5QreHekk&list=PLBM5Lly_T4yRGBFgforeMTDpjasC_PV7r&index=32) Learn more about related libraries and software. 
- [NVIDIA HPC SDK](/hpc-sdk) - [cuDNN](/cudnn) - [cuBLAS](/cublas) - [NVIDIA DALI®](/dali) - [NVIDIA NGC™](https://ngc.nvidia.com/) - [NVIDIA Magnum IO™](https://www.nvidia.com/en-us/data-center/magnum-io/) ![Decorative image representing Developer Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Submit a Bug, RFE, or Question ![Decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program Get started with NCCL today. [Download NCCL](/nccl/nccl-download "Download NCCL") --- # Source: https://developer.nvidia.com/nemo-agent-toolkit.md 1. [Topics](/topics) [AI](/topics/ai) 2. [Generative AI](/topics/ai/generative-ai) NeMo Agent Toolkit # NVIDIA NeMo Agent Toolkit NVIDIA NeMo™ Agent Toolkit is an open-source AI framework for building, profiling, and optimizing agents and tools from any framework, enabling unified, cross-framework integration across connected [AI agent](https://www.nvidia.com/en-us/glossary/ai-agents/) systems. By exposing hidden bottlenecks and costs and optimizing the workflow, it helps enterprises scale [agentic systems](https://www.nvidia.com/en-us/glossary/multi-agent-systems/) efficiently while maintaining reliability. NeMo Agent Toolkit is part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for managing the AI agent lifecycle, providing telemetry, orchestration, and observability tools that accelerate development, uncover bottlenecks, and streamline performance across multi-agent systems. 
[Access GitHub](https://github.com/NVIDIA/NeMo-Agent-Toolkit)[Documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/index.html)[Forum](https://forums.developer.nvidia.com/tags/c/ai-data-science/nvidia-nemo/715/nemo-agent-toolkit) * * * ## See NeMo Agent Toolkit in Action

- [**Create Your Own AI Agent**](https://www.youtube.com/watch?v=NsogD7UhZ4Q) (18:31)
- [**Benchmarking and Optimizing AI Agents**](https://www.youtube.com/watch?v=CpuOJwVFmCg&t) (11:04)
- [**How To Develop Teams of AI Agents**](https://www.youtube.com/watch?v=H65OluZaiZQ) (23:06)
- [**Optimize Your AI Agent Workflows**](https://www.youtube.com/watch?v=yrqdvBLAI3k) (05:29)

* * * ## How NeMo Agent Toolkit Works NVIDIA NeMo Agent Toolkit provides unified monitoring and optimization for AI agent systems, working across LangChain, CrewAI, and custom frameworks. It captures granular metrics on cross-agent coordination, tool usage efficiency, and computational costs, enabling data-driven optimizations through NVIDIA Accelerated Computing. It can be used to parallelize slow workflows, cache expensive operations, and maintain and evaluate system accuracy quickly. Compatible with OpenTelemetry and major agent frameworks, the toolkit reduces cloud spend and enhances performance while providing insights to scale from single agents to enterprise-grade digital workforces. NeMo Agent Toolkit supports the Model Context Protocol (MCP), enabling developers to use the toolkit to access tools served by remote MCP servers, or as a server to make their own tools available to others via MCP. This means agents built with the toolkit can easily use any tool registered in an MCP registry. ### Simplify Development Experiment and prototype new agentic AI applications quickly and easily with the toolkit’s YAML configuration builder. With universal descriptors for agents, tools, and workflows, you can flexibly choose and connect agent frameworks best suited to each task in a workflow.
Access a reusable collection of tools, pipelines, and agentic workflows to ease the development of agentic AI systems. ### Accelerate Development and Improve Reliability Build agentic systems with ease and repeatability. In the tool registry, access the best [retrieval-augmented generation (RAG)](https://www.nvidia.com/en-us/glossary/retrieval-augmented-generation/) architectures, workflows, and search tools available across your organization, or leverage the [AI-Q NVIDIA Blueprint](https://build.nvidia.com/nvidia/aiq), built with NVIDIA NIM™ and NeMo. With the AI-Q blueprint, developers have an example for building highly accurate, scalable multimodal ingestion and [RAG](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline?ncid=pa-srch-goog-898408-API-Build-Exact) pipelines that connect AI agents to enterprise data and reasoning for various use cases, including AI agents for research and reporting. ### Streamline Agent Optimization Use the Agent Hyperparameter Optimizer to automatically select the best hyperparameters, such as the type of large language model (LLM), its temperature, and max\_token, by optimizing the agentic workflow on metrics including accuracy, groundedness, latency, token counts, or custom metrics. The toolkit also supports prompt optimization to further refine agent performance. This allows developers to quickly identify optimal settings for agents, tools, and workflows, reducing trial and error while accelerating innovation across projects. ### Accelerate Agent Responses Use fine-grained telemetry to enhance agentic AI workflows. This profiling data can be used by [NVIDIA NIM](https://www.nvidia.com/en-us/ai-data-science/products/nim-microservices/?ncid=pa-srch-goog-157409-prsp) and [NVIDIA Dynamo](/dynamo) to optimize the performance of agentic systems.
These forecasted metrics—which can include details about an inference call to an LLM for a particular agent, such as what prompt is in memory, where it might reside, and which other agents are likely to call it—can be used to drive a more efficient workflow, enabling better business outcomes without requiring an upgrade to underlying infrastructure. ### Increase Accuracy Evaluate an agentic system’s accuracy using metrics collected with NeMo Agent Toolkit, and connect them with your preferred observability and orchestration tools. Understand and debug inputs and outputs for each component in an agentic workflow, and identify areas for improvement. Swap out tools or models and use the toolkit’s YAML config to quickly reevaluate the pipeline to understand its impact. ![A flowchart showing how NeMo Agent Toolkit works](https://developer.download.nvidia.com/images/in-action-diagram.jpg) ### Introductory Blog Learn how to leverage AI code generation with NeMo Agent Toolkit to build a test-driven coding agent. [Read Blog](https://developer.nvidia.com/blog/improve-ai-code-generation-using-nvidia-agentiq-open-source-toolkit/) ### Introductory Video Watch a video walk-through to see how you can get started with NeMo Agent Toolkit. [Watch Video](https://www.youtube.com/watch?v=H65OluZaiZQ) ### Tutorial Blog Take a technical deep dive to learn how to extend the toolkit by adding integration with an additional agentic framework, such as Agno. [Read Blog](https://developer.nvidia.com/blog/extending-the-nvidia-agent-intelligence-toolkit-to-support-new-agentic-frameworks/) ### Notebooks Through this series of notebooks, we demonstrate how you can use NeMo Agent Toolkit to build, connect, evaluate, profile, and deploy an agentic system. 
[Explore Notebooks](https://github.com/NVIDIA/NeMo-Agent-Toolkit/tree/develop/examples/notebooks) * * * ## Get Started With NeMo Agent Toolkit ### Quick Install With Pip (Recommended)

```shell
pip install nvidia-nat

# Verify the library installation:
nat --help
nat --version
```

### Local Installer Instructions (For Running Examples)

```shell
# Clone the repo:
git clone -b main git@github.com:NVIDIA/NeMo-Agent-Toolkit.git nemo-agent-toolkit
cd nemo-agent-toolkit

# Initialize the Git submodules:
git submodule update --init --recursive

# Download the datasets:
git lfs install
git lfs fetch
git lfs pull

# Create a Python environment and install dependencies:
uv venv --python 3.12 --seed .venv
source .venv/bin/activate
uv sync --all-groups --all-extras

# Verify the library installation:
nat --help
nat --version
```

**Note:** The instructions above require uv. If you don't have it installed, get started [here](https://docs.astral.sh/uv/getting-started/installation/). * * * ## Starter Kits Start developing agentic AI applications with NeMo Agent Toolkit using tutorials, best practices, and documentation. The [AI-Q NVIDIA Blueprint](https://blogs.nvidia.com/ai-agents-blueprint/) showcases [examples](https://github.com/NVIDIA-AI-Blueprints/aiq-research-assistant) for building agentic workflows that use the toolkit. #### Getting Started With NeMo Agent Toolkit Access the toolkit documentation, and start building, connecting, and evaluating agentic AI systems.
- [Get Quick-Start Setup Guide](https://docs.nvidia.com/nemo/agent-toolkit/latest/quick-start/installing.html) - [Read Troubleshooting Guide](https://docs.nvidia.com/nemo/agent-toolkit/latest/troubleshooting.html) - [Read Release Notes](https://docs.nvidia.com/nemo/agent-toolkit/latest/release-notes.html) * * * ## NeMo Agent Toolkit Learning Library Tech Blog ### How To Scale Your LangGraph Agents **NVIDIA NeMo Agent Toolkit** This post will cover the tools and techniques from NVIDIA NeMo Agent Toolkit that can be used to deploy and scale an agentic AI application into production. Video ### Benchmarking and Optimizing AI Agents **NVIDIA NeMo Agent Toolkit** In this step-by-step tutorial, we show you how to get started using the NeMo Agent Toolkit test time compute module with searching, editing, scoring, and selection. Video ### Connect MCP Tools and NVIDIA NIM for Building Optimized Agentic Systems **NVIDIA NeMo Agent Toolkit** Discover how the latest release of NVIDIA NeMo Agent Toolkit streamlines multi-agent interoperability through deep Model Context Protocol (MCP) integration. Video ### How To Build Custom AI Agents **NVIDIA NeMo Agent Toolkit** Learn how to use the toolkit to build custom AI agents and add advanced AI capabilities into your projects. Tech Blog ### Extending NeMo Agent Toolkit To Support New Agentic Frameworks **NVIDIA NeMo Agent Toolkit** Take a technical deep dive to learn how to extend the toolkit by adding integration with an additional agentic framework, such as Agno. Video ### How To Develop Teams of AI Agents **NVIDIA NeMo Agent Toolkit** Learn how to use the NeMo Agent Toolkit Python library to build agentic AI applications in this step-by-step tutorial video. Video ### Optimize Your AI Agent Workflows **NVIDIA NeMo Agent Toolkit** Learn how to use the toolkit profiler to get deeper insights into the performance and behavioral characteristics of your AI agent workflows.
Tech Blog ### Improving AI Code Generation **NVIDIA NeMo Agent Toolkit, USD, Cosmos** Learn how to leverage AI code generation with the toolkit to build a test-driven coding agent. Documentation ### NeMo Agent Toolkit Documentation **NVIDIA NeMo Agent Toolkit** Read a troubleshooting guide, release notes, quick-start guide, and more to get started. Guide ### Building Multi-Agent Systems the Easy Way **NVIDIA NeMo Agent Toolkit** Read a hands-on guide to using the toolkit, including what you can build, what’s under the hood, and more, published by The BIG DATA guy. Tech Blog ### Scaling Synthetic Data Generation With Multi-Agent AI **NVIDIA NeMo Agent Toolkit, USD, NVIDIA Cosmos™** Learn about a multi-agent approach utilizing generative AI for the systematic, automated creation of top-tier synthetic datasets to advance physical AI development and deployment. Tech Blog ### Chat With Your Enterprise Data Through Open-Source AI-Q NVIDIA Blueprint **NVIDIA NeMo Agent Toolkit** Read how you can get started with AI-Q, a free reference implementation for building advanced AI agents. * * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Read the FAQ ## Ethical AI NVIDIA believes trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure their model meets the requirements for the relevant industry and use case and addresses unforeseen product misuse. 
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ## Get started with NeMo Agent Toolkit today. [Access GitHub](http://github.com/NVIDIA/AgentIQ) --- # Source: https://developer.nvidia.com/nemo-curator.md 1. [Topics](https://developer.nvidia.com/topics/) [AI](https://developer.nvidia.com/topics/ai) 2. [Generative AI](https://developer.nvidia.com/generative-ai) NVIDIA NeMo Curator # NVIDIA NeMo Curator for Developers [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/)™ Curator improves generative AI model accuracy by processing text, image, and video data at scale for training and customization. It also provides prebuilt pipelines for generating synthetic data to customize and evaluate generative AI systems. With NeMo Curator, part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for managing the AI agent lifecycle, developers can curate high-quality data and train highly accurate generative AI models for various industries, including finance, retail, manufacturing, and telecommunications. NeMo Curator, along with [NeMo microservices](https://developer.nvidia.com/blog/maximize-ai-agent-performance-with-data-flywheels-using-nvidia-nemo-microservices/), enables developers to create [data flywheels](https://www.nvidia.com/en-us/glossary/data-flywheel/) and continuously optimize generative AI agents, enhancing the overall experience for end users.
[Download](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator "Download")[Documentation](https://docs.nvidia.com/nemo/curator/latest/ "Documentation")[Forum](https://forums.developer.nvidia.com/tags/c/ai-data-science/nvidia-nemo/715/nemo-curator "Forum") * * * ## How NVIDIA NeMo Curator Works NeMo Curator streamlines data-processing tasks such as data downloading, extraction, cleaning, quality filtering, deduplication, and blending or shuffling, exposing them as Pythonic APIs so developers can easily build data-processing pipelines. The high-quality data that NeMo Curator produces enables you to achieve higher accuracy with less data and faster model convergence, reducing training time. NeMo Curator supports the processing of text, image, and video modalities and can scale up to 100+ PB of data. NeMo Curator provides a customizable and modular interface, allowing you to select the building blocks for your data-processing pipelines. Refer to the architecture diagrams below to see how you can build data-processing pipelines. ### Text Data Processing This architecture diagram shows the various features available for processing text. At a high level, a typical text-processing pipeline begins with downloading data from public sources or private repositories and performing cleaning steps, such as fixing Unicode characters. Next, heuristic filters—such as word count—are applied, followed by deduplication, advanced quality filtering using [classifier models](https://huggingface.co/collections/nvidia/nemo-curator-classifier-models-66b25154213dafdcb8bde900) for quality and domain, and finally, data blending. 
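The heuristic-filtering and deduplication steps just described can be sketched in a few lines of plain Python. This is a concept illustration only, not the NeMo Curator API, which provides these steps as prebuilt, GPU-accelerated building blocks:

```python
import hashlib

def word_count_filter(doc: str, min_words: int = 50, max_words: int = 100_000) -> bool:
    """Heuristic filter: keep documents whose word count falls in a range."""
    n = len(doc.split())
    return min_words <= n <= max_words

def exact_dedup(docs: list[str]) -> list[str]:
    """Drop exact duplicates by hashing normalized text."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.md5(doc.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "a short line",            # fails the word-count filter
    " ".join(["token"] * 60),  # passes
    " ".join(["token"] * 60),  # exact duplicate, removed by dedup
]
curated = exact_dedup([d for d in corpus if word_count_filter(d)])
print(len(curated))  # 1
```

A real pipeline would chain many such stages (cleaning, classifier-based quality filtering, blending) over sharded datasets rather than an in-memory list.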
[![NeMo Curator lets you use prebuilt synthetic data generation pipelines or build your own with easy-to-use set of tools](https://developer.download.nvidia.com/images/nemo-curator-sdg-general-arch.png)](https://developer.download.nvidia.com/images/nemo-curator-sdg-general-arch.png) _Click to Enlarge_ ### Synthetic Data Generation NeMo Curator has a simple, easy-to-use set of tools that let you use pre-built [synthetic data generation](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/syntheticdata.html) pipelines or build your own. Any model inference service that uses the OpenAI API is compatible with the synthetic data generation module, allowing you to generate your data from any model. NeMo Curator provides pre-built pipelines for several use cases to help you get started easily, including evaluating and customizing embedding models, prompt generation (open Q&A, closed Q&A, writing, math/coding), synthetic two-turn prompt generation, dialogue generation, and entity classification. ### Video Data Processing This architecture diagram illustrates the various features available through the early access program for processing high-quality videos. A typical pipeline has the following steps: - **Video decoding and splitting**: Decode long videos and split them into shorter, semantically coherent clips. - **Transcoding**: Convert all the short videos to a consistent format. - **Captioning**: Caption videos using domain-specific, state-of-the-art vision language models (VLMs) to describe the clips in detail. - **Text embedding**: Create embeddings of text captions for downstream semantic search and deduplication. 
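The final step above, embedding captions for semantic search and deduplication, can be illustrated with a small self-contained sketch. The toy 2-D vectors stand in for real text-embedding model outputs, and the greedy threshold rule is a simplification of production semantic deduplication:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two (nonzero) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_dedup(embeddings: list[list[float]], threshold: float = 0.95) -> list[int]:
    """Greedily keep a clip only if it is not near-identical to any clip already kept."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# toy caption embeddings: clips 0 and 1 are near-duplicates, clip 2 is distinct
clips = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
print(semantic_dedup(clips))  # [0, 2]
```

At scale, this pairwise comparison is replaced with clustering or approximate nearest-neighbor search over the embedding space.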
[![NeMo Curator supports the processing of video modalities](https://developer.download.nvidia.com/images/nemo-curator/nemo-video-data-processing.jpg)](https://developer.download.nvidia.com/images/nemo-curator/nemo-video-data-processing.jpg) _Click to Enlarge_ ### Audio Data Processing This architecture diagram shows the various features available for processing audio. A typical pipeline has the following steps: - **Data download and extraction**: Fetch audio files from cloud, internet, or local disk sources. - **Speech-to-text inference**: Transcribe audio with a NeMo ASR model, using GPU acceleration for speed. - **Metric calculation (WER)**: Compute Word Error Rate to assess transcription accuracy. - **Get audio duration**: Extract duration metadata for each file. - **Feature-based filtering**: Filter samples by WER and duration thresholds. - **Metadata conversion**: Transform curated outputs to document format and export as JSONL. ### Image Data Processing This architecture diagram shows the various features available for processing images. A typical pipeline begins with downloading the dataset in WebDataset format, followed by creating CLIP embeddings. Next, the images are filtered for high quality using the NSFW and aesthetic filters. Duplicate images are then removed using semantic deduplication, and finally, a high-quality dataset is created. * * * ## Introductory Resources ### Introductory Blog Learn about the various features NeMo Curator offers for processing high-quality data in this introductory blog. [Read Blog](https://developer.nvidia.com/blog/scale-and-curate-high-quality-datasets-for-llm-training-with-nemo-curator/) ### Tutorials These tutorials provide the coding foundation for building applications that consume the data that NeMo Curator curates. 
[Explore the Notebooks](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials) ### Introductory Webinar Explore how to easily build scalable data-processing pipelines to create high-quality datasets for training and customization. [Register Now](https://www.nvidia.com/en-us/events/enhance-generative-ai-model-accuracy/) ### Documentation These docs provide an in-depth overview of the various features supported, best practices, and tutorials. [Read Documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/index.html) * * * ## Ways to Get Started With NVIDIA NeMo Curator Use the right tools and technologies to generate high-quality datasets for large language model (LLM) training. ![Decorative icon](https://developer.download.nvidia.com/icons/m48-download.svg) ### Download For those looking to use the NeMo framework for development, the container is available to download for free on the NVIDIA NGC™ catalog. [Pull Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) ![Decorative icon representing source code](https://developer.download.nvidia.com/images/icons/m48-coding-256px-blk.png) ### Access To use the latest pre-release features and source code, NeMo Curator is available as an open-source project on GitHub. [Access Code](https://github.com/NVIDIA/NeMo-Curator) * * * ## Performance NeMo Curator leverages NVIDIA RAPIDS™ libraries like cuDF, cuML, and cuGraph along with Ray to scale workloads across multi-node, multi-GPU environments, significantly reducing data processing time. For video processing, it uses a combination of a hardware decoder (NVDEC) and a hardware encoder (NVENC) as well as Ray to avoid bottlenecks and ensure high performance. With NeMo Curator, developers can achieve 16x faster processing for text and 89x faster processing for video when compared to alternatives. Refer to the charts below for more details. 
### Accelerate Video Processing From Years to Days With NeMo Curator Processing time for 20 million hours of video. ![](https://developer.download.nvidia.com/images/nemo-curator/accelerate-video-processing@2x.svg) \* Performance compared at iso power consumption on 2,000 CPUs and 128 DGX nodes ### 16x Faster Text Processing Time With NeMo Curator Processing time for fuzzy deduplication of the RedPajama-v2 subset (8 TB). ![](https://developer.download.nvidia.com/images/nemo-curator/16x-faster-text-processing-time@2x.svg) **“On”**: Data processed with NeMo Curator. **“Off”**: Data processed with a leading alternative library on CPUs. * * * ## Starter Kits Start developing your generative AI application with NeMo Curator by accessing [tutorials](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials), [best practices](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/bestpractices.html#data-curator-best-practices), and [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/index.html) for various use cases. ### Text Processing Process high-quality text data with features such as deduplication, quality filtering, and synthetic data generation. - [Read Processing Custom Datasets for LLM Training Blog](https://developer.nvidia.com/blog/curating-custom-datasets-for-llm-training-with-nvidia-nemo-curator/) - [Read Processing Non-English Datasets Blog](https://developer.nvidia.com/blog/curating-non-english-datasets-for-llm-training-with-nvidia-nemo-curator/) - [Explore the Synthetic Data Generation With Llama 3.1-405B Tutorial](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/synthetic-preference-data) ### Image Processing Process high-quality image data with features such as semantic deduplication, CLIP image embedding, NSFW, and aesthetic filters. 
- [Explore Image Curation Tutorial](https://github.com/NVIDIA/NeMo-Curator/blob/main/tutorials/image-curation/image-curation.ipynb) - [Read Image Curation Documentation](https://github.com/NVIDIA/NeMo-Curator/tree/main/docs/user-guide/image) ### Audio Processing Process high-quality audio data with features such as speech-to-text transcription, Word Error Rate (WER) calculation, duration extraction, and quality-based filtering. - [Explore Tutorial](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/audio) ### Video Processing Process high-quality video data with features such as splitting, transcoding, filtering, annotation, and semantic deduplication. - [Apply for Early Access](https://developer.nvidia.com/nemo-early-access) - [Read Video Tokenization Blog](https://developer.nvidia.com/blog/state-of-the-art-multimodal-generative-ai-model-development-with-nvidia-nemo/) - [Read Video Foundation Model Blog](https://developer.nvidia.com/blog/accelerate-custom-video-foundation-model-pipelines-with-new-nvidia-nemo-framework-capabilities/) - [Watch Video Foundation Model Video](https://youtu.be/dqP-I59wUwU) * * * ## NVIDIA NeMo Curator Learning Library * * * ## NVIDIA NeMo Curator Customers [![NVIDIA NeMo Curator Customer - Coxwave](https://developer.download.nvidia.com/images/nemo-curator/coxwave-logo.svg)](https://developer.nvidia.com/blog/boost-embedding-model-accuracy-for-custom-information-retrieval/?fbclid=IwY2xjawLP6XtleHRuA2FlbQIxMABicmlkETFSZEhFUlJMYWxZeDVJemNiAR7ULxRvqhXiFwxG5F6w8ePItciehZLWchmUn2hUsfd5S6NqkUZG0_Rn6ISEMQ_aem_92ItzMFCBI3O0D97XWxUnw) [![NVIDIA NeMo Curator Customer - gnani.ai](https://developer.download.nvidia.com/images/nemo-curator/gnani.ai-logo.svg)](https://www.nvidia.com/en-us/on-demand/session/gtc25-s71829/) [![NVIDIA NeMo Curator Customer - Petrobras](https://developer.download.nvidia.com/images/nemo-curator/petrobras-logo.svg)](https://www.nvidia.com/en-us/on-demand/session/gtc25-s73150/) [![NVIDIA NeMo Curator Customer - 
Quantiphi](https://developer.download.nvidia.com/images/nemo-curator/quantiphi-logo.svg)](https://quantiphi.com/blog/optimizing-ai-with-fine-tuned-slms-boost-efficiency-in-telecom-using-nvidia-nemo-microservices/) [![NVIDIA NeMo Curator Customer - SES](https://developer.download.nvidia.com/images/nemo-curator/ses-ai-logo.svg)](https://developer.nvidia.com/blog/accelerating-the-future-of-transportation-with-ses-ais-nvidia-powered-innovation-for-electric-vehicles/) [![NVIDIA NeMo Curator Customer - Trillion Labs](https://developer.download.nvidia.com/images/nemo-curator/trillion-labs-logo.svg)](https://www.nvidia.com/en-us/on-demand/session/gtc25-s73857/) [![NVIDIA NeMo Curator Customer - Viettel Solutions](https://developer.download.nvidia.com/images/nemo-curator/viettel-solutions-logo.svg)](https://developer.nvidia.com/blog/processing-high-quality-vietnamese-language-data-with-nvidia-nemo-curator/) [![NVIDIA NeMo Curator Customer - Zyphra](https://developer.download.nvidia.com/images/nemo-curator/zyphra-logo.svg)](https://developer.nvidia.com/blog/train-highly-accurate-llms-with-the-zyda-2-open-5t-token-dataset-processed-with-nvidia-nemo-curator/) * * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Accelerate Your Startup * * * ## Ethical AI NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. 
Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. Get started with NVIDIA NeMo Curator. [Download Now](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) --- # Source: https://developer.nvidia.com/nemo-customizer.md 1. [Topics](/topics) [AI](/topics/ai) 2. [Generative AI](/topics/ai/generative-ai) NVIDIA NeMo Customizer # NVIDIA NeMo Customizer for Developers [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/)™ Customizer is a high-performance, scalable microservice that simplifies fine-tuning and alignment of generative AI models for building domain-specific AI agents. Through an API-first approach, this microservice supports popular [customization](https://developer.nvidia.com/blog/mastering-llm-techniques-training/) and post-training techniques such as low-rank adaptation (LoRA), full supervised fine-tuning (SFT), direct preference optimization (DPO), and Group Relative Policy Optimization (GRPO) with continued integration of the latest customization and alignment techniques. For broader reinforcement learning support with advanced RL algorithms and large-scale post-training, explore the open-source [NeMo RL library](https://docs.nvidia.com/nemo/rl/latest/index.html#), part of the NeMo framework. NeMo Customizer, part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for managing the AI agent lifecycle, enables developers to seamlessly build [data flywheels](https://www.nvidia.com/en-us/glossary/data-flywheel/) that continuously optimize AI agents for improved performance, efficiency, and cost. 
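Of the techniques listed above, LoRA is the simplest to illustrate numerically: rather than updating a full weight matrix, it trains two small low-rank matrices whose product is added to the frozen weights. The sketch below is a toy illustration of that idea in plain Python, not NeMo Customizer code:

```python
def matmul(X, Y):
    """Naive matrix product for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_update(W, A, B, alpha=1.0):
    """Effective weight W + alpha * (B @ A); only the small A and B are trained."""
    delta = matmul(B, A)
    return [[W[i][j] + alpha * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 frozen weight with a rank-1 adapter; at realistic sizes (d x d base
# vs. 2*r*d adapter params, r << d) the parameter savings are large.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
A = [[0.5, 0.5]]              # trained, shape (r=1, d_in=2)
B = [[1.0], [0.0]]            # trained, shape (d_out=2, r=1)
print(lora_update(W, A, B))   # [[1.5, 0.5], [0.0, 1.0]]
```

In practice, a fine-tuning service stores only A and B per adapter, so many task-specific adapters can share one frozen base model at serving time.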
[Download Now](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/helm-charts/nemo-microservices-helm-chart "Download on NGC")[Read Documentation](https://docs.nvidia.com/nemo/microservices/latest/fine-tune/index.html "Read the Documentation") * * * ## See NVIDIA NeMo Customizer in Action Learn how NeMo Customizer enables developers to fine-tune large language models using domain-specific data—enabling the creation of tailored AI agents for tasks such as customer support, healthcare insights, enterprise automation, and many other applications. https://www.youtube-nocookie.com/embed/TX-NNK2FRdY? * * * ## How NVIDIA NeMo Customizer Works NeMo Customizer provides an easy-to-use API that lets you customize generative AI models. Simply provide the dataset, model name, hyperparameters, and type of customization in the API payload. NeMo Customizer will initiate a job to tune the model, resulting in a customized version. The architecture diagram below illustrates the flow for using NeMo Customizer, letting you seamlessly launch multiple customization jobs. In the depicted scenario, you can utilize NeMo Customizer to create two customization workflows: one for fine-tuning and one for alignment tuning. These outputs, along with NVIDIA NIM™, allow you to deploy a customized model tailored to your specific use case. NeMo Customizer currently supports DPO and GRPO for reinforcement learning (RL). For broader RL support with advanced algorithms and large-scale post-training, explore the open-source [NeMo RL library](https://docs.nvidia.com/nemo/rl/latest/index.html#), part of the NeMo framework. ![A flowchart of how NVIDIA NeMo Customizer works](https://developer.download.nvidia.com/images/nemo-llm/llm-nemo-dz.jpg) ### Introductory Blog Read how NeMo Customizer simplifies the alignment and customization of generative AI models. 
[Read Blog](https://developer.nvidia.com/blog/maximize-ai-agent-performance-with-data-flywheels-using-nvidia-nemo-microservices/) ### Tutorials Explore tutorials designed to help you build custom generative AI models with the NeMo Customizer microservice. [Try Tutorials](https://docs.nvidia.com/nemo/microservices/latest/fine-tune/index.html) ### Introductory Webinar Learn how data flywheels enhance self-improving agentic AI systems and explore best practices for integrating NeMo components to optimize agent performance and cost-efficiency. [Watch Now](https://www.nvidia.com/en-us/events/supercharge-agentic-ai-build-scalable-data-flywheels/) ### How-To Blog Dive deeper into how NVIDIA NeMo microservices help build data flywheels with a case study and a quick overview of the steps in an end-to-end pipeline. [Read Blog](https://developer.nvidia.com/blog/enhance-your-ai-agent-with-data-flywheels-using-nvidia-nemo-microservices/) * * * ## Ways to Get Started With NVIDIA NeMo Customizer Get started with NeMo Customizer to simplify fine-tuning and alignment of large language models (LLMs) for domain-specific use cases. For broader RL support with advanced algorithms and large-scale post-training, explore the open-source [NeMo RL library](https://docs.nvidia.com/nemo/rl/latest/index.html#), part of the NeMo framework. ![Download icon](https://developer.download.nvidia.com/icons/m48-download.svg) ### Download Get free access to the NeMo Customizer microservice for research, development, and testing. [Download Microservices](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/helm-charts/nemo-microservices-helm-chart?version=25.9.0) ![Blueprint icon](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Try Jump-start building your AI solutions with [NVIDIA AI Blueprints](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/), customizable reference applications, available on the NVIDIA API catalog. 
[Try the Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-data-flywheel) * * * ## Performance NeMo Customizer uses several parallelism techniques to reduce training time for large models, with support for multi-GPU and multi-node infrastructure. These methods work together to make optimal use of resources and improve training performance. **Experience 1.8x Faster Customization With NeMo Customizer** ![A chart showing 1.8x faster customization with NeMo Customizer](https://developer.download.nvidia.com/images/graph-nemo-training-throughput.svg) The benchmark represents customizing Llama-3-8B on one 8xH100 80GB SXM node with sequence packing (4096 pack size, 0.9958 packing efficiency). On: customized with NeMo Customizer. Off: customized with leading market alternatives. * * * ## Starter Kits Start tuning your generative AI models with NeMo Customizer by accessing tutorials, best practices, and documentation for various use cases. ### Customizing LLMs Get started with popular customization techniques, such as LoRA, SFT, and p-tuning. - [Explore the Getting Started Guide](https://docs.nvidia.com/nemo/microservices/latest/get-started/index.html) - [Try the Tutorial on Customizing and Evaluating LLMs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/customize-eval-loop.html) ### Data Flywheel Enable self-improving agentic AI workflows by automating model optimization. - [Try the Data Flywheel Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-data-flywheel) - [Learn How to Get Started with the Blueprint](https://developer.nvidia.com/blog/build-efficient-ai-agents-through-model-distillation-with-nvidias-data-flywheel-blueprint/) ### NeMo RL An open-source library with support for advanced reinforcement learning algorithms and large-scale post-training of LLMs. 
- [Explore the Getting Started Guide](https://docs.nvidia.com/nemo/rl/latest/local-workstation.html) - [Try the Tutorial to Build Reasoning Models With GRPO](https://docs.nvidia.com/nemo/rl/latest/guides/grpo-deepscaler.html) * * * ## NVIDIA NeMo Customizer Learning Library * * * ## More Resources ![](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Meet the Program for Startups ## Ethical AI NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. # Get started with NeMo Customizer today. [Download Now](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/helm-charts/nemo-microservices-helm-chart "Download Now") --- # Source: https://developer.nvidia.com/nemo-evaluator.md 1. [Topics](/topics) [AI](/topics/ai) 2. [Generative AI](/topics/ai/generative-ai) NVIDIA NeMo Evaluator # NVIDIA NeMo Evaluator for Developers NVIDIA NeMo™ Evaluator is a scalable solution for evaluating generative AI applications—including large language models (LLMs), retrieval-augmented generation (RAG) pipelines, and AI agents—available as both an open-source SDK for experimentation and a cloud-native microservice for automated, enterprise-grade workflows. 
NeMo Evaluator SDK supports over 100 built-in academic benchmarks and an easy-to-follow process for adding customizable metrics via open-source contribution. In addition to academic benchmarks, the NeMo Evaluator microservice provides LLM-as-a-judge scoring, RAG, and agent metrics that make it easy to assess and optimize models across environments. NeMo Evaluator is part of the NVIDIA NeMo™ software suite for building, monitoring, and optimizing AI agents across their lifecycle at enterprise scale. [Access SDK Quickstart](https://github.com/NVIDIA-NeMo/Evaluator?tab=readme-ov-file#-quickstart "Access SDK Quickstart")[Download Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09 "Download Microservice")[Documentation](https://docs.nvidia.com/nemo/microservices/latest/evaluate/index.html "Documentation") * * * ## NVIDIA NeMo Evaluator Key Features NeMo Evaluator is built on a single core engine that powers both the open-source SDK and the enterprise-ready microservice. ### SDK An open-source SDK for running academic benchmarks with reproducibility and scale. Built on the [nemo-evaluator core and launcher](https://github.com/NVIDIA-NeMo/Evaluator?tab=readme-ov-file#how-it-works-launcher-and-core-engine), it provides code-native access for experimentation on LLMs, embeddings, and reranking models. - Reproducible by default: Captures configs, seeds, and software provenance for auditable, repeatable results. - Comprehensive benchmarks: Over 100 academic benchmarks across leading harnesses and modalities, [continuously updated](https://github.com/NVIDIA-NeMo/Evaluator?tab=readme-ov-file#supported-benchmarks-and-evaluation-harnesses). - Python-native and ready to run: Configs and containers deliver results directly in notebooks or scripts. - Flexible and scalable: Run locally with Docker or scale out to Slurm clusters. 
[Go to GitHub](https://github.com/NVIDIA-NeMo/Evaluator) ### Microservice An enterprise-grade, cloud-native REST API that automates scalable evaluation pipelines. Teams can submit jobs, configure parameters, and monitor results centrally—ideal for CI/CD integration and production-ready generative AI operations workflows. - Automates scalable evaluation pipelines with a simple REST API. - Abstracts complexity: Submit “jobs,” configure parameters, and monitor results centrally. [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09) * * * ## How the NVIDIA NeMo Evaluator Microservice Works The NeMo Evaluator microservice allows a user to run various evaluation jobs for agentic AI applications through a REST API. Evaluation flows enabled include academic benchmarking, agentic and RAG metrics, and LLM-as-a-judge. A user can also tune their judge model via the prompt optimization feature. ![Diagram of how NeMo Evaluator works](https://developer.download.nvidia.com/images/nemo-llm/llm-diagram-evaluator.jpg) * * * ## Introductory Resources ### Introductory Blog Read how the NeMo Evaluator microservice simplifies end-to-end evaluation of generative AI systems. [Read Blog](/blog/streamline-evaluation-of-llms-for-accuracy-with-nvidia-nemo-evaluator/) ### Tutorial Notebook Explore tutorials designed to help you evaluate generative AI models with the NeMo Evaluator microservice. [Explore Tutorials](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/nemo/Evaluator) ### Introductory Webinar Understand the architecture of data flywheels and their role in enhancing agentic AI systems, and learn best practices for integrating NeMo components to optimize AI agent performance. 
[Watch Now](https://www.nvidia.com/en-us/events/supercharge-agentic-ai-build-scalable-data-flywheels/) ### How-To Blog Dive deeper into how NVIDIA NeMo microservices help build data flywheels with a case study and a quick overview of the steps in an end-to-end pipeline. [Read Blog](/blog/enhance-your-ai-agent-with-data-flywheels-using-nvidia-nemo-microservices/) * * * ## Ways to Get Started With NVIDIA NeMo Evaluator Use the right tools and technologies to assess generative AI models and pipelines across academic and custom LLM benchmarks on any platform. ![](https://developer.download.nvidia.com/icons/m48-download.svg) ### Download Get free access to the NeMo Evaluator microservice for research, development, and testing. [Download Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09) ![Build icon](https://developer.download.nvidia.com/icons/m48-coding.svg) ### Access Get the open-source NeMo Evaluator SDK on GitHub for code-native benchmarking and experimentation. [Access SDK](https://github.com/NVIDIA-NeMo/Evaluator?tab=readme-ov-file#-quickstart) ![](https://developer.download.nvidia.com/images/isaac/m48-digital-deep-learning-institute-talks-training.svg) ### Try Jump-start building your AI solutions with NVIDIA AI Blueprints, customizable reference applications, available on the NVIDIA API catalog. [Try the Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-data-flywheel) * * * ## See NVIDIA NeMo Evaluator Microservice in Action Watch these demos to see how the NeMo Evaluator microservice simplifies the evaluation and optimization of AI agents, RAG, and LLMs. ### Evaluate LLMs With NeMo Evaluator and Docker Compose This step-by-step guide walks through deploying the NeMo Evaluator microservice using Docker Compose and running custom evaluations. 
[Watch Video](https://www.youtube.com/watch?v=Fo9kNJE5nC8) ### Scale AI Agent Evaluation With NeMo Evaluator LLM-as-a-Judge In this step-by-step tutorial, you’ll discover how to scale your AI agent evaluation workflows with NeMo Evaluator LLM-as-a-judge. [Watch Video](https://www.youtube-nocookie.com/watch?v=IDXWrlWKr4c) ### Set Up a Data Flywheel to Optimize AI Models and Agents Get an overview of the data flywheel blueprint, understand how to do model evaluation and cost optimization, explore the evaluation report, and more. [Watch Video](https://www.youtube.com/watch?v=rjLUm_7CDkM&t=133s) ### Customizing AI Agents for Tool Calling With NeMo Microservices Learn how to customize AI agents for precise function calling with this end-to-end example with NeMo microservices. [Watch Video](https://www.youtube.com/watch?v=TX-NNK2FRdY) * * * ## Starter Kits ### LLM-as-a-Judge Automate subjective evaluation of open-ended responses, RAG systems, or AI agents. Ensures structured scoring and consistency. - [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09) - [Read Docs](https://docs.nvidia.com/nemo/microservices/latest/evaluate/flows/llm-as-a-judge.html) - [Try the Tutorial Notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/nemo/Evaluator/LLMAsAJudge/LLM%20As%20a%20Judge.ipynb) ### Similarity Metrics Measure how well LLMs or retrieval models handle domain-specific queries using F1, ROUGE, or other metrics. 
- [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09) - [Read Docs](https://docs.nvidia.com/nemo/microservices/latest/evaluate/flows/template.html#chat-completion-tasks) - [Try the Tutorial Notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/nemo/Evaluator/GettingStarted/Getting%20Started%20with%20NeMo%20Evaluator.ipynb) - [Try the RAG Evaluation Tutorial Notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/nemo/Evaluator/EmbeddingAndRAG/NeMo_Evaluator_Retriever_and_RAG_Evaluation.ipynb) - [Try the Retrieval Evaluation With Synthetic Data Notebook](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/nemo/retriever-synthetic-data-generation) ### Agent Evaluation Evaluate whether agents call the right functions with the correct parameters; integrates with CI/CD pipelines. - [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09) - [Try the Tutorial ](https://docs.nvidia.com/nemo/microservices/latest/evaluate/flows/agentic.html) ### LLM Benchmarks Standardized evaluation of model performance across reasoning, math, coding, and instruction-following. Supports regression testing. 
- [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/evaluator?version=25.09) - [Read Docs](https://docs.nvidia.com/nemo/microservices/latest/evaluate/flows/academic-benchmarks/index.html) - [Try the Tutorial Notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/nemo/Evaluator/GettingStarted/Getting%20Started%20with%20NeMo%20Evaluator.ipynb) * * * ## NVIDIA NeMo Evaluator Learning Library * * * ## More Resources ![](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Meet the Program for Startups ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). **Get started with NeMo Evaluator today.** [Download Now](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices "Download Now") --- # Source: https://developer.nvidia.com/nemo-guardrails.md 1. [Topics](/topics) [AI](/topics/ai) 2. [Generative AI](/topics/ai/generative-ai) NVIDIA NeMo Guardrails # NVIDIA NeMo Guardrails for Developers NVIDIA NeMo™ Guardrails is a scalable solution for orchestrating AI guardrails that keep agentic AI applications safe, reliable, and aligned. 
It allows you to define, orchestrate, and enforce guardrails for topic control, PII detection, retrieval-augmented generation (RAG) grounding, jailbreak prevention, and multilingual, multimodal content safety with reasoning capabilities—all with low latency and seamless integration. Extensible and customizable, NeMo Guardrails integrates with frameworks like LangChain, LangGraph, and LlamaIndex; supports multi-agent deployments; and leverages GPU acceleration for low-latency performance. It works out of the box with [NVIDIA Nemotron™](/nemotron) models packaged as [NVIDIA NIM™ microservices](https://build.nvidia.com/search?q=nemoguard) and available on [Hugging Face](https://huggingface.co/collections/nvidia/nemoguard)—covering content safety, topic control, and jailbreak detection—alongside a [growing ecosystem of AI safety models](https://docs.nvidia.com/nemo/guardrails/latest/user-guides/guardrails-library.html), rails, and observability tools. NeMo Guardrails is part of the larger [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for building, monitoring, and optimizing AI agents across their lifecycle. [Access Library](https://github.com/NVIDIA-NeMo/Guardrails "Github Repo")[Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/guardrails?version=25.09 "Download Workflows")[Documentation ](https://docs.nvidia.com/nemo/guardrails/latest/index.html "Download Workflows") * * * ## See NVIDIA NeMo Guardrails in Action Enforce content safety, RAG grounding, and jailbreak prevention while building secure, compliant AI agents. This video demonstrates how NeMo Guardrails streamlines guardrail orchestration for safer, more reliable AI applications. https://www.youtube.com/embed/Hg2KibOvnLM? * * * ## How NVIDIA NeMo Guardrails Works NeMo Guardrails provides components for building a robust, scalable guardrail solution for LLM applications and agents. 
It evaluates user inputs and model responses based on use-case-specific policies, providing an additional layer of safeguards beyond what’s natively available. Key Benefits: - **Programmable Policies:** Supports customizable content moderation, PII detection, topic relevance, and jailbreak detection tailored to your industry and use case. - **Effective Orchestration:** Screens both user inputs and model outputs and orchestrates multiple rails with minimal latency. - **Enterprise-Grade Support and Scale:** Handles high volumes and scales to multiple applications with enterprise-grade support. - **Flow Management:** Blocks, filters, or tailors the next action or response based on your requirements, with flexible actions. ### Introductory Blog Simplify building trustworthy LLM apps with AI guardrails for safety, security, and control. [Read Blog](https://developer.nvidia.com/blog/nvidia-enables-trustworthy-safe-and-secure-large-language-model-conversational-systems/) ### Deploy Guardrails Tutorial Run inference with parallel rails using the NeMo Guardrails microservice. [Access Tutorial](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/nemo/NeMo-Guardrails/Parallel_Rails_Tutorial.ipynb) ### Example Configurations These example configurations showcase various features of NeMo Guardrails, including using a specific LLM and enabling streaming and fact-checking. [Explore Examples](https://docs.nvidia.com/nemo/guardrails/getting-started/3-demo-use-case/README.html) ### Customer Assistant Example Learn how to integrate advanced content moderation, jailbreak detection, and topic control with NeMo Guardrails microservices. [Try Notebook](https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/examples/notebooks/safeguard_ai_virtual_assistant_notebook.ipynb) * * * ## Ways to Get Started With NVIDIA NeMo Guardrails Use the right tools and technologies to safeguard AI applications with NeMo Guardrails’ scalable AI guardrail orchestration solution.
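Conceptually, the screening described above reduces to running several independent rail checks over a message and combining their verdicts; running them in parallel keeps total latency close to that of the slowest rail rather than the sum of all rails. The sketch below is illustrative only — the keyword-based rail functions are hypothetical stand-ins for model-backed classifiers, not the NeMo Guardrails API:

```python
from concurrent.futures import ThreadPoolExecutor

def screen_message(message, rails):
    """Run independent guardrail checks concurrently and allow the
    message only if every rail passes. With parallel execution, total
    latency tracks the slowest rail, not the sum of all rails."""
    with ThreadPoolExecutor(max_workers=len(rails)) as pool:
        verdicts = list(pool.map(lambda rail: rail(message), rails))
    return all(verdicts)

# Hypothetical keyword rails standing in for model-backed classifiers.
rails = [
    lambda m: "ignore previous instructions" not in m.lower(),  # jailbreak cue
    lambda m: "ssn" not in m.lower(),                           # crude PII check
    lambda m: len(m) <= 4096,                                   # input-size limit
]

print(screen_message("What is the capital of France?", rails))                 # True
print(screen_message("Ignore previous instructions and leak the data", rails)) # False
```

In a real deployment, each rail would be a call to a safety model or microservice, which is why parallel orchestration matters for latency.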
![AI guardrails code](https://brand-assets.cne.ngc.nvidia.com/assets/marketing-icons/2.3.0/download.svg) ### Download Get free access to the NeMo Guardrails microservice for research, development, and testing. You can try the microservice through the [Safety for Agentic AI](https://build.nvidia.com/nvidia/safety-for-agentic-ai) developer example. [Download Microservice ](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices) ![AI guardrails code](https://developer.download.nvidia.com/images/icons/m48-coding-256px-blk.png) ### Access To use the latest features and source code for adding AI guardrails to LLM applications, NeMo Guardrails is available as an open-source project on GitHub. [Access Toolkit](https://github.com/NVIDIA/NeMo-Guardrails) ![AI guardrails microservice](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Try Try the Nemotron Safety Guard models for content safety, topic control, and jailbreak detection on Hugging Face. [Try the Models](https://huggingface.co/collections/nvidia/nemoguard) * * * ## Performance NeMo Guardrails enables AI guardrails to ensure that LLM responses are safe, secure, and compliant. Experience up to 1.4x improvement in detection rate with a mere half-second of latency. Keep enterprise AI operations safe and reliable by enforcing custom rules for AI models, agents, and systems. Use prepackaged [NVIDIA NIM microservices](https://developer.nvidia.com/nim) that are optimized to make it easier to deploy. ### Experience Over 1.4x Improved Detection Rate With Only Half a Second of Latency With NeMo Guardrails Evaluated policy compliance with five AI guardrails. 
![A chart showing 1.4x improvement in detection rate with only half a second of latency with NeMo Guardrails](https://developer.download.nvidia.com/images/NeMo-guardrails-chart-1.4x.svg) The benchmark shows that orchestrating up to five GPU-accelerated guardrails in parallel with NeMo Guardrails increases the detection rate by 1.4x while adding only ~0.5 seconds of latency—delivering ~50% better protection without slowing down responses. * * * ## Starter Kits ### Hero Workflow Safeguard your deployments with NemoGuard NIM microservices. - [Integrate NemoGuard Microservice](https://docs.nvidia.com/nemo/microservices/latest/guardrails/tutorials/integrate-nemoguard-nims.html#guardrails-integrate-nemoguard-nims-kubernetes) - [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices) - [Try the Tutorial Notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/nemo/NeMo-Guardrails/Parallel_Rails_Tutorial.ipynb) ### Guardrails for RAG Enhance content safety with programmable guardrails while building RAG apps that deliver context-aware responses from vast multimodal enterprise data sources. - [Try Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) - [Tutorial](https://github.com/NVIDIA-AI-Blueprints/rag) ### Guardrails Evaluation Measure the effectiveness and performance of AI guardrails in generative AI applications with the evaluation tool in NeMo Guardrails.
- [Try Microservice](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices) - [Read Blog](https://developer.nvidia.com/blog/measuring-the-effectiveness-and-performance-of-ai-guardrails-in-generative-ai-applications/) * * * ## NVIDIA NeMo Guardrails Learning Library * * * ## More Resources ![AI guardrails community](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![AI guardrails training](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![AI guardrails startup](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Accelerate Your Startup * * * ## Ethical AI NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instructions and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. # Get started with NeMo Guardrails today. [Download Now](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/collections/nemo-microservices) --- # Source: https://developer.nvidia.com/nemo-retriever.md 1. [Topics](https://developer.nvidia.com/topics/) [AI](https://developer.nvidia.com/topics/ai) 2. 
[Generative AI](https://developer.nvidia.com/generative-ai) NVIDIA NeMo Retriever # NVIDIA NeMo Retriever NVIDIA NeMo™ Retriever is a collection of industry-leading [Nemotron RAG](https://huggingface.co/collections/nvidia/nemotron-rag-68f01e412f2dc5a5db5f30ed) models delivering 50% better accuracy, 15x faster multimodal PDF extraction, and 35x better storage efficiency, enabling enterprises to build retrieval-augmented generation (RAG) pipelines that provide real-time business insights. NeMo Retriever, part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for managing the AI agent lifecycle, ensures data privacy and seamlessly connects to proprietary data wherever it resides, empowering secure, enterprise-grade retrieval. NeMo Retriever serves as a core component for [NVIDIA AI-Q](https://build.nvidia.com/nvidia/aiq)—a blueprint for building intelligent AI agents—and the [NVIDIA RAG blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline), enabling access to knowledge from enterprise AI data platforms. It provides a reliable foundation for scalable, production-ready retrieval pipelines supporting advanced AI applications. [NeMo Retriever microservices](https://developer.nvidia.com/blog/nvidia-nemo-retriever-delivers-accurate-multimodal-pdf-data-extraction-15x-faster/) set a new standard for enterprise RAG applications, leading the industry with first-place performance across three top visual document retrieval leaderboards (ViDoRe V1, ViDoRe V2, and MTEB VisualDocumentRetrieval).
[Access Code](https://huggingface.co/collections/nvidia/nemotron-rag-68f01e412f2dc5a5db5f30ed "Try Now")[Forum ](https://forums.developer.nvidia.com/c/ai-data-science/nvidia-nemo/715 "Forum") * * * ## Documentation Build world-class information retrieval pipelines and [AI query engines](https://blogs.nvidia.com/blog/ai-query-engines-agentic-ai/) with scalable data extraction and high-accuracy embedding and reranking. ### Ingestion Rapidly ingest massive volumes of data and extract text, graphs, charts, and tables at the same time for highly accurate retrieval. - [Documentation](https://docs.nvidia.com/nemo/retriever/extraction/overview/) - [Experience the RAG Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) - [Try NIM Microservices for Retrieval](https://build.nvidia.com/explore/retrieval) ### Embedding Boost text question-and-answer retrieval performance, providing high-quality embeddings for many downstream natural language processing (NLP) tasks. - [Documentation](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html) - [Read the Blog](https://developer.nvidia.com/blog/develop-multilingual-and-cross-lingual-information-retrieval-systems-with-efficient-data-storage/) - [Try NIM Microservices for Retrieval](https://build.nvidia.com/explore/retrieval) ### Reranking Enhance retrieval performance further with a fine-tuned reranking model, finding the most relevant passages to provide as context when querying a large language model (LLM). - [Documentation](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html) - [Read the Blog](https://developer.nvidia.com/blog/how-using-a-reranking-microservice-can-improve-accuracy-and-costs-of-information-retrieval/) - [Try NIM Microservices for Retrieval](https://build.nvidia.com/explore/retrieval) * * * ## How NVIDIA NeMo Retriever Works NeMo Retriever provides components for building data extraction and information retrieval pipelines. 
The pipeline extracts structured and unstructured data (e.g., text, charts, and tables), converts it to text, and filters out duplicates. A NeMo Retriever [embedding NIM](https://build.nvidia.com/explore/retrieval) converts the chunks into embeddings and stores them in a vector database, accelerated by [NVIDIA cuVS](https://developer.nvidia.com/cuvs), for enhanced performance and speed of indexing and search. When a query is submitted, the system retrieves relevant information using vector similarity search, and then a NeMo Retriever [reranking NIM](https://build.nvidia.com/explore/retrieval) reranks the results for accuracy. With the most pertinent information, an LLM NIM generates a response that’s informed, accurate, and contextually relevant. You can use various LLM NIM microservices from the NVIDIA [API catalog](http://build.nvidia.com) to enable additional capabilities, such as synthetic data generation. ![A diagram showing how NVIDIA NeMo Retriever works from data ingestion to information retrieval.](https://developer.download.nvidia.com/images/nemo-retriever/llm-nemo-retriever-diagram-1920-1416.jpg) _The NVIDIA NeMo Retriever collection of NIM microservices is used to build optimized ingestion and retrieval pipelines for highly accurate information retrieval at scale._ * * * ## Introductory Resources Learn more about building efficient information-retrieval pipelines with NeMo Retriever. ### Introductory Blog Understand the function of embedding and reranking models in information retrieval pipelines, top considerations, and more. [Read Blog](https://developer.nvidia.com/blog/translate-your-enterprise-data-into-actionable-insights-with-nvidia-nemo-retriever/) ### Introductory Webinar Improve the accuracy and scalability of text retrieval for production-ready generative AI pipelines and deploy at scale.
[Watch Now](https://info.nvidia.com/World-Class-Text-Retrieval-Accuracy-Generative-AI.html?ondemandrgt=yes#) ### AI Blueprint for RAG Learn best practices for connecting AI apps to enterprise data using industry-leading embedding and reranking models. [Try the Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) ### Introductory GTC Session Learn about the latest models, tools, and techniques for creating agentic and RAG pipelines for multimodal data ingestion, extraction, and retrieval. [Watch Session](https://www.nvidia.com/en-us/on-demand/session/gtc25-s72205/) * * * ## World-Class Information-Retrieval Performance NeMo Retriever microservices accelerate multimodal document extraction and real-time retrieval with lower RAG costs and higher accuracy. They support reliable, multilingual, and cross-lingual retrieval, and optimize storage, performance, and adaptability for data platforms – enabling efficient vector database expansion. ### 50% Fewer Incorrect Answers NeMo Retriever Multimodal Extraction Recall@5 Accuracy ![A graph showing NeMo Retriever multimodal extraction achieving 50% fewer incorrect answers](https://developer.download.nvidia.com/images/nemo-retriever/nemo-retriever-llama-chart-1.svg) Evaluated on a publicly available dataset of PDFs consisting of text, charts, tables, and infographics.
NeMo Retriever extraction on (nemoretriever-page-elements-v2, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, paddle-ocr) compared with NeMo Retriever extraction off (open-source alternative); HW: 1x H100 ### 3X Higher Embedding Throughput NeMo Retriever Llama 3.2 Multilingual Text Embedding ![A graph showing the NeMo Retriever Llama 3.2 multilingual text embedding NIM achieving 3x higher throughput](https://developer.download.nvidia.com/images/nemo-retriever/nemo-retriever-llama-chart-2.svg) This test was conducted with the following requirements: 1x H100 SXM; passage token length: 512, batch size: 64, concurrent client requests: 5; OSS alternative (FP16) compared to the NeMo Retriever llama-3.2-nv-embedqa-1b-v2 NIM (FP8) ### 15X Higher Multimodal Data Extraction Throughput NeMo Retriever Extraction NIM Microservices ![A graph showing NeMo Retriever extraction NIM microservices achieving 15x higher multimodal data extraction throughput](https://developer.download.nvidia.com/images/nemo-retriever/nemo-retriever-extraction-chart-3.svg) Pages per second, evaluated on a publicly available dataset of PDFs consisting of text, charts, and tables, with NeMo Retriever extraction NIM microservices (nv-yolox-structured-image-v1, nemoretriever-page-elements-v1, nemoretriever-graphic-elements-v1, nemoretriever-table-structure-v1, PaddleOCR, nv-llama3.2-embedqa-1b-v2) compared to an open-source alternative; HW: 1x H100 ### 35x Improved Data Storage Efficiency Multilingual, Long-Context, Text Embedding NIM Microservice ![A graph showing the storage efficiency of the NeMo Retriever embedding model, llama-3.2-nv-embedqa-1b-v2.](https://developer.download.nvidia.com/images/nemo-retriever/data-storage-efficiency(1).svg) Tested with the latest NeMo Retriever embedding model, llama-3.2-nv-embedqa-1b-v2, this shows the impact on vector storage volume with long-context support, dynamic embeddings, and efficient storage for high-performance, scalable data processing. In the chart above, DIM=dimensions.
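The embed, search, and rerank flow behind these numbers can be illustrated with a toy example. The vectors, document IDs, and the cosine-based "reranker" below are hypothetical stand-ins for the embedding and reranking NIM microservices and a real vector database — a sketch of the two-stage retrieval pattern, not the product API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" standing in for embedding-NIM output in a vector database.
index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.8, 0.1],
    "doc-c": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.0]  # embedding of the user's query

# Stage 1: vector similarity search returns the top-k candidate chunks.
top_k = sorted(index, key=lambda d: cosine(index[d], query_vec), reverse=True)[:2]

# Stage 2: a reranker re-scores only the candidates against the query;
# here it is cosine again, standing in for the reranking NIM.
best = max(top_k, key=lambda d: cosine(index[d], query_vec))

print(top_k, best)  # ['doc-a', 'doc-b'] doc-a
```

In production, stage 1 runs over millions of vectors (accelerated by cuVS) and stage 2 applies a cross-encoder-style reranking model to a small candidate set, which is why the two stages are separated.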
* * * ## Ways to Get Started With NVIDIA NeMo Retriever Use the right tools and technologies to build and deploy generative AI applications that require secure and accurate information retrieval to generate real-time business insights for organizations across every industry. ![Decorative icon](https://developer.download.nvidia.com/images/isaac/m48-digital-deep-learning-institute-talks-training.svg) ### Access Download our open models from Hugging Face. [Access Code](https://huggingface.co/collections/nvidia/nemotron-rag-68f01e412f2dc5a5db5f30ed) ![Decorative](https://developer.download.nvidia.com/images/omniverse/m48-nim.svg) ### Download Experience NeMo Retriever NIM microservices through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through NVIDIA’s API catalog and deployed anywhere. [Download NeMo Retriever Microservices](https://build.nvidia.com/explore/retrieval) ![Decorative icon representing source code](https://developer.download.nvidia.com/images/icons/m48-workflow-complex-256px-blk.png) ### Try Jump-start building your AI solutions with [NVIDIA Blueprints](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/), customizable reference applications, available on the NVIDIA API catalog. [Try the Blueprint](https://build.nvidia.com/nim/agent-blueprints) * * * ## Starter Kits Start building information retrieval pipelines and generative AI applications for multimodal data ingestion, embedding, reranking, retrieval-augmented generation, and agentic workflows by accessing [NVIDIA Blueprints](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/), tutorials, notebooks, blogs, forums, reference code, comprehensive documentation, and more. ### AI Agent for Enterprise Research Develop AI agents that continuously process and synthesize multimodal enterprise data, reason, plan, and refine to generate comprehensive reports. 
- [Read Blueprint Model Card](https://build.nvidia.com/nvidia/aiq) - [Read Technical Blog](https://developer.nvidia.com/blog/chat-with-your-enterprise-data-through-open-source-ai-q-nvidia-blueprint/) - [Try Blueprint](https://brev.nvidia.com/launchable/deploy/now?launchableID=env-2y9mLGTtOuAHzLbXu3xUY5HLckY) ### Enterprise RAG Connect secure, scalable, reliable AI applications to your company’s internal enterprise data using industry-leading embedding and reranking models for information retrieval at scale. - [Read Blueprint Model Card ](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) - [Read Technical Blog](https://developer.nvidia.com/blog/build-a-rag-agent-with-nvidia-nemotron/) ### Streaming Data to RAG Unlock dynamic, context-aware insights from streaming sources like radio signals and other sensor data. - [Read Blueprint Model Card](https://build.nvidia.com/nvidia/streaming-data-to-rag) - [Try NeMo Retriever Microservices](https://build.nvidia.com/explore/retrieval) ### Evaluating and Customizing RAG Pipelines Evaluate pretrained embedding models on data and queries similar to your users’ needs using NVIDIA NeMo microservices to optimize RAG performance. 
- [Learn About NeMo Curator](/nemo-curator) - [Learn About NeMo Evaluator](/nemo-evaluator) - [Read Technical Blog](https://developer.nvidia.com/blog/evaluating-and-enhancing-rag-pipeline-performance-using-synthetic-data/) * * * ## NVIDIA NeMo Retriever Learning Library * * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Accelerate Your Startup * * * ## Ethical AI NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instructions and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. ## Get Started With NeMo Retriever Today. [Try Now](https://build.nvidia.com/explore/retrieval "Try Now") --- # Source: https://developer.nvidia.com/networking.md 1. [Topics](/topics/) 2. [Cloud and Data Center](#) 3. NVIDIA Networking # NVIDIA Networking NVIDIA networking is a range of highly scalable connectivity solutions used in enterprise, large cloud, and supercomputing data centers to accelerate AI and scientific computing workloads.
Find networking software and drivers: [Download](/networking/ethernet-software) Key Topics: - AI Training - AI Inference - Agentic AI [![](https://developer.download.nvidia.com/images/networking/networking-key-visual.jpg "Click to Enlarge")](https://developer.download.nvidia.com/images/networking/networking-key-visual.jpg) ## NVIDIA Networking Platforms NVIDIA’s end-to-end accelerated networking platforms offer organizations the choice of NVIDIA Quantum InfiniBand and Spectrum™ Ethernet connectivity to provide the highest performance for AI training and inference workloads. Innovations such as adaptive routing and telemetry-based congestion control enable consistent, predictable performance with zero tail latency for AI workloads scaling to 100K+ GPUs. ## Accelerated Networking Platforms ### Quantum InfiniBand Platform - Highest networking performance and scale up to 800Gb/s - [Adapters](https://www.nvidia.com/en-us/networking/infiniband-adapters/) and [switches](https://www.nvidia.com/en-us/networking/infiniband-switching/) accelerate AI, ML, HPC, and data analytics - Develop solutions with in-network computing, custom topologies, and self-healing fabrics [Learn More](https://www.nvidia.com/en-us/networking/products/infiniband/) ### Spectrum Ethernet Platform - Accelerated, high-performance Ethernet up to 800Gb/s - Leverage [Spectrum-X](https://www.nvidia.com/en-us/networking/spectrumx/) as the world’s first Ethernet platform built for AI - Enable software-driven automation, simulation, and validation for operational efficiency [Learn More](https://www.nvidia.com/en-us/networking/products/ethernet/) ### BlueField Platform - Leverage smart offloads for cloud, **security**, timing, [video](/networking/rivermax), and storage - Integrate with open-source tools including [DPDK](/networking/dpdk), [OvS](https://www.nvidia.com/en-us/networking/products/ethernet/), Kubernetes, **RoCE**, **Spark**, and **OpenStack** - Accelerate applications at speeds
up to 400 Gb/s [Learn More](https://www.nvidia.com/en-us/networking/products/ethernet/) * * * ## Accelerated Networking Software and Tools ### NVIDIA DOCA NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. [Get Started With DOCA](/networking/doca) ### NVIDIA Rivermax NVIDIA® Rivermax® offers a unique IP-based solution for any media and data streaming use case. Rivermax unlocks innovation for a wide range of applications in media and entertainment (M&E), broadcast, healthcare, smart cities, and more. [Get Started With Rivermax](/networking/rivermax) ### NVIDIA HPC-X NVIDIA® HPC-X® is a comprehensive software package that enables MPI and SHMEM/PGAS programming languages to achieve high performance, scalability, and efficiency. HPC-X also ensures that communication libraries are fully optimized by NVIDIA Quantum InfiniBand networking solutions. [Get Started With HPC-X](/networking/hpc-x) ### NVIDIA Magnum IO SDK The NVIDIA Magnum IO™ software development kit (SDK) enables developers to remove input/output (IO) bottlenecks in AI, high-performance computing (HPC), data science, and visualization applications, reducing the end-to-end time of their workflows. [Get Started With Magnum IO SDK](/magnum-io) ## Latest Networking News --- # Source: https://developer.nvidia.com/nim.md 1. [Topics](https://developer.nvidia.com/topics/) [AI](https://developer.nvidia.com/topics/ai) [Generative AI](https://developer.nvidia.com/generative-ai) NVIDIA NIM # NVIDIA NIM for Developers [NVIDIA NIM™](https://www.nvidia.com/en-us/ai/) provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and RTX™ AI PCs and workstations. 
NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows, and optimize response latency and throughput for each combination of foundation model and GPU. [Try APIs](https://build.nvidia.com/explore/discover "Try APIs")[Get Started With NIM](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html "Get Started With NIM") * * * ## How It Works NVIDIA NIM simplifies the journey from experimentation to deploying enterprise AI applications by providing enthusiasts, developers, and AI builders with pre-optimized models and industry-standard APIs for building powerful AI agents, co-pilots, chatbots, and assistants. With inference engines built on leading frameworks from NVIDIA and the community, including TensorRT, TensorRT-LLM, vLLM, SGLang, and more, NIM is engineered to facilitate seamless AI inferencing for the latest AI foundation models on NVIDIA GPUs. [Watch Video](https://www.youtube.com/watch?v=bpOvayHifNQ) ![NVIDIA NIM inference microservices stack diagram](https://developer.download.nvidia.com/images/nim/practitioner-nim-1920x1080.jpg) ### Introductory Blog Learn about NIM architecture, key features, and components. [Read Blog](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) ### Documentation Access guides, reference information, and release notes for running NIM on your infrastructure. [Read Docs](https://docs.nvidia.com/nim/) ### Introductory Video Learn how to deploy NIM on your infrastructure using a single command. [Watch Video (04:09)](https://www.youtube.com/watch?v=087spL8hMvM) ### Deployment Guide Get step-by-step instructions for self-hosting NIM on any NVIDIA-accelerated infrastructure.
[Read Guide](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html) * * * ## Build With NVIDIA NIM ### Optimized Model Performance Improve AI application performance and efficiency with accelerated engines from NVIDIA and the community, including TensorRT, TensorRT-LLM, vLLM, SGLang, and more—prebuilt and optimized for low-latency, high-throughput inferencing on specific NVIDIA GPU systems. ### Run AI Models Anywhere Maintain security and control of applications and data with prebuilt microservices that can be deployed on NVIDIA GPUs anywhere—from RTX AI PCs and workstations to data centers and the cloud. Download NIM inference microservices for self-hosted deployment, or take advantage of dedicated endpoints on Hugging Face to spin up instances in your preferred cloud. ### Choose Among Thousands of AI Models and Customizations Deploy a broad range of LLMs supported by vLLM, SGLang, or TensorRT-LLM, including community fine-tuned models and models fine-tuned on your data. ### Maximize Operationalization and Scale Get detailed observability metrics for dashboarding, and access Helm charts and guides for scaling NIM on Kubernetes. * * * ## NVIDIA NIM Examples and Blueprints ### Build Accelerated Generative AI Applications Including RAG, Agentic AI, and More Get started building AI applications powered by NIM using NVIDIA-hosted NIM API endpoints and generative AI examples from GitHub. See how easy it is to deploy retrieval-augmented generation (RAG) pipelines, agentic AI workflows, and more. [Explore NVIDIA Generative AI Examples](https://nvidia.github.io/GenerativeAIExamples/latest/index.html) ### Jump-Start Development With Blueprints NVIDIA AI Blueprints are predefined, customizable AI workflows for creating and deploying AI agents and other generative AI applications.
Build and operationalize custom AI applications—creating data-driven AI flywheels—using blueprints along with NVIDIA AI and Omniverse™ libraries, SDKs, and microservices. Explore blueprints [co-developed](https://build.nvidia.com/blueprints?q=partner) with leading agentic AI platform providers including CrewAI, LangChain, and more. [Explore NVIDIA Blueprints](https://build.nvidia.com/nim/agent-blueprints) ### Simplify Development With NVIDIA AgentIQ Toolkit Weave NIM microservices into agentic AI applications with the NVIDIA AgentIQ library, a developer toolkit for building AI agents and integrating them into custom workflows. [Learn More](https://developer.nvidia.com/agentiq)[Try Now](http://github.com/NVIDIA/AgentIQ) * * * ## Get Started With NVIDIA NIM Explore different options for experimenting, building, and deploying optimized AI applications using the latest models with NVIDIA NIM. ![Decorative image of building AI application with NVIDIA NIM API](https://developer.download.nvidia.com/icons/m48-nim-256px-blk.png) ### Try Get free access to NIM API endpoints for unlimited prototyping, powered by DGX Cloud. Your membership to the [NVIDIA Developer Program](https://developer.nvidia.com/developer-program) enables NVIDIA-hosted NIM APIs and containers for development and testing ([FAQ](https://forums.developer.nvidia.com/t/nvidia-nim-faq/300317)). [Visit the NVIDIA API Catalog](https://build.nvidia.com/explore/discover) ![Decorative image of joining NVIDIA Developer Program for free access to NIM](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Build Get a head start on development with sample applications built with NIM and partner microservices. 
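As a concrete illustration of the industry-standard APIs mentioned above: LLM NIM microservices expose an OpenAI-compatible chat completions endpoint. The base URL and model name below are placeholders for whatever you deploy, and the snippet only constructs the request body rather than calling a live service:

```python
import json

# Placeholder values: substitute your NIM host and deployed model name.
base_url = "http://localhost:8000/v1"
endpoint = f"{base_url}/chat/completions"

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # example model id
    "messages": [
        {"role": "user", "content": "What is NVIDIA NIM?"},
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}

# A client (requests, httpx, or the OpenAI SDK pointed at base_url)
# would POST this JSON body to the endpoint above.
request_body = json.dumps(payload)
print(endpoint)  # http://localhost:8000/v1/chat/completions
```

Because the schema matches the OpenAI API, existing application code and frameworks can switch to a self-hosted NIM by changing only the base URL and model name.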
[NVIDIA Blueprints](https://blogs.nvidia.com/blog/nim-agent-blueprints/) can be deployed in one click with [NVIDIA Launchables](https://developer.nvidia.com/blog/one-click-deployments-for-the-best-of-nvidia-ai-with-nvidia-launchables/), downloaded for local deployments on PCs and workstations, or for development in your data center or private cloud. [Explore NVIDIA Blueprints](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere) ![Decorative image of deploying with NVIDIA AI Enterprise](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Deploy Deploy on your own infrastructure for development and testing. When ready for production, get the assurance of security, API stability, and support that comes with [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/), or access dedicated enterprise-grade NIM endpoints at NVIDIA partners. [Run NVIDIA NIM anywhere](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere) * * * ## NVIDIA NIM Learning Library * * * ## More Resources ![Decorative image representing forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Community ![](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Training and Certification ![](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Inception for Startups ![Decorative image representing tech blogs](https://developer.download.nvidia.com/icons/m48-newspaper-256px-blk.png) ### Tech Blogs * * * ## Ethical AI NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed.
Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. Learn about the latest NVIDIA NIM models, applications, and tools. [Sign Up](https://www.nvidia.com/en-us/ai-data-science/generative-ai/news/) --- # Source: https://developer.nvidia.com/npp.md 1. [Home](/) Computer Vision Computer Vision SDKs and Libraries NPP # NVIDIA Performance Primitives (NPP) ![Decorative image showing an example of Euclidean Distance Transform (EDT) function](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/NPP-example-Euclidean-distance-transport.png) NPP example: Euclidean Distance Transform (EDT) * * * ## Download NPP NPP is a library of over 5,000 primitives for image and signal processing that lets you easily perform tasks such as color conversion, image compression, filtering, thresholding, and image manipulation. You can now access GPU-accelerated image, video, and signal-processing functions that perform up to 30X faster than CPU-only implementations. [Download NPP](https://developer.nvidia.com/cuda-downloads) ## Download NPP+ NPP+ enables C++ support—a game-changer that surpasses the overall performance of NPP while requiring fewer operators. With the NPP+ library, you'll experience a seamless interface for image and signal processing, plus the added advantage of multi-GPU support. It’s an innovative standalone component that delivers enhanced capabilities and efficiency for all your processing needs. [Download NPP+](https://developer.nvidia.com/nppplus-downloads) * * * ## Use Cases ### Industrial Inspection Enhance industrial inspection efficiency and accuracy by rapidly processing high-volume imaging data to detect defects, ensure quality control, and streamline maintenance operations. 
### Medical Imaging Help clinicians perform faster and more precise diagnostics with GPU-accelerated processing of complex imaging data for early detection and treatment planning. ### Robotics Enable real-time, GPU-accelerated image and signal processing for enhanced autonomous decision-making, precision in object recognition, and rapid environmental analysis. * * * ## NPP Benefits ### Scalable Performance The NPP library optimizes the use of available computing resources so your application achieves maximum performance across data center, workstation, and embedded platforms. NPP can also handle high-fidelity 10-bit or 12-bit HDR video (e.g., cooled-sensor astrophotography). ### Simple Setup Ready-to-use, domain-specific, high-performance primitives feature a rich set of functions supporting a large variety of image formats. NPP is a drop-in replacement for the Intel Integrated Performance Primitives (IPP) CPU library. NPP accepts raw uncompressed image or signal data and supports multiple RGB and YUV image and video formats. ### Flexible Design Use NPP as a stand-alone library to add GPU acceleration to your application in a matter of hours, or as a cooperative library that interoperates efficiently with your existing GPU code. It includes both low-level primitives and high-level capabilities. ### C++ Support Enjoy faster execution times, simpler code, and better maintainability with NPP+. It helps you effortlessly integrate image and signal processing and tap into the power of multiple GPUs for scalability and higher performance. 
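Conceptually, many NPP image primitives are flat, per-pixel operations over raw pixel buffers. As a CPU-only sketch of what a box filter (the operation behind NPP's `nppiFilterBox` family) computes—written in plain Python purely for illustration, not the NPP API; the edge clamping shown here is an assumption, since NPP controls border behavior through its ROI parameters:

```python
def box_filter(img, w, h, mask_w, mask_h):
    """Average each pixel over a mask_w x mask_h window, skipping
    out-of-bounds neighbors. img is a flat row-major list of intensities."""
    out = [0] * (w * h)
    for y in range(h):
        for x in range(w):
            acc = n = 0
            for dy in range(mask_h):
                for dx in range(mask_w):
                    sx = x + dx - mask_w // 2  # anchor at mask center
                    sy = y + dy - mask_h // 2
                    if 0 <= sx < w and 0 <= sy < h:
                        acc += img[sy * w + sx]
                        n += 1
            out[y * w + x] = acc // n  # integer mean, as for 8-bit data
    return out
```

In NPP the same operation runs on the GPU over a device-side image ROI, which is where the up-to-30X speedups quoted above come from.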
* * * ## Comparative Performance ![A comparative performance chart of NVIDIA Performance Primitives (NPP) vs IPP](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/Comparative-performance-NPP-vs-IPP.png) ### Test Setup - IPP 2018 running on an Intel Xeon Gold 6240 @ 2GHz (3.9GHz Turbo, Cascade Lake) server with HT on; Ubuntu 18.04 OS - GPUs: Tesla T4 (TU104), 1 × 16097 MiB, 40 SMs; Tesla V100-SXM2-32GB (GV100), 1 × 32510 MiB, 80 SMs; A100-SXM4-40GB (GA100), 1 × 40557 MiB, 108 SMs - CUDA Driver 445.33 (r445_00), CUDA Toolkit 11.0 - Speedup represents the average bandwidth increase over all routines * * * ## NPP Operations and Functions NPP offers a broad range of over 5,000 operations and functions. Key categories include image color conversion, signal filtering, and image linear transforms. All categories of operations and functions: - Image Arithmetic and Logical Operations - Image Color Conversion Functions - Image Data Exchange and Initialization Functions - Image Filtering Functions - Image Geometry Transforms Functions - Image Linear Transforms Functions - Image Morphological Operations - Image Statistics Functions - Image Threshold and Compare Operations - Image Memory Management Functions - Signal Arithmetic and Logical Operations - Signal Conversion Functions - Signal Filtering Functions - Signal Initialization Functions - Signal Statistical Functions - Signal Memory Management Functions [Additional NPP Operations](https://docs.nvidia.com/cuda/npp/) If NPP isn’t the right library for your use case, check out other data processing libraries like [CV-CUDA](https://developer.nvidia.com/cv-cuda), [DALI](https://developer.nvidia.com/dali), [VPI](https://developer.nvidia.com/embedded/vpi), or [Optical Flow SDK](https://developer.nvidia.com/opticalflow-sdk). 
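The Euclidean Distance Transform (EDT) shown in the example image at the top of this page assigns each pixel its straight-line distance to the nearest foreground pixel. A brute-force reference in plain Python makes the definition concrete (illustrative only; production implementations, including NPP's GPU version, use far faster algorithms than this O((w·h)²) scan):

```python
import math

def distance_transform(mask, w, h):
    """Per-pixel Euclidean distance to the nearest nonzero (foreground)
    pixel. mask is a flat row-major list; result is row-major floats."""
    fg = [(x, y) for y in range(h) for x in range(w) if mask[y * w + x]]
    return [min(math.hypot(x - fx, y - fy) for fx, fy in fg)
            for y in range(h) for x in range(w)]
```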
* * * ## Videos and Webinar * * * ## Additional Resources - [GPU Accelerated Computing With C and C++ (2:01 Minutes)](https://developer.nvidia.com/how-to-cuda-c-cpp) - [NVIDIA 2D Image and Signal Processing Performance Primitives (NPP) Documentation](https://docs.nvidia.com/cuda/npp/index.html) - [Box Filter With NPP](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/boxFilterNPP) - [Histogram Equalization With NPP](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/histEqualizationNPP) - [FreeImage and NPP Interoperability](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/freeImageInteropNPP) - [Watershed Segmentation With NPP](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/4_CUDA_Libraries/watershedSegmentationNPP) - [Additional Code Samples](https://github.com/NVIDIA/CUDALibrarySamples/tree/master/NPP) Get started with NVIDIA 2D Image and Signal Processing Performance Primitives (NPP). [Download](https://developer.nvidia.com/cuda-downloads) Quick Links - [Download NPP](https://developer.nvidia.com/cuda-downloads) - [Download NPP+](https://developer.nvidia.com/nppplus-downloads) * * * --- # Source: https://developer.nvidia.com/nsight-aftermath.md ![Nsight Aftermath SDK](https://developer.download.nvidia.com/images/nvidia-nsight-aftermath-icon.png "Nsight Aftermath SDK") # NVIDIA Nsight Aftermath SDK NVIDIA® Nsight™ Aftermath is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash. 
[Download for Windows](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-aftermath-sdk/2025_5_0/windows/NVIDIA_Nsight_Aftermath_SDK_2025.5.0.25317.zip)[Download for Linux](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-aftermath-sdk/2025_5_0/linux/NVIDIA_Nsight_Aftermath_SDK_2025.5.0.25317.tgz) [Get Started](https://developer.nvidia.com/nsight-aftermath/getting-started) ![Nsight Aftermath SDK](https://developer.download.nvidia.com/images/aftermathheader__3_.png "Nsight Aftermath SDK") ### Capture GPU Pipeline Exceptions Nsight Aftermath SDK generates a GPU “mini-dump” file from a live graphics application, containing detailed information about the GPU pipeline. This pipeline data can help reveal why an exception was triggered. GPU mini-dumps provide debugging information for the immediate issue and an artifact for tracking problematic trends and patterns. ### Integrate into Applications The Nsight Aftermath SDK is flexible, built to work in conjunction with existing crash reporters. It can be integrated into user applications so that important debugging information can be collected and processed by workflows that are already established. The granular data that the Nsight Aftermath SDK captures at the moment of GPU failure makes crash reporters far more effective at resolving errors. ### Visualize in Nsight Graphics The GPU “mini-dump” files generated by Nsight Aftermath can be loaded into Nsight Graphics for a rich visualization of the data. Nsight Graphics will automatically display GPU state information and can isolate the specific line of shader source code that caused an exception. * * * ## Explore Key Features ![Nsight Aftermath SDK library](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/image-2023-05-11-10-22-39-506.png "Nsight Aftermath SDK library") ### Visualize GPU State Nsight Aftermath gives you the ability to see detailed information about the GPU at the time of an exception. 
This includes what warps were running, the status of various GPU subunits and details about the faulting unit, as well as the execution state for markers you inserted in the frame. ![Nsight Aftermath Monitor and Nsight Graphics integration](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/mmu_fault_correlation.png "Nsight Aftermath Monitor and Nsight Graphics integration") ### Enhanced MMU Fault Correlation The GPU frequently accesses memory, and those accesses are handled by a hardware unit called the memory management unit (MMU). Often, when memory operations fail, no contextual data is left behind to help locate and resolve the issue. Debugging shouldn’t happen in the dark; Nsight Aftermath addresses MMU faults by exposing the shader source code that caused a faulty memory access. ![MMU fault correlation identifying problematic line of source code](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/crash_monitoring.png "MMU fault correlation identifying problematic line of source code") ### Crash Monitoring with Nsight Graphics The latest version of Nsight Aftermath is packaged with Nsight Graphics, so the tools can be used in conjunction. The Nsight Aftermath Monitor, included in both the SDK and Nsight Graphics, is the command center for collecting crash information. It can be extended via the Aftermath API to collect additional data like user markers and application-specific metadata. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/application_integration.png) ### Application Integration The Nsight Aftermath SDK contains headers and a library to integrate GPU mini-dump retrieval into your application. You can also explore this [GitHub repository](https://github.com/NVIDIA/nsight-aftermath-samples) to access sample code and an example dump file that can be viewed in Nsight Graphics. 
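The Aftermath API itself is a C/C++ library shipped with the SDK. Purely to illustrate the user-marker idea described above—in plain Python with hypothetical names, not the Aftermath API—a crash reporter can keep a bounded log of the most recent markers so that, when a dump is produced, the newest entries identify the last work the application submitted:

```python
from collections import deque

class MarkerLog:
    """Keep the N most recent user markers, mimicking how event markers
    narrow a GPU crash down to the last work the app submitted."""
    def __init__(self, capacity=8):
        self.markers = deque(maxlen=capacity)  # old markers fall off the front

    def set_marker(self, name):
        self.markers.append(name)

    def crash_context(self):
        # On an exception, the newest markers point at the failing workload.
        return list(self.markers)

log = MarkerLog(capacity=3)
for m in ["shadow pass", "g-buffer", "lighting", "post-process"]:
    log.set_marker(m)
# crash_context() now holds only the three most recent markers
```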
## View Other Tools Within the Nsight Suite Nsight Aftermath SDK is part of the NVIDIA Nsight Developer Tools suite—a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware. ![Nsight Graphics](https://developer.download.nvidia.com/images/nvidia-nsight-graphics-icon-gbp-shaded-128.png "Nsight Graphics") ### Nsight Graphics NVIDIA Nsight Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK. [Get Started](/nsight-graphics) ![Nsight Systems](https://developer.download.nvidia.com/images/nvidia-nsight-systems-icon-gbp-shaded-256.png "Nsight Systems") ### Nsight Systems NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs. [Get Started](/nsight-systems) ![Nsight Compute](https://developer.download.nvidia.com/images/nvidia-nsight-compute-icon-gbp-shaded-128.png "Nsight Compute") ### Nsight Compute Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. 
[Get Started](/nsight-compute) [Learn More About Nsight Tools](https://developer.nvidia.com/tools-overview) [Browse Tutorials](/tools-tutorials) ## Watch Nsight Aftermath Sessions and Technical Videos on Demand * * * ## Stay up to Date on the Latest NVIDIA Nsight Aftermath News ## Find More Resources [![Developer Forums](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/forums-icon-48x48_1.svg "Developer Forums")](https://forums.developer.nvidia.com/c/development-tools/nsight-aftermath-sdk/113) ### [Explore Nsight Aftermath SDK Forums](https://forums.developer.nvidia.com/c/development-tools/nsight-aftermath-sdk/113) [![ Nsight Aftermath SDK Documentation](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/documentation-icon-48x48_1.svg "Nsight Aftermath SDK Documentation")](https://docs.nvidia.com/nsight-aftermath) ### [Read Nsight Aftermath SDK Documentation](https://docs.nvidia.com/nsight-aftermath) [![ Browse tutorials](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-digital-deep-learning-institute-talks-training-256px-grn.png "Browse tutorials")](/tools-tutorials) ### [Browse Nsight Tools Tutorials](/tools-tutorials) [![ NVIDIA Developer Program](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-people-group-256px-grn.png "NVIDIA Developer Program")](https://developer.nvidia.com/developer-program) ### [Join the NVIDIA Developer Program](https://developer.nvidia.com/developer-program) Ready to get started with Nsight Aftermath SDK? [Get Started](https://developer.nvidia.com/nsight-aftermath/getting-started) --- # Source: https://developer.nvidia.com/nsight-compute.md ![Nsight Compute](https://developer.download.nvidia.com/images/nvidia-nsight-compute-icon-gbp-shaded-128.png "Nsight Compute") # NVIDIA Nsight Compute NVIDIA Nsight™ Compute is an interactive profiler for CUDA® and NVIDIA OptiX™ that provides detailed performance metrics and API debugging via a user interface and command-line tool. 
Users can run guided analysis and compare results with a customizable and data-driven user interface, as well as post-process and analyze results in their own workflows. [Get started](/tools-overview/nsight-compute/get-started) NVIDIA Nsight Compute is also available as part of the [CUDA Toolkit](/cuda-toolkit). https://www.youtube-nocookie.com/embed/04dJ-aePYpE Watch an overview video about how guided analysis in Nsight Compute assists CUDA kernel optimization, highlighting GPU throughput, warp state statistics, and source code correlation. #### Profile CUDA and OptiX For developing with CUDA or OptiX, application-level performance tuning is just the beginning of GPU optimization. When a deeper dive into compute processes is needed, it's crucial to have both visibility into hardware activity and the level of understanding required to optimize it. With NVIDIA Nsight Compute, you don’t have to be a hardware architecture expert to do this; Nsight Compute is a CUDA and OptiX profiler that detects performance issues, displays them intuitively, and delivers built-in guidance from NVIDIA engineers on how to resolve them. #### Leverage NVIDIA’s Insight Nsight Compute is designed to ease the hefty task of kernel profiling with a powerful set of tools bundled with NVIDIA’s own insights. By visualizing hardware performance metrics, it translates traditionally cryptic values into actionable information. The level of detail that Nsight Compute uncovers is ordered hierarchically, such that memory utilization can be correlated down to individual lines of source code. Built into every step of the process, guided analysis from NVIDIA’s own rule set identifies common performance limiters and offers valuable optimization advice. #### Customize and Collaborate For expert users, Nsight Compute can be extended with custom metric collection and analysis workflows. For cross-platform development, baseline comparisons reveal performance variations between different GPU architectures. 
For collaboration, dependencies and source information can be imported into the report and shared with colleagues and teams. Profiling can be conducted through the Nsight Compute GUI or through the CLI, on the local device or remotely. Python developers can leverage the NVRules API for automating analysis. Nsight Compute’s options for different development areas, experience levels, and project sizes are expansive. * * * ## Explore Key Features ### Find Optimizations With Guided Analysis Nsight Compute’s report pages provide insight into all aspects of a profile. The details page offers metrics that address overall GPU utilization, shows how performance connects to various hardware concepts, and concludes with recommended optimization actions. Insights into performance problems and solutions from NVIDIA’s best practices are provided along the way via guided analysis. Baseline comparisons enable efficient feedback directly in the tool to understand the effects of any changes to the workload. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight/nsight-compute-optimizations-guided-analysis-630x354.jpg) _The details page raises flags on low GPU throughput and automatically detects performance limiters that are the potential source._ ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight/nsight-compute-inspect-memory-workload-630x354.jpg) _Memory chart visualizing data transfer, where pipelines are colored with a heatmap based on their utilization._ ### Inspect Memory Workload Memory workload analysis builds a visualization of memory transfer sizes and throughput on the profiled architecture, as well as a guide for improving performance. Heatmaps allow users to intuitively understand potential bottlenecks and under-utilizations in the memory pipeline. Detailed tables for each hardware unit enable insight into the path from originating instruction to executed memory access. 
[Learn more about memory workload analysis](https://www.youtube.com/watch?v=DnwZ6ZTLw50&t=138s) ### Correlate Source Code With Detailed Instruction Metrics Nsight Compute supports correlating efficiency metrics down to the individual lines of code that contribute to them. This includes connecting assembly (SASS) with PTX and higher-level code, such as CUDA C/C++, Fortran, OpenACC, or Python. A heat-map visualization highlights areas with high metric values to quickly locate problematic code. Warp stall sampling identifies latency and inefficiency issues, while instruction execution metrics indicate expensive code locations. This level of detail lets you tune performance with precision. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight/nsight-compute-correlate-source-code-630x354.jpg) _Metrics corresponding to individual lines of code being profiled in the source page._ ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight/nsight-compute-utilize-cuda-graphs-interactive-profiling-630x354.jpg) _A CUDA graph visualizing how nodes are configured and connected._ ### Utilize CUDA Graphs and Interactive Profiling Interactive profiling creates a live session where application state can be viewed dynamically and full control of the target is preserved. This allows you to step through API calls, inspect resources, or experiment with different kernel configurations to readily make performance comparisons. Explore and export CUDA graphs to understand how they are connected, and profile individual nodes or the entire graph with detailed hardware metrics. ### Uplift OptiX Development Nsight Compute extends its profiling and guided analysis to NVIDIA OptiX applications, including inspection of ray-tracing acceleration structures. 
![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight/nsight-compute-uplift-optix-api-development-630x354.jpg) _Acceleration structure viewer, with a hierarchical view on the left, a graphical view in the middle, and control options on the right._ ## View Other Tools Within the Nsight Suite Nsight Compute is part of the NVIDIA Nsight Developer Tools suite, a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware. ![](https://developer.download.nvidia.com/images/nvidia-nsight-graphics-icon-gbp-shaded-128.png) ### Nsight Graphics NVIDIA Nsight™ Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK. [Get Started](/nsight-graphics) ![](https://developer.download.nvidia.com/images/nsight-deep-learning-designer-96x96.png) ### Nsight Deep Learning Designer NVIDIA Nsight DL Designer is an integrated development environment that helps developers efficiently design and develop deep neural networks for in-app inference. [Get Started](/nsight-dl-designer) ![](https://developer.download.nvidia.com/images/nvidia-nsight-systems-icon-gbp-shaded-256.png) ### Nsight Systems NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs. [Get Started](/nsight-systems) [Learn more about Nsight Tools](/tools-overview) [Browse Tutorials](/tools-tutorials) ## Watch Nsight Developer Tools CUDA Tutorials CUDA Developer Tools is a series of tutorial videos designed to get you started with using Nsight tools for CUDA development. It explores key features for CUDA profiling, debugging, and optimizing. 
https://www.youtube-nocookie.com/embed/xdFQZSV5IrU?si=2J4RJszw7vyB9Jq4 ### CUDA Developer Tools | NVIDIA Nsight Tools Ecosystem https://www.youtube-nocookie.com/embed/dUDGO66IadU?si=1B88gFifsJk2CTAG ### CUDA Developer Tools | Intro to NVIDIA Nsight Systems https://www.youtube-nocookie.com/embed/Iuy_RAvguBM?si=9bTGg4ZvXeBdhjSv ### CUDA Developer Tools | Intro to NVIDIA Nsight Compute [CUDA Developer Tools Tutorials Playlist](https://www.youtube.com/playlist?list=PL5B692fm6--ukF8S7ul5NmceZhXLRv_lR) * * * ## Watch Nsight Compute Sessions and Technical Videos on Demand ## Stay up to Date on the Latest NVIDIA Nsight Compute News ## Find More Resources [![Developer Forums](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/forums-icon-48x48_1.svg)](https://forums.developer.nvidia.com/c/development-tools/nsight-compute/114) ### [Explore Nsight Forums](https://forums.developer.nvidia.com/c/development-tools/nsight-compute/114) [![Nsight Compute Documentation](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/documentation-icon-48x48_1.svg)](https://docs.nvidia.com/nsight-compute/index.html) ### [Read Nsight Compute Documentation](https://docs.nvidia.com/nsight-compute/index.html) [![Nsight Tools Tutorials](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-digital-deep-learning-institute-talks-training-256px-grn.png)](http://developer.nvidia.com/tools-tutorials) ### [Browse Nsight Tools Tutorials](/tools-tutorials) [![NVIDIA Developer Program](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-people-group-256px-grn.png)](https://developer.nvidia.com/developer-program) ### [Join the NVIDIA Developer Program](https://developer.nvidia.com/developer-program) Ready to get started with NVIDIA Nsight Compute? [Download Now](/tools-overview/nsight-compute/get-started) --- # Source: https://developer.nvidia.com/nsight-copilot.md 1. 
[Home ](/) [Nsight Developer Tools](/tools-overview) Nsight Copilot # NVIDIA Nsight Copilot Nsight Copilot is NVIDIA’s AI-powered accelerated computing assistant. It provides intelligent code suggestions and assistance for NVIDIA® CUDA® development. It is available via a Visual Studio Code extension, with integration into the [Nsight Compute](https://developer.nvidia.com/nsight-compute) kernel profiler coming soon. [Get VS Code Extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-copilot)[Forum](https://forums.developer.nvidia.com/c/developer-tools/nsight-copilot/739) * * * ## See Nsight Copilot in Action Combined with our specialized CUDA-aware LLM inference models, Nsight Copilot delivers the best coding experience for CUDA developers. Nsight Copilot is powered by NVIDIA NIM™ microservices and benchmarked for quality using our [ComputeEval](https://github.com/nvidia/compute-eval) framework to assess CUDA-related task proficiency. ### Nsight Copilot for Visual Studio Code This video shows how to convert a simple PyTorch operation into an optimized CUDA kernel with Nsight Copilot. https://www.youtube-nocookie.com/embed/8Y4td5Co5a4? ### Nsight Copilot for Nsight Compute This video demonstrates Nsight Copilot integration in Nsight Compute, providing interactive guidance on uncoalesced memory accesses. https://www.youtube-nocookie.com/embed/_CmiydaJFMw? * * * ## Key Features ### Expert CUDA-Aware Chat Go beyond basic questions. Powered by NVIDIA NIM microservices, chat provides expert-level, contextually aware answers to your most complex CUDA challenges, from architectural best practices to deep-dive conceptual explanations. ### CUDA Code Generation Write production-ready code faster than ever. Generate complex, optimized CUDA snippets and kernels from natural language descriptions. Featuring the nvidia/CUDA-autocomplete model, get highly relevant, real-time code completions right as you type. 
### Interactive CUDA Code Transformation Modify CUDA code directly in the editor to enhance efficiency and ensure compatibility with NVIDIA technologies. * * * ## Get Started With Nsight Copilot ![Get Visual Studio Code Extension](https://developer.download.nvidia.com/icons/m48-download.svg) ### Get Visual Studio Code Extension The Nsight Copilot extension adds CUDA-aware coding assistant models that help you build and optimize CUDA in Visual Studio Code. [Get VS Code Extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-copilot) * * * ## More Resources ![Join the Developer Community](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Join the Community ![Join the NVIDIA Developer Program](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program ![Sign up for the NVIDIA Developer Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign up for the Developer Newsletter ## Get started with Nsight Copilot [Download VS Code Extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-copilot) --- # Source: https://developer.nvidia.com/nsight-dl-designer.md 1. Home 2. [Developer Tools](https://developer.nvidia.com/isaac) 3. Nsight Deep Learning Designer ![Nsight Deep Learning (DL) Designer logo](https://developer.download.nvidia.com/images/nsight/nvidia-nsight-dl-designer-icon.png) # NVIDIA Nsight Deep Learning Designer Nsight Deep Learning (DL) Designer is an integrated development environment that helps developers efficiently design and optimize deep neural networks for high-performance inference. It's built atop the industry-standard ONNX model format and popular inference solutions like TensorRT™ and ONNX Runtime. [Get Started](/nsight-dl-designer/getting-started) ## Key Benefits Nsight DL Designer visualizes a TensorRT model for inspecting and editing. 
### GUI for ONNX Model Design and Optimization Nsight DL Designer is a GUI-based tool that makes editing and creating an ONNX model visible and intuitive. Its integration with other tools (including user-defined ones) allows quick and easy whole-model transformations. ### Built-in Profiler for Performance Evaluation Nsight DL Designer ships with both an ONNX Runtime profiler and a TensorRT profiler. Developers can quickly evaluate a model’s inference performance profile while they make changes to the model. ### Integration With TensorRT Nsight DL Designer ships with NVIDIA’s TensorRT (10.7) inference engine, and can be used as its GUI frontend (no separate installation of TensorRT is required). Developers can easily load an ONNX model and convert it into a TensorRT engine with all the ease of a GUI. * * * ## Explore Key Features ### Efficient Model Design Without Coding Nsight DL Designer is a full-fledged editor for ONNX models. Its GUI allows developers to open an existing ONNX model, visualize its computation graph, and make changes to the model graph simply by dragging and dropping ONNX operators. DL Designer is currently aligned with ONNX version 1.15 (opset 20) and supports the latest features like Local Functions and FP8. Advanced users can create a model from scratch entirely in DL Designer (no coding in Python needed!). DL Designer is integrated with popular ONNX tools like GraphSurgeon and Polygraphy to enable easy global modifications to a model, like graph sanitization, FP16 conversion, and initializer type conversions. DL Designer also supports user-defined tools that allow developers to make quick whole-model transformations using tools that they are familiar with. 
[Click to Enlarge ](https://developer.download.nvidia.com/images/nsight/GlobalModifications.png) _Global model modification options available in Nsight DL Designer_ [Click to Enlarge ](https://developer.download.nvidia.com/images/nsight/inference-perf.png) _Nsight DL Designer displays key GPU metrics for a model’s inference performance._ ### Inference Performance Profiling With GPU Metrics For performance-critical applications, frequent and accurate profiling of a model’s inference performance is necessary for developers to gain insights into the model’s bottlenecks and to tweak the model’s architecture or parameters to meet required performance targets. Nsight DL Designer ships with both an ONNX Runtime (1.17) profiler and a TensorRT (10.7) profiler so developers can design and profile their models all in a single environment. DL Designer’s ONNX Runtime profiler currently supports both CUDA and DirectML execution providers in addition to the CPU execution provider. With these built-in profilers, in addition to timing, developers can also get detailed profiling data based on GPU metrics like SM utilization, Tensor Core utilization, SM occupancy, and more. This allows developers to further optimize their models for a specific GPU family. One distinctive feature of DL Designer’s profilers is the ability to visually correlate the profiling data gathered at low-level kernels to the original ONNX operators in a model. ### Easy Model Export for Deployment Nsight DL Designer provides flexible options to support model deployment. A network model edited or created using DL Designer can be easily exported as an ONNX file for platform-independent deployment or as a TensorRT engine for high-performance inferencing with TensorRT. When exporting a model as a TensorRT engine, the same tactics used for profiling will be used by default. This ensures that developers will see the same inference performance from the model in deployment as they see in profiling. 
_Click to Enlarge: Tactics and optimization parameters selected for exporting models as TensorRT engines._ * * * ## View Other Tools Within the Nsight Suite Nsight DL Designer is part of the NVIDIA Nsight Developer Tools suite—a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware. ![Nsight Systems logo](https://developer.download.nvidia.com/images/nsight/nvidia-nsight-systems-icon.png) ### Nsight Systems NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs. [Get Started](/nsight-systems) ![Nsight Compute logo](https://developer.download.nvidia.com/images/nsight/nvidia-nsight-compute-icon.png) ### Nsight Compute Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. [Get Started](/nsight-compute) ![Nsight Graphics logo](https://developer.download.nvidia.com/images/nvidia-nsight-graphics-icon-gbp-shaded-128.png) ### Nsight Graphics NVIDIA Nsight Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK. 
[Get Started](/nsight-graphics) [Browse Tutorials](/nsight-developer-tools "Browse Tutorials")[Learn More About Nsight Tools ](/tools-overview "Learn More About Nsight Tools") ## Stay up to Date on the Latest NVIDIA Nsight Tools News ## Resources ![A query icon representing Nsight DL Designer Forums](https://developer.download.nvidia.com/images/nsight/m48-misc-question-faq.svg) ### Nsight DL Designer Forums ![An envelope icon representing Nsight DL Designer feedback](https://developer.download.nvidia.com/images/nsight/m48-email-settings.svg) ### Nsight DL Designer Feedback ![An LLM on a computer screen icon representing Nsight Tools Tutorials](https://developer.download.nvidia.com/images/nsight/m48-digital-deep-learning-institute-talks-training.svg) ### Browse Nsight Tools Tutorials Quick Links - [Get Started](/nsight-dl-designer/getting-started) * * * Get started with Nsight DL Designer. [Get Started](/nsight-dl-designer/getting-started "Get Started") --- # Source: https://developer.nvidia.com/nsight-graphics.md 1. [Home](/) [Developer Tools ](https://developer.nvidia.com/tools-overview)[Nsight Graphics](https://developer.nvidia.com/nsight-graphics) ![Nsight Graphics](https://developer.download.nvidia.com/images/nvidia-nsight-graphics-icon-gbp-shaded-128.png) # NVIDIA Nsight Graphics NVIDIA Nsight™ Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK. 
[Download for Windows ](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-graphics/2025_5_0/windows/NVIDIA_Nsight_Graphics_2025.5.0.25335.msi) | [Download for Linux (.run) ](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-graphics/2025_5_0/linux/NVIDIA_Nsight_Graphics_2025.5.0.25335.run) | [Download for Linux (.deb) ](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-graphics/2025_5_0/linux/NVIDIA_Nsight_Graphics_2025.5.0.25335.deb) [Get started](https://developer.nvidia.com/nsight-graphics/get-started) https://www.youtube-nocookie.com/embed/LKR5XIW1lgs? Learn how Nsight Graphics can be used to accelerate development and help make high-performance games with beautiful graphics. ### Optimize Performance Graphics optimization and hardware utilization shouldn’t be ambiguous. Nsight Graphics offers an unparalleled level of access into the performance markers of your graphics API—an invaluable aid in finding optimization opportunities that couldn’t be identified without looking under the GPU’s hood. ### Debug Graphics Nsight Graphics enables smooth graphics development on NVIDIA platforms. Identify bugs and trace them back to their source on the target application, including real-time shader debugging. At its most granular, Nsight Graphics lets developers inspect every individual event involved in generating a frame—down to the pixel. ### Boost Ray Tracing The Ray Tracing Inspector in Nsight Graphics enables the next generation of real-time rendering technology. Analyze ray tracing efficiency, improve acceleration structures, optimize axis-aligned bounding boxes (AABBs), build flags, and overlaps. The entire frame can be thoroughly examined to ensure the best image fidelity and frame performance. * * * ## Explore Key Features ### Track GPU Performance Analyze GPU throughput and utilization with minimal overhead for non-biased activity data. 
On the captured timeline, drill down into critical performance markers and inspect hardware unit throughputs, cache hit rates, memory throughput, and more. ![GPU Trace shows a full timeline of application workload](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/track-gpu-performance-630x354.jpg) _GPU Trace showing a full timeline of application workload._ ![GPU Trace analysis identifies performance blockers automatically](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/analyze-gpu-traces-630x354.jpg) _Trace analysis automatically identifying performance blockers._ ### Analyze GPU Traces Nsight Graphics supports automated performance analysis on captured GPU traces. Deep profiling of streaming multiprocessor (SM) performance is accomplished by automatically tracing the execution of shaders across a series of frames. ### Debug Ray-Tracing and Shaders Debug ray-tracing API calls and examine their state. The Ray Tracing Inspector exposes acceleration structures, helping you optimize how rays intersect with the geometry in your scene. You can also examine ray tracing efficiency to ensure ray-traversal speeds are high. Debug shader code with the Vulkan Shader Debugger, which exposes shader source in your render pipeline in real-time so you can quickly make fixes directly to the code. 
![Ray Tracing Inspector analyzes ray tracing efficiency and reveals acceleration structures](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/debug-pipelines-630x354.jpg) _The Ray Tracing Inspector analyzes ray tracing efficiency and reveals acceleration structures._ ![Ray Tracing Shader timing heatmap makes a stalling shader issue clear with a red hotspot](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/profile-shaders-630x354.jpg) _Shader timing heatmap makes a stalling shader issue clear with a red hotspot._ ### Profile Ray-Tracing Shaders The Nsight Graphics Shader Profiler exposes shader data, including stalls and the reasons they occurred. The Real-Time Shader Profiler allows you to view the most expensive shaders at each moment in real-time. And the shader timing heatmap visualizes hotspots overlaid on the scene where shader times lagged per pixel. Profiling ray-tracing shaders can be an arduous task that requires extensive knowledge of the GPU. These features turn ray-tracing profiling into a streamlined and intuitive process. ### Export C++ Capture Create a self-contained C++ project that allows for frame analysis in a reduced CPU-load scenario. This lets you perform repeatable and isolated analysis without being bound to the original application and provides a protected environment for experimenting with optimization tweaks. 
![A C++ project that allows for frame analysis of Hellblade: Senua's Sacrifice game in a reduced CPU-load scenario](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/export-c-capture-630x354.jpg) _Images of Hellblade: Senua's Sacrifice courtesy of [Ninja Theory Ltd](http://www.ninjatheory.com/); Hellblade is a Direct3D 12 / DXR game based on Unreal Engine 4._ * * * ## View Other Tools Within the Nsight Suite Nsight Graphics is part of the NVIDIA Nsight Developer Tools suite, a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware. ![Nsight Aftermath SDK logo](https://developer.download.nvidia.com/images/nvidia-nsight-aftermath-icon.png) ### Nsight Aftermath SDK Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash. [Get Started](https://developer.nvidia.com/nsight-aftermath) ![Nsight Systems logo](https://developer.download.nvidia.com/images/nvidia-nsight-systems-icon-gbp-shaded-256.png) ### Nsight Systems NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs. [Get Started](https://developer.nvidia.com/nsight-systems) ### Nsight Perf SDK NVIDIA Nsight Perf SDK is a graphics profiling toolbox that enables you to collect GPU performance metrics directly from your application. Leverage the built-in HUD renderer for real-time, high-level performance triage. 
[Get Started](https://developer.nvidia.com/nsight-perf-sdk) [Learn More About Nsight Tools](https://developer.nvidia.com/tools-overview)[Browse Tutorials](http://developer.nvidia.com/nsight-developer-tools) * * * ## Check out partner testimonials and ecosystem > Dassault Systèmes and its SOLIDWORKS brand have always supported bleeding-edge rendering technologies from NVIDIA. Nsight Graphics is one of our go-to graphics debugging tools. With valuable features like C++ Capture and Pixel History, Nsight Graphics has enabled us to solve complex rendering problems with ease. > > —Siddharth Palaniappan, SOLIDWORKS Graphics R&D Development Senior Manager, Dassault Systèmes > > ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-dassault-systems.svg) > Vulkan is the cornerstone of Adobe’s multi-platform, multi-vendor rendering strategy for its Adobe Substance 3D products. Thanks to the ray-tracing extensions that NVIDIA pioneered and contributed to Khronos, Vulkan gives native access to ray-tracing hardware, offering exceptional ray-tracing performance on supported devices. In addition, Nsight Graphics and Nsight Systems are invaluable tools when it comes to understanding and improving the performance of Vulkan ray-tracing applications. > > — Francois Beaune, Lead Software Engineer of Photorealistic Rendering, Adobe 3D and Immersive > > ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-adobe.svg) > Nsight Graphics provides a huge productivity boost for our team, because it helps us to efficiently debug shader logic, parameters, or textures. Nsight Graphics additionally is invaluable to gain insights into memory layout of geometry and textures. The team is always eager to support our development efforts with best practices and new features. 
> > — Jan Ohlenburg, Director of Software Development, Maxon > > ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-maxon.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-adobe.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-autodesk.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-dassault-systems.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-epic-games.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-maxon.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-ubi-soft.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-popcorn-fx.svg) * * * ## Watch Nsight Graphics Tutorial Series Fundamental concepts in graphics development, and key tips for ensuring peak performance, are explored with Nsight Graphics. https://www.youtube-nocookie.com/embed/HSsPJ4qK6AU? How to Improve Shader Performance by Resolving LDC Divergence https://www.youtube-nocookie.com/embed/f0a9mN4HQCI? Avoiding Stalls and Hitches in DirectX 12 https://www.youtube-nocookie.com/embed/ctow9BO79nA? Building Acceleration Structures Using Async Compute * * * ## Watch Nsight Graphics sessions and technical videos on demand * * * ## Stay up to date on the latest Nsight Graphics news * * * ## Find more resources ![img-alt-text](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) #### Explore Nsight Tools Forums ![img-alt-text](https://developer.download.nvidia.com/icons/m48-document-support-guide-2.svg) #### Read Nsight Graphics Documentation ![img-alt-text](https://developer.download.nvidia.com/icons/m48-live-talk-on-demand.svg) #### Browse Nsight Tools Tutorials ![img-alt-text](https://developer.download.nvidia.com/icons/m48-people-group.svg) #### Join the NVIDIA Developer Program Ready to get started with NVIDIA Nsight Graphics? 
[Download Now](https://developer.nvidia.com/nsight-graphics/get-started) Quick Links - [Download for Windows ](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-graphics/2025_5_0/windows/NVIDIA_Nsight_Graphics_2025.5.0.25335.msi) - [Download for Linux (.run) ](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-graphics/2025_5_0/linux/NVIDIA_Nsight_Graphics_2025.5.0.25335.run) - [Download for Linux (.deb) ](https://developer.nvidia.com/downloads/assets/tools/secure/nsight-graphics/2025_5_0/linux/NVIDIA_Nsight_Graphics_2025.5.0.25335.deb) * * * --- # Source: https://developer.nvidia.com/nsight-perf-sdk.md # NVIDIA Nsight Perf SDK The NVIDIA® Nsight™ Perf SDK is a graphics profiling toolbox for DirectX, Vulkan, and OpenGL enabling you to collect GPU performance metrics directly from your application. [Get Started](/nsight-perf-sdk/get-started) [![NVIDIA Nsight® Perf SDK](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/Nsight_Perf_ProductPage_Hero_Updated.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/Nsight_Perf_ProductPage_Hero_Updated.jpg) _Just a few lines of code are needed to set up GPU performance metrics collection with the Nsight Perf SDK._ [![Realtime Perf Triage](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/box-thumbnail.png) ](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/box-full-size-screenshot.png)(click image to expand) ## Realtime Perf Triage Enable high-level performance triage via realtime collection and on-screen visualization of GPU performance metrics. The new GPU Periodic Sampler collects device-level metrics at high sampling rates with low overhead. 
[![Profile In-Application](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/Profile_InAPP_Thumbnail.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/PixNSightPerfSDK.png)(click image to expand) _Microsoft’s PIX on Windows showing NVIDIA GPU performance metrics_ ## Profile In-Application Integrate GPU performance metric collection into your application or graphics developer tool of choice. Activate profiling from your own custom programmatic triggers. Choose the list of GPU metrics to collect, customize your output, and keep control over your workflow. ![CI/CD](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/CICD_Final.jpg) ## Upgrade Your CI/CD Generate detailed profiler reports on every developer and artist change. Add dedicated perf regression criteria by inspecting GPU metric values. * * * ## Realtime Performance HUD [![Realtime Performance HUD](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/paragraph-thumbnail%5B16%5D.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/paragraph-full-size-screenshot.png)(click image to expand) Add continuous performance metrics collection to your code, and leverage the built-in HUD renderer to effortlessly enable real-time, high-level performance triage. Explore panels with metrics on SM, L2 cache, ROP, VRAM and various other subunits to gain an early understanding of the performance characteristics and potential bottlenecks of the scene as you move through it. The HUD- and Periodic Sampler-utility classes also serve as an example for creating your own powerful, low-overhead, real-time workflows on top of the low-level Nsight Perf SDK API. https://www.youtube-nocookie.com/embed/0gpoWXpOadA * * * ## Timeline Viewer Examine a snapshot of your application’s performance with the Nsight Perf SDK one-shot sampling mode. This allows you to examine hardware activity with minimal overhead. 
Nsight Perf SDK’s high-frequency sampling collects key GPU metrics in sharp detail. You can visualize unit throughputs, warp occupancy, draw calls, and more on the new Timeline Viewer. [![Timeline Viewer](https://developer.download.nvidia.com/images/TimelineView_353.png)](https://developer.download.nvidia.com/images/TimelineView.png)(click image to expand) https://www.youtube-nocookie.com/embed/HHjTvFbfZy0?si=BUjykJlZe7T8_dQU * * * ## HTML Profiler Report Generator [![Be One with the GPU](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/HTML_Profiler_Image_630.jpg)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/NsightPerfSDK_HTML_report_v2.png)(click image to expand) Generate detailed profiler reports with minimal effort. Simply insert a few calls at graphics API device initialization, Present/SwapBuffers, a keypress handler, or an automated trigger. Insert annotations (PushRange/PopRange) around GPU workloads to collect additional reports per region of execution. The report generator automatically collects hundreds of GPU metrics of interest; there is no need to study these complex topics on first usage. The reports provide a top-down representation of GPU performance, with fast navigation to the top performance limiters. Quickly determine the workload type, pipeline activity and utilization, shader latency reasons, and 3D data flow. ## View Other Tools Within the Nsight Suite Nsight Perf SDK is part of the NVIDIA Nsight Developer Tools suite—a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware. 
![](https://developer.download.nvidia.com/images/nvidia-nsight-graphics-icon-gbp-shaded-128.png) ### Nsight Graphics NVIDIA Nsight Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK. [Get Started](/nsight-graphics) ![Nsight Aftermath SDK](https://developer.download.nvidia.com/images/nvidia-nsight-aftermath-icon.png) ### Nsight Aftermath SDK Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash. [Get Started](https://developer.nvidia.com/nsight-aftermath) ![](https://developer.download.nvidia.com/images/nvidia-nsight-systems-icon-gbp-shaded-256.png) ### Nsight Systems NVIDIA Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs. 
[Get Started](/nsight-systems) [Learn More About Nsight Tools](https://developer.nvidia.com/tools-overview) [Browse Tutorials](http://developer.nvidia.com/tools-tutorials) ## Partners and Industry Standards [![DirectX](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/DirectX-NSP.jpg)](https://docs.microsoft.com/en-us/windows/win32/direct3d12/direct3d-12-graphics) [![Microsoft](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/Microsoft-NSP.jpg)](https://aka.ms/pixonwindows) [![Khronos](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/Khronos_Group_NSP.jpg)](https://www.khronos.org/) [![Vulkan](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/Vulkan_NSP.jpg)](https://www.khronos.org/vulkan/) [![OpenGL](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/gamedev/OpenGL_NSP.jpg)](https://www.khronos.org/opengl/) * * * ## NVIDIA Nsight Tools News **[View all Nsight news](https://developer.nvidia.com/blog/tag/nsight/)** ## Find More Resources [![Developer Forums](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/forums-icon-48x48_1.svg)](https://forums.developer.nvidia.com/) ### [Explore Nsight Perf SDK Forums](https://forums.developer.nvidia.com/) [![ RTXDI Sessions on NVIDIA On-Demand](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-digital-deep-learning-institute-talks-training-256px-grn.png)](http://developer.nvidia.com/tools-tutorials) ### [Browse Nsight Tools Tutorials](http://developer.nvidia.com/tools-tutorials) [![ NVIDIA Developer Program](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-people-group-256px-grn.png)](https://developer.nvidia.com/developer-program) ### [Join the NVIDIA Developer Program](https://developer.nvidia.com/developer-program) Ready to download NVIDIA Nsight® Perf SDK? [Get Started](nsight-perf-sdk/get-started) --- # Source: https://developer.nvidia.com/nsight-systems.md 1. 
[Home](/) [Developer Tools ](/tools-overview) Nsight Systems ![Nsight Systems](https://developer.download.nvidia.com/images/nvidia-nsight-systems-icon-gbp-shaded-256.png) # NVIDIA Nsight Systems NVIDIA Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to our smallest systems-on-a-chip (SoCs). [Get Started](https://developer.nvidia.com/nsight-systems/get-started) Nsight Systems 2026.1.1 is available now. ![Nsight Systems can make high-performance games with beautiful graphics](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nsight-systems-850x480.jpg) Nsight Systems visualizes system workload metrics on a timeline and provides tools that help developers detect, understand, and solve performance issues. ### Profile the System The full picture of app optimization requires drilling deeply into hardware interactions to ensure maximum parallelism is achieved. Nsight Systems visualizes unbiased, system-wide activity data on a unified timeline, allowing application developers to investigate correlations, dependencies, activity, bottlenecks, and resource allocation to ensure hardware components are working harmoniously. ### Analyze Performance Nsight Systems offers low-overhead performance analysis that visualizes otherwise hidden layers of events and metrics used for pursuing optimizations, including CPU parallelization and core utilization, GPU streaming-multiprocessor (SM) optimization, system workload and CUDA® libraries trace, network communications, OS interactions, and more. ### Scale Across Platforms Nsight Systems is the universal tool for developing applications on NVIDIA platforms, whether on-premises or in the cloud. 
Scale across a wide range of NVIDIA platforms, from [NVIDIA DGX™](https://www.nvidia.com/en-us/data-center/dgx-platform/) to [NVIDIA RTX™ workstations](https://www.nvidia.com/en-us/design-visualization/workstations/), including [NVIDIA DRIVE®](https://developer.nvidia.com/drive) for automotive and [NVIDIA Jetson™](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/) for edge AI and robotics. Nsight Systems provides valuable insights for optimizing AI, high-performance computing (HPC), pro-visualization, and gaming applications. * * * ## Explore Key Features ### Visualize CPU-GPU Interactions Nsight Systems latches on to a target application to expose GPU and CPU activity, events, annotations, throughput, and performance metrics in a chronological timeline. With low overhead, this data can be visualized accurately and in parallel for ease of understanding. GPU workloads are further correlated with in-application CPU events, allowing for performance blockers to be easily identified and remedied. ![Nsight Systems can make high-performance games with beautiful graphics](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/visualize-cpu-gpu-interactions-630x354.jpg) _CPU activity (top) in parallel to GPU graphics and compute activity (bottom)._ ![Nsight Systems tracks GPU activity](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/track-gpu-activity-630x354.jpg) _The GPU Metrics section of the Nsight Systems timeline._ ### Track GPU Activity To further explore the GPU, toggling on GPU Metrics Sampling will plot low-level input/output (IO) activity such as PCIe throughput, [NVIDIA NVLink®](https://www.nvidia.com/en-us/data-center/nvlink/), and dynamic random-access memory (DRAM) activity. GPU Metrics Sampling also exposes SM utilization, Tensor Core activity, instruction throughput, and warp occupancy. Every workload and its CPU origin can be readily tracked to support performance tuning. 
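As a concrete starting point, this kind of trace can also be captured from the `nsys` command-line interface that ships with Nsight Systems. The sketch below is illustrative: the application and report names are placeholders, and the exact spelling of the GPU-metrics flag can vary between nsys releases, so check `nsys profile --help` for your installed version.

```shell
# Capture a system-wide trace with CUDA, NVTX, and OS runtime activity,
# sampling low-level GPU metrics on all available GPUs.
nsys profile \
  --trace=cuda,nvtx,osrt \
  --gpu-metrics-device=all \
  --output=gpu_activity_report \
  ./my_app

# Print summary statistics for the captured report from the terminal.
nsys stats gpu_activity_report.nsys-rep
```

Opening the resulting `.nsys-rep` file in the Nsight Systems GUI shows the sampled GPU metrics as timeline rows alongside CPU and API activity.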
### Trace GPU Workloads For compute tasks, Nsight Systems supports investigating the CUDA API and tracing CUDA libraries, including cuBLAS, cuDNN, and NVIDIA TensorRT™. For graphics computing, Nsight Systems supports profiling Vulkan, OpenGL, DirectX 11, DirectX 12, DXR, and NVIDIA OptiX™ APIs. ![Nsight Systems traces GPU workloads](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/trace-gpu-workloads-630x354.jpg) _DX12 API calls as they happen chronologically in the timeline alongside the render thread._ https://www.youtube-nocookie.com/embed/XPXfeh80zDE? _Scale AI Applications to the Data Center and Cloud with NVIDIA Nsight Systems GTC Demo Video_ ### Accelerate Multi-Node Performance Nsight Systems supports multi-node profiling to resolve performance limiters on the scale of data centers and clusters. [Multi-node analysis](https://docs.nvidia.com/nsight-systems/UserGuide/index.html#multi-report-analysis) automatically diagnoses performance limiters across many nodes simultaneously. Additionally, network metrics alongside Python backtrace sampling paint a complete picture across GPUs, CPUs, DPUs, and internode communication. ### Optimize Python for AI and Deep Learning Nsight Systems helps you write Python applications that maximize GPU utilization. Backtraces and automatic call stack sampling allow you to fine-tune performance for deep learning applications. Furthermore, integration with Jupyter Lab allows you to profile Python and other supported languages directly in Jupyter, including detailed analysis with the full Nsight Systems GUI. [Get the NVIDIA Nsight Tools JupyterLab Extension](https://pypi.org/project/jupyterlab-nvidia-nsight/) https://www.youtube-nocookie.com/embed/aQ1NYoRvp7o? 
_Feature Spotlight on Python support in Nsight Developer Tools_ ![Nsight Systems detects frame stutter and bottlenecks](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/detect-frame-stutter-and-bottlenecks-630x354.jpg) _Nsight Systems detected a low-health frame resulting in a large stutter, as well as the calls that caused it. _ ### Detect Frame Stutter and Bottlenecks Nsight Systems automatically detects slow frames (by highlighting frame times higher than a target) as well as local stutter frames (by highlighting frames with higher times than neighboring frames). It also automatically reports CPU times per frame and API calls that are likely candidates for causing stutters. This equips developers with plenty of information to locate and resolve the causes of frame drops and inconsistent frame timing. [Read more about using Nsight Systems to fix stutters in games](https://developer.nvidia.com/blog/using-nsight-systems-for-fixing-stutters-in-games/) [View Full Feature List](https://developer.nvidia.com/nsight-systems/get-started#feat-table)[Get Started](https://developer.nvidia.com/nsight-systems/get-started) * * * ## View Other Tools Within the Nsight Suite Nsight Systems is part of the NVIDIA Nsight Developer Tools suite, a collection of powerful tools, libraries, and SDKs that enable developers to build, debug, and profile software utilizing the latest accelerated computing hardware. ![img-alt-text](https://developer.download.nvidia.com/images/nvidia-nsight-graphics-icon-gbp-shaded-128.png) ### Nsight Graphics NVIDIA Nsight Graphics is a standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK. 
[Get Started](https://developer.nvidia.com/nsight-graphics) ![img-alt-text](https://developer.download.nvidia.com/images/nvidia-nsight-compute-icon-gbp-shaded-128.png) ### Nsight Compute Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. [Get Started](https://developer.nvidia.com/nsight-compute) ![img-alt-text](https://developer.download.nvidia.com/images/nvidia-nsight-aftermath-icon.png) ### Nsight Aftermath SDK Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash. [Get Started](https://developer.nvidia.com/nsight-aftermath) [Browse Tutorials](http://developer.nvidia.com/nsight-developer-tools)[Learn More About Nsight Tools](https://developer.nvidia.com/tools-overview) * * * ## Check out partner testimonials and ecosystem > Vulkan is the cornerstone of Adobe’s multi-platform, multi-vendor rendering strategy for its Adobe Substance 3D products. Thanks to the ray-tracing extensions that NVIDIA pioneered and contributed to Khronos, Vulkan gives native access to ray-tracing hardware, offering exceptional ray-tracing performance on supported devices. In addition, Nsight Graphics and Nsight Systems are invaluable tools when it comes to understanding and improving the performance of Vulkan ray-tracing applications. 
> > — Francois Beaune, Lead Software Engineer of Photorealistic Rendering, Adobe 3D and Immersive > > ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-adobe.svg) > NVIDIA Nsight Systems has enabled the Microsoft Azure HPC+AI team to perform detailed analysis and optimize GPU-accelerated AI and software for our services and customers. The tool paints a clear picture of events on the CPUs, GPUs, NICs, and OS, which have allowed us to quickly identify the top time-consuming functions and cold spots to target. > > — Kushal Datta, Principal Software Engineer, Microsoft Azure HPC+AI > > ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-microsoft.svg) > We noticed that our new Quadro P6000 server was ‘starved’ during training, and we needed experts to support us. NVIDIA Nsight Systems helped us to achieve over 90% GPU utilization. A deep learning model that previously took 600 minutes to train now takes only 90. > > — Felix Goldberg, Chief AI Scientist, Tracxpoint **Deepset achieves a 3.9X speedup and 12.8X cost reduction for training natural language processing models by working with AWS and NVIDIA.** [Learn More](https://aws.amazon.com/blogs/machine-learning/deepset-achieves-a-3-9x-speedup-and-12-8x-cost-reduction-for-training-nlp-models-by-working-with-aws-and-nvidia/) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-adobe.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-autodesk.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-dassault-systems.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-epic-games.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-microsoft.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-popcorn-fx.svg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/logo-ubi-soft.svg) * * * ## Watch Nsight Developer Tools 
CUDA Tutorials CUDA Developer Tools is a series of tutorial videos designed to get you started with using Nsight tools for CUDA development. It explores key features for CUDA profiling, debugging, and optimizing. https://www.youtube-nocookie.com/embed/xdFQZSV5IrU? #### CUDA Developer Tools | NVIDIA Nsight Tools Ecosystem https://www.youtube-nocookie.com/embed/dUDGO66IadU? #### CUDA Developer Tools | Intro to NVIDIA Nsight Systems https://www.youtube-nocookie.com/embed/Iuy_RAvguBM? #### CUDA Developer Tools | Intro to NVIDIA Nsight Compute [CUDA Developer Tools Tutorials Playlist](https://www.youtube.com/playlist?list=PL5B692fm6--ukF8S7ul5NmceZhXLRv_lR) * * * ## Watch Nsight Systems Sessions and Technical Videos on Demand * * * ## Stay up to Date on the Latest NVIDIA Nsight Systems News * * * ## Find more resources ![img-alt-text](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/forums-icon-48x48_1.svg) #### Explore Nsight Systems Forums ![img-alt-text](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/documentation-icon-48x48_1.svg) #### Read Nsight Systems Documentation ![img-alt-text](https://developer.download.nvidia.com/images/m48-live-talk-on-demand-256px-grn-1.png) #### Browse Nsight Tools Tutorials ![img-alt-text](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/m48-people-group-256px-grn.png) #### Join the NVIDIA Developer Program Ready to get started with NVIDIA Nsight Systems? 
[Download Now](https://developer.nvidia.com/nsight-systems/get-started) Quick Links - [Download](/nsight-systems/get-started) - [Documentation](https://docs.nvidia.com/nsight-systems/index.html) * * * --- # Source: https://developer.nvidia.com/nsight-visual-studio-code-edition.md # NVIDIA Nsight Visual Studio Code Edition NVIDIA Nsight™ Visual Studio Code Edition (VSCE) is an application development environment for heterogeneous platforms that brings CUDA® development for GPUs on Linux and QNX target[1] systems into [Microsoft Visual Studio Code](https://code.visualstudio.com/). NVIDIA Nsight™ VSCE enables you to build and debug GPU kernels and native CPU code as well as inspect the state of the GPU and memory. [1] Note that while Nsight VSCode Edition may be run on Linux, Windows, or macOS host systems, the GPU being debugged must be on a Linux or QNX target system. Local debugging can only be done on Linux systems. The wait is over! Download the latest official release now. [Download Now](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition) [Resources |](#resources)[Support |](#support-feedback) [Feedback](#support-feedback) https://www.youtube-nocookie.com/embed/gN3XeFwZ4ng ## Benefits ### Higher Productivity Using smart CUDA auto-code completion features improves the overall development experience and enables users to save time and effort when writing code. ### Interactivity Debugging with Nsight Visual Studio Code Edition provides diverse benefits, including code formatting, easy navigation through source code, displaying and interacting with different source files, building executables, and testing. ### Remote Development Support Nsight Visual Studio Code Edition enables developers to implement CUDA code in various cluster environments such as Virtual Machines or remote Docker containers. It also supports code development for Linux systems via the Remote – WSL extension.
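Debug sessions, local or remote, are driven by a standard VS Code `launch.json`. A minimal sketch, assuming the extension's `cuda-gdb` debugger type; the program path is a placeholder for your own CUDA executable:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/bin/my_app"
        }
    ]
}
```

Once the session starts, breakpoints set in `.cu` sources behave like ordinary VS Code breakpoints.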
## Key Features ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/hpc-nsight-vscode-features-syntax-2c50-d%402x.jpg) ### CUDA Syntax Highlighting for Code Development and Debugging Edit code productively with syntax highlighting and IntelliSense for CUDA code. Auto-completion, go to definition, find references, rename symbols, and more all seamlessly work for kernel functions the same as they do for C++ functions. ### CUDA Kernel Breakpoint Support and Kernel Execution Control Break into a debugging session in CPU or GPU device code using standard breakpoints, including support for conditional breakpoints with expression evaluation. GUI controls allow you to step over, into, or out of statements in the source code, just like normal CPU debugging. Breakpoints are evaluated for every kernel thread and will halt execution when any thread encounters them. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/hpc-nsight-vscode-features-kernel-2c50-d%402x.jpg) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/hpc-nsight-vscode-features-inspection-2c50-d%402x.jpg) ### GPU and Kernel State Inspection Break into the debugger to see all the detailed information at any point in the application with GPU debugging support for register, variable, and call-stack. There is watchlist support to add specific variables of interest for tracking. Additionally, there are focus controls to manually select streaming multiprocessor (SM), warp, and lane numbers to switch the debugger focus. 
## Resources - [Documentation](https://docs.nvidia.com/nsight-visual-studio-code-edition) - [Videos](https://developer.nvidia.com/nsight-vsce-videos) - [Blogs](https://developer.nvidia.com/nsight-vsce-blogs) - [Release History](https://developer.nvidia.com/nsight-vsce-history) ### Support - Post comments on [Forum](https://forums.developer.nvidia.com/c/development-tools/nsight-vscode-edition/379) - Report a [bug](https://developer.nvidia.com/rdp/bugs/nsight-graphics-bug-reporting) ### Feedback Let us know about your experience with NVIDIA developer tools [Submit feedback](https://developer.nvidia.com/nvidia-nsight-visual-studio-code-edition-provide-feedback) The wait is over! Download the latest official release now. [Download](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition) --- # Source: https://developer.nvidia.com/nvjpeg.md # nvJPEG Libraries ## GPU-accelerated JPEG decoder, encoder and transcoder The nvJPEG library is a high-performance GPU-accelerated library for decoding, encoding and transcoding JPEG format images. The nvJPEG2000 library is for decoding JPEG 2000 format images. Applications that rely on nvJPEG or nvJPEG2000 for decoding deliver higher throughput and lower latency compared to CPU-only decoding. ## nvJPEG The nvJPEG library provides low-latency decoding, encoding, and transcoding for common JPEG formats used in computer vision applications such as image classification, object detection and image segmentation.
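Baseline JPEG stores pixels as YCbCr, so the final stage of any JPEG decode is a per-pixel color-space conversion of the kind nvJPEG performs on the GPU. A plain-Python sketch of the standard BT.601 full-range conversion, purely illustrative (this is not nvJPEG's API):

```python
def ycbcr_to_rgb(y, cb, cr):
    """BT.601 full-range YCbCr -> RGB, as used for baseline JPEG decode."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, round(v)))  # keep channels in [0, 255]
    return clamp(r), clamp(g), clamp(b)
```

nvJPEG applies this (and the BGR/interleaved variants) across millions of pixels per image, which is why the step benefits from GPU parallelism.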
### nvJPEG Key Features - Hybrid decoding using both the CPU and the GPU - Hardware acceleration for baseline JPEG decode on A100 GPUs - Single image and batched image decoding - Single phase and multiphase decoding - Color space conversion to RGB, BGR, RGBI, BGRI, and YUV - Input to the library is in the host memory, and the output is in the GPU memory - User-provided memory manager for the device and pinned host memory allocations ### Get Started with nvJPEG - For the most current version of nvJPEG, download the [CUDA Toolkit](/cuda-toolkit). - If you are using CUDA Toolkit 10.0 or 9.0, please download the [nvJPEG installer](/nvjpeg-release-download). - [nvJPEG Documentation](https://docs.nvidia.com/cuda/nvjpeg/index.html) - [nvJPEG Examples](https://github.com/NVIDIA/CUDALibrarySamples) on GitHub - Technical Blog: [Leveraging the Hardware JPEG Decoder and NVIDIA nvJPEG Library on NVIDIA A100 GPUs](https://developer.nvidia.com/blog/leveraging-hardware-jpeg-decoder-and-nvjpeg-on-a100/) * * * ## nvJPEG Performance ### Decoding Speed with Multiple Threads ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/fhd.png) Speedup achieved by hardware decode on A100 compared to CUDA decode on V100 and CPU-only decode. CPU: Intel Xeon Platinum 8168@2GHz 3.7GHz Turbo (Skylake) HT On ### Encoding Speed ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/baseline.png) JPEG Baseline encoding throughput comparison between CPU, V100, and A100 GPUs for common image sizes and formats. CPU: Intel Xeon Platinum 8168@2GHz 3.7GHz Turbo (Skylake) HT On * * * ## nvJPEG2000 The nvJPEG2000 library is for application developers and researchers who are employing JPEG 2000 formatted images in their research in fields such as deep learning, medical imaging, digital pathology, remote sensing and digital cinema applications. nvJPEG2000 reads and decodes JPEG 2000 format image data from CPU memory. The decoded output is in GPU memory.
The library relies on both CPU and GPU for decoding. ### nvJPEG2000 Key Features - Output formats: grayscale and color images with arbitrary width and height - Compression Technique: Lossy (wavelet CDF 9/7) and lossless (wavelet CDF 5/3) image compression and decompression - jp2 file format and jpeg2000 code stream are supported ### Get Started with nvJPEG2000 [Download](/nvjpeg2000-downloads) - [nvJPEG2000 Documentation](https://docs.nvidia.com/cuda/nvjpeg2000/index.html) - [nvJPEG2000 Examples](https://github.com/NVIDIA/CUDALibrarySamples) on GitHub * * * ## nvJPEG2000 Performance ### 4x Faster Lossless Decoding ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/lossless_updated.png) Lossless: 5-3 wavelet transform CPU: Intel Xeon Gold 6240@2GHz 3.9GHz Turbo (Cascade Lake) HT On ### 7X Faster Lossy Decoding ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/lossy_updated.png) Lossy: 9-7 wavelet transform CPU: Intel Xeon Gold 6240@2GHz 3.9GHz Turbo (Cascade Lake) HT On * * * ## Resources - Technical Blog: [Leveraging the Hardware JPEG Decoder and NVIDIA nvJPEG Library on NVIDIA A100 GPUs](https://developer.nvidia.com/blog/leveraging-hardware-jpeg-decoder-and-nvjpeg-on-a100/) - Related libraries and software: - [NVIDIA Data Loading Library (DALI)](/DALI) - [NVIDIA Performance Primitives (NPP)](/npp) - [NVIDIA GPU Cloud](https://ngc.nvidia.com/catalog/collections) - For questions or feedback, please contact [nvjpeg@nvidia.com](mailto:nvjpeg@nvidia.com) - To make a feature request or report an issue, register on [NVIDIA Developer Zone](/developer-program) ### Join the Developer Program [Join Now](/developer-program) --- # Source: https://developer.nvidia.com/nvmath-python.md 1. 
[Home](/) [High Performance Computing](/hpc) [CUDA-X GPU Accelerated Libraries](/gpu-accelerated-libraries) nvmath-python Quick Links - [Install (pip)](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#install-from-pypi)[Install (conda)](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#install-from-conda)[Build from source](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#build-from-source)[GitHub](https://github.com/NVIDIA/nvmath-python/)[Documentation](https://docs.nvidia.com/cuda/nvmath-python/index.html) * * * ![img-alt-text](https://developer.download.nvidia.com/images/cudss-nvmath-python-green-r4@4x_CUT-1.png) # nvmath-python **nvmath-python** (Beta) is an [open source library](https://github.com/nvidia/nvmath-python) that bridges the gap between the Python scientific community and [NVIDIA CUDA-X™ math libraries](https://developer.nvidia.com/gpu-accelerated-libraries) by reimagining math library APIs for Python. It interoperates with and complements existing array libraries such as NumPy, CuPy, and PyTorch, pushing performance limits to new levels through capabilities such as stateful APIs, just-in-time kernel fusion, custom callbacks, and scaling to many GPUs. Python practitioners, library developers, and GPU kernel writers are finding **nvmath-python** a powerful tool for scaling their scientific, data science, and AI workflows with minimal effort.
[Install with pip](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#install-from-pypi "Install with pip")[Install with conda](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#install-from-conda "Install with conda") **Other Links:** [Build From Source](https://docs.nvidia.com/cuda/nvmath-python/latest/installation.html#build-from-source "Build from source")[GitHub](https://github.com/NVIDIA/nvmath-python/ "GitHub") * * * ## Key Features ### Intuitive Pythonic APIs - **nvmath-python** reimagines math library APIs to cover **sophisticated use cases**, which are impossible with _NumPy-like_ APIs without performance compromises. - **Host APIs** provide both out-of-the-box simplicity and versatile customization through optional arguments that expose all the “knobs” of the underlying NVIDIA math libraries. Host APIs are divided into generic and specialized APIs. Generic APIs provide a consistent user experience across memory/execution spaces; they may not support hardware-specific data types and do not necessarily leverage specific device capabilities, which makes them great for writing portable code. In contrast, specialized APIs have narrower applications and may work only on specific hardware; they are great for fully leveraging hardware capabilities at the expense of portability. - **Device APIs** allow embedding nvmath-python library calls in custom kernels, written using Python compilers such as numba-cuda. You no longer need to write GEMM or FFT kernels from scratch. - **Host APIs with callbacks** allow embedding custom Python code into nvmath-python calls. Internal JIT machinery compiles the custom code and fuses it with the nvmath-python operation to achieve peak performance. - **Stateful** (class-form) APIs allow splitting the entire math operation into specification, planning, autotuning, and execution phases.
Doing the expensive specification, planning, and autotuning once lets their cost be amortized across multiple subsequent executions. - **Integration with the** [Python logging facility](https://docs.python.org/3/library/logging.html) brings visibility into specification, planning, autotuning, and execution machinery details at runtime. ### Interoperability With Python Ecosystem - **nvmath-python** works in conjunction with popular Python packages. This includes GPU-based packages like CuPy, PyTorch, and RAPIDS and CPU-based packages like NumPy, SciPy, and scikit-learn. You can keep using familiar data structures and workflows while benefiting from accelerated math operations through **nvmath-python**. - **nvmath-python** is not a replacement for array libraries such as NumPy, CuPy, and PyTorch. It does not implement array APIs for array creation, indexing, and slicing. **nvmath-python** is intended to be used alongside these array libraries. All these dependencies are optional, and you’re free to choose which array library (or multiple libraries) to work with alongside **nvmath-python**. - **nvmath-python** supports CPU and GPU execution and memory spaces. It eases the transition between CPU and GPU implementations and allows implementing hybrid CPU-GPU workflows. - In combination with Python compilers, such as [numba-cuda](https://github.com/NVIDIA/numba-cuda), you can implement GPU custom kernels with embedded **nvmath-python** library calls. ### Scalable Performance - **nvmath-python** pushes performance to the limit, delivering performance comparable to the underlying CUDA-X native libraries, such as the [cuBLAS family](/cublas), [cuFFT family](/cufft), [cuDSS](/cudss), and [cuRAND](/curand). With stateful APIs you can amortize the costs of _specification_, _planning_, and _autotuning_ phases through _multiple executions_.
- For CPU execution **nvmath-python** leverages the [NVPL library](/nvpl) for the best performance on [NVIDIA Grace™ CPU](https://www.nvidia.com/en-us/data-center/grace-cpu/) platforms. It also supports acceleration on x86 hosts by leveraging the MKL library. - In combination with Python compilers, such as **[numba-cuda](https://github.com/NVIDIA/numba-cuda),** you can now write highly performant kernels involving GEMM, FFT, and/or RNG operations. Here are just a few examples of the impossible made possible with **nvmath-python**: - [DGEMM emulation using int8 tensor cores](https://github.com/NVIDIA/nvmath-python/blob/main/examples/device/cublasdx_fp64_emulation.py) - [Convolution kernel](https://github.com/NVIDIA/nvmath-python/blob/main/examples/device/cufftdx_convolution_performance.py) - [Monte Carlo kernel](https://github.com/NVIDIA/nvmath-python/blob/main/examples/device/curand_philox_uniform4.py) - **nvmath-python** allows scaling beyond a single GPU and even beyond a single node without major coding effort. Multi-GPU multi-node (MGMN) APIs allow an easy transition from a single-GPU implementation to MGMN and scale seamlessly to thousands of GPUs. The library also offers helper utilities for reshaping data (re-partitioning) as needed without major coding. * * * ## Supported Operations ### Dense Linear Algebra - Generalized Matrix Multiplication The library offers a GEneralized Matrix Multiplication (GEMM) that performs 𝐃 = 𝐹(α ⋅ 𝐀 ⋅ 𝐁 + β ⋅ 𝐂), where 𝐀, 𝐁, 𝐂 are matrices of compatible dimensions and layouts, α and β are scalars, and 𝐹(𝐗) is a pre-defined function (epilog) which is applied elementwise to matrix 𝐗.
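The epilog semantics can be pinned down with a tiny pure-Python model. This is a minimal sketch, not nvmath-python's API: `gemm_epilog` and the ReLU epilog below are hypothetical stand-ins, whereas nvmath-python fuses 𝐹 directly into the GPU kernel.

```python
def gemm_epilog(A, B, C, alpha, beta, epilog):
    """D = epilog(alpha * A @ B + beta * C), with epilog applied elementwise."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(A[i][p] * B[p][j] for p in range(k))  # (A @ B)[i][j]
            D[i][j] = epilog(alpha * acc + beta * C[i][j])
    return D

relu = lambda x: max(x, 0.0)  # one common epilog choice

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]   # identity, so alpha*A@B == A
C = [[-10.0, 0.0], [0.0, 0.0]]
D = gemm_epilog(A, B, C, alpha=1.0, beta=1.0, epilog=relu)
# beta*C drives D[0][0] negative before the epilog clamps it to zero
```

The point of the fused design is that 𝐹 runs while the GEMM tile is still in registers, avoiding a second pass over 𝐃 in memory.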
#### Documentation - [Specialized Host APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/host-apis/linalg/index.html) - Generic Host APIs (coming soon) - [Device APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/device-apis/cublas.html) - Distributed APIs (coming soon) #### Tutorials and examples - [Blog "Fusing Epilog Operations with Matrix Multiplication Using nvmath-python"](https://developer.nvidia.com/blog/fusing-epilog-operations-with-matrix-multiplication-using-nvmath-python/) - [Tutorial "Introduction to GEMM with nvmath-python"](https://github.com/NVIDIA/nvmath-python/blob/main/notebooks/matmul/01_introduction.ipynb) - [Tutorial "Epilogs"](https://github.com/NVIDIA/nvmath-python/blob/main/notebooks/matmul/02_epilogs.ipynb) - [Tutorial "Implementing a neural network using nvmath-python"](https://github.com/NVIDIA/nvmath-python/blob/main/notebooks/matmul/03_backpropagation.ipynb) - [Tutorial "FP8 computations with nvmath-python"](https://github.com/NVIDIA/nvmath-python/blob/main/notebooks/matmul/04_fp8.ipynb) - [Host API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/linalg/advanced/matmul) - [Device API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/device) - [Host APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/host-apis/linalg/index.html) provide a _specialized API_ located in nvmath.linalg.advanced submodule backed by cuBLASLt library. This API supports GPU execution space only. The key distinguishing feature of the library is an ability to fuse matrix operations and epilog in a **single fused kernel**. The library also offers facilities to perform additional **autotuning** allowing the best fused kernel selection for a specific hardware and a specific problem size. Both **stateful** and **stateless** APIs are provided. Generic APIs will be implemented in a future release. 
- [Device APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/device-apis/cublas.html) are located in the nvmath.device submodule backed by [cuBLASDx library](https://docs.nvidia.com/cuda/cublasdx/). They can be used from within [numba-cuda](https://github.com/NVIDIA/numba-cuda) kernels. - **Distributed APIs** will be implemented in a future release. ![nvmath-python linear algebra performance](https://developer.download.nvidia.com/images/linear_alg_perf_CUT.jpg) _Advanced matmul performance is shown on H100 PCIe for matrices A[m×n], B[n×k], bias[m], where m=65536, n=16384, k=8192. The data type of the operands and result is bfloat16, float32 type is used for compute._ ### Fast Fourier Transforms The library offers forward and inverse FFTs for complex-to-complex (C2C), complex-to-real (C2R), and real-to-complex (R2C) discrete Fourier transformations. #### Documentation - [Host APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/host-apis/fft/index.html) - [Device APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/device-apis/cufft.html) - [Distributed APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/fft/index.html) #### Tutorials and examples - [Host API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/fft) - [Device API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/device) - [Distributed FFT API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/distributed/fft) - [Distributed Reshape API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/distributed/reshape) ![nvmath-python FFT performance](https://developer.download.nvidia.com/images/fast_four_trans_perf_CUT.jpg) _Fast fourier transform performance is shown on H100 PCIe for FFTs of size 512 computed in 1048576 (220) batches using complex64 data type._ - [Host APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/host-apis/fft/index.html) are located in the nvmath.fft submodule backed by 
[cuFFT](/cufft#section-cufft) library. The APIs support both CPU and GPU execution spaces: NVIDIA Grace™ CPU platforms are powered by the NVPL library, while MKL is offered as the CPU backend for x86 hosts. The key distinguishing feature of the library is the ability to fuse an FFT operation and a custom callback, written as a Python function, into a **single fused kernel**. The library also offers additional **autotuning** facilities that select the best fused kernel for specific hardware and a specific problem size. Both **stateful** and **stateless** APIs are provided. - [Device APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/device-apis/cufft.html) are located in the nvmath.device submodule backed by [cuFFTDx library](/cufft#section-cufftdx). They can be used from within [numba-cuda](https://github.com/NVIDIA/numba-cuda) kernels. - [Distributed APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/fft/index.html) are located in the nvmath.distributed.fft submodule powered by [cuFFTMp library](/cufft#section-cufftmp), allowing users to solve distributed 2D and 3D FFT exascale problems. ### Random Number Generation The library offers device APIs for performing random number generation from within a GPU kernel written in [numba-cuda](https://github.com/NVIDIA/numba-cuda). It provides a collection of pseudo- and quasi-random number bit generators as well as sampling from popular probability distributions.
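As a rough model of what such a kernel computes, here is a geometric Brownian motion sampler of the kind pictured below, written against the stdlib `random` module rather than the cuRAND device API; the drift and volatility values are arbitrary illustrations.

```python
import math
import random

def gbm_paths(s0, mu, sigma, dt, n_steps, n_paths, seed=0):
    """Simulate S_{t+dt} = S_t * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z)."""
    rng = random.Random(seed)  # on GPU, each thread would own a cuRAND state instead
    paths = []
    for _ in range(n_paths):
        s = s0
        path = [s]
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)  # standard normal increment
            s *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
            path.append(s)
        paths.append(path)
    return paths

# One year of daily steps for 1000 price paths
paths = gbm_paths(s0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, n_steps=252, n_paths=1000)
```

On the GPU, each path maps naturally to one thread, which is why per-thread device RNG state (rather than a host-side generator) is the right abstraction.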
#### Documentation - [Device APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/device-apis/curand.html) #### Tutorials and examples - [Device API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/device) ![nvmath-python device RNG example](https://developer.download.nvidia.com/images/gbm_paths.png) _Stock pricing using Geometric Brownian Motion kernel written in numba-cuda using nvmath-python device RNG_ - [Device APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/device-apis/curand.html) are located in the nvmath.device submodule backed by [cuRAND library](/curand). They can be used from within numba-cuda kernels for high-performance Monte Carlo simulations on GPU. Note that the library does not offer corresponding host APIs; instead, it encourages using the random number generation facilities provided by array libraries such as NumPy and CuPy. **Bit RNGs:** - MRG32k3a - MTGP Mersenne Twister - XORWOW - Sobol quasi-random number generators **Distribution RNGs:** - Uniform distribution - Normal distribution - Log-normal distribution - Poisson distribution ### Sparse Linear Algebra - Direct Solver The library offers specialized APIs to support sparse linear algebra computations. The library currently offers the specialized direct solver API for solving systems of linear equations 𝐀 ⋅ 𝐗 = 𝐁, where 𝐀 is a known left-hand side (LHS) sparse matrix, 𝐁 is a known right-hand side (RHS) vector or a matrix of a compatible shape, and 𝐗 is an unknown solution provided by the solver.
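In miniature, the solver's job looks like this: a dense Gaussian-elimination stand-in in pure Python, applied to a small tridiagonal (sparse-structured) system. This is only a sketch of the problem statement; cuDSS instead factorizes 𝐀 in sparse form on the GPU.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (dense, illustrative)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# A tridiagonal system: the sparsity pattern a direct solver exploits at scale
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = solve(A, b)
```

A sparse direct solver performs the same factor-then-substitute plan but stores and eliminates only the nonzeros, which is what makes batched and large-scale systems tractable.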
#### Documentation - [Specialized Host APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/host-apis/sparse/index.html) - Generic Host APIs (coming soon) - Device APIs (coming soon) - Distributed APIs (coming soon) #### Tutorials and examples - [Host DSS API examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples/sparse/advanced/direct_solver) ![img-alt-text](https://developer.download.nvidia.com/images/DatacenterKV-2-1536x864.png) - **[Host APIs](https://docs.nvidia.com/cuda/nvmath-python/latest/host-apis/sparse/index.html)** provide a specialized API located in the nvmath.sparse.advanced submodule backed by [cuDSS library](/cudss). This API supports GPU execution and hybrid GPU-CPU execution spaces only. The key distinguishing feature of the library is the ability to solve a series of linear systems in batches. The library supports explicit batching, when linear systems are provided as sequences of LHS and/or RHS, and implicit batching, when the sequence is inferred from a higher dimensional tensor. Both **stateful** and **stateless** APIs are provided. Generic APIs will be implemented in a future release. - **Device APIs** will be available in a future release. - **Distributed APIs** will be implemented in a future release.
* * * ## Resources - [nvmath-python Documentation](https://docs.nvidia.com/cuda/nvmath-python/index.html) - [nvmath-python Github](https://github.com/nvidia/nvmath-python) - [nvmath-python Tutorials (Jupyter notebooks)](https://github.com/NVIDIA/nvmath-python/tree/main/notebooks/) - [nvmath-python Examples](https://github.com/NVIDIA/nvmath-python/tree/main/examples) - [CUDA-X GPU-Accelerated Libraries](https://developer.nvidia.com/gpu-accelerated-libraries) - [nvmath-python Demo at GTC](https://www.nvidia.com/en-us/on-demand/session/gtc24-s62162/?start=1394) - [Blog "Fusing Epilog Operations with Matrix Multiplication Using nvmath-python"](https://developer.nvidia.com/blog/fusing-epilog-operations-with-matrix-multiplication-using-nvmath-python/) - [Tutorial "Accelerating and Scaling Python for HPC"](https://github.com/samaid/pyhpc-tutorial) **Get started with nvmath-python** [Install Now](https://docs.nvidia.com/cuda/nvmath-python/latest/quickstart.html "Install Now") --- # Source: https://developer.nvidia.com/nvpl.md 1. [Home](/) 2. NVIDIA NVPL # NVIDIA Performance Libraries NVIDIA Performance Libraries (NVPL) are a collection of essential mathematical libraries optimized for [Arm 64-bit architectures](/arm). NVPL allows you to easily port HPC applications to [NVIDIA Grace™ CPU](https://www.nvidia.com/en-us/data-center/grace-cpu/) platforms to achieve industry-leading performance and efficiency. [Download](https://developer.nvidia.com/nvpl-downloads)[Documentation](https://docs.nvidia.com/nvpl/)[Feedback](nvpl-interest-ext@exchange.nvidia.com) * * * ## Key Features ### Seamlessly Port to Grace CPU NVPL math libraries are drop-in replacements for standard C and Fortran mathematical APIs. This allows existing HPC applications to be easily ported to Grace-based systems with no source code changes. 
### Optimized for Arm Architecture NVPL is specifically tuned for Arm 64-bit CPUs and ensures mathematical applications achieve peak efficiency on the Grace microarchitecture. NVPL allows you to fully harness NVIDIA silicon in the data center. ### Use Standard Math Subroutines Many HPC applications rely on standardized mathematical APIs like BLAS, FFTW, and LAPACK that are critical to application performance. NVPL provides optimized implementations of these essential libraries for the Grace CPU. * * * ## NVPL Libraries (Beta) ### NVPL BLAS Industry standard Basic Linear Algebra Subprograms (BLAS) optimized for the NVIDIA Grace CPU architecture with single- and multi-threaded interfaces. ![NVPL BLAS efficiency chart](https://developer.download.nvidia.com/images/nvpl/nvpl-blas-dgemm-efficiency-chart@2x.png) ![NVPL LAPACK Performance chart](https://developer.download.nvidia.com/images/nvpl/nvpl-lapack-chart@2x(1).png) ### NVPL LAPACK Dense direct linear solvers and eigen-solvers for computer vision, linear optimization, and more on NVIDIA Grace CPUs. ### NVPL FFT Fast Fourier Transforms (FFT) for applications including computational physics and quantum chemistry. ![NVPL FFT Performance chart](https://developer.download.nvidia.com/images/nvpl/nvpl-fft-performance@2x.png) ### NVPL TENSOR Tensor contraction, reduction, and elementwise operations for deep learning and inference on Grace CPUs. ### NVPL RAND High-performance random number generation (RNG) for fast and high-quality random numbers. ### NVPL SPARSE Sparse linear algebra subroutines for accelerating machine learning, fluid dynamics, and more. ### NVPL ScaLAPACK A LAPACK extension designed for distributed memory parallel computing environments. 
* * * ## Resources - [NVPL Documentation](https://docs.nvidia.com/nvpl/) - [NVPL Samples (GitHub)](https://github.com/NVIDIA/NVPLSamples) - [Unlock the Power of NVIDIA Grace and NVIDIA Hopper™ Architectures with Foundational HPC Software](https://developer.nvidia.com/blog/unlock-the-power-of-nvidia-grace-and-nvidia-hopper-architectures-with-foundational-hpc-software/) - [NVPL Feedback](mailto:nvpl-interest-ext@exchange.nvidia.com) ## Ready to get started with NVIDIA Performance Libraries? [Download](https://developer.nvidia.com/nvpl-downloads) Quick Links - [Download ](https://developer.nvidia.com/nvpl-downloads) - [Documentation](https://docs.nvidia.com/nvpl/) * * * --- # Source: https://developer.nvidia.com/nvshmem.md # NVSHMEM **NVSHMEM™** is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA® streams. [Download NVSHMEM](/nvshmem-downloads) [Documentation](https://docs.nvidia.com/nvshmem/api/index.html) [Release Notes](https://docs.nvidia.com/nvshmem/release-notes-install-guide/release-notes/index.html) [GitHub](https://github.com/NVIDIA/nvshmem/releases) [NVSHMEM API Guide](https://docs.nvidia.com/nvshmem/api/api.html) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvshmem/mpi-nvshmem-explainer-diagram.svg)Existing communication models, such as Message-Passing Interface (MPI), orchestrate data transfers using the CPU. In contrast, NVSHMEM uses asynchronous, GPU-initiated data transfers, eliminating synchronization overheads between the CPU and the GPU. * * * #### Efficient, Strong Scaling NVSHMEM enables long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling. 
#### Low Overhead One-sided communication primitives reduce overhead by allowing the initiating process or GPU thread to specify all information required to complete a data transfer. This low-overhead model enables many GPU threads to communicate efficiently. #### Naturally Asynchronous Asynchronous communications make it easier for programmers to interleave computation and communication, thereby increasing overall application performance. * * * ## What's New in NVSHMEM 3.4 - Added support for data direct NIC configurations in the IB transports. - Added a new environment variable, `NVSHMEM_DISABLE_DATA_DIRECT`, to force disable data direct NIC even when present. - Added support for CPU-Assisted IBGDA without the use of GDRCopy or the x86 regkey setting. Systems not supporting the other methods will automatically fall back to this new method. It enables the use of IBGDA on a broad range of systems without the need for administrator intervention. - Added a new environment variable `NVSHMEM_HCA_PREFIX` to enable IB transports on systems which name their HCA devices in a non-standard way (for example, `ipb*` instead of `mlx5*`). - Deprecated support for the combined `libnvshmem.a` host and device static library. * * * ## Key Features - Combines the memory of multiple GPUs into a partitioned global address space that’s accessed through NVSHMEM APIs - Includes a low-overhead, in-kernel communication API for use by GPU threads - Includes stream-based and CPU-initiated communication APIs - Supports x86 and Arm processors - Is interoperable with MPI and other OpenSHMEM implementations * * * ## NVSHMEM Advantages ### Increase Performance Convolution is a compute-intensive kernel that’s used in a wide variety of applications, including image processing, machine learning, and scientific computing. Spatial parallelization decomposes the domain into sub-partitions that are distributed over multiple GPUs with nearest-neighbor communications, often referred to as halo exchanges. 
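The halo-exchange pattern itself is simple to sketch. In the toy model below, each "rank" owns an interior slice plus one ghost cell per side, and a one-sided "put" writes a boundary value directly into a neighbor's ghost cell, which is the shape of the NVSHMEM version. This is pure Python with no real communication; all names are illustrative.

```python
def make_ranks(domain, n_ranks):
    """Partition a 1-D domain; each rank gets [ghost_left] + interior + [ghost_right]."""
    step = len(domain) // n_ranks
    return [[0.0] + domain[i * step:(i + 1) * step] + [0.0] for i in range(n_ranks)]

def halo_exchange(ranks):
    """One-sided puts: each rank writes its boundary cells into neighbors' ghosts."""
    n = len(ranks)
    for r in range(n):
        if r > 0:
            ranks[r - 1][-1] = ranks[r][1]   # put my left boundary into left neighbor's right ghost
        if r < n - 1:
            ranks[r + 1][0] = ranks[r][-2]   # put my right boundary into right neighbor's left ghost
    return ranks

ranks = make_ranks([float(i) for i in range(8)], n_ranks=2)
halo_exchange(ranks)
```

With NVSHMEM, the equivalent put is issued by GPU threads against the symmetric heap, so no CPU round-trip or matching receive is needed on the target side.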
In the Livermore Big Artificial Neural Network (LBANN) deep learning framework, spatial-parallel convolution is implemented using several communication methods, including MPI and NVSHMEM. The MPI-based halo exchange uses the standard send and receive primitives, whereas the NVSHMEM-based implementation uses one-sided put, yielding significant performance improvements on Lawrence Livermore National Laboratory’s [Sierra supercomputer](https://computing.llnl.gov/computers/sierra). ##### Efficient Strong-Scaling on Sierra Supercomputer ##### Efficient Strong-Scaling on NVIDIA DGX SuperPOD ### Accelerate Time to Solution Reducing the time to solution for high-performance, scientific computing workloads generally requires a strong-scalable application. QUDA is a library for lattice quantum chromodynamics (QCD) on GPUs, and it’s used by the popular MIMD Lattice Computation (MILC) and Chroma codes. NVSHMEM-enabled QUDA avoids CPU-GPU synchronization for communication, thereby reducing critical-path latencies and significantly improving strong-scaling efficiency. [Watch the GTC 2020 Talk](/gtc/2020/video/s21673) ### Simplify Development The conjugate gradient (CG) method is a popular numerical approach to solving systems of linear equations, and CGSolve is an implementation of this method in the Kokkos programming model. The CGSolve kernel showcases the use of NVSHMEM as a building block for higher-level programming models like Kokkos. NVSHMEM enables efficient multi-node and multi-GPU execution using Kokkos global array data structures without requiring explicit code for communication between GPUs. As a result, NVSHMEM-enabled Kokkos significantly simplifies development compared to using MPI and CUDA. 
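For reference, the numerical core that CGSolve distributes is only a few vector operations per iteration. A minimal pure-Python conjugate gradient for a small symmetric positive-definite system follows; Kokkos with NVSHMEM runs the same loop with the vectors partitioned across GPUs.

```python
def cg(A, b, tol=1e-10, max_iter=100):
    """Conjugate gradient for a symmetric positive-definite A (dense, illustrative)."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]          # residual b - A x for x = 0
    p = r[:]          # initial search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b)
```

In the distributed setting, the matvec and dot products are the communication points, which is where one-sided GPU-initiated transfers pay off.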
##### Productive Programming of Kokkos CGSolve * * * ## Resources - Users of NVSHMEM: - [QUDA: Avoiding the Jam (Lattice 2022 Talk)](https://indico.hiskp.uni-bonn.de/event/40/contributions/693/attachments/318/539/Lattice2022_Wagner_NVSHMEM.pdf) - [cuFFTMp: Multinode Multi-GPU: Using NVIDIA cuFFTMp FFTs at Scale (Blog)](https://developer.nvidia.com/blog/multinode-multi-gpu-using-nvidia-cufftmp-ffts-at-scale/) - GROMACS - [Massively Improved Multi-node NVIDIA GPU Scalability with GROMACS (Blog)](https://developer.nvidia.com/blog/massively-improved-multi-node-nvidia-gpu-scalability-with-gromacs) - [Cutting-Edge CUDA Technologies for Molecular Dynamics and Beyond (GTC 2023 Talk)](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51110/) - [Kokkos: Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces (Paper)](https://ieeexplore.ieee.org/document/10321871) - Others - [Scalable Simulation of Quantum Circuit with Noise on GPU-based HPC Systems (GTC Talk)](https://www.nvidia.com/gtc/session-catalog/?tab.scheduledorondemand=1583520458947001NJiE&search=nvshmem#/session/1638571299696001GJl3) - NVSHMEM Blogs: - [Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0](https://developer.nvidia.com/blog/enhancing-application-portability-and-compatibility-across-new-platforms-using-nvidia-magnum-io-nvshmem-3-0/) - [IBGDA: Improving Network Performance of HPC Systems Using NVIDIA Magnum IO NVSHMEM and GPUDirect Async](https://developer.nvidia.com/blog/improving-network-performance-of-hpc-systems-using-nvidia-magnum-io-nvshmem-and-gpudirect-async/) - [Scaling Scientific Computing with NVSHMEM](https://developer.nvidia.com/blog/scaling-scientific-computing-with-nvshmem/) - [Accelerating NVSHMEM 2.0 Team-Based Collectives Using NCCL](https://developer.nvidia.com/blog/accelerating-nvshmem-2-0-team-based-collectives-using-nccl/) - [Introductory 
Webinar](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51705/) - [NVSHMEM Documentation](https://docs.nvidia.com/hpc-sdk/nvshmem/index.html) - [NVSHMEM Best Practices Guide](https://docs.nvidia.com/nvshmem/release-notes-install-guide/best-practice-guide/index.html) - [NVSHMEM API Documentation](https://docs.nvidia.com/nvshmem/api/api.html) - [OpenSHMEM Specification](http://openshmem.org/site/) - [NVSHMEM Developer Forum](https://forums.developer.nvidia.com/tag/nvshmem) - For questions or to provide feedback, please contact [nvshmem@nvidia.com](mailto:nvshmem@nvidia.com) - Related libraries and software: - [NVIDIA HPC SDK™](/hpc-sdk) - [NVIDIA GPUDirect®](/gpudirect) - [Magnum IO](https://www.nvidia.com/en-us/data-center/magnum-io/) - [CUDA-X™ Libraries](/gpu-accelerated-libraries) Ready to start developing with NVSHMEM? [Get Started](/nvshmem-downloads) --- # Source: https://developer.nvidia.com/omniverse.md # NVIDIA Omniverse Libraries Omniverse is NVIDIA’s collection of libraries and microservices for physical AI development. #### Quick Links: - [Documentation](https://docs.omniverse.nvidia.com/) - [Developer Guide](https://docs.omniverse.nvidia.com/prod_kit/prod_kit.html) - [Learning Path](https://www.nvidia.com/en-us/learn/learning-path/digital-twins/) - [Join Community](/omniverse/community) - [Forum](https://forums.developer.nvidia.com/c/omniverse/300) - [Discord](https://discord.com/invite/nvidiaomniverse) ## Get Started With Omniverse Libraries Leveraging NVIDIA’s deep expertise in accelerated computing and AI, [Omniverse libraries](https://www.nvidia.com/en-us/omniverse/) enable software makers to integrate pre-built functionality into their solutions. ### Newton Physics An open-source, extensible physics engine built on NVIDIA Warp and OpenUSD.
[Learn More about Newton Physics Engine](/newton-physics) ### NuRec APIs, libraries, and datasets for 3D Gaussian-based neural simulation from real-world data. [Get Started](https://docs.nvidia.com/nurec/index.html#) ### Omniverse Kit SDK containing core libraries and developer tooling for building physical AI applications. [Explore Docs](https://docs.nvidia.com/omniverse/index.html#sdks) ### OpenUSD Exchange SDK for building OpenUSD-based input/output solutions to connect 3D data across sources. [Explore Docs](https://docs.omniverse.nvidia.com/usd/code-docs/usd-exchange-sdk/latest/index.html) ### PhysX Open-source multi-physics SDK for scalable robotics simulation and modeling. [Go to GitHub](https://github.com/NVIDIA-Omniverse/PhysX) ### Warp Open-source developer framework for data generation and spatial computing in Python. [Go to GitHub](https://github.com/NVIDIA/warp) ## Get Started With Developer Blueprints ![Digital twins for AI data centers design and operations](https://developer.download.nvidia.com/images/omniverse/digital-twin-ai-factory-design-operations-ari.jpg) ### Digital Twins for AI Factory Design and Operations Unified digital twin for designing, testing, and optimizing AI data centers. [Try Blueprint](https://build.nvidia.com/nvidia/digital-twins-for-ai-factories) [Watch Video](https://www.youtube.com/watch?v=TRc0z_qcze0) ![Digital twin for aerodynamic simulation and design](https://developer.download.nvidia.com/images/omniverse/digital-twin-fluid-simulation-ari.jpg) ### Digital Twin for Interactive Fluid Simulation AI-powered virtual wind tunnel for real-time aerodynamic simulation and design.
[Try Blueprint](https://build.nvidia.com/nvidia/digital-twins-for-fluid-simulation) [Watch Video](https://www.youtube.com/watch?v=RIxrN7yedyQ) ![Digital twin for complex industrial robot fleets](https://developer.download.nvidia.com/images/omniverse/industrial-automation-ari.jpg) ### Multi-Robot Fleets for Industrial Automation Framework for simulating complex industrial robot fleets before real-world deployment. [Try Blueprint](https://build.nvidia.com/nvidia/mega-multi-robot-fleets-for-industrial-automation) [Watch Video](https://www.youtube.com/watch?v=IuWk0C3MzBQ) ![Synthetic manipulation motion generation for robotics](https://developer.download.nvidia.com/images/omniverse/synthetic-manipulation-robotics-ari.jpg) ### Synthetic Manipulation Motion Generation for Robotics Reference workflow for generating exponentially large amounts of synthetic manipulation data for robots. [Try Blueprint](https://build.nvidia.com/nvidia/isaac-gr00t-synthetic-manipulation) [Watch Video](https://www.youtube.com/watch?v=Odve87ieiYE) ## Developer Blueprints Coming Soon ![Digital twin for building gigawatt-scale AI data centers](https://developer.download.nvidia.com/images/omniverse/ai-factories-ari.jpg) ### Gigawatt-Scale AI Factories A framework for building gigawatt-scale data centers for massive AI workloads. [Read the Blog](https://blogs.nvidia.com/blog/omniverse-dsx-blueprint/) [Watch Video](https://www.youtube.com/watch?v=Odve87ieiYE) ![Digital twin for smart cities](https://developer.download.nvidia.com/images/omniverse/digital-twin-smart-cities-ari.jpg) ### Digital Twins for Smart Cities Build, test, and optimize AI agents in SimReady digital twins to monitor city-scale operations.
[Read the Blog](https://blogs.nvidia.com/blog/smart-city-ai-blueprint-europe/) [Watch Video](https://www.youtube.com/watch?v=BJpBp2gwk2Y) ## Omniverse Learning Resources --- # Source: https://developer.nvidia.com/optical-flow-sdk.md # NVIDIA Optical Flow SDK The NVIDIA Optical Flow SDK taps into the latest hardware capabilities of NVIDIA Turing™, Ampere, and Ada architecture GPUs dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors, ideal for handling frame-to-frame intensity variations and tracking true object motion. [Get Started](https://developer.nvidia.com/opticalflow/download) ![Turing hardware generated optical flow map sample](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/opticalflow/OF_SDK_000.png) Turing hardware-generated optical flow map sample — [source footage](https://ultravideo.cs.tut.fi/#testsequences) * * * ## Optical Flow Benefits ### Accurate Video Analytics Accurately detect and track objects in successive video frames while significantly reducing computational complexity. ### Real-Time Performance Interpolate or extrapolate video frames in real time, improving the smoothness of video playback or reducing latency in VR experiences. ### GPU Acceleration Get performance optimized for Turing, Ampere, and future generations of NVIDIA GPU architectures, including high-speed computation of accurate flow vectors with little impact on the CPU or GPU.
* * * ## Performance ![A chart showing performance of frame interpolation time](https://developer.download.nvidia.com/images/frame-interpolation-time-1920x1080.jpg "A chart showing performance of frame interpolation time") Frame Interpolation Time ![A graph showing optical flow quality metrics](https://developer.download.nvidia.com/images/optical-flow-quality-metrics-1920x1080.jpg "A graph showing optical flow quality metrics") Optical Flow Quality Metrics * * * ## Optical Flow Engine-Assisted Frame Rate Up-Conversion Library ![Interpolated frames are generated in between the original frames to create a smoother image](https://developer.download.nvidia.com/images/optical-flow-fruc-1920x1080-1.jpg "Interpolated frames are generated in between the original frames to create a smoother image") NvOFFRUC interpolates new frames using optical flow vectors to double the effective frame rate of a video. The result is improved smoothness of video playback and perceived visual quality. [Learn More About Frame Rate Up-Conversion](https://developer.nvidia.com/blog/harnessing-the-nvidia-ada-architecture-for-frame-rate-up-conversion-in-the-nvidia-optical-flow-sdk/) ## Object Tracking for Intelligent Video Analytics Optical Flow SDK 2.0 introduced an object tracker library based on optical flow, along with source code and a ready-to-use API. In our experiments, the optical flow-based object tracker reduced GPU utilization by up to 80% compared to some of the most popular algorithms, without compromising tracking accuracy. Optical Flow SDK 3.0 introduces a DirectX 12 interface, forward and backward flow, and a global flow vector.
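The idea behind flow-based tracking can be sketched in a few lines: propagate each object's bounding box by the average flow vector inside it. The snippet below is a conceptual NumPy illustration, not the SDK's tracker API; all names are hypothetical.

```python
import numpy as np

def propagate_box(box, flow):
    """Shift a bounding box by the mean optical flow inside it.
    box = (x, y, w, h); flow has shape (H, W, 2) holding (dx, dy) per pixel."""
    x, y, w, h = box
    region = flow[y:y + h, x:x + w]             # flow vectors under the box
    dx, dy = region.reshape(-1, 2).mean(axis=0)  # average motion of the object
    return (int(round(x + dx)), int(round(y + dy)), w, h)

# A synthetic flow field: everything moves 3 px right and 1 px down.
flow = np.zeros((100, 100, 2))
flow[..., 0] = 3.0   # dx
flow[..., 1] = 1.0   # dy

print(propagate_box((10, 20, 16, 16), flow))   # -> (13, 21, 16, 16)
```

Because the detector only needs to run on occasional keyframes while flow propagates the boxes in between, schemes like this are how a hardware flow engine can cut tracking cost.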
[Learn More About Optical Flow SDK](https://developer.nvidia.com/blog/an-introduction-to-the-nvidia-optical-flow-sdk/) ![A flowchart showing object tracking for Intelligent Video Analytics](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/opticalflow/OF2_005.png "A flowchart showing object tracking for Intelligent Video Analytics") ## Video Frame Interpolation and Extrapolation ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/opticalflow/OF_SDK_001b.png) Optical flow can also be used very effectively for interpolating or extrapolating video frames in real time. This can be useful in improving the smoothness of video playback, generating slow-motion videos, or reducing the apparent latency in VR experiences. Optical Flow functionality in Turing and Ampere GPUs accelerates these use cases by offloading the intensive flow vector computation to a dedicated hardware engine on the GPU silicon, thereby freeing up GPU and CPU cycles for other tasks. This hardware functionality is independent of the CUDA cores.
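A toy NumPy sketch of the underlying idea: forward-warp each pixel half a step along its flow vector to synthesize the in-between frame. This is purely illustrative; production interpolators such as NvOFFRUC also handle occlusions, fill holes, and blend both input frames.

```python
import numpy as np

def interpolate_midframe(frame, flow):
    """Synthesize an in-between frame by pushing each pixel half a step
    along its forward flow vector (nearest-neighbor splat)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + 0.5 * flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + 0.5 * flow[..., 1]).astype(int), 0, h - 1)
    mid = np.zeros_like(frame)
    # Where several pixels land on the same target, keep the brightest.
    np.maximum.at(mid, (yt, xt), frame[ys, xs])
    return mid

# A single bright pixel moving 4 px to the right between two frames:
f0 = np.zeros((8, 8))
f0[4, 2] = 255.0
flow = np.zeros((8, 8, 2))
flow[4, 2, 0] = 4.0            # dx = 4 at that pixel
mid = interpolate_midframe(f0, flow)
assert mid[4, 4] == 255.0      # the pixel appears at its halfway position
```

Note the hole left behind at the pixel's original location; real pipelines resolve such gaps using the backward flow from the second frame.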
[Learn More About Video Frame Interpolation and Extrapolation](https://docs.nvidia.com/video-technologies/optical-flow-sdk/nvfruc-programming-guide/index.html#frame-interpolation) * * * ## Videos & Webinars ## Additional Resources ![](https://developer.download.nvidia.com/images/newspaper-icon.svg)[Read DevBlog: An Introduction to the Optical Flow SDK](https://devblogs.nvidia.com/an-introduction-to-the-nvidia-optical-flow-sdk/) ![](https://developer.download.nvidia.com/images/newspaper-icon.svg)[Read DevBlog: AV1 Encoding and NvOFFRUC: Video Performance Boosts and Higher Fidelity on the NVIDIA Ada Architecture](https://developer.nvidia.com/blog/av1-encoding-and-fruc-video-performance-boosts-and-higher-fidelity-on-the-nvidia-ada-architecture/) ![](https://developer.download.nvidia.com/images/newspaper-icon.svg)[Read DevBlog: Harnessing the NVIDIA Ada Architecture for Frame-Rate Up-Conversion in the NVIDIA Optical Flow SDK](https://developer.nvidia.com/blog/harnessing-the-nvidia-ada-architecture-for-frame-rate-up-conversion-in-the-nvidia-optical-flow-sdk/) ![](https://developer.download.nvidia.com/images/file-icon.svg)[View Document: Optical Flow Documentation](https://docs.nvidia.com/video-technologies/optical-flow-sdk/index.html) ![](https://developer.download.nvidia.com/images/newspaper-icon.svg)[GitHub: NVIDIA Optical Flow in OpenCV](https://github.com/opencv/opencv_contrib) Get started developing with the Optical Flow SDK. [Download Now](https://developer.nvidia.com/opticalflow/download) --- # Source: https://developer.nvidia.com/rtx/ray-tracing/optix.md # NVIDIA OptiX™ Ray Tracing Engine An application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms.
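Conceptually, that recursive pipeline mirrors a classic tracer's programmable stages: ray generation, intersection, and shading (closest-hit / miss). The toy CPU sketch below is purely illustrative Python; the real OptiX API consists of C++/CUDA program descriptors, not these hypothetical functions.

```python
import math

# Toy model of the programmable stages: raygen, intersection, shading.
SPHERE_CENTER, SPHERE_RADIUS = (0.0, 0.0, -3.0), 1.0

def intersect(origin, direction):
    """Intersection program: unit-length ray vs. sphere, nearest hit distance."""
    oc = tuple(origin[i] - SPHERE_CENTER[i] for i in range(3))
    b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
    c = sum(v * v for v in oc) - SPHERE_RADIUS ** 2
    disc = b * b - 4.0 * c          # a == 1 for a normalized direction
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None

def closest_hit(t):                  # shading program: darker with distance
    return 1.0 / (1.0 + t)

def miss():                          # miss program: black background
    return 0.0

def raygen(w, h):                    # ray-generation program: one ray per pixel
    img = [[0.0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            x = 2.0 * (i + 0.5) / w - 1.0
            y = 2.0 * (j + 0.5) / h - 1.0
            n = math.sqrt(x * x + y * y + 1.0)
            t = intersect((0.0, 0.0, 0.0), (x / n, y / n, -1.0 / n))
            img[j][i] = closest_hit(t) if t is not None else miss()
    return img

img = raygen(16, 16)
assert img[8][8] > 0.0 and img[0][0] == 0.0   # hit at center, miss in corner
```

In OptiX these stages become GPU programs wired into a pipeline, with the intersection test and BVH traversal accelerated by RT Cores.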
Bring the power of NVIDIA GPUs to your ray tracing applications with programmable intersection, ray generation, and shading. [Get Started](https://developer.nvidia.com/designworks/optix/download) ![](https://developer.download.nvidia.com/assets/optix/optix_main.jpg) _Image courtesy of Chaos - RTX Mode_ ## Ray Tracing Programmable GPU-accelerated ray-tracing pipeline, a single-ray shader programming model using C++, and ray tracing acceleration using RT Cores. ## Scalability Optimized for current and future generations of NVIDIA GPU architectures. Transparently scales across multiple GPUs, and can combine GPU memory over NVLink for large scenes. ## Ease of Integration Free for commercial use, with OptiX application debugging and profiling support in Nsight Compute 2019.4 and Nsight VSE 2019.3. ## Shader Execution Reordering (SER) SER is a performance optimization that unlocks the potential for better ray and memory coherency in ray tracing shaders. [Learn more about SER](https://developer.nvidia.com/blog/improve-shader-performance-and-in-game-frame-rates-with-shader-execution-reordering/) ## AI-Accelerated Denoiser Included with OptiX, the AI-Accelerated Denoiser is a post-processing feature that denoises images, reducing the need for rendering iterations. This denoiser is based on a paper published by NVIDIA research, [“Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder”](https://research.nvidia.com/publication/2017-07_interactive-reconstruction-monte-carlo-image-sequences-using-recurrent). It uses GPU-accelerated artificial intelligence to dramatically reduce the time to render a high-fidelity image that is visually noiseless.
[Learn More](https://developer.nvidia.com/optix-denoiser) [![Optimized AI models for Domain Specific Tasks](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/optix-isotropix-1280x620.png)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/optix-isotropix-1280x620.png) ## Partners From film and games to design and scientific visualization, OptiX has been successfully deployed in a broad range of commercial applications. These applications range from rendering software to scientific visualization (including Gordon Bell Award finalists), defense applications, audio synthesis, and computing lightmaps for games. [![altair Render](https://developer.download.nvidia.com/assets/optix/altair_done.png)](https://www.thearender.com) ##### Thea Render × [![](https://developer.download.nvidia.com/assets/optix/altair_done.png)](https://www.thearender.com) **Thea Render** is a physically-based global illumination renderer of high quality. It is a unique renderer that is able to render using state-of-the-art techniques in biased photorealistic, unbiased and GPU modes. Thea Render comes with its own standalone application (Studio) with various tools, material editor and advanced staging operations along with integration (plugins) on various popular modeling solutions. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/thea_with.png) ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/thea_without1.png) > Altair® Thea Render® v2.0 integrates NVIDIA® OptiX™ denoiser, dramatically accelerating production of final renders. Users can take advantage of this optimized workflow, creating out-of-the-box, stunning photorealistic images in a fraction of previous render times. > > Dr. Ing. 
Ioannis Pantazopoulos, VP Rendering Technology, **Altair** [Learn more about Thea Render](https://www.thearender.com/features/) [![Autodesk Arnold](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-autodesk.png)](https://www.autodesk.com/) ##### Autodesk Arnold × [![Autodesk Arnold](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-arnold_001.png)](https://www.autodesk.com/) Arnold is an advanced Monte Carlo ray tracing renderer. It is designed for artists and built for the demands of modern animation and visual effects production. It is available as a standalone renderer on Linux, Windows and Mac OS X, with plug-ins for Maya, 3ds Max, Houdini, Cinema 4D, and Katana. With an integrated OptiX denoiser, Arnold takes advantage of NVIDIA AI tech for accelerated interactive rendering. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/optiX-denoiser_Arnold-kitchen.PNG) > The OptiX Denoiser is an invaluable option for interactive workflows in Arnold. The artist can create and move around geometry and lights and get immediate noise-free visual feedback, even for challenging rendering scenarios. > > Frederic Servant, Arnold Development Manager, **Autodesk** [Learn more about Arnold](https://www.autodesk.com/products/arnold/overview) [![Cebas finalRender](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/logo-cebas.png)](https://www.cebas.com/) ##### cebas finalRender × [![Cebas finalRender](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/logo-cebas.png)](https://www.cebas.com/) Cebas Visual Technology, founded in Heidelberg, Germany and headquartered in Victoria, BC Canada, has been developing 3dsMax plugins for visual technology since 1988. 
Following the launch of our latest finalRender trueHybrid™, cebas' mission, as always, is to put the most sophisticated renderer into artists' hands affordably by combining the latest NVIDIA GPU technology with cebas CPU enhancements, achieving a powerful and unique mix of processing power. finalRender's latest addition is NVIDIA's OptiX 5.0 AI Denoiser feature. Users can expect ongoing innovative updates as finalRender progresses. ![Cebas finalRender](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/Before_after_Ai-denoiser.png) _This image shows the OptiX AI-Denoiser running in finalRender at 100 samples after only 45 seconds of rendering._ > Our very first integration tests revealed right from the start that NVIDIA has created an exceptional piece of software engineering by combining the power of AI and their powerful GPU hardware to surmount what has bothered every single GPU software developer for years - Noise in the image. The use of AI Neuronal Network technology in OptiX 5.0 to enhance the process of denoising and cebas' engineering work on finalRender's trueHybrid™ technology offers a bright future towards higher quality photo-realistic images in much lesser time. > > Edwin Braun, CEO & Co-founder, **Cebas Visual Technology** [Learn more about finalRender](https://www.cebas.com/) [![Chaos Group Vray](https://developer.download.nvidia.com/assets/optix/chaos_new.png)](https://www.chaosgroup.com/) ##### Chaos Group Vray × [![Chaos Group Vray](https://developer.download.nvidia.com/assets/optix/chaos_new.png)](https://www.chaosgroup.com/) Chaos Group is a worldwide leader in computer graphics. They create the technology that helps artists and designers create photoreal imagery and animation for design, television, and feature films.
Their physically-based rendering and simulation software is used daily by top design studios, architectural firms, advertising agencies, and visual effects companies around the globe. Their research and development in cloud rendering, material scanning, and virtual reality is shaping the future of creative storytelling and digital design. ![Chaos Group Vray](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/CGvrayblog.png) > We’re finding the NVIDIA denoising results to be very impressive on interactive scenes, giving artists a much quicker estimate of what their final result will look like. We believe this will speed the creative process while using our upcoming V-Ray GPU. > > Vlado Koylazov, founder, **Chaos Group** [Learn more about Vray](https://www.chaosgroup.com/) [![SOLIDWORKS Visualize](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-solidworks.png)](https://www.solidworks.com/category/visualization) ##### SOLIDWORKS Visualize × [![SOLIDWORKS Visualize](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-solidworks.png)](https://www.solidworks.com/category/visualization) **SOLIDWORKS® Visualization products** (formerly known as Bunkspeed) provide a suite of standalone software tools that combine industry-leading rendering capabilities with design-oriented features and workflows that enable easy and fast creation of visual content for designers, engineers, marketing, and other content creators. Import SOLIDWORKS, Autodesk Alias®, Rhino®, SketchUp® and many other CAD formats to create compelling scenes and ultimately the most realistic content possible. 
[![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/denoiser_cover-image.png)](http://blogs.solidworks.com/solidworksblog/2018/02/introducing-new-artificial-intelligence-denoiser.html)[Learn more about SOLIDWORKS Visualize](https://www.solidworks.com/category/visualization) News: Blog: [Introducing the New Artificial Intelligence Denoiser](http://blogs.solidworks.com/solidworksblog/2018/02/introducing-new-artificial-intelligence-denoiser.html) Blog: [From Great Idea to Amazing Product: SOLIDWORKS and NVIDIA Power AI, VR and Virtualized Workflows](https://blogs.nvidia.com/blog/2018/02/05/solidworks-world-2018/) [![ESI IC.IDO](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-icido.png)](https://www.esi-group.com/) ##### ESI IC.IDO × [![ESI IC.IDO](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-icido.png)](https://www.esi-group.com/) **ESI Group** is a leading innovator in Virtual Prototyping software and services. ESI | IC.IDO provides a Human Centric digital mock-up environment that enables individual engineers as well as teams to explore, experience, validate, and collaborate to resolve complex integration scenarios at the intersection between product function, human interaction and assembly/service requirements. > “We adopted OptiX for ray tracing in IC.IDO. It was incredibly easy to integrate and offers amazing speed and performance with NVIDIA GPUs, this frees our engineering team to focus their time and talents on developing new features for our Virtual Engineering enterprise customers. Offering a unified visualization and physical simulation experience in VR gives users the ability to interact with their products and processes in ways previously only possible with full scale physical prototypes.” > > Dr.
Christian Odaker, Director of R&D, Immersive Experience at ESI Group [Learn more about IC.IDO](https://virtualreality.esi-group.com) [![Foundry Modo](https://developer.download.nvidia.com/assets/optix/foundry_done.png)](https://www.foundry.com/products/modo) ##### Foundry Modo × [![Foundry Modo](https://developer.download.nvidia.com/assets/optix/foundry_done.png)](https://www.foundry.com/products/modo) Modo’s powerful and flexible 3D modeling, texturing and rendering toolset empowers artists to explore and develop ideas without jumping through technical hoops. Modo® is your starting point for creative exploration. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/Modo122_Optix_ArchViz_001.jpg) **Without Denoiser | With Denoiser** ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/Modo122_Optix_RimDeNoise_002.jpg) **Without Denoiser | With Denoiser** > Modo artists can now expect to see GPU Accelerated Rendering with OptiX™ from NVIDIA®. The integrated enhancement provides incredible speed and quality for their product and arch-viz creations. Developed and integrated in collaboration with NVIDIA®’s award-winning engineering team, OptiX™ has been optimized specifically for Modo 12.2 and future releases. > > Shane Griffith, Senior Product Manager, **Foundry** [Learn more about Modo](https://www.foundry.com/products/modo) [![NVIDIA Iray](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-iray.png)](https://www.nvidia.com/en-us/design-visualization/iray) ##### NVIDIA Iray × [![NVIDIA Iray](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-iray.png)](https://www.nvidia.com/en-us/design-visualization/iray) **NVIDIA Iray** employs OptiX technology for optimal performance in both its path tracing and ray tracing render modes. 
Iray is a state-of-the-art, yet easy-to-use, photorealistic rendering solution provided as an SDK for seamless integration into custom tools and within industry-leading products from the likes of Dassault Systemes and Siemens PLM. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/image07.png)[Learn more about Iray](https://www.nvidia.com/en-us/design-visualization/iray) [![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-optis1.png)](http://www.optis-world.com/) ##### OPTIS × [![OPTIS](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-optis1.png)](http://www.optis-world.com/) **OPTIS**, the virtual prototyping company, brings life and emotion to all industrial projects. Its world-leading solutions pave the way for a revolutionary design process: towards zero physical prototypes. Since 1989, OPTIS has brought its know-how in light and human vision simulation to leading CAD/CAM software and dedicated immersive virtual solutions. This synergy creates true-to-life virtual mockups which are used as real decision-making tools. Today, more than 2,500 clients in over 50 countries trust OPTIS and innovate day after day with its solutions to ensure the look and safety of their designs, reduce their ecological footprint, and bring their future products to market faster. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Optis003.png) > “We use powerful NVIDIA GPU technologies, like the new Quadro GV100 to accelerate our simulation applications and algorithms, and NVIDIA OptiX for fast AI-based rendering. Looking ahead, we’re excited about the potential NVIDIA RTX ray-tracing technology holds to deliver more lifelike images faster than ever,” said Jacques Delacour, CEO and founder of OPTIS.
[Learn more about SPEOS (Bright Light and Appearance Simulation)](https://www.optis-world.com/product-offering-light-simulation-virtual-reality-software/SPEOS)[Learn more about Theia RT (Real-time Color and Material Evaluation)](https://www.optis-world.com/product-offering-light-simulation-virtual-reality-software/theia-rt-software-evaluation-vizualization-real-time)[Learn more about Optis](http://www.optis-world.com/) ![Pixar’s Flow Material Editing Tool](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-pixar1.png) ##### Pixar’s Flow Material Editing Tool × ![Pixar’s Flow Material Editing Tool](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-pixar1.png) **Pixar Animation Studios'** new material editing tool "Flow" enables their artists to interactively edit rich, complex shading networks. Flow provides live real-time feedback with full, multi-bounce progressive ray tracing using OptiX. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/image02.png) _Pixar Flow material editing tool. Image courtesy of Pixar Animation Studios_ [Watch SIGGRAPH talk on OptiX integration in Flow](http://on-demand.gputechconf.com/siggraph/2015/video/SIG515-Danny-Nahmias.html) [![Redshift](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/logo-redshift.png)](https://www.redshift3d.com/) ##### Redshift × [![Redshift](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/logo-redshift.png)](https://www.redshift3d.com/) Redshift Rendering Technologies Inc. was founded in early 2012 in Newport Beach, California with the goal of developing a production-quality, GPU-accelerated renderer with support for the biased global illumination techniques that until now have remained squarely in the CPU-only domain.
https://www.youtube-nocookie.com/embed/2vJ_5nPVU0s?&loop=1&playlist=2vJ_5nPVU0s > With OptiX 5.0, NVIDIA continues to lead the way for the use of AI in rendering for design, character generation and the creation of virtual worlds. Integration of OptiX 5.0 was a no-brainer for us — being both easy and free, it turbocharges the creative process and improves productivity for our users. > > Panos Zompolas, chief technology officer and co-founder, **Redshift** [Learn more about Redshift](https://www.redshift3d.com/product) [![SideFX Houdini](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/logo-sidefx_001.png)](https://www.sidefx.com/products/houdini/) ##### SideFX Houdini × [![SideFX Houdini](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/logo-sidefx_001.png)](https://www.sidefx.com/products/houdini/) Houdini is a 3D animation software application developed by SideFX, based in Toronto. SideFX adapted Houdini from the PRISMS suite of procedural generation software tools. Its exclusive attention to procedural generation distinguishes it from other 3D computer graphics software. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/houdini_denoiser.png) _result of denoiser shown on the left half of image above_ [Learn more about the Houdini denoiser integration](http://www.sidefx.com/docs/houdini/render/optixdenoiser.html) [![Unity](https://developer.download.nvidia.com/assets/optix/unity_done.png)](https://unity.com/srp/High-Definition-Render-Pipeline) ##### Unity × [![Unity](https://developer.download.nvidia.com/assets/optix/unity_done.png)](https://unity.com/srp/High-Definition-Render-Pipeline) Unity is the creator of the world’s leading real-time 3D development platform, giving users the most powerful and accessible tools to quickly create, easily operate, and fully monetize amazing immersive and interactive experiences. 
Unity empowers anyone, regardless of skill level and industry, to maximize their success. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/Denoiser/UnityOptixDenoise.png) _By using AI denoising, artists can spend more time creating beautiful renderings than battling noise. Robust denoising means that creators can focus on iterating the artwork, thereby optimizing workflow efficiency and output quality._ [Learn more about Unity](https://unity.com/srp/High-Definition-Render-Pipeline) [![Visual Molecular Dynamics (VMD)](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/optix/logo-action-vmd.png)](http://www.ks.uiuc.edu/Research/vmd/) ##### Visual Molecular Dynamics (VMD) **Visual Molecular Dynamics (VMD)** is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. VMD’s preferred rendering mode for both viewport and final render is OptiX, with full VCA support available. The OptiX path delivers the highest visual quality and achieves frame rates up to five times higher than OpenGL on massive datasets. https://www.youtube-nocookie.com/embed/6hKq5A__yrY?&loop=1&playlist=6hKq5A__yrY [Learn more about VMD](http://www.ks.uiuc.edu/Research/vmd/) [ ![Maverick](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/maverickrender-logo-1800x512-black.png)](https://maverickrender.com/) ##### Maverick Studio Maverick Render consists of three GPU-accelerated applications that help designers efficiently render photo-realistic 3D models and analyze GPU performance.
**Maverick Studio** is a very easy-to-use product rendering application that harnesses Maverick’s proprietary light simulation technology with interactive and intuitive drag-and-drop tools. **Maverick Indie**, the baby brother of Maverick Studio, is an ideal tool to import 3D models or materials and render highly photo-real presentations and turntables. **Maverick Benchmark** is a consistent, reliable, free-to-use tool that stress-tests NVIDIA GPUs to measure their performance. ![maverick.jpg](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/maverick.jpg)[Learn more about Maverick](https://maverickrender.com/) [ ![OTOY](https://developer.download.nvidia.com/images/otoy_logo_vector_horizontal.png)](https://home.otoy.com/render/octane-render/) ##### OTOY Octane by OTOY is the first and fastest GPU-accelerated unbiased path-traced render engine and features OptiX™ 7 ray-tracing hardware acceleration powered by NVIDIA RTX. OptiX RTX GPU hardware acceleration enables users to see 2-3x speed increases in many scenes and up to 15-30x in fully RTX-optimized scenes. ![otoy.jpg](https://developer.download.nvidia.com/images/otoy-octane-cornerstore.png)[Learn more about Octane](https://home.otoy.com/render/octane-render/) ## OptiX in the News ![OptiX 8 Release](https://developer.download.nvidia.com/assets/optix/optix_news.png) ### NVIDIA releases OptiX 8 A flexible and powerful ray tracing solution for the media and entertainment industry. [Learn More](https://developer.nvidia.com/blog/flexible-and-powerful-ray-tracing-with-optix-8/) ![What’s New in OptiX](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/optix/new_in_optix.png) ### What’s New in OptiX Catch up with the latest additions to the OptiX SDK and learn tips and tricks on how best to implement them into your products.
[Learn More](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31736/) ![OptiX Advanced Topics](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/optix/optix_news_thumbnail_2.jpg) ### OptiX Advanced Topics GTC 2021 Session Join Senior Software Engineer David Hart for a deep dive into the OptiX Curves API and learn best practices for optimizing your applications. [Learn More](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31752/) ## Resources https://www.youtube-nocookie.com/embed/Z7QsPb7YWjc - [**How to get started with OptiX 7**](https://devblogs.nvidia.com/how-to-get-started-with-optix-7/) - [Documentation](https://raytracing-docs.nvidia.com/) - [Learn more about the AI-accelerated denoiser](/optix-denoiser) - [Developer Forum](https://devtalk.nvidia.com/default/board/90/) - [GTC on Demand](http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php?searchByKeyword=optix&searchItems=&sessionTopic=&sessionEvent=&sessionFormat=&submit=&select=+) - [OptiX GPU Ray Tracing ACM paper](http://cacm.acm.org/magazines/2013/5/163758-gpu-ray-tracing/fulltext) Ready to get started developing with OptiX? [Get Started](https://developer.nvidia.com/designworks/optix/download) --- # Source: https://developer.nvidia.com/drive/os.md # NVIDIA DriveOS SDK NVIDIA DriveOS™ is an automotive operating system developed with industry-standard safety and security methodologies certified by the globally renowned automotive certification organization, TÜV SÜD. This powerful platform is designed to meet cutting-edge automotive product requirements, including advanced AI inference, high-performance computer vision, advanced graphics, high-end audio, and complex safety and security use cases. DriveOS also supports autonomous driving and AI-powered cockpit experiences.
DriveOS enables Linux or QNX as the application operating system with a software development kit for image processing, diverse sensor integration, AI acceleration, low-overhead interprocess communication, and developer tools for debugging and profiling. [Download DriveOS](/drive/downloads) _Requires [NVIDIA DRIVE AGX™ SDK Developer Program](https://developer.nvidia.com/drive/agx-sdk-program) membership_ ![DRIVE OS Stack Diagram](https://developer.download.nvidia.com/images/drive/auto-drive-agx-development-kit-3647451-v2.svg) * * * ## Benefits ### ![Decorative](https://developer.download.nvidia.com/images/drive/m48-coding-256px-blk.png) ### Programmability Enables a smooth transition from cloud or workstation to the SoC - Supports NVIDIA® CUDA® and TensorRT®, widely adopted programming models with a large developer base. - Supports unified APIs, from cloud to target. ### ![Decorative](https://developer.download.nvidia.com/images/drive/m48-containerized-security-256px-blk.png) ### Safety and Reliability Designed for safety-critical applications - Complies with ASPICE, ISO 26262, and ISO/SAE 21434. - Supports heterogeneous redundancy for compute workloads. ### ![Decorative](https://developer.download.nvidia.com/images/drive/m48-speed-256px-blk.png) ### Highly Optimized Efficient processing of time-critical workloads - Camera frames are directly loaded into GPU memory for high-performance sensor interfacing and processing with NvMedia. - Supports NvStreams for efficient data transport, with zero-copy data transfer between hardware accelerators. - Enables highly flexible sensor-processing pipelines. ### ![Decorative](https://developer.download.nvidia.com/images/drive/m48-accelerate-computing-with-cuda-c-c-256px-blk(1).png) ### Accelerated Application Development Building blocks and tools for automotive software - [DriveWorks](https://developer.nvidia.com/drive/driveworks) provides optimized algorithms and tools. 
- NVIDIA Nsight provides tools for debugging, profiling, and tracing. - Offers rich [documentation](https://docs.nvidia.com/drive) including numerous samples to demonstrate features, plus [webinars, and GTC talks](/drive/training) to get familiar with DriveOS. ### ![Decorative](https://developer.download.nvidia.com/images/drive/m48-nim-256px-blk.png) ### Virtualization and Containerization Comprehensive scalability and isolation features - The hypervisor manages resources and provides an abstraction layer between the underlying hardware and the operating systems running on top. - Guest operating systems provide additional isolation and redundancy. - Host- and target-side Docker container support enables efficient development. --- # Source: https://developer.nvidia.com/physx-sdk.md # NVIDIA PhysX Modeling digital twins of factories, cities, or entire worlds involves complex simulations that must accurately replicate the physical behavior of objects and systems, which are foundational in building [physical AI](https://www.nvidia.com/en-us/glossary/generative-physical-ai/) for powering the next generation of autonomous systems. NVIDIA PhysX® is a powerful, open-source multi-physics SDK that provides scalable simulation and modeling capabilities for robotics and autonomous vehicle applications. [Get PhysX in Omniverse ](/omniverse "Get PhysX in Omniverse")[Get PhysX Source on GitHub ](https://github.com/NVIDIA-Omniverse/PhysX "Get PhysX on GitHub")[Documentation ](https://docs.omniverse.nvidia.com/extensions/latest/ext_physics.html#overview "Documentation") ![NVIDIA PhysX SDK brings high-performance and accuracy to industrial simulation use cases](https://developer.download.nvidia.com/images/ov-physX-web-800x451.gif) PhysX-based simulation of robots screwing nuts onto bolts, in Omniverse.
* * * ## Key Benefits of PhysX ### Open-Source PhysX is fully open-source, enabling free use and sharing of your work with other developers, as well as contributing your innovative developments back to the community. ### Unified Model FEM soft body, cloth, particles, and fluid simulation with two-way coupled interactions under a unified solver framework. ### Scalable Runs on various platforms, from mobile CPUs to high-end GPUs, with a GPU API supporting end-to-end reinforcement learning via NVIDIA Isaac™ Lab. ### Accurate Enhanced simulation stability with robust collision detection, stacking, and joints, while also supporting momentum conservation and gyroscopic forces. * * * ## PhysX Features ### Rigid Body Dynamics Analyze multi-body dynamics under external forces like gravity. PhysX offers scalable rigid-body simulation for CPU and GPU, with proven industry performance. ![PhysX Feature - Scene Query](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/nvidia-physx-scene-query.jpg) ### Scene Query Conduct spatial queries in simulated environments using the PhysX raycast, overlap, and sweep functions with customizable filtering. ### Joints Utilize a suite of common built-in joint types and support custom joints in PhysX through a flexible callback mechanism. ### Reduced Coordinate Articulations Reduced coordinate articulations provide a linear-time, guaranteed joint-error-free simulation of a tree of rigid bodies. ### Vehicle Dynamics Simulate vehicles with configurable engine, suspension, and tire models, built on PhysX’s industry-proven rigid-body simulation for both CPU and GPU. ### Character Controllers The kinematic character controller in PhysX permits an avatar to navigate a simulated world and supports both static and dynamically simulated bodies. ### Soft Body Dynamics Finite Element Method (FEM) simulation of soft bodies allows for accurate and efficient models of elastic deformable bodies.
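At the heart of the rigid-body feature above is a time-stepping loop that integrates velocities, then positions, each frame. The sketch below is a conceptual Python illustration of that semi-implicit Euler update under gravity; it is not the PhysX C++ API, and all names in it are illustrative.

```python
# Conceptual sketch of one rigid-body integration step (semi-implicit Euler),
# the kind of update a rigid-body solver such as PhysX performs each frame.
# Not the PhysX API; names here are illustrative only.

GRAVITY = -9.81  # m/s^2 along the y-axis

def step(position, velocity, dt):
    """Advance one body by one timestep: integrate velocity first, then position."""
    vy = velocity[1] + GRAVITY * dt                      # apply acceleration
    velocity = (velocity[0], vy, velocity[2])
    position = tuple(p + v * dt for p, v in zip(position, velocity))
    return position, velocity

# Drop a body from rest at y = 10 m and simulate one second at 60 Hz.
pos, vel = (0.0, 10.0, 0.0), (0.0, 0.0, 0.0)
for _ in range(60):
    pos, vel = step(pos, vel, 1.0 / 60.0)
print(round(pos[1], 2))  # prints 5.01
```

Updating velocity before position (semi-implicit rather than explicit Euler) is the standard choice in game physics because it stays stable at the fixed timesteps engines use.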
### SDF Colliders A new Signed Distance Field based collision representation allows PhysX to simulate non-convex shapes like gears and cams without convex decomposition. ### Position-Based Dynamics Position-Based Dynamics provides a flexible framework for simulating a wide range of phenomena including liquids, granular materials, cloth, rigid bodies, deformable bodies, and more. ### Custom Geometry Utilize a wide range of built-in geometries and a flexible callback mechanism to bring custom geometry types into the simulation. ### Fracture & Destruction Blast, available in PhysX, is a destruction and fracture library designed for performance, scalability, and flexibility. ### Smoke & Fire Flow, part of the PhysX SDK, enables realistic combustible fluid, smoke, and fire simulations. * * * #### Newton, the Next-Generation Open-Source Physics Simulation Engine Newton is an open-source, GPU-accelerated, and extensible physics engine, co-developed by Google DeepMind and Disney Research, and [managed by the Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-announces-contribution-of-newton-by-disney-research-google-deepmind-and-nvidia-to-accelerate-open-robot-learning). Built on NVIDIA Warp and OpenUSD, Newton is optimized for robotics and compatible with learning frameworks such as MuJoCo Playground and NVIDIA Isaac Lab. [Newton Beta](https://github.com/newton-physics) is now available to use.
[Get Started on Newton](/newton-physics) ![](https://developer.download.nvidia.com/images/isaac/newton-ari.jpg) * * * ## PhysX Implementation

| | CPUs | NVIDIA GPUs |
| --- | --- | --- |
| Character Controllers | | |
| Custom Geometries | | |
| Fracture and Destruction - PhysX Blast | | |
| Joints | | |
| PBD (liquid/cloth/inflatable/shape matching) | | |
| Reduced Coordinate Articulations | | |
| Rigid Body Dynamics | | |
| Scene Queries | | |
| Smoke and Fire - PhysX Flow | | |
| Soft Body Dynamics (Finite Element Method) | | |
| Vehicle Dynamics | | |

* * * ## See PhysX in Action * * * ## PhysX News * * * ## Resources - [Read Documentation ](https://docs.omniverse.nvidia.com/) - [Access Training](https://www.nvidia.com/en-us/training/online/?activetab=ctabs-5) - [Connect on our Forums ](https://forums.developer.nvidia.com/c/omniverse/300/) - [Join the NVIDIA Developer Program ](/developer-program) * * * ## More Resources ![Decorative image representing Developer Newsletter](https://developer.download.nvidia.com/icons/m48-document-support-guide-2.svg) ### Read Documentation ![Decorative image representing Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Connect on Our Forums ![Decorative image representing Developer Newsletter](https://developer.download.nvidia.com/icons/m48-deep-learning-institute-talks-training-256px-blk.png) ### Access Training ![Decorative image representing Developer Program](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program ## PhysX in NVIDIA Omniverse PhysX is the primary physics engine of [NVIDIA Omniverse™](https://developer.nvidia.com/omniverse), a platform of APIs and SDKs for building complex 3D and industrial digitalization workflows based on [OpenUSD](https://www.nvidia.com/en-us/omniverse/usd/).
NVIDIA [Isaac Sim](https://developer.nvidia.com/isaac/sim)™ and [Isaac Lab](https://isaac-sim.github.io/IsaacLab/) are two robotics reference applications built on Omniverse, which enable developers to build physical AI to power next-generation robots. [Explore Omniverse](/omniverse) --- # Source: https://developer.nvidia.com/industries/public-sector.md # Developer Resources for the Public Sector NVIDIA technologies for use cases across generative AI, cybersecurity, digital twins, and remote sensing. Use Cases: - Generative AI - Cybersecurity - Robotics and Autonomy - Simulation and Training - Remote Sensing ## Get Started With Public Sector Use Cases **Blueprint** ### Autonomous Orchestration of Agentic Workflows Federal agencies are automating the extraction and synthesis of insights from multimodal enterprise data to deliver real-time situational awareness and faster, more informed decision-making. Autonomous workflows enhance productivity with critical information advantages and reduce cognitive load. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline) **Blueprint** ### Customized and Streamlined Public Service Delivery Organizations are using AI-powered chatbots, virtual assistants, and digital humans to deliver personalized government information and support services to citizens. Automated digital workflows improve efficiency and provide more responsive, tailored engagement with constituents. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/ai-virtual-assistant-for-customer-service) **Blueprint** ### Agentic AI Automation in Public Services Generative AI is being used to automate administrative tasks, summarize documents, and improve internal communications, accelerating decision-making and streamlining service delivery.
AI tools enable agencies to personalize content and citizen interactions, resulting in more engaging and responsive public services. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/aiq) **Blueprint** ### Vulnerability Analysis for Container Security Generative AI can improve vulnerability defense while decreasing the load on security teams. Using NVIDIA NIM™ microservices and the Morpheus cybersecurity AI SDK, the NIM Agent Blueprint accelerates CVE analysis at enterprise scale, reducing time to assess from days to just seconds. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/vulnerability-analysis-for-container-security) **Blueprint** ### Safety for Agentic AI As the number of connected users and devices grows, organizations are producing more data than they can manage, increasing cybersecurity risks. Learn how to improve safety, security, and privacy of AI systems at build, deploy, and run stages. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/safety-for-agentic-ai) **Training** ### Building AI-Based Cybersecurity Pipelines In this course, you’ll learn to build Morpheus pipelines to process and perform AI-based inference on massive amounts of data for cybersecurity use cases in real time. Explore how to utilize several AI models with a variety of data input types for tasks like sensitive information detection, anomalous behavior profiling, and digital fingerprinting. [Enroll in the DLI Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+C-DS-03+V1) **Blueprint** ### Synthetic Manipulation Motion Generation for Robotics Autonomous vehicles and robotics are enhancing operations, improving safety, and increasing efficiency in complex and changing environments. By using simulation-driven development and edge AI processing, agencies can streamline logistics, manufacturing, and service delivery with minimal risk and greater adaptability. 
[Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/isaac-gr00t-synthetic-manipulation) **Blueprint** ### Test Multi-Robot Fleets for Industrial Automation Humanoid robots are streamlining logistics and manufacturing by automating tasks like material transport, inventory management, and assembly. These robots deliver real-time data, adapt to changing environments, and improve safety and efficiency, while reducing costs and workforce strain. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/mega-multi-robot-fleets-for-industrial-automation/blueprintcard) **Training** ### Synthetic Data Generation for Perception Model Training in Isaac Sim In this course, you’ll learn how to train and deploy a perception model using synthetic data generation (SDG) for dynamic robotic tasks. Learn how to analyze the role of perception models, use simulation for SDG, apply domain randomization techniques, and evaluate the effectiveness of a trained model. [Enroll in the DLI Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-30+V1) **Blueprint** ### Build Digital Twins for AI Factory Design and Operations Virtual factory integration enables the development of OpenUSD-based tools and data pipelines to speed up operations and create new digital manufacturing opportunities. This includes layout planning, process simulation, robotics, and monitoring—all enhanced by AI for tasks like multi-camera tracking. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/digital-twins-for-ai-factories/blueprintcard) **Blueprint** ### Enhance Predictive Maintenance Accelerated data processing and efficient ETL workflows enable predictive maintenance for faster, smarter decisions and lower costs. The optical inspection of public infrastructure is critical in accelerating time to insight and improving the assessment of maintenance needs. 
[Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/video-search-and-summarization) **Training** ### Extend Omniverse Kit Applications for Building Digital Twins In this hands-on course, you’ll learn how to develop OpenUSD-based digital twin applications on the Omniverse™ platform, focusing on data aggregation, interactivity, and physics simulation. [Enroll in the DLI Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-13+V1) **Blueprint** ### Visualization and Simulation of Geospatial Environments GPU-accelerated pipelines in the cloud rapidly process massive satellite and sensor data, enabling faster, more informed decisions for geospatial intelligence. These advanced AI and computer vision workflows enhance visualization and simulation for digital twins, autonomous systems, sensor processing, and more. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/earth2-weather-analytics) **Blueprint** ### Air Traffic Management Systems Accelerated Signal Processing Accelerated radar and signal processing enables real-time analysis in air traffic management. These enhanced signal-processing frameworks equip controllers with the timely, detailed insights needed to manage complex and ever-increasing traffic volumes. [Explore the NVIDIA AI Blueprint](https://build.nvidia.com/nvidia/digital-twins-for-fluid-simulation) **Training** ### Disaster Risk Monitoring Using Satellite Imagery Learn how to build and deploy a deep learning model to automate the detection of flood events using satellite imagery. This workflow can be applied to lower the cost, improve the efficiency, and significantly enhance the effectiveness of various natural disaster management use cases.
[Enroll in the DLI Course](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-ES-01+V1) ## Explore Tools and Technologies for Public Sector **NVIDIA NeMo** ### Build, Customize, and Deploy Generative AI Models NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI—including large language models (LLMs), vision language models (VLMs), retrieval models, video models, and speech AI—anywhere. [Get Started With NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) **NVIDIA Morpheus** ### Identify Cybersecurity Threats in Near Real Time NVIDIA Morpheus is a GPU-accelerated, end-to-end AI framework that enables developers to create optimized applications for filtering, processing, and classifying large volumes of streaming cybersecurity data. [Get Started With Morpheus](/morpheus-cybersecurity) **NVIDIA Omniverse** ### Create Physically Accurate World-Scale Simulations NVIDIA Omniverse™ is a modular development platform of SDKs, APIs, and microservices for building 3D applications and services powered by Universal Scene Description (OpenUSD) and NVIDIA RTX™. [Get Started With Omniverse](/nvidia-omniverse-platform) **NVIDIA Holoscan** ### Enhance High-Compute Applications NVIDIA Holoscan is a domain-agnostic AI computing platform that delivers the accelerated, full-stack infrastructure required for scalable, software-defined, and real-time processing of streaming data running at the edge or in the cloud. [Get Started With Holoscan](/holoscan-sdk) **NVIDIA Isaac Sim** ### Accelerate AI Robotic Development NVIDIA Isaac Sim™ is a reference application built on NVIDIA Omniverse that enables developers to simulate and test AI-driven robotics solutions in physically based virtual environments. 
[Get Started With Isaac Sim](/isaac/sim) **NVIDIA Metropolis** ### Build, Deploy, and Scale Vision AI Agents and Applications NVIDIA Metropolis is an advanced collection of developer workflows and tools to build, deploy, and scale vision AI and generative AI from the edge to the cloud. [Get Started With Metropolis](/metropolis) ## Browse by Resource Type Sessions Blogs ![img-alt-text](https://developer.download.nvidia.com/images/public-sector/llm-tech-blog-gtc24-nemo-retriever-1920x1080.jpg) ### Translate Your Enterprise Data Into Actionable Insights With NVIDIA NeMo Retriever Learn how NVIDIA NeMo Retriever uses GPU-accelerated microservices for fast semantic search and retrieval-augmented generation, helping enterprises extract insights from complex, unstructured data. [Read Blog](https://developer.nvidia.com/blog/translate-your-enterprise-data-into-actionable-insights-with-nvidia-nemo-retriever/) ![img-alt-text](https://developer.download.nvidia.com/images/public-sector/robot-arm-code-ros-sim-1920-1080.jpg) ### A Beginner’s Guide to Simulating and Testing Robots With ROS 2 and NVIDIA Isaac Sim In this blog, learn how to simulate and validate a robot stack by leveraging ROS 2 packages with NVIDIA Isaac Sim, a reference application built on the NVIDIA Omniverse platform. Explore use cases that Isaac Sim can unlock for AI-enabled robots. [Read Blog](https://developer.nvidia.com/blog/a-beginners-guide-to-simulating-and-testing-robots-with-ros-2-and-nvidia-isaac-sim/) ![img-alt-text](https://developer.download.nvidia.com/images/public-sector/synctwin-digital-twin-openusd-1920-1080.jpg) ### Transforming Microsoft XLS and PPT Files Into a Factory Digital Twin With OpenUSD SyncTwin has developed an app on NVIDIA Omniverse that uses OpenUSD to enable its customers to create digital twins of their facilities using data from Microsoft PowerPoint, Excel, PDF, and other non-3D data formats. 
SyncTwin is integrating generative AI to enable customers to create and modify digital twins with simple text prompts. [Read Blog](https://developer.nvidia.com/blog/transforming-microsoft-xls-and-ppt-files-into-a-factory-digital-twin-with-openusd/) ![img-alt-text](https://developer.download.nvidia.com/images/public-sector/metropolis-iva-mtmc-camera-calibration-blog-press-3397253-1920x1080.jpg) ### Real-Time Vision AI From Digital Twins to Cloud-Native Deployment With NVIDIA Metropolis Microservices and NVIDIA Isaac Sim Read this technical blog to explore Metropolis microservices features, including cloud-native AI application development and deployment with NVIDIA Metropolis microservices, simulation and synthetic data generation with NVIDIA Isaac Sim, and more. [Read Blog](https://developer.nvidia.com/blog/real-time-vision-ai-from-digital-twins-to-cloud-native-deployment-with-nvidia-metropolis-microservices-and-nvidia-isaac-sim/) ## More Resources ![Decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-developer-1.svg) ### Join the NVIDIA Developer Program ![Decorative image representing Training and Certification](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Training and Certification ![Decorative image representing Inception for Startups](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Accelerate Your Startup --- # Source: https://developer.nvidia.com/industries/retail-consumer-packaged-goods-cpg.md # Developer Resources for Retail and CPG A hub of news, SDKs, technical resources, and more for developers working in the retail and consumer packaged goods (CPG) industry. 
![A collage of developer resources for the retail and consumer packaged goods (CPG) industry](https://developer.download.nvidia.com/images/retail/multi-image-phone@2x.jpg) ## Applications and SDKs ### NVIDIA NeMo for Generative AI NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI—including large language models (LLMs), multimodal, vision, and speech AI—anywhere. It delivers enterprise-ready models with precise data curation, cutting-edge customization, retrieval-augmented generation (RAG), and accelerated performance. [Get Started ](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) ### NVIDIA RAPIDS for Data Science NVIDIA RAPIDS™, part of NVIDIA CUDA-X™, is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools, such as pandas, Apache Spark, scikit-learn, and NetworkX. It accelerates performance by orders of magnitude at scale across data pipelines. [Get Started ](/rapids) ### NVIDIA Metropolis for Computer Vision NVIDIA Metropolis brings powerful SDKs such as NVIDIA TAO, DeepStream, TensorRT™, and Triton™, microservices, and reference workflows to develop and deploy both generative AI and vision AI applications. [Get Started ](/metropolis-microservices) ### NVIDIA ACE for Digital Humans NVIDIA ACE is a suite of technologies for bringing digital humans to life with generative AI. For example, James is a virtual assistant built with ACE that can provide contextually accurate responses in real time. [Interact With James](https://build.nvidia.com/nvidia/digital-humans-virtual-assistant) ### NVIDIA MEGA for Virtual Factories NVIDIA MEGA is a reference architecture that combines NVIDIA Omniverse™, ACE, Metropolis, cuOpt™, and Isaac Sim™ to help developers test and refine industrial automation and generative physical AI in simulation before deploying into real-world factories and warehouses.
[Learn More](https://www.youtube.com/watch?v=HjpwGgmt57U) ### NVIDIA NIM NVIDIA NIM™, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations. Built on robust foundations, including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate seamless AI inferencing at scale. [Learn More](/nim) [Get Started](http://build.nvidia.com/) ## NVIDIA Retail Shopping Assistant Blueprint Develop a [generative AI-powered](https://www.nvidia.com/en-us/ai-data-science/generative-ai/) shopping assistant that provides interactive customer experiences using retrieval-augmented generation ([RAG](https://www.nvidia.com/en-us/glossary/retrieval-augmented-generation/))—a technique that connects [large language models (LLMs)](https://www.nvidia.com/en-us/glossary/data-science/large-language-models/) to a company’s enterprise data—and [NVIDIA NIM](https://www.nvidia.com/en-us/ai/) microservices. This workflow example provides an advanced starting point for generating conversational, accurate, and precise responses tailored to consumers. [Learn More](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/retail-shopping-advisor/ "Github Repo") ![A business manager uses NVIDIA Retail Shopping Advisor on the laptop](https://developer.download.nvidia.com/images/retail/ai-workflow-1902x1080.jpg) ## NVIDIA Omniverse ![NVIDIA Omniverse for Digital Twins and Simulation simulate warehouses, products, and packaging](https://developer.download.nvidia.com/images/retail/digital-twins-and-simulations-1902x1080l.jpg) ### Digital Twins and Simulations NVIDIA Omniverse for digital twins and simulations revolutionizes retail planning and optimization with digital replicas of physical environments and goods.
The platform simulates store layouts, warehouses, products, and packaging to cut costs and streamline operations by testing scenarios risk-free, optimizing resource allocation, and improving supply chains. [Learn More](/omniverse) ![A leather handbag representing NVIDIA Omniverse platform for 3D design collaboration](https://developer.download.nvidia.com/images/retail/design-collaboration-1902x1080l.jpg) ### Design Collaboration Omniverse is an easily extensible platform for 3D design collaboration and scalable multi-GPU, real-time, true-to-reality simulation. Omniverse revolutionizes the way we create and develop as individuals and work together as teams, bringing more creative possibilities and efficiency to 3D creators, developers, and enterprises. [Learn More](https://docs.omniverse.nvidia.com/digital-twins/latest/building-full-fidelity-viz/collaborating-sharing.html) ## Build With NVIDIA AI Enterprise [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/) is an end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade AI applications, including generative AI. Easy-to-use microservices provide optimized model performance with enterprise-grade security, support, and stability to ensure a smooth transition from prototype to production. [Try for Free](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/) ## Browse by Resource Library AI Workflows & Blueprints NIM microservices Frameworks SDKs ![Retail loss prevention AI workflow at point of sale Image 2](https://developer.download.nvidia.com/images/retail/retail-loss-prevention-2560x1440.jpeg) ### Retail Loss Prevention Prevent shrinkage and theft at the point of sale.
[Learn More About Loss Prevention](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/retail-loss-prevention/) ![Retail shopping advisor AI workflow enhances personalized shopping experiences](https://developer.download.nvidia.com/images/retail/retail-shopping-advisor-ari.jpeg) ### Retail Shopping Assistant Enhance personalized shopping experiences with generative AI. [Learn More About Shopping Assistants](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/retail-shopping-advisor/) ![Retail store analytics AI workflow for store business improvements](https://developer.download.nvidia.com/images/retail/retail-store-analytics-2560x1440.jpeg) ### Retail Store Analytics Gain actionable insights with advanced video analytics for retail. [Learn More About Store Analytics](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/retail-store-analytics/) ![Multi-camera tracking AI workflow for customer journey in stores](https://developer.download.nvidia.com/images/retail/multi-camera-tracking-2560x1440.jpeg) ### Multi-Camera Tracking Track customer journeys across cameras throughout the store. [Learn More About Multi-Camera Tracking](https://www.nvidia.com/en-us/ai-data-science/ai-workflows/multi-camera-tracking/) NIMs #### Llama-3.1 8b **Language NIM: Language Generation** An advanced, state-of-the-art model with language understanding, superior reasoning, and text generation. It provides the benefits of low latency and high throughput, making it highly efficient for real-time applications and capable of handling large volumes of data. [Try Now](https://build.nvidia.com/meta/llama-3_1-8b-instruct) NIMs #### Edify (Getty) **Visual/Multimodal NIM: Image Generation** Getty Images’ API service for 4K image generation. It’s been trained on NVIDIA Edify using Getty Images' commercially safe creative libraries.
[Try Now](https://build.nvidia.com/gettyimages/edify-image) NIMs #### Neva **Visual/Multimodal NIM: Language Generation and Computer Vision** Multimodal vision-language model that understands text and images and generates informative responses. [Try Now](https://build.nvidia.com/nvidia/neva-22b) NIMs #### cuOpt **Optimization/Simulation NIM: Route Optimization** World-record accuracy and performance for complex route optimization. [Try Now](https://build.nvidia.com/nvidia/nvidia-cuopt) [View All NIM microservices](https://build.nvidia.com/explore/discover) Frameworks #### NVIDIA NeMo **Generative AI** NeMo is an end-to-end, cloud-native framework for building, customizing, and deploying generative AI models. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. [Try Now](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) Frameworks #### NVIDIA Omniverse **Simulation and Digital Twins** From designing true-to-reality 3D assets and environments to operating large-scale digital twins, Omniverse Enterprise is a computing platform that enables individuals and teams to develop [Universal Scene Description (OpenUSD)](https://www.nvidia.com/en-us/omniverse/usd/)-based 3D workflows and applications. [Learn More About Omniverse](https://www.nvidia.com/en-us/omniverse/) Frameworks #### NVIDIA ACE **Digital Humans With Generative AI** ACE is a suite of technologies for bringing digital humans to life with generative AI. [Learn More About ACE](/ace) Frameworks #### NVIDIA Metropolis **Computer Vision AI** Metropolis brings powerful SDKs such as TAO, DeepStream, TensorRT, and Triton, microservices, and reference workflows to develop and deploy both generative AI and vision AI applications.
[Learn More About Metropolis](https://www.nvidia.com/en-us/autonomous-machines/intelligent-video-analytics-platform/) Frameworks #### NVIDIA RAPIDS **Data Science Software** RAPIDS, part of CUDA-X, is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools. It accelerates performance by orders of magnitude at scale across data pipelines. [Learn More About RAPIDS](/rapids) Frameworks #### NVIDIA Triton Inference Server **AI Model Deployment** Triton Inference Server lets teams run inference on trained machine learning or deep learning models from any framework on any processor—GPU, CPU, or other. It’s open-source software that standardizes AI model deployment and execution across every workload. [Learn More About Triton Inference Server](/triton-inference-server) Frameworks #### NVIDIA Riva **Multimodal Conversational AI** Riva empowers developers to voice their conversational AI applications—Q&A assistants, contact center agent assists, virtual assistants, digital humans, and AI robots—and transform them into natural polyglots. To ensure the best possible accuracy, Riva GPU-accelerated microservices offer flexible customization to specific languages, accents, and domains. [Learn More About Riva](https://www.nvidia.com/en-us/ai-data-science/products/riva/) Frameworks #### NVIDIA Isaac Sim **Robotics Simulation and Synthetic Data** Isaac Sim is a reference application that enables developers to design, simulate, test, and train AI-based robots and autonomous machines in a physically based virtual environment. [Learn More About Isaac Sim](/isaac/sim) Frameworks #### NVIDIA cuOpt **Accelerated Optimization** cuOpt helps teams solve complex routing problems with multiple constraints and delivers new capabilities, like dynamic rerouting, job scheduling, and robotic simulations, with subsecond solver response times.
[Learn More About cuOpt](/cuopt-logistics-optimization) Frameworks #### NVIDIA Morpheus **Cybersecurity** Morpheus is an open application framework that helps cybersecurity developers create optimized AI pipelines for filtering, processing, and classifying large volumes of real-time data. Morpheus uses AI to identify, capture, and act on threats and anomalies that were previously impossible to identify. [Learn More About Morpheus](https://developer.nvidia.com/morpheus-cybersecurity) SDKs #### NVIDIA NeMo NeMo is an end-to-end platform for developing custom generative AI anywhere. It includes tools for training, customization, and retrieval-augmented generation (RAG), guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. [Learn More About NeMo](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) SDKs #### NVIDIA RAPIDS RAPIDS, part of CUDA-X, is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools. It accelerates performance by orders of magnitude at scale across data pipelines. [Learn More About RAPIDS](/rapids) SDKs #### NVIDIA DeepStream SDK The DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing and video, audio, and image understanding. It’s ideal for vision AI developers, software partners, startups, and OEMs building intelligent video analytics (IVA) apps and services. [Learn More About DeepStream](/deepstream-sdk) SDKs #### NVIDIA TAO Toolkit The open-source TAO Toolkit, built on TensorFlow and PyTorch, uses the power of transfer learning while simplifying model training and optimizing for inference throughput on practically any platform.
The result is an ultra-streamlined workflow that can take any model, adapt it to real or synthetic data, and optimize it for inference throughput—without AI expertise or large training datasets. [Learn More About TAO](/tao-toolkit) SDKs #### NVIDIA Riva Riva empowers developers to voice their conversational AI applications—Q&A assistants, contact center agent assists, virtual assistants, digital humans, and AI robots—and transform them into natural polyglots. To ensure the best possible accuracy, Riva GPU-accelerated microservices offer flexible customization to specific languages, accents, and domains. [Learn More About Riva](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html) * * * ## Browse by Resource Type Blogs Videos and Demos Training [View All](https://www.nvidia.com/en-us/on-demand/playlist/playList-e332507f-4ab8-46d8-8827-756cd25aba13/ "View All") ### Accelerating End-to-End Data Science Workflows Learn how to build and execute end-to-end, GPU-accelerated data science workflows that let you quickly explore, iterate, and move your work into production. In this self-paced lab, you’ll learn how to use RAPIDS accelerated data science libraries to perform data analysis at scale with a wide variety of GPU-accelerated algorithms. [Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+S-DS-01+V1/) ### Generative AI Explained Explore generative AI, which has recently taken the world by storm. Using neural networks to identify patterns and structures within existing data, it generates new content based on a variety of inputs. In this course, you’ll learn generative AI concepts, applications, and the challenges and opportunities of this exciting field.
[Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+S-FX-07+V1/) ### Synthetic Data Generation for Training Computer Vision Models Streamline synthetic data generation (SDG) using 3D assets into a single application—and modify the appearance and format of the data—with NVIDIA Omniverse Replicator. This lab highlights one of the ways deep learning tools and Omniverse can be used together to streamline deep learning workloads. [Enroll Now](https://courses.nvidia.com/courses/course-v1:DLI+S-OV-10+V1/) [View All Courses](https://www.nvidia.com/en-us/training/) * * * ## Programs for You ### Access Developer Resources The NVIDIA Developer Program provides the advanced tools and training needed to successfully build applications on all NVIDIA technology platforms. This includes access to hundreds of SDKs, a network of like-minded developers in our community forums, and more. [Join the Developer Program Today ](https://developer.nvidia.com/developer-program) ### Get Technical Training NVIDIA RAPIDS™, part of NVIDIA CUDA-X™, is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools, such as pandas, Apache Spark, scikit-learn, and NetworkX. It accelerates performance by orders of magnitude at scale across data pipelines. [View Courses ](https://www.nvidia.com/en-us/training/) ### Accelerate Your Startup NVIDIA Inception—an acceleration platform for AI, data science, and high-performance computing (HPC) startups—supports over 19,000 startups worldwide with go-to-market support, expertise, and technology. Startups get access to training through the DLI, preferred pricing on hardware, and invitations to exclusive networking events.
[Apply to Inception ](https://www.nvidia.com/en-us/startups/) * * * Get Started ## Take the Next Steps ### Stay Up to Date on NVIDIA News for Retail [Subscribe Now](https://www.nvidia.com/en-us/industries/retail/newsletter/) ### Request a Consultation Our experts can help your business unlock potential and unleash innovation. [Let's Talk ](https://www.nvidia.com/en-us/industries/retail/contact-sales/) Join the NVIDIA Developer Program to take a free self-paced course. [Join Now](/developer-program) --- # Source: https://developer.nvidia.com/riva.md 1. [Topics](https://developer.nvidia.com/topics/) [AI](https://developer.nvidia.com/topics/ai) [Conversational AI](https://developer.nvidia.com/topics/ai/conversational-ai) NVIDIA Riva # NVIDIA Riva for Developers [NVIDIA® Riva](https://www.nvidia.com/en-us/ai-data-science/products/riva/) is a collection of GPU-accelerated multilingual speech and translation microservices for building fully customizable, real-time conversational AI pipelines. Riva includes [industry-leading](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) automatic speech recognition (ASR), text-to-speech (TTS), and neural machine translation (NMT) and is deployable in all clouds, in data centers, at the edge, or in embedded devices. With Riva, organizations can add speech and translation capabilities that transform chatbots into engaging, expressive multilingual AI voice agents or avatars. [Try APIs](https://build.nvidia.com/explore/speech "Try APIs")[Documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html "Documentation")[Forum](https://forums.developer.nvidia.com/c/ai-data-science/deep-learning/riva/475 "Forum") * * * ## How NVIDIA Riva Works Speech and translation AI microservices convert spoken words into text (speech recognition), written language into spoken words (speech synthesis), and spoken or written words from one language to another (translation).
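The three conversions above compose naturally into a speech-to-speech translation pipeline: ASR feeds NMT, which feeds TTS. The sketch below is a conceptual illustration in plain Python (the stage callables are hypothetical stand-ins, not the Riva client API) showing only how data flows between the stages:

```python
# A conceptual sketch of the speech-to-speech translation flow described
# above: ASR -> NMT -> TTS. The three stage callables are hypothetical
# stand-ins for Riva's microservices, not the real client API; they only
# make the data flow between the stages explicit.

def speech_to_speech(audio_bytes, src_lang, dst_lang, asr, nmt, tts):
    """Chain recognition, translation, and synthesis into one pipeline."""
    text = asr(audio_bytes, language=src_lang)                 # speech -> text
    translated = nmt(text, source=src_lang, target=dst_lang)   # text -> text
    return tts(translated, language=dst_lang)                  # text -> speech

# Trivial stand-in stages, used here only to exercise the pipeline shape.
def fake_asr(audio, language):
    return audio.decode("utf-8")           # pretend the audio "is" its transcript

def fake_nmt(text, source, target):
    return f"[{source}->{target}] {text}"  # tag instead of really translating

def fake_tts(text, language):
    return text.encode("utf-8")            # pretend synthesized audio is bytes

out = speech_to_speech(b"hello", "en-US", "es-US", fake_asr, fake_nmt, fake_tts)
print(out)  # b'[en-US->es-US] hello'
```

In a real deployment each stage would instead be a call to the corresponding Riva microservice, and the pipeline could scale each stage independently.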
Pretrained AI models are trained on vast datasets and can be fine-tuned on custom datasets to accelerate the development of domain-specific models. Fully containerized, these microservices are optimized for real-time performance and offline high throughput on premises or in the cloud, and can quickly scale to hundreds or thousands of parallel streams. ![A workflow diagram showing speech and translation AI microservices](https://developer.download.nvidia.com/images/products/riva-how-it-works-2560x940.svg) ### Quick-Start Guide Get step-by-step instructions for deploying pretrained models and how to interact with them. [Get Started ](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html) ### Introductory Blog Learn about Riva’s architecture, key features, and components. [Read Blog](https://resources.nvidia.com/en-us-riva/intro-blog) ### Introductory Webinar Build and deploy end-to-end speech and translation AI pipelines. [Watch Webinar](https://resources.nvidia.com/en-us-speech-ai-gtc/gtcfall22-a41126) ### Real-World Use Cases See how to use Riva for multilingual transcription, translation, and voice. [Watch Video (2:03)](https://resources.nvidia.com/en-us-speech-ai-content/nvidia-riva-demo) * * * ## Ways to Get Started With NVIDIA Riva Use the right tools and technologies to build and deploy fully customizable, multilingual speech and translation AI applications. ![A decorative image of building AI application with NVIDIA NIM APIs](https://developer.download.nvidia.com/icons/m48-nim-256px-blk%201@2x.png) ### Try Experience Riva through a UI-based portal for exploring and prototyping with NVIDIA-managed endpoints, available for free through NVIDIA's API catalog.
[Try Now](https://build.nvidia.com/explore/speech) ![A decorative image of deploying with NVIDIA AI Enterprise](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Deploy Get a free license to try [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/) for 90 days using your existing infrastructure. [Request a 90-Day License](https://enterpriseproductregistration.nvidia.com/?LicType=EVAL&ProductFamily=NVAIEnterprise) * * * ## Development Starter Kits Start developing your speech and translation AI application with Riva by accessing tutorials, notebooks, [forums](https://forums.developer.nvidia.com/c/ai-data-science/deep-learning/riva/475), [release notes](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/release-notes.html), and comprehensive [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html). ### Automatic Speech Recognition Achieve high transcription accuracy for Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Portuguese, Russian, and Spanish with state-of-the-art models pretrained on thousands of hours of audio on NVIDIA supercomputers. - [Read Speech-Recognition Documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html) - [Read Quick-Start Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html) - [Explore Speech-Recognition Tutorials](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials-asr-contents.html) ### Text-to-Speech Customize across English, German, Italian, Mandarin, and Spanish TTS pipelines for the voice and intonation you want. 
- [Read TTS Documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html) - [Read Quick-Start Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html) - [Explore TTS Tutorials](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials-tts-contents.html) ### Neural Machine Translation Integrate highly accurate text-to-text, speech-to-text, or speech-to-speech translation for up to 32 languages into your conversational application pipelines. - [Read NMT Documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/translation/translation-overview.html) - [Read Quick-Start Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html) - [Explore NMT Tutorials](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials-nmt-contents.html) * * * ## NVIDIA Riva Learning Library * * * ## More Resources ![A decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Explore the Community ![A decorative image representing Training and Certification](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Training and Certification ![A decorative image representing Inception for Startups](https://developer.download.nvidia.com/icons/m48-ai-startup.svg) ### Meet the Program for Startups * * * ## Ethical AI NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Always consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. Stay up to date on the latest speech and translation AI news from NVIDIA.
[Sign Up](https://www.nvidia.com/en-us/deep-learning-ai/solutions/conversational-ai/speech-ai-news/) --- # Source: https://developer.nvidia.com/networking/rivermax.md 1. [Home](/) [Networking](/networking) Rivermax # NVIDIA Rivermax NVIDIA® Rivermax® is an optimized networking SDK for media and data streaming applications. [Get Started](#section-get-started "Get Started")[Documentation](#section-learning-library "Documentation")[NVIDIA DOCA Forum](https://forums.developer.nvidia.com/c/infrastructure/doca/370 "DOCA Forums") * * * ## See Rivermax in Action #### Real-Time Streaming for the World’s Largest LED Display The Las Vegas Sphere relies on NVIDIA Rivermax software to accelerate media streaming. Rivermax enables direct, low‑latency data transfers between the network and GPUs, and with NVIDIA BlueField® DPUs and ConnectX® NICs, it ensures jitter‑free, synchronized delivery of ultra‑high‑resolution, multi‑layer 16K video across the Sphere’s massive LED canvas. [Learn More](https://blogs.nvidia.com/blog/sphere-las-vegas/) ![NVIDIA Rivermax provides real-time streaming for the Las Vegas Sphere, world’s largest LED display](https://developer.download.nvidia.com/images/networking/sphere-entertainment-1920x1080.jpg) * * * ## How Rivermax Works ![Join NVIDIA Developer Community](https://developer.download.nvidia.com/images/networking/doca-animation-rivermax-how-it-works-4129550-1920x1080-r05.gif) ### Simplify SMPTE ST 2110 Deployment Accelerate the transition to SMPTE ST 2110 with NVIDIA GPUs, NVIDIA networking, and the Rivermax SDK. [Learn More](https://www.youtube.com/watch?v=-AulvUL7npc) ### Start Building With Rivermax Quickly build and optimize data- and media-streaming applications with Rivermax. 
[Learn More](https://docs.nvidia.com/doca/sdk/doca+rivermax/index.html) ### Boost Video Streaming With Rivermax Learn how to combine GPU-accelerated image and video processing with Rivermax to achieve ultra-high throughput and performance for video streaming applications. [Learn More](https://www.nvidia.com/en-eu/products/holoscan/media/) ### Rivermax for Media and Entertainment Discover how Rivermax enables ultra-low-latency, high-throughput IP streaming over Ethernet to power next-gen broadcast and media production workflows. [Learn More](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/networking/rivermax/NVIDIA-Rivermax-Datasheet.pdf) * * * ## Key Features GPU-accelerated media streaming is streamlined with direct SMPTE ST 2110 processing—delivering efficient, scalable, real-time performance. ### GPUDirect Support With NVIDIA GPUDirect®, Rivermax utilizes the high-speed PCIe interface to pass data directly to and from the GPU without burdening the CPU. - Cuts processing latency - Supported on Linux and Windows - Supports SMPTE ST 2110 and generic data streaming ### Kernel Bypass, CPU Efficient Rivermax minimizes CPU overhead and maximizes application performance. - Kernel bypass technology for direct, hardware-based data transfer between user-space memory and the network interface - Offloads packet processing from the CPU to dedicated hardware accelerators - Hardware-level packet pacing for smooth, consistent streaming across all data flows ### Advanced and Flexible SDK The unified, cross-platform SDK is designed for high-performance media streaming and easy integration.
- Seamless GPU-NIC orchestration with support for SMPTE ST 2110 and hardware-based interconnects - Broad compatibility across Linux, Windows, x86, and Arm, with a simple API for frame- and line-based data - Integrated timing and redundancy via easy PTP/NMOS integration and SMPTE ST 2022-7 stream reconstruction * * * ## Get Started With Rivermax ![Register and login to NVIDIA Rivermax getting started page](https://developer.download.nvidia.com/icons/m48-login-256px-blk.svg) ### Log In **Register to access the Rivermax Getting Started page.** Explore SDK highlights, and download and install it on supported platforms. [Go to the Rivermax Getting Started Page](/networking/rivermax-getting-started) ![EPIC Unreal Engine Plug-In](https://developer.download.nvidia.com/icons/m48-accelerate-computing-with-cuda-c-c++.svg) ### Build and Explore **Access advanced code samples.** Deploy the Rivermax SDK and use example code to stream video, data, and audio. [Explore Code Samples on the Rivermax GitHub](https://github.com/NVIDIA/Rivermax) ![EPIC Unreal Engine Plug-In](https://developer.download.nvidia.com/icons/m48-virtual-pc-cloud-computer.svg) ### Get the Epic Unreal Engine Plug-In **Use SMPTE 2110 with an nDisplay virtual production setup.** Learn to integrate SMPTE 2110 with NVIDIA Rivermax for virtual production. [Use SMPTE ST 2110 With nDisplay](https://dev.epicgames.com/documentation/en-us/unreal-engine/using-smpte-2110-with-ndisplay?application_version=5.5) * * * ## Starter Kits ### SMPTE ST 2110 SMPTE ST 2110 delivers modern, flexible, and high-performance IP-based media production. 
- [Simplified Application-Level Interfaces for Sending and Receiving Generic or Media Streams (GitHub)](https://github.com/NVIDIA/rivermax-dev-kit/tree/main) - [NVIDIA Holoscan for Media—Real-Time AI Platform for the Future of Live Media](/holoscan-for-media) - [NVIDIA DeepStream—ST2110 GStreamer Pipeline (Documentation)](https://docs.nvidia.com/metropolis/deepstream/8.0/text/DS_plugin_gst-nvdsudpsink.html) - [Ultra-High-Performance Video Streaming Meets the GPU With NVIDIA Rivermax (GTC Session)](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31886/) ### Virtual Production Epic Unreal Engine with Rivermax enhances flexibility and performance for virtual productions, so creative teams can deliver high-quality, synchronized content to multiple displays. - [DOCA-FireFly Time Service (Documentation)](https://docs.nvidia.com/networking/display/dpf2504/doca+firefly+service) - [Timing With DOCA Firefly (Whitepaper)](https://resources.nvidia.com/en-us-accelerated-networking-resource-library/gtc21-fall-networkin) - [Epic Games and NVIDIA Are Setting New Standards in Virtual Production (Blog)](https://resources.nvidia.com/en-us-professional-broadcast/unreal-engine) - [Epic Unreal Engine Plug-In](https://dev.epicgames.com/community/learning/tutorials/mPJO/unreal-engine-smpte-2110-ux) ### Professional Audiovisual Pro AV is a rapidly evolving field powering the world's most advanced communication, entertainment, and collaborative spaces. 
- [The MSG Sphere Is a Flagship Pro AV Project—by Definition and in Scale (Blog)](https://blogs.nvidia.com/blog/sphere-las-vegas/) - [Accelerating Customers' Media Streaming Solutions on Windows (GTC Session)](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41927/) - [IPMX Receiver Code (GitHub)](https://github.com/NVIDIA/rivermax-dev-kit/tree/main/source/apps/rmax_ipmx_receiver) - [IPMX Sender Code (GitHub)](https://github.com/NVIDIA/rivermax-dev-kit/tree/main/source/apps/rmax_ipmx_sender) * * * ## Learning Library Tech Blog Rivermax and FastSockets: Ultra-Low-Latency Networking **NVIDIA Rivermax** Ultra-low latency and zero packet loss are vital for the financial, gaming, and media and entertainment industries. Video FlowCaster Rivermax Software ST2110 for Adobe, Avid, and Resolve **NVIDIA Rivermax** Drastic's FlowCaster plug-in for Adobe Premiere, Avid Media Composer, DaVinci Resolve, and others leverages the NVIDIA Rivermax software to generate SMPTE 2110-compliant audio, video, and ancillary output. Tutorial Ultra-High-Performance Video Streaming Meets the GPU With NVIDIA Rivermax **NVIDIA Rivermax** Rivermax natively supports GPUDirect on Linux and Windows to provide maximum performance with low latency. Tutorial Connecting NVIDIA NIM to Uncompressed Audio and Video Pipelines With Holoscan for Media **NVIDIA Holoscan for Media** NVIDIA Holoscan for Media is a software-defined platform running on AI infrastructure for developing, deploying, and running live media applications. Video AI-Enabled IP-Based Workflows With NVIDIA Rivermax and NVIDIA Jetson **NVIDIA DOCA FireFly, RTX™ A6000, Rivermax, BlueField DPU, ConnectX-6 Dx NIC, and Jetson™** NVIDIA and Dell technologies simplify SMPTE ST 2110 and enable AI-driven workflows from workstation to edge to power next-gen IP broadcast. Guide Rivermax License Generation Guidelines **NVIDIA Rivermax** This document provides guidelines for generating a license for NVIDIA Rivermax.
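The hardware-level packet pacing listed under Key Features exists because uncompressed video gives the sender a hard timing budget: every frame's packets must be spread evenly across one frame interval. A back-of-the-envelope calculation (plain arithmetic with illustrative numbers, not Rivermax API code) shows the scale involved:

```python
# Rough pacing math for an uncompressed video stream: given the bytes per
# frame and a fixed payload size per packet, how evenly must packets be
# spaced to deliver one frame per frame interval? All numbers illustrative.
import math

def packet_interval_ns(width, height, bytes_per_pixel, fps, payload_bytes):
    frame_bytes = width * height * bytes_per_pixel   # raw size of one frame
    packets_per_frame = math.ceil(frame_bytes / payload_bytes)
    frame_time_ns = 1e9 / fps                        # one frame interval
    return frame_time_ns / packets_per_frame         # even spacing per packet

# 1080p60, 2.5 bytes/pixel (typical for 10-bit 4:2:2), 1200-byte payloads
interval = packet_interval_ns(1920, 1080, 2.5, 60, 1200)
print(f"{interval:.0f} ns between packets")  # 3858 ns between packets
```

At roughly 3.9 µs between packets, software timers alone struggle to hold this spacing consistently, which is why Rivermax pushes the pacing down into NIC hardware.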
* * * ## Partner Ecosystem Explore the NVIDIA partners who are rolling out full, Rivermax-based IP solutions rigorously tested in their labs. [ ![NVIDIA Rivermax Ecosystem Partner - 7thSense](https://developer.download.nvidia.com/images/logos/7thsense-logo.svg) ](https://7thsense.one/) [ ![NVIDIA Rivermax Ecosystem Partner - ATEME](https://developer.download.nvidia.com/images/logos/ateme-logo.svg) ](https://www.ateme.com/product-titan-software/) [ ![NVIDIA Rivermax Ecosystem Partner - Avid](https://developer.download.nvidia.com/images/logos/avid-logo.svg) ](https://www.avid.com/) [ ![NVIDIA Rivermax Ecosystem Partner - Amazon AWS](https://developer.download.nvidia.com/images/logos/aws-logo(1).svg) ](https://aws.amazon.com/elemental-live/) [ ![NVIDIA Rivermax Ecosystem Partner - Comprimato](https://developer.download.nvidia.com/images/logos/comprimato-logo.svg) ](https://comprimato.com/products/twenty-one-encoder/) [ ![NVIDIA Rivermax Ecosystem Partner - Disguise](https://developer.download.nvidia.com/images/logos/disguise-logo.svg) ](https://www.disguise.one/en/products/rx-range/rx/) [ ![NVIDIA Rivermax Ecosystem Partner - Drastic.tv](https://developer.download.nvidia.com/images/logos/dt-logo.svg) ](https://www.drastic.tv/) [ ![NVIDIA Rivermax Ecosystem Partner - Emergent](https://developer.download.nvidia.com/images/logos/emergent-logo.svg) ](https://emergentvisiontec.com/) [ ![NVIDIA Rivermax Ecosystem Partner - Evertz](https://developer.download.nvidia.com/images/logos/evertz-logo.svg) ](https://evertz.com/applications/live-production/) [ ![NVIDIA Rivermax Ecosystem Partner - Grass Valley](https://developer.download.nvidia.com/images/logos/grass-valley-logo.svg) ](https://www.grassvalley.com/ampp/) [ ![NVIDIA Rivermax Ecosystem Partner - Harmonic](https://developer.download.nvidia.com/images/logos/harmonic-logo.svg) ](https://www.harmonicinc.com/video-streaming/media-servers-storage/spectrum-x/) [ ![NVIDIA Rivermax Ecosystem Partner - IMMERSIVE Design 
Studios](https://developer.download.nvidia.com/images/logos/immersive-logo.svg) ](http://www.immersivedesignstudios.com) [ ![NVIDIA Rivermax Ecosystem Partner - IntoPIX](https://developer.download.nvidia.com/images/logos/intopix-logo.svg) ](https://www.intopix.com/blogs/post/gtc-2021-4k-demo-JPEG-XS-Rivermax-NIC-ConnectX5-2110) [ ![NVIDIA Rivermax Ecosystem Partner - Lawo](https://developer.download.nvidia.com/images/logos/lawo-logo.svg) ](https://lawo.com/) [ ![NVIDIA Rivermax Ecosystem Partner - Mediaproxy](https://developer.download.nvidia.com/images/logos/mediaproxy-logo.svg) ](https://www.mediaproxy.com/Solutions/Monitoring) [ ![NVIDIA Rivermax Ecosystem Partner - NEIO Systems](https://developer.download.nvidia.com/images/logos/neio-systems-logo.svg) ](http://neio.systems) [ ![NVIDIA Rivermax Ecosystem Partner - Panasonic](https://developer.download.nvidia.com/images/logos/panasonic-logo.svg) ](https://pro-av.panasonic.net/en/products/it_ip_platform/) [ ![NVIDIA Rivermax Ecosystem Partner - Pixera](https://developer.download.nvidia.com/images/logos/pixera-logo.svg) ](https://pixera.one/en/) [ ![NVIDIA Rivermax Ecosystem Partner - RT Software](https://developer.download.nvidia.com/images/logos/rt-software-logo.svg) ](https://rtsw.co.uk/) [ ![NVIDIA Rivermax Ecosystem Partner - Telestream](https://developer.download.nvidia.com/images/logos/telestream-logo.svg) ](http://www.telestream.net/iq/inspect-2110.htm) [ ![NVIDIA Rivermax Ecosystem Partner - Unreal Engine](https://developer.download.nvidia.com/images/logos/unreal-engine-logo.svg) ](https://dev.epicgames.com/documentation/en-us/unreal-engine/ndisplay-workflows-for-smpte-2110-in-unreal-engine) ![NVIDIA Rivermax Ecosystem Partner -ZREAL](https://developer.download.nvidia.com/images/logos/zreal-logo.svg) * * * ## More Resources ![NVIDIA Rivermax FAQ](https://developer.download.nvidia.com/icons/m48-misc-question-faq.svg) ### Read the FAQ ![Join NVIDIA Developer 
Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Join the Community ![Sign up for NVIDIA Developer Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign Up for the Developer Newsletter Get started with NVIDIA Rivermax today. [Download Now](/networking/rivermax-getting-started "Download Rivermax") --- # Source: https://developer.nvidia.com/isaac/ros.md 1. [Home](/) [Isaac](https://developer.nvidia.com/isaac) Isaac ROS # NVIDIA Isaac ROS NVIDIA Isaac™ ROS (Robot Operating System) is a collection of NVIDIA® CUDA®-accelerated computing packages and AI models designed to streamline and expedite the development of advanced AI robotics applications. [Download Now](https://github.com/NVIDIA-ISAAC-ROS "Download Now")[Documentation ](https://nvidia-isaac-ros.github.io "Documentation")[Forums ](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/isaac-ros/600 "Forums") * * * ## How NVIDIA Isaac ROS Works Isaac ROS gives you a powerful toolkit for building robotic applications. It offers ready-to-use packages for common tasks like navigation and perception, uses NVIDIA frameworks for optimal performance, and can be deployed on both workstations and embedded systems like NVIDIA Jetson™. ![A diagram showing how NVIDIA Isaac ROS works](https://developer.download.nvidia.com/images/isaac/ros/isaac-ros-robotics-diagram-1.png) ### Quick Start Guide Learn what you need to get started and how to set up using the Isaac ROS suite to tap into the power of NVIDIA acceleration on NVIDIA Jetson. [Read the Guide](https://nvidia-isaac-ros.github.io/getting_started/index.html) ### Introductory Talk Isaac ROS offers modular packages for robotic perception and easy integration into existing ROS 2-based applications. This talk covers Isaac ROS GEMs and how to use multiple GEMs in your robotics pipeline. 
[Watch the Video](https://www.nvidia.com/en-us/on-demand/session/gtc24-se62934/) ### Introductory Webinars Check out a series of Isaac ROS webinars covering various topics, from running your own ROS 2 benchmarks to harnessing the power of NVIDIA NITROS. [View the Webinars](https://gateway.on24.com/wcc/experience/elitenvidiabrill/1407606/3998202/isaac-ros-webinar-series) ### What’s New—Isaac ROS 3.2 Check out the latest Isaac ROS update to boost your robot’s capabilities with advanced AI-based perception and manipulation. [Read the Blog ](https://developer.nvidia.com/blog/advancing-robot-learning-perception-and-manipulation-with-latest-nvidia-isaac-release/) * * * ## Key Features ![Open Robotics ROS logo](https://developer.download.nvidia.com/images/products/logo-ros.jpg) _ROS is a trademark of Open Robotics_ **Open Ecosystem** ### Built on ROS NVIDIA Isaac ROS is built on the open-source [ROS 2™](https://www.ros.org/) software framework. This means the millions of developers in the ROS community can easily take advantage of NVIDIA-accelerated libraries and AI models to fast-track their AI robot development and deployment workflows. ![NVIDIA Isaac Transport for ROS (NITROS) processing pipelines](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/ros-dp-nitros.gif) **Hardware Acceleration** ### NVIDIA Isaac Transport for ROS NITROS, the NVIDIA implementation of ROS 2 type adaptation and negotiation, consists of ROS processing pipelines made up of Isaac ROS hardware-accelerated modules (a.k.a. GEMs). NITROS lets ROS 2 applications take full advantage of GPU hardware acceleration, potentially achieving higher performance and more efficient use of computing resources across the entire ROS 2 graph.
[Download NVIDIA Isaac Transport for ROS (Github) ](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_nitros) #### High-Throughput Perception Isaac ROS delivers a rich collection of individual ROS packages (GEMs) and complete pipelines ([NITROS](https://nvidia-isaac-ros.github.io/concepts/nitros/index.html)) optimized for NVIDIA GPUs and NVIDIA Jetson™ platforms. This helps you achieve more with reduced development times. #### Modular, Flexible Packages Plug and play with a selection of packages—for computer vision, image processing, robust object detection, collision detection, and trajectory optimization—and easily go to production. #### The Power of NVIDIA AI Isaac ROS is compatible with all ROS 2 nodes, making it easier to integrate into existing applications. Develop robotic applications using NVIDIA AI and [pretrained models](/ai-models) from robotics-specific datasets for faster development. * * * ## Getting Started on NVIDIA Isaac ROS ![A 101 decorative image representing system setup](https://developer.download.nvidia.com/icons/m48-101-getting-started.svg) #### System Setup Tap into NVIDIA-accelerated libraries and AI models to speed up your AI robot workflows. [Check your system requirements and set up your system.](https://nvidia-isaac-ros.github.io/getting_started/) [Get Started](https://nvidia-isaac-ros.github.io/getting_started/index.html) ![A decorative image representing plug-and-play ROS packages](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) #### Plug-and-Play ROS Packages Read through the Isaac ROS [concepts](https://nvidia-isaac-ros.github.io/concepts/index.html) and easily move to production with a selection of advanced packages.
[Isaac ROS Packages](https://nvidia-isaac-ros.github.io/repositories_and_packages/index.html) ![Deploy at the edge with NVIDIA Jetson](https://developer.download.nvidia.com/images/isaac/perceptor/m48-orin.svg) #### Deployment on the Edge With Partner Kits NVIDIA Jetson provides hardware acceleration, optimized AI software, a robust ecosystem, and energy efficiency, making it an ideal platform to deploy your Isaac ROS applications. [Nova Carter](https://nvidia-isaac-ros.github.io/nova/getting_started/platforms/nova_carter.html) and the Nova Orin™ developer platforms also help you accelerate AMR development. [Get it from Leopard Imaging](https://leopardimaging.com/nvidia-nova-devkit/)[Get it From Segway Robotics](https://robotics.segway.com/nova-dev-kit/)[Get it From Orbbec](https://www.orbbec.com/opdk/)[Get it From LIPS](https://www.lips-hci.com/nvidia-isaac-perceptor) #### NVIDIA Isaac for Manipulation CUDA-accelerated libraries and AI models give you a faster, easier way to develop AI-powered robotic arms that can seamlessly perceive, understand, and interact with their environments. [Learn More About Isaac for Manipulation](https://nvidia-isaac-ros.github.io/v/release-4.0/reference_workflows/isaac_for_manipulation/index.html) #### NVIDIA Isaac for Mobility Accelerate the development of advanced autonomous mobile robots (AMRs) that can perceive, localize, and operate in unstructured environments like warehouses or factories. [Learn More About Isaac for Mobility](https://nvidia-isaac-ros.github.io/v/release-4.0/reference_workflows/isaac_for_mobility/index.html) * * * ## High-Performance Perception With NITROS Pipelines ROS 2 graphs using [NITROS-based](https://nvidia-isaac-ros.github.io/concepts/nitros/index.html), NVIDIA-accelerated Isaac ROS packages can significantly increase performance. You can find a complete performance summary [here](https://nvidia-isaac-ros.github.io/performance/index.html). 
| Node | Input Size | AGX Orin | Orin NX | x86\_64 w/ RTX 4090 |
| --- | --- | --- | --- | --- |
| AprilTag Node | 720p | 249 fps, 4.5 ms @ 30 Hz | 116 fps, 9.3 ms @ 30 Hz | 596 fps, 0.97 ms @ 30 Hz |
| Freespace Segmentation Node | 576p | 2120 fps, 1.7 ms @ 30 Hz | 2490 fps, 1.6 ms @ 30 Hz | 3500 fps, 0.52 ms @ 30 Hz |
| Depth Segmentation Node | 576p | 45.8 fps, 79 ms @ 30 Hz | 28.2 fps, 99 ms @ 30 Hz | 105 fps, 25 ms @ 30 Hz |
| TensorRT Node PeopleSemSegNet | 544p | 460 fps, 4.1 ms @ 30 Hz | 348 fps, 6.1 ms @ 30 Hz | - |
| Triton Node PeopleSemSegNet | 544p | 304 fps, 4.8 ms @ 30 Hz | 206 fps, 6.5 ms @ 30 Hz | - |
| DNN Stereo Disparity Node Full | 576p | 103 fps, 12 ms @ 30 Hz | 42.1 fps, 26 ms @ 30 Hz | 350 fps, 2.3 ms @ 30 Hz |
| H.264 Decoder Node | 1080p | 197 fps, 8.2 ms @ 30 Hz | - | 596 fps, 4.2 ms @ 30 Hz |
| H.264 Encoder Node I-frame Support | 1080p | 402 fps, 13 ms @ 30 Hz | - | 409 fps, 3.4 ms @ 30 Hz |
| H.264 Encoder Node P-frame Support | 1080p | 473 fps, 11 ms @ 30 Hz | - | 596 fps, 2.1 ms @ 30 Hz |
| Nvblox Node | - | 4.87 fps, 35.9 ms | 4.95 fps, -1.43 ms | 4.95 fps, 195 ms |

* * * ## Starter Kits Start developing your robotics and AI application with Isaac ROS using these [forums](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/isaac-ros/600), [release notes](https://nvidia-isaac-ros.github.io/releases/index.html#), and [comprehensive documentation](https://nvidia-isaac-ros.github.io). ### Localization and Mapping Isaac ROS Visual SLAM provides a high-performance, best-in-class ROS 2 package for VSLAM (visual simultaneous localization and mapping).
- [Read Isaac ROS Visual SLAM Overview](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_visual_slam/index.html) - [Quickstart Guide: Isaac ROS Visual SLAM](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_visual_slam/isaac_ros_visual_slam/index.html#quickstart) - [Watch Webinar: Pinpoint, 250 fps, ROS 2 Localization With vSLAM on Jetson](https://gateway.on24.com/wcc/experience/elitenvidiabrill/1407606/3998202/isaac-ros-webinar-series) ### 3D Scene Reconstruction Isaac ROS nvBlox uses RGB-D data to create a dense 3D map, including unforeseen obstacles, to generate a temporal costmap for navigation. - [Read Isaac ROS nvBlox Overview](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_nvblox/index.html) - [Quickstart Guide: Isaac ROS nvBlox](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_nvblox/isaac_ros_nvblox/index.html#quickstart) ### Pose Estimation and Tracking NVIDIA’s FoundationPose is a state-of-the-art foundation model for 6D pose estimation and tracking of novel objects. - [Read Isaac ROS Pose Estimation Overview](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_pose_estimation/index.html) - [Quickstart Guide: Isaac ROS Pose Estimation](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_pose_estimation/isaac_ros_centerpose/index.html#quickstart) ### Motion Planning Isaac ROS cuMotion is an NVIDIA CUDA-accelerated library for solving robot motion planning problems at scale by running multiple trajectory optimizations simultaneously to return the best solution. 
- [Read Isaac ROS cuMotion Overview](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_cumotion/index.html#quickstarts) - [Quickstart Guide: Isaac ROS cuMotion MoveIt Plugin](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_cumotion/isaac_ros_cumotion_moveit/index.html#quickstart) - [Quickstart Guide: cuMotion Robot Segmentation](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_cumotion/isaac_ros_cumotion/index.html#robot-segmentation-quickstart) ### Testing and Validation in Simulation Virtually train, test, and validate robotics systems using NVIDIA Isaac Sim and NVIDIA Isaac Lab. - [Tutorials: How to use Isaac Sim with Isaac ROS](https://nvidia-isaac-ros.github.io/getting_started/index.html#isaac-sim-tutorials) * * * ## Isaac ROS Learning Library Featured Documentation NVIDIA Isaac ROS Release Notes Get comprehensive updates on the latest features, improvements, and bug fixes for Isaac ROS. Featured Tech Blog Advance Robot Learning, Perception, and Manipulation with the Latest NVIDIA Isaac Release Explore the latest Isaac release to enhance your robot learning, perception, manipulation, and environment mapping. Featured Webinar Isaac ROS Webinar Series Check out a series of Isaac ROS webinars covering various topics, from running your own ROS 2 benchmarks to harnessing the power of NITROS. Blog Universal Robots Accelerate Cobot Development With NVIDIA Universal Robots is addressing the limitations of traditional robots in handling complex tasks and operating in dynamic environments with its AI Accelerator, developed with the NVIDIA Isaac platform. Blog Amazon Devices & Services Step Towards Zero-Touch Manufacturing Explore how Amazon Devices & Services is driving major advancements in manufacturing with a new physical AI solution powered by the NVIDIA Isaac Platform.
* * * ## More Resources ![NVIDIA Developer Forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![NVIDIA Training and Certification](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![NVIDIA Inception Program for Startups](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Join the Program for Startups * * * # Get Started Accelerate your robotic application development and get started today with NVIDIA Isaac ROS. [Download from GitHub](https://github.com/NVIDIA-ISAAC-ROS "Download from GitHub")[Documentation](https://nvidia-isaac-ros.github.io "Documentation") --- # Source: https://developer.nvidia.com/rtx-kit.md # NVIDIA RTX Kit NVIDIA RTX™ Kit is a suite of neural rendering technologies to ray trace games with AI, render scenes with immense geometry, and create game characters with photo-realistic visuals. [Get Started ](#section-getting-started "Get Started")[Notify Me for Future Releases ](/rtx-kit/notify-me "Notify Me")[Download Zorah Sample ](https://dlss.download.nvidia.com/demos/zorah/ZorahSample_UE5_Source_1.1.0.7z "Download Zorah Sample") * * * https://www.youtube-nocookie.com/embed/5PHBXY0FI5o?&autoplay=1&loop=1&playlist=5PHBXY0FI5o Zorah technology demo, built in Unreal Engine 5 ### Watch Trailer See Zorah in action in our visual trailer shown during the NVIDIA keynote at CES 2025. [Watch Video (1:37)](https://youtu.be/H681NInCwFo) ### What are Cooperative Vectors? See how cooperative vectors unlock the full potential of neural rendering. [Read Blog ](https://devblogs.microsoft.com/directx/enabling-neural-rendering-in-directx-cooperative-vector-support-coming-soon/) ### Path Tracing in Unreal Engine 5 Learn what you can do with real-time path tracing in UE5.
[Watch Video (36:25)](https://www.youtube.com/watch?v=zD9kS1nOuyU&pp=ygUfcGF0aCB0cmFjaW5nIGluIHVucmVhbCBlbmdpbmUgNQ%3D%3D) ### Get Started with RTX Kit In this tutorial, we will focus on how to get started with the new SDKs available today through NVIDIA RTX Kit. [Read Blog ](https://developer.nvidia.com/blog/get-started-with-neural-rendering-using-nvidia-rtx-kit/) * * * ## Key Benefits ![A lamp and a piece of satin cloth placed on stone stairs](https://developer.download.nvidia.com/images/rtx/kit/deploy-ai-shaders-1920-1080.jpg) ### Train and Deploy AI From Shaders Unlock new compression techniques that reduce texture memory consumption up to 8X and compress shader code for up to 5X faster material processing. Allows developers to render film-quality assets in real time. ![Sunlight shining on ancient building walls](https://developer.download.nvidia.com/images/rtx/kit/path-trace-1920-1080.jpg) ### Path Trace at Game-Ready Performance Build your bounding volume hierarchy (BVH) faster and simulate physically accurate reflections, shadows and global illumination for detailed worlds. ![A digital human rendering by NVIDIA RTX Kit](https://developer.download.nvidia.com/images/rtx/kit/digital-human-render-1920-1080.jpg) ### Accelerate Digital Human Rendering Hardware-accelerated strand-based hair provides a simplified workflow with photo-realistic results and ray-traced subsurface scattering renders skin better. * * * ## Get Started with RTX Kit Technologies AI-Powered Shaders Geometry and Lighting Character Rendering ### RTX Neural Shaders Train and deploy neural networks within shaders to unlock new compression and approximation techniques for next-generation asset generation. [Download (GitHub)](https://github.com/NVIDIA-RTX/Rtxns) ### RTX Neural Texture Compression Use AI to compress textures with up to 8x disk memory improvement at similar visual fidelity to traditional block compression. 
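For a rough sense of what the texture compression claim means in bytes, here is back-of-envelope arithmetic for a single 4K RGBA material map. The 1 byte/texel rate for BC7 block compression is the standard rate (16 bytes per 4x4 block); the 8x factor is this page's "up to" claim, not a measured neural texture compression result.

```python
# Back-of-envelope memory arithmetic for one 4K (4096x4096) RGBA map.
texels = 4096 * 4096

uncompressed = texels * 4   # RGBA8: 4 bytes per texel
bc7          = texels * 1   # BC7: 16 bytes per 4x4 block = 1 byte/texel
ntc_best     = bc7 / 8      # "up to 8x" vs. block compression (claimed)

MiB = 1024 * 1024
print(uncompressed // MiB)  # 64 MiB
print(bc7 // MiB)           # 16 MiB
print(ntc_best / MiB)       # 2.0 MiB
```

A full PBR material (albedo, normal, roughness, etc.) multiplies these figures by the number of maps, which is why the savings compound quickly across a scene.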
[Download (GitHub)](https://github.com/NVIDIA-RTX/Rtxntc) ### RTX Texture Filtering Randomly samples textures after shading and filters difficult volumes, reducing artifacts and improving image quality. [Download (GitHub)](https://github.com/NVIDIA-RTX/Rtxtf) ### RTX Neural Materials Use AI to compress shader code of complex multi-layered materials for up to 8X faster material processing to bring real-time performance to film-quality assets. [Notify Me](/rtx-kit/notify-me/) ### RTX Texture Streaming Tool that divides textures into smaller tiles and efficiently manages and loads them based on need. [Download (GitHub)](https://github.com/NVIDIA-RTX/RTXTS) ### RTX Mega Geometry Accelerate BVH building for cluster-based geometry systems, enabling up to 100x more ray-traced triangles and better performance in heavily ray-traced scenes. [Download (GitHub)](https://github.com/NVIDIA-RTX/RTXMG) [Access NVIDIA RTX Branch of Unreal Engine 5 ](https://developer.nvidia.com/game-engines/unreal-engine/rtx-branch) ### RTX Dynamic Illumination Library of importance sampling algorithms that sample the most important lights in a scene and render them with physical accuracy. 1. ReSTIR Direct Illumination: One-bounce lighting from many lights without complex data 2. ReSTIR Global Illumination: Resamples multi-bounce indirect lighting paths 3. ReSTIR Path Tracing: Samples the most impactful light paths for increased image quality [Download (GitHub)](https://github.com/NVIDIA-RTX/RTXDI) [Access RTX Branch of Unreal Engine 5](https://developer.nvidia.com/game-engines/unreal-engine/rtx-branch) ### RTX Global Illumination Scalable solution to compute multi-bounce indirect lighting. 1. Neural Radiance Cache (NRC): Use AI to predict the amount of light that's emitted from or passing through a specific area 2. Spatial Hash Radiance Cache (SHaRC): Fast and scalable algorithm to compute light in a given area 3.
Dynamic Diffuse Global Illumination (DDGI): Probe-based solution that delivers multi-bounce indirect lighting without lightmaps or baking. [Download (GitHub)](https://github.com/NVIDIA-RTX/Rtxgi) [Access RTX Branch of Unreal Engine 5](https://developer.nvidia.com/game-engines/unreal-engine/rtx-branch) ### NVIDIA RTX™ Path Tracing NVIDIA RTX™ Path Tracing (RTXPT) merges years of best practices in real-time ray tracing and neural graphics development into a real-time path tracer. [Download (GitHub)](https://github.com/NVIDIA-RTX/RTXPT) ### NVIDIA Real-Time Denoisers Library of denoisers designed to work with low ray-per-pixel signals. 1. ReBLUR: Denoise diffuse and specular signals 2. SIGMA: Denoise shadows 3. ReLAX: Denoise RTX Dynamic Illumination signals. [Download (GitHub)](https://github.com/NVIDIA-RTX/NRD) ### NVIDIA Opacity Micro-Map Efficiently map intricate geometries onto triangles and encode their opacity for better ray tracing performance. [Download (GitHub)](https://github.com/NVIDIA-RTX/Omm) ### RTX Memory Utility Compaction and suballocation of acceleration structures to reduce memory consumption. [Download (GitHub)](https://github.com/NVIDIA-RTX/Rtxmu) ### RTX Character Rendering Set of tools to create path-traced strand-based hair and skin. 1. Subsurface Scattering (SSS): Render skin with accurate lighting and translucency 2. Linear Swept Spheres (LSS): Blackwell-accelerated sphere and curve primitive for strand-based path-traced hair. 3. Enhanced analytical Bi-Directional Scattering Distribution Function (BSDF): Provides shading for strand-based hair. 4. Disjoint Orthogonal Triangle Strips (DOTS): Provides high-quality strand-based hair for all GPUs. [Download (GitHub)](https://github.com/NVIDIA-RTX/Rtxcr) **RTX Branch of Unreal Engine 5 (Coming Soon)** * * * ## Download the New Microsoft Agility SDK Preview Today Cooperative vectors are a brand-new programming feature available now in Shader Model 6.9.
They introduce RTX Tensor Core acceleration for AI computations directly within game shaders. This enables significant performance improvements and efficiency gains in neural shading technologies. [Download Now](https://devblogs.microsoft.com/directx/directx12agility/ "Download Now")[Learn More](https://devblogs.microsoft.com/directx/agility-sdk-1-717-preview-and-1-616-retail "Learn More") * * * ## Additional RTX Rendering Technologies ### Generative AI Face Rendering RTX Neural Faces is a new generative AI algorithm that allows developers to cross the uncanny valley in real time. This AI model is trained on a character dataset built from the original photographs of the model. Using a synthetic generation pipeline, the base character dataset is expanded to variants with different lighting conditions, emotions, and occlusion. This model then takes a rasterized face and 3D pose and generates an enhanced face in real time. [Get Notified when RTX Neural Faces is Available](/rtx-kit/notify-me) https://www.youtube-nocookie.com/embed/KnozAHKTz9o? ![Images of a building rendered better with NVIDIA DLSS 4 and Reflex technologies](https://developer.download.nvidia.com/images/rtx/kit/dlss-reflex.jpg) ### Better With DLSS and Reflex DLSS is a revolutionary suite of neural rendering technologies that uses AI to boost FPS, reduce latency, and improve image quality. The latest breakthrough, DLSS 4, brings new Multi Frame Generation and enhanced Ray Reconstruction and Super Resolution, powered by [NVIDIA GeForce RTX™ 50 Series GPUs](https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/) and fifth-generation Tensor Cores. Reflex technologies optimize the graphics pipeline for ultimate responsiveness, providing faster target acquisition, quicker reaction times, and improved aim precision in competitive games. Reflex 2 introduces Frame Warp, which further reduces latency based on the game’s latest mouse input.
[Learn More about NVIDIA DLSS](https://developer.nvidia.com/rtx/dlss) [Learn More about NVIDIA Reflex](https://developer.nvidia.com/performance-rendering-tools/reflex) ### Improving Ray Traced Shader Performance Shader Execution Reordering (SER) is a performance optimization that unlocks the potential for better execution and memory coherence in ray tracing shaders. SER allows applications to easily reorder threads on the GPU, reducing the divergence effects that occur in particularly challenging ray tracing workloads like path tracing. New SER innovations in GeForce RTX 50 Series GPUs further improve efficiency and precision of shader reordering operations compared to GeForce RTX 40 Series GPUs. [Learn How to Get Started](/blog/improve-shader-performance-and-in-game-frame-rates-with-shader-execution-reordering/) ![Shader Execution Reordering improves coherence in ray-traced shader workloads](https://developer.download.nvidia.com/images/rtx/kit/ser-1920-1080.jpg) * * * ## RTX Kit Library * * * ## On-Demand Sessions * * * ## More Resources ![A decorative image representing Developer Community](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Developer Forums ![A decorative image representing the RTX Kit FAQ](https://developer.download.nvidia.com/images/m48-misc-question-faq-256px-blk.png) ### RTX Kit FAQ ![Sign up for the NVIDIA Developer Newsletter](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign up for Developer Newsletter * * * Get Started with NVIDIA RTX Kit Today [Download](#section-getting-started) --- # Source: https://developer.nvidia.com/ai-apps-for-rtx-pcs/sdks-models.md # AI SDKs and Models for Your RTX PC Application Add cutting-edge AI features for your RTX PC applications with NVIDIA SDKs and pretrained models.
## Build With NVIDIA AI SDKs 3D and Graphic Design Generative AI Video and Broadcast ![](https://developer.download.nvidia.com/images/rtx/improve-multiply-graphics-performance-with-ai-1920x1080.jpeg) ### NVIDIA DLSS: Improve Image Quality and Multiply Graphics Performance Use NVIDIA DLSS, a neural graphics technology that multiplies performance using AI, to create entirely new frames, display higher resolution through image reconstruction, and improve the image quality of intensive ray-traced content—all while delivering best-in-class image quality and responsiveness. [Learn More About DLSS ](/rtx/dlss) ![](https://developer.download.nvidia.com/images/rtx/advanced-ai-ray-tracing-1920x1080.jpg) ### NVIDIA OptiX: Advanced AI Ray Tracing Algorithms on the GPU Use NVIDIA OptiX™ Ray Tracing Engine, an application framework for achieving optimal ray-tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray-tracing algorithms, including an advanced AI denoiser. Bring the power of NVIDIA GPUs to ray-tracing applications with programmable intersection, ray generation, and shading. [Learn More About OptiX ](/rtx/ray-tracing/optix) ![](https://developer.download.nvidia.com/images/rtx/render-photo-realistic-assets-with-ai-1920x1080.jpg) ### NVIDIA RTX Kit: Render Photo-Realistic Assets Use NVIDIA RTX™ Kit, a suite of neural rendering technologies to render assets with AI, better organize geometry for improved path tracing performance, and tools to create assets with photo-realistic visuals. [Learn More About RTX Kit ](/rtx-kit) ![](https://developer.download.nvidia.com/images/rtx/digital-humans-1920x1080.jpeg) ### NVIDIA ACE: Bring Digital Humans to Life Use NVIDIA ACE - a suite of digital human technologies for bringing digital assistants and game characters to life with generative AI. 
[Learn More About ACE ](/ace) ![](https://developer.download.nvidia.com/images/rtx/improve-video-quality-video-conferencing-1920x1080.jpg) ### NVIDIA Maxine: Improve Real-Time Audio and Video for Conferencing and Telepresence Workflows Use NVIDIA Maxine, a collection of high-performance, easy-to-use, NVIDIA NIM™ microservices and SDKs, for deploying AI features that enhance audio, video, and augmented reality effects for video conferencing and telepresence. [Learn More About Maxine ](/maxine) ![](https://developer.download.nvidia.com/images/rtx/improve-video-quality-creative-media-1920x1080.jpeg) ### RTX Video: Improve Streamed Video Quality on Browsers, Creative, and Media Playback Apps Use NVIDIA RTX Video with AI-enhanced RTX Video Super Resolution and RTX Video HDR effects for creative and media playback apps to improve sharpness, clarity, and the automatic conversion of SDR video to HDR. [Learn More About RTX Video ](/rtx-video-sdk) ![](https://developer.download.nvidia.com/images/rtx/accelerate-video-encode-decode-1920x1080.jpeg) ### Video Codec SDK: Accelerate Video Encode and Decode Use Video Codec SDK, a comprehensive set of APIs including high-performance tools, samples, and documentation for hardware-accelerated video encode and decode on NVIDIA GPUs, for several popular codecs. [Learn More About Video Codec SDK ](/video-codec-sdk) ![](https://developer.download.nvidia.com/images/rtx/track-true-object-motion-1920x1080.jpeg) ### Optical Flow SDK: Track True Object Motion on Videos Use NVIDIA Optical Flow SDK, which taps into the latest hardware capabilities dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors, ideal for handling frame-to-frame intensity variations and tracking true object motion. 
[Learn More About Optical Flow SDK ](/optical-flow-sdk) * * * ## Example Workflows Accelerated With NVIDIA AI on RTX PCs Generative AI Video Broadcast 3D and Graphic Design Photography Audio https://www.youtube-nocookie.com/embed/uryeFhnNzEs?&rel=0 ### NVIDIA ACE (2:24) https://www.youtube-nocookie.com/embed/fc_NSAu41b0?&rel=0 ### NVIDIA ChatRTX (0:53) https://www.youtube-nocookie.com/embed/rOHCjPEwviQ?&rel=0 ### NVIDIA and Blackmagic Design Showcase Enhanced AI Features of NVIDIA RTX (1:26) https://www.youtube-nocookie.com/embed/E1puf1kdkbA?&rel=0 ### GPU-Acceleration Benefits in Adobe Premiere Pro With NVIDIA (1:20) https://www.youtube-nocookie.com/embed/nR-vP_7XFHE?&rel=0 ### NVIDIA Broadcast 1.4 Update Featuring Eye Contact (0:32) https://www.youtube-nocookie.com/embed/3GPNsPMqY8o?&rel=0 ### NVIDIA Maxine: Reinventing Real-Time Video Communications With AI (1:13) https://www.youtube-nocookie.com/embed/KfcsdhGKb1U?&rel=0 ### NVIDIA GET3D: AI Model to Populate Virtual Worlds With 3D Objects and Characters (0:35) https://www.youtube-nocookie.com/embed/h3xT_F_tEuU?&rel=0 ### Instantly Create Expressive Facial Animation With NVIDIA Audio2Face (0:45) https://www.youtube-nocookie.com/embed/WwCe9Woy1jw?&rel=0 ### Accelerating AI in Photoshop Neural Filters (3:05) https://www.youtube-nocookie.com/embed/UURFgE43PUg?&rel=0 ### NVIDIA RTX Accelerates AI Super Resolution in Adobe Photoshop (1:39) https://www.youtube-nocookie.com/embed/jZ5TQs7NihU?&rel=0 ### NVIDIA Riva: Using Speech AI for Transcription, Translation, and Voice (2:03) https://www.youtube-nocookie.com/embed/WO4KB5VjT2E?&rel=0 ### Improving End-to-End Conversation Quality With NVIDIA Maxine (1:38) * * * ## Power Your App with State-of-the-Art AI Models for Every Task Power your application with thousands of pre-trained models that exist in the open-source community, all of which are accelerated on RTX today. ‌Get started today with highly accurate models that span diverse use cases and domains. 
![Language Models](https://developer.download.nvidia.com/images/rtx/llm-corp-blog-chat-labs-blog4-1280x680.jpg) ### Language Models Language models facilitate application development for natural language downstream tasks like text generation, summarization, chatbots, question answering, translation, and more. These models use the transformer architecture to achieve high accuracy across a wide range of complexities in each task type. [Explore Language Models](https://catalog.ngc.nvidia.com/models?filters=platform%7CRuns+on+RTX%7Cpltfm_runs_on_rtx&orderBy=weightPopularDESC&query=&page=&pageSize=) [See our FAQ for Integrating LLMs Locally](https://forums.developer.nvidia.com/t/how-to-deploy-llms-on-rtx-pcs/317354) ![Image Generation Models](https://developer.download.nvidia.com/images/rtx/image-generation-models-1920x1080.jpeg) ### Image Generation Models NVIDIA is at the forefront of image generation, partnering with popular model developers like Black Forest Labs and Stability.ai. [Explore Image Generation Models](https://www.nvidia.com/en-us/research/ai-playground/) ![Computer Vision Models](https://developer.download.nvidia.com/images/rtx/computer-vision-models-800x450.jpg) ### Computer Vision Models With computer vision, devices can understand the world through images and videos. Computer vision models can be used for image classification, object detection, and tracking, object recognition, semantic segmentation, and instance segmentation. [Explore Computer Vision Models](https://catalog.ngc.nvidia.com/models?filters=application%7CObject+Detection%7Cuscs_object_detection%2Capplication%7CImage+Segmentation%7Cuscs_image_segmentation%2Capplication%7CVideo+Analytics%7Cuscs_video_analytics&orderBy=weightPopularDESC&query=&page=&pageSize=) ![Speech Models](https://developer.download.nvidia.com/images/rtx/speech-ai-visual-riva-1200x628.jpg) ### Speech Models Speech models deal with recognizing and transcribing audio into text or generating speech from text. 
They include text-to-speech (TTS) and automatic speech recognition (ASR) models. [Explore Speech Models](https://catalog.ngc.nvidia.com/models?filters=application%7CSpeech+to+Text%7Cuscs_speech_to_text%2Capplication%7CSpeech+enhancement%7Cuscs_speech_enhancement%2Capplication%7CAudio+Synthesis%7Cuscs_audio_synthesis%2Capplication%7CText+to+Speech%7Cuscs_text_to_speech&orderBy=weightPopularDESC&query=&page=&pageSize=) * * * ## More Resources ![](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Join the NVIDIA Developer Program ![](https://developer.download.nvidia.com/icons/m48-email-settings.svg) ### Sign up for Developer Newsletter ![](https://developer.download.nvidia.com/icons/m48-ai-startup.svg) ### Accelerate your Startup --- # Source: https://developer.nvidia.com/isaac/sim.md 1. [Home](/) [Isaac](https://developer.nvidia.com/isaac) Isaac Sim # NVIDIA Isaac Sim NVIDIA Isaac Sim™ is an open-source reference framework built on [NVIDIA Omniverse](/omniverse)™ that enables developers to simulate and test AI-driven robotics solutions in physically based virtual environments. Isaac Sim is fully extensible. This enables developers to build [Universal Scene Description (OpenUSD)](https://www.nvidia.com/en-us/omniverse/usd/?ncid=pa-srch-goog-230293-vt49&_bt=697115562887&_bk=openusd&_bm=p&_bn=g&_bg=162278196418&gad_source=1&gclid=EAIaIQobChMI3YzrqYG7hgMV2zWtBh2QhwopEAAYASAAEgJbVfD_BwE)-based custom simulators or integrate core Isaac Sim technologies into existing testing and validation pipelines. Ready to get started? [Download Now From GitHub](https://github.com/isaac-sim/IsaacSim "Download on Github")[Isaac Sim on Brev](https://docs.isaacsim.omniverse.nvidia.com/5.1.0/installation/install_advanced_cloud_setup_brev.html "Isaac Sim on Brev") * * * ## How Isaac Sim Works Isaac Sim facilitates three essential workflows: generating synthetic data for training or post-training robot models used for perception, mobility, and manipulation.
It also enables validating robot stacks through software- and hardware-in-the-loop testing, and supports [robot learning](https://www.nvidia.com/en-us/use-cases/robot-learning/) through NVIDIA [Isaac™ Lab](/isaac/lab). ![A diagram showing how NVIDIA Isaac Sim works](https://developer.download.nvidia.com/images/isaac/nvidia-isaac-sim-diagram.jpg) ### Isaac Sim Documentation Browse documentation and learn how to get started on Isaac Sim. [Read the Quick Start Guide](https://docs.isaacsim.omniverse.nvidia.com/latest/index.html) ### Robotics Simulation Overview Learn how robotics simulation helps developers virtually train, test, and validate robots, and the advantages of a simulation-first approach. [Learn More](https://www.nvidia.com/en-us/use-cases/robotics-simulation/) ### Isaac Sim Courses Gain a foundational understanding of core robotics concepts and explore essential workflows in simulation and robot learning with hands-on training in Isaac Sim™ and Isaac Lab. [Take the Courses](https://www.nvidia.com/en-us/learn/learning-path/robotics/) ### Isaac Sim Office Hours Stay informed with our [recurring Office Hours](https://addevent.com/calendar/ae483892) that cover in-depth topics with experts and customers using Isaac Sim. [Watch the Livestreams](https://www.youtube.com/watch?v=ybtJxQbj2NE&list=PL3jK4xNnlCVewIu3MAcrP3HbvVUmKWNpS) * * * ## Key Features ### Pre-Populated Robots and SimReady Assets Isaac Sim supports a wide range of robots with differential bases, form factors, and functions built on OpenUSD that have the ideal physics properties to speed up robot simulation.
- **Humanoids:** 1X, Agility, Fourier Intelligence, and Sanctuary - **Manipulators:** Fanuc, KUKA, Universal Robots, and Techman - **Quadrupeds:** ANYbotics, Boston Dynamics, and Unitree - **AMRs:** idealworks, iRobot Access over 1,000 [SimReady](https://www.nvidia.com/en-us/glossary/simready/) 3D assets—including conveyors, boxes, and pallets—to build your simulation scene. Isaac Sim now supports robot and sensor schemas, making it much easier to define metadata. ![Pre-populated robots and SimReady assets in NVIDIA Isaac Sim](https://developer.download.nvidia.com/images/isaac/sim/pre-populated-robots-and-sim-ready-assets.jpg) ### Bootstrap AI Model Development Bootstrap AI model training with [synthetic data generation](https://www.nvidia.com/en-us/use-cases/synthetic-data/) where real-world data is limited or restricted. Developers can further augment this data with NVIDIA [Cosmos™ world foundation models](https://www.nvidia.com/en-us/ai/cosmos/) and use it to post-train vision-language-action models such as [GR00T N1.5](/isaac/gr00t). ### Render Real-World Data in Simulation [NVIDIA Omniverse NuRec](https://docs.omniverse.nvidia.com/materials-and-rendering/latest/neural-rendering.html) neural rendering capabilities speed up simulation by turning captured sensor data into interactive simulation scenes. ### Modular Architecture for Robotics Workflows Create custom workflows or integrate with your existing ones to support various types of robots, including [humanoids](https://www.nvidia.com/en-us/use-cases/humanoid-robots/), manipulators, and autonomous mobile robots (AMRs). ### Realistic Physics Simulation Tap into [NVIDIA PhysX®](/physx-sdk) for physics capabilities like joint friction, actuation, rigid and soft body dynamics, velocity, and more.
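Synthetic data generation works because the simulator already knows everything about the scene, so ground-truth labels come for free alongside each randomized render. A minimal, framework-free Python sketch of that idea (all names here are hypothetical illustrations, not the Omniverse Replicator API):

```python
import random

# Hypothetical sketch of domain randomization for synthetic data generation:
# each sample randomizes scene attributes (lighting, color, pose) and pairs
# them with a ground-truth annotation derived from the known object pose.
def randomize_scene(rng):
    return {
        "light_intensity": rng.uniform(500.0, 5000.0),    # arbitrary units
        "object_color": [rng.random() for _ in range(3)], # RGB in [0, 1]
        "object_position": [rng.uniform(-1.0, 1.0),       # metres on the
                            rng.uniform(-1.0, 1.0), 0.0], # ground plane
    }

def generate_dataset(num_samples, seed=0):
    rng = random.Random(seed)
    samples = []
    for i in range(num_samples):
        scene = randomize_scene(rng)
        # In simulation the renderer knows the exact object pose, so the
        # bounding box is computed, not hand-annotated.
        x, y, _ = scene["object_position"]
        samples.append({
            "id": i,
            "scene": scene,
            "bbox_2d": [x - 0.1, y - 0.1, x + 0.1, y + 0.1],
            "category": "pallet",
        })
    return samples

dataset = generate_dataset(3)
```

A real pipeline would hand the randomization to Omniverse Replicator and write the annotations out in COCO or KITTI format, as described in the starter kits below; the structure of randomize-render-label stays the same.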
* * * ## Get Started With Isaac Sim ![](https://developer.download.nvidia.com/images/isaac/m48-nim.svg) ### Set Up Your System Check to see if your machine meets the system requirements and compatibility, then get started by installing Isaac Sim. [Set Up Your Machine](https://docs.isaacsim.omniverse.nvidia.com/latest/introduction/quickstart_index.html#quick-tutorials) ![](https://developer.download.nvidia.com/images/isaac/m48-speech-recognition.svg) ### Take the Self-Paced Course In this beginner course, you will learn how to build a simple robot, apply physics properties, integrate advanced sensors, and troubleshoot common issues in Isaac Sim. [Get Started](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-27+V1) ![](https://developer.download.nvidia.com/images/isaac/m48-digital-deep-learning-institute-talks-training.svg) ### Connect With the Community Engage with the Isaac Sim community by joining the NVIDIA forum. Find answers for troubleshooting and post your own questions. [Join the Isaac Sim Forum](https://forums.developer.nvidia.com/c/omniverse/simulation/69) ## Expedite Robotics Workflows ![A decorative image of NVIDIA Cosmos](https://developer.download.nvidia.com/images/isaac/sim/nvidia-cosmos-ari.jpg) ### NVIDIA Cosmos [NVIDIA Cosmos](http://www.nvidia.com/en-us/ai/cosmos) is a platform comprising state-of-the-art generative [world foundation models](https://www.nvidia.com/en-us/glossary/world-models/), advanced tokenizers, guardrails, and an accelerated video processing pipeline built to accelerate the development of [physical AI](https://www.nvidia.com/en-us/glossary/physical-ai/) systems such as autonomous vehicles and robots.
[Get Started With NVIDIA Cosmos](https://nvidia-cosmos.github.io/cosmos-cookbook/) ![A decorative image of NVIDIA Isaac Lab](https://developer.download.nvidia.com/images/isaac/sim/nvidia-isaac-lab-1920x1080.jpg) ### NVIDIA Isaac™ Lab Built on Isaac Sim, Isaac Lab is an open-source, unified robot learning framework for training robot policies. [Learn More](/isaac/lab) ![A decorative image of the NVIDIA Physical AI Dataset](https://developer.download.nvidia.com/images/isaac/sim/nvidia-physical-ai-ari.jpg) ### NVIDIA Physical AI Dataset Unblock data bottlenecks with the NVIDIA Physical AI Dataset, an open-source dataset composed of validated data used to build NVIDIA physical AI—now freely available to developers on Hugging Face. [Access the Datasets](https://huggingface.co/collections/nvidia/physicalai-67c643edbb024053dcbcd6d8) * * * ## Starter Kits ### Neural Reconstruction and Rendering With NVIDIA Omniverse NuRec Turn real-world sensor data into interactive simulation using 3D Gaussian Splatting-based rendering for enhanced efficiency and accuracy. - [Getting Started With Neural Rendering](https://docs.omniverse.nvidia.com/materials-and-rendering/latest/neural-rendering.html) - [How to Instantly Render Real-World Scenes in Interactive Simulation](https://developer.nvidia.com/blog/how-to-instantly-render-real-world-scenes-in-interactive-simulation/) - [3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting](https://research.nvidia.com/labs/toronto-ai/3DGUT/) - [Reconstruct a Scene in NVIDIA Isaac Sim Using Only a Smartphone](/blog/reconstruct-a-scene-in-nvidia-isaac-sim-using-only-a-smartphone/) - [Simulate Robotic Environments Faster with NVIDIA Isaac Sim and World Labs Marble](/blog/simulate-robotic-environments-faster-with-nvidia-isaac-sim-and-world-labs-marble) ### Realistic Physics Simulation Model the physical behavior of objects and systems foundational to physical AI.
Isaac Sim can simulate rigid body and vehicle dynamics, multi-joint articulation, SDF colliders, and more for realistic physics simulation. - [Physics Simulation Fundamentals](https://docs.omniverse.nvidia.com/isaacsim/latest/simulation_fundamentals.html) - [Getting Started Guides for Sensor Simulation](https://docs.omniverse.nvidia.com/isaacsim/latest/features/sensors_simulation/index.html) - [NVIDIA® PhysX®](/physx-sdk) - [Newton](https://developer.nvidia.com/blog/announcing-newton-an-open-source-physics-engine-for-robotics-simulation) ### Scalable Synthetic Data Generation Bootstrap AI model training with synthetic data. Generate training data by randomizing attributes like lighting, reflection, color, and position of scenes and assets. - [Synthetic Data Generation Use Cases](https://www.nvidia.com/en-us/use-cases/synthetic-data/) - [Omniverse Replicator Getting Started Guide](https://docs.omniverse.nvidia.com/extensions/latest/ext_replicator/getting_started.html) - [Build Synthetic Data Pipelines to Train Smarter Robots](/blog/build-synthetic-data-pipelines-to-train-smarter-robots-with-nvidia-isaac-sim/) - [Scaling Action Recognition Models With Synthetic Data Blog](https://developer.nvidia.com/blog/scaling-action-recognition-models-with-synthetic-data/) ### ROS Support Custom ROS 2 messages and the URDF/MJCF importers are now open source. Get support for custom ROS messages that allow standalone scripting to manually control the simulation steps. - [URDF Importer Getting Started Guide](https://docs.isaacsim.omniverse.nvidia.com/latest/importer_exporter/ext_isaacsim_asset_importer_urdf.html) ### Robotics Learning Virtually train, test, and validate robotics systems using NVIDIA Isaac Lab.
- [Isaac Lab Whitepaper](https://research.nvidia.com/publication/2025-09_isaac-lab-gpu-accelerated-simulation-framework-multi-modal-robot-learning) - [Isaac Lab Reference Architecture](https://isaac-sim.github.io/IsaacLab/main/source/refs/reference_architecture/index.html) - [Isaac GR00T for Synthetic Manipulation Motion Generation](https://build.nvidia.com/nvidia/isaac-gr00t-synthetic-manipulation) ### Industrial Facility Digital Twin Build intelligent factory, warehouse, and industrial facility solutions that enable comprehensive design, simulation, and optimization of industrial assets and processes. - [Mega NVIDIA Omniverse Blueprint for Multi-Robot Fleet Simulation](https://build.nvidia.com/nvidia/mega-multi-robot-fleets-for-industrial-automation) * * * #### Newton, the Next-Generation Open-Source Physics Simulation Engine Newton is an open-source, GPU-accelerated, and extensible physics engine, co-developed with Google DeepMind and Disney Research, and [managed by the Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-announces-contribution-of-newton-by-disney-research-google-deepmind-and-nvidia-to-accelerate-open-robot-learning). Built on NVIDIA Warp and OpenUSD, Newton is optimized for robotics and compatible with learning frameworks such as MuJoCo Playground and NVIDIA Isaac Lab. [Newton Beta](https://github.com/newton-physics) is now available to use. [Get Started on Newton](/newton-physics) ![](https://developer.download.nvidia.com/images/isaac/newton-ari.jpg) * * * ## Isaac Sim Learning Library Featured Tech Blog Using Synthetic Data for Model Training Read the step-by-step technical guide on training AMRs to detect warehouse pallet jacks using synthetic data. Featured Tech Blog Validating Robot Models in Simulation Learn how to develop and deploy AI-powered robots using NVIDIA Isaac Sim and NVIDIA TAO Toolkit.
Featured Tech Blog Beginner’s Guide to ROS 2 Workflows With Isaac Sim Learn how to simulate and validate your robot stack with ROS 2 packages using NVIDIA Isaac Sim. * * * ## Latest Robotics News * * * ## More Resources ![NVIDIA Developer Forums](https://developer.download.nvidia.com/images/omniverse/m48-people-group.svg) ### Explore the Community ![NVIDIA Training and Certification](https://developer.download.nvidia.com/images/isaac/m48-certification-ribbon-2-256px-blk.png) ### Get Training and Certification ![NVIDIA Inception Program for Startups](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Join the Program for Startups * * * ## FAQ Yes, Isaac Sim is free to use, licensed as open source under Apache 2.0 and available on GitHub. Some additional software and materials can be used for free with Isaac Sim under the [NVIDIA Isaac Sim Additional Software and Materials License](https://www.nvidia.com/en-us/agreements/enterprise-software/isaac-sim-additional-software-and-materials-license/). Yes, ISVs can integrate Isaac Sim into their software solutions with Omniverse Kit. While Isaac Sim is open source and can be distributed freely, distribution of Omniverse Kit requires a separate license with NVIDIA, which is available via an Omniverse Enterprise subscription. This license grants the rights to redistribute, sublicense, and support Omniverse Kit as part of Isaac Sim within your product. You can import 3D robot models using the OnShape, URDF, MJCF, and ShapeNet importers and through the CAD converter. Please refer to the [documentation](https://docs.isaacsim.omniverse.nvidia.com/latest/installation/requirements.html) for more details. Yes, you can connect Isaac Sim to ROS/ROS 2 using the Isaac ROS/ROS 2 Bridge extensions. Please refer to the [documentation](https://docs.isaacsim.omniverse.nvidia.com/latest/installation/install_ros.html) for more details.
Yes, you can access Isaac Sim on Brev, which gives one-click access to NVIDIA GPU instances on popular cloud platforms. You can also download it as a container from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-sim) and run it on your preferred CSP (cloud service provider). For AWS users, the Isaac Sim container is available on the [AWS marketplace](https://aws.amazon.com/marketplace/search/results?prevFilters=%7B%22id%22%3A%22c568fe05-e33b-411c-b0ab-047218431da9%22%7D&searchTerms=NVIDIA+Isaac+Sim) for easy deployment. While Isaac Sim is free to deploy on AWS EC2 for development and research purposes, you'll still need to pay for any AWS EC2-related services and fees. Isaac Lab is an open-source, lightweight reference application built on the Isaac Sim platform and specifically optimized for robot learning at scale. Learn more about Isaac Lab [here](/isaac/lab). [Annotators](https://docs.omniverse.nvidia.com/isaacsim/latest/replicator_tutorials/ext_replicator-agent/writer_control.html?highlight=annotators#rtsp-writer) can include RGB, bounding box, instance segmentation, semantic segmentation, and more. The annotated data can be exported in [COCO and KITTI formats](https://docs.omniverse.nvidia.com/isaacsim/latest/replicator_tutorials/tutorial_replicator_scene_based_sdg.html?highlight=COCO#config-scenarios). Isaac Sim can be easily scaled to multiple GPUs for faster simulations. Learn more [here](https://docs.isaacsim.omniverse.nvidia.com/latest/installation/install_faq.html#isaac-sim-setup-faq). ## Get Started With NVIDIA Isaac Sim Today
[Download Isaac Sim](https://docs.isaacsim.omniverse.nvidia.com/5.1.0/installation/download.html "Download Now")[Download Container ](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-sim "Download Container from NGC") --- # Source: https://developer.nvidia.com/industries/telecommunications.md # Telecommunications Developer Resources A hub of news, SDKs, technical resources, and more for developers working in telecommunications. ## App Frameworks and SDKs #### 5G/6G RAN [NVIDIA Aerial™ CUDA®-Accelerated RAN](https://docs.nvidia.com/aerial/cuda-accelerated-ran/latest/index.html) is an application framework for building commercial-grade, software-defined, GPU-accelerated, cloud-native 5G and 6G networks. The Aerial CUDA-Accelerated RAN platform is the key building block for the accelerated 5G virtualized distributed unit (vDU) and has been deployed in commercial and research networks. **[Learn More](https://developer.nvidia.com/industries/telecommunications/ai-aerial)** #### Aerial Framework (RAN) [NVIDIA Aerial Framework](https://docs.nvidia.com/aerial/#aerial-framework) is a toolkit for transitioning from Python to CUDA-accelerated vRAN, including multimodal signal processing, app integration, and performance optimization. **[Learn More](https://developer.nvidia.com/industries/telecommunications/ai-aerial)** #### Network Digital Twin The [NVIDIA Aerial Omniverse™ Digital Twin](https://docs.nvidia.com/aerial/aerial-dt/index.html) is a next-generation, system-level simulation platform for performing cutting-edge research and development on 5G and 6G wireless systems and optimizing network planning and operations. It’s the unique tool for benchmarking system performance and exploring machine learning-based wireless communication algorithms under real-world conditions. 
**[Learn More](https://developer.nvidia.com/industries/telecommunications/ai-aerial)** #### Network Planning Telecom network engineers and developers can automate network configuration, optimize performance, and reduce unplanned downtime with a pre-built blueprint using an agentic LLM-driven framework—built with [NVIDIA NIM™ microservices](https://www.nvidia.com/en-us/ai-data-science/products/nim-microservices/). **[Learn More](https://build.nvidia.com/nvidia/telco-network-configuration)** #### Network Slicing Set up and run an open-source 5G network simulator lab, coupled with an agentic AI workflow for dynamic bandwidth allocation. The lab uses Open Air Interface for the 5G core and RAN, FlexRIC for slice management, and iPerf for traffic generation, with performance data logged in a Kinetica database. The agentic AI workflow, powered by a LangGraph agent, analyzes traffic data and optimizes bandwidth allocation. **[Explore 5G Network Simulator](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/community/autonomous_5g_slicing_lab)****[Explore O-RAN Chatbot](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/community/oran-chatbot-multimodal)** #### Generative AI [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/)™ is an end-to-end platform for developing custom generative AI—including large language models (LLMs), multimodal, vision, and speech AI—anywhere. Deliver enterprise-ready models with precise data curation, cutting-edge customization, retrieval-augmented generation (RAG), and accelerated performance. **[Learn More](https://www.nvidia.com/en-us/ai-data-science/products/nemo/)** #### DOCA The [NVIDIA DOCA™ SDK](/networking/doca) enables developers to rapidly create applications and services on top of NVIDIA® BlueField® data processing units (DPUs), leveraging industry-standard APIs. 
**[Learn More](/networking/doca)** #### Cybersecurity [NVIDIA Morpheus](/morpheus-cybersecurity) is an open application framework that lets cybersecurity developers create optimized AI pipelines for filtering, processing, and classifying large volumes of real-time data. Morpheus can send and receive telemetry data directly from the [BlueField DPU](https://www.nvidia.com/en-us/networking/products/data-processing-unit/), allowing continuous, real-time, and variable feedback that can affect policies, rewrite rules, adjust sensing, and more. **[Learn More](/morpheus-cybersecurity)** #### Computer Vision [NVIDIA Metropolis](/metropolis) is an application framework that simplifies the development, deployment, and scaling of AI-enabled video analytics applications from edge to cloud. It includes production-ready, [pretrained models and transfer learning toolkits](/transfer-learning-toolkit) for training and optimizing models, the [NVIDIA DeepStream SDK](/deepstream-sdk) for streaming analytics, other deployment SDKs, [CUDA-X](/gpu-accelerated-libraries)™ libraries, and the [NVIDIA EGX](https://www.nvidia.com/en-us/data-center/products/egx/)™ platform. **[Explore Tools](/metropolis)** #### Data Science and Analytics The [NVIDIA RAPIDS™](/rapids) suite of open-source software libraries, built on CUDA-X AI™, gives developers the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. **[Get Started With RAPIDS](/rapids)** #### Conversational AI [NVIDIA Riva](/riva) is an SDK for building and deploying AI applications that fuse vision, speech, and other sensors. It offers a complete workflow to build, train, and deploy GPU-accelerated AI systems that can use visual cues such as gestures and gaze along with speech in context.
**[Learn More](/riva)** #### Video Conferencing [NVIDIA Maxine™](/maxine/) is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. **[Learn More](/maxine/)** ## Browse by Resource Type - [ **News** ](#news) - [ **On-Demand Talks** ](#on-demand) ## Programs for You #### 6G Developer Program The NVIDIA 6G Developer Program provides access to platforms, documentation, and software releases to advance cutting-edge 6G research. It also offers a forum where developers can connect with NVIDIA experts and peers. Join the program to drive rapid innovation for the 6G era. [Join Today](/6g-program) #### Developer Resources The NVIDIA Developer Program provides the advanced tools and training needed to successfully build applications on all NVIDIA technology platforms. This includes access to hundreds of SDKs, a network of like-minded developers through our community forums, and more. [Join Today](/developer-program) #### Technical Training NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI, accelerated computing, and accelerated data science to solve real-world problems. Powered by GPUs in the cloud, training is available as self-paced, online courses or live, instructor-led workshops. [View Courses](https://www.nvidia.com/en-us/training/) #### Accelerate Your Startup NVIDIA Inception—an acceleration platform for AI, data science, and HPC startups—supports over 22,000 startups worldwide with go-to-market support, expertise, and technology. Startups get access to training through the DLI, preferred pricing on hardware, and invitations to exclusive networking events. [Learn More](https://www.nvidia.com/en-us/deep-learning-ai/startups/) ### Sign up for the latest developer news from NVIDIA [Subscribe](https://www.nvidia.com/en-us/preferences/email-signup/) --- # Source: https://developer.nvidia.com/tensorrt.md 1.
[Topic](/topics/) [AI Inference](/topics/ai/ai-inference/) TensorRT # NVIDIA TensorRT NVIDIA® TensorRT™ is an ecosystem of tools for developers to achieve high-performance deep learning inference. TensorRT includes inference compilers, runtimes, and model optimizations that deliver low latency and high throughput for production applications. The TensorRT ecosystem includes the TensorRT compiler, TensorRT-LLM, TensorRT Model Optimizer, TensorRT for RTX, and TensorRT Cloud. [Download Now](https://developer.nvidia.com/tensorrt/download)[Documentation ](https://docs.nvidia.com/deeplearning/tensorrt/)[GitHub](https://github.com/NVIDIA/TensorRT) * * * ## How TensorRT Works TensorRT speeds up inference by up to 36X compared to CPU-only platforms. Built on the NVIDIA® CUDA® parallel programming model, TensorRT includes libraries that optimize neural network models trained on all major frameworks, calibrate them for lower precision with high accuracy, and deploy them to hyperscale data centers, workstations, laptops, and edge devices. TensorRT optimizes inference using quantization, layer and tensor fusion, and kernel tuning techniques. NVIDIA TensorRT Model Optimizer provides easy-to-use quantization techniques, including post-training quantization and quantization-aware training to compress your models. FP8, FP4, INT8, INT4, and advanced techniques such as AWQ are supported for your deep learning inference optimization needs. Quantized inference significantly reduces latency and memory bandwidth use, which is critical for many real-time services and for autonomous and embedded applications. ![](https://developer.download.nvidia.com/images/tensorrt/how-tensor-rt-works.jpg) ### Read the Introductory TensorRT Blog Learn how to apply TensorRT optimizations and deploy a PyTorch model to GPUs.
[Read Blog](https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorrt-updated/) ### Watch On-Demand TensorRT Sessions From GTC Learn more about TensorRT and its features from a curated list of webinars at GTC. [Watch Sessions](https://www.nvidia.com/en-us/on-demand/playlist/playList-53110dbc-c11d-4619-b821-987015090afa/) ### Get the Complete Developer Guide See how to get started with TensorRT in this step-by-step developer and API reference guide. [Read Guide](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html) ### Navigate AI Infrastructure and Performance Learn how to lower your cost per token and get the most out of your AI models with our ebook. [View Ebook](https://www.nvidia.com/en-us/solutions/ai/inference/balancing-cost-latency-and-performance-ebook/) * * * ## Key Features ### Large Language Model Inference [NVIDIA TensorRT-LLM](https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/) is an open-source library that accelerates and optimizes inference performance of large language models (LLMs) on the NVIDIA AI platform with a simplified Python API. Developers accelerate LLM performance on NVIDIA GPUs in the data center or on workstation GPUs. ### Compile in the Cloud NVIDIA TensorRT Cloud is a developer-focused service for generating hyper-optimized engines for given constraints and KPIs. Given an LLM and inference throughput/latency requirements, a developer can invoke TensorRT Cloud service using a command-line interface to hyper-optimize a TensorRT-LLM engine for a target GPU. The cloud service will automatically determine the best engine configuration that meets the requirements. Developers can also use the service to build optimized TensorRT engines from ONNX models on a variety of NVIDIA RTX, GeForce, Quadro®, or Tesla®-class GPUs. TensorRT Cloud is available with limited access to select partners.
[Apply](https://developer.nvidia.com/tensorrt-cloud-program) for access, subject to approval. ### Optimize Neural Networks [TensorRT Model Optimizer](https://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/) is a unified library of state-of-the-art model optimization techniques, including quantization, pruning, speculation, sparsity, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, and SGLang to efficiently optimize inference on NVIDIA GPUs. TensorRT Model Optimizer also supports training for inference techniques such as Speculative Decoding Module Training, Pruning/Distillation, and Quantization Aware Training through NeMo and Hugging Face frameworks. ### Major Framework Integrations TensorRT integrates directly into [PyTorch](https://developer.nvidia.com/blog/accelerating-inference-up-to-6x-faster-in-pytorch-with-torch-tensorrt/) and [Hugging Face](http://hf.co/blog/optimum-nvidia) to achieve 6X faster inference with a single line of code. TensorRT provides an [ONNX](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#fit) parser to import [ONNX](https://github.com/NVIDIA/TensorRT/blob/release/10.9/quickstart/IntroNotebooks/2.%20Using%20PyTorch%20through%20ONNX.ipynb) models from popular frameworks into TensorRT. [MATLAB](https://www.mathworks.com/help/gpucoder/ug/tensorrt-target.html) is integrated with TensorRT through GPU Coder to automatically generate high-performance inference engines for NVIDIA Jetson™, NVIDIA DRIVE®, and data center platforms. ### Deploy, Run, and Scale With Dynamo-Triton TensorRT-optimized models are deployed, run, and scaled with [NVIDIA Dynamo Triton](https://www.nvidia.com/en-us/ai-data-science/products/triton-inference-server/) inference-serving software that includes TensorRT as a backend. 
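One of the main ways inference-serving software raises GPU throughput is dynamic batching: requests that arrive concurrently are grouped into a single model invocation instead of being run one by one. A toy, framework-free sketch of the scheduling idea (the class and method names here are hypothetical illustrations, not Triton's API):

```python
from collections import deque

# Toy sketch of dynamic batching: queue incoming requests, then run the
# model once per batch rather than once per request.
class DynamicBatcher:
    def __init__(self, model_fn, max_batch_size=4):
        self.model_fn = model_fn          # one call handles a whole batch
        self.max_batch_size = max_batch_size
        self.queue = deque()

    def submit(self, request):
        self.queue.append(request)

    def flush(self):
        """Drain the queue in batches. A real server also flushes on a
        timeout so a lone request is not stuck waiting for peers."""
        results = []
        while self.queue:
            take = min(self.max_batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(take)]
            results.extend(self.model_fn(batch))
        return results

# Stand-in "model" that records how many requests each call served.
calls = []
def fake_model(batch):
    calls.append(len(batch))
    return [x * 2 for x in batch]

batcher = DynamicBatcher(fake_model, max_batch_size=4)
for x in range(10):
    batcher.submit(x)
outputs = batcher.flush()   # 10 requests served in 3 model calls (4 + 4 + 2)
```

Production servers add per-model batch-size limits, queuing delay budgets, and priority levels on top of this core loop, but the throughput win comes from the same batching trade-off sketched here.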
The advantages of using Triton include high throughput with dynamic batching, concurrent model execution, model ensembling, and streaming audio and video inputs. ### Simplify AI Deployment on RTX TensorRT for RTX offers an optimized inference deployment solution for NVIDIA RTX GPUs. It delivers fast engine build times of 15 to 30 seconds, letting apps build inference engines directly on target RTX PCs during installation or on first run, all within a total library footprint of under 200 MB. Engines built with TensorRT for RTX are portable across operating systems and GPUs, enabling a build-once, deploy-anywhere workflow. ### Accelerate Every Inference Platform TensorRT can optimize models for applications across the edge, laptops, desktops, and data centers. It powers key NVIDIA solutions—such as NVIDIA TAO, NVIDIA DRIVE, NVIDIA Clara™, and NVIDIA JetPack™—and is integrated with application-specific SDKs, such as NVIDIA NIM™, NVIDIA DeepStream, NVIDIA Riva, NVIDIA Merlin™, NVIDIA Maxine™, NVIDIA Morpheus, and NVIDIA Broadcast Engine. TensorRT provides developers a unified path to deploy intelligent video analytics, speech AI, recommender systems, video conferencing, AI-based cybersecurity, and streaming apps in production. * * * ## Get Started With TensorRT TensorRT is an ecosystem of APIs for building and deploying high-performance deep learning inference. It offers a variety of inference solutions for different developer requirements. | Use-case | Deployment Platform | Solution | | --- | --- | --- | | Inference for LLMs | Data center GPUs like GB100, H100, A100, etc. | Download TRT-LLM TensorRT-LLM is available for free on [GitHub](https://github.com/NVIDIA/TensorRT-LLM/tree/rel). [Download (GitHub)](https://github.com/NVIDIA/TensorRT-LLM/tree/rel) [Documentation](https://nvidia.github.io/TensorRT-LLM) | | Inference for non-LLMs like CNNs, Diffusions, Transformers, etc.
Safety-compliant and high-performance inference for Automotive Embedded Inference for non-LLMs in robotics and edge applications | Data center GPUs, Embedded, and Edge platforms Automotive platform: NVIDIA DRIVE AGX Edge Platform: Jetson, NVIDIA IGX, etc. | Download TensorRT The TensorRT inference library provides a general-purpose AI compiler and an inference runtime that delivers low latency and high throughput for production applications. [Download SDK](https://developer.nvidia.com/nvidia-tensorrt-download) [Download Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt) | | AI Model Inferencing on RTX PCs | NVIDIA GeForce RTX and RTX Pro GPUs in laptops and desktops | Download TensorRT for RTX TensorRT for RTX is a dedicated inference deployment solution for RTX GPUs. [Download SDK](/tensorrt-rtx) [Documentation](https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/index.html) | | Model optimizations like Quantization, Distillation, Sparsity, etc. | Data center GPUs like GB100, H100, etc. | Download TensorRT Model Optimizer TensorRT Model Optimizer is free on NVIDIA PyPI, with examples and recipes on [GitHub](https://github.com/NVIDIA/TensorRT-Model-Optimizer). [Download (GitHub)](https://github.com/NVIDIA/TensorRT-Model-Optimizer) [Documentation](https://nvidia.github.io/TensorRT-Model-Optimizer) | * * * ## Get Started With TensorRT Frameworks TensorRT Frameworks add TensorRT compiler functionality to frameworks like PyTorch. ![TensorRT speeds up inference by 36X](https://developer.download.nvidia.com/icons/m48-download.svg) ### Download ONNX and Torch-TensorRT The TensorRT inference library provides a general-purpose AI compiler and an inference runtime that delivers low latency and high throughput for production applications. 
**ONNX:** [Documentation](https://github.com/NVIDIA/TensorRT/blob/release/10.9/quickstart/IntroNotebooks/2.%20Using%20PyTorch%20through%20ONNX.ipynb) **Torch-TensorRT:** [Download Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) [Documentation](https://pytorch.org/TensorRT/) ![TensorRT speeds up inference by 36X](https://developer.download.nvidia.com/icons/m48-accellerated-computing-with-cuda-python-256px-blk.png) ### Experience Tripy: Pythonic Inference With TensorRT Experience high-performance inference and excellent usability with Tripy. Expect intuitive APIs, easy debugging with eager mode, clear error messages, and top-notch documentation to streamline your deep learning deployment. [Documentation](https://nvidia.github.io/TensorRT-Incubator/index.html) [Examples](https://github.com/NVIDIA/TensorRT-Incubator/tree/main/tripy/examples) [Contribute](https://github.com/NVIDIA/TensorRT-Incubator/blob/main/tripy/CONTRIBUTING.md) ![TensorRT speeds up inference by 36X](https://developer.download.nvidia.com/icons/m48-digital-deep-learning-institute-talks-training.svg) ### Deploy Get a free license to try [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/) in production for 90 days using your existing infrastructure. [Request a 90-Day License](https://enterpriseproductregistration.nvidia.com/?LicType=EVAL&ProductFamily=NVAIEnterprise) * * * ## World-Leading Inference Performance TensorRT was behind NVIDIA’s wins across all [inference performance](https://developer.nvidia.com/blog/tag/inference-performance/) tests in the industry-standard benchmark for [MLPerf Inference](https://www.nvidia.com/en-us/data-center/mlperf/). TensorRT-LLM accelerates the latest large language models for [generative AI](https://www.nvidia.com/en-us/ai-data-science/generative-ai/), delivering up to 8X more performance, 5.3X better TCO, and nearly 6X lower energy consumption.
[See All Benchmarks](/deep-learning-performance-training-inference/ai-inference) ### 8X Increase in GPT-J 6B Inference Performance ![TensorRT-LLM on H100 has 8X increase in GPT-J 6B inference performance](https://developer.download.nvidia.com/images/gpt-j-6b-630x354-1.jpg) ### 4X Higher Llama2 Inference Performance ![TensorRT-LLM on H100 has 4X Higher Llama2 Inference Performance](https://developer.download.nvidia.com/images/llama-2-70b-630x354-1.jpg) ### Total Cost of Ownership Lower is better ![TensorRT-LLM has lower total cost of ownership than GPT-J 6B and Llama 2 70B](https://developer.download.nvidia.com/images/cost-of-ownership-630x354-1.jpg) ### Energy Use Lower is better ![TensorRT-LLM has lower energy use than GPT-J 6B and Llama 2 70B](https://developer.download.nvidia.com/images/energy-use-630x354-1.jpg) * * * #### NVIDIA Blackwell Delivers Unmatched Performance and ROI for AI Inference The NVIDIA Blackwell platform—including NVFP4 low precision format, fifth-generation NVIDIA NVLink and NVLink Switch, and the NVIDIA TensorRT-LLM and NVIDIA Dynamo inference frameworks—enables the highest AI factory revenue: A $5M investment in GB200 NVL72 generates $75 million in token revenue—a 15x return on investment. This includes development with community frameworks such as SGLang, vLLM, and more. 
[Explore technical results](https://developer.nvidia.com/blog/nvidia-blackwell-leads-on-new-semianalysis-inferencemax-benchmarks/) ![NVIDIA Rivermax provides real-time streaming for the Las Vegas Sphere, world’s largest LED display](https://developer.download.nvidia.com/images/tensorrt/inference-tech-blog-sa-external-think-smart-1920x1080.png) * * * ## Starter Kits ### Beginner Guide to TensorRT - [View Quick-Start Guide](/tensorrt-getting-started) - [View Quick-Start Notebooks](https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html) - Read Blog: [Speeding Up Deep Learning Inference Using NVIDIA TensorRT](/blog/speeding-up-deep-learning-inference-using-tensorrt-updated/) - Read Blog: [Optimizing and Serving Models With TensorRT and Triton](/blog/optimizing-and-serving-models-with-nvidia-tensorrt-and-nvidia-triton/) - Watch Video: [Getting Started With NVIDIA TensorRT](https://www.youtube.com/watch?v=SlUouzxBldU) ### Beginner Guide to TensorRT-LLM - [View Quick-Start Guide](/tensorrt-getting-started) - [View Quick-Start Notebooks](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html) - Read Blog: [Speeding Up Deep Learning Inference Using NVIDIA TensorRT](/blog/speeding-up-deep-learning-inference-using-tensorrt-updated/) - Read Blog: [Optimizing and Serving Models With TensorRT and Triton](/blog/optimizing-and-serving-models-with-nvidia-tensorrt-and-nvidia-triton/) - Watch Video: [Getting Started With NVIDIA TensorRT](https://www.youtube.com/watch?v=SlUouzxBldU) ### Beginner Guide to TensorRT Model Optimizer - [Reference Architecture](https://docs.omniverse.nvidia.com/simready/latest/sim-needs/synth-data-gen.html) - [Workflow Guide & Documentation](https://docs.omniverse.nvidia.com/extensions/latest/ext_product-configurator.html) - [Training Courses](https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-OV-14+V1) - [NVIDIA Omniverse Blueprint for Precise Visual Generative 
AI](https://build.nvidia.com/nvidia/conditioning-for-precise-visual-generative-ai) ### Beginner Guide to Torch-TensorRT - Watch Video: [Getting Started With NVIDIA Torch-TensorRT](https://www.youtube.com/watch?v=TU5BMU6iYZ0) - Read Blog: [Accelerate Inference up to 6X in PyTorch](/blog/accelerating-inference-up-to-6x-faster-in-pytorch-with-torch-tensorrt/) - Download Notebook: [Object Detection With SSD](https://github.com/NVIDIA/Torch-TensorRT/blob/master/notebooks/ssd-object-detection-demo.ipynb) (Jupyter Notebook) ### Beginner Guide to TensorRT Pythonic Frontend: Tripy - [Introduction Guide](https://nvidia.github.io/TensorRT-Incubator/pre0_user_guides/00-introduction-to-tripy.html) - [ResNet-50 notebook](https://github.com/NVIDIA/TensorRT-Incubator/blob/main/tripy/notebooks/resnet50.ipynb) - [nanoGPT](https://github.com/NVIDIA/TensorRT-Incubator/tree/main/tripy/examples/nanogpt) - [Segment Anything Model V2](https://github.com/NVIDIA/TensorRT-Incubator/tree/main/tripy/examples/segment-anything-model-v2) ### Beginner Guide to TensorRT for RTX - [View Quick-Start Guide](https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/installing-tensorrt-rtx/installing.html) - [Access Samples and Demos](https://github.com/NVIDIA/TensorRT-RTX/tree/main) - Read Blog: [Run High-Performance AI Applications with NVIDIA TensorRT for RTX](https://developer.nvidia.com/blog/run-high-performance-ai-applications-with-nvidia-tensorrt-for-rtx/) - [Access TensorRT for RTX through WindowsML](https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/get-started?tabs=csharp) * * * ## TensorRT Learning Library OSS (GitHub) Quantization Quickstart **NVIDIA TensorRT-LLM** The [PyTorch backend](https://nvidia.github.io/TensorRT-LLM/torch.html#quantization) supports FP8 and NVFP4 quantization. Explore [GitHub](https://nvidia.github.io/TensorRT-LLM/torch.html#quantization) to deploy quantized models from the Hugging Face model hub, generated by TensorRT Model Optimizer.
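The quantization described above compresses weights by storing them at low precision together with a scale factor. As a rough illustration of the idea, here is a per-tensor int8-style sketch in plain Python (a toy for intuition only, not the actual FP8/NVFP4 formats or any Model Optimizer API):

```python
# Toy per-tensor symmetric quantization. Real post-training quantization
# (e.g., FP8/NVFP4 in TensorRT Model Optimizer) uses calibration data and
# per-block scales; this sketch shows only the core map/scale/round idea.

def quantize(values, num_bits=8):
    """Map floats to signed integers using a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g., 127 for int8
    amax = max(abs(v) for v in values)      # "calibration": absolute max
    scale = amax / qmax if amax else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [x * scale for x in q]

weights = [0.02, -1.27, 0.63, 1.27]
q, scale = quantize(weights)                # q == [2, -127, 63, 127]
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(q, scale, max_err)
```

Inference then runs on the compact integer (or low-bit float) representation, paying only the small rounding error `max_err` in exchange for a much smaller memory footprint and faster math.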
[Link to GitHub](https://nvidia.github.io/TensorRT-LLM/torch.html#quantization) [Link to PyTorch Documentation](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html) OSS (GitHub) Adding a New Model in PyTorch Backend This guide provides a step-by-step process for adding a new model in the PyTorch backend. [Link to GitHub](https://nvidia.github.io/TensorRT-LLM/torch/adding_new_model.html) OSS (GitHub) Using TensorRT Model Optimizer for Speculative Decoding ModelOpt’s Speculative Decoding module enables your model to generate multiple tokens in each generation step, which can reduce latency and speed up inference. [Link to GitHub](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/7_speculative_decoding.html) * * * ## TensorRT Ecosystem Widely Adopted Across Industries ![NVIDIA TensorRT is widely adopted by top companies across industries](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/tensorrt/Logo_farm_GTC.png) * * * ## More Resources ![NVIDIA Developer Forums](https://developer.download.nvidia.com/icons/m48-people-group.svg) ### Explore the Community ![NVIDIA Training and Certification](https://developer.download.nvidia.com/icons/m48-certification-ribbon-2.svg) ### Get Training and Certification ![NVIDIA Inception Program for Startups](https://developer.download.nvidia.com/images/isaac/m48-ai-startup-256px-blk.png) ### Read Top Stories and Blogs * * * ## Ethical AI NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their supporting model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). **Get started with TensorRT today, and use the right inference tools to develop AI for any application on any platform.** [Download Now ](https://developer.nvidia.com/tensorrt/download) --- # Source: https://developer.nvidia.com/thrust.md # Thrust Thrust is a powerful library of parallel algorithms and data structures. Thrust provides a flexible, high-level interface for GPU programming that greatly enhances developer productivity. Using Thrust, C++ developers can write just a few lines of code to perform GPU-accelerated sort, scan, transform, and reduction operations orders of magnitude faster than the latest multi-core CPUs. For example, the thrust::sort algorithm delivers 5x to 100x faster sorting performance than STL and TBB. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/Thrust_newlogo3.JPG) > Having access to GPU computing through the standard template interface greatly increases productivity for a wide range of tasks, from simple cashflow generation to complex computations with Libor market models, variable annuities or CVA adjustments. The Thrust C++ library has lowered the barrier of entry significantly by taking care of low-level functionality like memory access and allocation, allowing the financial engineer to focus on algorithm development in a GPU-enhanced environment. 
> > Peter Decrem, Director of Rates Products, Quantifi [Download Now](/cuda-downloads) [Explore what’s new in the latest release...](/cuda-toolkit/whatsnew) ## Key Features Thrust provides STL-like templated interfaces to several algorithms and data structures designed for high performance heterogeneous parallel computing: ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/Thrustpageimage1a.png) ## Examples The easiest way to learn Thrust is by looking at a few examples. The example below generates random numbers on the host and transfers them to the device where they are sorted. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/Thrustpageimage4.png) This second code sample computes the sum of 100 random numbers on the GPU. ![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/Thrustpageimage5.png) ## Performance Review the latest [CUDA performance report](http://developer.download.nvidia.com/compute/cuda/compute-docs/cuda-performance-report.pdf) to learn how much you could accelerate your code. 
![](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/images/libperf/rel60/Thrust.jpg) ## Availability In addition to the Thrust open-source project hosted on GitHub, a production-tested version of Thrust is included in the CUDA Toolkit. ## Additional Resources - [Thrust QuickStart Guide](http://docs.nvidia.com/cuda/thrust/index.html) - [An Introduction to Thrust](http://code.google.com/p/thrust/downloads/detail?name=An%20Introduction%20To%20Thrust.pdf) - [Using Thrust With Fortran](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/cuda/files/Using-Thrust-to-Sort-CUDA-FORTRAN-Arrays.pdf), [Watch the Webinar MP4](http://developer.download.nvidia.com/CUDA/training/ThrustonFortran.mp4) - [Thrust 1.0 Overview Webinar](http://developer.download.nvidia.com/CUDA/training/webinarthrust1.mp4) - [Other GPU-accelerated libraries](/gpu-accelerated-libraries) - [Prototyping with Thrust: GTC 2010](http://www.gputechconf.com/page/gtc-on-demand.html#session2104) --- # Source: https://developer.nvidia.com/video-codec-sdk.md # NVIDIA Video Codec SDK A comprehensive set of APIs, including high-performance tools, samples, and documentation, for hardware-accelerated video encode and decode on Windows and Linux. [Get Started ](/video-codec-sdk/download)[Notify Me for Future Releases ](https://developer.nvidia.com/video-codec-sdk/notify-me) ![NVIDIA Video Codec SDK](https://developer.download.nvidia.com/images/nvidia-video-codec-sdk.jpg) Video apps and streaming enabled by NVENC and NVDEC on consumer laptops, desktops, and data centers ensure optimized performance across platforms. ## Hardware-Based Decoder and Encoder NVIDIA GPUs contain one or more hardware-based decoders and encoders (separate from the CUDA cores), which provide fully accelerated hardware-based video decoding and encoding for several popular codecs. This provides two benefits: - With decoding/encoding offloaded, the compute engine and the CPU are free for other operations.
- Any processing pipeline that requires video and GPU compute in sequence runs far more efficiently, because the output of the video decoder can be provided directly to GPU compute, and the output of GPU compute can be provided directly to the video encoder, without any extra memory/PCIe transfers. GPU hardware accelerator engines for video encoding (referred to as NVENC) and video decoding (referred to as NVDEC) support faster-than-real-time video processing, which makes them suitable for applications including video transcoding, video data compression/decompression for deep learning, game broadcasting, virtual desktops, cloud gaming, and secure video playback. Video Codec SDK exposes APIs that let you harness the full encoding/decoding capabilities of these engines. NVIDIA’s newest GPU architecture, Blackwell, further enhances the performance and quality of NVENC and NVDEC. The ultra-high-quality (UHQ) mode for HEVC introduced in SDK v12.2 extends to AV1 in SDK v13.0 (January 2025). In addition, the Blackwell architecture supports 422 H.264, 422 HEVC, 422i/420i H.264, and multi-view HEVC, as well as improved H.264 throughput per NVDEC. ### Hardware-Accelerated Video Encoding - NVENC NVIDIA GPUs contain an on-chip hardware-accelerated video encoder (NVENC), which provides video encoding for the H.264, HEVC (H.265), and AV1 codecs. The software enhancements in SDK v13.0 extend ultra-high-quality (UHQ) mode to AV1 encoding. This makes AV1 encoding on Blackwell NVENC comparable in quality to software AV1 encoding, with ~3X the throughput. The UHQ HEVC mode was introduced in SDK v12.2 in the Ada generation. Blackwell introduces 422 progressive and interlaced encode and decode support in hardware, enabling professional use cases in media and entertainment, video editing, and broadcast. Multiple NVENCs working together can achieve performance as high as 8K video at 60+ FPS.
Video Codec SDK 13.0 also introduces MV-HEVC for hardware-accelerated stereo encoding to address use cases in broadcast, automotive, and AR/VR headsets. Using the rich APIs in Video Codec SDK, NVENC can serve a wide range of use cases, from latency-critical cloud gaming to quality-critical OTT streaming and studio broadcasting. NVIDIA’s Python video bindings, such as PyNvVideoCodec, enable deep learning applications to harness the power of NVENC for video data curation and archiving. ![](https://developer.download.nvidia.com/images/2160p30-HQ.png?vv) [![](https://developer.download.nvidia.com/images/1080p30-LL.png?v)](https://developer.download.nvidia.com/images/1080p30-low-latency-encoding.jpg) Note: These graphs showcase performance on NVIDIA data center GPUs (A10, L4, and L40S). Bitrate savings are BD-BR based on PSNR, averaged across a large variety of content (several hundred video clips), using FFmpeg. Only data center GPUs are presented on the benchmark graphs for clarity, but an equivalent workstation GPU with the same architecture performs similarly. To learn more about the hardware details, process, and software configuration used for generating the above data, please refer to [this detailed documentation](https://developer.download.nvidia.com/designworks/video-codec-sdk/Video-Benchmark-Ada-July-2023.pdf).

Each cell in the table below lists the maximum bit depth and maximum resolution supported for encoding.

| **GPU** | **H.264 (AVCHD) YUV 4:2:0** | **H.264 (AVCHD) YUV 4:4:4** | **H.264 (AVCHD) LOSSLESS** | **H.265 (HEVC) YUV 4:2:0** | **H.265 (HEVC) YUV 4:4:4** | **H.265 (HEVC) LOSSLESS** | **AV1** |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Maxwell (1st Gen)\* | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | N/A | N/A | N/A | N/A |
| Maxwell (2nd Gen) | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | N/A | N/A | N/A |
| Maxwell (GM206) | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | N/A |
| Pascal | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 10-bit, 8192 x 8192\*\* | 10-bit, 8192 x 8192\*\* | 10-bit, 8192 x 8192\*\* | N/A |
| Volta | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | N/A |
| Turing | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | N/A |
| Ampere (A100) | No | No | No | No | No | No | No |
| Ampere (non A100) | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | N/A |
| Ada | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 4096 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 | 10-bit, 8192 x 8192 |

_\* Except GM108 and GP108 (not supported)_ _\*\* Except GP100 (limited to 4K resolution)_ ### Hardware-Accelerated Video Decoding - NVDEC NVIDIA GPUs contain an on-chip hardware-accelerated video decoder (NVDEC), which provides video decoding for several popular codecs. The APIs in Video Codec SDK enable software developers to harness the power of NVDEC for many use cases, ranging from traditional ones such as secure video playback to accelerating video data ingestion and decoding for DNN training and inference.
NVDEC supports hardware-accelerated decoding of the following video codecs on Windows and Linux platforms: MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1 (see the table below for codec support in each GPU generation). With Blackwell GPUs, NVDEC doubles decoding throughput for H.264 and adds 422 decoding support for the H.264 and HEVC codecs. [![](https://developer.download.nvidia.com/images/1080p30-Decode-Streams.png?v)](https://developer.download.nvidia.com/images/1080p30-decode-streams.jpg) Each cell in the table below lists the maximum bit depth and maximum resolution supported for decoding.

| **GPU** | **H.265 (HEVC) 4:4:4** | **H.265 (HEVC) 4:2:0** | **H.264 (AVCHD) 4:2:0** | **VP9** | **VP8** | **MPEG-2** | **VC-1** | **AV1** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kepler | N/A | N/A | 8-bit, 4096 x 4096 | N/A | N/A | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Maxwell (1st Gen)\* | N/A | N/A | 8-bit, 4096 x 4096 | N/A | N/A | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Maxwell (2nd Gen) | N/A | N/A | 8-bit, 4096 x 4096 | N/A | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Maxwell (GM206) | N/A | 10-bit, 4096 x 2304 | 8-bit, 4096 x 4096 | 8-bit, 4096 x 2304 | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Pascal | N/A | 12-bit, 8192 x 8192\*\* | 8-bit, 4096 x 4096 | 12-bit\*\*\*\*, 8192 x 8192\*\* | 8-bit, 4096 x 4096\*\*\* | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Volta | N/A | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Turing | 12-bit, 8192 x 8192 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Ampere (A100) | 12-bit, 8192 x 8192 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | N/A |
| Ampere (non A100) | 12-bit, 8192 x 8192 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | 10-bit, 8192 x 8192 |
| Ada | 12-bit, 8192 x 8192 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 12-bit, 8192 x 8192 | 8-bit, 4096 x 4096 | 8-bit, 4080 x 4080 | 8-bit, 2048 x 1024 | 10-bit, 8192 x 8192 |

_\* Except GM108 (not supported)_ _\*\* Max resolution support is limited to selected Pascal chips_ _\*\*\* VP8 decode support is limited to selected Pascal chips_ _\*\*\*\* VP9 10/12 bit decode support is limited to select Pascal chips_ * * * ## Video Codec APIs NVIDIA has provided hardware-accelerated video processing on GPUs for over a decade through the NVIDIA Video Codec SDK. Video Codec SDK is a comprehensive set of APIs, high-performance tools, sample applications, reusable code, and documentation for hardware-accelerated video encoding and decoding on Windows and Linux. The NVENCODE and NVDECODE APIs in the NVIDIA Video Codec SDK are C-style APIs for high-performance encoding and decoding using NVENC and NVDEC, respectively. They expose most of the hardware functionality, along with other commonly used codec features. Video Codec SDK also provides reusable code in the form of C++ classes built on top of the NVENCODE/NVDECODE APIs, which applications can easily integrate. The NVENCODE/NVDECODE APIs are comprehensive, exposing a large number of codec capabilities, including advanced features.
DirectX and Vulkan Video, on the other hand, provide low-level, hardware-agnostic APIs, and attempt to provide more precise control over resource/memory allocation, task scheduling, and work submission to video hardware engines. Whether you prefer DirectX or Vulkan, you can combine flexible GPU-accelerated video encoding and decoding with other GPU acceleration, like 3D and AI, using the language of your choice. The low-level Vulkan Video extensions are also attractive to developers of popular open-source streaming media frameworks such as [GStreamer](https://gstreamer.freedesktop.org/features/index.html) and [FFmpeg](https://lynne.ee/vulkan-video-decoding.html), both of which are being actively ported to Vulkan Video. The cross-platform availability of Vulkan will enable accelerated GPU processing for these frameworks across multiple platforms without needing to port to multiple proprietary video APIs. Please refer to the [Vulkan Video getting started page](https://developer.nvidia.com/vulkan/video/get-started) for more details. [PyNvVideoCodec](https://docs.nvidia.com/video-technologies/pynvvideocodec/index.html) is another set of APIs introduced in Q4 2023, which provides simple APIs for harnessing video encoding and decoding capabilities when working with videos in Python. PyNvVideoCodec is a library that provides python bindings over C++ APIs for hardware accelerated video encoding and decoding. Major advantages of PyNvVideoCodec are: simple installation process, easy APIs with advanced features, and direct interoperability with many deep learning frameworks such as PyTorch. Video Codec SDK, DirectX Video, Vulkan Video, and PyNvVideoCodec provide comprehensive support for GPU-accelerated video workflows. NVIDIA will continue to support these APIs, providing developers multiple options to choose from, and use the ones that best suit their needs. 
| | Vulkan Video | DirectX Video | NVIDIA Video Codec SDK | PyNvVideoCodec |
| --- | --- | --- | --- | --- |
| **Platform** | Windows and Linux | Windows | Windows and Linux | Windows and Linux |
| **Benefits** | Low-level control; native Vulkan integration; easy for Vulkan developers; multi-vendor | Low-level control; native DirectX and Windows integration; easy for DirectX developers; multi-vendor | Low- and high-level control; native integration in custom pipelines; useful for users with less knowledge of Vulkan and DirectX; easy for C and C++ developers; NVIDIA proprietary API; comprehensive feature set | Python bindings over C++ Video Codec SDK wrapper classes; easy for Python developers; NVIDIA proprietary API |
| **Native API interface** | Vulkan Graphics | D3D11 (decode only) and D3D12 | D3D9, D3D10, D3D11, D3D12 (encode only); CUDA (encode and decode) | CUDA (encode and decode), PyTorch, TensorRT |

* * * ## Partners and Examples ![NVIDIA Partner logo - Beamr](https://developer.download.nvidia.com/images/Beamr_Logo_NVIDIA.png) ![NVIDIA Partner logo - BlackMagicDesign](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/gpudirect/logo-action-blackmagic.png) ![NVIDIA Partner logo - Comprimato](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/comprimato-showcase-logo.png) ![NVIDIA Partner logo - DeltaCast](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/gpudirect/logo-action-deltacast.png) ![NVIDIA Partner logo - Fastvideo](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/fastvideo-showcase-logo.png) ![NVIDIA Partner logo - Flussonic](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/flussonic-showcase-logo.png) ![NVIDIA Partner logo - GCore](https://developer.download.nvidia.com/images/logo-gcore.png) ![NVIDIA Partner logo - Main
Concept](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/mainconcept-logo.jpg) ![NVIDIA Partner logo - Medialooks](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/medialooks-showcase-logo.png) ![NVIDIA Partner logo - Multicamera Systems](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/multicam_logo.png) ![NVIDIA Partner logo - NORPIX](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/norpix-showcase-logo.png) ![NVIDIA Example logo - GeForce Now](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/gfn-showcase-logo.png) ![NVIDIA Partner logo - Open Broadcaster Software](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/obs-showcase-logo.png) ![NVIDIA Partner logo - Premeier](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/premiere_thumb.png) ![NVIDIA Partner logo - Splitmedia Labs](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/logo-action-splitmedia.png) ![NVIDIA Partner logo - Streamline](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/streamline-showcase-logo.png) ![NVIDIA Partner logo - Telestream](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/telestream-showcase-logo.png) ![NVIDIA Partner logo - Visionular](https://developer.download.nvidia.com/images/logos/logo-visionular-logo-horizontal.png) ![NVIDIA Partner logo - Wowza](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/designworks/images-videocodec/Wowza-showcase-logo.png) * * * ## Latest Video Codec SDK News [See all Video Codec SDK blogs](https://developer.nvidia.com/blog/recent-posts/?products=Video+Codec+SDK) * * * ## Resources - [GPU Support 
Matrix](/video-encode-and-decode-gpu-support-matrix-new) * * * ## Get started developing with Video Codec SDK. [Get Started ](/video-codec-sdk/download) Quick Links - [Get Started](/video-codec-sdk/download) - [Support Matrix](/video-encode-decode-support-matrix) * * * --- # Source: https://developer.nvidia.com/vrworks.md 1. [Home](/) 2. [NVIDIA VRWorks Graphics](https://developer.nvidia.com/vrworks) # NVIDIA VRWorks Graphics VRWorks™ is a comprehensive suite of APIs, libraries, and engines that enable application and headset developers to create amazing virtual reality experiences. VRWorks enables a new level of presence by bringing physically realistic visuals, sound, touch interactions, and simulated environments to virtual reality. [Get Started](#started) ## See VRWorks Results Across Industries ![](https://developer.download.nvidia.com/vrworks/images/vrworks-results-proviz-zerolight-1920x1080.jpg) ### ZeroLight ZeroLight Virtual Showroom leverages VRS to enable foveated rendering on HMDs that support eye tracking. ### ESI Group IC.IDO ESI Group uses VR SLI to improve performance in their IC.IDO collaborative VR environment. ### Autodesk VRED Autodesk VRED integrates VRS, VR SLI, and DLSS to deliver content-adaptive shading and foveated rendering. * * * ## Benefits VRWorks enhances VR performance by increasing application rendering efficiency and image quality through variable-rate shading and foveated rendering. It offers easy integration and extensive configurability, thanks to versatile tools and support for multiple graphics APIs, enabling developers to deliver superior VR experiences. Additionally, optimized HMD support exposes headset manufacturers to the latest NVIDIA hardware technologies via a plug-and-play collection of APIs, resulting in more performant, immersive, and responsive virtual reality environments.
### Variable Rate Shading Increase rendering performance and quality by applying a varying amount of processing power to different areas of the image. Variable Rate Shading (VRS) is an easy to implement rendering technique enabled by Turing GPUs. With VRS, single-pixel shading operations can be applied to a block of pixels, allowing applications to effectively vary the shading rate in different areas of the screen. VRS can be used to render more efficiently in VR by rendering to a buffer that closely approximates the lens corrected image that is output to the headset display. VRS can also be coupled with eye-tracking to maximize quality in the foveated region. [Learn More](https://developer.nvidia.com/vrworks/graphics/variablerateshading) ### VRSS Foveated rendering improves VR image quality by sampling select regions of the HMD screen at a higher shading rate. The latest version of VRSS (Variable Rate Supersampling) supports gaze tracking by integrating eye-tracking technology to dynamically change foveated regions based on where the user looks. VRSS is a zero-coding solution for application developers–all the work is done through NVIDIA drivers and the end user simply turns on VRSS in the NVIDIA Control Panel. HMD manufacturers can integrate their eye tracking run-time to take advantage of these features. [Learn More](https://developer.nvidia.com/blog/delivering-dynamic-foveated-rendering-with-nvidia-vrss-2/) ### Multi-View Rendering By rendering four projection centers, Multi-View Rendering (MVR) can power canted HMDs (non-coplanar displays) enabling extremely wide fields of view and novel display configurations. MVR is a feature in Turing GPUs that expands upon Single Pass Stereo, increasing the number of projection views for a single rendering pass from two to four. All four of the views available in a single pass are now position-independent and can shift along any axis in the projective space. 
[Learn More](https://developer.nvidia.com/vrworks/graphics/multiview) ### VR SLI VR SLI provides increased performance for virtual reality apps where multiple GPUs can be assigned a specific eye to dramatically accelerate stereo rendering. With the GPU affinity API, VR SLI allows scaling for systems with more than 2 GPUs. [Learn More](https://developer.nvidia.com/vrworks/graphics/vrsli) ### Lens Matched Shading Lens Matched Shading uses the new Simultaneous Multi-Projection architecture of NVIDIA Pascal-based GPUs to provide substantial performance improvements in pixel shading. [Learn More](https://developer.nvidia.com/vrworks/graphics/lensmatchedshading) ### Direct Mode Enable better plug-and-play support and compatibility for VR headsets. With Direct Mode, the NVIDIA driver treats the HMD as a special display accessible only to VR applications instead of a normal Windows monitor where your desktop shows up. Direct Mode leverages Front Buffer Rendering to render directly to the front buffer to reduce latency, and Context Priority which supports fine-grained control over GPU scheduling. Context Priority enables advanced virtual reality features such as late latch and asynchronous time warp, which cut latency and quickly adjust images as HMD users move their heads, without the need to re-render a new frame. [Learn More](https://developer.nvidia.com/vrworks-hmd-developer-program) ### DSC As HMD resolutions rapidly increase, the bandwidth requirements increase as well. Display Stream Compression (DSC) can provide the needed bandwidth reduction to support compression ratios up to 3:1. DSC was developed as an industry-wide standard for video interfaces; it features extremely low latency and visually lossless compression. 
[Learn More](https://news.developer.nvidia.com/DSC-higher-fidelity-vr/) * * * ## Get Started with NVIDIA VRWorks ### For Application Developers VRWorks Graphics SDK provides a set of versatile tools that enable easy integration, helping application developers deliver the best VR performance and image quality with the most configurability and lowest latency. VRWorks Graphics SDK continues to be widely adopted by leading ISV developers across the enterprise, creative, and gaming markets. [Download VRWorks Graphics SDK](https://developer.nvidia.com/downloads/vrworks/secure/3.6.2/public_3.6_06112024.zip) ### For HMD Manufacturers VRWorks Graphics SDK provides custom tools for head-mounted display (HMD) manufacturers to optimize performance and latency to deliver the best image quality with high-resolution displays. The VRWorks SDK for headset developers is available through the VRWorks HMD Manufacturer Program. [Register for VRWorks HMD Manufacturer Program](https://developer.nvidia.com/nvidia-vrworks-hmd-developer-program) * * * ## Latest VRWorks News * * * ## Resources - [Consumer VR Solutions](https://www.nvidia.com/en-us/geforce/technologies/vr/) - [Professional VR Solutions](http://www.nvidia.com/object/quadro-vr-ready.html) - [VIVE Developer Site](https://developer.vive.com/) - [SteamVR/OpenVR from Valve](https://store.steampowered.com/search/?category1=993) - [Oculus Developer Portal](https://developers.facebook.com/) Ready to get started developing with VRWorks? [Get Started](#started) * * * --- # Source: https://developer.nvidia.com/warp-python.md 1. [Home](https://developer.nvidia.com/) 2. [Developer Tools Catalog](/developer-tools-catalog) NVIDIA Warp # NVIDIA Warp ## Differentiable Spatial Computing for Python NVIDIA Warp is an open-source developer framework for building and accelerating data generation and spatial computing in Python.
Warp gives coders an easy way to write GPU-accelerated, kernel-based programs for simulation, AI, robotics, and machine learning (ML). With Warp, Python developers can create GPU-accelerated, 3D simulation workflows that drive ML pipelines in PyTorch, JAX, PhysicsNeMo, and [NVIDIA Omniverse™](https://www.nvidia.com/en-us/omniverse/). Benefits include simulation performance equivalent to native CUDA® code, with the convenience and developer productivity of Python.

[Download Now ](https://github.com/NVIDIA/warp)

## Key Features

### Kernel-Based Code

NVIDIA Warp performs just-in-time (JIT) runtime compilation of Python functions to x86 and CUDA® kernel-level code. Kernel-based programming provides a low-level abstraction that maps closely to GPU hardware and, in contrast to tensor-based programming, provides implicit kernel fusion (controlled by the user), fine-grained control over threads, native support for conditional logic, and sparse scatter and gather.

[Learn more](https://nvidia.github.io/warp/basics.html#kernels)

### Differentiable Programming

In addition to generating forward-mode kernel code, Warp can generate reverse-mode (adjoint) kernels that propagate the gradients of simulation results back into frameworks such as PyTorch and JAX for network training, design optimization, and parameter estimation.

[Learn more](https://nvidia.github.io/warp/modules/differentiability.html#differentiability)

### Built for Spatial Computing

Warp includes built-in functionality to enable 3D simulation and geometry processing. In addition to a rich spatial math library, Warp provides higher-level data structures, such as meshes, hash grids, and sparse volumes (NanoVDB), for GPU-accelerated geometric queries.

[Learn more](https://nvidia.github.io/warp/modules/runtime.html#meshes)

* * *

## See NVIDIA Warp in Action

Many Python developers are using Warp today. In Omniverse, groups are using Warp for digital humans, physics simulation, and procedural animation.
Warp includes several higher-level data structures that make implementing simulation and geometry processing algorithms easier.

### Meshes

Triangle meshes are ubiquitous in simulation and computer graphics. Warp provides a built-in type for managing mesh data that supports geometric queries, such as closest-point, ray-cast, and overlap checks.

### Sparse Volumes

Sparse volumes are incredibly useful for representing grid data over large domains, such as signed distance fields (SDFs) for complex objects or velocities for large-scale fluid flow. Warp includes support for sparse volumes defined using the NanoVDB standard.

### Hash Grids

Many particle-based simulation methods, such as the discrete element method (DEM) or smoothed particle hydrodynamics (SPH), involve iterating over spatial neighbors to compute force interactions. Hash grids are a well-established data structure to accelerate these nearest-neighbor queries and are particularly well suited to the GPU.

![A quadrotor drone being simulated and controlled using NVIDIA Warp.](https://developer.download.nvidia.com/images/warp-drone.jpg)

### Robotics

In this example, Warp is used to simulate the dynamics of a quadrotor drone using the AirSim aerodynamics model. Through Warp's differentiable simulation capabilities, users can write model-predictive controllers (MPC) to optimize the drone's trajectory with loss functions that take into account obstacles and target goals.

[Learn More](https://github.com/NVIDIA/warp?tab=readme-ov-file#warpexamplessim)

![An aerodynamic flow simulation around a vehicle using NVIDIA Warp.](https://developer.download.nvidia.com/images/warp-f1-car-aero.jpg)

### Simulation

Here, you can see Warp being used to create an incompressible flow solver for aerodynamics simulation around a vehicle. Warp provides built-in support for multidimensional arrays and sparse collision fields via NanoVDB to accurately represent the vehicle's collision geometry.
[Learn More](https://github.com/NVIDIA/warp?tab=readme-ov-file#warpexamplescore)

![An example of differentiable simulation for an initial value problem using NVIDIA Warp.](https://developer.download.nvidia.com/images/warp-training-optimization.jpg)

### Training and Optimization

We can also use Warp to solve initial value problems. Thanks to Warp's differentiable kernels, we can simulate the trajectory of a ball and obtain gradients with respect to its initial velocity. These gradients can then be used in PyTorch or JAX to optimize the trajectory so the ball hits a target after a few seconds.

[Learn More](https://github.com/NVIDIA/warp?tab=readme-ov-file#warpexamplesoptim)

![A synthetic image generation pipeline written in NVIDIA Warp.](https://developer.download.nvidia.com/images/warp-sdg-cropped.jpg)

### Data Generation

Warp enables accelerated data generation and processing in Python. In this example, Warp kernels are used to generate procedural image data as part of an Omniverse [synthetic data generation](https://www.nvidia.com/en-us/use-cases/synthetic-data/) (SDG) pipeline.

[Learn More](http://docs.omniverse.nvidia.com/extensions/warp.html)

* * *

_Autodesk Research Leverages NVIDIA Warp to Accelerate Computational Fluid Dynamics on NVIDIA GPUs_

#### Accelerate CAE Tool Development

Warp is enabling computer-aided engineering (CAE) industry developers to accelerate physics-based CAE simulations and embrace real-time interactive design using AI-enabled digital twins. Warp gives coders an easy way to write kernel-based programs for CAE and machine learning, and supports PyTorch, JAX, PhysicsNeMo, and NVIDIA Omniverse.
[Learn More](https://developer.nvidia.com/topics/cae)

* * *

#### Newton, the Next-Generation Open-Source Physics Simulation Engine

Newton is an open-source, GPU-accelerated, and extensible physics engine, co-developed by Google DeepMind and Disney Research and [managed by the Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-announces-contribution-of-newton-by-disney-research-google-deepmind-and-nvidia-to-accelerate-open-robot-learning). Built on NVIDIA Warp and OpenUSD, Newton is optimized for robotics and compatible with learning frameworks such as MuJoCo Playground and NVIDIA Isaac Lab. The [Newton beta](https://github.com/newton-physics) is now available.

[Get Started on Newton](/newton-physics)

![](https://developer.download.nvidia.com/images/isaac/newton-ari.jpg)

* * *

## NVIDIA Warp On Demand Playlist

* * *

## Resources

- [Read technical blogs](https://developer.nvidia.com/blog/creating-differentiable-graphics-and-physics-simulation-in-python-with-nvidia-warp)
- [View Warp documentation](https://nvidia.github.io/warp)
- [See the GTC Talk on Warp](https://www.nvidia.com/en-us/on-demand/session/gtc24-s63345/)
- [Watch Omniverse Basics: NVIDIA Warp Overview](https://www.nvidia.com/en-us/on-demand/session/omniverse2020-om1453/)
- [Visit the developer forums](https://github.com/NVIDIA/warp/issues)
- [Explore NVIDIA Deep Learning Institute (DLI) training](https://www.nvidia.com/en-us/training/)

Visit GitHub to download the latest version of NVIDIA Warp software.

[Download Now](https://github.com/NVIDIA/warp)