# NeMo Framework

> Documentation for NeMo Framework

---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/best-practices.html

# Best Practices for NeMo Developers

## Import Guarding

Sometimes, developers have an optional package they would like to use only when it is available. In this case, the developer may want to follow different code paths depending on whether the optional package is present. Other times, a developer may want to require a package for their collection without making that package required for all collections. In either of these cases, it is important to guard the optional imports.

In [import_utils.py](https://github.com/NVIDIA/NeMo/blob/main/nemo/utils/import_utils.py), NeMo provides the utilities required to handle the import of optional packages effectively. This module is adapted from cuML's [safe_imports module](https://github.com/rapidsai/cuml/blob/e93166ea0dddfa8ef2f68c6335012af4420bc8ac/python/cuml/internals/safe_imports.py). The two functions developers should be aware of are:

1. [safe_import](https://github.com/NVIDIA/NeMo/blob/a9746a654d37d3451bcc33ad58cf8378efe787b7/nemo/utils/import_utils.py#L243): a function used to import optional modules. Developers can provide an optional error message to be displayed if the module is used after a failed import. Alternatively, they can provide an alternate module to be used if the import of the optional module fails. `safe_import` returns a tuple containing:

   1. the successfully imported optional module or, if the import fails, the given alternate module or a placeholder `UnavailableMeta` class instance, and
   2. a boolean indicating whether the import of the optional module was successful.

   The returned boolean can be used throughout the script to ensure you only use the optional module when it is present. For example, in the LLM collection, we use `safe_import` to determine whether Transformer Engine (TE) is installed. When [creating the default GPT layer spec](https://github.com/NVIDIA/NeMo/blob/a98c5ed2c3027d90cd16b505fecfb54097d0b743/nemo/collections/llm/gpt/model/base.py#L115-L119), we use the value of `HAVE_TE` to determine whether the default layer spec uses Transformer Engine:

   ```python
   _, HAVE_TE = safe_import("transformer_engine")

   ...

   def default_layer_spec(config: "GPTConfig") -> ModuleSpec:
       if HAVE_TE:
           return transformer_engine_layer_spec(config)
       else:
           return local_layer_spec(config)
   ```

2. [safe_import_from](https://github.com/NVIDIA/NeMo/blob/a9746a654d37d3451bcc33ad58cf8378efe787b7/nemo/utils/import_utils.py#L283): a function used to import symbols from modules that may not be available. As with `safe_import`, developers can provide a message to display whenever the symbol is used after a failed import, or they can provide an object to be used in place of the symbol if the import fails. `safe_import_from` returns the same kind of tuple, containing:

   1. the successfully imported optional symbol or, if the import fails, the given alternate object or a placeholder `UnavailableMeta` class instance, and
   2. a boolean indicating whether the import of the desired symbol was successful.

`safe_import` and `safe_import_from` are used throughout the NeMo codebase. [megatron_gpt_model.py](https://github.com/NVIDIA/NeMo/blob/e35a6592f53ee34b1ec2fc3f1e009dd1ebc79e65/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py#L131-L136) is one example:

```python
transformer_engine, HAVE_TE = safe_import("transformer_engine")
te_module, HAVE_TE_MODULE = safe_import_from("transformer_engine.pytorch", "module")
get_gpt_layer_with_te_and_hyena_spec, HAVE_HYENA_SPEC = safe_import_from(
    "nemo.collections.nlp.modules.common.hyena.hyena_spec", "get_gpt_layer_with_te_and_hyena_spec"
)

HAVE_TE = HAVE_TE and HAVE_TE_MODULE and HAVE_HYENA_SPEC
```

Transformer Engine is required for FP8 and CUDA graphs. The value of `HAVE_TE` is used throughout `megatron_gpt_model.py` to determine whether these features can be enabled and to gracefully handle the case where a user requests them but Transformer Engine is not present. For example, when a user enables CUDA graphs, we use the value of `HAVE_TE` to ensure that Transformer Engine is present. If `HAVE_TE` is False, [a useful message is printed](https://github.com/NVIDIA/NeMo/blob/e35a6592f53ee34b1ec2fc3f1e009dd1ebc79e65/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py#L2143).

One consequence of import guarding is that if a developer expects a particular module to be present but the import fails, execution silently continues down a different code path than the developer expects. During development, it can therefore be useful to run in `debug` mode. This causes the logger to [report any failed imports](https://github.com/NVIDIA/NeMo/blob/a9746a654d37d3451bcc33ad58cf8378efe787b7/nemo/utils/import_utils.py#L271) along with the corresponding traceback, which helps the developer catch any unexpected failed imports and understand why the expected modules are missing. Debug mode can be enabled with the following code:

```python
from nemo.utils import logging

logging.set_verbosity(logging.DEBUG)
```
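Putting these pieces together, the sketch below shows the typical guarding pattern for a new module: guard the import once at the top of the file and branch on the returned boolean at the point of use. It is only an illustration; `apex` and `FusedLayerNorm` stand in for any optional dependency, and the `msg` keyword name is an assumption (check `import_utils.py` for the exact signatures).

```python
# Illustrative sketch of the guarding pattern; `apex` and `FusedLayerNorm` are
# stand-ins for any optional dependency, and the `msg` keyword name is an
# assumption -- see nemo/utils/import_utils.py for the exact signatures.
import torch

from nemo.utils.import_utils import safe_import, safe_import_from

# Guard an entire optional module.
apex, HAVE_APEX = safe_import("apex", msg="apex is required for the fused-optimizer path")

# Guard a single symbol from an optional module.
FusedLayerNorm, HAVE_FUSED_LN = safe_import_from("apex.normalization", "FusedLayerNorm")


def build_layer_norm(hidden_size: int) -> torch.nn.Module:
    # Only take the optional code path when the import actually succeeded.
    if HAVE_FUSED_LN:
        return FusedLayerNorm(hidden_size)
    return torch.nn.LayerNorm(hidden_size)
```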
## Working with Hugging Face Models

Some of the NeMo examples require access to gated Hugging Face models. If you try to run such a model and get an error that looks like this:

```
OSError: You are trying to access a gated repo. Make sure to have access to it at
```

you likely need to set up your `HF_TOKEN` environment variable. First, request access to the gated model by following the URL provided in the error message. After access has been granted, make sure you have a Hugging Face access token (if you do not, follow [this tutorial](https://huggingface.co/docs/hub/en/security-tokens#how-to-manage-user-access-tokens) to generate one). Finally, set the `HF_TOKEN` variable in your environment:

```bash
export HF_TOKEN=
```

## Working with scripts in NeMo 2.0

When working with any scripts in NeMo 2.0, make sure you wrap your code in an `if __name__ == "__main__":` block. Otherwise, your code may hang unexpectedly.

The reason is that NeMo 2.0 uses Python's `multiprocessing` module in the backend when running a multi-GPU job. The `multiprocessing` module creates new Python processes that import the current module (your script). If you do not add the `if __name__ == "__main__":` guard, each newly spawned process re-imports the module and in turn spawns new processes of its own, resulting in an infinite loop of process spawning. A guarded script looks like the sketch below.
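This is a minimal sketch of the pattern; `configure_and_run_training()` is a placeholder for whatever your script actually does (building a recipe, creating a trainer, launching training).

```python
# train.py -- minimal sketch of the required entry-point guard.
# `configure_and_run_training()` is a hypothetical placeholder for your own code.


def configure_and_run_training() -> None:
    # Build your model, data module, and trainer here, then launch training.
    ...


if __name__ == "__main__":
    # Without this guard, the worker processes spawned by `multiprocessing`
    # would re-import this module and recursively spawn more workers.
    configure_and_run_training()
```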
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/changelog.html

# Changelog

This section identifies the major changes in each version of the NVIDIA NeMo™ Framework released to date.

## 25.07 NeMo Framework Container

### Existing Repository: [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo)

#### Training Performance (Speed)

* nvFSDP + Activation offloading tuning for GB200
* nvFSDP with hybrid model sharding with partial replication
* nvFSDP with persistent buffers
* NVLink SHARP + IB SHARP for DP/FSDP communications on H100 and B200
* MXFP8 with TP communication overlap
* MXFP8 with reduced memory allocation
* FP8 sub-channel recipe (128x128 for weight and 1x128 for activation)
* cuDNN fused attention for MLA (both Hopper and Blackwell)
* Advanced custom asymmetric pipelining (for MTP, loss function, and embedding)
* BF16 optimizer for model memory saving
* CUDA graph fix for fine-tuning benchmarks
* CUDA graph support for Llama 4

#### Collections

* LLM
  * Improved DeepSeek V3
  * MoE Permute Fusion supported ([PR](https://github.com/NVIDIA/NeMo/pull/13188))
  * Module-specific recompute supported ([PR](https://github.com/NVIDIA/NeMo/pull/13188))
  * Subchannel FP8 recipe ([PR](https://github.com/NVIDIA/NeMo/pull/12940))
  * Blackwell support ([PR](https://github.com/NVIDIA/NeMo/pull/13620))
  * MoE router score promoted to FP32 to facilitate convergence ([PR](https://github.com/NVIDIA/NeMo/pull/13188))
  * Qwen 3 ([PR](https://github.com/NVIDIA/NeMo/pull/13554), [Docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/qwen3.html))
  * Gemma 3 ([PR](https://github.com/NVIDIA/NeMo/pull/13536), [Docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html))
  * New embedding models supported ([PR](https://github.com/NVIDIA/NeMo/pull/13890), [Docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/embeddingmodels/gpt/llama_embedding.html))
    * nv-embedqa-e5-v5
    * llama-3.2-nv-embedqa-300m-v2
    * llama-3.2-nv-embedqa-1b-v2
  * New reranker models supported ([PR](https://github.com/NVIDIA/NeMo/pull/13876), [Docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/reranker/llama_reranker.html))
    * llama-3.2-nv-rerankqa-500m-v2
    * llama-3.2-nv-rerankqa-1b-v2
* Multimodal
  * Audio Vision Language Model (AVLM) supported ([PR](https://github.com/NVIDIA/NeMo/pull/12477), [Docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html))

#### Speech

* Batched beam search for transducers (RNN-T and TDT)
* RNN-T/TDT buffered/streaming inference + batched decoding support in cache-aware models
* Added support for CTC batched beam search with GPU-LM
* Key fixes
  * Punctuation marks in timestamps
  * Fix timestamps when CUDA graphs are enabled
  * Fix masking of tokens in AED inference
  * TDT streaming inference fix

### New Repository
#### [Export & Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy)

* [NeMo Export-Deploy Release](https://github.com/NVIDIA-NeMo/Export-Deploy/releases)
* Pip installers for export and deploy
* RayServe support for multi-instance deployment
* TensorRT-LLM PyTorch backend
* MCore inference optimizations

⚠️ Note: The current [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository is being reorganized to enable a better user experience. New repos live under the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub org, where users can also find an overview of how the different repos fit into NeMo Framework. At the time of the release, the Alpha repos below in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) are under active development. New features, improvements, and documentation updates are released regularly. We are working toward a stable release, so expect the interface to solidify over time. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.

#### Megatron-Bridge (Alpha)

* Llama and Qwen
* Pretrain/SFT
* PEFT
* Recipe structure with examples for plain Python and NeMo Run usage

#### Automodel (Alpha)

* Custom FSDP Support
  * Docs (WIP); [PR](https://github.com/NVIDIA-NeMo/Automodel/pull/28), [PR](https://github.com/NVIDIA-NeMo/Automodel/pull/63)
* Packed sequence support
  * [Docs](https://docs.nvidia.com/nemo/automodel/latest/guides/llm/dataset.html#enable-packed-sequences-in-nemo-automodel); [PR](https://github.com/NVIDIA-NeMo/Automodel/pull/49)
* Triton kernels for LoRA
  * Docs (WIP); [PR](https://github.com/NVIDIA-NeMo/Automodel/pull/81)

#### Eval (Alpha)

* Enable log probs benchmarks with nvidia-lm-eval
* Support for new harnesses:
  * BFCL
  * BigCode
  * Simple-evals
  * Safety-harness
  * Garak
* Single-node multi-instance/DP evaluation with Ray

#### nvFSDP (Alpha)

* Support sharding strategies for Optimizer State, Gradient, and Model Weights (similar to ZeRO 1/2/3)
* Checkpoint support
* Integration with Automodel
* FP8 Mixed Precision with Transformer Engine
* Hopper-related optimizations (User-Buffer-Registration NCCL communication)
* FSDP2-like API usage

## 25.07 NeMo Curator Container

* [NeMo Curator GitHub Release](https://github.com/NVIDIA-NeMo/Curator/releases)

## Updates to Non-Container Repositories

These repos are not included in the NeMo Framework containers.
For details, please refer to [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo).

### RL

* [NeMo RL GitHub Release](https://github.com/NVIDIA-NeMo/RL/releases)
* NeMo RL's container build process is publicly available [here](https://docs.nvidia.com/nemo/rl/latest/docker.html)

## NeMo Framework 25.04.02

The new container is available as `nvcr.io/nvidia/nemo:25.04.02`. For convenience, the `nvcr.io/nvidia/nemo:25.04` tag has also been updated to point to this latest patch version.

This release addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information, visit [NVIDIA Security](https://www.nvidia.com/en-us/security/); for acknowledgement, please reach out to the NVIDIA PSIRT team at [PSIRT@nvidia.com](mailto:PSIRT@nvidia.com).

## NeMo Framework 25.04.01

The new container is available as `nvcr.io/nvidia/nemo:25.04.01`. For convenience, the `nvcr.io/nvidia/nemo:25.04` tag has also been updated to point to this latest patch version.

### Collections

* LLM
  * Llama 4: Fixed an accuracy issue caused by MoE probability normalization. Improved pre-train and fine-tune performance.

### Export & Deploy

* Updated vLLMExporter to use vLLM V1 to address a security vulnerability.

### AutoModel

* Improved chat-template handling.

### Fault Tolerance

* Local checkpointing: Fixed support for auto-inserted metric names when resuming from local checkpoints.
## NeMo Framework 25.04.00

### Curator

* Llama-based PII Redaction
* Trafilatura Text Extractor
* Chinese & Japanese Stopwords for Text Extractors
* Writing gzip-compressed jsonl datasets
* Training dataset curation for retriever customization using hard-negative mining
* Implemented a memory-efficient pairwise similarity in Semantic Deduplication

### Export & Deploy

* NeMo 2.0 export path for NIM
* ONNX and TensorRT Export for NIM Embedding Container
* In-framework deployment for HF Models
* TRT-LLM deployment for HF Models in NeMo Framework

### Evaluation

* Integrate [nvidia-lm-eval](https://pypi.org/project/nvidia-lm-eval/) into NeMo FW for evaluations with an OpenAI-API-compatible in-framework deployment

### AutoModel

* VLM AutoModelForImageTextToText
* FP8 for AutoModel
* Support CP with FSDP2
* Support TP with FSDP2
* Performance Optimization
* Add support for Cut Cross Entropy & Liger kernels
* Gradient Checkpointing

### Fault Tolerance

* Integrate NVRx v0.3 [Local checkpointing](https://nvidia.github.io/nvidia-resiliency-ext/checkpointing/local/index.html)

### Collections

* LLM
  * Llama 4 (see [Known Issue](https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html#llama4-known-issue))
  * Llama Nemotron Ultra
  * Llama Nemotron Super
  * Llama Nemotron Nano
  * Nemotron-h/5
  * DeepSeek V3 Pretraining
  * Evo2
  * Qwen 2.5
  * LoRA for Qwen3-32B and Qwen3-30B-A3B
* MultiModal
  * FLUX
  * Gemma 3
  * Qwen2-VL
* ASR
  * NeMo Run support for ASR training
  * N-Gram LM on GPU for AED
  * N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT)
  * Timestamps support for AED timestamp-supported models
  * Migrate SpeechLM to NeMo 2.0
  * Canary-1.1
  * Replace ClassificationModels class with LabelModels

### Performance

* Functional MXFP8 support for (G)B200
* Current-scaling recipe with TP communication overlap and FP8 param gathers
* Custom FSDP support that fully utilizes GB200 NVL72

### Deprecations

* NeMo Aligner is deprecated from 25.04 onwards. Use 25.02 or prior if you would like to use NeMo Aligner. A new library, [NeMo RL](https://github.com/NVIDIA/NeMo-RL), has been released to replace it.
* The NeMo 1.x path is deprecated from 25.04 onwards. Use 25.02 or prior if you would like to use the NeMo 1.0 path. However, we strongly encourage you to migrate to the NeMo 2.x path to take advantage of our latest features and functionality. See the [migration guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/index.html) for more information.
### Long-term support (LTS)

* 25.04 is our first LTS release. The release cadence for LTS containers will be roughly 1-2 times a year, and an LTS container will reach end of life when the next LTS container is published. LTS releases will accept critical bug fixes and security patches until end of life.

## NeMo Framework 25.02.01

### Training

* Fix MoE-based model training instability.
* Fix a bug in the Llama exporter for Llama 3.2 1B and 3B.
* Fix a bug in the LoRA linear_fc1 adapter when different TP is used during saving and loading of the adapter checkpoint.
* Upgrade Nsight Systems to 2025.1.1 to fix an issue in the previous container where kernel overlapping failed to work when profiling was used.

## NeMo Framework 25.02

### Training

* Blackwell and Grace Blackwell support
* Pipeline parallel support for distillation
* Improved NeMo Framework installation

### Aligner

* DPO with EP
* Support MCore Distributed Optimizer
* Aligner on Blackwell

### Curator

* Python 3.12 Support
* Curator on Blackwell
* Nemotron-CC Dataset Recipe
* Performant S3 for Fuzzy Deduplication

### Export & Deploy

* vLLM export for NeMo 2.0

### Evaluations

* Integrate lm-eval-harness

### Collections

* LLM
  * DAPT example and best practices in NeMo 2.0
  * [NeMo 2.0] Enable Tool Learning and add a tutorial
  * Support GPT Embedding Model (Llama 3.2 1B/3B)
  * Qwen2.5, Phi4 (via AutoModel)
  * SFT for the Llama 3.3 model (via AutoModel)
  * Support BERT Embedding Model with NeMo 2.0
  * DeepSeek SFT & PEFT Support
* MultiModal
  * CLIP
  * SP for NeVA
  * CP for NeVA
  * InternViT

### Automodel

* Preview release.
* PEFT and SFT support for LLMs available via Hugging Face's AutoModelForCausalLM.
* Support for Hugging Face-native checkpoints (full model and adapter only).
* Support for distributed training via DDP and FSDP2.
### ASR/TTS

* Lhotse: TPS-free 2D bucket estimation and filtering
* Update model outputs so that all ASR outputs are in a consistent format
* Sortformer Release Model

## NeMo Framework 24.12

### Training

* Fault Tolerance
  * Straggler Detection
  * Auto Relaunch

### LLM & MM

* MM models
  * Llava-next
  * Llama 3.2
  * Sequence Model Parallel for NeVa
  * Enable Energon
  * SigLIP (NeMo 1.0 only)
* LLM 2.0 migration
  * Starcoder2
  * Gemma 2
  * T5
  * Baichuan
  * BERT
  * Mamba
  * ChatGLM
* DoRA support

### Aligner

* NeMo 2.0 Model Support
* Sequence Packing for DPO
* Reinforce/RLOO Support
* SFT Knowledge Distillation
* Context Parallelism for SFT

### Export

* NeMo 2.0 base model export path for NIM
* PTQ in NeMo 2.0

### Curator

* Synthetic Data Generation for Text Retrieval
* LLM-based Filters
  * Easiness
  * Answerability
* Q&A Retrieval Generation Pipeline
* Parallel Dataset Curation for Machine Translation
  * Load/Write Bitext Files
  * Heuristic filtering (Histogram, Length Ratio)
  * Classifier filtering (Comet, Cometoid)

### ASR

* Timestamps with the TDT decoder
* Timestamps option with .transcribe()

## NeMo Framework 24.09

### LLM & MM

* Training
  * Long context recipe
  * PyTorch Native FSDP 1
  * Distributed Checkpoint
    * Torch native format
    * Async
    * Parallel r/w
  * NeMo Run OSS
* Models
  * Llama 3
  * Mixtral
  * Nemotron
* E2E BF16 training of Llama 3 with RedPajama2 data (~2.4T tokens)
  * NeMo 2.0 Llama 3 8B trained on 2.4T RP2 tokens
  * NeMo 2.0 Llama 3 70B trained on <1T RP2 tokens
* NeMo 1.0
  * SDXL (text-2-image)
* Model Opt
  * Depth Pruning [[docs](https://github.com/NVIDIA/NeMo/blob/main/docs/source/nlp/nemo_megatron/model_distillation/drop_layers.rst)]
  * Logit-based Knowledge Distillation [[docs](https://github.com/NVIDIA/NeMo/blob/main/docs/source/nlp/distillation.rst)]

### Aligner

* Rejection Sampling (SFT)
* Packed sequence training in Aligner for SFT

### Export

* TensorRT-LLM v0.12 integration
* LoRA support for vLLM
* FP8 checkpoint
### Curator

* Image Semantic Deduplication
* NSFW Classifier
* Aesthetic Classifier
* CLIP Embedding Creation
* AEGIS Classifier
* Quality Classifier

### ASR

* Parakeet large (ASR with PnC model)
* Added [Uzbek](https://huggingface.co/nvidia/stt_uz_fastconformer_hybrid_large_pc) offline and [Georgian](https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc) streaming models
* Optimization feature for efficient bucketing to improve batch size utilization on GPUs

## NeMo Framework 24.07

### Training

* Features and Model architectures
  * PEFT: QLoRA support, LoRA/QLoRA for Mixture-of-Experts (MoE) dense layer
  * State Space Models & Hybrid Architecture support (Mamba2 and NV-Mamba2-hybrid)
  * Support Nemotron, Minitron, Gemma2, Qwen, RAG
* Multimodal
  * NeVA: Add SOTA LLM backbone support (Mixtral/LLaMA3) and a suite of model parallelism support (PP/EP)
  * Support Language Instructed Temporal-Localization Assistant (LITA) on top of video NeVA
* Custom Tokenizer training in NeMo
* Update the Auto-Configurator for EP, CP and FSDP

### ASR

* SpeechLM and SALM
* Adapters for Canary Customization
* PyTorch allocator in PyTorch 2.2 improves training speed by up to 30% for all ASR models
* CUDA Graphs for Transducer Inference
* Replaced webdataset with Lhotse - gives up to 2x speedup
* Transcription Improvements - Speedup and QoL Changes
* ASR Prompt Formatter for multimodal Canary

### Aligner

* Speed up Aligner RLHF by 7x with TRT-LLM
* Reward Preference Optimization (RPO)
* Identity Preference Optimization (IPO)
* SteerLM2
* Llama 3 performance and convergence example
* Constitutional AI algorithm (RLAIF)

### Curator

* Semantic Deduplication
* Resiliparse for Text Extraction
* Improve Distributed Data Classification - Domain classifier is 1.55x faster through intelligent batching
* Synthetic data generation for fine-tuning

### Export & Deploy

* In-framework PyTriton deployment with backends:
  * PyTorch
  * vLLM
* TRT-LLM update to 0.10
* TRT-LLM C++ runtime

## NeMo Framework 24.05

NeMo Framework now supports Large Language Models (LLM), Multimodal (MM), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS) in a single consolidated container.
### LLM and MM

* Megatron Core RETRO
  * Pre-training
  * Zero-shot Evaluation
* Pretraining, conversion, evaluation, SFT, and PEFT for:
  * Mixtral 8X22B
  * Llama 3
  * SpaceGemma
* Embedding Models Fine Tuning
  * Mistral
  * BERT
* BERT models
  * Distributed checkpoint
* Video capabilities with NeVa

### Performance

* Distributed Checkpointing
  * Torch native backend
  * Parallel read/write
  * Async write
* Multimodal LLM (LLAVA/NeVA)
  * Pipeline Parallelism support
  * Sequence packing support

### Export

* Integration of Export & Deploy Modules into the NeMo Framework container
* Upgrade to TRT-LLM 0.9

### Curator

* SFT/PEFT (LoRA and p-tuning) Data Curation Pipeline and Example
* Dataset Blending Tool
* Domain Classifier

### Aligner

* LoRA techniques with:
  * PPO Actor
  * DPO
  * SFT/SteerLM
  * Stable Diffusion models

### Speech (ASR & TTS)

* AED Multi Task Models (Canary) - Multi-Task Multi-Lingual Speech Recognition / Speech Translation model
* Multimodal Domain - Speech LLM supporting SALM Model
* Parakeet-tdt_ctc-1.1b Model - RTFx of > 1500 (can transcribe 1500 seconds of audio in 1 second)
* Audio Codec 16kHz Small - NeMo Neural Audio Codec for discretizing speech for use in LLMs
* mel_codec_22khz_medium
* mel_codec_44khz_medium

### Perf Improvements

* Transcribe() upgrade - enables one-line transcribe with files, tensors, data loaders
* Frame-looping algorithm for faster RNNT decoding - improves Real Time Factor (RTF) by 2-3x
* CUDA Graphs + Label-Looping algorithm for RNN-T and TDT decoding - transducer greedy decoding at over 1500x RTFx, on par with CTC non-autoregressive models
* Semi Sorted Batching support - external user contribution that speeds up training by 15-30%.
### Customization

* Context biasing for CTC word stamping - improve accuracy for custom vocabulary and pronunciation

### Longform Inference

* Longform inference support for AED models
* Transcription of multi-channel audio for AED models

### Misc

* Upgraded webdataset - Speech and LLM / Multimodal unified container

## NeMo Framework 24.03.01

Issues fixed:

* GPT Memory Leak at Loss Function
* Eval script issue for Mixtral PEFT
* Llama 7B Out-of-memory issue when using 1TB system memory
* Enable Pipeline Parallelism support for LoRA merge
* Multi-node Llama training on Kubernetes while saving checkpoint

## NeMo Framework 24.03

* [Fully Sharded Data Parallel (FSDP)](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt/fsdp.html) support for GPT
* Post-Training Quantization (PTQ) with the AMMO library (0.7.4) for Llama
* Support Expert Parallelism on all MoE models, e.g. Mixtral
* Pipeline parallel for p-tuning
* Updated PEFT metrics for all popular community models ([Support matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/peft/landing_page.html))
* Upgraded PyTorch Lightning to 2.2
* Upgraded base container to PyTorch 24.02
* Consolidation of the StarCoder2 and Gemma specific containers with the previous Framework GA container
* Customizable distributed data classification tool in Curator
* GPU-accelerated quality classification model code in Curator
* GPU-accelerated domain classification model code in Curator

## NeMo Framework 24.01.01

* Added Mixture of Experts parameter passing for MCore
* PP/TP support for Mixture of Experts
* SFT / PEFT support for Gemma model
* Training / SFT / PEFT / Evaluation support for:
  * Baichuan model
  * CodeLlama model
* Fixed SFT/PEFT nemo-launcher configs for Mistral and Mixtral (edited configs with correct values)
* Documentation refactor and landing page added
* NeMo Framework developer docs added

## NeMo Framework 24.01

* New end-to-end support (pretraining, conversion, evaluation, SFT, PEFT) for community models, featuring:
  * Support for community model Falcon
  * Support for community model Mixtral (expert parallelism coming in a future release)
  * Support for community model Mistral
  * Support for community model Code Llama
* General availability release of NeMo Multimodal, featuring:
  * Support for vision-language foundation models: CLIP
  * Support for text-2-image foundation models: Stable Diffusion and Imagen
  * Support for text-2-image customization: SD-LoRA, SD-ControlNet, SD-instruct pix2pix
  * Support for multimodal LLM: NeVA and LLAVA
  * Support for text-2-NeRF: DreamFusion++
  * Support for NSFW
* New performance features and key optimizations:
  * Support PyTorch Fully Sharded Data Parallel training (FSDP) with tensor parallelism
  * Support CPU offloading and prefetch of activations and weights
  * Support Context Parallelism for performant long-sequence-length LLM training
  * Support framework-level FP8 precision that reduces memory usage and training step time
  * Transformer layer granularity re-computation with FP8 LLM training
  * Support pipelined tensor-parallel communication overlap with GEMM for all LLMs
  * Support LLM fine-tuning with packed sequences
  * Support fused RoPE and SwiGLU for LLAMA2-like models
  * Device memory bug fix; removed FP8 cast/transpose duplicates in FP8 training
* New features for NeMo Aligner:
  * Support for MultiEpoch
  * Added PPO: custom end strings + memory optimizations
  * Added SFT: LoRA and custom validation metrics
* New features for NeMo Curator:
  * Multi-node multi-GPU fuzzy document-level deduplication supported within the launcher
  * Added new Personal Identifiable Information (PII) Removal module
  * Task decontamination for SFT and PEFT (e.g., LoRA, p-tuning, adapters, etc.) datasets supported within the launcher
  * Code data filtering heuristics from StarCoder

## NeMo Framework 23.11

* Open source release of [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). NeMo-Aligner is a one-stop shop for efficient model alignment algorithms, featuring:
  * Support for the full Reinforcement Learning from Human Feedback (RLHF) pipeline including SFT, Reward Model Training, and Reinforcement Learning
  * Support for the SteerLM technique
  * Support for Direct Preference Optimization
  * Support for all Megatron Core GPT models such as LLAMA2 70B
  * Improved user experience

## NeMo Framework 23.10

* General announcement of the NeMo Framework Inference container, featuring:
  * Deployment support for distributed checkpoints ([Megatron Core](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core)) for NeMotron 8B and Llama 2 (BF16 only)
  * Deployment support for fine-tuned (SFT, RLHF, SteerLM) NeMotron 8B (BF16 only)
  * Deployment support for P-tuned Llama 2 on a single GPU (BF16 only)
  * Support for serving GPT and Llama 2 models using PyTriton on Triton Inference Server
  * Support for serving GPT and Llama 2 models using the TensorRT-LLM C++ backend on Triton Inference Server
  * Support for in-flight batching with the TensorRT-LLM C++ backend on Triton Inference Server

## NeMo Framework 23.08.03

* Enabled PEFT to work with Llama-2 models
* Addressed an issue that occurred when resuming Supervised Fine-Tuning with the constant learning rate scheduler
* Fixed a model parallelism bug in SFT and PEFT
* Included P-tuning state dictionary handling for distributed checkpoints
* Fixed a bug that occurred when using the save_best_model flag
* Fixed a bug where the progress bar would show the wrong number of steps

## NeMo Framework 23.08.02

* Fixed container paths in Hydra configurations

## NeMo Framework 23.08.01

* Fixed checkpoint search for distributed checkpoints

## NeMo Framework 23.08

* Added the Distributed Checkpoint Format to NeMo and Megatron Core for GPT
* New GPT transformer from Megatron Core which enables training of improved LLM configs
  * When training 175B GPT with FP8, use tensor parallelism TP=8 and micro batch size MBS=2 to ensure the model-parallel partitioning fits GPU memory
* New GPT transformer from Megatron Core which enables Group and Multi Query Attention for models like LLAMA2
* Support Llama 1 and Llama 2 pre-training with Megatron Core
* Customize LLMs for Llama 1 and Llama 2 models with techniques like SFT, PEFT (p-tuning, adapters, IA3)
* Added examples and documentation for Kubernetes training
* NeMo Data Curator: added downstream task decontamination support

## NeMo Framework 23.07

* Added Low-Rank Adaptation (LoRA) Support for T5 and mT5
* Added Batch Size Ramp-up Support for GPT

## NeMo Framework 23.05

* Low-Rank Adaptation (LoRA) Support for GPT
* LDDL (Language Datasets and Data Loaders) for BERT on the 100B model, resulting in a 30% performance speedup
* Unify dataset and model classes for all PEFT (p-tuning, adapters, IA3) with the SFT model class as parent for GPT
* Converter from Interleaved PP to non-Interleaved PP
* Dialog dataset guidance for SFT to help create better chat models
* Support Dynamic Sequence Length Batches with GPT SFT
* Data parallelism enabled for RLHF servers, providing a 2x end-to-end speedup in most jobs

## NeMo Framework 23.04.1

* Addressed an issue in RLHF which prevented some jobs from running in Slurm clusters
* Corrections related to the renaming of NeMo Megatron to NeMo Framework
* Modified `run.name` in the `*_improved` configuration files to match the correct parameter count

## NeMo Framework 23.04

* Supports NeMo Data Curator, a scalable Python library for curating the large-scale datasets required for training large language foundation models
* Enables continued training for P-tuning
* Switches to Megatron Core for Model Parallelism
* Extends the Data Validation Tool to provide P-tuning GPU runtime estimates
* Supports tensor and pipeline parallelism conversion for GPT and T5 models
* Supports supervised fine-tuning for GPT
* Adds Reinforcement Learning from Human Feedback (RLHF) for GPT models
* Adds four GPT model sizes based on new and improved model configurations:
  * 400M_improved
  * 1B_improved
  * 7B_improved
  * 40B_improved

The following is a list of GPT model configuration changes:

| Configuration | Previous | New |
| --- | --- | --- |
| Activation | GeLU | Fast-SwiGLU |
| Position Embedding | Learned Absolute | RoPE |
| Dropout | 0.1 | 0 |
| Embeddings and Output Layer | Tied | Untied |
| Bias terms | Yes | No |
| Normalization | LayerNorm | LayerNorm1p |

## NeMo Framework 23.03
* Adds a per-microbatch data loader for GPT and BERT models
* Supports `SquaredReLU` and `SwiGLU` activation functions for GPT and T5 models
* Supports Rotary Position Embedding (RoPE) for GPT and RETRO
* Supports early stopping when P-tuning or prompt tuning GPT, T5, and mT5 models
* Implements refactored adapter learning to mimic the parameter-efficient transfer learning of the NLP approach
* Adds flash attention for GPT models in Transformer Engine

## NeMo Framework 23.01

* Supports BERT models with tensor parallelism (training only)
* Supports BERT models with pipeline parallelism (training only)
* Supports sequence parallelism and selective activation checkpointing for BERT (training only)
* Supports interleaved pipeline scheduling for BERT models
* Adds Distributed Adam Optimizer for BERT models
* Supports AutoConfigurator for BERT models
* Adds 110M, 4B, 20B, and 100B BERT training configurations
* Supports mixture-of-experts for T5 models (no expert parallelism, training only)
* Improves performance for GPT P-tuning (20%-25% speed-up)
* Adds ALiBi position embeddings for T5 and mT5 (training only)
* Logs total model size (across model parallel ranks) for GPT, T5, mT5, and BERT models

## NeMo Framework 22.11

* Adds interleaved pipeline scheduling for GPT models (training only)
* Supports FP8 using Transformer Engine (training only)
* Adds Distributed Adam Optimizer for T5 and mT5 models
* Supports P-tuning and prompt tuning for GPT models with sequence parallelism
* Improves training configuration throughput by 7.9% (5B GPT), 9.6% (3B T5), 4.3% (11B T5), 52.4% (23B T5), and 26.6% (41B T5)

## NeMo Framework 22.09

* Supports NeMo Framework training and inference containers on OCI; for details on orchestration scripts, reach out to [oci_nm@nvidia.com](mailto:oci_nm@nvidia.com)
* Supports P-tuning and prompt tuning for T5 and mT5 models with pipeline parallelism (training only)
* Supports adapter learning for GPT and T5 with tensor parallelism and pipeline parallelism (training only)
* Supports IA3 learning for GPT and T5 with tensor parallelism and pipeline parallelism (training only)
* Adds AutoConfigurator to find the highest-throughput configurations for training on Base Command Platform
* Adds AutoConfigurator for parallel inference hyperparameter search for GPT on Base Command Manager

## NeMo Framework 22.08.01
* Supports Amazon Web Services as a cloud service provider (performance validated up to 20 `p4d.24xlarge` instances)
* Switches orchestration for cloud service providers on Microsoft Azure from Azure CycleCloud to NVIDIA Nephele

## NeMo Framework 22.08

* Adds distributed Adam Optimizer for GPT models
* Adds asymmetric encoder-decoder configuration for T5 and mT5 models
* Supports untying embeddings from the classifier layer for T5 and mT5 models
* Supports relative position embeddings for T5 and mT5 models (pipeline parallelism ≥3)
* Supports P-tuning and prompt tuning for T5 and mT5 models with tensor parallelism (training only)
* Refactors code to yield improved consistency and readability of configurations and logs
* Supports SQuAD fine-tuning and evaluation for T5 models with pipeline parallelism ≤2
* Supports XQuAD fine-tuning and evaluation for mT5 models with pipeline parallelism ≤2

## NeMo Framework 22.06-hotfix.01

* Fixes AutoConfigurator for T5 and mT5 models
* Fixes Evaluation harness in GPT models
* Fixes Prompt learning in GPT models
* Fixes "out of memory" condition when pretraining GPT models with sequence parallelism

## NeMo Framework 22.06

* Supports sequence parallelism and selective activation checkpointing for GPT
* Supports relative position embeddings for T5

  NVIDIA used the mC4 dataset (24 languages) for pretraining the mT5 models and verified the results on KNLI, KorQuAD, KLUE-STS, and XNLI tasks.
* Updates AutoConfigurator with sequence parallelism and selective activation checkpointing for GPT models
* Adds AutoConfigurator support for DGX A100 40GB configurations for GPT, T5, and mT5 models
* Supports P-tuning and prompt tuning for GPT with pipeline parallelism (training only)
* Supports operation fusions for higher training throughput (2%-7% speed-up)
* Changes default GPT configurations to include sequence parallelism and selective activation checkpointing: 20B (speed-up: 14%), 40B (speed-up: 9%), and 175B (speed-up: 15%)

## NeMo Framework 22.05.01

* Adds cloud service provider support for Microsoft Azure (performance validated up to 36 `Standard_ND96amsr_A100_v4` instances)
* Adds cluster validation tools (DGMI, NCCL)
* Improves performance of the 20B GPT training configuration by 2.7%

## NeMo Framework 22.05

* Supports asynchronous gradient all-reduce for GPT, T5, and mT5 models with pipeline parallel size equal to 1
* Supports P-tuning and prompt tuning for GPT with tensor parallelism (training only)
* Adds AutoConfigurator to find the highest-throughput configurations for training and inference on Base Command Manager
* Supports custom tokenizers (training only)
* Supports GPT models with pipeline parallelism on Base Command Manager (inference)
* Supports new hyperparameters for text generation: `top-p`, `top-k`, and `temperature`

## NeMo Framework 22.04

* Supports T5 models with pipeline parallelism (training only)
* Switches from GeLU to GeGLU as the activation function for T5
* Supports mT5 with tensor parallelism and pipeline parallelism (training only)
* Adds 11B, 23B, and 41B T5 model training configurations
* Adds 170M, 390M, and 3B mT5 model training configurations
* Adds automatic and configurable Non-Uniform Memory Access (NUMA) mapping

## NeMo Framework 22.03

* Adds tensor parallelism support for T5 models (optimized for <20B parameters, training only)
* Adds 220M and 3B T5 model training configurations
* Supports GLUE fine-tuning and evaluation for T5 models

## NeMo Framework 22.02

* Supports GPT models with pipeline parallelism (training only)
* Adds 40B and 175B GPT model training configurations
22.01[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/changelog.html.md#nemo-framework-22-01 "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------- * Supports GPT with tensor parallelism on Base Command Platform * Supports O2-style AMP (accelerated training of larger models) * Includes a chatbot sample application using your trained GPT model * Supports training metric monitoring and visualization with Weights & Biases Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/changelog.html.md#nemo-framework-22-01) - [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo.md) - [PR](https://github.com/NVIDIA-NeMo/Automodel/pull/81.md) - [Docs](https://docs.nvidia.com/nemo/automodel/latest/guides/llm/dataset.html.md#enable-packed-sequences-in-nemo-automodel) - [Export & Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy.md) - [NeMo Export-Deploy Release](https://github.com/NVIDIA-NeMo/Export-Deploy/releases.md) - [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo.md) - [NeMo Curator GitHub Release](https://github.com/NVIDIA-NeMo/Curator/releases.md) - [NeMo RL GitHub Release](https://github.com/NVIDIA-NeMo/RL/releases.md) - [here](https://docs.nvidia.com/nemo/rl/latest/docker.html.md) - [NVIDIA Security](https://www.nvidia.com/en-us/security.md/) - [PSIRT@nvidia.com](mailto:PSIRT%40nvidia.com.md) - [nvidia-lm-eval](https://pypi.org/project/nvidia-lm-eval.md/) - [Local checkpointing](https://nvidia.github.io/nvidia-resiliency-ext/checkpointing/local/index.html.md) - [Known Issue](https://docs.nvidia.com/nemo-framework/user-guide/latest/knownissues.html.md#llama4-known-issue) - [Nemo RL](https://github.com/NVIDIA/NeMo-RL.md) - [migration guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/index.html.md) - [docs](https://github.com/NVIDIA/NeMo/blob/main/docs/source/nlp/distillation.rst.md) - [Uzbek](https://huggingface.co/nvidia/stt_uz_fastconformer_hybrid_large_pc.md) - [Gregorian](https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc.md) - [Fully Sharded Data Parallel (FSDP)](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt/fsdp.html.md) - [Support matrix Temp Internal Link](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/peft/landing_page.html.md) - [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner.md) - [Megatron Core](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core.md) - [oci_nm@nvidia.com](mailto:oci_nm%40nvidia.com.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md Title: Fine-tuning with Custom Datasets — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html Published Time: Fri, 05 Sep 2025 18:59:46 GMT Markdown Content: Fine-tuning with Custom Datasets[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md#fine-tuning-with-custom-datasets "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Overview[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md#overview "Link to this heading") 
----------------------------------------------------------------------------------------------------------------------------- The `FineTuningDataModule` is a base class in NeMo 2 for fine-tuning Large Language Models (LLMs) for supervised tasks, such as question answering, instruction tuning, function calling, etc. It handles data loading, preprocessing, and batch creation for training, validation, and testing phases. This class integrates with PyTorch Lightning’s `LightningDataModule` and NeMo’s SFT dataset classes (`GPTSFTDataset`, `GPTSFTChatDataset`, and `GPTSFTPackedDataset`). NeMo’s fine-tuning datasets are formatted as jsonl files. Each file contains lines of json-formatted text, and each line should contain a minimum of two keys, “input” and “output”. Additional keys can be added, and are returned by the data loader as is. This is useful, for example, if you want to filter or modify any data on-the-fly. {"input": "This is the input/prompt/context/question for sample 1. Escape any double quotes like \"this\".", "output": "This is the output/answer/completion part of sample 1"} {"input": "This is the input/prompt/context/question for sample 2. Escape any double quotes like \"this\".", "output": "This is the output/answer/completion part of sample 2"} ... During training, by default, “input” and “output” are naively concatenated to be passed into the transformer model. Moreover, loss is only computed on the “output” tokens by default. These two behaviors can be customized with the `dataset_kwargs` field in the data module. FineTuningDataModule( ..., dataset_kwargs={ "prompt_template": "Question: {input} Answer: {output}", # default is "{input} {output}" (naive concatenation) "answer_only_loss": False, # default is True (only calculate loss on answer/output) } ) NeMo 2 comes with a few pre-defined dataset-specific data modules which subclass `FineTuningDataModule`, so that users can get started with fine-tuning in NeMo 2 easily. See the list of pre-defined data modules [here](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/gpt/data). When you are ready to use your own datasets, this guide provides you with two options to prepare the datasets for training in NeMo 2. Option 1: Create a Custom DataModule[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md#option-1-create-a-custom-datamodule "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ To create your own DataModule, subclass `FineTuningDataModule` and implement the necessary preprocessing logic, similar to the pre-defined data modules [here](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/gpt/data). * `_download_data()` defines the logic to download the raw dataset from the internet. If your dataset is locally hosted, you can load it in this function and return the loaded dataset. * `_preprocess_and_split_data()` defines the logic to preprocess the raw data into the jsonl format specified above, as well as splitting the dataset into training, validation, and test sets. The function should save three files: > dataset_root/ > ├── training.jsonl > ├── validation.jsonl > └── test.jsonl Note: Both of these functions are called by the `prepare_data()` hook in PyTorch Lightning, which runs these functions in a single process.
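As a rough illustration of Option 1, the sketch below subclasses `FineTuningDataModule` for a locally hosted raw file. It is not taken from the NeMo source: the `raw_path` argument, the raw record field names, the split sizes, and the use of the `prepare_data()` hook to trigger the two helpers are all assumptions, so check them against the pre-defined data modules linked above.

```python
import json
from pathlib import Path

from nemo.collections.llm.gpt.data.fine_tuning import FineTuningDataModule


class MyCustomDataModule(FineTuningDataModule):
    """Hypothetical data module for a locally hosted raw JSONL file."""

    def __init__(self, dataset_root: str, raw_path: str, **kwargs):
        super().__init__(dataset_root=dataset_root, **kwargs)
        self._root = Path(dataset_root)
        self._raw_path = Path(raw_path)  # illustrative: path to your raw data

    def prepare_data(self) -> None:
        # Only preprocess once; afterwards defer to the base class.
        if not (self._root / "training.jsonl").exists():
            raw = self._download_data()
            self._preprocess_and_split_data(raw)
        super().prepare_data()

    def _download_data(self):
        # Dataset is locally hosted, so just load it and return it.
        with self._raw_path.open() as f:
            return [json.loads(line) for line in f]

    def _preprocess_and_split_data(self, raw):
        # Convert each raw record into the {"input": ..., "output": ...} format
        # and write the three files the base class expects under dataset_root.
        # The split below is a naive tail split purely for illustration.
        self._root.mkdir(parents=True, exist_ok=True)
        splits = {
            "training": raw[:-2000],
            "validation": raw[-2000:-1000],
            "test": raw[-1000:],
        }
        for name, records in splits.items():
            with (self._root / f"{name}.jsonl").open("w") as f:
                for r in records:
                    # "question"/"answer" are placeholder field names for the raw data.
                    f.write(json.dumps({"input": r["question"], "output": r["answer"]}) + "\n")
```

Because the base class already handles tokenization and batching, a module written this way can then be dropped into a fine-tuning recipe in place of a pre-defined module such as `SquadDataModule`.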
You can find an end-to-end tutorial utilizing a custom data module here: [🔗 Create a Distillation Pipeline to Distill DeepSeek-R1 into Qwen model with NeMo 2.0 Framework](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/distill_deepseek_r1/qwen2_distill_nemo.ipynb). Option 2: Use FineTuningDataModule with Preprocessed Data[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md#option-2-use-finetuningdatamodule-with-preprocessed-data "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ If you prefer preprocessing the dataset offline, you can also use `FineTuningDataModule` directly by specifying the location of preprocessed data. 1. Create training, validation, and test files by preprocessing your raw data into the format specified above: > your_dataset_root/ > ├── training.jsonl > ├── validation.jsonl > └── test.jsonl 2. Set up FineTuningDataModule to point to `dataset_root`, as well as any additional kwargs, if needed. > FineTuningDataModule( > dataset_root="your_dataset_root", > seq_length=512, > micro_batch_size=1, > global_batch_size=128, > dataset_kwargs={}, > ) You can find an end-to-end tutorial utilizing data prepared offline here: [🔗 Fine-Tuning LLMs for Function Calling](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/function_calling/nemo2-chat-sft-function-calling.ipynb). This tutorial uses `ChatDataModule`, which sets a few default arguments on top of `FineTuningDataModule`, but is otherwise the same. Advanced Features[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md#advanced-features "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------- 1. **Packed Sequence Training**: To minimize the impact of padding for uneven sequence lengths, you can enable packed sequence training by providing `packed_sequence_specs`. Read more here: [Sequence Packing](https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/packed_sequence.html.md#packed-seq). 2. **Sequence Length Truncation**: You can customize how a sequence is truncated when it is longer than `seq_length` using the following two `dataset_kwargs`: `truncation_field` and `truncation_method`, as sketched below.
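The following is a minimal sketch of those truncation knobs. The two key names come from this page, but the values shown ("input" and "right") are common choices assumed for illustration; the accepted values are defined by NeMo's SFT dataset classes and may differ between releases.

```python
from nemo.collections.llm.gpt.data.fine_tuning import FineTuningDataModule

data = FineTuningDataModule(
    dataset_root="your_dataset_root",
    seq_length=2048,
    micro_batch_size=1,
    global_batch_size=128,
    dataset_kwargs={
        # When a sample exceeds seq_length, truncate the "input" field rather
        # than the "output" field (assumed value; verify for your version).
        "truncation_field": "input",
        # Drop tokens from the right-hand side of that field (assumed value).
        "truncation_method": "right",
    },
)
```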
Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md#advanced-features) - [here](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/gpt/data) - [🔗 Create a Distillation Pipeline to Distill DeepSeek-R1 into Qwen model with NeNo 2.0 Framework](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/distill_deepseek_r1/qwen2_distill_nemo.ipynb) - [🔗 Fine-Tuning LLMs for Function Calling](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/function_calling/nemo2-chat-sft-function-calling.ipynb) - [Sequence Packing](https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/packed_sequence.html.md#packed-seq) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/data/index.html.md Title: NeMo 2.0 Data Modules — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/data/index.html Published Time: Fri, 18 Jul 2025 19:27:25 GMT Markdown Content: NeMo 2.0 Data Modules[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/index.html.md#nemo-2-0-data-modules "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------- NeMo provides two primary data modules for working with Large Language Models (LLMs): PreTrainingDataModule[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/index.html.md#pretrainingdatamodule "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------- Located in `nemo.collections.llm.gpt.data.pre_training`, this module is optimized for unsupervised pre-training of LLMs from scratch on large corpora of text data. In this case, the dataset is pre-tokenized and saved as token indices on disk using the [Megatron dataset format](https://github.com/NVIDIA/Megatron-LM/blob/main/tools/preprocess_data.py.md). It supports: * Training on multiple data distributions with customizable weights * Efficient data loading through memory mapping * Automatic validation and test set creation * Built-in data validation and accessibility checks * Support for distributed training with Megatron-style data parallelism FineTuningDataModule[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/index.html.md#finetuningdatamodule "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------- Located in `nemo.collections.llm.gpt.data.fine_tuning`, this module is designed for supervised fine-tuning (including parameter-efficient fine-tuning) of pre-trained models on specific tasks or domains. Key features include: * Support for standard fine-tuning datasets in JSONL format * Packed sequence training for improved efficiency * Automatic handling of train/validation/test splits * Integration with various tokenizers * Memory-efficient data loading Both modules inherit from PyTorch Lightning’s `LightningDataModule`, providing a consistent interface while being optimized for their respective use cases. The separation between pre-training and fine-tuning data modules reflects the distinct requirements and optimizations needed for these two phases of LLM development. For detailed usage of the two data modules, please see the following pages. 
* [Pre-Training Data Module](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/pretrain_data.html.md) * [Fine-Tuning Data Module](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md) Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/index.html.md#finetuningdatamodule) - [Megatron dataset format](https://github.com/NVIDIA/Megatron-LM/blob/main/tools/preprocess_data.py.md) - [Pre-Training Data Module](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/pretrain_data.html.md) - [Fine-Tuning Data Module](https://docs.nvidia.com/nemo-framework/user-guide/latest/data/finetune_data.html.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md Title: Task Decontamination — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html Published Time: Fri, 18 Jul 2025 19:26:26 GMT Markdown Content: Task Decontamination[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#task-decontamination "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Base Class[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#base-class "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------- _class_ nemo_curator.tasks.DownstreamTask[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.DownstreamTask "Link to this definition")_class_ nemo_curator.tasks.import_task(_task\_path:str_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.import_task "Link to this definition") Module[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#module "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------- _class_ nemo_curator.TaskDecontamination(_tasks:[DownstreamTask](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo\_curator.tasks.DownstreamTask "nemo\_curator.tasks.downstream\_task.DownstreamTask")|Iterable[[DownstreamTask](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo\_curator.tasks.DownstreamTask "nemo\_curator.tasks.downstream\_task.DownstreamTask")]_,_text\_field:str='text'_,_max\_ngram\_size:int=13_,_max\_matches:int=10_,_min\_document\_length:int=200_,_remove\_char\_each\_side:int=200_,_max\_splits:int=10_,_removed\_dir:str|None=None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.TaskDecontamination "Link to this definition")call(_dataset:[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo\_curator.datasets.DocumentDataset "nemo\_curator.datasets.doc\_dataset.DocumentDataset")_,)→[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset 
"nemo_curator.datasets.doc_dataset.DocumentDataset")[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.TaskDecontamination.call "Link to this definition") Performs an arbitrary operation on a dataset Parameters: **dataset** ([_DocumentDataset_](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.DocumentDataset")) – The dataset to operate on prepare_task_ngram_count()→dict[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.TaskDecontamination.prepare_task_ngram_count "Link to this definition") Computes a dictionary of all ngrams in each task as keys and each value set to 0. Tasks[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#tasks "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------- _class_ nemo_curator.tasks.Race(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Race "Link to this definition")_class_ nemo_curator.tasks.Squad(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Squad "Link to this definition")_class_ nemo_curator.tasks.ArcEasy(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.ArcEasy "Link to this definition")_class_ nemo_curator.tasks.ArcChallenge(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.ArcChallenge "Link to this definition")_class_ nemo_curator.tasks.OpenBookQA(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.OpenBookQA "Link to this definition")_class_ nemo_curator.tasks.BoolQ(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.BoolQ "Link to this definition")_class_ nemo_curator.tasks.Copa(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Copa "Link to this definition")_class_ nemo_curator.tasks.RTE(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.RTE "Link to this definition")_class_ nemo_curator.tasks.MultiRC(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.MultiRC "Link to this definition")_class_ nemo_curator.tasks.WSC(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.WSC "Link to this definition")_class_ nemo_curator.tasks.CB(_min\_ngram\_size:int=8_, 
_max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.CB "Link to this definition")_class_ nemo_curator.tasks.ANLI(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.ANLI "Link to this definition")_class_ nemo_curator.tasks.Record(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Record "Link to this definition")_class_ nemo_curator.tasks.COQA(_file\_path:str|None=None_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.COQA "Link to this definition")_class_ nemo_curator.tasks.TriviaQA(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.TriviaQA "Link to this definition")_class_ nemo_curator.tasks.Quac(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Quac "Link to this definition")_class_ nemo_curator.tasks.WebQA(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.WebQA "Link to this definition")_class_ nemo_curator.tasks.Drop(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Drop "Link to this definition")_class_ nemo_curator.tasks.WiC(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.WiC "Link to this definition")_class_ nemo_curator.tasks.MMLU(_path:str|None=None_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.MMLU "Link to this definition")_class_ nemo_curator.tasks.BigBenchHard(_path:str|None=None_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.BigBenchHard "Link to this definition")_class_ nemo_curator.tasks.BigBenchLight(_path:str|None=None_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.BigBenchLight "Link to this definition")_class_ nemo_curator.tasks.Multilingual(_path:str|None=None_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Multilingual "Link to this definition")_class_ nemo_curator.tasks.PIQA(_min\_ngram\_size:int=8_, _max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.PIQA "Link to this definition")_class_ nemo_curator.tasks.Winogrande(_min\_ngram\_size:int=8_, 
_max\_ngram\_size:int=13_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Winogrande "Link to this definition")_class_ nemo_curator.tasks.Lambada(_file\_path:str_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.Lambada "Link to this definition")_class_ nemo_curator.tasks.NumDasc(_n:int_,_file\_path:str_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.NumDasc "Link to this definition")_class_ nemo_curator.tasks.StoryCloze(_file\_path:str_,_min\_ngram\_size:int=8_,_max\_ngram\_size:int=13_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.StoryCloze "Link to this definition") Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/decontamination.html.md#nemo_curator.tasks.StoryCloze) - [DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md Title: Miscellaneous — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html Published Time: Fri, 18 Jul 2025 19:26:27 GMT Markdown Content: Miscellaneous[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#miscellaneous "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------ _class_ nemo_curator.Sequential(_modules:list[BaseModule]_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.Sequential "Link to this definition")@nemo_curator.utils.decorators.batched[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.utils.decorators.batched "Link to this definition") Marks a function as accepting a pandas series of elements instead of a single element Parameters: **function** – The function that accepts a batch of elements _class_ nemo_curator.AddId(_id\_field:str_,_id\_prefix:str='doc\_id'_,_start\_index:int|None=None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.AddId "Link to this definition")call(_dataset:[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo\_curator.datasets.DocumentDataset "nemo\_curator.datasets.doc\_dataset.DocumentDataset")_,)→[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.doc_dataset.DocumentDataset")[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.AddId.call "Link to this definition") Performs an arbitrary operation on a dataset Parameters: **dataset** ([_DocumentDataset_](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.DocumentDataset")) – The dataset to operate on _class_ 
nemo_curator.blend_datasets(_target\_size:int_,_datasets:list[[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo\_curator.datasets.DocumentDataset "nemo\_curator.datasets.doc\_dataset.DocumentDataset")]_,_sampling\_weights:list[float]_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.blend_datasets "Link to this definition") Combines multiple datasets into one with different amounts of each dataset. :param target_size: The number of documents the resulting dataset should have. > The actual size of the dataset may be slightly larger if the normalized weights do not allow for even mixtures of the datasets. Parameters: * **datasets** – A list of all datasets to combine together * **sampling_weights** – A list of weights to assign to each dataset in the input. Weights will be normalized across the whole list as a part of the sampling process. For example, if the normalized sampling weight for dataset 1 is 0.02, 2% ofthe total samples will be sampled from dataset 1. There are guaranteed to be math.ceil(normalized_weight_i * target_size) elements from dataset i in the final blend. _class_ nemo_curator.Shuffle(_seed:int|None=None,npartitions:int|None=None,partition\_to\_filename:~collections.abc.Callable[[int],str]=,filename\_col:str='file\_name'_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.Shuffle "Link to this definition")call(_dataset:[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo\_curator.datasets.DocumentDataset "nemo\_curator.datasets.doc\_dataset.DocumentDataset")_,)→[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.doc_dataset.DocumentDataset")[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.Shuffle.call "Link to this definition") Performs an arbitrary operation on a dataset Parameters: **dataset** ([_DocumentDataset_](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.DocumentDataset")) – The dataset to operate on _class_ nemo_curator.DocumentSplitter(_separator:str_,_text\_field:str='text'_,_segment\_id\_field:str='segment\_id'_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.DocumentSplitter "Link to this definition") Splits documents into segments based on a separator. Each segment is a new document with an additional column indicating the segment id. To restore the original document, ensure that each document has a unique id prior to splitting. call(_dataset:[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo\_curator.datasets.DocumentDataset "nemo\_curator.datasets.doc\_dataset.DocumentDataset")_,)→[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.doc_dataset.DocumentDataset")[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.DocumentSplitter.call "Link to this definition") Splits the documents into segments based on the separator and adds a column indicating the segment id. 
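To show how the modules documented on this page compose, here is a hedged sketch that tags documents with unique ids and then splits them into paragraph-level segments. The input path and the `DocumentDataset.read_json`/`to_json` reader and writer calls are assumptions about NeMo Curator's I/O helpers rather than part of the API listed above.

```python
from nemo_curator import AddId, DocumentSplitter, Sequential
from nemo_curator.datasets import DocumentDataset

# Assumed reader helper; point it at your own JSONL shards.
dataset = DocumentDataset.read_json("my_docs/")

pipeline = Sequential([
    # Give every document a unique id so segments can later be rejoined
    # with DocumentJoiner (documented below).
    AddId(id_field="id", id_prefix="doc"),
    # Split each document on blank lines; each segment becomes its own row
    # carrying a segment_id column.
    DocumentSplitter(separator="\n\n", text_field="text", segment_id_field="segment_id"),
])

segments = pipeline(dataset)
segments.to_json("my_segments/")  # assumed writer helper
```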
_class_ nemo_curator.DocumentJoiner(_separator:str_,_text\_field:str='text'_,_segment\_id\_field:str='segment\_id'_,_document\_id\_field:str='id'_,_drop\_segment\_id\_field:bool=True_,_max\_length:int|None=None_,_length\_field:str|None=None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.DocumentJoiner "Link to this definition") Joins documents that have a common id back into a single document. The order of the documents is dictated by an additional segment_id column. A maximum length can be specified to limit the size of the joined documents. The joined documents are joined by a separator. call(_dataset:[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo\_curator.datasets.DocumentDataset "nemo\_curator.datasets.doc\_dataset.DocumentDataset")_,)→[DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset "nemo_curator.datasets.doc_dataset.DocumentDataset")[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.DocumentJoiner.call "Link to this definition") Joins the documents back into a single document while preserving all the original fields. Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/misc.html.md#nemo_curator.DocumentJoiner.call) - [DocumentDataset](https://docs.nvidia.com/nemo-framework/user-guide/latest/datacuration/api/datasets.html.md#nemo_curator.datasets.DocumentDataset) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/libraries/index.html.md Title: Library Documentation — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/libraries/index.html Published Time: Thu, 30 Oct 2025 07:07:28 GMT Markdown Content: Library Documentation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/libraries/index.html.md#library-documentation "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------- [NeMo](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/index.html.md), developed by NVIDIA, is a generative AI framework targeting researchers and developers who use PyTorch. Its core purpose is to provide a robust and scalable framework to facilitate the design and implementation of generative AI models. NeMo simplifies access to pre-existing code and pretrained models, helping users from both industry and academia accelerate their development processes. The developer guide offers extensive technical details regarding NeMo’s design, implementation, and optimizations. [NeMo AutoModel](https://docs.nvidia.com/nemo/automodel/latest/index.html) includes a suite of libraries and recipe collections that help users train models end to end. The AutoModel library (“NeMo AutoModel”) provides Day-0 GPU-accelerated PyTorch training for Hugging Face models. Users can start training and fine-tuning instantly with no conversion delays, and scale effortlessly using PyTorch-native parallelisms, optimized custom kernels, and memory-efficient recipes—all while preserving the original checkpoint format for smooth integration across the Hugging Face ecosystem. 
[NeMo Curator](https://docs.nvidia.com/nemo/curator/latest/index.html.md) is a Python library composed of several scalable data-mining modules, specifically designed for curating Natural Language Processing (NLP) data to train large language models (LLMs). It enables NLP researchers to extract high-quality text from vast, uncurated web corpora efficiently, supporting the development of more accurate and powerful language models. [NeMo Eval](https://docs.nvidia.com/nemo/evaluator/latest/index.html.md) is a comprehensive evaluation module under NeMo Framework for Large Language Models (LLMs). It provides seamless deployment and evaluation capabilities for models trained using NeMo Framework via state-of-the-art evaluation harnesses. [NeMo Export and Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/index.html.md) provides tools and APIs for exporting and deploying NeMo and Hugging Face models to production environments. It supports various deployment paths including TensorRT, TensorRT-LLM, and vLLM deployment through NVIDIA Triton Inference Server. [NeMo Megatron Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/index.html.md) provides seamless bidirectional conversion between select Hugging Face and Megatron model definitions and checkpoints, along with robust, performant recipes built with Megatron Core for pretraining and fine-tuning large language models. It delivers state-of-the-art training throughput with advanced parallelism strategies and mixed precision support optimized for maximum performance at scale. [NeMo RL](https://docs.nvidia.com/nemo/rl/latest/index.html.md) is a scalable and efficient post-training library designed for models ranging from 1 GPU to thousands, and from tiny to over 100 billion parameters. What you can expect: * **Seamless integration with Hugging Face** for ease of use, allowing users to leverage a wide range of pre-trained models and tools. * **High-performance implementation with Megatron Core**, supporting various parallelism techniques for large models (>100B) and large context lengths. * **Efficient resource management using Ray**, enabling scalable and flexible deployment across different hardware configurations. * **Flexibility** with a modular design that allows easy integration and customization. * **Comprehensive documentation** that is both detailed and user-friendly, with practical examples. [NeMo Run](https://docs.nvidia.com/nemo/run/latest/index.html.md) is a powerful tool designed to streamline the configuration, execution and management of Machine Learning experiments across various computing environments. 
Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/libraries/index.html.md#library-documentation) - [NeMo](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/index.html.md) - [NeMo AutoModel](https://docs.nvidia.com/nemo/automodel/latest/index.html) - [NeMo Curator](https://docs.nvidia.com/nemo/curator/latest/index.html.md) - [NeMo Eval](https://docs.nvidia.com/nemo/evaluator/latest/index.html.md) - [NeMo Export and Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/index.html.md) - [NeMo Megatron Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/index.html.md) - [NeMo RL](https://docs.nvidia.com/nemo/rl/latest/index.html.md) - [NeMo Run](https://docs.nvidia.com/nemo/run/latest/index.html.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma2.html.md Title: Gemma 2 — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma2.html Published Time: Fri, 05 Sep 2025 18:59:52 GMT Markdown Content: Gemma 2[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma2.html.md#gemma-2 "Link to this heading") -------------------------------------------------------------------------------------------------------------------- Gemma 2 offers three new, powerful, and efficient models available in 2, 9, and 27 billion parameter sizes, all with built-in safety advancements. It adopts the transformer decoder framework while adding multi-query attention, RoPE, GeGLU activations, and more. More information is available in Google’s release blog. Note Currently, Gemma 2 does not support CuDNN Fused Attention. The recipes disable CuDNN attention and use Flash Attention instead. We provide pre-defined recipes for finetuning Gemma 2 models using NeMo 2.0 and [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). These recipes configure a `run.Partial` for one of the [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) api functions introduced in NeMo 2.0. The recipes are hosted in [gemma_2_2b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma2_2b.py), [gemma_2_9b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma2_9b.py), and [gemma_2_27b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma2_27b.py). NeMo 2.0 Finetuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma2.html.md#nemo-2-0-finetuning-recipes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------ Note The finetuning recipes use the `SquadDataModule` for the `data` argument. You can replace the `SquadDataModule` with your custom dataset. To import the HF model and convert to NeMo 2.0 format, run the following command (this only needs to be done once) from nemo.collections import llm llm.import_ckpt(model=llm.Gemma2Model(llm.Gemma2Config2B()), source='hf://google/gemma-2-2b') By default, the non-instruct version of the model is loaded. 
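If you want to start from the instruction-tuned weights instead, the same import can point at the instruct repository. This is a hedged example: `google/gemma-2-2b-it` is the Hugging Face id of the instruct variant of the 2B model, and the config class changes for the 9B and 27B sizes.

```python
from nemo.collections import llm

# Import the instruction-tuned 2B checkpoint instead of the base model.
llm.import_ckpt(
    model=llm.Gemma2Model(llm.Gemma2Config2B()),
    source='hf://google/gemma-2-2b-it',
)
```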
To load a different model, set `finetune.resume.restore_config.path=nemo://` or `finetune.resume.restore_config.path=` We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm recipe = llm.gemma2_2b.finetune_recipe( name="gemma2_2b_finetuning", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # 'lora', 'none' packed_sequence=False, ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # gbs=gbs, # mbs=mbs, # seq_length=recipe.model.config.seq_length, # ) # recipe.data = dataloader By default, the finetuning recipe will run LoRA finetuning with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set `peft_scheme='none'` in the recipe argument. To finetune with sequence packing for a higher throughput, set `packed_sequence=True`. Note that you may need to tune the global batch size in order to achieve similar convergence. Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides) to learn more about its configuration and execution system. Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows: import nemo_run as run run.run(recipe, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(recipe, direct=True) | Recipe | Status | | --- | --- | | Gemma 2 2B | Yes | | Gemma 2 9B | Yes | | Gemma 2 27B | Yes | Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma2.html.md#nemo-2-0-finetuning-recipes) - [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) - [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) - [gemma_2_2b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma2_2b.py) - [gemma_2_9b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma2_9b.py) - [gemma_2_27b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma2_27b.py) - [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md Title: GPT-OSS — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html Published Time: Thu, 30 Oct 2025 07:07:28 GMT Markdown Content: GPT-OSS[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#gpt-oss "Link to this heading") --------------------------------------------------------------------------------------------------------------------- GPT-OSS is an open-weight model released by OpenAI, providing transparent and accessible large language models. GPT-OSS models are built on the Mixture-of-Experts (MoE) transformer decoder architecture with Sink Attention and alternating Sliding-Window Attention (SWA). The model family includes two variants: GPT-OSS 20B and GPT-OSS 120B, designed to serve different computational requirements while maintaining high-quality text generation capabilities. 
The models are designed to be used within agentic workflows with strong instruction following, tool use like web search and Python code execution, and reasoning capabilities—including the ability to adjust the reasoning effort for tasks that don’t require complex reasoning. We provide pre-defined recipes for finetuning GPT-OSS models in two sizes: 20B and 120B using NeMo 2.0 and [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). These recipes configure a `run.Partial` for one of the [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) api functions introduced in NeMo 2.0. Note Please use the custom container `nvcr.io/nvidia/nemo:25.07.gpt_oss` when working with GPT-OSS. Please make sure you update to the latest version of `transformers`. NeMo 2.0 Finetuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#nemo-2-0-finetuning-recipes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------- Note The finetuning recipes use the `SquadDataModule` for the `data` argument. You can replace the `SquadDataModule` with your custom dataset. Note that this model is a reasoning model with a specific chat template, so it is best to use a chat dataset with the `use_hf_tokenizer_chat_template=True` argument when finetuning. To import the HF model and convert to NeMo 2.0 format, run the following command (this only needs to be done once): cd apt-get update && apt-get install git-lfs git lfs install git clone https://huggingface.co/openai/gpt-oss-20b git clone https://huggingface.co/openai/gpt-oss-120b from nemo.collections import llm if __name__ == "__main__": # For GPT-OSS 20B llm.import_ckpt(model=llm.GPTOSSModel(llm.GPTOSSConfig20B()), source='hf:////gpt-oss-20b') # For GPT-OSS 120B # llm.import_ckpt(model=llm.GPTOSSModel(llm.GPTOSSConfig120B()), source='hf:////gpt-oss-120b') To import the original OpenAI checkpoint and convert to NeMo 2.0 format, run the following command (this only needs to be done once): from nemo.collections import llm if __name__ == "__main__": # For GPT-OSS 20B llm.import_ckpt(model=llm.GPTOSSModel(llm.GPTOSSConfig20B()), source='openai:///path/to/gpt-oss-20b') # For GPT-OSS 120B # llm.import_ckpt(model=llm.GPTOSSModel(llm.GPTOSSConfig120B()), source='openai:///path/to/gpt-oss-120b') We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm # For GPT-OSS 20B recipe = llm.gpt_oss_20b.finetune_recipe( name="gpt_oss_20b_finetuning", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # 'lora', 'none' ) # For GPT-OSS 120B # recipe = llm.gpt_oss_120b.finetune_recipe( # name="gpt_oss_120b_finetuning", # dir=f"/path/to/checkpoints", # num_nodes=4, # num_gpus_per_node=8, # peft_scheme='lora', # 'lora', 'none' # ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # gbs=gbs, # mbs=mbs, # seq_length=recipe.model.config.seq_length, # use_hf_tokenizer_chat_template=True, # ) # recipe.data = dataloader By default, the finetuning recipe will run LoRA finetuning with LoRA applied to linear layers in the attention block in the language model. To finetune the entire model without LoRA, set `peft_scheme='none'` in the recipe argument. Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. 
Please review the NeMo-Run [documentation](https://docs.nvidia.com/nemo/run/latest/guides/) to learn more about its configuration and execution system. Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows: import nemo_run as run run.run(recipe, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(recipe, direct=True) Inference[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#inference "Link to this heading") ------------------------------------------------------------------------------------------------------------------------- To run inference with GPT-OSS models, you can use the following command: # For GPT-OSS 20B torchrun --nproc-per-node=1 /opt/NeMo/scripts/llm/generate.py \ --model_path= \ --devices=1 \ --num_tokens_to_generate=512 \ --temperature=0.0 \ --top_p=0.0 \ --top_k=1 \ --disable_flash_decode # For GPT-OSS 120B # torchrun --nproc-per-node=8 /opt/NeMo/scripts/llm/generate.py \ # --model_path= \ # --ep=4 \ # --pp=2 \ # --devices=8 \ # --num_tokens_to_generate=512 \ # --temperature=0.0 \ # --top_p=0.0 \ # --top_k=1 \ # --disable_flash_decode Export to HF[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#export-to-hf "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------- After training or finetuning your GPT-OSS model, you can export it to Hugging Face format for easy sharing and deployment: from nemo.collections import llm # Export NeMo checkpoint to Hugging Face format llm.export_ckpt( target="hf", path=Path(""), output_path=Path(""), ) Note Ensure you have sufficient disk space and appropriate permissions when exporting large models. The export process may take some time depending on the model size and your storage setup. Deployment[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#deployment "Link to this heading") --------------------------------------------------------------------------------------------------------------------------- ### Install TensorRT-LLM in NeMo GPT-OSS container[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#install-tensorrt-llm-in-nemo-gpt-oss-container "Link to this heading") The NeMo Framework container does not have TensorRT-LLM pre-installed. To expoert gpt-oss to TRT-LLM, run the following commands inside the container `nvcr.io/nvidia/nemo:25.07.gpt_oss`. 1. Reinstall NeMo Export-Deploy package rm -r /opt/Export-Deploy && pip install git+https://github.com/NVIDIA-NeMo/Export-Deploy.git 2. Install prerequisites for TensorRT-LLM curl -sL https://github.com/NVIDIA/TensorRT-LLM/raw/refs/heads/feat/gpt-oss/docker/common/install_tensorrt.sh | bash 3. 
Install TensorRT-LLM with GPT-OSS branch git clone -b feat/gpt-oss --single-branch https://github.com/NVIDIA/TensorRT-LLM.git cd TensorRT-LLM python scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --benchmark --job_count $(nproc) pip install ./build/tensorrt_llm*.whl ### Export and Deploy Hugging Face checkpoint to TensorRT-LLM and Triton Inference Server[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#export-and-deploy-hugging-face-checkpoint-to-tensorrt-llm-and-triton-inference-server "Link to this heading") from nemo_deploy.nlp.trtllm_api_deployable import TensorRTLLMAPIDeployable from nemo_deploy import DeployPyTriton deployable = TensorRTLLMAPIDeployable( hf_model_id_path="openai/gpt-oss-120b", tensor_parallel_size=2, ) output = deployable.generate( prompts=["What is the color of a banana?"], max_length=20, ) print("output: ", output) # Deploy to Triton nm = DeployPyTriton(model=deployable, triton_model_name="gpt-oss", http_port=8000) nm.deploy() nm.serve() ### Query Triton Inference Server[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#query-triton-inference-server "Link to this heading") from nemo_deploy.nlp import NemoQueryTRTLLMAPI nq = NemoQueryTRTLLMAPI(url="localhost:8000", model_name="gpt-oss") output = nq.query_llm( prompts=["What is the capital of France?"], max_length=100, ) print(output) Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#query-triton-inference-server) - [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) - [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) - [documentation](https://docs.nvidia.com/nemo/run/latest/guides/) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md Title: Mixtral — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html Published Time: Thu, 30 Oct 2025 07:07:28 GMT Markdown Content: Mixtral[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md#mixtral "Link to this heading") --------------------------------------------------------------------------------------------------------------------- Released in December 2023, Mistral AI’s second marquee model, Mixtral-8x7B, is one of the first performant and open-source (Apache 2.0) Sparse Mixture of Experts Model (SMoE). The key distinguishing feature of Mixtral’s SMoE implementation, compared to Mistral 7B, is the inclusion of a router network that guides tokens through a set of two groups of parameters (experts) of a possible eight. This allows the model to perform better and be significantly larger without a corresponding significant increase in cost and latency. More specific details are available in the companion paper “[Mixtral of Experts](https://arxiv.org/abs/2401.04088)”. Released in April 2024, Mistral AI’s second SMoE model, Mixtral-8x22B sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size “[announcement page](https://mistral.ai/news/mixtral-8x22b/)”. In the following documentation pages we use the terms “mixtral” and “mixtral_8x22b” to refer to the Mixtral-8x7B and Mixtral-8x22B models, respectively. We provide recipes for pretraining and finetuning Mixtral models for two sizes: 8x7B, and 8x22B. 
The recipes use NeMo 2.0 and [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). These recipes configure a `run.Partial` for one of the [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) api functions introduced in NeMo 2.0. The recipes are hosted in [mixtral_8x7b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/mixtral_8x7b.py#L80) and [mixtral_8x22b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/mixtral_8x22b.py#L78) files. NeMo 2.0 Pretraining Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md#nemo-2-0-pretraining-recipes "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------- Note The pretraining recipes use the `MockDataModule` for the `data` argument. You are expected to replace the `MockDataModule` with your own custom dataset. We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm pretrain = llm.mixtral_8x7b.pretrain_recipe( name="mixtral_8x7b_pretraining", dir=f"/path/to/checkpoints", num_nodes=2, num_gpus_per_node=8, ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # gbs=gbs, # mbs=mbs, # seq_length=pretrain.model.config.seq_length, # ) # pretrain.data = dataloader NeMo 2.0 Finetuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md#nemo-2-0-finetuning-recipes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------- Note The finetuning recipes use the `SquadDataModule` for the `data` argument. You can replace the `SquadDataModule` with your custom dataset. To import the HF model and convert to NeMo 2.0 format, run the following command (this only needs to be done once) from nemo.collections import llm if __name__ == "__main__": llm.import_ckpt(model=llm.MixtralModel(llm.MixtralConfig8x7B()), source='hf://mistralai/Mixtral-8x7B-v0.1') By default, the non-instruct version of the model is loaded. To load a different model, set `finetune.resume.restore_config.path=nemo://` or `finetune.resume.restore_config.path=` We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm recipe = llm.mixtral_8x7b.finetune_recipe( name="mixtral_8x7b_finetuning", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # 'lora', 'none' packed_sequence=False, ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # gbs=gbs, # mbs=mbs, # seq_length=recipe.model.config.seq_length, # ) # recipe.data = dataloader By default, the finetuning recipe will run LoRA finetuning with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set `peft_scheme='none'` in the recipe argument. To finetune with sequence packing for a higher throughput, set `packed_sequence=True`. Note that you may need to tune the global batch size in order to achieve similar convergence. Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. 
Please review the NeMo-Run [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides) to learn more about its configuration and execution system. Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows: import nemo_run as run run.run(pretrain, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(pretrain, direct=True) A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference: | Recipe | Status | | --- | --- | | Mixtral 8x7B | Yes | | Mixtral 8x7B FP8 | N/A | | Mixtral 8x7B 16k | Yes | | Mixtral 8x7B 64k | Yes | | Mixtral 8x22B | Yes | | Mixtral 8x22B FP8 | N/A | | Mixtral 8x22B 16k | N/A | | Mixtral 8x22B 64k | N/A | Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md#nemo-2-0-finetuning-recipes) - [Mixtral of Experts](https://arxiv.org/abs/2401.04088) - [announcement page](https://mistral.ai/news/mixtral-8x22b/) - [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) - [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) - [mixtral_8x7b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/mixtral_8x7b.py#L80) - [mixtral_8x22b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/mixtral_8x22b.py#L78) - [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html.md Title: Phi 3 — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html Published Time: Fri, 18 Jul 2025 19:27:37 GMT Markdown Content: Phi 3[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html.md#phi-3 "Link to this heading") -------------------------------------------------------------------------------------------------------------- [Microsoft’s Phi-3-mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct.md/). The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, referring to the context length (in tokens) each can support. We provide pre-defined recipes for pretraining and finetuning the Phi-3 Mini 4K Instruct model. The recipes use NeMo 2.0 and [NeMo-Run](https://github.com/NVIDIA/NeMo-Run.md). These recipes configure a `run.Partial` for one of the [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html.md) api functions introduced in NeMo 2.0. The recipes are hosted in the `phi3_mini_4k_instruct` recipe module under `nemo/collections/llm/recipes` in the NeMo repository.
NeMo 2.0 Pretraining Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html.md#nemo-2-0-pretraining-recipes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------ Note The pretraining recipes use the `MockDataModule` for the `data` argument. You are expected to replace the `MockDataModule` with your custom dataset. We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm pretrain = llm.phi3_mini_4k_instruct.pretrain_recipe( name="phi3_mini_4k_instruct_pretraining", dir="/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # gbs=gbs, # mbs=mbs, # seq_length=pretrain.model.config.seq_length, # ) # pretrain.data = dataloader NeMo 2.0 Finetuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html.md#nemo-2-0-finetuning-recipes "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------------- Note The finetuning recipes use the `SquadDataModule` for the `data` argument. You can replace the `SquadDataModule` with your custom dataset. To import the HF model and convert it to NeMo 2.0 format, run the following command (this only needs to be done once): from nemo.collections.llm import import_ckpt from nemo.collections.llm.gpt.model.phi3mini import Phi3ConfigMini, Phi3Model if __name__ == "__main__": import_ckpt(model=Phi3Model(Phi3ConfigMini()), source='hf://microsoft/Phi-3-mini-4k-instruct') We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm recipe = llm.phi3_mini_4k_instruct.finetune_recipe( name="phi3_mini_4k_instruct_finetuning", dir="/path/to/checkpoints", num_nodes=1, num_gpus_per_node=1, peft_scheme='lora', # 'lora', 'none' packed_sequence=False, ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # gbs=gbs, # mbs=mbs, # seq_length=recipe.model.config.seq_length, # ) # recipe.data = dataloader By default, the finetuning recipe will run LoRA finetuning with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set `peft_scheme='none'` in the recipe argument. To finetune with sequence packing for a higher throughput, set `packed_sequence=True`. Note that you may need to tune the global batch size in order to achieve similar convergence. Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides.md) to learn more about its configuration and execution system. Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process.
You can use it as follows: import nemo_run as run run.run(pretrain, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(pretrain, direct=True) A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference: | Recipe | Status | | --- | --- | | Phi 3 mini 4k instruct | Yes | | Phi 3 mini 128k instruct | N/A | | Phi 3 small 8k instruct | N/A | Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html.md#nemo-2-0-finetuning-recipes) - [Microsoft’s Phi-3-mini-4K-Instruct is a 3.8B parameters, lightweight state of the art open trained model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct.md/) - [NeMo-Run](https://github.com/NVIDIA/NeMo-Run.md) - [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html.md) - [llama3_8b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/llama3_8b.py.md) - [llama3_70b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/llama3_70b.py.md) - [llama31_8b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/llama31_8b.py.md) - [llama31_70b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/llama31_70b.py.md) - [llama31_405b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/llama31_405b.py.md) - [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md Title: T5 — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html Published Time: Fri, 18 Jul 2025 19:27:39 GMT Markdown Content: T5[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md#t5 "Link to this heading") ------------------------------------------------------------------------------------------------------ T5, or Text-to-Text Transfer Transformer, is a versatile language model that frames all natural language processing (NLP) tasks as text-to-text problems. This means that every task, whether it’s translation, summarization, or question answering, is treated uniformly by converting input text into output text. T5 employs a transformer architecture, utilizing both encoder and decoder components to effectively process and generate language. We provide pre-defined recipes for pretraining and finetuning a T5 model in sizes: 220M, 3B and 11B. The recipes use NeMo 2.0 and [NeMo-Run](https://github.com/NVIDIA/NeMo-Run.md). These recipes configure a `run.Partial` for one of the [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html.md) api functions introduced in NeMo 2.0. The recipes are hosted in [t5_220m](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes/t5_220m.py.md), [t5_3b](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes/t5_3b.py.md) and [t5_11b](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes/t5_11b.py.md) files. NeMo 2.0 Pretraining Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md#nemo-2-0-pretraining-recipes "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------------- Note The pretraining recipes use the `MockDataModule` for the `data` argument. 
You are expected to replace the `MockDataModule` with your custom dataset. We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm pretrain = llm.t5_220m.pretrain_recipe( name="t5_220m_pretraining", dir="/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # global_batch_size=global_batch_size, # micro_batch_size=micro_batch_size, # seq_length=pretrain.model.config.seq_length, # seq_length_dec=pretrain.model.config.seq_length_dec, # ) # pretrain.data = dataloader NeMo 2.0 Finetuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md#nemo-2-0-finetuning-recipes "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------- Note The finetuning recipes use the `SquadDataModule` for the `data` argument. You can replace the `SquadDataModule` with your custom dataset. We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import llm recipe = llm.t5_220m.finetune_recipe( name="t5_220m_finetuning", checkpoint_path="/path/to/pretrained_checkpoints", dir="/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # 'lora', 'none' ) # # To override the data argument # dataloader = a_function_that_configures_your_custom_dataset( # global_batch_size=global_batch_size, # micro_batch_size=micro_batch_size, # seq_length=recipe.model.config.seq_length, # seq_length_dec=recipe.model.config.seq_length_dec, # ) # recipe.data = dataloader By default, the finetuning recipe will run LoRA finetuning with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set `peft_scheme='none'` in the recipe argument. Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides.md) to learn more about its configuration and execution system. Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process.
You can use it as follows: import nemo_run as run run.run(pretrain, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(pretrain, direct=True) A list of pretraining recipes that we currently support or plan to support soon is provided below for reference: | Recipe | Status | | --- | --- | | T5 220M | Yes | | T5 3B | Yes | | T5 11B | Yes | Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md#nemo-2-0-finetuning-recipes) - [NeMo-Run](https://github.com/NVIDIA/NeMo-Run.md) - [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html.md) - [t5_220m](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes/t5_220m.py.md) - [t5_3b](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes/t5_3b.py.md) - [t5_11b](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes/t5_11b.py.md) - [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md Title: Quantization — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html Published Time: Fri, 18 Jul 2025 19:27:10 GMT Markdown Content: Quantization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#quantization "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------- NeMo offers Post-Training Quantization (PTQ) to postprocess a FP16/BF16 model to a lower precision format for efficient deployment. The following sections detail how to use it. Post-Training Quantization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#post-training-quantization "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- PTQ enables deploying a model in a low-precision format – FP8, INT4, or INT8 – for efficient serving. Different [quantization methods](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_choosing_quant_methods.html.md) are available including FP8 quantization, INT8 SmoothQuant, and INT4 AWQ. Model quantization has three primary benefits: reduced model memory requirements, lower memory bandwidth pressure, and increased inference throughput. In NeMo, quantization is enabled by the [NVIDIA TensorRT Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer.md) – a library to quantize and compress deep learning models for optimized inference on GPUs. The quantization process consists of the following steps: 1. Load a model checkpoint using an appropriate parallelism strategy. 2. Calibrate the model to obtain scaling factors for lower-precision GEMMs. 3. Produce a [TensorRT-LLM checkpoint](https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html.md) with model config (json) and quantized weights (safetensors). Additionally, the necessary context to set up the model tokenizer is saved. 
Loading models requires using a custom ModelOpt spec defined in the [megatron.core.post_training.modelopt](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core/post_training/modelopt.md) module for both Transformer and Mamba-type models. Typically, the calibration step is lightweight and uses a small dataset to obtain appropriate statistics for scaling tensors. The output directory produced is ready to be used to build a serving engine with the NVIDIA TensorRT-LLM library (see [Deploy NeMo Models by Exporting TensorRT-LLM](https://docs.nvidia.com/nemo-framework/user-guide/latest/deployment/llm/nemo_models/optimized/tensorrt_llm.html.md#deploy-nemo-framework-models-tensorrt-llm)). We refer to this checkpoint as the qnemo checkpoint henceforth. The quantization algorithm can also be conveniently set to `"no_quant"` to perform only the weights export step using the default precision for TensorRT-LLM deployment. This is useful for obtaining baseline performance and accuracy results for comparison. ### Support Matrix[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#support-matrix "Link to this heading") The table below presents a verified model support matrix for popular LLM architectures. Support for other model families is experimental. | Model Name | Model Parameters | Decoder Type | FP8 | INT8 SQ | INT4 AWQ | | --- | --- | --- | --- | --- | --- | | GPT | 2B, 8B, 43B | gptnext | ✓ | ✓ | ✓ | | Nemotron-3 | 8B, 22B | gptnext | ✓ | ✓ | ✓ | | Nemotron-4 | 15B, 340B | gptnext | ✓ | ✓ | ✓ | | Llama 2 | 7B, 13B, 70B | llama | ✓ | ✓ | ✓ | | Llama 3 | 8B, 70B | llama | ✓ | ✓ | ✓ | | Llama 3.1 | 8B, 70B, 405B | llama | ✓ | ✓ | ✓ | | Llama 3.2 | 1B, 3B | llama | ✓ | ✓ | ✓ | | Falcon | 7B, 40B | falcon | ✗ | ✗ | ✗ | | Gemma 1 | 2B, 7B | gemma | ✓ | ✓ | ✓ | | StarCoder 1 | 15B | gpt2 | ✓ | ✓ | ✓ | | StarCoder 2 | 3B, 7B, 15B | gptnext | ✓ | ✓ | ✓ | | Mistral | 7B | llama | ✓ | ✓ | ✓ | | Mixtral | 8x7B | llama | ✓ | ✗ | ✗ | When running PTQ, the decoder type for exporting the TensorRT-LLM checkpoint is detected automatically based on the model used. If necessary, it can be overridden using the `decoder_type` parameter. ### Example[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#example "Link to this heading") The example below shows how to quantize the Llama 3 70B model into FP8 precision, using tensor parallelism of 8 on a single DGX H100 node. The quantized model is designed for serving using 2 H100 GPUs specified with the `export.inference_tp` parameter. The quantization workflow can be launched with the NeMo CLI or using a PTQ script with `torchrun` or Slurm. This is shown below.
#### Use the NeMo CLI[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#use-the-nemo-cli "Link to this heading") The command below can be launched inside a NeMo container (only single-node use cases are supported): CALIB_TP=8 INFER_TP=2 nemo llm ptq \ nemo_checkpoint=/opt/checkpoints/llama3-70b-base \ calibration_tp=$CALIB_TP \ quantization_config.algorithm=fp8 \ export_config.inference_tp=$INFER_TP \ export_config.path=/opt/checkpoints/llama3-70b-base-fp8-qnemo \ run.executor=torchrun \ run.executor.ntasks_per_node=$CALIB_TP #### Use the PTQ script with `torchrun` or Slurm[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#use-the-ptq-script-with-torchrun-or-slurm "Link to this heading") Alternatively, the `torchrun` command and [scripts/llm/ptq.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/llm/ptq.py.md) can be used directly. The script must be launched correctly with the number of processes equal to tensor parallelism: CALIB_TP=8 CALIB_PP=1 INFER_TP=2 torchrun --nproc_per_node $CALIB_TP /opt/NeMo/scripts/llm/ptq.py \ --nemo_checkpoint=/opt/checkpoints/llama3-70b-base \ --calibration_tp=$CALIB_TP \ --calibration_pp=$CALIB_PP \ --algorithm=fp8 \ --inference_tp=$INFER_TP \ --export_path=/opt/checkpoints/llama3-70b-base-fp8-qnemo For large models, this script can be launched on Slurm for multi-node use cases by setting the `--calibration_tp` and `--calibration_pp` along with the corresponding Slurm `--ntasks-per-node` and `--nodes` parameters, respectively: CALIB_TP=8 CALIB_PP=2 INFER_TP=8 srun --nodes $CALIB_PP --ntasks-per-node $CALIB_TP ... \ python /opt/NeMo/scripts/llm/ptq.py \ --nemo_checkpoint=/opt/checkpoints/nemotron4-340b-base \ --calibration_tp=$CALIB_TP \ --calibration_pp=$CALIB_PP \ ... For the Llama 3 70b example, the output directory has the following structure: llama3-70b-base-fp8-qnemo/ ├── config.json ├── nemo_context/ ├── rank0.safetensors └── rank1.safetensors The next step is to build a TensorRT-LLM engine for the checkpoint produced. This can be conveniently achieved and run using the `TensorRTLLM` class available in the `nemo.export` module. See [Deploy NeMo Models by Exporting TensorRT-LLM](https://docs.nvidia.com/nemo-framework/user-guide/latest/deployment/llm/nemo_models/optimized/tensorrt_llm.html.md#deploy-nemo-framework-models-tensorrt-llm) for details. Alternatively, you can use the TensorRT-LLM trtllm-build command directly. 
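As a rough, hedged sketch of that next step, the snippet below shows how the qnemo directory produced above might be consumed through the `TensorRTLLM` class from `nemo.export`. The constructor and `export` argument names used here (`model_dir`, `nemo_checkpoint_path`, `model_type`) are assumptions based on the general `nemo.export` API and can differ between NeMo releases; treat the linked deployment guide as the authoritative reference.

    # Hedged sketch only; argument names are assumptions and may differ by NeMo release.
    from nemo.export.tensorrt_llm import TensorRTLLM

    exporter = TensorRTLLM(model_dir="/opt/checkpoints/llama3-70b-base-fp8-engine")  # where engines are written
    exporter.export(
        nemo_checkpoint_path="/opt/checkpoints/llama3-70b-base-fp8-qnemo",  # qnemo directory from the PTQ step
        model_type="llama",
    )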
References[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#references "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------- Please refer to the following papers for more details on quantization techniques: * [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation, 2020](https://arxiv.org/abs/2004.09602.md) * [FP8 Formats for Deep Learning, 2022](https://arxiv.org/abs/2209.05433.md) * [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models, 2022](https://arxiv.org/abs/2211.10438.md) * [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, 2023](https://arxiv.org/abs/2306.00978.md) Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html.md#references) - [quantization methods](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_choosing_quant_methods.html.md) - [NVIDIA TensorRT Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer.md) - [TensorRT-LLM checkpoint](https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html.md) - [megatron.core.post_training.modelopt](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core/post_training/modelopt.md) - [Deploy NeMo Models by Exporting TensorRT-LLM](https://docs.nvidia.com/nemo-framework/user-guide/latest/deployment/llm/nemo_models/optimized/tensorrt_llm.html.md#deploy-nemo-framework-models-tensorrt-llm) - [scripts/llm/ptq.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/llm/ptq.py.md) - [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation, 2020](https://arxiv.org/abs/2004.09602.md) - [FP8 Formats for Deep Learning, 2022](https://arxiv.org/abs/2209.05433.md) - [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models, 2022](https://arxiv.org/abs/2211.10438.md) - [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, 2023](https://arxiv.org/abs/2306.00978.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md Title: Logging and Checkpointing — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html Published Time: Thu, 30 Oct 2025 07:07:29 GMT Markdown Content: Logging and Checkpointing[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#logging-and-checkpointing "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Three main classes in NeMo 2.0 are responsible for configuring logging and checkpointing directories: 1. `nemo.lightning.pytorch.callbacks.model_checkpoint.ModelCheckpoint` > * Handles the logic that determines when to save a checkpoint. > > * Provides the ability to perform asynchronous checkpointing. 2. `nemo.lightning.nemo_logger.NeMoLogger` > * Responsible for setting logging directories. > > * Optionally configures the trainer’s loggers. 3. `nemo.lightning.resume.AutoResume` > * Sets the checkpointing directory. > > * Determines whether there is an existing checkpoint from which to resume. 
ModelCheckpoint[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#modelcheckpoint "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------- The `ModelCheckpoint` callback in NeMo 2.0 is a wrapper around Pytorch Lightning’s `ModelCheckpoint`. It manages when to save and clean up checkpoints during training. Additionally, it supports saving a checkpoint at the end of training and provides the necessary support for asynchronous checkpointing. The following is an example of how to instantiate a `ModelCheckpoint` callback: from nemo.lightning.pytorch.callbacks import ModelCheckpoint checkpoint_callback = ModelCheckpoint( save_last=True, monitor="val_loss", save_top_k=2, every_n_train_steps=30, dirpath='my_model_directory', always_save_context=True, ) Refer to the documentation for [NeMo Lightning](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/pytorch/callbacks/model_checkpoint.py) and [PyTorch Lightning’s](https://github.com/Lightning-AI/pytorch-lightning/blob/master/src/lightning/pytorch/callbacks/model_checkpoint.py)`ModelCheckpoint` classes to find the complete list of supported arguments. Here, `dirpath` refers to the directory to save the checkpoints. Note that `dirpath` is optional. If not provided, it will default to `log_dir / checkpoints`, where `log_dir` is the path determined by the `NeMoLogger`, as described in detail in the subsequent section. In addition, note that asynchronous checkpointing is set using the `ckpt_async_save` argument in [MegatronStrategy](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#design-megatron). This attribute is then accessed by the checkpoint callback to perform async checkpointing as requested. Two options are available to pass the `ModelCheckpoint` callback instance to the trainer. 1. Add the callback to the set of callbacks and then pass the callbacks directly to the trainer: import nemo.lightning as nl callbacks = [checkpoint_callback] ### add any other desired callbacks... trainer = nl.Trainer( ... callbacks = callbacks, ... ) 2. Pass the callback to the `NeMoLogger`, as described in the `NeMoLogger` section below. ### Checkpoint Directory Structure[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#checkpoint-directory-structure "Link to this heading") By default, `ModelCheckpoint` in NeMo saves checkpoints with the following structure: log_dir └-checkpoints/ └-model_name=...step=...consumed_samples=.../ ├-context/ | ├-io.json | ├-model.yaml | ├- ... ├-weights/ | ├-common.pt | ├-metadata.json | ├-__0_0.distcp | ├-__1_0.distcp | ├-... The `context` directory contains the artifacts needed to reinitialize the experiment’s model, trainer, and dataloader. It is present only if one of the following conditions is met: 1. `always_save_context` is set to `True` when instantiating `ModelCheckpoint`, or 2. `save_context_on_train_end` is set to `True` and the checkpoint is the final checkpoint of the training run. The configuration of the model checkpoint is saved in `io.json` and displayed as a human-readable file in `model.yaml`. `io.json` is the source of truth for model configuration; modifying `model.yaml` has no effect when loading the model. The `weights` directory consists primarily of `.distcp` files which store the distributed checkpoint. By default, there are two `.distcp` files per rank. 
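To make the asynchronous checkpointing note above concrete, here is a minimal sketch (with placeholder parallelism and metric values) that pairs a `ModelCheckpoint` callback with `ckpt_async_save=True` on `MegatronStrategy`:

    import nemo.lightning as nl
    from nemo.lightning.pytorch.callbacks import ModelCheckpoint

    # ModelCheckpoint decides *when* to save; the strategy decides *how* the write happens.
    checkpoint_callback = ModelCheckpoint(
        save_last=True,
        monitor="val_loss",          # placeholder metric
        save_top_k=2,
        every_n_train_steps=30,
    )

    strategy = nl.MegatronStrategy(
        tensor_model_parallel_size=1,  # placeholder parallelism
        ckpt_async_save=True,          # checkpoint writes are performed asynchronously
    )

    trainer = nl.Trainer(
        accelerator="gpu",
        devices=8,
        strategy=strategy,
        callbacks=[checkpoint_callback],
    )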
NeMoLogger[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#nemologger "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------- The `NeMoLogger` class provides a standardized way to set up logging for NeMo experiments. It creates a new log directory (or reuses an existing one), manages experiment names and versions (optionally using timestamps), and can configure multiple loggers (e.g., TensorBoard and WandB). It also handles copying important files (like configurations) and manages checkpoint settings, ensuring all experiment artifacts are consistently organized. Please refer to the [NeMoLogger documentation](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/nemo_logger.py) for details on all supported arguments. Here is an example of creating a new `NeMoLogger` instance: from nemo.lightning import NeMoLogger nemo_logger = NeMoLogger( log_dir='my_logging_dir', name='experiment1', use_datetime_version=False, ) By default, the directory where logs are written is `log_dir / name / version`. If an explicit version is not provided and `use_datetime_version` is False, the directory will change to `log_dir / name`. As mentioned earlier, you can optionally pass your `ModelCheckpoint` instance in here, and the logger will automatically configure the checkpoint callback in your trainer: nemo_logger = NeMoLogger( ... ckpt=checkpoint_callback, ... ) Once your trainer has been initialized, the `NeMoLogger` can be set up using the following command: nemo_logger.setup( trainer, resume_if_exists, ) The `resume_if_exists` boolean indicates whether to resume from the latest checkpoint, if one is available. The value of `resume_if_exists` should match the value passed into `AutoResume`, as described below. Experiment Logging[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#experiment-logging "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------- NeMo 2.0 provides built-in support for logging experiments using popular tracking tools like TensorBoard and Weights & Biases (wandb). ### TensorBoard Logging[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#tensorboard-logging "Link to this heading") To use TensorBoard logging with NeMo 2.0: 1. First, ensure you have TensorBoard installed: pip install tensorboard 2. Configure the TensorBoardLogger and add it to your NeMoLogger: from lightning.pytorch.loggers import TensorBoardLogger # Create TensorBoard logger tensorboard = TensorBoardLogger( save_dir="tb_logs", # Directory to store TensorBoard logs name="my_model", # Name of the experiment version=None, # Optional version number ) # Add TensorBoard logger to NeMoLogger nemo_logger = NeMoLogger( tensorboard=tensorboard, # Pass TensorBoard logger here ... ) In this example, The TensorBoard logs will be saved in the directory `tb_logs` as a subdirectory of the `my_model` experiment dir. The `update_logger_directory` argument in `NeMoLogger` controls whether to update the directory of the TensorBoard logger to match the NeMo log dir. If set to `True`, the TensorBoard logger will also write to the same log directory. 
### Weights & Biases (wandb) Logging[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#weights-biases-wandb-logging "Link to this heading") To use Weights & Biases (wandb) logging with NeMo 2.0: 1. First, ensure you have wandb installed: pip install wandb 2. Configure the WandbLogger and add it to your NeMoLogger: from lightning.pytorch.loggers import WandbLogger # Create Wandb logger wandb_logger = WandbLogger( project="my_project", # Name of the W&B project name="my_experiment", # Name of this specific run entity="my_team", # Optional: username or team name config={}, # Optional: dictionary of hyperparameters ) # Add Wandb logger to NeMoLogger nemo_logger = NeMoLogger( wandb=wandb_logger, # Pass Wandb logger here ... ) The Weights & Biases logs will be automatically synced to your wandb account under the specified `project` and `name`. You can view your experiment metrics, system stats, and model artifacts through the wandb web interface. Just as with the TensorBoard logger, the `update_logger_directory` argument in `NeMoLogger` controls whether to update the directory of the wandb logger to match the NeMo log dir. If set to `True`, the wandb logger will also write to the same log directory. AutoResume[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#autoresume "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------- The `AutoResume` class manages checkpoint paths and checks for existing checkpoints to restore from. Here’s an example of how it can be used: from nemo.lightning import AutoResume resume = AutoResume( resume_if_exists=True, resume_ignore_no_checkpoint=True, resume_from_directory="checkpoint_dir_to_resume_from" ) In the script, `resume_from_directory` refers to the path of the checkpoint directory to resume from. If no `resume_from_directory` is provided, the directory to resume from will default to `log_dir / checkpoints`, where `log_dir` is determined by the `NemoLogger` instance as described in the previous section. The `resume_ignore_no_checkpoint` boolean determines whether to proceed without error if `resume_if_exists` is set to `True` and no checkpoint is found in the checkpointing directory. Ensure that the value of `resume_if_exists` matches the argument passed into the `NemoLogger` instance. `AutoResume` should be set up in a similar fashion to `NeMoLogger`. resume.setup(trainer, model) Passing a model into the setup is optional. It is only required when importing a checkpoint from Hugging Face or other non-NeMo checkpoint formats. 
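As a hedged illustration of that last point, the sketch below resumes from a checkpoint that was previously imported from Hugging Face by pointing a `restore_config` at it and passing the model into `setup`. The `RestoreConfig` import location and the `nemo://` path format are assumptions for illustration, and `trainer` and `model` are assumed to be set up as in the earlier examples.

    from nemo.lightning import AutoResume, RestoreConfig  # RestoreConfig location is an assumption

    resume = AutoResume(
        resume_if_exists=True,
        resume_ignore_no_checkpoint=True,
        # Assumed path format: a checkpoint previously created with llm.import_ckpt
        restore_config=RestoreConfig(path="nemo://mistralai/Mixtral-8x7B-v0.1"),
    )

    # Passing the model is needed here because the weights come from an imported,
    # non-NeMo checkpoint rather than from a checkpoint written by this training run.
    resume.setup(trainer, model)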
Putting it All Together[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#putting-it-all-together "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------ To put it all together, configuring loggers and checkpointers in NeMo 2.0 looks like this: from lightning.pytorch.loggers import TensorBoardLogger from lightning.pytorch.loggers import WandbLogger checkpoint_callback = ModelCheckpoint( save_last=True, monitor="reduced_train_loss", save_top_k=2, every_n_train_steps=30, dirpath='my_model_directory', ) tensorboard = TensorBoardLogger( save_dir="tb_logs", name="experiment1", ) wandb_logger = WandbLogger( project="my_project", name="my_experiment", entity="my_team", ) logger = nemo_logger = NeMoLogger( log_dir='my_logging_dir', name='experiment1', use_datetime_version=False, ckpt=checkpoint_callback, tensorboard=tensorboard, wandb=wandb_logger, update_logger_directory=True, ) resume = AutoResume( resume_if_exists=True, resume_ignore_no_checkpoint=True, ) ### setup your trainer here ### nemo_logger.setup( trainer, getattr(resume, "resume_if_exists", False), ) resume.setup(trainer) Note that using both `TensorBoardLogger` and `WandbLogger` at the same time is possible, as shown here, but uncommon. This example is mainly for demonstration purposes, so please adapt it to your needs. Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#putting-it-all-together) - [NeMo Lightning](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/pytorch/callbacks/model_checkpoint.py) - [PyTorch Lightning’s](https://github.com/Lightning-AI/pytorch-lightning/blob/master/src/lightning/pytorch/callbacks/model_checkpoint.py) - [MegatronStrategy](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#design-megatron) - [NeMoLogger documentation](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/nemo_logger.py) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md Title: The Link Between Lightning and Megatron Core# URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html Published Time: Thu, 30 Oct 2025 07:07:29 GMT Markdown Content: In PyTorch Lightning, a Strategy is responsible for managing the distributed execution of a model during training, validation, and testing. Strategies typically wrap the user-defined model with a class that can handle distributed execution. For instance, the standard DDPStrategy (Distributed Data Parallel Strategy) wraps the model with PyTorch’s DistributedDataParallel class. This wrapper handles the distribution of data across multiple GPUs or nodes, synchronizes gradients during the backward pass, and ensures that model parameters remain consistent across all processes. Strategies in Lightning abstract away much of the complexity of distributed training, allowing users to focus on their model architecture and training logic while the framework handles the intricacies of distributed execution. 
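For contrast with the Megatron-specific strategy described next, this is roughly what the standard data-parallel setup mentioned above looks like in plain PyTorch Lightning; it is a baseline sketch only and not part of the NeMo API:

    import lightning.pytorch as pl
    from lightning.pytorch.strategies import DDPStrategy

    # Standard Lightning data parallelism: the full model is replicated on every GPU
    # and wrapped in torch.nn.parallel.DistributedDataParallel by the strategy.
    trainer = pl.Trainer(
        strategy=DDPStrategy(),
        accelerator="gpu",
        devices=8,
    )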
MegatronStrategy[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#megatronstrategy "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------- The MegatronStrategy is a PyTorch Lightning strategy that enables distributed training of large language models using NVIDIA’s Megatron Core library. It’s designed to handle models that exceed the memory capacity of a single GPU by implementing various forms of model parallelism. To use the MegatronStrategy, you initialize it with parameters that define the parallelism setup: from nemo import lightning as nl strategy = nl.MegatronStrategy( tensor_model_parallel_size=2, pipeline_model_parallel_size=2, virtual_pipeline_model_parallel_size=None, context_parallel_size=1, sequence_parallel=False, expert_model_parallel_size=1, ) These parameters determine how the model will be split across available GPUs. The strategy then sets up the necessary distributed environment, initializing process groups for each type of parallelism. The strategy is also responsible for configuring the checkpoint IO interface that handles saving and loading checkpoints. For a full list of options that can be configured via MegatronStrategy, refer to the [documentation](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/pytorch/strategies.py). When you create your PyTorch Lightning Trainer, you pass this strategy: trainer = nl.Trainer(strategy=strategy, devices=8, accelerator="gpu") The MegatronStrategy utilizes Megatron’s distributed checkpointing system for model I/O. This system efficiently manages checkpoints for models partitioned across multiple GPUs, maintaining consistency across various parallelism configurations. It enables correct model reconstruction even when GPU setups differ between saving and loading. The MegatronStrategy wraps the user-defined training_step, validation_step, and test_step methods to make them compatible with Megatron’s forward-backward pass implementation. This wrapping process allows these steps to be executed within the context of Megatron’s distributed execution framework, ensuring that all forms of parallelism are properly handled during each phase of the training loop. By doing this, the strategy maintains the familiar PyTorch Lightning interface for users while seamlessly integrating the complex distributed operations required for large-scale model training. The `MegatronStrategy` employs the `MegatronParallel` class to manage the distributed execution of the user-defined model. This class breaks down the execution process into three key steps: 1. Data Step: Prepares and distributes the input data across the model parallel groups. 2. Forward Step: Executes the forward pass across the partitioned model. 3. Loss Reduction: Computes and reduces the loss across the distributed setup. MegatronParallel utilizes these steps to perform the forward-backward pass, which is derived from the user-defined `training_step`, `validation_step`, and `test_step` methods. It orchestrates the flow of data and gradients through the partitioned model, manages inter-GPU communication, and ensures proper gradient synchronization. This approach enables efficient execution across multiple GPUs while preserving the logical structure of the user’s Lightning module. 
MegatronParallel[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#megatronparallel "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------- The `MegatronParallel` class is the core component that implements distributed model parallelism in the Megatron Core library. It manages the execution of the model across multiple GPUs, breaking down the process into three key steps: 1. Data Step: This step prepares and distributes the input data across the model parallel groups. For the GPT model, it uses the `gpt_data_step` function: def data_step(self, dataloader_iter): return gpt_data_step(dataloader_iter) This function handles: 1. Fetching a batch from the dataloader 2. Moving required tensors to CUDA 3. Slicing the batch for context parallelism using `get_batch_on_this_context_parallel_rank` 4. Preparing packed sequence parameters if necessary 2. Forward Step: This step executes the forward pass across the partitioned model. For the GPT model, it uses the `gpt_forward_step` function: def forward_step(self, model, batch): return gpt_forward_step(model, batch) This function: 1. Prepares the forward arguments from the batch 2. Calls the model’s forward method with these arguments 3. Handles both standard and packed sequence inputs 3. Loss Reduction: After the forward pass, this step computes and reduces the loss across the distributed setup. The GPT model uses `MaskedTokenLossReduction`: def loss_reduction(self, model): return model.training_loss_reduction() For validation: def validation_loss_reduction(self, model): return model.validation_loss_reduction() These methods handle: 1. Calculating the loss using masked token loss 2. Reducing the loss across data parallel groups 3. Handling special cases for validation (e.g., not dropping the last batch) The `MegatronParallel` class orchestrates these steps to perform the complete forward-backward pass. By using these model-specific functions, `MegatronParallel` allows the GPT model to define its own data processing, forward pass, and loss calculation logic while still benefiting from the distributed execution framework. This approach enables researchers and engineers to work with large language models using familiar PyTorch Lightning interfaces, while the underlying distributed execution is handled transparently. MegatronMixedPrecision[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#megatronmixedprecision "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------- The MegatronMixedPrecision class is a specialized precision plugin for Megatron Core library models in PyTorch Lightning. It extends the standard MixedPrecision plugin to handle the specific requirements of large language models trained with Megatron Core library. 
from nemo import lightning as nl precision = nl.MegatronMixedPrecision(precision="bf16-mixed") trainer = nl.Trainer(strategy=strategy, plugins=precision) Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#megatronmixedprecision) - [documentation](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/pytorch/strategies.py) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md Title: NeMo 2.0 — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html Published Time: Thu, 30 Oct 2025 07:07:29 GMT Markdown Content: NeMo 2.0[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#nemo-2-0 "Link to this heading") ------------------------------------------------------------------------------------------------------------------------- In NeMo 1.0, the main interface for configuring experiments is through YAML files. This approach allows for a declarative way to set up experiments, but it has limitations in terms of flexibility and programmatic control. NeMo 2.0 shifts to a Python-based configuration, which offers several advantages: * More flexibility and control over the configuration. * Better integration with IDEs for code completion and type checking. * Easier to extend and customize configurations programmatically. By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 makes it easy for users to adapt the framework to their specific use cases and experiment with various configurations. This section offers an overview of the new features in NeMo 2.0 and includes a migration guide with step-by-step instructions for transitioning your models from NeMo 1.0 to NeMo 2.0. Install NeMo 2.0[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#install-nemo-2-0 "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------- NeMo 2.0 installation instructions can be found in the [Getting Started guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html.md#nemo-2-quickstart-nemo-run). Quickstart[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#quickstart "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------- Important In any script you write, please make sure you wrap your code in an `if __name__ == "__main__":` block. See [Working with scripts in NeMo 2.0](https://docs.nvidia.com/nemo-framework/user-guide/latest/best-practices.html.md#main-block-best-practice) for details. The following is an example of running a simple training loop using NeMo 2.0. This example uses the [train API](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/api.py) from the NeMo Framework LLM collection. Once you have set up your environment using the instructions above, you’re ready to run this simple train script. 
import torch from nemo import lightning as nl from nemo.collections import llm from megatron.core.optimizer import OptimizerConfig if __name__ == "__main__": seq_length = 2048 global_batch_size = 16 ## setup the dummy dataset data = llm.MockDataModule(seq_length=seq_length, global_batch_size=global_batch_size) ## initialize a small GPT model gpt_config = llm.GPTConfig( num_layers=6, hidden_size=384, ffn_hidden_size=1536, num_attention_heads=6, seq_length=seq_length, init_method_std=0.023, hidden_dropout=0.1, attention_dropout=0.1, layernorm_epsilon=1e-5, make_vocab_size_divisible_by=128, ) model = llm.GPTModel(gpt_config, tokenizer=data.tokenizer) ## initialize the strategy strategy = nl.MegatronStrategy( tensor_model_parallel_size=1, pipeline_model_parallel_size=1, pipeline_dtype=torch.bfloat16, ) ## setup the optimizer opt_config = OptimizerConfig( optimizer='adam', lr=6e-4, bf16=True, ) opt = nl.MegatronOptimizerModule(config=opt_config) trainer = nl.Trainer( devices=1, ## you can change the number of devices to suit your setup max_steps=50, accelerator="gpu", strategy=strategy, plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"), ) nemo_logger = nl.NeMoLogger( log_dir="test_logdir", ## logs and checkpoints will be written here ) llm.train( model=model, data=data, trainer=trainer, log=nemo_logger, tokenizer='data', optim=opt, ) CLI Quickstart[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#cli-quickstart "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------- NeMo comes equipped with a CLI that allows you to launch experiments locally or on a remote cluster. Every command has a help flag that you can use to get more information about the command. To list all the commands inside the llm-collection, you can use the following command: $ nemo llm --help Usage: nemo llm [OPTIONS] COMMAND [ARGS]... [Module] llm ╭─ Options ────────────────────────────────────────────────────────────────╮ │ --help Show this message and exit. │ ╰──────────────────────────────────────────────────────────────────────────╯ ╭─ Commands ───────────────────────────────────────────────────────────────╮ │ train [Entrypoint] train │ │ pretrain [Entrypoint] pretrain │ │ finetune [Entrypoint] finetune │ │ validate [Entrypoint] validate │ │ prune [Entrypoint] prune │ │ distill [Entrypoint] distill │ │ ptq [Entrypoint] ptq │ │ deploy [Entrypoint] deploy │ │ import [Entrypoint] import │ │ export [Entrypoint] export │ │ generate [Entrypoint] generate │ ╰──────────────────────────────────────────────────────────────────────────╯ Most commands come with various pre-configured recipes. To list all the recipes for a given command, you can use the following command: $ nemo llm finetune --help Usage: nemo llm finetune [OPTIONS] [ARGUMENTS] [Entrypoint] finetune Finetunes a model using the specified data and trainer, with optional logging, resuming, and PEFT. 
╭─ Pre-loaded entrypoint factories, run with --factory ──────────────────────────────────╮ │ baichuan2_7b nemo.collections.llm.recipes.baichuan2_7b.fi… line 236 │ │ chatglm3_6b nemo.collections.llm.recipes.chatglm3_6b.fin… line 236 │ │ deepseek_v2 nemo.collections.llm.recipes.deepseek_v2.fin… line 108 │ │ deepseek_v2_lite nemo.collections.llm.recipes.deepseek_v2_lit… line 107 │ │ gemma2_2b nemo.collections.llm.recipes.gemma2_2b.finet… line 173 │ │ gemma2_9b nemo.collections.llm.recipes.gemma2_9b.finet… line 173 │ │ llama2_7b nemo.collections.llm.recipes.llama2_7b.finet… line 230 │ │ llama3_8b nemo.collections.llm.recipes.llama3_8b.finet… line 245 │ │ mixtral_8x7b nemo.collections.llm.recipes.mixtral_8x7b.fi… line 240 │ │ nemotron3_8b nemo.collections.llm.recipes.nemotron3_8b.fi… line 253 │ │ nemotron4_15b nemo.collections.llm.recipes.nemotron4_15b.f… line 227 │ │ ... (output truncated) │ ╰────────────────────────────────────────────────────────────────────────────────────────╯ You can also use the `--factory` flag to run a specific recipe. For example, to run the `llama32_1b` recipe, you can use the following command: $ nemo llm finetune --factory llama32_1b NeMo CLI supports overriding any configuration parameter using Hydra-style dot notation. This powerful feature allows you to customize any aspect of the recipe without modifying the source code. For example, to change the number of GPUs used for training from the default to just 1 device: $ nemo llm finetune --factory llama32_1b trainer.devices=1 Configuring global options Dry run for task nemo.collections.llm.api:finetune Resolved Arguments ┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Argument Name ┃ Resolved Value ┃ ┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ trainer │ Trainer( │ │ │ ... │ │ │ devices='1', │ │ │ ... │ └──────────────────────┴──────────────────────────────────────────────────────────────┘ Continue? [y/N]: This syntax follows the pattern `component.parameter=value`, allowing you to navigate nested configurations. You can override multiple parameters at once by adding more space-separated overrides: $ nemo llm finetune --factory llama32_1b trainer.devices=1 trainer.max_steps=500 optim.config.lr=5e-5 The command prints a preview of the resolved configuration values so you can verify your changes before starting the training run. NeMo 2.0 also seamlessly supports scaling to thousands of GPUs using [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). For examples of launching large-scale experiments using NeMo-Run, refer to [Quickstart with NeMo-Run](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html.md#nemo-2-quickstart-nemo-run). Note If you are an existing user of NeMo 1.0 and would like to use a NeMo 1.0 dataset in place of the `MockDataModule` in the example, refer to the [data migration guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/data.html.md#migration-data) for instructions. 
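The Hydra-style dot-notation overrides shown above have a direct Python equivalent when you configure recipes with NeMo-Run: each `component.parameter=value` on the CLI corresponds to an attribute assignment on the configured recipe object. The sketch below is illustrative; the recipe name and values are placeholders, and the same pattern applies to any recipe in `nemo.collections.llm.recipes`.

    import nemo_run as run
    from nemo.collections import llm

    # Placeholder recipe; any finetuning recipe follows the same pattern.
    recipe = llm.llama3_8b.finetune_recipe(
        name="llama3_8b_finetuning",
        dir="/path/to/checkpoints",
        num_nodes=1,
        num_gpus_per_node=8,
    )

    # Python equivalents of the CLI overrides
    # `trainer.devices=1 trainer.max_steps=500 optim.config.lr=5e-5`:
    recipe.trainer.devices = 1
    recipe.trainer.max_steps = 500
    recipe.optim.config.lr = 5e-5

    run.run(recipe, executor=run.LocalExecutor())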
Extend Quickstart with NeMo-Run[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#extend-quickstart-with-nemo-run "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- While [Quickstart with NeMo-Run](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html.md#nemo-2-quickstart-nemo-run) covers how to configure your NeMo 2.0 experiment using NeMo-Run, it is not mandatory to use the configuration system from NeMo-Run. In fact, you can take the Python script from the [Quickstart](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#nemo-2-quickstart-python) above and launch it on remote clusters directly using NeMo-Run. For more details about NeMo-Run, refer to [NeMo-Run Github](https://github.com/NVIDIA/NeMo-Run) and the [hello_scripts example](https://github.com/NVIDIA/NeMo-Run/blob/main/examples/hello-world/hello_scripts.py). Below, we will walk through how to do this. ### Prerequisites[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#prerequisites "Link to this heading") 1. Save the script above as `train.py` in your working directory. 2. Install NeMo-Run using the following command: pip install git+https://github.com/NVIDIA/NeMo-Run.git Let’s assume that you have the above script saved as `train.py` in your current working directory. ### Launch the Experiment Locally[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#launch-the-experiment-locally "Link to this heading") Locally here means from your local workstation. It can be a `venv` in your workstation or an interactive NeMo Docker container. 1. Write a new file called `run.py` with the following contents: import os import nemo_run as run if __name__ == "__main__": training_job = run.Script( inline=""" # This string will get saved to a sh file and executed with bash # Run any preprocessing commands # Run the training command python train.py # Run any post processing commands """ ) # Run it locally executor = run.LocalExecutor() with run.Experiment("nemo_2.0_training_experiment", log_level="INFO") as exp: exp.add(training_job, executor=executor, tail_logs=True, name="training") # Add more jobs as needed # Run the experiment exp.run(detach=False) 1. Launch the experiment using the following command: python run.py ### Launch the Experiment on Slurm[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#launch-the-experiment-on-slurm "Link to this heading") Writing an extra script to just launch locally is not very useful. So let’s see how we can extend `run.py` to launch the job on any supported [NeMo-Run executors](https://github.com/NVIDIA/NeMo-Run/blob/main/docs/source/guides/execution.md). For this tutorial, we will use the slurm executor. Note Each cluster might have different settings. It is recommended that you reach out to the cluster administrators for specific details. 1. 
Define a function to configure your slurm executor as follows: def slurm_executor( user: str, host: str, remote_job_dir: str, account: str, partition: str, nodes: int, devices: int, time: str = "01:00:00", custom_mounts: Optional[list[str]] = None, custom_env_vars: Optional[dict[str, str]] = None, container_image: str = "nvcr.io/nvidia/nemo:dev", retries: int = 0, ) -> run.SlurmExecutor: if not (user and host and remote_job_dir and account and partition and nodes and devices): raise RuntimeError( "Please set user, host, remote_job_dir, account, partition, nodes, and devices args for using this function." ) mounts = [] # Custom mounts are defined here. if custom_mounts: mounts.extend(custom_mounts) # Env vars for jobs are configured here env_vars = { "TORCH_NCCL_AVOID_RECORD_STREAMS": "1", "NCCL_NVLS_ENABLE": "0", "NVTE_DP_AMAX_REDUCE_INTERVAL": "0", "NVTE_ASYNC_AMAX_REDUCTION": "1", } if custom_env_vars: env_vars |= custom_env_vars # This will package the train.py script in the current working directory to the remote cluster. # If you are inside a git repo, you can also use https://github.com/NVIDIA/NeMo-Run/blob/main/src/nemo_run/core/packaging/git.py. # If the script already exists on your container and you call it with the absolute path, you can also just use `run.Packager()`. packager = run.PatternPackager(include_pattern="train.py", relative_path=os.getcwd()) # This defines the slurm executor. # We connect to the executor via the tunnel defined by user, host and remote_job_dir. executor = run.SlurmExecutor( account=account, partition=partition, tunnel=run.SSHTunnel( user=user, host=host, job_dir=remote_job_dir, # This is where the results of the run will be stored by default. # identity="/path/to/identity/file" OPTIONAL: Provide path to the private key that can be used to establish the SSH connection without entering your password. ), nodes=nodes, ntasks_per_node=devices, gpus_per_node=devices, mem="0", exclusive=True, gres="gpu:8", packager=packager, ) executor.container_image = container_image executor.container_mounts = mounts executor.env_vars = env_vars executor.retries = retries executor.time = time return executor 1. Replace the executor in `run.py` as follows: executor = slurm_executor(...) # pass in args relevant to your cluster 1. Run the file with the same command and it will launch your job on the cluster. Similarly, you can define multiple slurm executors for multiple Slurm clusters and use them interchangeably, or use any of the supported executors in NeMo-Run. Where to Find NeMo 2.0[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#where-to-find-nemo-2-0 "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------- Currently, the code for NeMo 2.0 can be found in two main locations within the [NeMo GitHub](https://github.com/NVIDIA/NeMo/tree/main) repository: 1. [LLM collection](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm): This is the first collection to adopt the NeMo 2.0 APIs. This collection provides implementations of common language models using NeMo 2.0. 
Currently, the collection supports the following models: > * GPT > > * [Llama](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/llama3.html.md#llama) > > * [Mixtral](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md#mixtral) > > * [Nemotron](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron.html.md#nemotron) > > * [Mamba2 and Hybrid Models](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mamba.html.md#mamba) > > * [T5](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md#t5) 2. [NeMo 2.0 LLM Recipes](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes): Provides comprehensive recipes for pretraining and fine-tuning large language models. Recipes can be easily configured and modified for specific use-cases with the help of [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). 3. [NeMo Lightning](https://github.com/NVIDIA/NeMo/tree/main/nemo/lightning): Provides custom PyTorch Lightning-compatible objects that make it possible to train Megatron Core-based models using PTL in a modular fashion. NeMo 2.0 employs these objects to train models in a simple and efficient manner. Pretraining, Supervised Fine-Tuning (SFT), and Parameter-Efficient Fine-Tuning (PEFT) are all supported by the LLM collection. More information about each model can be found in the model-specific documentation linked above. Long context recipes are also supported with the help of context parallelism. For more information on the available long context recipes, refer to the [long context documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/longcontext/index.html.md#long-context-recipes). Inference via TensorRT-LLM is supported in NeMo 2.0. For more information, refer to the TRT-LLM deployment documentation. Additional Resources[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#additional-resources "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------- * The [Feature Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/index.html.md#nemo-2-design) provides an in-depth exploration of the main features of NeMo 2.0. Refer to this guide for information on: > * [The interaction between NeMo Lightning and Megatron](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/megatron.html.md#design-megatron) > > * [Logging and checkpointing](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/logging.html.md#design-logging) > > * [Parameter-efficient fine-tuning](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/peft.html.md#design-peft) > > * [Serialization](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/serialization.html.md#design-serialization) > > * [Hugging Face integration](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/hf-integration.html.md#design-hf) * For users familiar with NeMo 1.0, the [Migration Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/index.html.md#nemo-2-migration) explains how to migrate your experiments from NeMo 1.0 to NeMo 2.0. To convert your existing NeMo 1.0 checkpoint to NeMo 2.0, follow the guide [here](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md#migration-checkpointing).
* [NeMo 2.0 Recipes](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes) contains additional examples of launching large-scale runs using NeMo 2.0 and NeMo-Run.

--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md Title: Migrate Checkpointing Configurations from NeMo 1.0 to NeMo 2.0# URL Source:
https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html Published Time: Thu, 30 Oct 2025 07:07:29 GMT Markdown Content: NeMo 1.0 (Previous Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md#nemo-1-0-previous-release "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 1.0, [distributed checkpointing](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/checkpoints/dist_ckpt.html.md?highlight=distributed%2520checkpointing#) was configured in the YAML configuration file. # Distributed checkpoint setup dist_ckpt_format: 'zarr' # Set to 'torch_dist' to use PyTorch distributed checkpoint format. dist_ckpt_load_on_device: True # whether to load checkpoint weights directly on GPU or to CPU dist_ckpt_parallel_save: False # if true, each worker will write its own part of the dist checkpoint dist_ckpt_parallel_load: False # if true, each worker will load part of the dist checkpoint and exchange with NCCL. Might use some extra GPU memory dist_ckpt_torch_dist_multiproc: 2 # number of extra processes per rank used during ckpt save with PyTorch distributed format dist_ckpt_assume_constant_structure: False # set to True only if the state dict structure doesn't change within a single job. Allows caching some computation across checkpoint saves. dist_ckpt_parallel_dist_opt: True # parallel save/load of a DistributedOptimizer. 'True' allows performant save and reshardable checkpoints. Set to 'False' only in order to minimize the number of checkpoint files. NeMo 2.0 (New Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md#nemo-2-0-new-release "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 2.0, these settings are controlled from the `MegatronStrategy`. from nemo.collections import llm from nemo import lightning as nl strategy = nl.MegatronStrategy( save_ckpt_format='zarr', ckpt_load_on_device=True, ckpt_parallel_save=False, ckpt_parallel_load=False, ckpt_assume_constant_structure=False, ckpt_parallel_save_optim=False, ) nl.Trainer( strategy=strategy, ... ) Migrate Distributed Checkpoint Setup Settings[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md#migrate-distributed-checkpoint-setup-settings "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1. Locate the [distributed checkpoint setup](https://github.com/NVIDIA/NeMo/blob/00fe96f01baff193418e3d71e78acf3748907b6e/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L176-L185) section in your NeMo 1.0 YAML config file. 2. Pass the `distributed checkpoint setup` settings into `MegatronStrategy`: strategy = nl.MegatronStrategy( save_ckpt_format='zarr', ckpt_load_on_device=True, ckpt_parallel_save=False, ckpt_parallel_load=False, ckpt_torch_dist_multiproc=2, ckpt_assume_constant_structure=False, ckpt_parallel_save_optim=False, ) Note Non-distributed checkpointing is not supported by NeMo 2.0. 
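For example, a NeMo 1.0 config that used `dist_ckpt_format: 'torch_dist'` with parallel save and load enabled maps onto the strategy arguments listed above. The following is a minimal sketch of that mapping; it uses only the arguments shown in this section, and the `Trainer` arguments other than `strategy` are generic Lightning settings included just to make the snippet self-contained.

```python
from nemo import lightning as nl

# Sketch: PyTorch distributed checkpoint format with parallel save/load.
# Each ckpt_* argument mirrors the corresponding NeMo 1.0 dist_ckpt_* YAML key.
strategy = nl.MegatronStrategy(
    save_ckpt_format="torch_dist",         # dist_ckpt_format
    ckpt_load_on_device=True,              # dist_ckpt_load_on_device
    ckpt_parallel_save=True,               # dist_ckpt_parallel_save
    ckpt_parallel_load=True,               # dist_ckpt_parallel_load
    ckpt_torch_dist_multiproc=2,           # dist_ckpt_torch_dist_multiproc
    ckpt_assume_constant_structure=False,  # dist_ckpt_assume_constant_structure
    ckpt_parallel_save_optim=True,         # dist_ckpt_parallel_dist_opt
)

trainer = nl.Trainer(
    accelerator="gpu",
    devices=8,
    num_nodes=1,
    strategy=strategy,
)
```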
Convert NeMo 1.0 Checkpoint to NeMo 2.0 Checkpoint[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md#convert-nemo-1-0-checkpoint-to-nemo-2-0-checkpoint "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

We provide a script to convert NeMo 1.0 checkpoints to NeMo 2.0 checkpoints. The script is available at `scripts/checkpoint_converters/convert_nemo1_to_nemo2.py`.

1. The NeMo 1.0 checkpoint is in the `model_name.nemo` tarball format. After you extract the tarball, you will see the following structure:

    ```
    model_name.nemo
    ├── model_config.yaml
    ├── model_weights
    │   ├── distributed checkpointing directories/files in zarr or torch_dist format
    │   ├── metadata.json
    │   └── common.pt
    ```

2. The NeMo 2.0 checkpoint is a directory with the following structure. The content in the `context` directory may be optional:

    ```
    model_name
    ├── context
    │   ├── model_config.yaml
    │   ├── io.json
    │   └── tokenizer
    ├── weights
    │   ├── distributed checkpointing directories/files in zarr or torch_dist format
    │   ├── metadata.json
    │   └── common.pt
    ```

3. The conversion script needs the NeMo 1.0 weights, the NeMo 1.0 model ID (which specifies the model structure and configuration), and the NeMo 1.0 tokenizer information (either a SentencePiece `tokenizer.model` or a Hugging Face tokenizer ID) to convert the checkpoint to the NeMo 2.0 format. The script creates a new directory with the NeMo 2.0 checkpoint structure. The script runs on CPU and uses CPU memory for the conversion.

    When your NeMo 1.0 checkpoint uses a Hugging Face tokenizer, the conversion script will download the tokenizer from Hugging Face. If the tokenizer comes from a gated repo, you will need to first log in to Hugging Face:

    ```
    huggingface-cli login
    ```

    Currently, only SentencePiece and Hugging Face tokenizers are supported. The following commands should be used inside a NeMo container.

    * You can pass the `model_name.nemo` tarball, which contains weights and tokenizer info, to the script. We take `meta-llama/Meta-Llama-3-8B` as an example:

        ```
        python /opt/NeMo/scripts/checkpoint_converters/convert_nemo1_to_nemo2.py \
            --input_path=Meta-Llama-3-8B.nemo \
            --output_path=your_output_dir \
            --model_id=meta-llama/Meta-Llama-3-8B
        ```

    * If you have a model weight directory (whose structure is similar to the `model_weights` directory in the NeMo 1.0 checkpoint), you can pass the weight directory to the script. In this case, the script also needs the tokenizer info, since the weights directory does not contain this information. We take `nvidia/nemotron-3-8b-base-4k` as an example:

        ```
        python /opt/NeMo/scripts/checkpoint_converters/convert_nemo1_to_nemo2.py \
            --input_path=nemotron3-8b-extracted/model_weights \
            --tokenizer_path=path_to_your_tokenizer_model.model \
            --tokenizer_library=sentencepiece \
            --output_path=your_output_dir \
            --model_id=nvidia/nemotron-3-8b-base-4k
        ```

4. Supported models: currently, we have validated the conversion for the following models:

    * `meta-llama/Meta-Llama-3-8B`
    * `mistralai/Mixtral-8x7B-v0.1`
    * `nvidia/nemotron-3-8b-base-4k`

    Models of the same family/structure with different sizes should work, but have not been validated. The conversion will only work for models supported by NeMo 2.0. We will add validation for more models in the future. The list of available model IDs can be found in the script `scripts/checkpoint_converters/convert_nemo1_to_nemo2.py`.
The `--model_id` argument should be one of the following: * `meta-llama/Llama-2-7b-hf` * `meta-llama/Llama-2-13b-hf` * `meta-llama/Llama-2-70b-hf` * `meta-llama/Meta-Llama-3-8B` * `meta-llama/Meta-Llama-3-70B` * `mistralai/Mixtral-8x7B-v0.1` * `mistralai/Mixtral-8x22B-v0.1` * `mistralai/Mistral-7B-v0.1` * `nvidia/nemotron-3-8b-base-4k` * `nemotron4-22b` * `nemotron4-15b` * `nemotron4-340b` Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/checkpointing.html.md#convert-nemo-1-0-checkpoint-to-nemo-2-0-checkpoint) - [distributed checkpointing](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/checkpoints/dist_ckpt.html.md?highlight=distributed%2520checkpointing#) - [distributed checkpoint setup](https://github.com/NVIDIA/NeMo/blob/00fe96f01baff193418e3d71e78acf3748907b6e/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L176-L185) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/exp-manager.html.md Title: Migrate exp_manager to NeMoLogger and AutoResume# URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/exp-manager.html Published Time: Thu, 30 Oct 2025 07:07:29 GMT Markdown Content: In NeMo 2.0, the `exp_manager` configuration has been replaced with [NeMoLogger](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/nemo_logger.py) and [AutoResume](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/resume.py) objects. This guide will help you migrate your experiment management setup. NeMo 1.0 (Previous Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/exp-manager.html.md#nemo-1-0-previous-release "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 1.0, experiment management was configured in the YAML configuration file. exp_manager: explicit_log_dir: null exp_dir: null name: megatron_gpt create_wandb_logger: False wandb_logger_kwargs: project: null name: null resume_if_exists: True resume_ignore_no_checkpoint: True resume_from_checkpoint: ${model.resume_from_checkpoint} create_checkpoint_callback: True checkpoint_callback_params: dirpath: null # to use S3 checkpointing, set the dirpath in format s3://bucket/key monitor: val_loss save_top_k: 10 mode: min always_save_nemo: False # saves nemo file during validation, not implemented for model parallel save_nemo_on_train_end: False # not recommended when training large models on clusters with short time limits filename: 'megatron_gpt--{val_loss:.2f}-{step}-{consumed_samples}' model_parallel_size: ${multiply:${model.tensor_model_parallel_size}, ${model.pipeline_model_parallel_size}} async_save: False # Set to True to enable async checkpoint save. Currently works only with distributed checkpoints NeMo 2.0 (New Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/exp-manager.html.md#nemo-2-0-new-release "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 2.0, experiment management is configured using the `NeMoLogger` and `AutoResume` classes. 
```python
from nemo.collections import llm
from nemo import lightning as nl
from pytorch_lightning.loggers import WandbLogger

log = nl.NeMoLogger(
    name="megatron_gpt",
    log_dir=None,  # This will default to ./nemo_experiments
    explicit_log_dir=None,
    version=None,
    use_datetime_version=True,
    log_local_rank_0_only=False,
    log_global_rank_0_only=False,
    files_to_copy=None,
    update_logger_directory=True,
    wandb=WandbLogger(project=None, name=None),
    ckpt=nl.ModelCheckpoint(
        dirpath=None,  # to use S3 checkpointing, set the dirpath in format s3://bucket/key
        monitor="val_loss",
        save_top_k=10,
        mode="min",
        always_save_nemo=False,
        save_nemo_on_train_end=False,
        filename='megatron_gpt--{val_loss:.2f}-{step}-{consumed_samples}',
    )
)

resume = nl.AutoResume(
    path=None,  # Equivalent to resume_from_checkpoint
    dirpath=None,
    import_path=None,
    resume_if_exists=True,
    resume_past_end=False,
    resume_ignore_no_checkpoint=True,
)

llm.train(..., log=log, resume=resume)
```

Additionally, the NeMo 1.0 experiment manager provided the option to add some callbacks to the trainer. In NeMo 2.0, those callbacks can be passed directly to your trainer. Notably, the `TimingCallback` was used in NeMo 1.0 to log step times. To add the `TimingCallback` in NeMo 2.0, add the callback directly to the trainer:

```python
import nemo.lightning as nl
from nemo.utils.exp_manager import TimingCallback

trainer = nl.Trainer(
    ...
    callbacks=[TimingCallback()],
    ...
)
```

Migration Steps[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/exp-manager.html.md#migration-steps "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------

1. Remove the `exp_manager` section from your YAML config file.

2. Add the following imports to your Python script:

    ```python
    from nemo import lightning as nl
    from pytorch_lightning.loggers import WandbLogger
    ```

3. Create a `NeMoLogger` object with the appropriate parameters:

    ```python
    log = nl.NeMoLogger(
        name="megatron_gpt",
        log_dir=None,  # This will default to ./nemo_experiments
        explicit_log_dir=None,
        version=None,
        use_datetime_version=True,
        log_local_rank_0_only=False,
        log_global_rank_0_only=False,
        files_to_copy=None,
        update_logger_directory=True,
        wandb=WandbLogger(project=None, name=None),
        ckpt=nl.ModelCheckpoint(
            dirpath=None,
            monitor="val_loss",
            save_top_k=10,
            mode="min",
            always_save_nemo=False,
            save_nemo_on_train_end=False,
            filename='megatron_gpt--{val_loss:.2f}-{step}-{consumed_samples}',
            async_save=False,
        )
    )
    ```

4. Create an `AutoResume` object with the appropriate parameters:

    ```python
    resume = nl.AutoResume(
        path=None,  # Equivalent to resume_from_checkpoint
        dirpath=None,
        import_path=None,
        resume_if_exists=True,
        resume_past_end=False,
        resume_ignore_no_checkpoint=True,
    )
    ```

5. Add any callbacks you want to the trainer:

    ```python
    import nemo.lightning as nl
    from nemo.lightning.pytorch.callbacks import PreemptionCallback
    from nemo.utils.exp_manager import TimingCallback

    callbacks = [TimingCallback(), PreemptionCallback()]

    trainer = nl.Trainer(
        ...
        callbacks=callbacks,
        ...
    )
    ```

6. Pass the `trainer`, `log`, and `resume` objects to the `llm.train()` function:

    ```python
    llm.train(..., trainer=trainer, log=log, resume=resume)
    ```

7. Adjust the parameters in `NeMoLogger` and `AutoResume` to match your previous YAML configuration.

Note

* The `model_parallel_size` parameter is no longer needed in the checkpoint configuration.
* For S3 checkpointing, set the `dirpath` in the `ModelCheckpoint` to the format `s3://bucket/key`.
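To see how these objects fit together end to end, here is a consolidated sketch of a migrated setup. It is illustrative only: the model and data objects (a small GPT config and NeMo's mock data module) and their arguments are assumptions chosen to keep the example self-contained; substitute your own NeMo 2.0 model, data module, and trainer settings.

```python
from nemo import lightning as nl
from nemo.collections import llm
from nemo.utils.exp_manager import TimingCallback
from pytorch_lightning.loggers import WandbLogger

if __name__ == "__main__":
    # Placeholder model and data for illustration only.
    data = llm.MockDataModule(seq_length=2048, micro_batch_size=1, global_batch_size=8)
    model = llm.GPTModel(llm.GPTConfig126M(), tokenizer=data.tokenizer)

    trainer = nl.Trainer(
        accelerator="gpu",
        devices=8,
        num_nodes=1,
        max_steps=10,                  # keep the sketch short
        strategy=nl.MegatronStrategy(),
        callbacks=[TimingCallback()],  # callbacks go straight on the trainer in NeMo 2.0
    )

    # The logger and resume objects carry the settings that used to live under exp_manager.
    log = nl.NeMoLogger(name="megatron_gpt", wandb=WandbLogger(project=None, name=None))
    resume = nl.AutoResume(resume_if_exists=True, resume_ignore_no_checkpoint=True)

    llm.train(model=model, data=data, trainer=trainer, log=log, resume=resume)
```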
Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/exp-manager.html.md#migration-steps) - [NeMoLogger](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/nemo_logger.py) - [AutoResume](https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/resume.py) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/precision.html.md Title: Migrate Precision Configurations from NeMo 1.0 to NeMo 2.0# URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/precision.html Published Time: Thu, 30 Oct 2025 07:07:29 GMT Markdown Content: In NeMo 2.0, precision configuration has been centralized to the `MegatronMixedPrecision` plugin. NeMo 1.0 (Previous Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/precision.html.md#nemo-1-0-previous-release "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 1.0, various model and training precision settings (including FP8 configuration) are spread throughout the YAML configuration file. trainer: precision: bf16 ... model: native_amp_init_scale: 4294967296 native_amp_growth_interval: 1000 ... fp8: False # enables fp8 in TransformerLayer forward fp8_e4m3: False # sets E4M3 FP8 format fp8_hybrid: False # sets hybrid FP8 format fp8_margin: 0 fp8_amax_history_len: 1024 fp8_amax_compute_algo: max NeMo 2.0 (New Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/precision.html.md#nemo-2-0-new-release "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 2.0, these settings are controlled using the `MegatronMixedPrecision` plugin. from nemo import lightning as nl plugin = nl.MegatronMixedPrecision( precision="bf16", fp16_initial_loss_scale=4294967296, fp16_loss_scale_window=1000, fp8=None, # Can be either "e4m3" or "hybrid" fp8_margin=0, fp8_amax_history_len=1024, fp8_amax_compute_algo="max", ) trainer = nl.Trainer( plugins=plugin, ... ) Migrate Precision Configurations[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/precision.html.md#migrate-precision-configurations "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1. Locate and remove all [precision](https://github.com/NVIDIA/NeMo/blob/00fe96f01baff193418e3d71e78acf3748907b6e/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L11) and [fp8](https://github.com/NVIDIA/NeMo/blob/00fe96f01baff193418e3d71e78acf3748907b6e/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L222-L228) configurations in your NeMo 1.0 YAML config file. 2. Add the following import to your Python script: from nemo import lightning as nl 3. Create a `MegatronMixedPrecision` plugin with the appropriate parameters: plugin = nl.MegatronMixedPrecision( precision="bf16", fp16_initial_loss_scale=4294967296, fp16_loss_scale_window=1000, fp8=None, # Can be either "e4m3" or "hybrid" fp8_margin=0, fp8_amax_history_len=1024, fp8_amax_compute_algo="max", ) 4. Adjust the arguments in the plugin to match your previous YAML configuration. 5. 
Add the precision plugin to your `Trainer` (see [Trainer migration guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/trainer.html.md#migration-trainer)): trainer = nl.Trainer( ... plugins=plugin, ... ) Note * TransformerEngine must be installed to use FP8 precision. Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/precision.html.md#migrate-precision-configurations) - [precision](https://github.com/NVIDIA/NeMo/blob/00fe96f01baff193418e3d71e78acf3748907b6e/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L11) - [fp8](https://github.com/NVIDIA/NeMo/blob/00fe96f01baff193418e3d71e78acf3748907b6e/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L222-L228) - [Trainer migration guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/trainer.html.md#migration-trainer) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/sft.html.md Title: Migrate SFT Training and Inference from NeMo 1.0 to NeMo 2.0# URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/sft.html Published Time: Thu, 30 Oct 2025 07:07:30 GMT Markdown Content: NeMo 1.0 (Previous Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/sft.html.md#nemo-1-0-previous-release "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 1.0, SFT is configured using [megatron_gpt_finetuning_config.yaml](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml), and launched with [megatron_gpt_finetuning.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py), which are both completely separate from the pretraining scripts. Internally, this script instantiates a different model class (`MegatronGPTSFTModel`), even though the only difference from pretraining is the data pipeline. This design has been a point of confusion for many users of NeMo 1.0. NeMo 2.0 (New Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/sft.html.md#nemo-2-0-new-release "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 2.0, the design has been improved to address this problem. The data module and model class are now independent building blocks which can be combined intuitively to improve versatility and minimize redundancy. For SFT, the data module is [FineTuningDataModule](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/gpt/data/fine_tuning.py), and the rest of the pipeline is shared with pretraining. In addition, we provide dataset-specific data modules for the convenience of users to start training without having to worry about data downloading and preprocessing. Supported datasets can be found [here](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/gpt/data/api.py). Warning When using `import_ckpt` in NeMo 2.0, ensure your script includes `if __name__ == "__main__":`. Without this, Python’s multiprocessing won’t initialize threads properly, causing a “Failure to acquire lock” error. 
In NeMo 2.0, a fine-tuning workload can be run like this: from nemo import lightning as nl from nemo.collections import llm if __name__ == "__main__": trainer = nl.Trainer(...) model = llm.LlamaModel(...) ckpt_path = model.import_ckpt("hf://meta-llama/Meta-Llama-3-8B") # Option 1: custom dataset data = llm.FineTuningDataModule(dataset_root, seq_length=2048, micro_batch_size=1, global_batch_size=128, ...) # Option 2: provided dataset data = llm.SquadDataModule(seq_length=2048, micro_batch_size=1, global_batch_size=128, ...) trainer.fit(model, data, ckpt_path=ckpt_path) Using the [llm.finetune](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/api.py) API with NeMo-Run: import nemo_run as run sft = run.Partial( llm.finetune, model=llm.mistral, data=llm.squad, trainer=trainer, log=logger, optim=adam_with_cosine_annealing, ) run.run(sft, name="mistral-sft", direct=True) Migration Steps[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/sft.html.md#migration-steps "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------- 1. Create a `FineTuningDataModule` using arguments from the `data` field in the YAML file. The jsonl files should be processed in the same way as in NeMo 1.0. Alternatively, use one of the provided datasets and have it processed automatically for you. 2. Initialize the trainer, model, optimizer, logger in the same way as pretraining. 3. Instead of `MegatronGPTSFTModel.restore_from`, use `trainer.fit(..., ckpt_path=model.import_ckpt(...))` Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/sft.html.md#migration-steps) - [megatron_gpt_finetuning_config.yaml](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml) - [megatron_gpt_finetuning.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py) - [FineTuningDataModule](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/gpt/data/fine_tuning.py) - [here](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/gpt/data/api.py) - [llm.finetune](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/api.py) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/tokenizer.html.md Title: Tokenizers — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/tokenizer.html Published Time: Fri, 18 Jul 2025 19:26:15 GMT Markdown Content: Tokenizers[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/tokenizer.html.md#tokenizers "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------- NeMo 1.0 (Previous Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/tokenizer.html.md#nemo-1-0-previous-release "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- In NeMo 1.0, tokenizers were configured in the [tokenizer section](https://github.com/NVIDIA/NeMo/blob/54458fa9c1c913b2b0ea80f072b32d011c063e67/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml.md#L130-L137) of the YAML configuration file. 
NeMo 2.0 (New Release)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/tokenizer.html.md#nemo-2-0-new-release "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

In NeMo 2.0, tokenizers can be initialized directly in Python. [get_nmt_tokenizer](https://github.com/NVIDIA/NeMo/blob/54458fa9c1c913b2b0ea80f072b32d011c063e67/nemo/collections/nlp/modules/common/tokenizer_utils.py.md#L148) is a utility function used in NeMo to instantiate many of the common tokenizers used for LLM and multimodal training. For example, the following code constructs a `GPT2BPETokenizer`:

```python
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer

tokenizer = get_nmt_tokenizer(
    library="megatron",
    model_name="GPT2BPETokenizer",
    vocab_file="/path/to/vocab",
    merges_file="/path/to/merges",
)
```

The following constructs a `SentencePiece` tokenizer:

```python
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer

tokenizer = get_nmt_tokenizer(
    library="sentencepiece",
    tokenizer_model='/path/to/sentencepiece/model'
)
```

The following constructs a `Hugging Face` tokenizer:

```python
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer

tokenizer = get_nmt_tokenizer(
    library="huggingface",
    model_name='nvidia/Minitron-4B-Base',
    use_fast=True,
)
```

Refer to the `get_nmt_tokenizer` code for a full list of supported arguments.

To set up the tokenizer using nemo_run, use the following code:

```python
import nemo_run as run
from nemo.collections.common.tokenizers import SentencePieceTokenizer
from nemo.collections.common.tokenizers.huggingface.auto_tokenizer import AutoTokenizer

# Set up a SentencePiece tokenizer
tokenizer = run.Config(SentencePieceTokenizer, model_path="/path/to/tokenizer.model")

# Set up a Hugging Face tokenizer
tokenizer = run.Config(AutoTokenizer, pretrained_model_name="/path/to/tokenizer/model")
```

Refer to the [SentencePieceTokenizer](https://github.com/NVIDIA/NeMo/blob/45f35240a608c295ce199fb50b7336c346099617/nemo/collections/common/tokenizers/sentencepiece_tokenizer.py.md#L35) or [AutoTokenizer](https://github.com/NVIDIA/NeMo/blob/45f35240a608c295ce199fb50b7336c346099617/nemo/collections/common/tokenizers/huggingface/auto_tokenizer.py.md#L28) code for a full list of supported arguments.

To change the tokenizer path for a model recipe, use the following code:

```python
from functools import partial

from nemo.collections import llm

recipe = partial(llm.llama3_8b)()

# Change the path for a Hugging Face tokenizer
recipe.data.tokenizer.pretrained_model_name = "/path/to/tokenizer/model"

# Change the tokenizer path for a SentencePiece tokenizer
recipe.data.tokenizer.model_path = "/path/to/tokenizer.model"
```

Basic NeMo 2.0 recipes can contain predefined tokenizers. Visit [this page](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/mamba2_8b.py.md#L38) to see an example of setting up the tokenizer in the recipe.
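Once constructed, these tokenizers all expose NeMo's common `TokenizerSpec` interface, so they can be exercised the same way regardless of the backing library. The snippet below is a small usage sketch: it reuses the Hugging Face example above and assumes the standard `text_to_ids`/`ids_to_text` methods and `vocab_size` property of that interface.

```python
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer

# Build the tokenizer as in the Hugging Face example above.
tokenizer = get_nmt_tokenizer(
    library="huggingface",
    model_name="nvidia/Minitron-4B-Base",
    use_fast=True,
)

# The same round trip works for the megatron, sentencepiece, and huggingface backends,
# since they all implement the TokenizerSpec interface.
text = "NeMo 2.0 configures tokenizers directly in Python."
ids = tokenizer.text_to_ids(text)
print(f"vocab size: {tokenizer.vocab_size}")
print(f"token ids:  {ids}")
print(f"round trip: {tokenizer.ids_to_text(ids)}")
```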
Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/tokenizer.html.md#nemo-2-0-new-release) - [tokenizer section](https://github.com/NVIDIA/NeMo/blob/54458fa9c1c913b2b0ea80f072b32d011c063e67/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml.md#L130-L137) - [get_nmt_tokenizer](https://github.com/NVIDIA/NeMo/blob/54458fa9c1c913b2b0ea80f072b32d011c063e67/nemo/collections/nlp/modules/common/tokenizer_utils.py.md#L148) - [SentencePieceTokenizer](https://github.com/NVIDIA/NeMo/blob/45f35240a608c295ce199fb50b7336c346099617/nemo/collections/common/tokenizers/sentencepiece_tokenizer.py.md#L35) - [AutoTokenizer](https://github.com/NVIDIA/NeMo/blob/45f35240a608c295ce199fb50b7336c346099617/nemo/collections/common/tokenizers/huggingface/auto_tokenizer.py.md#L28) - [this page](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/mamba2_8b.py.md#L38) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md Title: NeMo Speaker Diarization API — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html Published Time: Fri, 18 Jul 2025 19:25:08 GMT Markdown Content: NeMo Speaker Diarization API[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo-speaker-diarization-api "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Model Classes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#model-classes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------ Mixins[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#mixins "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------- _class_ nemo.collections.asr.parts.mixins.DiarizationMixin[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.DiarizationMixin "Link to this definition") Bases: `VerificationMixin` _abstract_ diarize(_paths2audio\_files:List[str]_,_batch\_size:int=1_,)→List[str][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.DiarizationMixin.diarize "Link to this definition") Takes paths to audio files and returns speaker labels :param paths2audio_files: paths to audio fragment to be transcribed Returns: Speaker labels _class_ nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin "Link to this definition") Bases: `ABC` An abstract class for diarize-able models. Creates a template function diarize() that provides an interface to perform transcription of audio tensors or filepaths. 
The following abstract classes must be implemented by the subclass: > * _setup_diarize_dataloader(): > Setup the dataloader for diarization. Receives the output from _diarize_input_manifest_processing(). > > * _diarize_forward(): > Implements the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). > > * _diarize_output_processing(): > Implements the post processing of the model’s outputs to return the results to the user. The result can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects, or a dict of list of objects. _abstract_ _diarize_forward(_batch:Any_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._diarize_forward "Link to this definition") Internal function to perform the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). This function is called by diarize() and diarize_generator() to perform the model’s forward pass. Parameters: **batch** – A batch of input data from the data loader that is used to perform the model’s forward pass. Returns: The model’s outputs that are processed by _diarize_output_processing(). _diarize_input_manifest_processing(_audio\_files:List[str]_,_temp\_dir:str_,_diarcfg:DiarizeConfig_,)→Dict[str,Any][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._diarize_input_manifest_processing "Link to this definition") Internal function to process the input audio filepaths and return a config dict for the dataloader. Parameters: * **audio_files** – A list of string filepaths for audio files. * **temp_dir** – A temporary directory to store intermediate files. * **diarcfg** – The diarization config dataclass. Subclasses can change this to a different dataclass if needed. Returns: A config dict that is used to setup the dataloader for diarization. _diarize_input_processing(_audio_,_diarcfg:DiarizeConfig_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._diarize_input_processing "Link to this definition") Internal function to process the input audio data and return a DataLoader. This function is called by diarize() and diarize_generator() to setup the input data for diarization. Parameters: * **audio** – Of type GenericDiarizationType * **diarcfg** – The diarization config dataclass. Subclasses can change this to a different dataclass if needed. Returns: A DataLoader object that is used to iterate over the input audio data. _diarize_on_begin(_audio:str|List[str]_,_diarcfg:DiarizeConfig_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._diarize_on_begin "Link to this definition") Internal function to setup the model for diarization. Perform all setup and pre-checks here. Parameters: * **audio** (_Union_ _[_ _str_ _,_ _List_ _[_ _str_ _]_ _]_) – Of type GenericDiarizationType * **diarcfg** (_DiarizeConfig_) – An instance of DiarizeConfig. 
_diarize_on_end(_diarcfg:DiarizeConfig_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._diarize_on_end "Link to this definition") Internal function to teardown the model after transcription. Perform all teardown and post-checks here. Parameters: **diarcfg** – The diarization config dataclass. Subclasses can change this to a different dataclass if needed. _abstract_ _diarize_output_processing(_outputs_,_uniq\_ids_,_diarcfg:DiarizeConfig_,)→List[Any]|List[List[Any]]|Tuple[Any]|Tuple[List[Any]][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._diarize_output_processing "Link to this definition") Internal function to process the model’s outputs to return the results to the user. This function is called by diarize() and diarize_generator() to process the model’s outputs. Parameters: * **outputs** – The model’s outputs that are processed by _diarize_forward(). * **uniq_ids** – List of unique recording identificators in batch * **diarcfg** – The diarization config dataclass. Subclasses can change this to a different dataclass if needed. Returns: The output can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects. Its type is defined in GenericDiarizationType. _input_audio_to_rttm_processing(_audio\_files:List[str]_,)→List[Dict[str,str|float]][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._input_audio_to_rttm_processing "Link to this definition") Generate manifest style dict if audio is a list of paths to audio files. Parameters: **audio_files** – A list of paths to audio files. Returns: audio_rttm_map_dict A list of manifest style dicts. _abstract_ _setup_diarize_dataloader(_config:Dict_,)→torch.utils.data.DataLoader[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin._setup_diarize_dataloader "Link to this definition") Internal function to setup the dataloader for diarization. This function is called by diarize() and diarize_generator() to setup the input data for diarization. Parameters: **config** – A config dict that is used to setup the dataloader for diarization. It can be generated by _diarize_input_manifest_processing(). Returns: A DataLoader object that is used to iterate over the input audio data. 
diarize(_audio:str|List[str]|numpy.ndarray|torch.utils.data.DataLoader_,_batch\_size:int=1_,_include\_tensor\_outputs:bool=False_,_postprocessing\_yaml:str|None=None_,_num\_workers:int=1_,_verbose:bool=False_,_override\_config:DiarizeConfig|None=None_,_**config\_kwargs_,)→List[Any]|List[List[Any]]|Tuple[Any]|Tuple[List[Any]][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin.diarize "Link to this definition") Takes paths to audio files and returns speaker labels diarize_generator(_audio_,_override\_config:DiarizeConfig|None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin.diarize_generator "Link to this definition") A generator version of diarize function. Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/api.html.md#nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin.diarize_generator) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md Title: End-to-End Speaker Diarization Configuration Files — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html Published Time: Thu, 30 Oct 2025 07:07:30 GMT Markdown Content: End-to-End Speaker Diarization Configuration Files[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#end-to-end-speaker-diarization-configuration-files "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Hydra Configurations for Sortformer Diarizer Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#hydra-configurations-for-sortformer-diarizer-training "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Sortformer Diarizer is an end-to-end speaker diarization model that is solely based on Transformer-encoder type of architecture. Model name convention for Sortformer Diarizer: sortformer_diarizer__-.yaml * Example /examples/speaker_tasks/diarization/neural_diarizer/conf/sortformer_diarizer_hybrid_loss_4spk-v1.yaml. name: "SortformerDiarizer" num_workers: 18 batch_size: 8 model: sample_rate: 16000 pil_weight: 0.5 # Weight for Permutation Invariant Loss (PIL) used in training the Sortformer diarizer model ats_weight: 0.5 # Weight for Arrival Time Sort (ATS) loss in training the Sortformer diarizer model max_num_of_spks: 4 # Maximum number of speakers per model; currently set to 4 model_defaults: fc_d_model: 512 # Hidden dimension size of the Fast-conformer Encoder tf_d_model: 192 # Hidden dimension size of the Transformer Encoder train_ds: manifest_filepath: ??? 
sample_rate: ${model.sample_rate} num_spks: ${model.max_num_of_spks} session_len_sec: 90 # Maximum session length in seconds soft_label_thres: 0.5 # Threshold for binarizing target values; higher values make the model more conservative in predicting speaker activity. soft_targets: False # If True, use continuous values as target values when calculating cross-entropy loss labels: null batch_size: ${batch_size} shuffle: True num_workers: ${num_workers} validation_mode: False # lhotse config use_lhotse: False use_bucketing: True num_buckets: 10 bucket_duration_bins: [10, 20, 30, 40, 50, 60, 70, 80, 90] pin_memory: True min_duration: 10 max_duration: 90 batch_duration: 400 quadratic_duration: 1200 bucket_buffer_size: 20000 shuffle_buffer_size: 10000 window_stride: ${model.preprocessor.window_stride} subsampling_factor: ${model.encoder.subsampling_factor} validation_ds: manifest_filepath: ??? is_tarred: False tarred_audio_filepaths: null sample_rate: ${model.sample_rate} num_spks: ${model.max_num_of_spks} session_len_sec: 90 # Maximum session length in seconds soft_label_thres: 0.5 # A threshold value for setting up the binarized labels. The higher the more conservative the model becomes. soft_targets: False labels: null batch_size: ${batch_size} shuffle: False num_workers: ${num_workers} validation_mode: True # lhotse config use_lhotse: False use_bucketing: False drop_last: False pin_memory: True window_stride: ${model.preprocessor.window_stride} subsampling_factor: ${model.encoder.subsampling_factor} test_ds: manifest_filepath: null is_tarred: False tarred_audio_filepaths: null sample_rate: 16000 num_spks: ${model.max_num_of_spks} session_len_sec: 90 # Maximum session length in seconds soft_label_thres: 0.5 soft_targets: False labels: null batch_size: ${batch_size} shuffle: False seq_eval_mode: True num_workers: ${num_workers} validation_mode: True # lhotse config use_lhotse: False use_bucketing: False drop_last: False pin_memory: True window_stride: ${model.preprocessor.window_stride} subsampling_factor: ${model.encoder.subsampling_factor} preprocessor: _target_ : nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor normalize: "per_feature" window_size: 0.025 sample_rate: ${model.sample_rate} window_stride: 0.01 window: "hann" features: 80 n_fft: 512 frame_splicing: 1 dither: 0.00001 sortformer_modules: _target_ : nemo.collections.asr.modules.sortformer_modules.SortformerModules num_spks: ${model.max_num_of_spks} # Number of speakers per model. This is currently fixed at 4. 
dropout_rate: 0.5 # Dropout rate fc_d_model: ${model.model_defaults.fc_d_model} tf_d_model: ${model.model_defaults.tf_d_model} # Hidden layer size for linear layers in Sortformer Diarizer module encoder: _target_ : nemo.collections.asr.modules.ConformerEncoder feat_in: ${model.preprocessor.features} feat_out: -1 n_layers: 18 d_model: ${model.model_defaults.fc_d_model} # Sub-sampling parameters subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding subsampling_factor: 8 # must be power of 2 for striding and vggnet subsampling_conv_channels: 256 # set to -1 to make it equal to the d_model causal_downsampling: false # Feed forward module's params ff_expansion_factor: 4 # Multi-headed Attention Module's params self_attention_model: rel_pos # rel_pos or abs_pos n_heads: 8 # may need to be lower for smaller d_models # [left, right] specifies the number of steps to be seen from left and right of each step in self-attention att_context_size: [-1, -1] # -1 means unlimited context att_context_style: regular # regular or chunked_limited xscaling: true # scales up the input embeddings by sqrt(d_model) untie_biases: true # unties the biases of the TransformerXL layers pos_emb_max_len: 5000 # Convolution module's params conv_kernel_size: 9 conv_norm_type: 'batch_norm' # batch_norm or layer_norm or groupnormN (N specifies the number of groups) conv_context_size: null # Regularization dropout: 0.1 # The dropout used in most of the Conformer Modules dropout_pre_encoder: 0.1 # The dropout used before the encoder dropout_emb: 0.0 # The dropout used for embeddings dropout_att: 0.1 # The dropout for multi-headed attention modules # Set to non-zero to enable stochastic depth stochastic_depth_drop_prob: 0.0 stochastic_depth_mode: linear # linear or uniform stochastic_depth_start_layer: 1 transformer_encoder: _target_ : nemo.collections.asr.modules.transformer.transformer_encoders.TransformerEncoder num_layers: 18 hidden_size: ${model.model_defaults.tf_d_model} # Needs to be multiple of num_attention_heads inner_size: 768 num_attention_heads: 8 attn_score_dropout: 0.5 attn_layer_dropout: 0.5 ffn_dropout: 0.5 hidden_act: relu pre_ln: False pre_ln_final_layer_norm: True loss: _target_ : nemo.collections.asr.losses.bce_loss.BCELoss weight: null # Weight for binary cross-entropy loss. Either `null` or list type input. (e.g. [0.5,0.5]) reduction: mean lr: 0.0001 optim: name: adamw lr: ${model.lr} # optimizer arguments betas: [0.9, 0.98] weight_decay: 1e-3 sched: name: InverseSquareRootAnnealing warmup_steps: 2500 warmup_ratio: null min_lr: 1e-06 trainer: devices: 1 # number of gpus (devices) accelerator: gpu max_epochs: 800 max_steps: -1 # computed at runtime if not set num_nodes: 1 strategy: ddp_find_unused_parameters_true # Could be "ddp" accumulate_grad_batches: 1 deterministic: True enable_checkpointing: False logger: False log_every_n_steps: 1 # Interval of logging. val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations exp_manager: use_datetime_version: False exp_dir: null name: ${name} resume_if_exists: True resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. 
resume_ignore_no_checkpoint: True create_tensorboard_logger: True create_checkpoint_callback: True create_wandb_logger: False checkpoint_callback_params: monitor: "val_f1_acc" mode: "max" save_top_k: 9 every_n_epochs: 1 wandb_logger_kwargs: resume: True name: null project: null Hydra Configurations for Streaming Sortformer Diarizer Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#hydra-configurations-for-streaming-sortformer-diarizer-training "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Model name convention for Streaming Sortformer Diarizer: streaming_sortformer_diarizer_-.yaml * Example /examples/speaker_tasks/diarization/neural_diarizer/conf/streaming_sortformer_diarizer_4spk-v2.yaml. name: "StreamingSortformerDiarizer" num_workers: 18 batch_size: 4 model: sample_rate: 16000 pil_weight: 0.5 # Weight for Permutation Invariant Loss (PIL) used in training the Sortformer diarizer model ats_weight: 0.5 # Weight for Arrival Time Sort (ATS) loss in training the Sortformer diarizer model max_num_of_spks: 4 # Maximum number of speakers per model; currently set to 4 streaming_mode: True model_defaults: fc_d_model: 512 # Hidden dimension size of the Fast-conformer Encoder tf_d_model: 192 # Hidden dimension size of the Transformer Encoder train_ds: manifest_filepath: ??? sample_rate: ${model.sample_rate} num_spks: ${model.max_num_of_spks} session_len_sec: 90 # Maximum session length in seconds soft_label_thres: 0.5 # Threshold for binarizing target values; higher values make the model more conservative in predicting speaker activity. soft_targets: False # If True, use continuous values as target values when calculating cross-entropy loss labels: null batch_size: ${batch_size} shuffle: True num_workers: ${num_workers} validation_mode: False # lhotse config use_lhotse: False use_bucketing: True num_buckets: 10 bucket_duration_bins: [10, 20, 30, 40, 50, 60, 70, 80, 90] pin_memory: True min_duration: 10 max_duration: 90 batch_duration: 400 quadratic_duration: 1200 bucket_buffer_size: 20000 shuffle_buffer_size: 10000 window_stride: ${model.preprocessor.window_stride} subsampling_factor: ${model.encoder.subsampling_factor} validation_ds: manifest_filepath: ??? is_tarred: False tarred_audio_filepaths: null sample_rate: ${model.sample_rate} num_spks: ${model.max_num_of_spks} session_len_sec: 90 # Maximum session length in seconds soft_label_thres: 0.5 # A threshold value for setting up the binarized labels. The higher the more conservative the model becomes. 
soft_targets: False labels: null batch_size: ${batch_size} shuffle: False num_workers: ${num_workers} validation_mode: True # lhotse config use_lhotse: False use_bucketing: False drop_last: False pin_memory: True window_stride: ${model.preprocessor.window_stride} subsampling_factor: ${model.encoder.subsampling_factor} test_ds: manifest_filepath: null is_tarred: False tarred_audio_filepaths: null sample_rate: 16000 num_spks: ${model.max_num_of_spks} session_len_sec: 90 # Maximum session length in seconds soft_label_thres: 0.5 soft_targets: False labels: null batch_size: ${batch_size} shuffle: False seq_eval_mode: True num_workers: ${num_workers} validation_mode: True # lhotse config use_lhotse: False use_bucketing: False drop_last: False pin_memory: True window_stride: ${model.preprocessor.window_stride} subsampling_factor: ${model.encoder.subsampling_factor} preprocessor: _target_ : nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor normalize: "NA" window_size: 0.025 sample_rate: ${model.sample_rate} window_stride: 0.01 window: "hann" features: 128 n_fft: 512 frame_splicing: 1 dither: 0.00001 sortformer_modules: _target_ : nemo.collections.asr.modules.sortformer_modules.SortformerModules num_spks: ${model.max_num_of_spks} # Maximum number of speakers the model can handle dropout_rate: 0.5 # Dropout rate fc_d_model: ${model.model_defaults.fc_d_model} # Hidden dimension size for Fast Conformer encoder tf_d_model: ${model.model_defaults.tf_d_model} # Hidden dimension size for Transformer encoder # Streaming mode parameters spkcache_len: 188 # Length of speaker cache buffer (total number of frames for all speakers) fifo_len: 0 # Length of FIFO buffer for streaming processing (0 = disabled) chunk_len: 188 # Number of frames processed in each streaming chunk spkcache_update_period: 188 # Speaker cache update period in frames chunk_left_context: 1 # Number of previous frames for each streaming chunk chunk_right_context: 1 # Number of future frames for each streaming chunk # Speaker cache update parameters spkcache_sil_frames_per_spk: 3 # Number of silence frames allocated per speaker in the speaker cache scores_add_rnd: 0 # Standard deviation of random noise added to scores in speaker cache update (training only) pred_score_threshold: 0.25 # Probability threshold for internal scores processing in speaker cache update max_index: 99999 # Maximum allowed index value for internal processing in speaker cache update scores_boost_latest: 0.05 # Gain for scores for recently added frames in speaker cache update sil_threshold: 0.2 # Threshold for determining silence frames to calculate average silence embedding strong_boost_rate: 0.75 # Rate determining number of frames per speaker that receive strong score boosting weak_boost_rate: 1.5 # Rate determining number of frames per speaker that receive weak score boosting min_pos_scores_rate: 0.5 # Rate threshold for dropping overlapping frames when enough non-overlapping exist # Self-attention parameters (training only) causal_attn_rate: 0.5 # Proportion of batches that use self-attention with limited right context causal_attn_rc: 7 # Right context size for self-attention with limited right context encoder: _target_ : nemo.collections.asr.modules.ConformerEncoder feat_in: ${model.preprocessor.features} feat_out: -1 n_layers: 17 d_model: ${model.model_defaults.fc_d_model} # Sub-sampling parameters subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding subsampling_factor: 8 # must be power of 2 for striding and vggnet 
subsampling_conv_channels: 256 # set to -1 to make it equal to the d_model causal_downsampling: false # Feed forward module's params ff_expansion_factor: 4 # Multi-headed Attention Module's params self_attention_model: rel_pos # rel_pos or abs_pos n_heads: 8 # may need to be lower for smaller d_models # [left, right] specifies the number of steps to be seen from left and right of each step in self-attention att_context_size: [-1, -1] # -1 means unlimited context att_context_style: regular # regular or chunked_limited xscaling: true # scales up the input embeddings by sqrt(d_model) untie_biases: true # unties the biases of the TransformerXL layers pos_emb_max_len: 5000 # Convolution module's params conv_kernel_size: 9 conv_norm_type: 'batch_norm' # batch_norm or layer_norm or groupnormN (N specifies the number of groups) conv_context_size: null # Regularization dropout: 0.1 # The dropout used in most of the Conformer Modules dropout_pre_encoder: 0.1 # The dropout used before the encoder dropout_emb: 0.0 # The dropout used for embeddings dropout_att: 0.1 # The dropout for multi-headed attention modules # Set to non-zero to enable stochastic depth stochastic_depth_drop_prob: 0.0 stochastic_depth_mode: linear # linear or uniform stochastic_depth_start_layer: 1 transformer_encoder: _target_ : nemo.collections.asr.modules.transformer.transformer_encoders.TransformerEncoder num_layers: 18 hidden_size: ${model.model_defaults.tf_d_model} # Needs to be multiple of num_attention_heads inner_size: 768 num_attention_heads: 8 attn_score_dropout: 0.5 attn_layer_dropout: 0.5 ffn_dropout: 0.5 hidden_act: relu pre_ln: False pre_ln_final_layer_norm: True loss: _target_ : nemo.collections.asr.losses.bce_loss.BCELoss weight: null # Weight for binary cross-entropy loss. Either `null` or list type input. (e.g. [0.5,0.5]) reduction: mean lr: 0.0001 optim: name: adamw lr: ${model.lr} # optimizer arguments betas: [0.9, 0.98] weight_decay: 1e-3 sched: name: InverseSquareRootAnnealing warmup_steps: 500 warmup_ratio: null min_lr: 1e-06 trainer: devices: 1 # number of gpus (devices) accelerator: gpu max_epochs: 800 max_steps: -1 # computed at runtime if not set num_nodes: 1 strategy: ddp_find_unused_parameters_true # Could be "ddp" accumulate_grad_batches: 1 deterministic: True enable_checkpointing: False logger: False log_every_n_steps: 1 # Interval of logging. val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations exp_manager: use_datetime_version: False exp_dir: null name: ${name} resume_if_exists: True resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc. 
  resume_ignore_no_checkpoint: True
  create_tensorboard_logger: True
  create_checkpoint_callback: True
  create_wandb_logger: False
  checkpoint_callback_params:
    monitor: "val_f1_acc"
    mode: "max"
    save_top_k: 9
    every_n_epochs: 1
  wandb_logger_kwargs:
    resume: True
    name: null
    project: null

Hydra Configurations for (Streaming) Sortformer Diarization Post-processing[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#hydra-configurations-for-streaming-sortformer-diarization-post-processing "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Post-processing converts the model's floating-point tensor output into timestamp output. While generating speaker-homogeneous segments, the onset and offset thresholds and the paddings can be tuned to produce the timestamps that yield the lowest diarization error rate (DER). Post-processing can be applied to both the offline and the streaming Sortformer diarizer. By default, post-processing is bypassed and only binarization is performed. If you want to reproduce the DER scores reported on NeMo model cards, you need to apply the post-processing steps. Use `batch_size=1` to get the longest inference window and the highest possible accuracy.

parameters:
  onset: 0.64 # Onset threshold for detecting the beginning of a speech segment
  offset: 0.74 # Offset threshold for detecting the end of a speech segment
  pad_onset: 0.06 # Adds the specified duration at the beginning of each speech segment
  pad_offset: 0.0 # Adds the specified duration at the end of each speech segment
  min_duration_on: 0.1 # Removes short speech segments if the duration is less than the specified minimum duration
  min_duration_off: 0.15 # Removes short silences if the duration is less than the specified minimum duration

Cascaded Speaker Diarization Configuration Files[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#cascaded-speaker-diarization-configuration-files "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Both training and inference of cascaded speaker diarization are configured by `.yaml` files. The diarizer section generally requires information about the dataset(s) being used, the models used in the pipeline, and inference-related parameters such as the post-processing of each model. The sections on this page cover each of these in more detail.

Note

For model details and a deeper understanding of configs, training, fine-tuning, and evaluation, please refer to `/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb` and `/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb`; for other applications, such as possible integration with ASR, have a look at `/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb`.
Hydra Configurations for Diarization Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#hydra-configurations-for-diarization-training "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Currently, NeMo supports the Multi-scale Diarization Decoder (MSDD) as a neural diarizer model. MSDD is a speaker diarization model that takes the initializing clustering results and multi-scale segmentation as input. Example configuration files for MSDD model training can be found in `/examples/speaker_tasks/diarization/conf/neural_diarizer/`.

* Model name convention for MSDD: `msdd_<number of scales>scl_<longest scale>_<shortest scale>_<overlap percentage>Povl_<hidden size>x<LSTM layers>x<CNN channels>x<conv layer repeats>`
* Example: `msdd_5scl_15_05_50Povl_256x3x32x2.yaml` has 5 scales, a longest scale of 1.5 sec, a shortest scale of 0.5 sec, 50 percent overlap, a hidden layer size of 256, 3 LSTM layers, 32 CNN channels, and 2 repeated conv layers.

The MSDD model checkpoint (.ckpt) and NeMo file (.nemo) contain the speaker embedding model (TitaNet), and this speaker model is loaded along with the standalone MSDD module. Note that MSDD models require more than one scale. Thus, the parameters in `diarizer.speaker_embeddings.parameters` should have more than one scale to function as an MSDD model.

### General Diarizer Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#general-diarizer-configuration "Link to this heading")

The items (OmegaConf keys) directly under `model` determine segmentation- and clustering-related parameters. The multi-scale parameters (`window_length_in_sec`, `shift_length_in_sec`, and `multiscale_weights`) are specified here. `max_num_of_spks`, `scale_n`, `soft_label_thres`, and `emb_batch_size` are also set here and then assigned to the dataset configurations.

diarizer:
  out_dir: null
  oracle_vad: True # If True, uses RTTM files provided in manifest file to get speech activity (VAD) timestamps
  speaker_embeddings:
    model_path: ??? # .nemo local model path or pretrained model name (titanet_large is recommended)
    parameters:
      window_length_in_sec: [1.5,1.25,1.0,0.75,0.5] # Window length(s) in sec (floating-point number). Either a number or a list. Ex) 1.5 or [1.5,1.0,0.5]
      shift_length_in_sec: [0.75,0.625,0.5,0.375,0.25] # Shift length(s) in sec (floating-point number). Either a number or a list. Ex) 0.75 or [0.75,0.5,0.25]
      multiscale_weights: [1,1,1,1,1] # Weight for each scale. Should be null (for single scale) or a list matched with window/shift scale count. Ex) [0.33,0.33,0.33]
      save_embeddings: True # Save embeddings as pickle file for each audio input.

num_workers: ${num_workers} # Number of workers used for data-loading.
max_num_of_spks: 2 # Number of speakers per model. This is currently fixed at 2.
scale_n: 5 # Number of scales for MSDD model and initializing clustering.
soft_label_thres: 0.5 # Threshold for creating discretized speaker label from continuous speaker label in RTTM files.
emb_batch_size: 0 # If this value is bigger than 0, corresponding number of embedding vectors are attached to torch graph and trained.
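To make the multi-scale parameters concrete, the short Python sketch below is illustrative only and is not NeMo code (NeMo additionally restricts segments to speech regions and handles edge effects). It shows how each (window, shift) pair from the example values above tiles an audio stream into segments: the longest scale gives coarse but reliable segments, while the shortest scale sets the temporal resolution of the speaker label decisions.

```python
# Illustrative sketch (not NeMo code): how multi-scale (window, shift) pairs
# tile an audio stream into segments. Values mirror the example config above.
def multiscale_segments(duration_sec, window_lengths, shift_lengths):
    """Return, per scale, the list of (start, end) segment boundaries in seconds."""
    scales = []
    for window, shift in zip(window_lengths, shift_lengths):
        segments, start = [], 0.0
        while start + window <= duration_sec:
            segments.append((round(start, 3), round(start + window, 3)))
            start += shift
        scales.append(segments)
    return scales

# 5 scales from the example config: 1.5 s down to 0.5 s windows with 50% overlap.
for scale, segs in enumerate(
    multiscale_segments(10.0, [1.5, 1.25, 1.0, 0.75, 0.5], [0.75, 0.625, 0.5, 0.375, 0.25])
):
    print(f"scale {scale}: {len(segs)} segments, first={segs[0]}, last={segs[-1]}")
```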
### Dataset Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#dataset-configuration "Link to this heading")

Training, validation, and test parameters are specified using the `train_ds`, `validation_ds`, and `test_ds` sections in the configuration YAML file, respectively. Items such as `num_spks`, `soft_label_thres`, and `emb_batch_size` follow the settings under the `model` key. You may also leave fields such as `manifest_filepath` or `emb_dir` blank and specify them via the command-line interface. Note that `test_ds` is not used during training and is only used for speaker diarization inference.

train_ds:
  manifest_filepath: ???
  emb_dir: ???
  sample_rate: ${sample_rate}
  num_spks: ${model.max_num_of_spks}
  soft_label_thres: ${model.soft_label_thres}
  labels: null
  batch_size: ${batch_size}
  emb_batch_size: ${model.emb_batch_size}
  shuffle: True

validation_ds:
  manifest_filepath: ???
  emb_dir: ???
  sample_rate: ${sample_rate}
  num_spks: ${model.max_num_of_spks}
  soft_label_thres: ${model.soft_label_thres}
  labels: null
  batch_size: 2
  emb_batch_size: ${model.emb_batch_size}
  shuffle: False

test_ds:
  manifest_filepath: null
  emb_dir: null
  sample_rate: 16000
  num_spks: ${model.max_num_of_spks}
  soft_label_thres: ${model.soft_label_thres}
  labels: null
  batch_size: 2
  shuffle: False
  seq_eval_mode: False

### Pre-processor Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#pre-processor-configuration "Link to this heading")

In the MSDD configuration, the pre-processor configuration follows the pre-processor of the embedding extractor model.

preprocessor:
  _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
  normalize: "per_feature"
  window_size: 0.025
  sample_rate: ${sample_rate}
  window_stride: 0.01
  window: "hann"
  features: 80
  n_fft: 512
  frame_splicing: 1
  dither: 0.00001

### Model Architecture Configurations[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#model-architecture-configurations "Link to this heading")

The hyper-parameters for MSDD models are under the `msdd_module` key. The model architecture can be changed by setting `weighting_scheme` and `context_vector_type`. A detailed explanation of the architecture can be found on the [Models](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md) page.

msdd_module:
  _target_: nemo.collections.asr.modules.msdd_diarizer.MSDD_module
  num_spks: ${model.max_num_of_spks} # Number of speakers per model. This is currently fixed at 2.
  hidden_size: 256 # Hidden layer size for linear layers in MSDD module
  num_lstm_layers: 3 # Number of stacked LSTM layers
  dropout_rate: 0.5 # Dropout rate
  cnn_output_ch: 32 # Number of filters in a conv-net layer.
  conv_repeat: 2 # Determines the number of conv-net layers. Should be greater or equal to 1.
  emb_dim: 192 # Dimension of the speaker embedding vectors
  scale_n: ${model.scale_n} # Number of scales for multiscale segmentation input
  weighting_scheme: 'conv_scale_weight' # Type of weighting algorithm. Options: ('conv_scale_weight', 'attn_scale_weight')
  context_vector_type: 'cos_sim' # Type of context vector. Options: ('cos_sim', 'elem_prod')
### Loss Configurations[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#loss-configurations "Link to this heading")

The neural diarizer uses a binary cross-entropy (BCE) loss. A set of weights for the negative (absence of the speaker's speech) and positive (presence of the speaker's speech) classes can be provided to the loss function.

loss:
  _target_: nemo.collections.asr.losses.bce_loss.BCELoss
  weight: null # Weight for binary cross-entropy loss. Either `null` or list type input. (e.g. [0.5,0.5])

Hydra Configurations for Diarization Inference[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#hydra-configurations-for-diarization-inference "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Example configuration files for speaker diarization inference can be found in `/examples/speaker_tasks/diarization/conf/inference/`. Choose a yaml file that fits your targeted domain. For example, if you want to diarize audio recordings of telephonic speech, choose `diar_infer_telephonic.yaml`. The configurations for all the components of diarization inference are included in a single file named `diar_infer_<domain>.yaml`. Each `.yaml` file has a few different sections for the following modules: VAD, speaker embedding, clustering, and ASR.

In speaker diarization inference, the datasets provided in manifest format denote the data on which you would like to perform speaker diarization.

Diarizer Configurations[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#diarizer-configurations "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

An example `diarizer` Hydra configuration could look like:

diarizer:
  manifest_filepath: ???
  out_dir: ???
  oracle_vad: False # If True, uses RTTM files provided in manifest file to get speech activity (VAD) timestamps
  collar: 0.25 # Collar value for scoring
  ignore_overlap: True # Consider or ignore overlap segments while scoring

Under the `diarizer` key, the `vad`, `speaker_embeddings`, `clustering`, and `asr` keys contain the configurations for inference of the corresponding modules.

### Configurations for Voice Activity Detector[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#configurations-for-voice-activity-detector "Link to this heading")

Parameters for the VAD model are provided as in the following Hydra config example.

vad:
  model_path: null # .nemo local model path or pretrained model name or none
  external_vad_manifest: null # This option is provided to use external vad and provide its speech activity labels for speaker embeddings extraction. Only one of model_path or external_vad_manifest should be set.
  parameters: # Tuned parameters for CH109 (using the 11 multi-speaker sessions as dev set)
    window_length_in_sec: 0.15 # Window length in sec for VAD context input
    shift_length_in_sec: 0.01 # Shift length in sec to generate frame-level VAD prediction
    smoothing: "median" # False or type of smoothing method (eg: median)
    overlap: 0.875 # Overlap ratio for overlapped mean/median smoothing filter
    onset: 0.4 # Onset threshold for detecting the beginning and end of a speech
    offset: 0.7 # Offset threshold for detecting the end of a speech
    pad_onset: 0.05 # Adding durations before each speech segment
    pad_offset: -0.1 # Adding durations after each speech segment
    min_duration_on: 0.2 # Threshold for short speech segment deletion
    min_duration_off: 0.2 # Threshold for small non_speech deletion
    filter_speech_first: True

### Configurations for Speaker Embedding in Diarization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#configurations-for-speaker-embedding-in-diarization "Link to this heading")

Parameters for the speaker embedding model are provided in the following Hydra config example. Note that the multiscale parameters accept either a list or a single floating-point number.

speaker_embeddings:
  model_path: ??? # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet)
  parameters:
    window_length_in_sec: 1.5 # Window length(s) in sec (floating-point number). Either a number or a list. Ex) 1.5 or [1.5,1.25,1.0,0.75,0.5]
    shift_length_in_sec: 0.75 # Shift length(s) in sec (floating-point number). Either a number or a list. Ex) 0.75 or [0.75,0.625,0.5,0.375,0.25]
    multiscale_weights: null # Weight for each scale. Should be null (for single scale) or a list matched with window/shift scale count. Ex) [1,1,1,1,1]
    save_embeddings: False # Save embeddings as pickle file for each audio input.

### Configurations for Clustering in Diarization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#configurations-for-clustering-in-diarization "Link to this heading")

Parameters for the clustering algorithm are provided in the following Hydra config example.

clustering:
  parameters:
    oracle_num_speakers: False # If True, use num of speakers value provided in the manifest file.
    max_num_speakers: 20 # Max number of speakers for each recording. If oracle_num_speakers is passed, this value is ignored.
    enhanced_count_thres: 80 # If the number of segments is lower than this number, enhanced speaker counting is activated.
    max_rp_threshold: 0.25 # Determines the range of p-value search: 0 < p <= max_rp_threshold.
    sparse_search_volume: 30 # The higher the number, the more values will be examined with more time.

### Configurations for Diarization with ASR[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md#configurations-for-diarization-with-asr "Link to this heading")

The following configuration needs to be appended under `diarizer` to run ASR together with diarization and obtain a transcription with speaker labels.

asr:
  model_path: ??? # Provide NGC cloud ASR model name. stt_en_conformer_ctc_* models are recommended for diarization purposes.
  parameters:
    asr_based_vad: False # If True, speech segmentation for diarization is based on word-timestamps from ASR inference.
    asr_based_vad_threshold: 50 # Threshold (multiple of 10ms) for ignoring the gap between two words when generating VAD timestamps using ASR based VAD.
    asr_batch_size: null # Batch size can be dependent on each ASR model. Default batch sizes are applied if set to null.
    lenient_overlap_WDER: True # If true, when a word falls into speaker-overlapped regions, consider the word as a correctly diarized word.
    decoder_delay_in_sec: null # Native decoder delay. null is recommended to use the default values for each ASR model.
    word_ts_anchor_offset: null # Offset to set a reference point from the start of the word. Recommended range of values is [-0.05 0.2].
    word_ts_anchor_pos: "start" # Select which part of the word timestamp we want to use. The options are: 'start', 'end', 'mid'.
    fix_word_ts_with_VAD: False # Fix the word timestamp using VAD output. You must provide a VAD model to use this feature.
    colored_text: False # If True, use colored text to distinguish speakers in the output transcript.
    print_time: True # If True, the start and end time of each speaker turn are printed in the output transcript.
    break_lines: False # If True, the output transcript breaks the line to fix the line width (default is 90 chars)

  ctc_decoder_parameters: # Optional beam search decoder (pyctcdecode)
    pretrained_language_model: null # KenLM model file: .arpa model file or .bin binary file.
    beam_width: 32
    alpha: 0.5
    beta: 2.5

  realigning_lm_parameters: # Experimental feature
    arpa_language_model: null # Provide a KenLM language model in .arpa format.
    min_number_of_words: 3 # Min number of words for the left context.
    max_number_of_words: 10 # Max number of words for the right context.
    logprob_diff_threshold: 1.2 # The threshold for the difference between two log probability values from two hypotheses.

---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/datasets.html.md

Title: Datasets — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/datasets.html

Published Time: Fri, 18 Jul 2025 19:25:06 GMT

Markdown Content:
Datasets[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/datasets.html.md#datasets "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

Data Preparation for Speaker Diarization Training (For End-to-End Diarization)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/datasets.html.md#data-preparation-for-speaker-diarization-training-for-end-to-end-diarization "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

Speaker diarization training and inference both require the same type of manifest files. This manifest file can be created by using the script in `/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py`.
The following example shows how to run `pathfiles_to_diarize_manifest.py` by providing path list files.

python NeMo/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py \
    --add_duration \
    --paths2audio_files="/path/to/audio_file_path_list.txt" \
    --paths2rttm_files="/path/to/rttm_file_list.txt" \
    --manifest_filepath="/path/to/manifest_filepath/train_manifest.json"

All three path arguments are required. Note that filenames must be kept consistent for every field (key): only the filename extension changes. For example, if there is an audio file named `abcd01.wav`, the RTTM file should be named `abcd01.rttm` and the transcription file should be named `abcd01.txt`.

* Example audio file path list `audio_file_path_list.txt`

/path/to/abcd01.wav
/path/to/abcd02.wav

To train a diarization model, one needs to provide Rich Transcription Time Marked (RTTM) files as ground-truth label files. Here is one line from an RTTM file as an example:

SPEAKER TS3012d.Mix-Headset 1 32.679 0.671 <NA> <NA> MTD046ID <NA> <NA>

Make a list of RTTM files for the audio files you have in `audio_file_path_list.txt`.

* Example RTTM file path list `rttm_file_path_list.txt`

/path/to/abcd01.rttm
/path/to/abcd02.rttm

Note

We expect all the provided files (e.g., audio, RTTM, text) to have the same base name, and the base name should be unique (uniq-id).

As an output file, `train_manifest.json` will have the following line for each audio file:

{"audio_filepath": "/path/to/abcd01.wav", "offset": 0, "duration": 90, "label": "infer", "text": "-", "num_speakers": 2, "rttm_filepath": "/path/to/rttm/abcd01.rttm"}

For end-to-end speaker diarization training, the manifest file described in this section fulfills the requirements for the input manifest file. For cascaded speaker diarization training (TS-VAD style), the manifest file should be further processed to generate session-wise manifest files.

Manifest JSON files for MSDD (TS-VAD style model) Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/datasets.html.md#manifest-json-files-for-msdd-ts-vad-style-model-training "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

This section describes how to format a dataset for cascaded diarization training (e.g., TS-VAD, MSDD). To train or fine-tune the speaker diarization system, you can either train/fine-tune the speaker embedding extractor model separately, or train/fine-tune the speaker embedding extractor and the neural diarizer at the same time.

* To train or fine-tune a speaker embedding extractor model separately, see the [Speech Classification Datasets](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speech_classification/datasets.html.md) and [Speaker Recognition Datasets](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/datasets.html.md) pages for preparing datasets for training and validating VAD and speaker embedding models, respectively.
[![Image 1: MSDD training and inference](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/msdd_train_and_infer.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/msdd_train_and_infer.png)

As shown in the above figure, a full-fledged speaker diarization process runs through a speaker embedding extractor, a clustering algorithm, and a neural diarizer. Note that only the speaker embedding extractor and the neural diarizer are trainable models, and they can be trained/fine-tuned together on diarization datasets. We recommend using a speaker embedding extractor model trained on a large amount of single-speaker data when training a neural diarizer model.

Training MSDD requires one more step: truncating the source manifest into even shorter chunks. After generating a session-wise manifest file, we need to break each session-wise manifest file down into a split manifest file containing the start time and duration of the split samples, due to memory constraints. More importantly, since MSDD only uses a pairwise (two-speaker) model and data samples, we need to split RTTM files if there are more than two speakers.

Note that you should specify the window length and shift length of the base scale of your MSDD model when you generate the manifest file for training samples. In addition, `step_count` determines how many steps (i.e., base-scale segments) are in a split data sample. If `step_count` is too large, you might not be able to load a single sample in a batch.

python NeMo/scripts/speaker_tasks/create_msdd_train_dataset.py \
    --input_manifest_path='path/to/train_manifest.json' \
    --output_manifest_path='path/to/train_manifest.50step.json' \
    --pairwise_rttm_output_folder='path/to/rttm_output_folder' \
    --window=0.5 \
    --shift=0.25 \
    --step_count=50

All arguments are required to generate a new manifest file. Specify a session-wise diarization manifest file with `--input_manifest_path` and specify an output file name with `--output_manifest_path`. In the folder specified by `--pairwise_rttm_output_folder`, the script will create multiple two-speaker RTTM files from the given RTTM file and create a manifest file that only contains the two speakers in the specified RTTM range.

For example, if `abcd01.wav` has three speakers (`1911,1988,192`), the following three RTTM files will be created: `abcd01.1911_1988.rttm`, `abcd01.1911_192.rttm`, and `abcd01.1988_192.rttm`. Subsequently, segments will only be generated from the newly created two-speaker RTTM files.

Specify the `window` and `shift` of the base scale of your MSDD model. In this example, we use the default setting of `window=0.5`, `shift=0.25`, and `step_count=50`. Here are example lines from the output file `/path/to/train_manifest.50step.json`.

* Example manifest file `train_manifest.50step.json`.
{"audio_filepath": "/path/to/abcd01.wav", "offset": 0.007, "duration": 14.046, "label": "infer", "text": "-", "num_speakers": 2, "rttm_filepath": "simulated_train/abcd01.1919_1988.rttm"}
{"audio_filepath": "/path/to/abcd01.wav", "offset": 13.553, "duration": 16.429, "label": "infer", "text": "-", "num_speakers": 2, "rttm_filepath": "simulated_train/abcd01.1919_1988.rttm"}
{"audio_filepath": "/path/to/abcd02.wav", "offset": 0.246, "duration": 15.732, "label": "infer", "text": "-", "num_speakers": 2, "rttm_filepath": "path/to/rttm_output_folder/abcd02.777_5694.rttm"}
{"audio_filepath": "/path/to/abcd02.wav", "offset": 15.478, "duration": 14.47, "label": "infer", "text": "-", "num_speakers": 2, "rttm_filepath": "path/to/rttm_output_folder/abcd02.777_5694.rttm"}

Prepare the MSDD training dataset for both training and validation. After the training dataset is prepared, you can train an MSDD model with the following script:

python ./multiscale_diar_decoder.py --config-path='../conf/neural_diarizer' --config-name='msdd_5scl_15_05_50Povl_256x3x32x2.yaml' \
    trainer.devices=1 \
    trainer.max_epochs=20 \
    model.base.diarizer.speaker_embeddings.model_path="titanet_large" \
    model.train_ds.manifest_filepath="" \
    model.validation_ds.manifest_filepath="" \
    model.train_ds.emb_dir="" \
    model.validation_ds.emb_dir="" \
    exp_manager.name='sample_train' \
    exp_manager.exp_dir='./msdd_exp'

In the above example training session, we use the `titanet_large` model as the pretrained speaker embedding model.

Data Preparation for Diarization Inference: for Both End-to-end and Cascaded Systems[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/datasets.html.md#data-preparation-for-diarization-inference-for-both-end-to-end-and-cascaded-systems "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

As with dataset preparation for diarization training, diarization inference is based on Hydra configurations specified in `.yaml` files. See [NeMo Speaker Diarization Configuration Files](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/configs.html.md) for setting up the input Hydra configuration file for speaker diarization inference. Input data should be provided in line-delimited JSON format as shown below:

{"audio_filepath": "/path/to/abcd.wav", "offset": 0, "duration": null, "label": "infer", "text": "-", "num_speakers": null, "rttm_filepath": "/path/to/rttm/abcd.rttm", "uem_filepath": "/path/to/uem/abcd.uem"}

In each line of the input manifest file, the `audio_filepath` item is mandatory, while the rest of the items are optional and can be passed to configure the desired diarization setting. We refer to this file as a manifest file. This manifest file can be created by using the script in `/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py`. The following example shows how to run `pathfiles_to_diarize_manifest.py` by providing path list files.
python pathfiles_to_diarize_manifest.py --paths2audio_files /path/to/audio_file_path_list.txt \
    --paths2txt_files /path/to/transcript_file_path_list.txt \
    --paths2rttm_files /path/to/rttm_file_path_list.txt \
    --paths2uem_files /path/to/uem_file_path_list.txt \
    --paths2ctm_files /path/to/ctm_file_path_list.txt \
    --manifest_filepath /path/to/manifest_output/input_manifest.json

The `--paths2audio_files` and `--manifest_filepath` arguments are required. Note that filenames must be kept consistent for every field (key): only the filename extension changes. For example, if there is an audio file named `abcd.wav`, the RTTM file should be named `abcd.rttm` and the transcription file should be named `abcd.txt`.

* Example audio file path list `audio_file_path_list.txt`

/path/to/abcd01.wav
/path/to/abcd02.wav

* Example RTTM file path list `rttm_file_path_list.txt`

/path/to/abcd01.rttm
/path/to/abcd02.rttm

The path list files containing the absolute paths to these WAV, RTTM, TXT, CTM, and UEM files should be provided as in the above example. The `pathfiles_to_diarize_manifest.py` script will match each file using the unique filename (e.g., `abcd`). Finally, the absolute path of the created manifest file should be provided through the Hydra configuration as shown below:

diarizer.manifest_filepath="path/to/manifest/input_manifest.json"

The following are descriptions of each field in an input manifest JSON file.

Note

We expect all the provided files (e.g., audio, RTTM, text) to have the same base name, and the base name should be unique (uniq-id).

`audio_filepath` (Required):

> a string containing the absolute path to the audio file.

`num_speakers` (Optional):

> If the number of speakers is known, provide the integer number or assign null if not known.

`rttm_filepath` (Optional):

> To evaluate a diarization system with known RTTM files, one needs to provide Rich Transcription Time Marked (RTTM) files as ground-truth label files. If RTTM files are provided, the diarization evaluation will be initiated. Here is one line from an RTTM file as an example:

SPEAKER TS3012d.Mix-Headset 1 331.573 0.671 <NA> <NA> MTD046ID <NA> <NA>

`text` (Optional):

> Ground-truth transcription for diarization with ASR inference. Provide the ground-truth transcription of the given audio file in string format:

{"text": "this is an example transcript"}

`uem_filepath` (Optional):

> The UEM file is used for specifying the scoring regions to be evaluated in the given audio file. The UEM file follows this convention: `<uniq-id> <channel ID> <start time> <end time>`. The `<channel ID>` is set to 1.
>
> Example lines of a UEM file:

TS3012d.Mix-Headset 1 12.31 108.98
TS3012d.Mix-Headset 1 214.00 857.09

`ctm_filepath` (Optional):

> The CTM file is used for the evaluation of word-level diarization results and word-timestamp alignment. The CTM file follows this convention: `<uniq-id> <channel ID> <start time> <duration> <word> <confidence> <type of token> <speaker>`. Note that the `<speaker>` field should exactly match the speaker IDs in the RTTM file. Since confidence is not required for evaluating diarization results, we assign `<confidence>` the value `NA`. If the type of token is words, we assign `<type of token>` as `lex`.
> Example lines of a CTM file:

TS3012d.Mix-Headset 1 12.879 0.32 okay NA lex MTD046ID
TS3012d.Mix-Headset 1 13.203 0.24 yeah NA lex MTD046ID

---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md

Title: Models — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html

Published Time: Thu, 30 Oct 2025 07:07:30 GMT

Markdown Content:
Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md#models "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

This section gives a brief overview of the supported speaker diarization models in NeMo's ASR collection. Currently, NeMo Speech AI supports two types of speaker diarization systems:

1. **End-to-end Speaker Diarization:** Sortformer Diarizer

Sortformer is a Transformer encoder-based end-to-end speaker diarization model that generates predicted speaker labels directly from input audio clips. We offer offline and online (streaming) versions of the Sortformer diarizer. The online version of the Sortformer diarizer can also be used for offline diarization by setting a long enough chunk size.

2. **Cascaded (Pipelined) Speaker Diarization:** Clustering diarizer with Multi-Scale Diarization Decoder (MSDD)

The speaker diarization pipeline in NeMo Speech AI involves the use of the [MarbleNet](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speech_classification/models.html.md) model for Voice Activity Detection (VAD), the [TitaNet](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md) model for speaker embedding extraction, and the Multi-Scale Diarization Decoder for neural diarization, all of which are explained on this page.

Sortformer Diarizer[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md#sortformer-diarizer "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

Speaker diarization is all about figuring out who's speaking when in an audio recording. In the world of automatic speech recognition (ASR), this becomes even more important for handling conversations with multiple speakers. Multispeaker ASR (also known as speaker-attributed or multitalker ASR) uses this process not just to transcribe what's being said, but also to label each part of the transcript with the right speaker.
As ASR technology continues to advance, speaker diarization is increasingly becoming part of the ASR workflow itself. Some systems now handle speaker labeling and transcription at the same time during decoding. This means you not only get accurate text—you’re also getting insights into who said what, making it more useful for conversational analysis. However, despite significant advancements, integrating speaker diarization and ASR into a unified, seamless system remains a considerable challenge. A key obstacle lies in the need for extensive high-quality, annotated audio data featuring multiple speakers. Acquiring such data is far more complex than collecting monaural-speaker datasets. This challenge is particularly pronounced for low-resource languages and domains like healthcare, where strict privacy regulations further constrain data availability. On top of that, many real-world use cases need these models to handle really long audio files—sometimes hours of conversation at a time. Training on such lengthy data is even more complicated because it’s hard to find or annotate. This creates a big gap between what’s needed and what’s available, making multispeaker ASR one of the toughest nuts to crack in the field of speech technology. [![Image 1: Intro Comparison](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/intro_comparison.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/intro_comparison.png) To tackle the complexities of multispeaker automatic speech recognition (ASR), we introduce [Sortformer](https://arxiv.org/abs/2409.06656), a new approach that incorporates _Sort Loss_ and techniques to align timestamps with text tokens. Traditional approaches like permutation-invariant loss (PIL) face challenges when applied in batchable and differentiable computational graphs, especially since token-based objectives struggle to incorporate speaker-specific attributes into PIL-based loss functions. To address this, we propose an arrival time sorting (ATS) approach. In this method, speaker tokens from ASR outputs and speaker timestamps from diarization outputs are sorted by their arrival times to resolve permutations. This approach allows the multispeaker ASR system to be trained or fine-tuned using token-based cross-entropy loss, eliminating the need for timestamp-based or frame-level objectives with PIL. [![Image 2: Arrival Time Sort](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/ats.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/ats.png) The ATS-based multispeaker ASR system is powered by an end-to-end neural diarizer model, Sortformer, which generates speaker-label timestamps in arrival time order (ATO). To train the neural diarizer to produce sorted outputs, we introduce Sort Loss, a method that creates gradients enabling the Transformer model to learn the ATS mechanism. [![Image 3: Main Dataflow](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/main_dataflow.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/main_dataflow.png) Additionally, as shown in the above figure, our diarization system integrates directly with the ASR encoder. By embedding speaker supervision data as speaker kernels into the ASR encoder states, the system seamlessly combines speaker and transcription information. This unified approach improves performance and simplifies the overall architecture. 
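The arrival time sorting (ATS) idea can be illustrated with a short sketch. This is not NeMo code; it only shows the core operation of reordering target speaker-activity columns by each speaker's first active frame, which is the ordering that Sort Loss (described below) compares predictions against. The tensor layout (frames by speakers) is an assumption made for the example.

```python
# Illustrative sketch (not NeMo code): arrival-time sorting (ATS) of speaker-activity
# targets. Each column of `targets` is one speaker's frame-level activity (1 = speaking);
# columns are reordered so speakers appear in the order in which they first become active.
import torch

def sort_by_arrival_time(targets: torch.Tensor) -> torch.Tensor:
    """targets: (num_frames, num_speakers) binary speaker-activity matrix."""
    num_frames, num_speakers = targets.shape
    # First active frame per speaker; speakers with no activity sort to the end.
    first_active = torch.where(
        targets.any(dim=0),
        targets.float().argmax(dim=0),
        torch.full((num_speakers,), num_frames),
    )
    order = torch.argsort(first_active)
    return targets[:, order]

# Example: the second speaker starts talking before the first one,
# so its activity column is moved to the front after sorting.
y = torch.tensor([[0, 1, 0],
                  [0, 1, 0],
                  [1, 1, 0],
                  [1, 0, 1]])
print(sort_by_arrival_time(y))
```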
As a result, our end-to-end multispeaker ASR system is fully or partially trainable with token objectives, allowing both the ASR and speaker diarization modules to be trained or fine-tuned using these objectives. Additionally, during the multispeaker ASR training phase, no specialized loss calculation functions are needed when using Sortformer, as frameworks for standard single-speaker ASR models can be employed. These compatibilities greatly simplify and accelerate the training and fine-tuning process of multispeaker ASR systems.

On top of all these benefits, _Sortformer_ can be used as a stand-alone end-to-end speaker diarization model. By training a Sortformer diarizer model, especially on high-quality simulated data with accurate timestamps, you can boost the performance of multi-speaker ASR systems simply by integrating the _Sortformer_ model as a _Speaker Supervision_ model in a computation graph. In this tutorial, we will walk you through the process of training a Sortformer diarizer model with a toy dataset. Before starting, we will introduce the concepts of Sort Loss calculation and the hybrid loss technique.

[![Image 4: Sortformer Model with Hybrid Loss](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/sortformer.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/sortformer.png)[![Image 5: PIL model VS SortLoss model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/loss_types.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/loss_types.png)

_Sort Loss_ is designed to compare the predicted outputs with the true labels, typically sorted in arrival-time order or another relevant metric. The key distinction that _Sortformer_ introduces compared to previous end-to-end diarization systems, such as [EEND-SA](https://arxiv.org/pdf/1909.06247) and [EEND-EDA](https://arxiv.org/abs/2106.10654), lies in the organization of class presence. The figure below illustrates the difference between _Sort Loss_ and permutation-invariant loss (PIL), also known as permutation-free loss.

* PIL is calculated by finding the permutation of the target that minimizes the loss value between the prediction and the target.
* _Sort Loss_ simply compares the arrival-time-sorted version of speaker activity outputs for both the prediction and the target.

Note that sometimes the same ground-truth labels lead to different target matrices for _Sort Loss_ and PIL. For example, the figure below shows two identical source target matrices (the two matrices at the top), but the resulting target matrices for _Sort Loss_ and PIL are different.

Streaming Sortformer Diarizer[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md#streaming-sortformer-diarizer "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

[Streaming Sortformer](https://www.arxiv.org/pdf/2507.18446) is a streaming version of the Sortformer diarizer. To handle live audio, Streaming Sortformer processes the sound in small, overlapping chunks. It employs an Arrival-Order Speaker Cache (AOSC) that stores frame-level acoustic embeddings for all speakers previously detected in the audio stream. This allows the model to compare speakers in the current chunk with those in previous ones, ensuring a person is consistently identified with the same label throughout the stream.
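A rough conceptual sketch of this chunk-wise processing with a bounded speaker cache is shown below. It is not the NeMo implementation: the `diarizer` callable, the truncation-based cache update, and the default lengths are placeholders for illustration; the actual Streaming Sortformer filters the cache to keep high-quality frames per detected speaker, as described next.

```python
# Conceptual sketch (not NeMo code): chunk-wise streaming with a bounded cache of
# previously seen frames, so speakers in the current chunk can be matched against
# speakers detected earlier in the stream.
import torch

def stream_diarize(frames: torch.Tensor, diarizer, chunk_len: int = 188, spkcache_len: int = 188):
    """frames: (num_frames, feat_dim) frame-level features of the audio stream."""
    cache = frames.new_zeros((0, frames.shape[1]))  # empty speaker cache
    outputs = []
    for start in range(0, frames.shape[0], chunk_len):
        chunk = frames[start:start + chunk_len]
        # Run the diarizer on [speaker cache + current chunk] so that labels stay
        # consistent with speakers observed in earlier chunks.
        preds = diarizer(torch.cat([cache, chunk], dim=0))
        outputs.append(preds[cache.shape[0]:])  # keep predictions for the new frames only
        # Naive cache update: keep the most recent frames up to spkcache_len.
        # (The real model instead selects informative frames per detected speaker.)
        cache = torch.cat([cache, chunk], dim=0)[-spkcache_len:]
    return torch.cat(outputs, dim=0)

# Dummy "diarizer" used only to demonstrate shapes: 4 speaker scores per frame.
dummy_diarizer = lambda x: torch.sigmoid(x @ torch.randn(x.shape[1], 4))
probs = stream_diarize(torch.randn(1000, 32), dummy_diarizer)
print(probs.shape)  # torch.Size([1000, 4])
```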
[![Image 6: Chunk-wise processing with AOSC and FIFO buffer in Streaming Sortformer inference](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/cache_fifo_chunk.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/cache_fifo_chunk.png)

Streaming Sortformer employs a pre-encoder layer in the Fast Conformer to generate a speaker cache. At each step, the speaker cache is filtered to retain only high-quality speaker cache vectors. Aside from the speaker-cache management part, Streaming Sortformer follows the architecture of the offline version of Sortformer.

[![Image 7: The dataflow of step-wise Streaming Sortformer inference](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/streaming_steps.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/streaming_steps.png)

Below is an animated heatmap illustrating real-time speaker diarization for a three-speaker conversation using Streaming Sortformer. The heatmap shows how speaker activities are detected in the current chunk and updated in the Arrival-Order Speaker Cache and FIFO queue.

[![Image 8: Streaming Sortformer Animated](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/aosc_3spk_example.gif)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/aosc_3spk_example.gif)

Multi-Scale Diarization Decoder[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md#multi-scale-diarization-decoder "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

[![Image 9: Speaker diarization pipeline- VAD, segmentation, speaker embedding extraction, clustering](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/sd_pipeline.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/sd_pipeline.png)

A speaker diarization system needs to produce very accurate timestamps, since speaker turns can be extremely short in conversational settings. Human conversation often involves very short back-channel words such as "yes", "uh-huh", or "oh", and these words are very challenging for machines to transcribe and to attribute to the correct speaker. Therefore, while segmenting audio recordings in terms of speaker identity, speaker diarization requires fine-grained decisions on relatively short segments, ranging from a few tenths of a second to several seconds. Making accurate, fine-grained decisions on such short audio segments is challenging because it is less likely that reliable speaker traits can be captured from very short audio segments. We will discuss how this problem can be addressed by introducing a new technique called the multi-scale approach and a multi-scale diarization decoder to handle multi-scale inputs.

Extracting long audio segments is desirable in terms of the quality of speaker characteristics. However, the length of audio segments also limits the granularity, which leads to a coarse unit length for speaker label decisions. Therefore, speaker diarization systems are challenged by a trade-off between temporal resolution and the fidelity of the speaker representation, as depicted in the curve shown in the figure below. During the speaker feature extraction process in the speaker diarization pipeline, the temporal resolution is inevitably sacrificed by taking a long speech segment to obtain high-quality speaker representation vectors.
In plain and simple language: if we want to be very accurate about voice characteristics, we need to look at a longer span of time. At the same time, if we look at a longer span of time, we have to make a decision over that fairly long span, which leads to coarse decisions (low temporal resolution). This is easy to understand if we consider that even human listeners cannot accurately tell who is speaking when given only half a second of recorded speech.

In traditional diarization systems, audio segment lengths range from 1.5 to 3.0 seconds, since such values make a good compromise between the quality of speaker characteristics and temporal resolution. We refer to this type of segmentation method as a single-scale approach. Even with an overlap technique, single-scale segmentation limits the temporal resolution to 0.75 to 1.5 seconds, which leaves room for improvement in terms of temporal accuracy. A coarse temporal resolution not only deteriorates diarization performance but also decreases speaker counting accuracy, since short speech segments are not captured properly. More importantly, such coarse temporal resolution in the speaker timestamps makes the matching between the decoded ASR text and the speaker diarization result more error-prone.

![Speaker diarization pipeline- VAD, segmentation, speaker embedding extraction, clustering](images/ms_trade_off.png)

To tackle this problem, the multi-scale approach is proposed to cope with the trade-off by extracting speaker features from multiple segment lengths and then combining the results from the multiple scales. The multi-scale approach is fulfilled by employing multi-scale segmentation and extracting speaker embeddings from each scale. The left side of the above figure shows how four different scales in a multi-scale segmentation approach are performed. During the segment affinity calculation process, all the information from the longest scale to the shortest scale is combined, yet a decision is made only for the shortest segment range.

When combining the features from each scale, the weight of each scale largely affects speaker diarization performance. Since the scale weights largely determine the accuracy of the speaker diarization system, they should be set to maximize speaker diarization performance. Hence, we came up with a novel multi-scale diarization system called the multi-scale diarization decoder [[SD-MODELS1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md#id117 "Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, and Boris Ginsburg. Multi-scale speaker diarization with dynamic scale weighting. 2022. URL: https://arxiv.org/abs/2203.15974, doi:10.48550/ARXIV.2203.15974.")] that dynamically determines the importance of each scale at each timestep. The multi-scale diarization decoder takes the speaker embedding vectors from multiple scales and estimates desirable scale weights. Based on the estimated scale weights, speaker labels are generated. Hence, the proposed system puts more weight on larger scales if the input signals are considered to carry more reliable information at those scales.
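How scale weights act on per-scale speaker similarities can be sketched in a few lines of Python. This is an illustrative sketch only, not the NeMo MSDD implementation (which estimates the weights with a CNN or attention module, as described below); the tensor shapes and the softmax-normalized weights in the usage example are assumptions made for the sketch.

```python
# Illustrative sketch (not the NeMo MSDD implementation): combining per-scale cosine
# similarities with estimated scale weights, as in the cosine-similarity context vector
# scheme described on this page.
import torch
import torch.nn.functional as F

def weighted_scale_similarity(
    input_embs: torch.Tensor,    # (num_scales, emb_dim) embeddings for one step
    cluster_avg: torch.Tensor,   # (num_scales, num_spks, emb_dim) cluster-average embeddings
    scale_weights: torch.Tensor, # (num_scales,) estimated scale weights, summing to 1
) -> torch.Tensor:
    # Cosine similarity between the step embedding and each speaker's
    # cluster-average embedding, computed independently at every scale.
    sims = F.cosine_similarity(
        input_embs.unsqueeze(1).expand_as(cluster_avg),  # (num_scales, num_spks, emb_dim)
        cluster_avg,
        dim=-1,
    )  # -> (num_scales, num_spks)
    # Weighted sum over scales yields one context value per speaker.
    return (scale_weights.unsqueeze(-1) * sims).sum(dim=0)  # (num_spks,)

# Usage example: 5 scales, 2 speakers, 192-dimensional embeddings.
torch.manual_seed(0)
ctx = weighted_scale_similarity(
    torch.randn(5, 192), torch.randn(5, 2, 192), torch.softmax(torch.randn(5), dim=0)
)
print(ctx)
```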
[![Image 10: Speaker diarization pipeline- VAD, segmentation, speaker embedding extraction, clustering](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/data_flow.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/data_flow.png)

The data flow of the multi-scale speaker diarization system is shown in the above figure. Multi-scale segments are extracted from the audio input, and the corresponding speaker embedding vectors for the multi-scale audio input are generated by using a speaker embedding extractor (TitaNet). Subsequently, the extracted multi-scale embeddings are processed by the clustering algorithm to provide an initializing clustering result to the MSDD module. The MSDD module compares the cluster-average speaker embedding vectors with the input speaker embedding sequences. The scale weights for each step are estimated to weigh the importance of each scale. Finally, the sequence model is trained to output speaker label probabilities for each speaker.

[![Image 11: A figure explaining CNN based scale weighting mechanism](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/scale_weight_cnn.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/scale_weight_cnn.png)

A neural network model named the multi-scale diarization decoder (MSDD) is trained to take advantage of the multi-scale approach by dynamically calculating the weight of each scale. MSDD takes the initial clustering results and compares the extracted speaker embeddings with the cluster-average speaker representation vectors. Most importantly, the weight of each scale at each time step is determined through a scale weighting mechanism, where the scale weights are calculated by 1-D convolutional neural networks (CNNs) applied to the multi-scale speaker embedding inputs and the cluster-average embeddings, as described in the above figure.

[![Image 12: A figure explaining weighted sum of cosine similarity values](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/weighted_sum.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/weighted_sum.png)

The estimated scale weights are applied to the cosine similarity values calculated for each speaker and each scale. The above figure shows the process of calculating the context vector by applying the estimated scale weights to the cosine similarities calculated between the cluster-average speaker embeddings and the input speaker embeddings.

Aside from the CNN-based weighting scheme, the MSDD implementation in the NeMo toolkit provides multiple options for calculating the scale weights via `model.msdd_module.weighting_scheme`:

* `conv_scale_weight`: Default setting. Use 1-D CNN filters to calculate scale weights.
* `attn_scale_weight`: Calculate the scale weights by applying an attention mechanism between cluster-average embeddings and input embeddings. This can be viewed as attention values over scales at each timestep.

Finally, the context vector for each step is fed to a multi-layer LSTM model that generates per-speaker existence probabilities. The figure below shows how speaker label sequences are estimated by the LSTM model and the context vector input.
[![Image 13: Speaker diarization pipeline- VAD, segmentation, speaker embedding extraction, clustering](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/sequence_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/sequence_model.png)

In the NeMo toolkit, the MSDD implementation provides multiple options for the context vector, specified via `model.msdd_module.context_vector_type`:

* `cos_sim`: As described in this document, scale weights are applied to the cosine similarity values between cluster-average embedding vectors and input embedding vectors. The default is `cos_sim`.
* `elem_prod`: The scale weights are directly applied to the speaker embedding vectors, and a weighted speaker embedding vector is calculated for both the cluster-average embedding vectors and the input embedding vectors. Finally, the element-wise product between the cluster-average weighted speaker embedding vector and the input multi-scale embedding vector is calculated and fed to the LSTMs as a context vector for each step.

References[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/models.html.md#references "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md

Title: NeMo Speaker Recognition API — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html

Published Time: Fri, 18 Jul 2025 19:25:22 GMT

Markdown Content:
NeMo Speaker Recognition API[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md#nemo-speaker-recognition-api "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------

Model Classes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md#model-classes "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------
_class_ `nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel(*args: Any, **kwargs: Any)`[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md#nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel "Link to this definition")

Bases: [`ModelPT`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT "nemo.core.classes.modelPT.ModelPT"), `ExportableEncDecModel`, `VerificationMixin`

Encoder-decoder class for speaker label models. The model class creates training and validation methods for setting up the data and performing the model forward pass. Expects a config dict for:

> * preprocessor
> * Jasper/QuartzNet encoder
> * speaker decoder

`get_embedding(path2audio_file)`[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md#nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel.get_embedding "Link to this definition")

Returns the speaker embeddings for a provided audio file.

Parameters:
**path2audio_file** – path to an audio wav file

Returns:
speaker embeddings (audio representations)

Return type:
emb

`verify_speakers(path2audio_file1, path2audio_file2, threshold=0.7)`[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md#nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel.verify_speakers "Link to this definition")

Verify if two audio files are from the same speaker or not.

Parameters:
* **path2audio_file1** – path to audio wav file of speaker 1
* **path2audio_file2** – path to audio wav file of speaker 2
* **threshold** – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)

Returns:
True if both audio files are from the same speaker, False otherwise

`verify_speakers_batch(audio_files_pairs, threshold=0.7, batch_size=32, sample_rate=16000, device='cuda')`[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/api.html.md#nemo.collections.asr.models.label_models.EncDecSpeakerLabelModel.verify_speakers_batch "Link to this definition")

Verify if audio files from the first and second manifests are from the same speaker or not.

Parameters:
* **audio_files_pairs** – list of tuples with audio file pairs to be verified
* **threshold** – cosine similarity score used as a threshold to distinguish two embeddings (default = 0.7)
* **batch_size** – batch size to perform batch inference
* **sample_rate** – sample rate of audio files in manifest file
* **device** – compute device to perform operations
Returns:
True if both audio files in a pair are from the same speaker, False otherwise

---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md

Title: Models — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html

Published Time: Thu, 30 Oct 2025 07:07:30 GMT

Markdown Content:

Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#models "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------

Examples of config files for all the models below can be found in the `/examples/speaker_recognition/conf` directory. For more information about the config files and how they should be structured, see the [NeMo Speaker Recognition Configuration Files](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/configs.html.md) page.

Pretrained checkpoints for all of these models, as well as instructions on how to load them, can be found on the [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/results.html.md) page. You can use the available checkpoints for immediate inference, or fine-tune them on your own datasets. The Checkpoints page also contains benchmark results for the available speaker recognition models.

TitaNet[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#titanet "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------

The TitaNet model [[SR-MODELS4](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id110 "Nithin Rao Koluguri, Taejin Park, and Boris Ginsburg. Titanet: neural model for speaker representation with 1d depth-wise separable convolutions and global context. arXiv preprint arXiv:2110.04410, 2021.")] is based on the ContextNet architecture [[SR-MODELS2](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id106 "Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, and Yonghui Wu. Contextnet: improving convolutional neural networks for automatic speech recognition with global context. arXiv:2005.03191, 2020.")] for extracting speaker representations. We employ 1D depth-wise separable convolutions and Squeeze-and-Excitation (SE) layers with global context, followed by a channel-attention-based statistics pooling layer, to map variable-length utterances to a fixed-length embedding (t-vector). TitaNet is a scalable architecture and achieves state-of-the-art performance on speaker verification and diarization tasks.
> [![Image 1: titanet model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/titanet_network.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/titanet_network.png)

TitaNet models can be instantiated using the `EncDecSpeakerLabelModel` class.

SpeakerNet[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#speakernet "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------

The model is based on the QuartzNet ASR architecture [[SR-MODELS3](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id108 "Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, and Boris Ginsburg. Speakernet: 1d depth-wise separable convolutional network for text-independent speaker recognition and verification. arXiv preprint arXiv:2010.12653, 2020.")], comprising an encoder and decoder structure. We use the encoder of the QuartzNet model as a top-level feature extractor and feed its output to a statistics pooling layer, where we compute the mean and variance across channel dimensions to capture time-independent, utterance-level speaker features.

The QuartzNet encoder used for speaker embeddings, shown in the figure below, has the following structure: a QuartzNet BxR model has B blocks, each with R sub-blocks. Each sub-block applies a 1D convolution, batch norm, ReLU, and dropout. All sub-blocks in a block have the same number of output channels, and the blocks are connected with residual connections. We use QuartzNet with 3 blocks, 2 sub-blocks, and 512 channels as the encoder for speaker embeddings. All conv layers have stride 1 and dilation 1.

> [![Image 2: speakernet model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/ICASPP_SpeakerNet.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/ICASPP_SpeakerNet.png)

The top-level acoustic features obtained from the encoder output are used to compute intermediate features that are then passed to the decoder to produce utterance-level speaker embeddings. The intermediate time-independent features are computed using a statistics pooling layer, where we compute the mean and standard deviation of the features across time channels to get a time-independent feature representation S of size Batch_size × 3000. The intermediate features S are passed through the decoder: two layers, each of output size 512, for a linear transformation from S to the final number of classes N for the larger (L) model, and a single linear layer of output size 256 to the final number of classes N for the medium (M) model. We extract q-vectors after the final linear layer, of fixed size 512 and 256 for the SpeakerNet-L and SpeakerNet-M models respectively.

SpeakerNet models can be instantiated using the `EncDecSpeakerLabelModel` class.
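As a quick illustration of the `EncDecSpeakerLabelModel` API documented earlier, the sketch below loads a pretrained speaker model and calls `get_embedding` and `verify_speakers`. The checkpoint name `titanet_large` and the audio paths are placeholders chosen for the example; any checkpoint listed on the Checkpoints page can be substituted.

```python
# Minimal sketch, not a prescribed recipe: the checkpoint name and audio paths
# below are placeholders chosen for illustration.
from nemo.collections.asr.models import EncDecSpeakerLabelModel

speaker_model = EncDecSpeakerLabelModel.from_pretrained("titanet_large")

# Fixed-length speaker embedding for a single utterance.
emb = speaker_model.get_embedding("speaker1_utt1.wav")

# Compare two utterances; returns True when the cosine similarity of their
# embeddings exceeds the threshold (default 0.7).
same_speaker = speaker_model.verify_speakers("speaker1_utt1.wav", "speaker2_utt1.wav")
print(same_speaker)
```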
ECAPA_TDNN[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#ecapa-tdnn "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------

The model is based on the paper “ECAPA_TDNN Embeddings for Speaker Diarization” [[SR-MODELS1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id111 "Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, and Hwidong Na. Ecapa-tdnn embeddings for speaker diarization. Interspeech 2021, Aug 2021. URL: http://dx.doi.org/10.21437/Interspeech.2021-941, doi:10.21437/interspeech.2021-941.")], comprising an encoder of time-dilation layers based on Emphasized Channel Attention, Propagation, and Aggregation. The ECAPA-TDNN model employs a channel- and context-dependent attention mechanism, Multi-layer Feature Aggregation (MFA), as well as Squeeze-and-Excitation (SE) and residual blocks. For faster training and inference, we replace the residual blocks with group convolution blocks of single dilation. These models have shown good performance across various speaker tasks.

ECAPA-TDNN models can be instantiated using the `EncDecSpeakerLabelModel` class.

References[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#references "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------------

[[SR-MODELS1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id4)] Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, and Hwidong Na. Ecapa-tdnn embeddings for speaker diarization. _Interspeech 2021_, Aug 2021. [doi:10.21437/interspeech.2021-941](https://doi.org/10.21437/interspeech.2021-941).

[[SR-MODELS2](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id2)] Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, and Yonghui Wu. Contextnet: improving convolutional neural networks for automatic speech recognition with global context. _arXiv:2005.03191_, 2020.

[[SR-MODELS3](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id3)] Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, and Boris Ginsburg. Speakernet: 1d depth-wise separable convolutional network for text-independent speaker recognition and verification. _arXiv preprint arXiv:2010.12653_, 2020.

[[SR-MODELS4](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/models.html.md#id1)] Nithin Rao Koluguri, Taejin Park, and Boris Ginsburg. Titanet: neural model for speaker representation with 1d depth-wise separable convolutions and global context. _arXiv preprint arXiv:2110.04410_, 2021.
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/ssl/resources.html.md

Title: Resources and Documentation — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/ssl/resources.html

Published Time: Thu, 30 Oct 2025 07:07:31 GMT

Markdown Content:

Resources and Documentation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/ssl/resources.html.md#resources-and-documentation "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Refer to the [SSL-for-ASR notebook](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/asr/Self_Supervised_Pre_Training.ipynb) for a hands-on tutorial. If you are new to NeMo, consider trying out the [ASR with NeMo](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/asr/ASR_with_NeMo.ipynb) tutorial. This and most other tutorials can be run on Google Colab by specifying the link to the notebooks’ GitHub pages on Colab.

If you are looking for information about a particular ASR model, or would like to find out more about the model architectures available in the `nemo_asr` collection, refer to the [ASR Models](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/models.html.md) page.

NeMo includes preprocessing scripts for several common ASR datasets. The [ASR Datasets](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/datasets.html.md) page contains instructions on running those scripts. It also includes guidance for creating your own NeMo-compatible dataset, if you have your own data.

Information about how to load model checkpoints (either local files or pretrained ones from NGC), as well as a list of the checkpoints available on NGC, is located on the [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/ssl/results.html.md) page.

Documentation regarding the configuration files specific to SSL can be found on the [Configuration Files](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/ssl/configs.html.md) page.
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md

Title: Metrics — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html

Published Time: Fri, 05 Sep 2025 19:01:20 GMT

Markdown Content:

Metrics[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#metrics "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.collections.common.metrics.Perplexity(_*args: Any_, _**kwargs: Any_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity "Link to this definition")

Bases: `Metric`

This class computes the mean perplexity of the distributions in the last dimension of its inputs. It is a wrapper around the torch.distributions.Categorical.perplexity method. You have to provide either `probs` or `logits` to the [`update()`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity.update "nemo.collections.common.metrics.Perplexity.update") method. The class computes perplexities for the distributions passed to [`update()`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity.update "nemo.collections.common.metrics.Perplexity.update") in the `probs` or `logits` arguments and averages the perplexities. Results are reduced across all workers via SUM operations.

See the [TorchMetrics in PyTorch Lightning guide](https://lightning.ai/docs/torchmetrics/stable/pages/lightning.html) for metric usage instructions.

Parameters:

*   **dist_sync_on_step** – Synchronize metric state across processes at each `forward()` before returning the value at the step.
*   **process_group** – Specify the process group on which synchronization is called. Default: `None` (which selects the entire world).
*   **validate_args** – If `True`, the values of the [`update()`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity.update "nemo.collections.common.metrics.Perplexity.update") method parameters are checked: `logits` must not contain NaNs and the last dimension of `probs` must be a valid probability distribution.
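Before the method entries below, here is a minimal usage sketch for this metric, assuming the usual TorchMetrics update/compute cycle; the tensor shapes and number of batches are illustrative only.

```python
# Sketch: accumulate perplexity over a few batches of logits, then aggregate.
# Shapes are illustrative; the last dimension is treated as the distribution dimension.
import torch
from nemo.collections.common.metrics import Perplexity

perplexity = Perplexity(validate_args=True)

for _ in range(3):
    logits = torch.randn(8, 16, 32)  # e.g. (batch, sequence, vocab)
    perplexity.update(logits=logits)

print(perplexity.compute())  # mean perplexity over all distributions seen so far
```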
compute()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity.compute "Link to this definition")

Returns the perplexity across all workers and resets `perplexities_sum` and `num_distributions` to 0.

full_state_update _= True_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity.full_state_update "Link to this definition")

update(_probs=None_, _logits=None_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/metrics.html.md#nemo.collections.common.metrics.Perplexity.update "Link to this definition")

Updates `perplexities_sum` and `num_distributions`.

Parameters:

*   **probs** – A `torch.Tensor` whose innermost dimension is a valid probability distribution.
*   **logits** – A `torch.Tensor` without NaNs.

---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md

Title: Adapter Components — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html

Published Time: Fri, 18 Jul 2025 19:25:38 GMT

Markdown Content:

Adapter Components[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#adapter-components "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adapters can be considered as any set of parameters that are added to a pre-existing module/model. We currently support the standard adapter from the literature; more advanced adapter modules are being researched and can potentially be supported by NeMo.

An adapter module can be any PyTorch module, but it must follow certain straightforward requirements:

1. The module accepts an input of some dimension, and its output must match this dimension.
2. Ideally, the module is initialized such that the output of the freshly initialized adapter does not modify the original input. This allows the model to produce the same output results, even when additional parameters have been added.

According to Junxian et al. [[1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#id6 "Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. Towards a unified view of parameter-efficient transfer learning. 2021. URL: https://arxiv.org/abs/2110.04366, doi:10.48550/ARXIV.2110.04366.")], an adapter can be represented by three components:

1. Functional form - the trainable parameters that will modify the input.
2. Insertion form - where the adapter outputs are integrated with the original input. The input to the adapters can be the last output of the layer, the input to some attention layer, or even the original input to the module itself (before the module's forward pass).
3. Composition function - how the adapter outputs are integrated with the inputs. It can be as simple as a residual addition connection, concatenation, point-wise multiplication, etc.

Functional Form - Adapter Networks[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#functional-form-adapter-networks "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adapter modules represent the functional form of the adapter. We discuss an example of the most commonly used adapter module found in the literature, the `LinearAdapter` (or Houlsby Adapter) [[2](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#id5 "Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, 2790–2799. PMLR, 2019.")].

Note

All adapter modules must extend `AdapterModuleUtil` and should ideally have an equivalent DataClass config for easy instantiation!

_class_ nemo.collections.common.parts.adapter_modules.AdapterModuleUtil

Bases: [`AccessMixin`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.access_mixins.AccessMixin "nemo.core.classes.mixins.access_mixins.AccessMixin")

Base class of adapter modules, providing common functionality to all adapter modules.

setup_adapter_strategy(_adapter\_strategy: AbstractAdapterStrategy | None_,)

Sets up the adapter strategy of this class, enabling dynamic changes in the way the adapter output is merged with the input. When called successfully, it assigns the variable adapter_strategy to the module.

Parameters:
**adapter_strategy** – Can be None or an implementation of AbstractAdapterStrategy.

get_default_strategy_config() → dataclass

Returns a default adapter module strategy.

adapter_unfreeze()

Sets requires_grad to True for all parameters in the adapter. This method should be overridden for any custom unfreeze behavior that is required, for example, if not all params of the adapter should be unfrozen.

* * *

_class_ nemo.collections.common.parts.adapter_modules.LinearAdapter(_*args: Any_, _**kwargs: Any_)

Bases: `Module`, `AdapterModuleUtil`

Simple linear feed-forward adapter module with LayerNorm and a single hidden layer with an activation function.

Note: The adapter explicitly initializes its final layer with all zeros in order to avoid affecting the original model when all adapters are disabled.

Parameters:

*   **in_features** – Input dimension of the module. Note that for adapters, input_dim == output_dim.
*   **dim** – Hidden dimension of the feed-forward network.
*   **activation** – Str name of an activation function.
*   **norm_position** – Str, can be pre or post. Defaults to pre. Determines whether the normalization occurs in the first layer or the last layer. Certain architectures may prefer one over the other.
*   **dropout** – float value; whether to perform dropout on the output of the last layer of the adapter.
*   **adapter_strategy** – By default, ResidualAddAdapterStrategyConfig. An adapter composition function object.
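To make the parameters above concrete, here is a small sketch that constructs a `LinearAdapter` directly; the dimensions and activation name are arbitrary example values, and the residual merge with the input is performed by the configured adapter strategy rather than by the module itself.

```python
# Sketch with example hyperparameters. The final layer is zero-initialized, so a
# freshly created adapter does not change the behavior of the model it is added to;
# the residual addition is handled by the adapter strategy (ResidualAddAdapterStrategy).
import torch
from nemo.collections.common.parts.adapter_modules import LinearAdapter

adapter = LinearAdapter(in_features=512, dim=64, activation="swish", norm_position="pre")

x = torch.randn(4, 512)  # (batch, in_features); for adapters, input dim == output dim
out = adapter(x)         # output has the same trailing dimension as the input
print(out.shape)
```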
Insertion Form - Module Adapters[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#insertion-form-module-adapters "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adapter modules can be integrated into many different locations of a given module. For example, it is possible to have an adapter that affects only the outputs of the final layer in each module. We can also have a `Parallel Adapter` [[1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#id6 "Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. Towards a unified view of parameter-efficient transfer learning. 2021. URL: https://arxiv.org/abs/2110.04366, doi:10.48550/ARXIV.2110.04366.")] that operates on the input of the module itself, in parallel to the forward pass of the module. Yet another insertion location is inside the multi-head attention layers.

On top of this, while adapters are commonly used only in the layers containing the most parameters (say, the encoder of a network), some models can support adapters in multiple locations (encoder-decoder architectures for language models and machine translation, or even encoder-decoder-joint for ASR with transducer loss). As such, NeMo utilizes the concept of `Module Adapters`.

`Module Adapters` are defined very simply when adding an adapter - by specifying the module that the adapter should be inserted into:

```python
# Get the list of supported modules / locations in an adapter-compatible model
print(model.adapter_module_names)  # assume ['', 'encoder', 'decoder']

# When calling add_adapter, specify the module name to the left of the colon symbol,
# and the adapter name afterwards. The adapter below is directed to the decoder module
# instead of the default / encoder module.
model.add_adapter("decoder:first_adapter", cfg=...)
```

You might note that `model.adapter_module_names` can sometimes return `''` as one of the supported module names - this refers to the “default module”. Generally, we try to make the default the most commonly used adapter in the literature - for example, encoder adapters in NLP/NMT/ASR.

Composition Function - Adapter Strategies[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#composition-function-adapter-strategies "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Finally, we discuss how to compose the input and output of adapter modules. In order to generalize this step, we construct `Adapter Strategies`. A strategy is any class (not a torch.nn.Module!) that extends `AbstractAdapterStrategy` and provides a `forward()` method that accepts a specific signature of inputs and produces an output tensor that combines the input and output with some specific method.

We discuss a simple residual addition connection strategy below - it accepts an input to the adapter and the adapter's output and simply adds them together. It also supports `stochastic_depth`, which enables adapters to be dynamically switched off during training, making training more robust.
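To illustrate the contract described above, the sketch below defines a hypothetical strategy that extends `AbstractAdapterStrategy` and composes the adapter output with its input via a scaled residual addition (essentially what `ResidualAddAdapterStrategy`, documented next, already provides); the class name and the `scale` parameter are inventions for this example.

```python
# Hypothetical composition function, shown only to illustrate the strategy contract:
# forward() receives the input tensor, the adapter module, and the calling module,
# and returns the merged result.
import torch
from nemo.core.classes.mixins.adapter_mixin_strategies import AbstractAdapterStrategy


class ScaledResidualAddStrategy(AbstractAdapterStrategy):
    """Adds a scaled adapter output to the original input."""

    def __init__(self, scale: float = 1.0):
        super().__init__()
        self.scale = scale

    def forward(self, input: torch.Tensor, adapter: torch.nn.Module, *, module):
        adapter_out = adapter(input)              # run the adapter itself
        return input + self.scale * adapter_out   # compose via residual addition
```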
_class_ nemo.core.classes.mixins.adapter_mixin_strategies.AbstractAdapterStrategy

Bases: `ABC`

forward(_input: torch.Tensor_, _adapter: torch.nn.Module_, _*_, _module: [AdapterModuleMixin](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html.md#nemo.core.adapter_mixins.AdapterModuleMixin "nemo.core.adapter_mixins.AdapterModuleMixin")_,)

Forward method that defines how the output of the adapter should be merged with the input, or whether it should be merged at all. It also provides the module that called this strategy, thereby allowing access to all other adapters in the calling module. This can be useful if one adapter is a meta adapter that combines the outputs of various adapters. In such a case, the input can be forwarded across all other adapters, collecting their outputs, and those outputs can then be merged via some strategy. For example, refer to:

*   [AdapterFusion: Non-Destructive Task Composition for Transfer Learning](https://arxiv.org/abs/2005.00247)
*   [Exploiting Adapters for Cross-lingual Low-resource Speech Recognition](https://arxiv.org/abs/2105.11905)

Parameters:

*   **input** – Original output tensor of the module, or the output of the previous adapter (if more than one adapter is enabled).
*   **adapter** – The adapter module that is currently required to perform the forward pass.
*   **module** – The calling module, in its entirety. It is a module that implements AdapterModuleMixin, therefore the strategy can access all other adapters in this module via module.adapter_layer.

Returns:
The result tensor, after one of the active adapters has finished its forward passes.

* * *

_class_ nemo.core.classes.mixins.adapter_mixin_strategies.ResidualAddAdapterStrategy(_stochastic\_depth: float = 0.0_, _l2\_lambda: float = 0.0_,)

Bases: `AbstractAdapterStrategy`

An implementation of residual addition of an adapter module with its input. Supports stochastic depth regularization.

forward(_input: torch.Tensor_, _adapter: torch.nn.Module_, _*_, _module: [AdapterModuleMixin](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html.md#nemo.core.adapter_mixins.AdapterModuleMixin "nemo.core.adapter_mixins.AdapterModuleMixin")_,)

A basic strategy comprising a residual connection over the input, after the forward pass of the underlying adapter.

Parameters:

*   **input** – Original output tensor of the module, or the output of the previous adapter (if more than one adapter is enabled).
*   **adapter** – The adapter module that is currently required to perform the forward pass.
*   **module** – The calling module, in its entirety. It is a module that implements AdapterModuleMixin, therefore the strategy can access all other adapters in this module via module.adapter_layer.

Returns:
The result tensor, after one of the active adapters has finished its forward passes.

compute_output(_input: torch.Tensor_, _adapter: torch.nn.Module_, _*_, _module: [AdapterModuleMixin](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html.md#nemo.core.adapter_mixins.AdapterModuleMixin "nemo.core.adapter_mixins.AdapterModuleMixin")_,) → torch.Tensor

Compute the output of a single adapter for some input.

Parameters:

*   **input** – Original output tensor of the module, or the output of the previous adapter (if more than one adapter is enabled).
*   **adapter** – The adapter module that is currently required to perform the forward pass.
*   **module** – The calling module, in its entirety. It is a module that implements AdapterModuleMixin, therefore the strategy can access all other adapters in this module via module.adapter_layer.

Returns:
The result tensor, after one of the active adapters has finished its forward passes.

apply_stochastic_depth(_output: torch.Tensor_, _input: torch.Tensor_, _adapter: torch.nn.Module_, _*_, _module: [AdapterModuleMixin](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html.md#nemo.core.adapter_mixins.AdapterModuleMixin "nemo.core.adapter_mixins.AdapterModuleMixin")_,)

Compute and apply stochastic depth if the probability is greater than 0.

Parameters:

*   **output** – The result tensor, after one of the active adapters has finished its forward passes.
*   **input** – Original output tensor of the module, or the output of the previous adapter (if more than one adapter is enabled).
*   **adapter** – The adapter module that is currently required to perform the forward pass.
*   **module** – The calling module, in its entirety. It is a module that implements AdapterModuleMixin, therefore the strategy can access all other adapters in this module via module.adapter_layer.

Returns:
The result tensor, after stochastic depth has potentially been applied to it.

compute_auxiliary_losses(_output: torch.Tensor_, _input: torch.Tensor_, _adapter: torch.nn.Module_, _*_, _module: [AdapterModuleMixin](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/intro.html.md#nemo.core.adapter_mixins.AdapterModuleMixin "nemo.core.adapter_mixins.AdapterModuleMixin")_,)

Compute any auxiliary losses and preserve them in the tensor registry.

Parameters:

*   **output** – The result tensor, after one of the active adapters has finished its forward passes.
*   **input** – Original output tensor of the module, or the output of the previous adapter (if more than one adapter is enabled).
*   **adapter** – The adapter module that is currently required to perform the forward pass.
*   **module** – The calling module, in its entirety. It is a module that implements AdapterModuleMixin, therefore the strategy can access all other adapters in this module via module.adapter_layer.

* * *

References[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#references "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------

[[1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#id1)] Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. Towards a unified view of parameter-efficient transfer learning. 2021. URL: [https://arxiv.org/abs/2110.04366](https://arxiv.org/abs/2110.04366), [doi:10.48550/ARXIV.2110.04366](https://doi.org/10.48550/ARXIV.2110.04366).

[[2](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/adapters/components.html.md#id2)] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In _International Conference on Machine Learning_, 2790–2799. PMLR, 2019.
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md

Title: NeMo Core APIs — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html

Published Time: Fri, 05 Sep 2025 19:01:32 GMT

Markdown Content:

NeMo Core APIs[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo-core-apis "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------

Base class for all NeMo models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#base-class-for-all-nemo-models "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.ModelPT(_*args: Any_, _**kwargs: Any_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT "Link to this definition")

Bases: `LightningModule`, `Model`

Interface for PyTorch Lightning-based NeMo models.

on_fit_start() → None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_fit_start "Link to this definition")

Register debug hooks.

register_artifact(_config\_path: str_, _src: str_, _verify\_src\_exists: bool = True_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.register_artifact "Link to this definition")

Register model artifacts with this function. These artifacts (files) will be included inside the .nemo file when model.save_to(“mymodel.nemo”) is called.

How it works:

1. It always returns an existing absolute path which can be used during the Model constructor call. EXCEPTION: src is None or “”, in which case nothing will be done and src will be returned.
2. It will add a (config_path, model_utils.ArtifactItem()) pair to self.artifacts.

> If "src" is a local existing path:
>     it will be returned in absolute path form.
> elif "src" starts with "nemo_file:unique_artifact_name":
>     the .nemo file will be untarred to a temporary folder location and an actual existing path will be returned.
> else:
>     an error will be raised.

WARNING: use .register_artifact calls in your models’ constructors. The returned path is not guaranteed to exist after you have exited your model’s constructor.

Parameters:

*   **config_path** (_str_) – Artifact key. Usually corresponds to the model config.
*   **src** (_str_) – Path to the artifact.
*   **verify_src_exists** (_bool_) – If set to False, then the artifact is optional and register_artifact will return None even if src is not found. Defaults to True.

Returns:
If src is not None or empty, it always returns an absolute path which is guaranteed to exist during the model instance's life.

Return type:
str

has_artifacts() → bool[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.has_artifacts "Link to this definition")

Returns True if the model has artifacts registered.

has_native_or_submodules_artifacts() → bool[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.has_native_or_submodules_artifacts "Link to this definition")

Returns True if it has artifacts or any of the submodules have artifacts.

has_nemo_submodules() → bool[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.has_nemo_submodules "Link to this definition")

Returns True if it has any registered NeMo submodules.

register_nemo_submodule(_name: str_, _config\_field: str_, _model: [ModelPT](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT "nemo.core.classes.modelPT.ModelPT")_,) → None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.register_nemo_submodule "Link to this definition")

Adds a NeMo model as a submodule. The submodule can be accessed via the name attribute on the parent NeMo model this submodule was registered on (self). In the saving process, the whole parent model (self) is saved as a single model with artifacts from the child submodule, and the submodule config is saved to the config_field of the parent model. This method is necessary to create a nested model, e.g.

```python
class ParentModel(ModelPT):
    def __init__(self, cfg, trainer=None):
        super().__init__(cfg=cfg, trainer=trainer)

        # annotate type for autocompletion and type checking (optional)
        self.child_model: Optional[ChildModel] = None
        if cfg.get("child_model") is not None:
            self.register_nemo_submodule(
                name="child_model",
                config_field="child_model",
                model=ChildModel(self.cfg.child_model, trainer=trainer),
            )
        # ... other code
```
Parameters:

*   **name** – name of the attribute for the submodule
*   **config_field** – field in the config where the submodule config should be saved
*   **model** – NeMo model, instance of ModelPT

named_nemo_modules(_prefix\_name: str = ''_, _prefix\_config: str = ''_,) → Iterator[Tuple[str, str, [ModelPT](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT "nemo.core.classes.modelPT.ModelPT")]][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.named_nemo_modules "Link to this definition")

Returns an iterator over all NeMo submodules recursively, yielding tuples of (attribute path, path in config, submodule), starting from the core module.

Parameters:

*   **prefix_name** – prefix for the name path
*   **prefix_config** – prefix for the path in config

Returns:
Iterator over (attribute path, path in config, submodule), starting from (prefix, self)

save_to(_save\_path: str_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.save_to "Link to this definition")

Saves the model instance (weights and configuration) into a .nemo file. You can use the “restore_from” method to fully restore the instance from a .nemo file.

A .nemo file is an archive (tar.gz) with the following contents:

*   model_config.yaml - model configuration in .yaml format. You can deserialize this into the cfg argument for the model’s constructor.
*   model_weights.ckpt - model checkpoint

Parameters:
**save_path** – Path to the .nemo file where the model instance should be saved

_classmethod_ restore_from(_restore\_path: str_, _override\_config\_path: omegaconf.OmegaConf | str | None = None_, _map\_location: torch.device | None = None_, _strict: bool = True_, _return\_config: bool = False_, _save\_restore\_connector: [SaveRestoreConnector](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector "nemo.core.connectors.save_restore_connector.SaveRestoreConnector") | None = None_, _trainer: lightning.pytorch.Trainer | None = None_, _validate\_access\_integrity: bool = True_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.restore_from "Link to this definition")

Restores a model instance (weights and configuration) from a .nemo file.

Parameters:

*   **restore_path** – path to the .nemo file from which the model should be instantiated
*   **override_config_path** – path to a yaml config that will override the internal config file, or an OmegaConf / DictConfig object representing the model config.
*   **map_location** – Optional torch.device() to map the instantiated model to a device. By default (None), it will select a GPU if available, falling back to CPU otherwise.
*   **strict** – Passed to load_state_dict. By default True.
*   **return_config** – If set to true, will return just the underlying config of the restored model as an OmegaConf DictConfig object, without instantiating the model.
*   **trainer** – Optional, a PyTorch Lightning Trainer object that will be forwarded to the instantiated model’s constructor.
*   **save_restore_connector** ([_SaveRestoreConnector_](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector "nemo.core.connectors.save_restore_connector.SaveRestoreConnector")) – Can be overridden to add custom save and restore logic.
*   **Example** –

    ```python
    model = nemo.collections.asr.models.EncDecCTCModel.restore_from('asr.nemo')
    assert isinstance(model, nemo.collections.asr.models.EncDecCTCModel)
    ```

Returns:
An instance of type cls, or its underlying config (if return_config is set).

_classmethod_ load_from_checkpoint(_checkpoint\_path: str_, _*args_, _map\_location: Dict[str, str] | str | torch.device | int | Callable | None = None_, _hparams\_file: str | None = None_, _strict: bool = True_, _**kwargs_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.load_from_checkpoint "Link to this definition")

Loads ModelPT from a checkpoint, with some maintenance of restoration. For documentation, please refer to the LightningModule.load_from_checkpoint() documentation.

_abstract_ setup_training_data(_train\_data\_config: omegaconf.DictConfig | Dict_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_training_data "Link to this definition")

Sets up the data loader to be used in training.

Parameters:
**train_data_layer_config** – training data layer parameters.

_abstract_ setup_validation_data(_val\_data\_config: omegaconf.DictConfig | Dict_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_validation_data "Link to this definition")

Sets up the data loader to be used in validation.

Parameters:
**val_data_layer_config** – validation data layer parameters.

setup_test_data(_test\_data\_config: omegaconf.DictConfig | Dict_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_test_data "Link to this definition")

(Optionally) Sets up the data loader to be used in testing.

Parameters:
**test_data_layer_config** – test data layer parameters.

setup_multiple_validation_data(_val\_data\_config: omegaconf.DictConfig | Dict_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_multiple_validation_data "Link to this definition")

(Optionally) Sets up the data loader to be used in validation, with support for multiple data loaders.

Parameters:
**val_data_layer_config** – validation data layer parameters.

setup_multiple_test_data(_test\_data\_config: omegaconf.DictConfig | Dict_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_multiple_test_data "Link to this definition")

(Optionally) Sets up the data loader to be used in testing, with support for multiple data loaders.

Parameters:
**test_data_layer_config** – test data layer parameters.

setup_megatron_optimization(_optim\_config: Dict[str, Any] | omegaconf.DictConfig_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_megatron_optimization "Link to this definition")

Sets up the mcore optimizer config.

Parameters:
**optim_config** – NeMo optim args used to set up the Mcore optimizer options.

setup_optimization(_optim\_config: omegaconf.DictConfig | Dict | None = None_, _optim\_kwargs: Dict[str, Any] | None = None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_optimization "Link to this definition")

Prepares an optimizer from a string name and its optional config parameters.

Parameters:

*   **optim_config** – A dictionary containing the following keys:

    *   ”lr”: mandatory key for learning rate. Will raise ValueError if not provided.
    *   ”optimizer”: string name pointing to one of the available optimizers in the registry. If not provided, defaults to “adam”.
    *   ”opt_args”: Optional list of strings, in the format “arg_name=arg_value”. The list of “arg_value” will be parsed, and a dictionary of optimizer kwargs will be built and supplied to instantiate the optimizer.

*   **optim_kwargs** – A dictionary with additional kwargs for the optimizer. Used for non-primitive types that are not compatible with OmegaConf.

setup_optimizer_param_groups()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup_optimizer_param_groups "Link to this definition")

Used to create param groups for the optimizer. As an example, this can be used to specify per-layer learning rates:

```python
optim.SGD(
    [
        {'params': model.base.parameters()},
        {'params': model.classifier.parameters(), 'lr': 1e-3},
    ],
    lr=1e-2,
    momentum=0.9,
)
```

See [https://pytorch.org/docs/stable/optim.html](https://pytorch.org/docs/stable/optim.html) for more information. By default, ModelPT will use self.parameters(). Override this method to add custom param groups. In the config file, add ‘optim_param_groups’ to support different LRs for different components (unspecified params will use the default LR):

```yaml
model:
  optim_param_groups:
    encoder:
      lr: 1e-4
      momentum: 0.8
    decoder:
      lr: 1e-3
  optim:
    lr: 3e-3
    momentum: 0.9
```

configure_optimizers()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.configure_optimizers "Link to this definition")

Configure the optimizer and scheduler.

propagate_model_guid()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.propagate_model_guid "Link to this definition")

Propagates the model GUID to all submodules, recursively.

setup(_stage: str | None = None_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.setup "Link to this definition")

Called at the beginning of fit, validate, test, or predict. This is called on every process when using DDP.

Parameters:
**stage** – fit, validate, test or predict

train_dataloader()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.train_dataloader "Link to this definition")

Get the training dataloader.

val_dataloader()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.val_dataloader "Link to this definition")

Get the validation dataloader.

test_dataloader()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.test_dataloader "Link to this definition")

Get the test dataloader.

on_validation_epoch_end(_sync\_metrics: bool = False_,) → Dict[str, Dict[str, torch.Tensor]] | None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_validation_epoch_end "Link to this definition")

Default validation-epoch hook, which automatically supports multiple data loaders via multi_validation_epoch_end. If multi-dataset support is not required, override this method entirely in the base class. In such a case, there is no need to implement multi_validation_epoch_end either.

Note

If more than one data loader exists, and they all provide val_loss, only the val_loss of the first data loader will be used by default. This default can be changed by passing the special key val_dl_idx: int inside the validation_ds config.
Parameters:
**outputs** – Single or nested list of tensor outputs from one or more data loaders.

Returns:
A dictionary containing the union of all items from the individual data loaders, along with merged logs from all data loaders.

on_test_epoch_end() → Dict[str, Dict[str, torch.Tensor]] | None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_test_epoch_end "Link to this definition")

Default test-epoch hook, which automatically supports multiple data loaders via multi_test_epoch_end. If multi-dataset support is not required, override this method entirely in the base class. In such a case, there is no need to implement multi_test_epoch_end either.

Note

If more than one data loader exists, and they all provide test_loss, only the test_loss of the first data loader will be used by default. This default can be changed by passing the special key test_dl_idx: int inside the test_ds config.

Parameters:
**outputs** – Single or nested list of tensor outputs from one or more data loaders.

Returns:
A dictionary containing the union of all items from the individual data loaders, along with merged logs from all data loaders.

multi_validation_epoch_end(_outputs: List[Dict[str, torch.Tensor]]_, _dataloader\_idx: int = 0_,) → Dict[str, Dict[str, torch.Tensor]] | None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.multi_validation_epoch_end "Link to this definition")

Adds support for multiple validation datasets. Should be overridden by the subclass so as to obtain appropriate logs for each of the dataloaders.

Parameters:

*   **outputs** – Same as that provided by LightningModule.on_validation_epoch_end() for a single dataloader.
*   **dataloader_idx** – int representing the index of the dataloader.

Returns:
A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended by the dataloader prefix.

multi_test_epoch_end(_outputs: List[Dict[str, torch.Tensor]]_, _dataloader\_idx: int = 0_,) → Dict[str, Dict[str, torch.Tensor]] | None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.multi_test_epoch_end "Link to this definition")

Adds support for multiple test datasets. Should be overridden by the subclass so as to obtain appropriate logs for each of the dataloaders.

Parameters:

*   **outputs** – Same as that provided by LightningModule.on_validation_epoch_end() for a single dataloader.
*   **dataloader_idx** – int representing the index of the dataloader.

Returns:
A dictionary of values, optionally containing a sub-dict log, such that the values in the log will be prepended by the dataloader prefix.

get_validation_dataloader_prefix(_dataloader\_idx: int = 0_,) → str[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.get_validation_dataloader_prefix "Link to this definition")

Get the name of one or more data loaders, which will be prepended to all logs.

Parameters:
**dataloader_idx** – Index of the data loader.

Returns:
str name of the data loader at the index provided.

get_test_dataloader_prefix(_dataloader\_idx: int = 0_) → str[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.get_test_dataloader_prefix "Link to this definition")

Get the name of one or more data loaders, which will be prepended to all logs.

Parameters:
**dataloader_idx** – Index of the data loader.

Returns:
str name of the data loader at the index provided.
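As a sketch of how a subclass might use the multi-dataloader hooks above, the hypothetical override below averages a per-step `val_loss` for one dataloader and returns it under a `log` sub-dict; the class name, the data-setup stubs, and the metric key are illustrative only, not part of the documented API.

```python
# Hypothetical ModelPT subclass showing a multi_validation_epoch_end override.
# The data-setup stubs and the "val_loss" key are illustrative.
from typing import Dict, List

import torch
from nemo.core import ModelPT


class ToyMultiValidationModel(ModelPT):
    def setup_training_data(self, train_data_config):
        self._train_dl = None  # stub: a real model would build a DataLoader here

    def setup_validation_data(self, val_data_config):
        self._validation_dl = None  # stub

    def multi_validation_epoch_end(self, outputs: List[Dict[str, torch.Tensor]], dataloader_idx: int = 0):
        # `outputs` holds the per-step dicts collected for this dataloader.
        val_loss = torch.stack([step["val_loss"] for step in outputs]).mean()
        # Values under "log" are prefixed with the dataloader prefix for this index.
        return {"val_loss": val_loss, "log": {"val_loss": val_loss}}
```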
load_part_of_state_dict(_state\_dict_, _include_, _exclude_, _load\_from\_string=None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.load_part_of_state_dict "Link to this definition")

Load a part of the state dict into the model.

maybe_init_from_pretrained_checkpoint(_cfg: omegaconf.OmegaConf_, _map\_location: str = 'cpu'_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.maybe_init_from_pretrained_checkpoint "Link to this definition")

Initializes a given model with the parameters obtained via specific config arguments. The state dict of the provided model will be updated with the strict=False setting to avoid requiring an exact match of model parameters.

Initializations:

init_from_nemo_model: Str path to a .nemo model in order to load the state_dict from a single nemo file; if loading from multiple files, pass in a dict where the values have the following fields:

> path: Str path to the .nemo model
>
> include: Optional list of strings, at least one of which needs to be contained in a parameter name for it to be loaded from this .nemo file. Default: everything is included.
>
> exclude: Optional list of strings, which can be used to exclude any parameter containing one of these strings from being loaded from this .nemo file. Default: nothing is excluded.

hydra usage example:

```yaml
init_from_nemo_model:
  model0:
    path:
    include: ["encoder"]
  model1:
    path:
    include: ["decoder"]
    exclude: ["embed"]
```

init_from_pretrained_model: Str name of a pretrained model checkpoint (obtained via the cloud). The model will be downloaded (or a cached copy will be used), instantiated, and then its state dict will be extracted. If loading from multiple models, you can pass in a dict with the same format as for init_from_nemo_model, except with “name” instead of “path”.

init_from_ptl_ckpt: Str name of a PyTorch Lightning checkpoint file. It will be loaded and the state dict will be extracted. If loading from multiple files, you can pass in a dict with the same format as for init_from_nemo_model.

Parameters:

*   **cfg** – The config used to instantiate the model. It need only contain one of the above keys.
*   **map_location** – str or torch.device() which represents where the intermediate state dict (from the pretrained model or checkpoint) will be loaded.

_classmethod_ extract_state_dict_from(_restore\_path_, _save\_dir_, _split\_by\_module=False_, _save\_restore\_connector=None_,)

Extract the state dict(s) from a provided .nemo tarfile and save it to a directory.

Parameters:

*   **restore_path** – path to the .nemo file from which the state dict(s) should be extracted
*   **save_dir** – directory in which the saved state dict(s) should be stored
*   **split_by_module** – bool flag, which determines whether the output checkpoint should be for the entire Model, or for the individual modules that comprise the Model
*   **save_restore_connector** ([_SaveRestoreConnector_](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector "nemo.core.connectors.save_restore_connector.SaveRestoreConnector")) – Can be overridden to add custom save and restore logic.
Example

To convert the .nemo tarfile into a single Model-level PyTorch checkpoint:

```python
state_dict = nemo.collections.asr.models.EncDecCTCModel.extract_state_dict_from('asr.nemo', './asr_ckpts')
```

To restore a model from a Model-level checkpoint:

```python
model = nemo.collections.asr.models.EncDecCTCModel(cfg)  # or any other method of restoration
model.load_state_dict(torch.load("./asr_ckpts/model_weights.ckpt"))
```

To convert the .nemo tarfile into multiple Module-level PyTorch checkpoints:

```python
state_dict = nemo.collections.asr.models.EncDecCTCModel.extract_state_dict_from(
    'asr.nemo', './asr_ckpts', split_by_module=True
)
```

To restore a module from a Module-level checkpoint:

```python
model = nemo.collections.asr.models.EncDecCTCModel(cfg)  # or any other method of restoration

# load the individual components
model.preprocessor.load_state_dict(torch.load("./asr_ckpts/preprocessor.ckpt"))
model.encoder.load_state_dict(torch.load("./asr_ckpts/encoder.ckpt"))
model.decoder.load_state_dict(torch.load("./asr_ckpts/decoder.ckpt"))
```

Returns:
The state dict that was loaded from the original .nemo checkpoint

prepare_test(_trainer: lightning.pytorch.Trainer_) → bool[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.prepare_test "Link to this definition")

Helper method to check whether the model can safely be tested on a dataset after training (or loading a checkpoint).

```python
trainer = Trainer()
if model.prepare_test(trainer):
    trainer.test(model)
```

Returns:
bool which declares the model safe to test. Provides warnings if it has to return False, to guide the user.

set_trainer(_trainer: lightning.pytorch.Trainer_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.set_trainer "Link to this definition")

Set an instance of the Trainer object.

Parameters:
**trainer** – PyTorch Lightning Trainer object.

set_world_size(_trainer: lightning.pytorch.Trainer_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.set_world_size "Link to this definition")

Determines the world size from the PyTorch Lightning Trainer and then updates AppState.

Parameters:
**trainer** (_Trainer_) – PyTorch Lightning Trainer object

summarize(_max\_depth: int = 1_,) → lightning.pytorch.utilities.model_summary.ModelSummary[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.summarize "Link to this definition")

Summarize this LightningModule.

Parameters:
**max_depth** – The maximum depth of layer nesting that the summary will include. A value of 0 turns the layer summary off. Default: 1.

Returns:
The model summary object

_property_ num_weights[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.num_weights "Link to this definition")

Utility property that returns the total number of parameters of the Model.

trainer()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.trainer "Link to this definition")

Get the trainer object.

_property_ cfg[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.cfg "Link to this definition")

Property that holds the finalized internal config of the model.

Note

Changes to this config are not reflected in the state of the model. Please create a new model using an updated config to properly update the model.
_property_ hparams[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.hparams "Link to this definition")

Overwrite the default hparams property to return the latest model config. Without this change, the hparams property would return the old config if there was a direct change to self._cfg (e.g., in self.setup_optimization()) that was not done via self.cfg = new_cfg.

_property_ validation_step_outputs[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.validation_step_outputs "Link to this definition")

Cached outputs of validation_step. It can be a list of items (for a single data loader) or a list of lists (for multiple data loaders). A usage sketch is shown after the training hooks below.

Returns: List of outputs of validation_step.

_property_ test_step_outputs[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.test_step_outputs "Link to this definition")

Cached outputs of test_step. It can be a list of items (for a single data loader) or a list of lists (for multiple data loaders).

Returns: List of outputs of test_step.

_classmethod_ update_save_restore_connector(_save\_restore\_connector_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.update_save_restore_connector "Link to this definition")

Update the save_restore_connector for the model.

on_train_start()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_train_start "Link to this definition")

PyTorch Lightning hook: [https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-start](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-start) We use it here to copy the relevant config for dynamic freezing.

on_train_batch_start(_batch:Any_,_batch\_idx:int_,_unused:int=0_,)→int|None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_train_batch_start "Link to this definition")

PyTorch Lightning hook: [https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-batch-start](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-batch-start) We use it here to enable profiling and dynamic freezing.

on_train_batch_end(_outputs_,_batch:Any_,_batch\_idx:int_,_unused:int=0_,)→None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_train_batch_end "Link to this definition")

PyTorch Lightning hook: [https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-batch-end](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-batch-end) We use it here to enable nsys profiling.

on_train_end()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_train_end "Link to this definition")

PyTorch Lightning hook: [https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-end](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-train-end) We use it here to clean up the dynamic freezing config.
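As referenced above, the cached validation_step_outputs are typically filled by the model's own validation_step and consumed in on_validation_epoch_end. This is a hedged sketch of that common pattern, not code from the NeMo source; `compute_loss` is a hypothetical helper and the averaging is only illustrative:

```python
import torch

from nemo.core import ModelPT


class MyModel(ModelPT):
    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper defined by the model
        self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        # Single data loader: validation_step_outputs is a flat list of tensors.
        avg_loss = torch.stack(self.validation_step_outputs).mean()
        self.log("val_loss", avg_loss)
        self.validation_step_outputs.clear()  # free the cache for the next epoch
```

The same pattern applies to test_step and test_step_outputs.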
on_test_end()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_test_end "Link to this definition")

PyTorch Lightning hook: [https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-test-end](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-test-end)

on_predict_end()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT.on_predict_end "Link to this definition")

PyTorch Lightning hook: [https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-test-end](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#on-test-end)

Base Neural Module class[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#base-neural-module-class "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.NeuralModule(_*args:Any_, _**kwargs:Any_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.NeuralModule "Link to this definition")

Bases: `Module`, `Typing`, `Serialization`, `FileIO`

Abstract class offering an interface shared between all PyTorch Neural Modules.

_property_ num_weights[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.NeuralModule.num_weights "Link to this definition")

Utility property that returns the total number of parameters of the NeuralModule.

input_example(_max\_batch=None_, _max\_dim=None_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.NeuralModule.input_example "Link to this definition")

Override this method if random inputs won't work.

Returns: A tuple sample of valid input data.

freeze()→None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.NeuralModule.freeze "Link to this definition")

Freeze all params for inference. This method sets requires_grad to False for all parameters of the module. It also stores the original requires_grad state of each parameter in a dictionary, so that unfreeze() can restore the original state if partial=True is set in unfreeze().

unfreeze(_partial:bool=False_)→None[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.NeuralModule.unfreeze "Link to this definition")

Unfreeze all parameters for training. Allows for either a total unfreeze or a partial unfreeze (if the module was explicitly frozen previously with freeze()). The partial argument determines whether to unfreeze all parameters or only the parameters that were unfrozen prior to the call to freeze().

Example

Consider a model that has an encoder and a decoder module. Assume we want the encoder to be frozen always.

```python
model.encoder.freeze()  # Freezes all parameters in the encoder explicitly
```

During inference, all parameters of the model should be frozen - we do this by calling the model's freeze method. This step records that the encoder module parameters were already frozen, and so if a partial unfreeze is called, we should keep the encoder parameters frozen.

```python
model.freeze()  # Freezes all parameters in the model; encoder remains frozen
```

Now, during fine-tuning, we want to unfreeze the decoder but keep the encoder frozen.
We can do this by calling unfreeze(partial=True).

```python
model.unfreeze(partial=True)  # Unfreezes only the decoder; encoder remains frozen
```

Parameters: **partial** – If True, only unfreeze parameters that were unfrozen before freeze() was called. If a parameter was already frozen when freeze() was called, it will remain frozen after calling unfreeze(partial=True).

as_frozen()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.NeuralModule.as_frozen "Link to this definition")

Context manager which temporarily freezes a module, yields control, and finally unfreezes the module partially to return it to its original state. Allows for either a total unfreeze or a partial unfreeze (if the module was explicitly frozen previously with freeze()). The partial argument determines whether to unfreeze all parameters or only the parameters that were unfrozen prior to the call to freeze().

Example

    with model.as_frozen():  # by default, partial = True
        # Do something with the model
        pass

    # Model's parameters are now back to the original state of requires_grad

Base Mixin classes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#base-mixin-classes "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.Typing

Bases: `ABC`

An interface which endows a module with neural types.

_property_ input_types _:Dict[str,[NeuralType](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural\_types.NeuralType "nemo.core.neural\_types.neural\_type.NeuralType")]|None_

Define these to enable input neural type checks

_property_ output_types _:Dict[str,[NeuralType](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural\_types.NeuralType "nemo.core.neural\_types.neural\_type.NeuralType")]|None_

Define these to enable output neural type checks

_validate_input_types(_input\_types=None_,_ignore\_collections=False_,_**kwargs_,)

This function does a few things.

1. It ensures that len(self.input_types) (counting non-optional types only) <= len(kwargs) <= len(self.input_types).
2. For each (keyword name, keyword value) passed as input to the wrapped function:
   * Check if the keyword name exists in the list of valid self.input_types names.
   * Check if the keyword value has the neural_type property.
     * If it does, then perform a comparative check and assert that the neural types are compatible (SAME or GREATER).
   * Check if the keyword value is a container type (list or tuple). If yes, then perform the elementwise neural type test above on each element of the nested structure, recursively.

Parameters:

* **input_types** – Either the input_types defined at class level, or the local function overridden type definition.
* **ignore_collections** – For backward compatibility, container support can be disabled explicitly using this flag. When set to True, all nesting is ignored and nest-depth checks are skipped.
* **kwargs** – Dictionary of argument_name:argument_value pairs passed to the wrapped function upon call.

_attach_and_validate_output_types(_out\_objects_,_ignore\_collections=False_,_output\_types=None_,)

This function does a few things.

1. It ensures that len(out_objects) == len(self.output_types).
2. If the output is a tensor (or list/tuple of list/tuple ... of tensors), it attaches a neural_type to it.
For objects without the neural_type attribute, such as Python objects (dictionaries and lists, primitive data types, structs), no neural_type is attached.

Note: tensor.neural_type is only checked during _validate_input_types, which is called prior to forward().

Parameters:

* **output_types** – Either the output_types defined at class level, or the local function overridden type definition.
* **ignore_collections** – For backward compatibility, container support can be disabled explicitly using this flag. When set to True, all nesting is ignored and nest-depth checks are skipped.
* **out_objects** – The outputs of the wrapped function.

__check_neural_type(_obj_,_metadata:TypecheckMetadata_,_depth:int_,_name:str|None=None_,)

Recursively tests whether the obj satisfies the semantic neural type assertion. Can include shape checks if shape information is provided.

Parameters:

* **obj** – Any python object that can be assigned a value.
* **metadata** – TypecheckMetadata object.
* **depth** – Current depth of recursion.
* **name** – Optional name of the source obj, used when an error occurs.

__attach_neural_type(_obj_,_metadata:TypecheckMetadata_,_depth:int_,_name:str|None=None_,)

Recursively attach neural types to a given object - as long as it can be assigned some value.

Parameters:

* **obj** – Any python object that can be assigned a value.
* **metadata** – TypecheckMetadata object.
* **depth** – Current depth of recursion.
* **name** – Optional name of the source obj, used when an error occurs.

* * *

_class_ nemo.core.Serialization

Bases: `ABC`

_classmethod_ from_config_dict(_config:DictConfig_,_trainer:'Trainer'|None=None_,)

Instantiates an object using a DictConfig-based configuration

to_config_dict()→omegaconf.DictConfig

Returns the object's configuration as a config dictionary

* * *

_class_ nemo.core.FileIO

Bases: `ABC`

save_to(_save\_path:str_)

Standardized method to save a tarfile containing the checkpoint, config, and any additional artifacts. Implemented via [`nemo.core.connectors.save_restore_connector.SaveRestoreConnector.save_to()`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.save_to "nemo.core.connectors.save_restore_connector.SaveRestoreConnector.save_to").

Parameters: **save_path** – str, path to where the file should be saved.

_classmethod_ restore_from(_restore\_path:str_,_override\_config\_path:str|None=None_,_map\_location:'torch.device'|None=None_,_strict:bool=True_,_return\_config:bool=False_,_trainer:'Trainer'|None=None_,_save\_restore\_connector:[SaveRestoreConnector](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save\_restore\_connector.SaveRestoreConnector "nemo.core.connectors.save\_restore\_connector.SaveRestoreConnector")=None_,)

Restores a model instance (weights and configuration) from a .nemo file

Parameters:

* **restore_path** – path to the .nemo file from which the model should be instantiated
* **override_config_path** – path to a yaml config that will override the internal config file, or an OmegaConf / DictConfig object representing the model config.
* **map_location** – Optional torch.device() to map the instantiated model to a device. By default (None), it will select a GPU if available, falling back to CPU otherwise.
* **strict** – Passed to load_state_dict.
By default True.
* **return_config** – If set to true, will return just the underlying config of the restored model as an OmegaConf DictConfig object without instantiating the model.
* **trainer** – An optional Trainer object, passed to the model constructor.
* **save_restore_connector** – An optional SaveRestoreConnector object that defines the implementation of the restore_from() method.

_classmethod_ from_config_file(_path2yaml\_file:str_)

Instantiates an instance of a NeMo Model from a YAML config file. Weights will be initialized randomly.

Parameters: **path2yaml_file** – path to the yaml file with the model configuration

to_config_file(_path2yaml\_file:str_)

Saves the current instance's configuration to a YAML config file. Weights will not be saved.

Parameters: **path2yaml_file** – path to the yaml file where the model configuration will be saved

Base Connector classes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#base-connector-classes "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.connectors.save_restore_connector.SaveRestoreConnector[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector "Link to this definition")

Bases: `object`

Connector for saving and restoring models.

save_to(_model:[ModelPT](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.ModelPT "nemo.core.classes.modelPT.ModelPT")_,_save\_path:str_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.save_to "Link to this definition")

Saves a model instance (weights and configuration) into a .nemo file. You can use the "restore_from" method to fully restore the instance from the .nemo file.

A .nemo file is an archive (tar.gz) with the following:

> model_config.yaml - model configuration in .yaml format. You can deserialize this into the cfg argument for the model's constructor.
>
> model_weights.ckpt - model checkpoint.

Parameters:

* **model** – ModelPT object to be saved.
* **save_path** – Path to the .nemo file where the model instance should be saved

Returns: Path to the .nemo file where the model instance was saved (same as the save_path argument), or None if not rank 0. The path can be a directory if the flag pack_nemo_file is set to False.

Return type: str

load_config_and_state_dict(_calling\_cls_,_restore\_path:str_,_override\_config\_path:omegaconf.OmegaConf|str|None=None_,_map\_location:torch.device|None=None_,_strict:bool=True_,_return\_config:bool=False_,_trainer:lightning.pytorch.trainer.trainer.Trainer|None=None_,_validate\_access\_integrity:bool=True_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.load_config_and_state_dict "Link to this definition")

Restores a model instance (weights and configuration) from a .nemo file

Parameters:

* **restore_path** – path to the .nemo file from which the model should be instantiated
* **override_config_path** – path to a yaml config that will override the internal config file, or an OmegaConf / DictConfig object representing the model config.
* **map_location** – Optional torch.device() to map the instantiated model to a device.
By default (None), it will select a GPU if available, falling back to CPU otherwise.
* **strict** – Passed to load_state_dict. By default True.
* **return_config** – If set to true, will return just the underlying config of the restored model as an OmegaConf DictConfig object without instantiating the model.

Example

```python
model = nemo.collections.asr.models.EncDecCTCModel.restore_from('asr.nemo')
assert isinstance(model, nemo.collections.asr.models.EncDecCTCModel)
```

Returns: An instance of type cls or its underlying config (if return_config is set).

modify_state_dict(_conf_, _state\_dict_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.modify_state_dict "Link to this definition")

Utility method that allows modification of the state dict before loading parameters into a model.

Parameters:

* **conf** – A model level OmegaConf object.
* **state_dict** – The state dict restored from the checkpoint.

Returns: A potentially modified state dict.

load_instance_with_state_dict(_instance_,_state\_dict_,_strict_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.load_instance_with_state_dict "Link to this definition")

Utility method that loads a model instance with the (potentially modified) state dict.

Parameters:

* **instance** – ModelPT subclass instance.
* **state_dict** – The state dict (which may have been modified).
* **strict** – Bool, whether to perform strict checks when loading the state dict.

restore_from(_calling\_cls_,_restore\_path:str_,_override\_config\_path:omegaconf.OmegaConf|str|None=None_,_map\_location:torch.device|None=None_,_strict:bool=True_,_return\_config:bool=False_,_trainer:lightning.pytorch.trainer.trainer.Trainer|None=None_,_validate\_access\_integrity:bool=True_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.restore_from "Link to this definition")

Restores a model instance (weights and configuration) from a .nemo file

Parameters:

* **restore_path** – path to the .nemo file from which the model should be instantiated
* **override_config_path** – path to a yaml config that will override the internal config file, or an OmegaConf / DictConfig object representing the model config.
* **map_location** – Optional torch.device() to map the instantiated model to a device. By default (None), it will select a GPU if available, falling back to CPU otherwise.
* **strict** – Passed to load_state_dict. By default True.
* **return_config** – If set to true, will return just the underlying config of the restored model as an OmegaConf DictConfig object without instantiating the model.
* **trainer** – An optional Trainer object, passed to the model constructor.

Example

```python
model = nemo.collections.asr.models.EncDecCTCModel.restore_from('asr.nemo')
assert isinstance(model, nemo.collections.asr.models.EncDecCTCModel)
```

Returns: An instance of type cls or its underlying config (if return_config is set).

extract_state_dict_from(_restore\_path_,_save\_dir_,_split\_by\_module=False_,)

Extract the state dict(s) from a provided .nemo tarfile and save it to a directory.
Parameters:

* **restore_path** – path to the .nemo file from which state dict(s) should be extracted
* **save_dir** – directory in which the saved state dict(s) should be stored
* **split_by_module** – bool flag, which determines whether the output checkpoint should be for the entire Model, or the individual modules that comprise the Model

Example

To convert the .nemo tarfile into a single Model level PyTorch checkpoint:

    state_dict = nemo.collections.asr.models.EncDecCTCModel.extract_state_dict_from('asr.nemo', './asr_ckpts')

To restore a model from a Model level checkpoint:

    model = nemo.collections.asr.models.EncDecCTCModel(cfg)  # or any other method of restoration
    model.load_state_dict(torch.load('./asr_ckpts/model_weights.ckpt'))

To convert the .nemo tarfile into multiple Module level PyTorch checkpoints:

    state_dict = nemo.collections.asr.models.EncDecCTCModel.extract_state_dict_from('asr.nemo', './asr_ckpts', split_by_module=True)

To restore a module from a Module level checkpoint:

    model = nemo.collections.asr.models.EncDecCTCModel(cfg)  # or any other method of restoration
    # load the individual components
    model.preprocessor.load_state_dict(torch.load('./asr_ckpts/preprocessor.ckpt'))
    model.encoder.load_state_dict(torch.load('./asr_ckpts/encoder.ckpt'))
    model.decoder.load_state_dict(torch.load('./asr_ckpts/decoder.ckpt'))

Returns: The state dict that was loaded from the original .nemo checkpoint

register_artifact(_model_,_config\_path:str_,_src:str_,_verify\_src\_exists:bool=True_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.register_artifact "Link to this definition")

Register model artifacts with this function. These artifacts (files) will be included inside the .nemo file when model.save_to("mymodel.nemo") is called.

How it works:

1. It always returns an existing absolute path which can be used during the Model constructor call. EXCEPTION: src is None or "", in which case nothing will be done and src will be returned.
2. It will add a (config_path, model_utils.ArtifactItem()) pair to self.artifacts:

> If "src" is a local existing path:
>     it will be returned in absolute path form.
> elif "src" starts with "nemo_file:unique_artifact_name":
>     the .nemo file will be untarred to a temporary folder and an actual existing path will be returned.
> else:
>     an error will be raised.

WARNING: use .register_artifact calls in your models' constructors. The returned path is not guaranteed to exist after you have exited your model's constructor. (A usage sketch follows the property definitions below.)

Parameters:

* **model** – ModelPT object to register the artifact for.
* **config_path** (_str_) – Artifact key. Usually corresponds to the model config.
* **src** (_str_) – Path to the artifact.
* **verify_src_exists** (_bool_) – If set to False, then the artifact is optional and register_artifact will return None even if src is not found. Defaults to True.

Returns: If src is not None or empty, it always returns an absolute path which is guaranteed to exist during the model instance's life.

Return type: str

_property_ model_config_yaml _:str_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.model_config_yaml "Link to this definition")

Get the path to the model config yaml file.
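As a hedged sketch of the register_artifact workflow referenced above, assuming the usual ModelPT-level register_artifact wrapper that delegates to this connector; the config key and tokenizer path are placeholders, not real NeMo config fields:

```python
from nemo.core import ModelPT


class MyTokenizedModel(ModelPT):
    """Sketch only: shows where register_artifact is typically called."""

    def __init__(self, cfg, trainer=None):
        super().__init__(cfg=cfg, trainer=trainer)
        # "tokenizer.model_path" (the config key) and cfg.tokenizer.model_path
        # (the local file) are illustrative placeholders. The returned absolute
        # path is valid for the life of this instance, and the file is packed
        # into the archive when save_to("mymodel.nemo") is called.
        self._tokenizer_path = self.register_artifact(
            "tokenizer.model_path",
            cfg.tokenizer.model_path,
            verify_src_exists=True,
        )
```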
_property_ model_weights_ckpt _:str_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.model_weights_ckpt "Link to this definition")

Get the path to the model weights checkpoint file.

_property_ model_extracted_dir

Get the path to the model extracted directory.

_property_ pack_nemo_file _:bool_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.connectors.save_restore_connector.SaveRestoreConnector.pack_nemo_file "Link to this definition")

Get the flag for packing a nemo file.

Base Mixin Classes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#id1 "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.classes.mixins.access_mixins.AccessMixin[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.access_mixins.AccessMixin "Link to this definition")

Bases: `ABC`

Allows access to the output of intermediate layers of a model.

register_accessible_tensor(_name_, _tensor_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.access_mixins.AccessMixin.register_accessible_tensor "Link to this definition")

Register a tensor for later use.

_classmethod_ get_module_registry(_module:torch.nn.Module_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.access_mixins.AccessMixin.get_module_registry "Link to this definition")

Extract all registries from named submodules and return a dictionary where the keys are the flattened module names and the values are the internal registry of each such module.

reset_registry(_registry\_key:str|None=None_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.access_mixins.AccessMixin.reset_registry "Link to this definition")

Reset the registries of all named sub-modules

_property_ access_cfg[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.access_mixins.AccessMixin.access_cfg "Link to this definition")

Returns: The global access config shared across all access mixin modules.

* * *

_class_ nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO "Link to this definition")

Bases: `ABC`

Mixin that provides Hugging Face file IO functionality for NeMo models. It is usually implemented as a mixin to ModelPT.

This mixin provides the following functionality:
- search_huggingface_models(): Search the hub programmatically via some model filter.
- push_to_hf_hub(): Push a model to the hub.

_classmethod_ get_hf_model_filter()→Dict[str,Any][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO.get_hf_model_filter "Link to this definition")

Generates a filter for HuggingFace models. Additionally includes default values of some metadata about results returned by the Hub.

Metadata:

resolve_card_info: Bool flag, if set, returns the model card metadata. Default: False.

limit_results: Optional int, limits the number of results returned.
Returns: A dict representing the arguments passable to huggingface list_models().

_classmethod_ search_huggingface_models(_model\_filter:Dict[str,Any]|None=None_,)→Iterable[huggingface_hub.hf_api.ModelInfo][#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO.search_huggingface_models "Link to this definition")

Should list all pre-trained models available via Hugging Face Hub.

The following metadata can be passed via the model_filter for additional results.

Metadata:

> resolve_card_info: Bool flag, if set, returns the model card metadata. Default: False.
>
> limit_results: Optional int, limits the number of results returned.

    # You can replace ModelPT with any subclass of ModelPT.
    from nemo.core import ModelPT

    # Get default filter dict
    filt = ModelPT.get_hf_model_filter()

    # Make any modifications to the filter as necessary
    filt['language'] = [...]
    filt['task'] = ...
    filt['tags'] = [...]

    # Add any metadata to the filter as needed (kwargs to list_models)
    filt['limit'] = 5

    # Obtain model info
    model_infos = ModelPT.search_huggingface_models(model_filter=filt)

    # Browse through cards and select an appropriate one
    card = model_infos[0]

    # Restore model using `modelId` of the card.
    model = ModelPT.from_pretrained(card.modelId)

Parameters: **model_filter** – Optional Dictionary (for Hugging Face Hub kwargs) that filters the returned list of compatible model cards, and selects all results from each filter. Users can then use model_card.modelId in from_pretrained() to restore a NeMo Model.

Returns: A list of ModelInfo entries.

push_to_hf_hub(_repo\_id:str_,_*_,_pack\_nemo\_file:bool=True_,_model\_card:huggingface\_hub.ModelCard|None|object|str=None_,_commit\_message:str='Push model using huggingface\_hub.'_,_private:bool=False_,_api\_endpoint:str|None=None_,_token:str|None=None_,_branch:str|None=None_,_allow\_patterns:List[str]|str|None=None_,_ignore\_patterns:List[str]|str|None=None_,_delete\_patterns:List[str]|str|None=None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO.push_to_hf_hub "Link to this definition")

Upload the model checkpoint to the Hub.

Use allow_patterns and ignore_patterns to precisely filter which files should be pushed to the hub. Use delete_patterns to delete existing remote files in the same commit. See the upload_folder reference for more details.

Parameters:

* **repo_id** (str) – ID of the repository to push to (example: "username/my-model").
* **pack_nemo_file** (bool, _optional_, defaults to True) – Whether to pack the model checkpoint and configuration into a single .nemo file. If set to false, uploads the contents of the directory containing the model checkpoint and configuration plus additional artifacts.
* **model_card** (ModelCard, _optional_) – Model card to upload with the model. If None, will use the model card template provided by the class itself via generate_model_card(). Any object that implements str(obj) can be passed here. Two keyword replacements are passed to generate_model_card(): model_name and repo_id. If the model card generates a string, and it contains {model_name} or {repo_id}, they will be replaced with the actual values.
* **commit_message** (str, _optional_) – Message to commit while pushing.
* **private** (bool, _optional_, defaults to False) – Whether the repository created should be private.
* **api_endpoint** (str, _optional_) – The API endpoint to use when pushing the model to the hub.
* **token** (str, _optional_) – The token to use as HTTP bearer authorization for remote files. By default, it will use the token cached when running huggingface-cli login.
* **branch** (str, _optional_) – The git branch on which to push the model. This defaults to "main".
* **allow_patterns** (List[str] or str, _optional_) – If provided, only files matching at least one pattern are pushed.
* **ignore_patterns** (List[str] or str, _optional_) – If provided, files matching any of the patterns are not pushed.
* **delete_patterns** (List[str] or str, _optional_) – If provided, remote files matching any of the patterns will be deleted from the repo.

Returns: The URL of the uploaded HF repo.

Neural Type checking[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#neural-type-checking "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.classes.common.typecheck(_input\_types:[TypeState](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.TypeState "nemo.core.classes.common.typecheck.TypeState")|Dict[str,[NeuralType](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural\_types.NeuralType "nemo.core.neural\_types.NeuralType")]=TypeState.UNINITIALIZED_,_output\_types:[TypeState](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.TypeState "nemo.core.classes.common.typecheck.TypeState")|Dict[str,[NeuralType](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural\_types.NeuralType "nemo.core.neural\_types.NeuralType")]=TypeState.UNINITIALIZED_,_ignore\_collections:bool=False_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck "Link to this definition")

Bases: `object`

A decorator which performs input-output neural type checks, and attaches neural types to the output of the function that it wraps.

Requires that the class inherit from `Typing` in order to perform type checking, and will raise an error if that is not the case.

    # Usage (Class level type support)
    @typecheck()
    def fn(self, arg1, arg2, ...):
        ...

    # Usage (Function level type support)
    @typecheck(input_types=..., output_types=...)
    def fn(self, arg1, arg2, ...):
        ...

Points to be noted:

1. The brackets () in @typecheck() are necessary. Without those brackets you will encounter TypeError: __init__() takes 1 positional argument but X were given.
2. The function can take any number of positional arguments during definition. When you call this function, all arguments must be passed using kwargs only.

__call__(_wrapped_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.__call__ "Link to this definition")

Call self as a function.

_class_ TypeState(_value_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.TypeState "Link to this definition")

Bases: `Enum`

Placeholder to denote the default value of type information provided.
If the constructor of this decorator is used to override the class level type definition, this enum value indicates that types will be overridden.

wrapped_call(_wrapped_,_instance:Typing_,_args_,_kwargs_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.wrapped_call "Link to this definition")

Wrapper method that can be used on any function of a class that implements `Typing`. By default, it will utilize the input_types and output_types properties of the class inheriting Typing. Local function level overrides can be provided by supplying dictionaries as arguments to the decorator.

Parameters:

* **input_types** – Union[TypeState, Dict[str, NeuralType]]. By default, uses the global input_types.
* **output_types** – Union[TypeState, Dict[str, NeuralType]]. By default, uses the global output_types.
* **ignore_collections** – Bool. Determines if container types should be asserted for depth checks, or if depth checks are skipped entirely.

_static_ set_typecheck_enabled(_enabled:bool=True_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.set_typecheck_enabled "Link to this definition")

Global method to enable/disable typechecking.

Parameters: **enabled** – bool, when True will enable typechecking.

_static_ disable_checks()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.disable_checks "Link to this definition")

Context manager that temporarily disables type checking within its context.

_static_ set_semantic_check_enabled(_enabled:bool=True_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.set_semantic_check_enabled "Link to this definition")

Global method to enable/disable semantic typechecking.

Parameters: **enabled** – bool, when True will enable semantic typechecking.

_static_ disable_semantic_checks()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.common.typecheck.disable_semantic_checks "Link to this definition")

Context manager that temporarily disables semantic type checking within its context.

Neural Type classes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#neural-type-classes "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.neural_types.NeuralType(_axes:Any|None=None_,_elements\_type:Any|None=None_,_optional:bool=False_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.NeuralType "Link to this definition")

Bases: `object`

This is the main class which represents the neural type concept. It is used to represent _the types_ of inputs and outputs.

Parameters:

* **axes** (_Optional_ _[_ _Tuple_ _]_) – a tuple of AxisTypes objects representing the semantics of what varying each axis means. You can use a short, string-based form here. For example: ('B', 'C', 'H', 'W') would correspond to an NCHW format frequently used in computer vision. ('B', 'T', 'D') is frequently used for signal processing and means [batch, time, dimension/channel].
* **elements_type** ([_ElementType_](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.elements.ElementType "nemo.core.neural_types.elements.ElementType")) – an instance of the ElementType class representing the semantics of what is stored inside the tensor. For example: logits (LogitsType), log probabilities (LogprobType), etc.
* **optional** (_bool_) – By default, this is false. If set to True, it means that the input to the port of this type can be optional.

compare(_second_,)→[NeuralTypeComparisonResult](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.comparison.NeuralTypeComparisonResult "nemo.core.neural_types.comparison.NeuralTypeComparisonResult")[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.NeuralType.compare "Link to this definition")

Performs neural type comparison of self with second. When you chain two modules' inputs/outputs via the __call__ method, this comparison will be called to ensure neural type compatibility.

compare_and_raise_error(_parent\_type\_name_,_port\_name_,_second\_object_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.NeuralType.compare_and_raise_error "Link to this definition")

Method compares the definition of one type with another and raises an error if they are not compatible.

* * *

_class_ nemo.core.neural_types.axes.AxisType(_kind:AxisKindAbstract_,_size:int|None=None_,_is\_list=False_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.axes.AxisType "Link to this definition")

Bases: `object`

This class represents axis semantics and (optionally) its dimensionality.

Parameters:

* **kind** (_AxisKindAbstract_) – what kind of axis it is? For example Batch, Height, etc.
* **size** (_int_ _,_ _optional_) – specify if the axis should have a fixed size. By default it is set to None, and you typically do not want to set it for Batch and Time.
* **is_list** (_bool_ _,_ _default=False_) – whether this is a list or a tensor axis

* * *

_class_ nemo.core.neural_types.elements.ElementType[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.elements.ElementType "Link to this definition")

Bases: `ABC`

Abstract class defining semantics of the tensor elements. We are relying on Python for inheritance checking.

_property_ type_parameters _:Dict[str,Any]_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.elements.ElementType.type_parameters "Link to this definition")

Override this property to parametrize your type. For example, you can specify a 'storage' type such as float, int, bool with the 'dtype' keyword. Another example is if you want to represent a signal with a particular property (say, sample frequency), then you can put sample_freq->value in there. When two types are compared, their type_parameters must match.

_property_ fields[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.elements.ElementType.fields "Link to this definition")

This should be used to logically represent tuples/structures. For example, if you want to represent a bounding box (x, y, width, height) you can put a tuple with names ('x', 'y', 'w', 'h') in here.
Under the hood this should be converted to the last tensor dimension of fixed size = len(fields). When two types are compared, their fields must match.

* * *

_class_ nemo.core.neural_types.comparison.NeuralTypeComparisonResult(_value_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.neural_types.comparison.NeuralTypeComparisonResult "Link to this definition")

Bases: `Enum`

The result of comparing two neural type objects for compatibility. When comparing A.compare_to(B):

Experiment manager[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#experiment-manager "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.utils.exp_manager.exp_manager(_trainer:lightning.pytorch.Trainer_,_cfg:omegaconf.DictConfig|Dict|None=None_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.utils.exp_manager.exp_manager "Link to this definition")

exp_manager is a helper function used to manage folders for experiments. It follows the PyTorch Lightning paradigm of exp_dir/model_or_experiment_name/version. If the lightning trainer has a logger, exp_manager will get exp_dir, name, and version from the logger. Otherwise it will use the exp_dir and name arguments to create the logging directory. exp_manager also allows for explicit folder creation via explicit_log_dir.

The version can be a datetime string or an integer. The datetime version can be disabled if use_datetime_version is set to False. It optionally creates TensorBoardLogger, WandBLogger, DLLogger, MLFlowLogger, ClearMLLogger, and ModelCheckpoint objects from PyTorch Lightning. It copies sys.argv, and git information if available, to the logging directory. It creates a log file for each process to log their output into.

exp_manager additionally has a resume feature (resume_if_exists) which can be used to continue training from the constructed log_dir. When you need to continue the training repeatedly (like on a cluster where you need multiple consecutive jobs), you need to avoid creating the version folders. Therefore, from v1.0.0, when resume_if_exists is set to True, creating the version folders is skipped.

Parameters:

* **trainer** (_lightning.pytorch.Trainer_) – The lightning trainer.
* **cfg** (_DictConfig_ _,_ _dict_) – Can have the following keys:
  * explicit_log_dir (str, Path): Can be used to override exp_dir/name/version folder creation. Defaults to None, which will use exp_dir, name, and version to construct the logging directory.
  * exp_dir (str, Path): The base directory to create the logging directory. Defaults to None, which logs to ./nemo_experiments.
  * name (str): The name of the experiment. Defaults to None, which turns into "default" via name = name or "default".
  * version (str): The version of the experiment. Defaults to None, which uses either a datetime string or lightning's TensorboardLogger system of using version_{int}.
  * use_datetime_version (bool): Whether to use a datetime string for version. Defaults to True.
  * resume_if_exists (bool): Whether this experiment is resuming from a previous run. If True, it sets trainer._checkpoint_connector._ckpt_path so that the trainer should auto-resume. exp_manager will move files under log_dir to log_dir/run_{int}. Defaults to False.
From v1.0.0, when resume_if_exists is True, we do not create version folders, to make it easier to find the log folder for subsequent runs.
  * resume_past_end (bool): exp_manager errors out if resume_if_exists is True and a checkpoint matching `*end.ckpt` exists, indicating a previous training run fully completed. This behaviour can be disabled, in which case the `*end.ckpt` will be loaded, by setting resume_past_end to True. Defaults to False.
  * resume_ignore_no_checkpoint (bool): exp_manager errors out if resume_if_exists is True and no checkpoint could be found. This behaviour can be disabled, in which case exp_manager will print a message and continue without restoring, by setting resume_ignore_no_checkpoint to True. Defaults to False.
  * resume_from_checkpoint (str): Can be used to specify a path to a specific checkpoint file to load from. This will override any checkpoint found when resume_if_exists is True. Defaults to None.
  * create_tensorboard_logger (bool): Whether to create a TensorBoard logger and attach it to the PyTorch Lightning trainer. Defaults to True.
  * summary_writer_kwargs (dict): A dictionary of kwargs that can be passed to lightning's TensorboardLogger class. Note that log_dir is passed by exp_manager and cannot exist in this dict. Defaults to None.
  * create_wandb_logger (bool): Whether to create a Weights and Biases logger and attach it to the PyTorch Lightning trainer. Defaults to False.
  * wandb_logger_kwargs (dict): A dictionary of kwargs that can be passed to lightning's WandBLogger class. Note that name and project are required parameters if create_wandb_logger is True. Defaults to None.
  * create_mlflow_logger (bool): Whether to create an MLFlow logger and attach it to the PyTorch Lightning trainer. Defaults to False.
  * mlflow_logger_kwargs (dict): optional parameters for the MLFlow logger.
  * create_dllogger_logger (bool): Whether to create a DLLogger logger and attach it to the PyTorch Lightning trainer. Defaults to False.
  * dllogger_logger_kwargs (dict): optional parameters for the DLLogger logger.
  * create_clearml_logger (bool): Whether to create a ClearML logger and attach it to the PyTorch Lightning trainer. Defaults to False.
  * clearml_logger_kwargs (dict): optional parameters for the ClearML logger.
  * create_checkpoint_callback (bool): Whether to create a ModelCheckpoint callback and attach it to the PyTorch Lightning trainer. The ModelCheckpoint saves the top 3 models with the best "val_loss", the most recent checkpoint under `*last.ckpt`, and the final checkpoint after training completes under `*end.ckpt`. Defaults to True.
  * create_early_stopping_callback (bool): Flag to decide if early stopping should be used to stop training. Default is False. See the EarlyStoppingParams dataclass above.
  * create_preemption_callback (bool): Flag to decide whether to enable the preemption callback to save checkpoints and exit training immediately upon preemption. Default is True.
  * create_straggler_detection_callback (bool): Use the straggler detection callback. Default is False.
  * create_fault_tolerance_callback (bool): Use the fault tolerance callback. Default is False.
  * files_to_copy (list): A list of files to copy to the experiment logging directory. Defaults to None, which copies no files.
  * log_local_rank_0_only (bool): Whether to only create log files for local rank 0. Defaults to False. Set this to True if you are using DDP with many GPUs and do not want many log files in your exp dir.
  * log_global_rank_0_only (bool): Whether to only create log files for global rank 0. Defaults to False.
Set this to True if you are using DDP with many GPUs and do not want many log files in your exp dir.
  * max_time (str): The maximum wall clock time _per run_. This is intended to be used on clusters where you want a checkpoint to be saved after this specified time and to be able to resume from that checkpoint. Defaults to None.
  * seconds_to_sleep (float): seconds to sleep for non-rank-0 processes. Used to give rank 0 enough time to initialize.
  * train_time_interval (timedelta): pass a timedelta object to save the model every timedelta. Defaults to None. (use _target_ with hydra to achieve this)

Returns: The final logging directory where logging files are saved. Usually the concatenation of exp_dir, name, and version.

Return type: log_dir (Path)

_class_ nemo.utils.exp_manager.ExpManagerConfig(_explicit\_log\_dir:str|None=None_,_exp\_dir:str|None=None_,_name:str|None=None_,_version:str|None=None_,_use\_datetime\_version:bool|None=True_,_resume\_if\_exists:bool|None=False_,_resume\_past\_end:bool|None=False_,_resume\_ignore\_no\_checkpoint:bool|None=False_,_resume\_from\_checkpoint:str|None=None_,_create\_tensorboard\_logger:bool|None=True_,_summary\_writer\_kwargs:~typing.Dict[~typing.Any_,_~typing.Any]|None=None_,_create\_wandb\_logger:bool|None=False_,_wandb\_logger\_kwargs:~typing.Dict[~typing.Any_,_~typing.Any]|None=None_,_create\_mlflow\_logger:bool|None=False_,_mlflow\_logger\_kwargs:~nemo.utils.loggers.mlflow\_logger.MLFlowParams|None=_,_create\_dllogger\_logger:bool|None=False_,_dllogger\_logger\_kwargs:~nemo.utils.loggers.dllogger.DLLoggerParams|None=_,_create\_clearml\_logger:bool|None=False_,_clearml\_logger\_kwargs:~nemo.utils.loggers.clearml\_logger.ClearMLParams|None=_,_create\_neptune\_logger:bool|None=False_,_neptune\_logger\_kwargs:~typing.Dict[~typing.Any_,_~typing.Any]|None=None_,_create\_checkpoint\_callback:bool|None=True_,_checkpoint\_callback\_params:~nemo.utils.exp\_manager.CallbackParams|None=_,_create\_early\_stopping\_callback:bool|None=False_,_early\_stopping\_callback\_params:~nemo.utils.exp\_manager.EarlyStoppingParams|None=_,_create\_preemption\_callback:bool|None=True_,_files\_to\_copy:~typing.List[str]|None=None_,_log\_step\_timing:bool|None=True_,_log\_delta\_step\_timing:bool|None=False_,_step\_timing\_kwargs:~nemo.utils.exp\_manager.StepTimingParams|None=_,_log\_local\_rank\_0\_only:bool|None=False_,_log\_global\_rank\_0\_only:bool|None=False_,_disable\_validation\_on\_resume:bool|None=True_,_ema:~nemo.utils.exp\_manager.EMAParams|None=_,_max\_time\_per\_run:str|None=None_,_seconds\_to\_sleep:float=5_,_create\_straggler\_detection\_callback:bool|None=False_,_straggler\_detection\_params:~nemo.utils.exp\_manager.StragglerDetectionParams|None=_,_create\_fault\_tolerance\_callback:bool|None=False_,_fault\_tolerance:~nemo.utils.exp\_manager.FaultToleranceParams|None=_,_log\_tflops\_per\_sec\_per\_gpu:bool|None=True_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.utils.exp_manager.ExpManagerConfig "Link to this definition")

Bases: `object`

Experiment Manager config for validation of passed arguments.
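A minimal sketch of wiring exp_manager into a training script, using only keys documented above; the values are illustrative and not defaults taken from this reference:

```python
import lightning.pytorch as pl
from omegaconf import OmegaConf

from nemo.utils.exp_manager import exp_manager

# exp_manager attaches its own logger and checkpoint callback, so the Trainer
# is created with logging/checkpointing disabled (a common NeMo pattern).
trainer = pl.Trainer(devices=1, max_epochs=10, logger=False, enable_checkpointing=False)

# Illustrative values only; every key below is optional.
exp_cfg = OmegaConf.create(
    {
        "exp_dir": "./nemo_experiments",
        "name": "my_experiment",
        "create_tensorboard_logger": True,
        "create_checkpoint_callback": True,
        "resume_if_exists": False,
    }
)

log_dir = exp_manager(trainer, exp_cfg)  # returns the final logging directory
```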
Exportable[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#exportable "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------

_class_ nemo.core.classes.exportable.Exportable[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable "Link to this definition")

Bases: `ABC`

This interface should be implemented by particular classes derived from nemo.core.NeuralModule or nemo.core.ModelPT. It gives these entities the ability to be exported for deployment to formats such as ONNX.

Usage:

    # exporting pre-trained model to ONNX file for deployment.
    model.eval()
    model.to('cuda')  # or to('cpu') if you don't have GPU

    model.export('mymodel.onnx', [options])  # all arguments apart from `output` are optional.

export(_output:str_,_input\_example=None_,_verbose=False_,_do\_constant\_folding=True_,_onnx\_opset\_version=None_,_check\_trace:bool|List[torch.Tensor]=False_,_dynamic\_axes=None_,_check\_tolerance=0.01_,_export\_modules\_as\_functions=False_,_keep\_initializers\_as\_inputs=None_,_use\_dynamo=False_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.export "Link to this definition")

Exports the model to the specified format. The format is inferred from the file extension of the output file.

Parameters:

* **output** (_str_) – Output file name. The file extension must be .onnx, .pt, or .ts, and is used to select the export path of the model.
* **input_example** (_list_ _or_ _dict_) – Example input to the model's forward function. This is used to trace the model and export it to ONNX/TorchScript. If the model takes multiple inputs, then input_example should be a list of input examples. If the model takes named inputs, then input_example should be a dictionary of input examples.
* **verbose** (_bool_) – If True, will print out a detailed description of the model's export steps, along with the internal trace logs of the export process.
* **do_constant_folding** (_bool_) – If True, will execute constant folding optimization on the model's graph before exporting. This is ONNX specific.
* **onnx_opset_version** (_int_) – The ONNX opset version to export the model to. If None, will use a reasonable default version.
* **check_trace** (_bool_) – If True, will verify that the model's output matches the output of the traced model, up to some tolerance.
* **dynamic_axes** (_dict_) – A dictionary mapping input and output names to their dynamic axes. This is used to specify the dynamic axes of the model's inputs and outputs. If the model takes multiple inputs, then dynamic_axes should be a list of dictionaries. If the model takes named inputs, then dynamic_axes should be a dictionary of dictionaries. If None, will use the dynamic axes of the input_example derived from the NeuralType of the input and output of the model.
* **check_tolerance** (_float_) – The tolerance to use when checking the model's output against the traced model's output. This is only used if check_trace is True. Note that a high tolerance is used because the traced model is not guaranteed to be 100% accurate.
* **export_modules_as_functions** (_bool_) – If True, will export the model's submodules as functions. This is ONNX specific.
* **keep_initializers_as_inputs** (_bool_) – If True, will keep the model's initializers as inputs in the onnx graph. This is ONNX specific.
* **use_dynamo** (_bool_) – If True, use onnx.dynamo_export() instead of onnx.export(). This is ONNX specific.

Returns: A tuple of two outputs. Item 0 in the output is a list of outputs, the outputs of each subnet exported. Item 1 in the output is a list of string descriptions. The description of each subnet exported can be used for logging purposes.

_property_ disabled_deployment_input_names _:List[str]_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.disabled_deployment_input_names "Link to this definition")

Implement this method to return a set of input names disabled for export

_property_ disabled_deployment_output_names _:List[str]_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.disabled_deployment_output_names "Link to this definition")

Implement this method to return a set of output names disabled for export

_property_ supported_export_formats _:List[ExportFormat]_[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.supported_export_formats "Link to this definition")

Implement this method to return a set of export formats supported. Default is all types.

get_export_subnet(_subnet=None_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.get_export_subnet "Link to this definition")

Returns the Exportable subnet model/module to export

list_export_subnets()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.list_export_subnets "Link to this definition")

Returns the default set of subnet names exported for this model. First goes the one receiving input (input_example).

get_export_config()[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.get_export_config "Link to this definition")

Returns the export_config dictionary

set_export_config(_args_)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.exportable.Exportable.set_export_config "Link to this definition")

Sets/updates the export_config dictionary
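As a slightly fuller sketch of the export() call described above, using only parameters from this reference; the model choice and output file name are illustrative:

```python
import nemo.collections.asr as nemo_asr

# Illustrative: any Exportable NeMo model follows the same pattern.
model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
model.eval()
model.to("cpu")

# The ".onnx" extension selects the ONNX export path. check_trace compares the
# exported graph's output to the PyTorch output, up to check_tolerance.
model.export(
    "quartznet.onnx",
    check_trace=True,
    check_tolerance=0.01,
    verbose=False,
)
```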
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md

Title: NeMo Models — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html

Published Time: Thu, 30 Oct 2025 07:07:32 GMT

Markdown Content:

NeMo Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#nemo-models "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------

Basics[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#basics "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------

NeMo models contain everything needed to train and reproduce conversational AI models: * neural network architectures * datasets/data loaders * data preprocessing/postprocessing * data augmentors * optimizers and schedulers * tokenizers * language models

NeMo uses [Hydra](https://hydra.cc/) for configuring both NeMo models and the PyTorch Lightning Trainer.

Note Every NeMo model has an example configuration file and training script that can be found [here](https://github.com/NVIDIA/NeMo/tree/stable/examples).

The end result of using NeMo, [Pytorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning), and Hydra is that NeMo models all have the same look and feel and are also fully compatible with the PyTorch ecosystem.

Pretrained[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#pretrained "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------

NeMo comes with many pretrained models for each of our collections: ASR, NLP, and TTS. Every pretrained NeMo model can be downloaded and used with the `from_pretrained()` method.
As an example, we can instantiate QuartzNet with the following:

```python
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
```

To see all available pretrained models for a specific NeMo model, use the `list_available_models()` method:

```python
nemo_asr.models.EncDecCTCModel.list_available_models()
```

For detailed information on the available pretrained models, refer to the collections documentation:

* [Automatic Speech Recognition (ASR)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html.md)
* Natural Language Processing (NLP)
* [Text-to-Speech Synthesis (TTS)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/intro.html.md)

Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#training "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------

NeMo leverages [PyTorch Lightning](https://www.pytorchlightning.ai/) for model training. PyTorch Lightning lets NeMo decouple the conversational AI code from the PyTorch training code. This means that NeMo users can focus on their domain (ASR, NLP, TTS) and build complex AI applications without having to rewrite boilerplate code for PyTorch training.

When using PyTorch Lightning, NeMo users can automatically train with:

* multi-GPU/multi-node
* mixed precision
* model checkpointing
* logging
* early stopping
* and more

The two main aspects of the Lightning API are the [LightningModule](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#) and the [Trainer](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html).

### PyTorch Lightning `LightningModule`[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#pytorch-lightning-lightningmodule "Link to this heading")

Every NeMo model is a `LightningModule`, which is an `nn.Module`. This means that NeMo models are compatible with the PyTorch ecosystem and can be plugged into existing PyTorch workflows.

Creating a NeMo model is similar to any other PyTorch workflow. We start by initializing our model architecture, then define the forward pass:

```python
class TextClassificationModel(NLPModel, Exportable):
    ...
    def __init__(self, cfg: DictConfig, trainer: Trainer = None):
        """Initializes the BERTTextClassifier model."""
        ...
        super().__init__(cfg=cfg, trainer=trainer)

        # instantiate a BERT based encoder
        self.bert_model = get_lm_model(
            config_file=cfg.language_model.config_file,
            config_dict=cfg.language_model.config,
            vocab_file=cfg.tokenizer.vocab_file,
            trainer=trainer,
            cfg=cfg,
        )

        # instantiate the FFN for classification
        self.classifier = SequenceClassifier(
            hidden_size=self.bert_model.config.hidden_size,
            num_classes=cfg.dataset.num_classes,
            num_layers=cfg.classifier_head.num_output_layers,
            activation='relu',
            log_softmax=False,
            dropout=cfg.classifier_head.fc_dropout,
            use_transformer_init=True,
            idx_conditioned_on=0,
        )

    def forward(self, input_ids, token_type_ids, attention_mask):
        """
        No special modification required for Lightning, define it as you normally would
        in the `nn.Module` in vanilla PyTorch.
        """
        hidden_states = self.bert_model(
            input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask
        )
        logits = self.classifier(hidden_states=hidden_states)
        return logits
```

The `LightningModule` organizes PyTorch code so that across all NeMo models we have a similar look and feel.
For example, the training logic can be found in `training_step`:

```python
def training_step(self, batch, batch_idx):
    """
    Lightning calls this inside the training loop with the data from the training dataloader
    passed in as `batch`.
    """
    # forward pass
    input_ids, input_type_ids, input_mask, labels = batch
    logits = self.forward(input_ids=input_ids, token_type_ids=input_type_ids, attention_mask=input_mask)

    train_loss = self.loss(logits=logits, labels=labels)

    lr = self._optimizer.param_groups[0]['lr']

    self.log('train_loss', train_loss)
    self.log('lr', lr, prog_bar=True)

    return {
        'loss': train_loss,
        'lr': lr,
    }
```

Similarly, the validation logic can be found in `validation_step`:

```python
def validation_step(self, batch, batch_idx):
    """
    Lightning calls this inside the validation loop with the data from the validation dataloader
    passed in as `batch`.
    """
    if self.testing:
        prefix = 'test'
    else:
        prefix = 'val'

    input_ids, input_type_ids, input_mask, labels = batch
    logits = self.forward(input_ids=input_ids, token_type_ids=input_type_ids, attention_mask=input_mask)

    val_loss = self.loss(logits=logits, labels=labels)

    preds = torch.argmax(logits, axis=-1)
    tp, fn, fp, _ = self.classification_report(preds, labels)

    return {'val_loss': val_loss, 'tp': tp, 'fn': fn, 'fp': fp}
```

PyTorch Lightning then handles all of the boilerplate code needed for training. Virtually any aspect of training can be customized via PyTorch Lightning [hooks](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#hooks), [Plugins](https://pytorch-lightning.readthedocs.io/en/stable/extensions/plugins.html), [callbacks](https://pytorch-lightning.readthedocs.io/en/stable/extensions/callbacks.html), or by overriding [methods](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#methods).

For more domain-specific information, see:

* [Automatic Speech Recognition (ASR)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html.md)
* Natural Language Processing (NLP)
* [Text-to-Speech Synthesis (TTS)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/intro.html.md)

### PyTorch Lightning Trainer[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#pytorch-lightning-trainer "Link to this heading")

Since every NeMo model is a `LightningModule`, we can automatically take advantage of the PyTorch Lightning `Trainer`. Every NeMo [example](https://github.com/NVIDIA/NeMo/tree/v1.0.2/examples) training script uses the `Trainer` object to fit the model.

First, instantiate the model and trainer, then call `.fit`:

```python
# We first instantiate the trainer based on the model configuration.
# See the model configuration documentation for details.
trainer = pl.Trainer(**cfg.trainer)

# Then pass the model configuration and trainer object into the NeMo model
model = TextClassificationModel(cfg.model, trainer=trainer)

# Now we can train by calling .fit
trainer.fit(model)

# Or we can run the test loop on test data by calling
trainer.test(model=model)
```

All [trainer flags](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#trainer-flags) can be set from the NeMo configuration.
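To make that last point concrete, here is a small, self-contained sketch that builds a `Trainer` from a hand-written stand-in for the `trainer` section of a NeMo config. The flag values are placeholders, and `pytorch_lightning` and `omegaconf` are assumed to be importable, as they are in any NeMo installation.

```python
from omegaconf import OmegaConf
import pytorch_lightning as pl

# Hand-written stand-in for the `trainer` section of a NeMo configuration.
cfg = OmegaConf.create(
    {
        "trainer": {
            "devices": 1,
            "accelerator": "auto",   # NeMo examples typically use "gpu"; "auto" also works on CPU-only machines
            "max_epochs": 5,
            "val_check_interval": 1.0,
        }
    }
)

# Override a trainer flag programmatically, exactly as a CLI override would.
cfg.trainer.max_epochs = 10

# Every key under `trainer` is forwarded to the PyTorch Lightning Trainer.
trainer = pl.Trainer(**cfg.trainer)
```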
Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#configuration "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------ Hydra is an open-source Python framework that simplifies configuration for complex applications that must bring together many different software libraries. Conversational AI model training is a great example of such an application. To train a conversational AI model, we must be able to configure: * neural network architectures * training and optimization algorithms * data pre/post processing * data augmentation * experiment logging/visualization * model checkpointing For an introduction to using Hydra, refer to the [Hydra Tutorials](https://hydra.cc/docs/tutorials/intro). With Hydra, we can configure everything needed for NeMo with three interfaces: * Command Line (CLI) * Configuration Files (YAML) * Dataclasses (Python) ### YAML[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#yaml "Link to this heading") NeMo provides YAML configuration files for all of our [example](https://github.com/NVIDIA/NeMo/tree/v1.0.2/examples) training scripts. YAML files make it easy to experiment with different model and training configurations. Every NeMo example YAML has the same underlying configuration structure: * trainer * exp_manager * model The model configuration always contains `train_ds`, `validation_ds`, `test_ds`, and `optim`. Model architectures, however, can vary across domains. Refer to the documentation of specific collections (LLM, ASR etc.) for detailed information on model architecture configuration. A NeMo configuration file should look similar to the following: # PyTorch Lightning Trainer configuration # any argument of the Trainer object can be set here trainer: devices: 1 # number of gpus per node accelerator: gpu num_nodes: 1 # number of nodes max_epochs: 10 # how many training epochs to run val_check_interval: 1.0 # run validation after every epoch # Experiment logging configuration exp_manager: exp_dir: /path/to/my/nemo/experiments name: name_of_my_experiment create_tensorboard_logger: True create_wandb_logger: True # Model configuration # model network architecture, train/val/test datasets, data augmentation, and optimization model: train_ds: manifest_filepath: /path/to/my/train/manifest.json batch_size: 256 shuffle: True validation_ds: manifest_filepath: /path/to/my/validation/manifest.json batch_size: 32 shuffle: False test_ds: manifest_filepath: /path/to/my/test/manifest.json batch_size: 32 shuffle: False optim: name: novograd lr: .01 betas: [0.8, 0.5] weight_decay: 0.001 # network architecture can vary greatly depending on the domain encoder: ... decoder: ... ### CLI[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#cli "Link to this heading") With NeMo and Hydra, every aspect of model training can be modified from the command-line. This is extremely helpful for running lots of experiments on compute clusters or for quickly testing parameters during development. All NeMo [examples](https://github.com/NVIDIA/NeMo/tree/stable/examples) come with instructions on how to run the training/inference script from the command-line (e.g. see [here](https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_ctc/speech_to_text_ctc.py) for an example). 
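Before going through the individual CLI operators, the sketch below shows how the dotted paths used on the command line map onto the nested YAML structure above. The config object here is a hand-written stand-in that reproduces only a few of the keys shown in the example configuration.

```python
from omegaconf import OmegaConf

# A small stand-in for the composed NeMo config above (only a few keys are reproduced).
cfg = OmegaConf.create(
    {
        "trainer": {"devices": 1, "max_epochs": 10},
        "model": {
            "train_ds": {"manifest_filepath": None, "batch_size": 256},
            "optim": {"name": "novograd", "lr": 0.01},
        },
    }
)

# The dotted paths used on the command line below address exactly these keys:
OmegaConf.update(cfg, "model.train_ds.manifest_filepath", "/path/to/my/train/manifest.json")
OmegaConf.update(cfg, "trainer.max_epochs", 50)
print(OmegaConf.select(cfg, "model.optim.lr"))  # 0.01
```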
With Hydra, arguments are set using the `=` operator:

```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
    model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
    trainer.devices=2 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50
```

We can use the `+` operator to add arguments from the CLI:

```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
    model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
    trainer.devices=2 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    +trainer.fast_dev_run=true
```

We can use the `~` operator to remove configurations:

```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
    model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
    ~model.test_ds \
    trainer.devices=2 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    +trainer.fast_dev_run=true
```

We can specify configuration files using the `--config-path` and `--config-name` flags:

```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    --config-path=conf/quartznet \
    --config-name=quartznet_15x5 \
    model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
    model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
    ~model.test_ds \
    trainer.devices=2 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    +trainer.fast_dev_run=true
```

### Dataclasses[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#dataclasses "Link to this heading")

Dataclasses allow NeMo to ship model configurations as part of the NeMo library and also enable pure Python configuration of NeMo models. With Hydra, dataclasses can be used to create [structured configs](https://hydra.cc/docs/tutorials/structured_config/intro) for the conversational AI application.

As an example, refer to the code block below for an _Attention Is All You Need_ machine translation model. The model configuration can be instantiated and modified like any Python [Dataclass](https://docs.python.org/3/library/dataclasses.html).

```python
from nemo.collections.nlp.models.machine_translation.mt_enc_dec_config import AAYNBaseConfig

cfg = AAYNBaseConfig()

# modify the number of layers in the encoder
cfg.encoder.num_layers = 8

# modify the training batch size
cfg.train_ds.tokens_in_batch = 8192
```

Note Configuration with Hydra always has the following precedence: CLI > YAML > Dataclass.

Optimization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#optimization "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------------

Optimizers and learning rate schedules are configurable across all NeMo models and have their own namespace. Here is a sample YAML configuration for a Novograd optimizer with a Cosine Annealing learning rate schedule.

```yaml
optim:
    name: novograd
    lr: 0.01

    # optimizer arguments
    betas: [0.8, 0.25]
    weight_decay: 0.001

    # scheduler setup
    sched:
      name: CosineAnnealing

      # Optional arguments
      max_steps: -1 # computed at runtime or explicitly set here
      monitor: val_loss
      reduce_on_plateau: false

      # scheduler config override
      warmup_steps: 1000
      warmup_ratio: null
      min_lr: 1e-9
```

Note [NeMo Examples](https://github.com/NVIDIA/NeMo/tree/stable/examples) has optimizer and scheduler configurations for every NeMo model.
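To make the relationship between the `optim` block and the optimizer classes concrete, here is a rough sketch (this is not NeMo's internal code path) that resolves the `name` key against the optimizer registry described in the next subsection. The one-element parameter list below is a stand-in for a real model's `parameters()`.

```python
import torch
from omegaconf import OmegaConf
from nemo.core.optim.optimizers import AVAILABLE_OPTIMIZERS

# The same `optim` block as above, without the scheduler.
optim_cfg = OmegaConf.create(
    {"name": "novograd", "lr": 0.01, "betas": [0.8, 0.25], "weight_decay": 0.001}
)

params = [torch.nn.Parameter(torch.zeros(8))]         # stand-in for model.parameters()
optimizer_cls = AVAILABLE_OPTIMIZERS[optim_cfg.name]  # lowercase name -> optimizer class
kwargs = {k: v for k, v in OmegaConf.to_container(optim_cfg).items() if k != "name"}
optimizer = optimizer_cls(params, **kwargs)
print(optimizer)
```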
Optimizers can be configured from the CLI as well:

```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    --config-path=conf/quartznet \
    --config-name=quartznet_15x5 \
    ... \
    # train with the adam optimizer
    model.optim.name=adam \
    # change the learning rate
    model.optim.lr=.0004 \
    # modify betas
    model.optim.betas=[.8, .5]
```

### Optimizers[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#optimizers "Link to this heading")

`name` corresponds to the lowercase name of the optimizer. To view a list of available optimizers, run:

```python
from nemo.core.optim.optimizers import AVAILABLE_OPTIMIZERS

for name, opt in AVAILABLE_OPTIMIZERS.items():
    print(f'name: {name}, opt: {opt}')
```

name: sgd opt: name: adam opt: name: adamw opt: name: adadelta opt: name: adamax opt: name: adagrad opt: name: rmsprop opt: name: rprop opt: name: novograd opt:

### Optimizer Params[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#optimizer-params "Link to this heading")

Optimizer params can vary between optimizers, but the `lr` param is required for all optimizers. To see the available params for an optimizer, we can look at its corresponding dataclass.

```python
from nemo.core.config.optimizers import NovogradParams

print(NovogradParams())
```

NovogradParams(lr='???', betas=(0.95, 0.98), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False, luc=False, luc_trust=0.001, luc_eps=1e-08)

`'???'` indicates that the lr argument is required.

### Register Optimizer[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#register-optimizer "Link to this heading")

To register a new optimizer to be used with NeMo, run:

nemo.core.optim.optimizers.register_optimizer(_name:str_,_optimizer:torch.optim.optimizer.Optimizer_,_optimizer\_params:OptimizerParams_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#nemo.core.optim.optimizers.register_optimizer "Link to this definition")

Checks if the optimizer name exists in the registry, and if it doesn't, adds it. This allows custom optimizers to be added and called by name during instantiation.

Parameters:

* **name** – Name of the optimizer. Will be used as key to retrieve the optimizer.
* **optimizer** – Optimizer class
* **optimizer_params** – The parameters as a dataclass of the optimizer

A short registration sketch is shown below.

### Learning Rate Schedulers[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#learning-rate-schedulers "Link to this heading")

Learning rate schedulers can be optionally configured under the `optim.sched` namespace. `name` corresponds to the name of the learning rate schedule.
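Before listing the available schedulers, here is a minimal registration sketch for the `register_optimizer` API above (registering a custom scheduler with `register_scheduler` follows the same pattern). `torch.optim.ASGD` is just a stand-in for a custom optimizer, and `ASGDParams` is a hypothetical params dataclass written in the style of those in `nemo.core.config.optimizers`.

```python
from dataclasses import dataclass

import torch
from nemo.core.optim.optimizers import register_optimizer


# Hypothetical params dataclass; its fields must match the optimizer's constructor
# arguments so they can be passed through from the `optim` section of the config.
@dataclass
class ASGDParams:
    lr: float = 0.01
    lambd: float = 1e-4
    weight_decay: float = 0.0


# After this call, the optimizer can be selected with `model.optim.name=asgd`.
register_optimizer(name="asgd", optimizer=torch.optim.ASGD, optimizer_params=ASGDParams)
```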
To view a list of available schedulers, run:

```python
from nemo.core.optim.lr_scheduler import AVAILABLE_SCHEDULERS

for name, opt in AVAILABLE_SCHEDULERS.items():
    print(f'name: {name}, schedule: {opt}')
```

name: WarmupPolicy, schedule: name: WarmupHoldPolicy, schedule: name: SquareAnnealing, schedule: name: CosineAnnealing, schedule: name: NoamAnnealing, schedule: name: WarmupAnnealing, schedule: name: InverseSquareRootAnnealing, schedule: name: SquareRootAnnealing, schedule: name: PolynomialDecayAnnealing, schedule: name: PolynomialHoldDecayAnnealing, schedule: name: StepLR, schedule: name: ExponentialLR, schedule: name: ReduceLROnPlateau, schedule: name: CyclicLR, schedule:

### Scheduler Params[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#scheduler-params "Link to this heading")

To see the available params for a scheduler, we can look at its corresponding dataclass:

```python
from nemo.core.config.schedulers import CosineAnnealingParams

print(CosineAnnealingParams())
```

CosineAnnealingParams(last_epoch=-1, warmup_steps=None, warmup_ratio=None, min_lr=0.0)

### Register scheduler[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#register-scheduler "Link to this heading")

To register a new scheduler to be used with NeMo, run:

nemo.core.optim.lr_scheduler.register_scheduler(_name:str_,_scheduler:torch.optim.lr\_scheduler.\_LRScheduler_,_scheduler\_params:SchedulerParams_,)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#nemo.core.optim.lr_scheduler.register_scheduler "Link to this definition")

Checks if the scheduler name exists in the registry, and if it doesn't, adds it. This allows custom schedulers to be added and called by name during instantiation.

Parameters:

* **name** – Name of the scheduler. Will be used as key to retrieve the scheduler.
* **scheduler** – Scheduler class (inherits from _LRScheduler)
* **scheduler_params** – The parameters as a dataclass of the scheduler

Save and Restore[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#save-and-restore "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------

NeMo models all come with `.save_to` and `.restore_from` methods.

### Save[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#save "Link to this heading")

To save a NeMo model, run:

```python
model.save_to('/path/to/model.nemo')
```

Everything needed to use the trained model is packaged and saved in the `.nemo` file. For example, in the NLP domain, `.nemo` files include the necessary tokenizer models and/or vocabulary files, etc.

Note A `.nemo` file is simply an archive like any other `.tar` file.

### Restore[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#restore "Link to this heading")

To restore a NeMo model, run:

```python
# Here, you should usually use the specific class of the model, or simply use ModelPT.restore_from()
model.restore_from('/path/to/model.nemo')
```

When using the PyTorch Lightning Trainer, a PyTorch Lightning checkpoint is created. These are mainly used within NeMo to auto-resume training. Since NeMo models are `LightningModules`, the PyTorch Lightning method `load_from_checkpoint` is available.
Note that `load_from_checkpoint` won’t necessarily work out-of-the-box for all models, as some models require more artifacts than just the checkpoint to be restored. For these models, the user will have to override `load_from_checkpoint` if they want to use it. It’s highly recommended to use `restore_from` to load NeMo models.

### Restore with Modified Config[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#restore-with-modified-config "Link to this heading")

Sometimes, there may be a need to modify the model (or its sub-components) prior to restoring it. A common case is when the model’s internal config must be updated for various reasons (such as deprecation, newer versioning, or support for a new feature). As long as the model has the same parameters as in the original config, it can be restored safely.

In NeMo, the model’s internal config is preserved as part of the .nemo file. This config is used during restoration, and as shown below we can update this config prior to restoring the model.

```python
# When restoring a model, you should generally use the class of the model
# Obtain the config (as an OmegaConf object)
config = model_class.restore_from('/path/to/model.nemo', return_config=True)
# OR
config = model_class.from_pretrained('name_of_the_model', return_config=True)

# Modify the config as needed
config.x.y = z

# Restore the model from the updated config
model = model_class.restore_from('/path/to/model.nemo', override_config_path=config)
# OR
model = model_class.from_pretrained('name_of_the_model', override_config_path=config)
```

Register Artifacts[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#register-artifacts "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------------------------

Restoring conversational AI models can be complicated because it requires more than just the checkpoint weights; additional information is also needed to use the model. NeMo models can save additional artifacts in the .nemo file by calling `.register_artifact`. When restoring NeMo models using `.restore_from` or `.from_pretrained`, any artifacts that were registered will be available automatically.

As an example, consider an NLP model that requires a trained tokenizer model. The tokenizer model file can be automatically added to the .nemo file with the following:

```python
self.encoder_tokenizer = get_nmt_tokenizer(
    ...
    tokenizer_model=self.register_artifact(
        config_path='encoder_tokenizer.tokenizer_model',
        src='/path/to/tokenizer.model',
        verify_src_exists=True,
    ),
)
```

By default, `.register_artifact` will always return a path. If the model is being restored from a .nemo file, then that path will point to the artifact in the .nemo file. Otherwise, `.register_artifact` will return the local path specified by the user.

`config_path` is the artifact key. It usually corresponds to a model configuration but does not have to. The model config that is packaged with the .nemo file will be updated according to the `config_path` key. In the above example, the model config will have

```yaml
encoder_tokenizer:
    ...
    tokenizer_model: nemo:4978b28103264263a03439aaa6560e5e_tokenizer.model
```

`src` is the path to the artifact, and the base-name of the path will be used when packaging the artifact in the .nemo file. Each artifact will have a hash prepended to the base-name of `src` in the .nemo file.
This is to prevent collisions between base-names that are identical (say, when there are two or more tokenizers, both called tokenizer.model). The resulting .nemo file will then contain the following file: 4978b28103264263a03439aaa6560e5e_tokenizer.model

If `verify_src_exists` is set to `False`, then the artifact is optional. This means that `.register_artifact` will return `None` if the `src` cannot be found.

Push to Hugging Face Hub[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#push-to-hugging-face-hub "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------------------------------------

NeMo models can be pushed to the [Hugging Face Hub](https://huggingface.co/) with the [`push_to_hf_hub()`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO.push_to_hf_hub "nemo.core.classes.mixins.hf_io_mixin.HuggingFaceFileIO.push_to_hf_hub") method. This method performs the same actions as `save_to()` and then uploads the model to the Hugging Face Hub. It offers an additional `pack_nemo_file` argument that lets the user upload either a single packed `.nemo` file or the individual files that comprise it. The latter is useful for large language models with a massive number of parameters, where a single `.nemo` file could exceed the maximum upload size of the Hugging Face Hub.

### Upload a model to the Hub[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#upload-a-model-to-the-hub "Link to this heading")

```python
token = "" or None
pack_nemo_file = True  # False will upload multiple files that comprise the NeMo file onto HF Hub; generally useful for LLMs

model.push_to_hf_hub(
    repo_id=repo_id,
    pack_nemo_file=pack_nemo_file,
    token=token,
)
```

### Use a Custom Model Card Template for the Hub[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#use-a-custom-model-card-template-for-the-hub "Link to this heading")

```python
# Override the default model card
template = """
# {model_name}
"""
kwargs = {"model_name": "ABC", "repo_id": "nvidia/ABC_XYZ"}
model_card = model.generate_model_card(template=template, template_kwargs=kwargs, type="hf")
model.push_to_hf_hub(repo_id=repo_id, token=token, model_card=model_card)


# Write your own model card class
class MyModelCard:
    def __init__(self, model_name):
        self.model_name = model_name

    def __repr__(self):
        template = """This is the {model_name} model""".format(model_name=self.model_name)
        return template


model.push_to_hf_hub(repo_id=repo_id, token=token, model_card=MyModelCard("ABC"))
```

Nested NeMo Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#nested-nemo-models "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------------------------

In some cases, it may be helpful to use NeMo models inside other NeMo models. For example, we can incorporate language models into ASR models to use in a decoding process to improve accuracy, or use hybrid ASR-TTS models to generate audio from the text on the fly to train or fine-tune the ASR model.
There are three ways to instantiate child models inside parent models:

* use the subconfig directly
* use the `.nemo` checkpoint path to load the child model
* use a pretrained NeMo model

To register a child model, use the `register_nemo_submodule` method of the parent model. This method will add the child model to a specified model attribute. During serialization, it will correctly handle child artifacts and store the child model’s configuration in the parent model’s `config_field`.

```python
from nemo.core.classes import ModelPT


class ChildModel(ModelPT):
    ...  # implement necessary methods


class ParentModel(ModelPT):
    def __init__(self, cfg, trainer=None):
        super().__init__(cfg=cfg, trainer=trainer)

        # optionally annotate type for IDE autocompletion and type checking
        self.child_model: Optional[ChildModel]
        if cfg.get("child_model") is not None:
            # load directly from config
            # either if config provided initially, or automatically
            # after model restoration
            self.register_nemo_submodule(
                name="child_model",
                config_field="child_model",
                model=ChildModel(self.cfg.child_model, trainer=trainer),
            )
        elif cfg.get('child_model_path') is not None:
            # load from .nemo model checkpoint
            # while saving, config will be automatically assigned/updated
            # in cfg.child_model
            self.register_nemo_submodule(
                name="child_model",
                config_field="child_model",
                model=ChildModel.restore_from(self.cfg.child_model_path, trainer=trainer),
            )
        elif cfg.get('child_model_name') is not None:
            # load from pretrained model
            # while saving, config will be automatically assigned/updated
            # in cfg.child_model
            self.register_nemo_submodule(
                name="child_model",
                config_field="child_model",
                model=ChildModel.from_pretrained(self.cfg.child_model_name, trainer=trainer),
            )
        else:
            self.child_model = None
```

Profiling[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/core.html.md#profiling "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------

NeMo offers users two options for profiling: Nsys and CUDA memory profiling. These two options allow users to debug performance issues as well as memory issues such as memory leaks.

To enable Nsys profiling, add the following options to the model config:

```yaml
nsys_profile: False
start_step: 10 # Global batch to start profiling
end_step: 10 # Global batch to end profiling
ranks: [0] # Global rank IDs to profile
gen_shape: False # Generate model and kernel details including input shapes
```

Finally, run the model training script with:

```bash
nsys profile -s none -o -t cuda,nvtx --force-overwrite true --capture-range=cudaProfilerApi --capture-range-end=stop python ./examples/...
```

See more options at the [nsight user guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html.md#cli-profiling).

To enable CUDA memory profiling, add the following options to the model config:

```yaml
memory_profile:
  enabled: True
  start_step: 10 # Global batch to start profiling
  end_step: 10 # Global batch to end profiling
  rank: 0 # Global rank ID to profile
  output_path: None # Path to store the profile output file
```

Then invoke your NeMo script without any changes in the invocation command.
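As a small illustration, the sketch below attaches the CUDA memory profiling block described above to a training configuration with OmegaConf before the script is launched. The assumption that these keys live under the `model` section follows the wording above, `cfg` is a stand-in for the composed training configuration, and the output path is a placeholder.

```python
from omegaconf import OmegaConf

# Stand-in for the composed training configuration of a Hydra-based NeMo script.
cfg = OmegaConf.create({"model": {}})

cfg.model.memory_profile = {
    "enabled": True,
    "start_step": 10,       # global batch to start profiling
    "end_step": 10,         # global batch to end profiling
    "rank": 0,              # global rank ID to profile
    "output_path": "/tmp/megatron_memory_profile",  # placeholder path
}

print(OmegaConf.to_yaml(cfg.model))
```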
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md

Title: Experiment Manager — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html

Published Time: Fri, 05 Sep 2025 19:01:33 GMT

Markdown Content:

Experiment Manager[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#experiment-manager "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------

The NeMo Framework Experiment Manager leverages PyTorch Lightning for model checkpointing, TensorBoard Logging, Weights and Biases, DLLogger and MLFlow logging. The Experiment Manager is included by default in all NeMo example scripts. To use the Experiment Manager, call [`exp_manager`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.utils.exp_manager.exp_manager "nemo.utils.exp_manager.exp_manager") and pass in the PyTorch Lightning `Trainer`. exp_dir = exp_manager(trainer, cfg.get("exp_manager", None)) The Experiment Manager is configurable using YAML with Hydra. exp_manager: exp_dir: /path/to/my/experiments name: my_experiment_name create_tensorboard_logger: True create_checkpoint_callback: True Optionally, launch TensorBoard to view the training results in `exp_dir`, which by default is set to `./nemo_experiments`.
tensorboard --bind_all --logdir nemo_experiments If `create_checkpoint_callback` is set to `True`, then NeMo automatically creates checkpoints during training using PyTorch Lightning’s [ModelCheckpoint](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.ModelCheckpoint.html). We can configure the `ModelCheckpoint` via YAML or CLI: exp_manager: ... # configure the PyTorch Lightning ModelCheckpoint using checkpoint_callback_params # any ModelCheckpoint argument can be set here # save the best checkpoints based on this metric checkpoint_callback_params.monitor=val_loss # choose how many total checkpoints to save checkpoint_callback_params.save_top_k=5 Resume Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#resume-training "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------- To auto-resume training, configure the `exp_manager`. This feature is important for long training runs that might be interrupted or shut down before the procedure has completed. To auto-resume training, set the following parameters via YAML or CLI: exp_manager: ... # resume training if checkpoints already exist resume_if_exists: True # to start training with no existing checkpoints resume_ignore_no_checkpoint: True # by default experiments will be versioned by datetime # we can set our own version with exp_manager.version: my_experiment_version Experiment Loggers[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#experiment-loggers "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------- Alongside TensorBoard, NeMo also supports Weights and Biases, MLFlow, DLLogger, ClearML, and Neptune loggers. To use these loggers, set the following via YAML or [`ExpManagerConfig`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.utils.exp_manager.ExpManagerConfig "nemo.utils.exp_manager.ExpManagerConfig"). ### Weights and Biases (WandB)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#weights-and-biases-wandb "Link to this heading") exp_manager: ... create_checkpoint_callback: True create_wandb_logger: True wandb_logger_kwargs: name: ${name} project: ${project} entity: ${entity} ### MLFlow[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#mlflow "Link to this heading") exp_manager: ... create_checkpoint_callback: True create_mlflow_logger: True mlflow_logger_kwargs: experiment_name: ${name} tags: save_dir: './mlruns' prefix: '' artifact_location: None # provide run_id if resuming a previously started run run_id: Optional[str] = None ### DLLogger[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#dllogger "Link to this heading") exp_manager: ... create_checkpoint_callback: True create_dllogger_logger: True dllogger_logger_kwargs: verbose: False stdout: False json_file: "./dllogger.json" ### ClearML[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#clearml "Link to this heading") exp_manager: ...
create_checkpoint_callback: True create_clearml_logger: True clearml_logger_kwargs: project: None # name of the project task: None # optional name of task connect_pytorch: False model_name: None # optional name of model tags: None # Should be a list of str log_model: False # log model to clearml server log_cfg: False # log config to clearml server log_metrics: False # log metrics to clearml server ### Neptune[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#neptune "Link to this heading") exp_manager: ... create_checkpoint_callback: True create_neptune_logger: false neptune_logger_kwargs: project: ${project} name: ${name} prefix: train log_model_checkpoints: false # set to True if checkpoints need to be pushed to Neptune tags: null # can specify as an array of strings in yaml array format description: null Exponential Moving Average[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#exponential-moving-average "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NeMo supports using exponential moving average (EMA) for model parameters. This can be useful for improving model generalization and stability. To use EMA, set the following parameters via YAML or [`ExpManagerConfig`](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/api.html.md#nemo.utils.exp_manager.ExpManagerConfig "nemo.utils.exp_manager.ExpManagerConfig"). exp_manager: ... # use exponential moving average for model parameters ema: enabled: True # False by default decay: 0.999 # decay rate cpu_offload: False # If EMA parameters should be offloaded to CPU to save GPU memory every_n_steps: 1 # How often to update EMA weights validate_original_weights: False # Whether to use original weights for validation calculation or EMA weights Hydra Multi-Run with NeMo[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#hydra-multi-run-with-nemo "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- When training neural networks, it is common to perform a hyperparameter search to improve the model’s performance on validation data. However, manually preparing a grid of experiments and managing all checkpoints and their metrics can be tedious. To simplify these tasks, NeMo integrates with [Hydra Multi-Run support](https://hydra.cc/docs/tutorials/basic/running_your_app/multi-run/), providing a unified way to run a set of experiments directly from the configuration. There are certain limitations to this framework, which we list below: * All experiments are assumed to run on a single GPU; multi-GPU runs of a single experiment (i.e., model-parallel models) are not supported as of now. * NeMo Multi-Run currently supports only grid search over a set of hyperparameters. Support for advanced hyperparameter search strategies will be added in the future. * **NeMo Multi-Run requires one or more GPUs** to function and will not work without GPU devices.
### Config Setup[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#config-setup "Link to this heading") In order to enable NeMo Multi-Run, we first update our YAML configs with some information to let Hydra know we expect to run multiple experiments from this one config - # Required for Hydra launch of hyperparameter search via multirun defaults: - override hydra/launcher: nemo_launcher # Hydra arguments necessary for hyperparameter optimization hydra: # Helper arguments to ensure all hyper parameter runs are from the directory that launches the script. sweep: dir: "." subdir: "." # Define all the hyper parameters here sweeper: params: # Place all the parameters you wish to search over here (corresponding to the rest of the config) # NOTE: Make sure that there are no spaces between the commas that separate the config params ! model.optim.lr: 0.001,0.0001 model.encoder.dim: 32,64,96,128 model.decoder.dropout: 0.0,0.1,0.2 # Arguments to the process launcher launcher: num_gpus: -1 # Number of gpus to use. Each run works on a single GPU. jobs_per_gpu: 1 # If each GPU has large memory, you can run multiple jobs on the same GPU for faster results (until OOM). Next, we will set up the config for `Experiment Manager`. When we perform hyper parameter search, each run may take some time to complete. We therefore want to avoid the case where a run ends (say due to OOM or timeout on the machine) and we need to redo all experiments. We therefore set up the experiment manager config such that every experiment has a unique “key”, whose value corresponds to a single resumable experiment. Let us see how to set up such a unique “key” via the experiment name. Simply attach all the hyper parameter arguments to the experiment name as shown below - exp_manager: exp_dir: null # Can be set by the user. # Add a unique name for all hyper parameter arguments to allow continued training. # NOTE: It is necessary to add all hyperparameter arguments to the name ! # This ensures successful restoration of model runs in case HP search crashes. name: ${name}-lr-${model.optim.lr}-adim-${model.adapter.dim}-sd-${model.adapter.adapter_strategy.stochastic_depth} ... checkpoint_callback_params: ... save_top_k: 1 # Don't save too many .ckpt files during HP search always_save_nemo: True # saves the checkpoints as nemo files for fast checking of results later ... # We highly recommend use of any experiment tracking tool to gather all the experiments in one location create_wandb_logger: True wandb_logger_kwargs: project: "" # HP Search may crash due to various reasons, best to attempt continuation in order to # resume from where the last failure case occurred. resume_if_exists: true resume_ignore_no_checkpoint: true ### Run a NeMo Multi-Run Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#run-a-nemo-multi-run-configuration "Link to this heading") Once the config has been updated, we can now run it just like any normal Hydra script, with one special flag (`-m`). python script.py --config-path=ABC --config-name=XYZ -m \ trainer.max_steps=5000 \ # Any additional arg after -m will be passed to all the runs generated from the config ! ...
Tips and Tricks[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#tips-and-tricks "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------- This section provides recommendations for using the Experiment Manager. ### Preserving disk space for a large number of experiments[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#preserving-disk-space-for-a-large-number-of-experiments "Link to this heading") Some models may have a large number of parameters, making it very expensive to save numerous checkpoints on physical storage drives. For example, if you use the Adam optimizer, each PyTorch Lightning “.ckpt” file will be three times the size of just the model parameters. This can become exorbitant if you have multiple runs. In the above configuration, we explicitly set `save_top_k: 1` and `always_save_nemo: True`. This limits the number of “.ckpt” files to just one and also saves a NeMo file, which contains only the model parameters without the optimizer state. This NeMo file can be restored immediately for further work. We can further save storage space by using NeMo’s utility functions to automatically delete either “.ckpt” or NeMo files after a training run has finished. This is sufficient if you are collecting results in an experiment tracking tool and can simply rerun the best configuration after the search is completed. # Import `clean_exp_ckpt` along with exp_manager from nemo.utils.exp_manager import clean_exp_ckpt, exp_manager @hydra_runner(...) def main(cfg): ... # Keep track of the experiment directory exp_log_dir = exp_manager(trainer, cfg.get("exp_manager", None)) ... add any training code here as needed ... # Add following line to end of the training script # Remove PTL ckpt file, and potentially also remove .nemo file to conserve storage space. clean_exp_ckpt(exp_log_dir, remove_ckpt=True, remove_nemo=False) ### Debugging Multi-Run Scripts[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#debugging-multi-run-scripts "Link to this heading") When running Hydra scripts, you may encounter configuration issues that crash the program. In NeMo Multi-Run, a crash in any single run will not crash the entire program. Instead, we will note the error and proceed to the next job. Once all jobs are completed, we will raise the errors in the order they occurred, crashing the program with the first error’s stack trace. To debug NeMo Multi-Run, we recommend commenting out the entire hyperparameter configuration set inside `sweep.params`. Instead, run a single experiment with the configuration, which will immediately raise the error. ### Experiment name cannot be parsed by Hydra[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#experiment-name-cannot-be-parsed-by-hydra "Link to this heading") Sometimes our hyperparameters include PyTorch Lightning `trainer` arguments, such as the number of steps, number of epochs, and whether to use gradient accumulation. When we attempt to add these as keys to the experiment manager’s `name`, Hydra may complain that `trainer.xyz` cannot be resolved. A simple solution is to finalize the Hydra config before you call `exp_manager()` as follows: @hydra_runner(...) 
def main(cfg): # Make any changes as necessary to the config cfg.xyz.abc = uvw # Finalize the config OmegaConf.resolve(cfg) # OmegaConf.resolve() resolves interpolations in place and returns None # Carry on as normal by calling trainer and exp_manager trainer = pl.Trainer(**cfg.trainer) exp_log_dir = exp_manager(trainer, cfg.get("exp_manager", None)) ... ExpManagerConfig[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/core/exp_manager.html.md#expmanagerconfig "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------- _class_ nemo.utils.exp_manager.ExpManagerConfig(_explicit\_log\_dir:str|None=None_,_exp\_dir:str|None=None_,_name:str|None=None_,_version:str|None=None_,_use\_datetime\_version:bool|None=True_,_resume\_if\_exists:bool|None=False_,_resume\_past\_end:bool|None=False_,_resume\_ignore\_no\_checkpoint:bool|None=False_,_resume\_from\_checkpoint:str|None=None_,_create\_tensorboard\_logger:bool|None=True_,_summary\_writer\_kwargs:~typing.Dict[~typing.Any_,_~typing.Any]|None=None_,_create\_wandb\_logger:bool|None=False_,_wandb\_logger\_kwargs:~typing.Dict[~typing.Any_,_~typing.Any]|None=None_,_create\_mlflow\_logger:bool|None=False_,_mlflow\_logger\_kwargs:~nemo.utils.loggers.mlflow\_logger.MLFlowParams|None=_,_create\_dllogger\_logger:bool|None=False_,_dllogger\_logger\_kwargs:~nemo.utils.loggers.dllogger.DLLoggerParams|None=_,_create\_clearml\_logger:bool|None=False_,_clearml\_logger\_kwargs:~nemo.utils.loggers.clearml\_logger.ClearMLParams|None=_,_create\_neptune\_logger:bool|None=False_,_neptune\_logger\_kwargs:~typing.Dict[~typing.Any_,_~typing.Any]|None=None_,_create\_checkpoint\_callback:bool|None=True_,_checkpoint\_callback\_params:~nemo.utils.exp\_manager.CallbackParams|None=_,_create\_early\_stopping\_callback:bool|None=False_,_early\_stopping\_callback\_params:~nemo.utils.exp\_manager.EarlyStoppingParams|None=_,_create\_preemption\_callback:bool|None=True_,_files\_to\_copy:~typing.List[str]|None=None_,_log\_step\_timing:bool|None=True_,_log\_delta\_step\_timing:bool|None=False_,_step\_timing\_kwargs:~nemo.utils.exp\_manager.StepTimingParams|None=_,_log\_local\_rank\_0\_only:bool|None=False_,_log\_global\_rank\_0\_only:bool|None=False_,_disable\_validation\_on\_resume:bool|None=True_,_ema:~nemo.utils.exp\_manager.EMAParams|None=_,_max\_time\_per\_run:str|None=None_,_seconds\_to\_sleep:float=5_,_create\_straggler\_detection\_callback:bool|None=False_,_straggler\_detection\_params:~nemo.utils.exp\_manager.StragglerDetectionParams|None=_,_create\_fault\_tolerance\_callback:bool|None=False_,_fault\_tolerance:~nemo.utils.exp\_manager.FaultToleranceParams|None=_,_log\_tflops\_per\_sec\_per\_gpu:bool|None=True_,) Bases: `object` Experiment Manager config for validation of passed arguments.
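As an illustration of how this dataclass can be used directly, here is a minimal sketch that builds an `ExpManagerConfig` in Python and passes it to `exp_manager`. The directory and experiment name are placeholders, and the Trainer's built-in logger and checkpointing are disabled because `exp_manager` creates its own (the NeMo example configs do the same).

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf
from nemo.utils.exp_manager import ExpManagerConfig, exp_manager

# Typed experiment-manager configuration; only a few of the fields above are set.
em_cfg = ExpManagerConfig(
    exp_dir="/tmp/nemo_experiments",        # placeholder directory
    name="my_experiment",                   # placeholder experiment name
    create_tensorboard_logger=True,
    create_checkpoint_callback=True,
)

# logger/enable_checkpointing are disabled on the Trainer because exp_manager provides them.
trainer = pl.Trainer(
    max_epochs=1, accelerator="auto", devices=1, logger=False, enable_checkpointing=False
)

exp_dir = exp_manager(trainer, OmegaConf.structured(em_cfg))
print(exp_dir)
```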
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/optimizations/activation_recomputation.html.md

Title: Activation Recomputation — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/optimizations/activation_recomputation.html

Published Time: Thu, 30 Oct 2025 07:07:32 GMT

Markdown Content:

Activation Recomputation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/optimizations/activation_recomputation.html.md#activation-recomputation "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The input activations of network layers are stored in device memory and are used to compute gradients during back-propagation. When training an LLM with a long sequence length or a large micro-batch size, these input activations can quickly saturate device memory. Checkpointing a few activations and recomputing the rest is a common technique to reduce device memory usage.

Transformer Layer Recomputation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/optimizations/activation_recomputation.html.md#transformer-layer-recomputation "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

NeMo supports transformer layer recomputation, which checkpoints the input of each transformer layer and recomputes the remaining activations. This technique significantly reduces activation memory usage. However, it increases the per-transformer-layer computation cost by 30% due to re-executing the entire layer’s forward computation. NeMo also supports partial transformer layer recomputation, which is beneficial when recomputing a few transformer layers frees enough GPU memory for the model to fit. This approach avoids the need to recompute the rest of the layers.

The recomputation config can be enabled via the transformer config [TransformerConfig](https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/transformer/transformer_config.py#L25). Transformer layer recomputation is enabled by setting `recompute_granularity=full`. The number of transformer layers to recompute can be set using `recompute_num_layers` along with `recompute_method=block`. If you set `recompute_num_layers` to the total number of layers, the inputs of all transformer layers are checkpointed and recomputed. When training with pipeline parallelism, `recompute_num_layers` indicates the layers per pipeline stage.
When using virtual pipelining, `recompute_num_layers` specifies the number of layers per virtual pipeline stage.

NeMo also supports checkpointing the input to a block of multiple consecutive transformer layers, meaning that a block of transformer layers becomes the recomputation granularity. This approach can save activation memory but increases the recomputation buffer memory. Thus, it is only beneficial for memory savings when the model has many transformer layers or when the intermediate layers of a transformer layer hold relatively small activation stores. This recomputation mode can be enabled by setting `recompute_method=uniform`, with the number of transformer layers per recomputation block set using `recompute_num_layers`.

> from nemo.collections import llm
> from functools import partial
>
> # Load train recipe
> recipe = partial(llm.llama3_8b.pretrain_recipe)()
>
> recipe.model.config.recompute_method = "block" # Enable 'block'-wise recomputation
> recipe.model.config.recompute_num_layers = 4

Self-attention Recomputation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/optimizations/activation_recomputation.html.md#self-attention-recomputation "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

NeMo supports self-attention recomputation, which checkpoints the inputs of each self-attention block and recomputes the intermediate input activations. This cost-efficient method achieves high memory savings with minimal recomputation cost. The intermediate layers of the self-attention block account for the majority of the activation memory. This is because the input sizes of the softmax, dropout, and qkv dot-product attention layers scale with the square of the sequence length. However, their recomputation cost is relatively small compared to that of the linear projection layers, whose cost scales with the square of the hidden size.

Self-attention recomputation is hard-enabled when using FlashAttention, which is supported in Transformer Engine. Also, you can use self-attention recomputation without FlashAttention by setting `recompute_granularity=selective`.
> from nemo.collections import llm > from functools import partial > > # Load train recipe > recipe = partial(llm.llama3_8b.pretrain_recipe)() > > recipe.model.config.recompute_granularity = "selective" # Enable selective recomputation

Scheme of full and selective checkpointing granularity: ![Image 1: activation-recomputation-example-2](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-activation-recomputation-exampe-2.jpg)

Scheme of uniform and block checkpointing method (full checkpointing granularity): ![Image 2: activation-recomputation-example-1](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-activation-recomputation-exampe-1.jpg)

--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html.md Title: Ramp Up Batch Size — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html Published Time: Thu, 30 Oct 2025 07:07:32 GMT Markdown Content:

Ramp Up Batch Size[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html.md#ramp-up-batch-size "Link to this heading")
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Ramp up batch size is a feature that allows training to start with a smaller global batch size and linearly increase to a target global batch size over a given number of training samples with specified incremental steps.

Usage[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html.md#usage "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------------------------

To enable global batch size ramp-up during training, set the `rampup_batch_size` parameter under the model section of the training configuration. This parameter should be a list of three values:

* `start_batch_size`: The initial batch size.
* `batch_size_increment`: The amount by which the batch size will increase at each step.
* `rampup_samples`: The number of training samples over which the batch size will be ramped up.

`model.global_batch_size=1024 model.rampup_batch_size=[256, 128, 50000000]`

In this example, the training will start with a batch size of 256, increment by 128, and reach the target global batch size of 1024 over 50,000,000 training samples.

Ramp Up Stages and Training Interruption[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html.md#ramp-up-stages-and-training-interruption "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Once the next ramp-up stage is reached (the point in training when the global batch size increases), NeMo stops the training. This allows you to rerun the training job with a larger number of GPUs or nodes for the next ramp-up stage.
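The stage boundaries implied by a `rampup_batch_size` setting can be worked out ahead of time. Below is a small, illustrative Python sketch (not a NeMo API) applied to the example configuration above, under the assumption that the ramp-up samples are split evenly across the batch-size increments:

```python
# Example values from the configuration above
start_batch_size, batch_size_increment, rampup_samples = 256, 128, 50_000_000
target_global_batch_size = 1024

num_increments = (target_global_batch_size - start_batch_size) // batch_size_increment
samples_per_stage = rampup_samples // num_increments  # assumption: even split across increments

for stage in range(num_increments + 1):
    batch_size = start_batch_size + stage * batch_size_increment
    start_sample = min(stage * samples_per_stage, rampup_samples)
    print(f"global_batch_size={batch_size} from ~{start_sample:,} consumed samples")
```

Each printed boundary corresponds to one ramp-up stage at which NeMo stops so that the job can be resubmitted with more resources.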
Automatic Node Scheduling[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html.md#automatic-node-scheduling "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In the [NeMo-Framework-Launcher](https://github.com/NVIDIA/NeMo-Framework-Launcher), when using ramp-up batch size, a node scheduler is created automatically. This scheduler allows the use of a smaller number of nodes for the smaller batch size stages and scales up according to the `training.trainer.num_nodes` parameter, which corresponds to the maximum number of nodes you want to use for the maximum global batch size.

Example[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/rampup_batch_size.html.md#example "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------

The following is a detailed example of using the ramp-up batch size feature with the GPT-3 5B model and the [NeMo-Framework-Launcher](https://github.com/NVIDIA/NeMo-Framework-Launcher). In this example, the training started with a global batch size of 256, increased by 256 at each ramp-up stage, and reached the target global batch size of 2048 over 10,000,000 training samples. The node schedule looks as follows:

| global_batch_size | num_nodes |
| --- | --- |
| 256 | 8 |
| 512 | 8 |
| 768 | 8 |
| 1024 | 8 |
| 1280 | 10 |
| 1536 | 12 |
| 1792 | 14 |
| 2048 | 16 |

Plot of `global_batch_size` increase during training: [![Image 1](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-rampup-batch-size-example.png)](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/asset-post-rampup-batch-size-example.png)

--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md Title: Training and Scaling — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html Published Time: Thu, 30 Oct 2025 07:07:33 GMT Markdown Content:

Training and Scaling[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#training-and-scaling "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This page provides detailed information on training speechlm2 models, including setup requirements, running experiments at scale, debugging, and parallelism strategies.
Running Experiments[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#running-experiments "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The speechlm2 collection includes several scripts to facilitate running experiments, especially on SLURM-based clusters.

### SLURM Job Submission[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#slurm-job-submission "Link to this heading")

For training on SLURM clusters, use the following workflow: # Submit 8 consecutive jobs with random seeds scripts/speechlm2/auto_launcher_with_seed.sh -n8 s2s_tinyllama_repro.sub

The `auto_launcher_with_seed.sh` script: 1. Generates a random seed for each submitted job 2. Leverages `shard_seed="randomized"` in Lhotse to ensure each data parallel rank is seeded differently 3. Ensures each tensor parallel rank is seeded identically

### SLURM Submission Script[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#slurm-submission-script "Link to this heading")

Example `s2s_tinyllama_repro.sub` script: #!/bin/bash #SBATCH --job-name=s2s_training #SBATCH --nodes=4 #SBATCH --ntasks-per-node=8 #SBATCH --gres=gpu:8 #SBATCH --time=24:00:00 #SBATCH --exclusive #SBATCH --output=s2s_tinyllama_repro_%j.out # Check that the global random seed base is provided if [ -z "$1" ]; then echo "Usage: $0 <seed>" exit 1 fi SEED=${1} EXP_NAME="s2s_training" RESULTS_DIR="results/${EXP_NAME}" srun --ntasks=${SLURM_NTASKS} --ntasks-per-node=${SLURM_NTASKS_PER_NODE} \ python -u examples/speechlm2/s2s_duplex_train.py \ --config-path=/path/to/config/dir \ --config-name=s2s_training.yaml \ exp_manager.name=${EXP_NAME} \ exp_manager.wandb_logger_kwargs.name=${EXP_NAME} \ trainer.num_nodes=$SLURM_JOB_NUM_NODES \ exp_manager.explicit_log_dir=${RESULTS_DIR} \ data.train_ds.seed=$SEED \ data.validation_ds.seed=$SEED

### Configuration Files[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#configuration-files "Link to this heading")

The main configuration file (`s2s_training.yaml`) contains all model, training, and data parameters. See [Configuration Files](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/configs.html.md) for more details. It’s recommended to copy and modify this file rather than overriding options in the SLURM script to maintain versioning and configuration clarity.
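One way to do that is to load the YAML with OmegaConf (the configuration library underlying Hydra), apply your changes, and save a new copy; the path and key names below are placeholders that mirror the overrides in the SLURM script above and will depend on your local setup:

```python
from omegaconf import OmegaConf

# Hypothetical location of your copy of the main config; adjust to your checkout
cfg = OmegaConf.load("path/to/your/copy/of/s2s_training.yaml")

# Edit the fields you care about (these keys mirror the SLURM overrides above)
cfg.exp_manager.name = "s2s_training_bs_sweep"
cfg.trainer.num_nodes = 4

# Save the modified copy and point --config-path/--config-name at it when launching
OmegaConf.save(cfg, "s2s_training_bs_sweep.yaml")
```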
Debugging[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#debugging "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------

### Running Locally with torchrun[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#running-locally-with-torchrun "Link to this heading")

For local debugging and profiling, use `torchrun`: # Run with 4 GPUs locally torchrun --nproc_per_node=4 examples/speechlm2/s2s_duplex_train.py \ --config-path=/path/to/config/dir \ --config-name=s2s_training.yaml

Scaling Strategies[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#scaling-strategies "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The speechlm2 collection includes support for model parallelism to scale training to large models across multiple GPUs.

### Model Parallel Strategies[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#model-parallel-strategies "Link to this heading")

The collection supports multiple parallelism strategies: 1. **Fully Sharded Data Parallel (FSDP2)**: Distributes model parameters across GPUs 2. **Tensor Parallelism (TP)**: Splits individual tensors across GPUs 3. **Sequence Parallelism (SP)**: Splits sequence processing across GPUs 4. **2D Parallelism**: Combination of FSDP2 with TP/SP

### Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#configuration "Link to this heading")

To configure parallelism, modify the `trainer.strategy` section in your YAML config: trainer: strategy: _target_: nemo.core.ModelParallelStrategy find_unused_parameters: False data_parallel: 1 # World size for data parallelism (FSDP2) tensor_parallel: 8 # World size for tensor parallelism devices: 8 num_nodes: 1 accelerator: gpu precision: bf16-true

The model’s `configure_model` method automatically sets up the appropriate parallelization based on this configuration.

### FSDP2 Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#fsdp2-configuration "Link to this heading")

For Fully Sharded Data Parallel training: 1. Set `data_parallel` to the number of GPUs you want to use for data parallelism 2. Set `tensor_parallel` to 1 (disabled)

FSDP2 shards the model parameters across GPUs, all-gathers them for forward/backward passes, and then de-allocates them after computation. This allows training of larger models with limited GPU memory. See the PyTorch FSDP2 documentation for more details.

### Tensor Parallelism Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#tensor-parallelism-configuration "Link to this heading")

For Tensor Parallelism: 1. Set `tensor_parallel` to the number of GPUs you want to use for tensor parallelism 2. Set `data_parallel` to 1 (or higher for 2D parallelism)

The `parallelize_module` function applies a parallelization plan to specific model components, like splitting attention heads or embedding dimensions across GPUs. See the PyTorch Tensor Parallelism documentation for more details.
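The following is a minimal, self-contained sketch (not the speechlm2 implementation) of how the PyTorch primitives referenced on this page, `fully_shard` and `parallelize_module`, compose into 2D parallelism over a device mesh. The toy module, mesh sizes, and sharding plan are assumptions, and the script is meant to be launched with `torchrun`:

```python
# Launch with: torchrun --nproc_per_node=8 this_script.py   (world size must equal dp * tp)
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # on older PyTorch: torch.distributed._composable.fsdp
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel, parallelize_module

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

data_parallel, tensor_parallel = 2, 4  # dp * tp == world size

# 2D mesh: the "dp" axis is used for FSDP2 sharding, the "tp" axis for tensor parallelism
mesh = init_device_mesh("cuda", (data_parallel, tensor_parallel), mesh_dim_names=("dp", "tp"))

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()

# Tensor parallelism: split the first Linear column-wise and the second row-wise across "tp"
parallelize_module(model, mesh["tp"], {"0": ColwiseParallel(), "2": RowwiseParallel()})

# FSDP2: shard the (already TP-parallelized) parameters across the "dp" axis
fully_shard(model, mesh=mesh["dp"])
```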
Implementation Details[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#implementation-details "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The core implementation of model parallelism is in the `configure_model` method of the model classes. Key aspects include: 1. **Module Sharding**: Calling `fully_shard` on modules to distribute parameters across data parallel ranks 2. **Parallelization Plans**: Creating and applying plans that specify how different layers should be parallelized 3. **Model-Specific Adaptations**: Handling architectural differences between different LLMs

Advanced Usage[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#advanced-usage "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

### Script Customization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/speechlm2/training_and_scaling.html.md#script-customization "Link to this heading")

When customizing the training scripts, keep these points in mind: 1. **Path Overrides**: Override paths in the YAML configuration files with your own, as needed 2. **W&B Keys**: Update Weights & Biases API keys in configuration files 3. **Batch Size Tuning**: Adjust batch size based on your GPU memory and model size

--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md Title: Introduction — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html Published Time: Fri, 05 Sep 2025 19:00:57 GMT Markdown Content:

Introduction[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#introduction "Link to this heading")
----------------------------------------------------------------------------------------------------------------------------------------------

Training generative AI architectures typically requires significant data and computing resources. NeMo utilizes [PyTorch Lightning](https://www.pytorchlightning.ai/) for efficient and performant multi-GPU/multi-node mixed-precision training.
NeMo is built on top of NVIDIA’s powerful Megatron Core library and Transformer Engine for its Large Language Models (LLMs) and Multimodal Models (MMs), leveraging cutting-edge advancements in model training and optimization. For Speech AI applications, Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), NeMo is developed with native PyTorch and PyTorch Lightning, ensuring seamless integration and ease of use. Future updates are planned to align Speech AI models with the Megatron framework, enhancing training efficiency and model performance. [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo) features separate collections for Large Language Models (LLMs), Multimodal Models (MMs), Computer Vision (CV), Automatic Speech Recognition (ASR), and Text-to-Speech (TTS) models. Each collection comprises prebuilt modules that include everything needed to train on your data. These modules can be easily customized, extended, and composed to create new generative AI model architectures. Pre-trained NeMo models are available to download on [NGC](https://catalog.ngc.nvidia.com/models?query=nemo&orderBy=weightPopularDESC) and [HuggingFace Hub](https://huggingface.co/nvidia). Prerequisites[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#prerequisites "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------ Before using NeMo, make sure you meet the following prerequisites: 1. Python version 3.10 or above. 2. Pytorch version 1.13.1 or 2.0+. 3. Access to an NVIDIA GPU for model training. Installation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#installation "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------- Refer to the NeMo Framework [User Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/installation.html.md) for the latest installation instructions. Quick Start Guide[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#quick-start-guide "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------- To explore NeMo’s capabilities in LLM, ASR, and TTS, follow the example below based on the [Audio Translation](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/AudioTranslationSample.ipynb) tutorial. Ensure NeMo is installed before proceeding. # Import NeMo's ASR, NLP and TTS collections import nemo.collections.asr as nemo_asr import nemo.collections.nlp as nemo_nlp import nemo.collections.tts as nemo_tts # Download an audio file that we will transcribe, translate, and convert the written translation to speech import wget wget.download("https://nemo-public.s3.us-east-2.amazonaws.com/zh-samples/common_voice_zh-CN_21347786.mp3") # Instantiate a Mandarin speech recognition model and transcribe an audio file. 
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_zh_citrinet_1024_gamma_0_25") mandarin_text = asr_model.transcribe(['common_voice_zh-CN_21347786.mp3']) print(mandarin_text) # Instantiate Neural Machine Translation model and translate the text nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_zh_en_transformer24x6") english_text = nmt_model.translate(mandarin_text) print(english_text) # Instantiate a spectrogram generator (which converts text -> spectrogram) # and vocoder model (which converts spectrogram -> audio waveform) spectrogram_generator = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch") vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan") # Parse the text input, generate the spectrogram, and convert it to audio parsed_text = spectrogram_generator.parse(english_text[0]) spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed_text) audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram) # Save the audio to a file import soundfile as sf sf.write("output_audio.wav", audio.to('cpu').detach().numpy()[0], 22050) For detailed tutorials and documentation on specific tasks or to learn more about NeMo, check out the NeMo [tutorials](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/tutorials.html.md) or dive deeper into the documentation, such as learning about ASR in [here](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html.md). Discussion Board[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#discussion-board "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------ For additional information and questions, visit the [NVIDIA NeMo Discussion Board](https://github.com/NVIDIA/NeMo/discussions). Contribute to NeMo[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#contribute-to-nemo "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------------- Community contributions are welcome! See the [CONTRIBUTING.md](https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md) file for how to contribute. License[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/starthere/intro.html.md#license "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------ NeMo is released under the [Apache 2.0 license](https://github.com/NVIDIA/NeMo/blob/stable/LICENSE). 
--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md Title: Checkpoints — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html Published Time: Fri, 18 Jul 2025 19:26:53 GMT Markdown Content:

Checkpoints[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#checkpoints "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------

There are two main ways to load pretrained checkpoints in NeMo, as described in [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/results.html.md):

* Using the `restore_from()` method to load a local checkpoint file (`.nemo`), or
* Using the `from_pretrained()` method to download and set up a checkpoint from NGC.

Note that these instructions are for loading fully trained checkpoints for evaluation or fine-tuning. To resume an unfinished training experiment, use the Experiment Manager by setting the `resume_if_exists` flag to `True`.

Local Checkpoints[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#local-checkpoints "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------

* **Save Model Checkpoints**: NeMo automatically saves final model checkpoints with the `.nemo` suffix. You can also manually save any model checkpoint using `model.save_to("<checkpoint_path>.nemo")`.
* **Load Model Checkpoints**: if you’d like to load a checkpoint saved at `<path/to/checkpoint>`, use the `restore_from()` method below, where `<MODEL_BASE_CLASS>` is the TTS model class of the original checkpoint.

import nemo.collections.tts as nemo_tts model = nemo_tts.models.<MODEL_BASE_CLASS>.restore_from(restore_path="<path/to/checkpoint>")
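As a concrete illustration of the save and load steps above, here is a short round trip using FastPitch (one of the models listed later on this page); the local file name is arbitrary:

```python
import nemo.collections.tts as nemo_tts

# Download a pretrained FastPitch checkpoint from NGC
model = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch")

# Save it locally as a .nemo file, then restore it from that file
model.save_to("my_fastpitch.nemo")
restored = nemo_tts.models.FastPitchModel.restore_from(restore_path="my_fastpitch.nemo")
```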
NGC Pretrained Checkpoints[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#ngc-pretrained-checkpoints "Link to this heading")
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The NGC [NeMo Text to Speech collection](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/nemo_tts.md) aggregates model cards that contain detailed information about checkpoints of various models trained on various datasets. The tables below in [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#ngc-tts-models) list some of the available TTS models from NGC, including speech/text aligners, acoustic models, and vocoders.

### Load Model Checkpoints[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#load-model-checkpoints "Link to this heading")

The models can be accessed via the `from_pretrained()` method inside the TTS Model class. In general, you can load any of these models with code in the following format,

import nemo.collections.tts as nemo_tts model = nemo_tts.models.<MODEL_BASE_CLASS>.from_pretrained(model_name="<MODEL_NAME>")

where `<MODEL_NAME>` is the value in the `Model Name` column in the tables in [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#ngc-tts-models). These names are predefined in each model’s member function `self.list_available_models()`. For example, the available NGC FastPitch model names can be found as follows,

In [1]: import nemo.collections.tts as nemo_tts In [2]: nemo_tts.models.FastPitchModel.list_available_models() Out[2]: [PretrainedModelInfo( pretrained_model_name=tts_en_fastpitch, description=This model is trained on LJSpeech sampled at 22050Hz with and can be used to generate female English voices with an American accent. It is ARPABET-based., location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_fastpitch/versions/1.8.1/files/tts_en_fastpitch_align.nemo, class_= ), PretrainedModelInfo( pretrained_model_name=tts_en_fastpitch_ipa, description=This model is trained on LJSpeech sampled at 22050Hz with and can be used to generate female English voices with an American accent.
It is IPA-based., location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_fastpitch/versions/IPA_1.13.0/files/tts_en_fastpitch_align_ipa.nemo, class_= ), PretrainedModelInfo( pretrained_model_name=tts_en_fastpitch_multispeaker, description=This model is trained on HiFITTS sampled at 44100Hz with and can be used to generate male and female English voices with an American accent., location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_multispeaker_fastpitchhifigan/versions/1.10.0/files/tts_en_fastpitch_multispeaker.nemo, class_= ), PretrainedModelInfo( pretrained_model_name=tts_de_fastpitch_singlespeaker, description=This model is trained on a single male speaker data in OpenSLR Neutral German Dataset sampled at 22050Hz and can be used to generate male German voices., location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.10.0/files/tts_de_fastpitch_align.nemo, class_= ), PretrainedModelInfo( pretrained_model_name=tts_de_fastpitch_multispeaker_5, description=This model is trained on 5 speakers in HUI-Audio-Corpus-German clean subset sampled at 44100Hz with and can be used to generate male and female German voices., location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_fastpitch_multispeaker_5.nemo, class_= )]

From the above key-value pair `pretrained_model_name=tts_en_fastpitch`, you can get the model name `tts_en_fastpitch` and load it by running,

model = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch")

If you would like to programmatically list the models available for a particular base class, you can use the `list_available_models()` method,

nemo_tts.models.<MODEL_BASE_CLASS>.list_available_models()

### Inference and Audio Generation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#inference-and-audio-generation "Link to this heading")

NeMo TTS supports both cascaded and end-to-end models for synthesizing audio. Most of the steps are the same, except that cascaded models need to load an extra vocoder model before generating audio. The code snippet below demonstrates the steps for generating an audio sample from text input using cascaded FastPitch and HiFiGAN models. Please refer to the NeMo TTS Collection API for the detailed implementation of the model classes.

import nemo.collections.tts as nemo_tts # Load mel spectrogram generator spec_generator = nemo_tts.models.FastPitchModel.from_pretrained("tts_en_fastpitch") # Load vocoder vocoder = nemo_tts.models.HifiGanModel.from_pretrained(model_name="tts_en_hifigan") # Generate audio import soundfile as sf parsed = spec_generator.parse("You can type your sentence here to get nemo to produce speech.") spectrogram = spec_generator.generate_spectrogram(tokens=parsed) audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram) # Save the audio to disk in a file called speech.wav sf.write("speech.wav", audio.to('cpu').detach().numpy()[0], 22050)

### Fine-Tuning on Different Datasets[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#fine-tuning-on-different-datasets "Link to this heading")

There are multiple TTS tutorials provided in the [tutorials/tts/](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/tts.md) directory. Most of these tutorials demonstrate how to instantiate a pre-trained model and prepare it for fine-tuning on datasets in the same or a different language, and with the same or different speakers.
* **cross-lingual fine-tuning**: [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/tts/FastPitch_GermanTTS_Training.ipynb.md) * **cross-speaker fine-tuning**: [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/tts/FastPitch_Finetuning.ipynb.md) NGC TTS Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#ngc-tts-models "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------- This section summarizes a full list of available NeMo TTS models that have been released in [NGC NeMo Text to Speech Collection](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/nemo_tts/entities.md). You can download model checkpoints of your interest via either way below, * `wget ''` * `curl -LO ''` ### Speech/Text Aligners[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#speech-text-aligners "Link to this heading") | Locale | Model Name | Dataset | Sampling Rate | #Spk | Phoneme Unit | Model Class | Overview | Checkpoint | | --- | --- | --- | --- | --- | --- | --- | --- | --- | | en-US | tts_en_radtts_aligner | LJSpeech | 22050Hz | 1 | ARPABET | nemo.collections.tts.models.aligner.AlignerModel | [tts_en_radtts_aligner](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_radtts_aligner.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_radtts_aligner/versions/ARPABET_1.11.0/files/Aligner.nemo` | | en-US | tts_en_radtts_aligner_ipa | LJSpeech | 22050Hz | 1 | IPA | nemo.collections.tts.models.aligner.AlignerModel | [tts_en_radtts_aligner](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_radtts_aligner.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_radtts_aligner/versions/IPA_1.13.0/files/Aligner.nemo` | ### Mel-Spectrogram Generators[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#mel-spectrogram-generators "Link to this heading") | Locale | Model Name | Dataset | Sampling Rate | #Spk | Symbols | Model Class | Overview | Checkpoint | | --- | --- | --- | --- | --- | --- | --- | --- | --- | | en-US | tts_en_fastpitch | LJSpeech | 22050Hz | 1 | ARPABET | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_en_fastpitch](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastpitch.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_fastpitch/versions/1.8.1/files/tts_en_fastpitch_align.nemo` | | en-US | tts_en_fastpitch_ipa | LJSpeech | 22050Hz | 1 | IPA | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_en_fastpitch](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastpitch.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_fastpitch/versions/IPA_1.13.0/files/tts_en_fastpitch_align_ipa.nemo` | | en-US | tts_en_fastpitch_multispeaker | HiFiTTS | 44100Hz | 10 | ARPABET | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_en_multispeaker_fastpitchhifigan](https://ngc.nvidia.com/models/nvidia:nemo:tts_en_multispeaker_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_multispeaker_fastpitchhifigan/versions/1.10.0/files/tts_en_fastpitch_multispeaker.nemo` | | en-US | tts_en_lj_mixertts | LJSpeech | 22050Hz | 1 | ARPABET | nemo.collections.tts.models.mixer_tts.MixerTTSModel | [tts_en_lj_mixertts](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_mixertts.md) | 
`https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_mixertts/versions/1.6.0/files/tts_en_lj_mixertts.nemo` | | en-US | tts_en_lj_mixerttsx | LJSpeech | 22050Hz | 1 | ARPABET | nemo.collections.tts.models.mixer_tts.MixerTTSModel | [tts_en_lj_mixerttsx](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_mixerttsx.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_mixerttsx/versions/1.6.0/files/tts_en_lj_mixerttsx.nemo` | | en-US | RAD-TTS | TBD | TBD | TBD | ARPABET | nemo.collections.tts.models.radtts.RadTTSModel | TBD | | | en-US | tts_en_tacotron2 | LJSpeech | 22050Hz | 1 | ARPABET | nemo.collections.tts.models.tacotron2.Tacotron2Model | [tts_en_tacotron2](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_tacotron2.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_tacotron2/versions/1.10.0/files/tts_en_tacotron2.nemo` | | de-DE | tts_de_fastpitch_multispeaker_5 | HUI Audio Corpus German | 44100Hz | 5 | ARPABET | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_de_fastpitch_multispeaker_5](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitch_multispeaker_5.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_fastpitch_multispeaker_5.nemo` | | de-DE | tts_de_fastpitch_singleSpeaker_thorstenNeutral_2102 | Thorsten Müller Neutral 21.02 dataset | 22050Hz | 1 | Graphemes | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_de_fastpitchhifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.15.0/files/tts_de_fastpitch_thorstens2102.nemo` | | de-DE | tts_de_fastpitch_singleSpeaker_thorstenNeutral_2210 | Thorsten Müller Neutral 22.10 dataset | 22050Hz | 1 | Graphemes | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_de_fastpitchhifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.15.0/files/tts_de_fastpitch_thorstens2210.nemo` | | es | tts_es_fastpitch_multispeaker | OpenSLR crowdsourced Latin American Spanish | 44100Hz | 174 | IPA | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_es_multispeaker_fastpitchhifigan](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_es_multispeaker_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_fastpitch_multispeaker.nemo` | | zh-CN | tts_zh_fastpitch_sfspeech | SFSpeech Chinese/English Bilingual Speech | 22050Hz | 1 | pinyin | nemo.collections.tts.models.fastpitch.FastPitchModel | [tts_zh_fastpitch_hifigan_sfspeech](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_zh_fastpitch_hifigan_sfspeech.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.15.0/files/tts_zh_fastpitch_sfspeech.nemo` | ### Vocoders[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#vocoders "Link to this heading") | Locale | Model Name | Spectrogram Generator | Dataset | Sampling Rate | #Spk | Model Class | Overview | Checkpoint | | --- | --- | --- | --- | --- | --- | --- | --- | --- | | en-US | tts_en_hifigan | librosa.filters.mel | LJSpeech | 22050Hz | 1 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_en_hifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_hifigan.md) 
| `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_hifigan/versions/1.0.0rc1/files/tts_hifigan.nemo` | | en-US | tts_en_lj_hifigan_ft_mixertts | Mixer-TTS | LJSpeech | 22050Hz | 1 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_en_lj_hifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_hifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_hifigan/versions/1.6.0/files/tts_en_lj_hifigan_ft_mixertts.nemo` | | en-US | tts_en_lj_hifigan_ft_mixerttsx | Mixer-TTS-X | LJSpeech | 22050Hz | 1 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_en_lj_hifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_hifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_hifigan/versions/1.6.0/files/tts_en_lj_hifigan_ft_mixerttsx.nemo` | | en-US | tts_en_hifitts_hifigan_ft_fastpitch | FastPitch | HiFiTTS | 44100Hz | 10 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_en_multispeaker_fastpitchhifigan](https://ngc.nvidia.com/models/nvidia:nemo:tts_en_multispeaker_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_multispeaker_fastpitchhifigan/versions/1.10.0/files/tts_en_hifitts_hifigan_ft_fastpitch.nemo` | | en-US | tts_en_lj_univnet | librosa.filters.mel | LJSpeech | 22050Hz | 1 | nemo.collections.tts.models.univnet.UnivNetModel | [tts_en_lj_univnet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_univnet.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_univnet/versions/1.7.0/files/tts_en_lj_univnet.nemo` | | en-US | tts_en_libritts_univnet | librosa.filters.mel | LibriTTS | 24000Hz | 1 | nemo.collections.tts.models.univnet.UnivNetModel | [tts_en_libritts_univnet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_libritts_univnet.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_libritts_univnet/versions/1.7.0/files/tts_en_libritts_multispeaker_univnet.nemo` | | en-US | tts_en_waveglow_88m | librosa.filters.mel | LJSpeech | 22050Hz | 1 | nemo.collections.tts.models.waveglow.WaveGlowModel | [tts_en_waveglow_88m](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_waveglow_88m.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_waveglow_88m/versions/1.0.0/files/tts_waveglow.nemo` | | de-DE | tts_de_hui_hifigan_ft_fastpitch_multispeaker_5 | FastPitch | HUI Audio Corpus German | 44100Hz | 5 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_de_fastpitch_multispeaker_5](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitch_multispeaker_5.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitch_multispeaker_5/versions/1.11.0/files/tts_de_hui_hifigan_ft_fastpitch_multispeaker_5.nemo` | | de-DE | tts_de_hifigan_singleSpeaker_thorstenNeutral_2102 | FastPitch | Thorsten Müller Neutral 21.02 dataset | 22050Hz | 1 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_de_fastpitchhifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.15.0/files/tts_de_hifigan_thorstens2102.nemo` | | de-DE | tts_de_hifigan_singleSpeaker_thorstenNeutral_2210 | FastPitch | Thorsten Müller Neutral 22.10 dataset | 22050Hz | 1 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_de_fastpitchhifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitchhifigan.md) | 
`https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_de_fastpitchhifigan/versions/1.15.0/files/tts_de_hifigan_thorstens2210.nemo` | | es | tts_es_hifigan_ft_fastpitch_multispeaker | FastPitch | OpenSLR crowdsourced Latin American Spanish | 44100Hz | 174 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_es_multispeaker_fastpitchhifigan](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_es_multispeaker_fastpitchhifigan.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_es_multispeaker_fastpitchhifigan/versions/1.15.0/files/tts_es_hifigan_ft_fastpitch_multispeaker.nemo` | | zh-CN | tts_zh_hifigan_sfspeech | FastPitch | SFSpeech Chinese/English Bilingual Speech | 22050Hz | 1 | nemo.collections.tts.models.hifigan.HifiGanModel | [tts_zh_fastpitch_hifigan_sfspeech](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_zh_fastpitch_hifigan_sfspeech.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_zh_fastpitch_hifigan_sfspeech/versions/1.15.0/files/tts_zh_hifigan_sfspeech.nemo` | ### End2End models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#end2end-models "Link to this heading") | Locale | Model Name | Dataset | Sampling Rate | #Spk | Phoneme Unit | Model Class | Overview | Checkpoint | | --- | --- | --- | --- | --- | --- | --- | --- | --- | | en-US | tts_en_lj_vits | LJSpeech | 22050Hz | 1 | IPA | nemo.collections.tts.models.vits.VitsModel | [tts_en_lj_vits](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_vits.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_lj_vits/versions/1.13.0/files/vits_ljspeech_fp16_full.nemo` | | en-US | tts_en_hifitts_vits | HiFiTTS | 44100Hz | 10 | IPA | nemo.collections.tts.models.vits.VitsModel | [tts_en_hifitts_vits](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_hifitts_vits.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_en_hifitts_vits/versions/r1.15.0/files/vits_en_hifitts.nemo` | ### Codec models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#codec-models "Link to this heading") | Model Name | Dataset | Sampling Rate | Model Class | Overview | Checkpoint | | --- | --- | --- | --- | --- | --- | | audio_codec_16khz_small | Libri-Light | 16000Hz | nemo.collections.tts.models.AudioCodecModel | [audio_codec_16khz_small](https://ngc.nvidia.com/catalog/models/nvidia:nemo:audio_codec_16khz_small.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/audio_codec_16khz_small/versions/v1/files/audio_codec_16khz_small.nemo` | | mel_codec_22khz_medium | LibriVox and Common Voice | 22050Hz | nemo.collections.tts.models.AudioCodecModel | [mel_codec_22khz_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_22khz_medium.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/mel_codec_22khz_medium/versions/v1/files/mel_codec_22khz_medium.nemo` | | mel_codec_44khz_medium | LibriVox and Common Voice | 44100Hz | nemo.collections.tts.models.AudioCodecModel | [mel_codec_44khz_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_44khz_medium.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/mel_codec_44khz_medium/versions/v1/files/mel_codec_44khz_medium.nemo` | | mel_codec_22khz_fullband_medium | LibriVox and Common Voice | 22050Hz | nemo.collections.tts.models.AudioCodecModel | [mel_codec_22khz_fullband_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_22khz_fullband_medium.md) | 
`https://api.ngc.nvidia.com/v2/models/nvidia/nemo/mel_codec_22khz_fullband_medium/versions/v1/files/mel_codec_22khz_fullband_medium.nemo` | | mel_codec_44khz_fullband_medium | LibriVox and Common Voice | 44100Hz | nemo.collections.tts.models.AudioCodecModel | [mel_codec_44khz_fullband_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_44khz_fullband_medium.md) | `https://api.ngc.nvidia.com/v2/models/nvidia/nemo/mel_codec_44khz_fullband_medium/versions/v1/files/mel_codec_44khz_fullband_medium.nemo` | | nvidia/low-frame-rate-speech-codec-22khz | LibriVox and Common Voice | 22050Hz | nemo.collections.tts.models.AudioCodecModel | [audio_codec_low_frame_rate_22khz](https://huggingface.co/nvidia/low-frame-rate-speech-codec-22khz.md) | `https://huggingface.co/nvidia/low-frame-rate-speech-codec-22khz/resolve/main/low-frame-rate-speech-codec-22khz.nemo` | | nvidia/audio-codec-22khz | LibriVox and Common Voice | 22050Hz | nemo.collections.tts.models.AudioCodecModel | [audio-codec-22khz](https://huggingface.co/nvidia/audio-codec-22khz.md) | `https://huggingface.co/nvidia/audio-codec-22khz/resolve/main/audio-codec-22khz.nemo` | | nvidia/audio-codec-44khz | LibriVox and Common Voice | 44100Hz | nemo.collections.tts.models.AudioCodecModel | [audio-codec-44khz](https://huggingface.co/nvidia/audio-codec-44khz.md) | `https://huggingface.co/nvidia/audio-codec-44khz/resolve/main/audio-codec-44khz.nemo` | | nvidia/mel-codec-22khz | LibriVox and Common Voice | 22050Hz | nemo.collections.tts.models.AudioCodecModel | [mel-codec-22khz](https://huggingface.co/nvidia/mel-codec-22khz.md) | `https://huggingface.co/nvidia/mel-codec-22khz/resolve/main/mel-codec-22khz.nemo` | | nvidia/mel-codec-44khz | LibriVox and Common Voice | 44100Hz | nemo.collections.tts.models.AudioCodecModel | [mel-codec-44khz](https://huggingface.co/nvidia/mel-codec-44khz.md) | `https://huggingface.co/nvidia/mel-codec-44khz/resolve/main/mel-codec-44khz.nemo` | Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#codec-models) - [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md#ngc-tts-models) - [NeMo Text to Speech collection](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/nemo_tts.md) - [tutorials/tts/](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/tts.md) - [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo/tree/stable/tutorials/tts/FastPitch_Finetuning.ipynb.md) - [NGC NeMo Text to Speech Collection](https://catalog.ngc.nvidia.com/orgs/nvidia/collections/nemo_tts/entities.md) - [tts_en_radtts_aligner](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_radtts_aligner.md) - [tts_en_fastpitch](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_fastpitch.md) - [tts_en_multispeaker_fastpitchhifigan](https://ngc.nvidia.com/models/nvidia:nemo:tts_en_multispeaker_fastpitchhifigan.md) - [tts_en_lj_mixertts](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_mixertts.md) - [tts_en_lj_mixerttsx](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_mixerttsx.md) - [tts_en_tacotron2](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_tacotron2.md) - [tts_de_fastpitch_multispeaker_5](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitch_multispeaker_5.md) - [tts_de_fastpitchhifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_de_fastpitchhifigan.md) - 
[tts_es_multispeaker_fastpitchhifigan](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_es_multispeaker_fastpitchhifigan.md) - [tts_zh_fastpitch_hifigan_sfspeech](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_zh_fastpitch_hifigan_sfspeech.md) - [tts_en_hifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_hifigan.md) - [tts_en_lj_hifigan](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_hifigan.md) - [tts_en_lj_univnet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_univnet.md) - [tts_en_libritts_univnet](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_libritts_univnet.md) - [tts_en_waveglow_88m](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_waveglow_88m.md) - [tts_en_lj_vits](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_lj_vits.md) - [tts_en_hifitts_vits](https://ngc.nvidia.com/catalog/models/nvidia:nemo:tts_en_hifitts_vits.md) - [audio_codec_16khz_small](https://ngc.nvidia.com/catalog/models/nvidia:nemo:audio_codec_16khz_small.md) - [mel_codec_22khz_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_22khz_medium.md) - [mel_codec_44khz_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_44khz_medium.md) - [mel_codec_22khz_fullband_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_22khz_fullband_medium.md) - [mel_codec_44khz_fullband_medium](https://ngc.nvidia.com/catalog/models/nvidia:nemo:mel_codec_44khz_fullband_medium.md) - [audio_codec_low_frame_rate_22khz](https://huggingface.co/nvidia/low-frame-rate-speech-codec-22khz.md) - [audio-codec-22khz](https://huggingface.co/nvidia/audio-codec-22khz.md) - [audio-codec-44khz](https://huggingface.co/nvidia/audio-codec-44khz.md) - [mel-codec-22khz](https://huggingface.co/nvidia/mel-codec-22khz.md) - [mel-codec-44khz](https://huggingface.co/nvidia/mel-codec-44khz.md) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md Title: Models — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html Published Time: Fri, 18 Jul 2025 19:26:52 GMT Markdown Content: Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#models "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------- This section provides a brief overview of TTS models that NeMo’s TTS collection currently supports. * **Model Recipes** can be accessed through [examples/tts/*.py](https://github.com/NVIDIA/NeMo/tree/stable/examples/tts.md). * **Configuration Files** can be found in the directory of [examples/tts/conf/](https://github.com/NVIDIA/NeMo/tree/stable/examples/tts/conf.md). For detailed information about TTS configuration files and how they should be structured, please refer to the section [NeMo TTS Configuration Files](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/configs.html.md). * **Pretrained Model Checkpoints** are available for any users for immediately synthesizing speech or fine-tuning models on your custom datasets. Please follow the section [Checkpoints](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md) for instructions on how to use those pretrained models. 
Mel-Spectrogram Generators[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#mel-spectrogram-generators "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

### FastPitch[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#fastpitch "Link to this heading")

FastPitch is a fully-parallel text-to-speech synthesis model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be more expressive, better match the semantics of the utterance, and, in the end, be more engaging to the listener. Uniformly increasing or decreasing the pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to the state of the art. It does not introduce overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with over 900x real-time factor for mel-spectrogram synthesis of a typical utterance.

The architecture of FastPitch is shown below. It is based on FastSpeech and consists of two feed-forward Transformer (FFTr) stacks. The first FFTr operates at the resolution of input tokens, and the other at the resolution of the output frames. Please refer to [[TTS-MODELS12](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id16 "Adrian Łańcucki. Fastpitch: parallel text-to-speech with pitch prediction. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6588–6592. IEEE, 2021.")] for details.

> [![Image 1: fastpitch model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/fastpitch_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/fastpitch_model.png)

### Mixer-TTS/Mixer-TTS-X[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#mixer-tts-mixer-tts-x "Link to this heading")

Mixer-TTS is a non-autoregressive model for mel-spectrogram generation. The model is based on the MLP-Mixer architecture adapted for speech synthesis. The basic Mixer-TTS contains pitch and duration predictors, with the latter trained using an unsupervised TTS alignment framework. Alongside the basic model, there is an extended version, Mixer-TTS-X, which additionally uses token embeddings from a pre-trained language model. The basic Mixer-TTS and its extended version have a small number of parameters and enable much faster speech synthesis than models of similar quality.

The architecture of the basic Mixer-TTS is shown below (left). The basic Mixer-TTS uses the same duration and pitch predictor architectures as FastPitch, but with two major changes: it replaces all feed-forward transformer-based blocks in the encoder and decoder with new Mixer-TTS blocks (right), and it uses an unsupervised speech-to-text alignment framework to train the duration predictor. Please refer to [[TTS-MODELS10](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id17 "Oktai Tatanov, Stanislav Beliaev, and Boris Ginsburg. Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings.
In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7482–7486. IEEE, 2022.")] for details.

> [![Image 2: mixertts model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/mixertts_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/mixertts_model.png)

### RAD-TTS[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#rad-tts "Link to this heading")

RAD-TTS introduces a predominantly parallel, end-to-end TTS model based on normalizing flows. It extends prior parallel approaches by additionally modeling speech rhythm as a separate generative distribution to facilitate variable token duration during inference. RAD-TTS further designs a robust framework for the online extraction of speech-text alignments, which is a critical yet highly unstable learning problem in end-to-end TTS frameworks. Overall, RAD-TTS yields improved alignment quality and better output diversity compared to controlled baselines.

The following diagram summarizes the inference pipeline for RAD-TTS. The duration normalizing flow first samples the phoneme durations, which are then used to prepare the input to the parallel Mel-Decoder flow. Please refer to [[TTS-MODELS9](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id18 "Kevin J Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, and Bryan Catanzaro. RAD-TTS: parallel flow-based TTS with robust alignment learning and diverse synthesis. In ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models. 2021.")] for details.

> [![Image 3: radtts model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/radtts_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/radtts_model.png)

### Tacotron2[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#tacotron2 "Link to this heading")

Tacotron 2 consists of a recurrent sequence-to-sequence feature prediction network with attention that maps character embeddings to mel-spectrogram frames, and a modified version of WaveNet as a vocoder that generates time-domain waveform samples conditioned on the predicted mel-spectrogram frames. This system uses mel-spectrograms as the conditioning input to WaveNet instead of linguistic, duration, and F0 features, which allows for a significant reduction in the size of the WaveNet architecture. The block diagram of the Tacotron 2 architecture is shown below. Please refer to [[TTS-MODELS8](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id15 "Jonathan Shen, Ruoming Pang, Ron J Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, and others. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4779–4783. IEEE, 2018.")] for details.
### SSL FastPitch[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#ssl-fastpitch "Link to this heading")

This **experimental** version of FastPitch takes in content and speaker embeddings generated by an SSL Disentangler and generates mel-spectrograms, with the goal that voice characteristics are taken from the speaker embedding while the content of speech is determined by the content embedding. Voice conversion can be performed with this model by swapping the speaker embedding input with that of a target speaker while keeping the content embedding the same. More details to come.

Vocoders[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#vocoders "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------

### HiFiGAN[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#hifigan "Link to this heading")

HiFi-GAN is a vocoder model that efficiently synthesizes raw audio waveforms from intermediate mel-spectrograms. It consists of one generator and two discriminators (multi-scale and multi-period). The generator and discriminators are trained adversarially, with two additional losses for improving training stability and model performance.

The generator is a fully convolutional neural network that takes a mel-spectrogram as input and upsamples it through transposed convolutions until the length of the output sequence matches the temporal resolution of raw waveforms. Every transposed convolution is followed by a multi-receptive field fusion (MRF) module. The architecture of the generator is shown below (left).

The multi-period discriminator (MPD) is a mixture of sub-discriminators, each of which only accepts equally spaced samples of an input audio. The sub-discriminators are designed to capture different implicit structures from each other by looking at different parts of an input audio. While the MPD only accepts disjoint samples, the multi-scale discriminator (MSD) is added to consecutively evaluate the audio sequence. The MSD is a mixture of three sub-discriminators operating on different input scales (raw audio, x2 average-pooled audio, and x4 average-pooled audio). HiFi-GAN achieves both higher computational efficiency and better sample quality than the best publicly available auto-regressive or flow-based models, such as WaveNet and WaveGlow. Please refer to [[TTS-MODELS5](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id19 "Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. HiFi-GAN: generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems, 33:17022–17033, 2020.")] for details.

> [![Image 5: hifigan_g model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/hifigan_g_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/hifigan_g_model.png)
> 1. Generator
>
> [![Image 6: hifigan_d model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/hifigan_d_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/hifigan_d_model.png)
> 2. Discriminators
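A pre-trained vocoder converts the mel-spectrogram produced by a generator such as FastPitch into an audio waveform. The following is a minimal HiFi-GAN sketch, continuing from the `spectrogram` variable in the FastPitch example above; the checkpoint name is again only an example.

```python
from nemo.collections.tts.models import HifiGanModel

# Load a pre-trained HiFi-GAN vocoder (example checkpoint name; see the TTS Checkpoints page).
vocoder = HifiGanModel.from_pretrained("tts_en_hifigan").eval()

# Convert the mel-spectrogram from the FastPitch sketch into an audio tensor.
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
```

The resulting tensor can then be written to a WAV file with any standard audio I/O library.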
### UnivNet[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#univnet "Link to this heading")

UnivNet is a neural vocoder that synthesizes high-fidelity waveforms in real time. It consists of a generator and two waveform discriminators (multi-period and multi-resolution). The generator is inspired by MelGAN and adds a location-variable convolution (LVC) to efficiently capture the local information of the log-mel-spectrogram. The kernels of the LVC layers are predicted by a kernel predictor that takes the log-mel-spectrograms as input. The architecture of the generator is shown below (left).

The multi-resolution spectrogram discriminator (MRSD) uses multiple linear spectrogram magnitudes with various temporal and spectral resolutions, so that high-resolution signals can be generated over the full band. The multi-period waveform discriminator (MPWD) is added to improve detailed adversarial modeling in the temporal domain. The architecture of the discriminators is shown below (right). Please refer to [[TTS-MODELS3](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id21 "Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. In Proc. Interspeech 2021, 2207–2211. 2021. doi:10.21437/Interspeech.2021-1016.")] for details.

> [![Image 7: univnet model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/univnet_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/univnet_model.png)

### WaveGlow[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#waveglow "Link to this heading")

WaveGlow combines insights from Glow and WaveNet to provide fast, efficient, and high-quality audio synthesis without the need for auto-regression. WaveGlow is implemented as a single network, trained with a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Despite the simplicity of the model, the PyTorch implementation can synthesize speech at more than 500 kHz on an NVIDIA V100 GPU, and its audio quality is as good as the best publicly available WaveNet implementation trained on the same data.

The network is most similar to the recent Glow work, as shown below. For the forward pass through the network, groups of 8 audio samples are taken as vectors, which is called the "squeeze" operation. These vectors are then processed through several "steps of flow", each of which consists of an invertible 1x1 convolution followed by an affine coupling layer. Please refer to [[TTS-MODELS7](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id20 "Ryan Prenger, Rafael Valle, and Bryan Catanzaro. Waveglow: a flow-based generative network for speech synthesis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3617–3621. IEEE, 2019.")] for details.

> [![Image 8: waveglow model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/waveglow_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/waveglow_model.png)
Speech-to-Text Aligners[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#speech-to-text-aligners "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------

### RAD-TTS Aligner[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#rad-tts-aligner "Link to this heading")

Speech-to-text alignment is a critical component of neural TTS models. Autoregressive TTS models typically use an attention mechanism to learn these alignments online. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources.

The RAD-TTS Aligner leverages the alignment mechanism proposed in RAD-TTS and demonstrates its applicability to a wide variety of neural TTS models. The alignment learning framework combines the forward-sum algorithm, the Viterbi algorithm, and an efficient static prior. The RAD-TTS Aligner improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed, simplifies the training pipeline by eliminating the need for external aligners, enhances robustness to errors on long utterances, and improves the perceived speech synthesis quality, as judged by human evaluators. The alignment framework is shown below. Please refer to [[TTS-MODELS1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id22 "Rohan Badlani, Adrian Łańcucki, Kevin J Shih, Rafael Valle, Wei Ping, and Bryan Catanzaro. One TTS alignment to rule them all. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6092–6096. IEEE, 2022.")] for details.

> [![Image 9: rad-aligner model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/radaligner_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/radaligner_model.png)

End2End Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#end2end-models "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------

### VITS[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#vits "Link to this heading")

VITS is an end-to-end speech synthesis model that generates raw audio waveforms from grapheme/phoneme input. It uses a variational autoencoder to combine a Glow-TTS-like spectrogram generator with a HiFi-GAN vocoder. It also has a separate flow-based duration predictor, which samples alignments from noise conditioned on the text. Please refer to [[TTS-MODELS4](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id28 "Jaehyeon Kim, Jungil Kong, and Juhee Son. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. In International Conference on Machine Learning, 5530–5540. PMLR, 2021.")] for details.
This model is still experimental, so stable behavior is not guaranteed.

> [![Image 10: vits model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/vits_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/vits_model.png)

Enhancers[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#enhancers "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------

### Spectrogram Enhancer[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#spectrogram-enhancer "Link to this heading")

A GAN-based model that adds detail to blurry spectrograms produced by TTS models such as Tacotron 2 or FastPitch.

Codecs[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#codecs "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------

### Audio Codec[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#audio-codec "Link to this heading")

The NeMo Audio Codec model is a non-autoregressive convolutional encoder-quantizer-decoder model for coding or tokenization of raw audio signals or mel-spectrogram features. The NeMo Audio Codec model supports a residual vector quantizer (RVQ) [[TTS-MODELS11](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id29 "Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. SoundStream: an end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495-507, 2022. doi:10.1109/TASLP.2021.3129994.")] and a finite scalar quantizer (FSQ) [[TTS-MODELS6](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id31 "Fabian Mentzer, David Minnen, Eirikur Agustsson, and Michael Tschannen. Finite scalar quantization: VQ-VAE made simple. arXiv preprint arXiv:2309.15505, 2023.")] for quantization of the encoder output. The model is trained end-to-end using generative, discriminative, and reconstruction losses, similar to other neural audio codecs such as SoundStream [[TTS-MODELS11](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id29 "Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. SoundStream: an end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495-507, 2022. doi:10.1109/TASLP.2021.3129994.")] and EnCodec [[TTS-MODELS2](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id30 "Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. High fidelity neural audio compression. arXiv preprint arXiv:2210.13438, 2022.")]. For further information, refer to the `Audio Codec Training` tutorial in the TTS tutorial section.

> [![Image 11: audiocodec model](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/audiocodec_model.png)](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/audiocodec_model.png)
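The following is a rough sketch of the codec round trip under a few assumptions: the `.nemo` checkpoint path is a placeholder, and the `encode`/`decode` calls follow the pattern used in the `Audio Codec Training` tutorial, so the exact signatures should be checked against the `AudioCodecModel` API in your NeMo release.

```python
import torch
from nemo.collections.tts.models import AudioCodecModel

# Placeholder path to a trained audio codec checkpoint.
codec = AudioCodecModel.restore_from("audio_codec.nemo").eval()

# Dummy one-second, single-channel audio batch at 16 kHz, for illustration only.
audio = torch.randn(1, 16000)
audio_len = torch.tensor([audio.shape[1]])

with torch.no_grad():
    # Encode raw audio into discrete codec tokens, then reconstruct the waveform.
    tokens, tokens_len = codec.encode(audio=audio, audio_len=audio_len)
    reconstructed, reconstructed_len = codec.decode(tokens=tokens, tokens_len=tokens_len)
```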
References[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#references "Link to this heading")
-------------------------------------------------------------------------------------------------------------------------------------

[[TTS-MODELS1](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id8)] Rohan Badlani, Adrian Łańcucki, Kevin J Shih, Rafael Valle, Wei Ping, and Bryan Catanzaro. One TTS alignment to rule them all. In _ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)_, 6092–6096. IEEE, 2022.

[[TTS-MODELS2](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id13)] Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. High fidelity neural audio compression. _arXiv preprint arXiv:2210.13438_, 2022.

[[TTS-MODELS3](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id6)] Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. In _Proc. Interspeech 2021_, 2207–2211. 2021. [doi:10.21437/Interspeech.2021-1016](https://doi.org/10.21437/Interspeech.2021-1016.md).

[[TTS-MODELS4](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id9)] Jaehyeon Kim, Jungil Kong, and Juhee Son. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. In _International Conference on Machine Learning_, 5530–5540. PMLR, 2021.

[[TTS-MODELS5](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id5)] Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. HiFi-GAN: generative adversarial networks for efficient and high fidelity speech synthesis. _Advances in Neural Information Processing Systems_, 33:17022–17033, 2020.

[[TTS-MODELS6](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id11)] Fabian Mentzer, David Minnen, Eirikur Agustsson, and Michael Tschannen. Finite scalar quantization: VQ-VAE made simple. _arXiv preprint arXiv:2309.15505_, 2023.

[[TTS-MODELS7](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id7)] Ryan Prenger, Rafael Valle, and Bryan Catanzaro. Waveglow: a flow-based generative network for speech synthesis. In _ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)_, 3617–3621. IEEE, 2019.

[[TTS-MODELS8](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id4)] Jonathan Shen, Ruoming Pang, Ron J Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Rj Skerrv-Ryan, and others. Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. In _2018 IEEE international conference on acoustics, speech and signal processing (ICASSP)_, 4779–4783. IEEE, 2018.

[[TTS-MODELS9](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id3)] Kevin J Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, and Bryan Catanzaro. RAD-TTS: parallel flow-based TTS with robust alignment learning and diverse synthesis. In _ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models_. 2021.
[[TTS-MODELS10](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id2)] Oktai Tatanov, Stanislav Beliaev, and Boris Ginsburg. Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings. In _ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)_, 7482–7486. IEEE, 2022.

[[TTS-MODELS11](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id10)] Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. SoundStream: an end-to-end neural audio codec. _IEEE/ACM Transactions on Audio, Speech, and Language Processing_, 30:495–507, 2022. [doi:10.1109/TASLP.2021.3129994](https://doi.org/10.1109/TASLP.2021.3129994.md).

[[TTS-MODELS12](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/models.html.md#id1)] Adrian Łańcucki. Fastpitch: parallel text-to-speech with pitch prediction. In _ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)_, 6588–6592. IEEE, 2021.
---

# Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md

Title: Overview — NVIDIA NeMo Framework User Guide

URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html

Published Time: Thu, 30 Oct 2025 07:07:33 GMT

Markdown Content:
Overview[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#overview "Link to this heading")
-------------------------------------------------------------------------------------------------------------------

NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and developers working on [Large Language Models](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/index.html.md#nemo-2-llms), Multimodal Models, and [Speech AI](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html.md) (e.g. [Automatic Speech Recognition](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html.md) and [Text-to-Speech](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/intro.html.md)). It enables users to efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints.

**Setup Instructions**: [Install NeMo Framework](https://docs.nvidia.com/nemo-framework/user-guide/latest/installation.html.md#install-nemo-framework)

Large Language Models and Multimodal Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#large-language-models-and-multimodal-models "Link to this heading")
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

NeMo Framework provides end-to-end support for developing Large Language Models (LLMs) and Multimodal Models (MMs). It provides the flexibility to be used on-premises, in a data center, or with your preferred cloud provider. It also supports execution in SLURM- or Kubernetes-enabled environments.

![Image 1: _images/nemo-llm-mm-stack.png](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/nemo-llm-mm-stack.png)

### Data Curation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#data-curation "Link to this heading")

[NeMo Curator](https://github.com/NVIDIA/NeMo-Curator)[[1]](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html#f1) is a Python library that includes a suite of modules for data mining and synthetic data generation. They are scalable and optimized for GPUs, making them ideal for curating natural language data to train or fine-tune LLMs. With NeMo Curator, you can efficiently extract high-quality text from extensive raw web data sources.

### Training and Customization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#training-and-customization "Link to this heading")

NeMo Framework provides tools for efficient training and customization of [LLMs](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/index.html.md#nemo-2-llms) and Multimodal models. It includes default configurations for compute cluster setup, data downloading, and model hyperparameters, which can be adjusted to train on new datasets and models.
In addition to pre-training, NeMo supports both Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA, P-Tuning, and more. Two options are available for launching training in NeMo: using the NeMo 2.0 API directly or using [NeMo Run](https://github.com/NVIDIA/NeMo-Run).

*   **With NeMo Run (Recommended):** NeMo Run provides an interface to streamline the configuration, execution, and management of experiments across various compute environments. This includes launching jobs locally on your workstation or on large clusters, whether SLURM-enabled or Kubernetes-based in a cloud environment.
*   **Using the NeMo 2.0 API:** This method works well for a simple setup involving small models, or if you are interested in writing your own custom dataloaders or training loops, or changing model layers. It gives you more flexibility and control over configurations, and makes it easy to extend and customize them programmatically.
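For illustration, here is a minimal sketch of the NeMo Run path using one of the predefined NeMo 2.0 recipes. The recipe name, overrides, and paths below are examples only and may differ between releases; see the Pre-training & PEFT Quickstart with NeMo Run for the supported options.

```python
import nemo_run as run
from nemo.collections import llm

# Build a predefined pretraining recipe and override a few defaults (example values).
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretrain",  # experiment name (example)
    dir="/checkpoints",         # output directory (placeholder)
    num_nodes=1,
    num_gpus_per_node=8,
)
recipe.trainer.max_steps = 100  # example hyperparameter override

# Launch locally; swap the executor to target a SLURM or Kubernetes cluster.
run.run(recipe, executor=run.LocalExecutor())
```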
### RL[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#rl "Link to this heading")

[NeMo RL](https://github.com/NVIDIA/NeMo-RL)[[1]](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html#f1) is a scalable and efficient post-training library designed for models ranging from tiny to over 100 billion parameters, and for setups ranging from a single GPU to thousands. What you can expect:

*   **Seamless integration with Hugging Face** for ease of use, allowing users to leverage a wide range of pre-trained models and tools.
*   **High-performance implementation with Megatron Core**, supporting various parallelism techniques for large models (>100B) and large context lengths.
*   **Efficient resource management using Ray**, enabling scalable and flexible deployment across different hardware configurations.
*   **Flexibility** with a modular design that allows easy integration and customization.
*   **Comprehensive documentation** that is both detailed and user-friendly, with practical examples.

Check out the [NeMo RL Documentation](https://docs.nvidia.com/nemo/rl/latest/index.html.md) for more information.

### Multimodal Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#multimodal-models "Link to this heading")

NeMo Framework provides optimized software to train and deploy state-of-the-art multimodal models across several categories: Multimodal Language Models, Vision-Language Foundations, Text-to-Image models, and Beyond 2D Generation using Neural Radiance Fields (NeRF). Each category is designed to cater to specific needs and advancements in the field, leveraging cutting-edge models to handle a wide range of data types, including text, images, and 3D models.

Note: We are migrating support for multimodal models from NeMo 1.0 to NeMo 2.0. If you want to explore this domain in the meantime, please refer to the documentation for the NeMo 24.07 (previous) release.

### Deployment and Inference[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#deployment-and-inference "Link to this heading")

NeMo Framework provides various paths for LLM inference, catering to different deployment scenarios and performance needs.

#### Deploy with NVIDIA NIM[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#deploy-with-nvidia-nim "Link to this heading")

NeMo Framework seamlessly integrates with enterprise-level model deployment tools through [NVIDIA NIM](https://www.nvidia.com/en-gb/launchpad/ai/generative-ai-inference-with-nim.md). This integration is powered by NVIDIA TensorRT-LLM, ensuring optimized and scalable inference. For more information on NIM, visit the [NVIDIA website](https://www.nvidia.com/en-gb/launchpad/ai/generative-ai-inference-with-nim).

#### Deploy with TensorRT-LLM or vLLM[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#deploy-with-tensorrt-llm-or-vllm "Link to this heading")

NeMo Framework offers scripts and APIs to export models to two inference-optimized libraries, TensorRT-LLM and vLLM, and to deploy the exported model with the NVIDIA Triton Inference Server. For scenarios requiring optimized performance, NeMo models can leverage TensorRT-LLM, a specialized library for accelerating and optimizing LLM inference on NVIDIA GPUs. This process involves converting NeMo models into a format compatible with TensorRT-LLM using the nemo.export module.

> LLM Deployment Overview
>
> Deploy NeMo Large Language Models with NIM
>
> Deploy NeMo Large Language Models with TensorRT-LLM
>
> Deploy NeMo Large Language Models with vLLM
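As a rough sketch of the TensorRT-LLM export path described above: the paths below are placeholders, and the exact module layout and argument names of the nemo.export utilities can vary between releases, so treat this as an outline rather than a definitive recipe.

```python
# Sketch only: class location and argument names may differ between NeMo releases.
from nemo.export.tensorrt_llm import TensorRTLLM

# Convert a NeMo checkpoint into a TensorRT-LLM engine (paths are placeholders).
exporter = TensorRTLLM(model_dir="/workspace/trt_llm_engine")
exporter.export(
    nemo_checkpoint_path="/workspace/llama3-8b.nemo",
    model_type="llama",
)
# The engine directory can then be served with the NVIDIA Triton Inference Server,
# as described in the deployment guides linked above.
```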
### Supported Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#supported-models "Link to this heading")

#### Large Language Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#large-language-models "Link to this heading")

Large Language Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#id3 "Link to this table")

| Large Language Models | Pretraining & SFT | PEFT | Alignment | FP8 Training Convergence | TRT/TRTLLM | Convert To & From Hugging Face | Evaluation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [Llama3 8B/70B, Llama3.1 405B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/llama3.html.md#llama) | Yes | Yes | x | Yes (partially verified) | Yes | Both | Yes |
| [Mixtral 8x7B/8x22B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mixtral.html.md#mixtral) | Yes | Yes | x | Yes (unverified) | Yes | Both | Yes |
| [Nemotron 3 8B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron.html.md#nemotron) | Yes | x | x | Yes (unverified) | x | Both | Yes |
| [Nemotron 4 340B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron.html.md#nemotron) | Yes | x | x | Yes (unverified) | x | Both | Yes |
| [Baichuan2 7B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/baichuan2.html.md#baichuan) | Yes | Yes | x | Yes (unverified) | x | Both | Yes |
| [ChatGLM3 6B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/chatglm3.html.md#chatglm) | Yes | Yes | x | Yes (unverified) | x | Both | Yes |
| [DeepSeek V2/V3](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/deepseek_v3.html.md#deepseek-v3) | Yes | Yes | x | Yes (unverified) | x | Both | Yes |
| [Gemma 2B/7B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma.html.md#gemma) | Yes | Yes | x | Yes (unverified) | Yes | Both | Yes |
| [Gemma2 2B/9B/27B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gemma2.html.md#gemma2) | Yes | Yes | x | Yes (unverified) | x | Both | Yes |
| [GPT-OSS 20B/120B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/gpt_oss.html.md#gpt-oss) | Yes | Yes | x | Yes (unverified) | x | Both | Yes |
| [Mamba2 130M/370M/780M/1.3B/2.7B/8B/Hybrid-8B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/mamba.html.md#mamba) | Yes | Yes | x | Yes (unverified) | x | x | Yes |
| [Phi3 mini 4k](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/phi3.html.md#phi3) | x | Yes | x | Yes (unverified) | x | x | x |
| [Qwen2 0.5B/1.5B/7B/72B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/qwen2.html.md#qwen2) | Yes | Yes | x | Yes (unverified) | Yes | Both | Yes |
| [Qwen3 0.6B/1.7B/4B/8B/14B/32B/30B_A3B/235B_A22B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/qwen3.html.md#qwen3) | Yes | Yes | x | Yes (unverified) | Yes | Both | Yes |
| [StarCoder 15B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/starcoder.html.md#starcoder) | Yes | Yes | x | Yes (unverified) | Yes | Both | Yes |
| [StarCoder2 3B/7B/15B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/starcoder2.html.md#starcoder2) | Yes | Yes | x | Yes (unverified) | Yes | Both | Yes |
| [BERT 110M/340M](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/bert.html.md#bert) | Yes | Yes | x | Yes (unverified) | x | Both | x |
| [T5 220M/3B/11B](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/t5.html.md#t5) | Yes | Yes | x | x | x | x | x |

#### Vision Language Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#vision-language-models "Link to this heading")

Vision Language Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#id4 "Link to this table")

| Vision Language Models | Pretraining & SFT | PEFT | Alignment | FP8 Training Convergence | TRT/TRTLLM | Convert To & From Hugging Face | Evaluation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [NeVA (LLaVA 1.5)](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/neva.html.md#neva) | Yes | Yes | x | Yes (unverified) | x | From | x |
| [Llama 3.2 Vision 11B/90B](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/mllama.html.md#mllama) | Yes | Yes | x | Yes (unverified) | x | From | x |
| [LLaVA Next (LLaVA 1.6)](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#llavanext) | Yes | Yes | x | Yes (unverified) | x | From | x |
| [Llama Nemotron Nano VL 8B](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llama_nemotron_vl.html.md#llama-nemotron-vl) | Yes | Yes | x | Yes (unverified) | x | From | x |

#### Embedding Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#embedding-models "Link to this heading")

Embedding Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#id5 "Link to this table")

| Embedding Language Models | Pretraining & SFT | PEFT | Alignment | FP8 Training Convergence | TRT/TRTLLM | Convert To & From Hugging Face | Evaluation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [SBERT 340M](https://docs.nvidia.com/nemo-framework/user-guide/latest/embeddingmodels/bert/sbert.html.md#sbert) | Yes | x | x | Yes (unverified) | x | Both | x |
| [Llama 3.2 Embedding 1B](https://docs.nvidia.com/nemo-framework/user-guide/latest/embeddingmodels/gpt/llama_embedding.html.md#llama-embed) | Yes | x | x | Yes (unverified) | x | Both | x |

#### World Foundation Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#world-foundation-models "Link to this heading")

World Foundation Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#id6 "Link to this table")

| World Foundation Models | Post-Training | Accelerated Inference |
| --- | --- | --- |
| [Cosmos-1.0-Diffusion-Text2World-7B](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/post_training/README.md) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/inference/README.md) |
| [Cosmos-1.0-Diffusion-Text2World-14B](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Text2World) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/post_training/README.md) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/inference/README.md) |
| Cosmos-1.0-Diffusion-Video2World-7B | Coming Soon | Coming Soon |
| Cosmos-1.0-Diffusion-Video2World-14B | Coming Soon | Coming Soon |
| [Cosmos-1.0-Autoregressive-4B](https://huggingface.co/nvidia/Cosmos-1.0-Autoregressive-4B) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/post_training/README.md) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/inference/README.md) |
| Cosmos-1.0-Autoregressive-Video2World-5B | Coming Soon | Coming Soon |
| [Cosmos-1.0-Autoregressive-12B](https://huggingface.co/nvidia/Cosmos-1.0-Autoregressive-12B) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/post_training/README.md) | [Yes](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/inference/README.md) |
| Cosmos-1.0-Autoregressive-Video2World-13B | Coming Soon | Coming Soon |

Note: NeMo also supports pretraining of `text2world` foundation models for both [diffusion](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/diffusion) and [autoregressive](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/multimodal_autoregressive) architectures.

Speech AI[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#speech-ai "Link to this heading")
---------------------------------------------------------------------------------------------------------------------

Developing conversational AI models is a complex process that involves defining, constructing, and training models within particular domains. Reaching high accuracy typically requires multiple iterations, fine-tuning on various tasks and domain-specific data, ensuring training performance, and preparing models for inference deployment.

![Image 2: _images/nemo-speech-ai.png](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/nemo-speech-ai.png)

NeMo Framework provides support for the training and customization of Speech AI models. This includes tasks like Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis. It offers a smooth transition to enterprise-level production deployment with [NVIDIA Riva](https://developer.nvidia.com/riva.md). To assist developers and researchers, NeMo Framework includes state-of-the-art pre-trained checkpoints, tools for reproducible speech data processing, and features for interactive exploration and analysis of speech datasets.
The components of the NeMo Framework for Speech AI are as follows:

**Training and Customization**

NeMo Framework contains everything needed to train and customize speech models ([ASR](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/intro.html.md), [Speech Classification](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speech_classification/intro.html.md), [Speaker Recognition](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_recognition/intro.html.md), [Speaker Diarization](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/speaker_diarization/intro.html.md), and [TTS](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/intro.html.md)) in a reproducible manner.

**SOTA Pre-trained Models**

NeMo Framework provides state-of-the-art recipes and pre-trained checkpoints of several [ASR](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/results.html.md) and [TTS](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tts/checkpoints.html.md) models, as well as instructions on how to load them.
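For example, loading a pre-trained ASR checkpoint and transcribing audio takes only a few lines; the checkpoint name and audio path below are examples (see the ASR checkpoints page for the currently available models).

```python
import nemo.collections.asr as nemo_asr

# Load a pre-trained English ASR checkpoint (example name; see the ASR checkpoints page).
asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")

# Transcribe a local 16 kHz mono WAV file (placeholder path).
print(asr_model.transcribe(["speech_sample.wav"]))
```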
**[Speech Tools](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/intro.html.md)**

NeMo Framework provides a set of tools useful for developing ASR and TTS models, including:

*   [NeMo Forced Aligner (NFA)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/nemo_forced_aligner.html.md) for generating token-, word-, and segment-level timestamps of speech in audio using NeMo's CTC-based Automatic Speech Recognition models.
*   [Speech Data Processor (SDP)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/speech_data_processor.html.md), a toolkit for simplifying speech data processing. It allows you to represent data processing operations in a config file, minimizing boilerplate code and enabling reproducibility and shareability.
*   [Speech Data Explorer (SDE)](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/speech_data_explorer.html.md), a Dash-based web application for interactive exploration and analysis of speech datasets.
*   [Dataset creation tool](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/ctc_segmentation.html.md), which aligns long audio files with the corresponding transcripts and splits them into shorter fragments suitable for Automatic Speech Recognition (ASR) model training.
*   [Comparison Tool](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/intro.html.md) for ASR models, which compares the predictions of different ASR models at the word-accuracy and utterance level.
*   [ASR Evaluator](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/tools/asr_evaluator.html.md) for evaluating the performance of ASR models and other features such as Voice Activity Detection.
*   [Text Normalization Tool](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/text_normalization/intro.html.md) for converting text from the written form to the spoken form and vice versa (e.g. "31st" vs. "thirty first").

**Path to Deployment**

NeMo models that have been trained or customized using the NeMo Framework can be optimized and deployed with [NVIDIA Riva](https://developer.nvidia.com/riva.md). Riva provides containers and Helm charts specifically designed to automate the steps for push-button deployment.

> Getting Started with Speech AI

Other Resources[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#other-resources "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------

### GitHub Repos[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#github-repos "Link to this heading")

*   [NeMo](https://github.com/NVIDIA/NeMo): The main repository for the NeMo Framework.
*   [NeMo-Run](https://github.com/NVIDIA/NeMo-Run): A tool to configure, launch, and manage your machine learning experiments.
*   [NeMo-RL](https://github.com/NVIDIA/NeMo-RL): A scalable and efficient post-training library.
*   [NeMo-Curator](https://github.com/NVIDIA/NeMo-Curator): A scalable data pre-processing and curation toolkit for LLMs.

### Getting Help[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#getting-help "Link to this heading")

Engage with the NeMo community, ask questions, get support, or report bugs.

*   [NeMo Discussions](https://github.com/NVIDIA/NeMo/discussions)
*   [NeMo Issues](https://github.com/NVIDIA/NeMo/issues)

Programming Languages and Frameworks[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#programming-languages-and-frameworks "Link to this heading")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

*   Python: the main interface to use NeMo Framework
*   PyTorch: NeMo Framework is built on top of PyTorch

Licenses[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html.md#licenses "Link to this heading")
-------------------------------------------------------------------------------------------------------------------

*   The NeMo GitHub repo is licensed under the [Apache 2.0 license](https://github.com/NVIDIA/NeMo?tab=Apache-2.0-1-ov-file#readme).
*   NeMo Framework is licensed under the [NVIDIA AI PRODUCT AGREEMENT](https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/). By pulling and using the container, you accept the terms and conditions of this license.
*   The NeMo Framework container contains Llama materials governed by the [Meta Llama3 Community License Agreement](https://huggingface.co/meta-llama/Meta-Llama-3-8B/tree/main).
--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md Title: Tutorials — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html Published Time: Fri, 05 Sep 2025 19:00:43 GMT Markdown Content: Tutorials[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#tutorials "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------- The best way to get started with NeMo is to start with one of our tutorials. They cover various domains and provide both introductory and advanced topics. These tutorials can be run from inside the [NeMo Framework Docker Container](https://docs.nvidia.com/nemo-framework/user-guide/latest/installation.html.md#install-nemo-framework). Large Language Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#large-language-models "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------- ### Data Curation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#data-curation "Link to this heading") Explore examples of data curation techniques using NeMo Curator: | Title with Link | Description | | --- | --- | | [Distributed Data Classification](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/distributed_data_classification) | The notebook showcases how to use NeMo Curator with two distinct classifiers: one for evaluating data quality and another for identifying data domains. Integrating these classifiers streamlines the annotation process, enhancing the combination of diverse datasets essential for training foundational models. | | [PEFT Curation](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation) | The tutorial demonstrates how to use the NeMo Curator Python API to curate a dataset for Parameter Efficient Fine-Tuning (PEFT). Specifically, it uses the Enron dataset, which contains emails along with classification labels. Each email entry includes a subject, body, and category (class label). The tutorial showcases various filtering and processing operations that can be applied to each record. | | [Single Node Data Curation Pipeline](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/single_node_tutorial) | The notebook provides a typical data curation pipeline using NeMo Curator, with the Thai Wikipedia dataset as an example. It demonstrates how to download Wikipedia data using NeMo Curator, perform language separation with FastText, apply GPU-based exact and fuzzy deduplication, and utilize CPU-based heuristic filtering. 
| | [NeMo Curator Python API with Tinystories](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/tinystories) | The tutorial shows how to use the NeMo Curator Python API to curate the TinyStories dataset. TinyStories is a dataset of short stories generated by GPT-3.5 and GPT-4, featuring words that are understood by 3 to 4-year-olds. The small size of this dataset makes it ideal for creation and validation. | | [Curating Datasets for Parameter Efficient Fine-tuning (PEFT) with Synthetic Data Generation (SDG)](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg) | The tutorial demonstrates how to use the NeMo Curator Python API for data curation, as well as synthetic data generation and qualitative score assignment to prepare a dataset for PEFT of LLMs. | | [Custom Tokenization for Domain Adaptive Pre-Training (DAPT)](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/llama/domain-adaptive-pretraining/code/custom_tokenization.ipynb) | This notebook walks through the custom tokenization workflow required for DAPT, including training a customized tokenizer, dataset preprocessing, and checkpoints embedding table altering. | ### Training and Customization[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#training-and-customization "Link to this heading") | Title with Link | Description | | --- | --- | | [Quickstart with NeMo 2.0 API](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html.md#nemo-2-quickstart-api) | The example shows how to run a simple training loop using NeMo 2.0. It uses the train API from the NeMo Framework LLM collection. | | [Pre-training & PEFT Quickstart with NeMo Run](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html.md#nemo-2-quickstart-nemo-run) | This tutorial introduces how to run any of the supported [NeMo 2.0 Recipes](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes) using [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). It also takes a pretraining and fine-tuning recipe and shows how to run it locally, as well as remotely, on a Slurm-based cluster. | | [Long-Context LLM Training with NeMo Run](https://docs.nvidia.com/nemo-framework/user-guide/latest/longcontext/index.html.md#long-context-recipes) | This example demonstrates how to use [NeMo 2.0 Recipes](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes) with NeMo-Run for long-context model training, as well as extending the context length of an existing pretrained model. | | [Llama 3 Supervised Fine-Tuning and Parameter Efficient Fine-Tuning with NeMo 2.0](https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/llama/nemo2-sft-peft) | This example shows how to perform Llama 3 Supervised Fine-Tuning and Parameter Efficient Fine-Tuning using SFT and LoRA notebooks with NeMo 2.0 and NeMo-Run. | | [Parameter Efficient Fine-Tuning with NeMo AutoModel](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/automodel/peft.ipynb) | This example shows how to perform Parameter Efficient Fine-Tuning on Hugging Face Hub-available models with NeMo AutoModel. | | [Supervised Fine-Tuning with NeMo AutoModel](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/automodel/sft.ipynb) | This example shows how to perform Supervised Fine-Tuning on Hugging Face Hub-available models with NeMo AutoModel. 
| | [NeMo SlimPajama Data Pipeline and Pretraining Tutorial](https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/llama-3/slimpajama) | This tutorial provides step-by-step instructions for preprocessing the SlimPajama dataset and pretraining a Llama-based model using the NeMo 2.0 library. | | [Domain Adaptive Pre-Training (DAPT) with Llama2 7B](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/llama/domain-adaptive-pretraining/code/domain_adaptive_pretraining_nemo2.0.ipynb) | This tutorial demonstrates how to perform DAPT on Pre-trained models such as Llama2-7B using [NeMo 2.0 Recipes](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes) with [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). | | [Finetuning Llama 3.2 Model into Embedding Model](https://github.com/NVIDIA/NeMo/blob/main/tutorials/llm/embedding/llama_embedding.ipynb) | This tutorial provides a detailed walkthrough of fine-tuning a Llama 3.2 model into an embedding model using NeMo 2.0. | World Foundation Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#world-foundation-models "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------- ### Post Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#post-training "Link to this heading") Explore examples of post-training techniques using World Foundation Models: | Title with Link | Description | | --- | --- | | [Cosmos Diffusion Models](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/post_training/README.md) | This example shows how to post-train Cosmos Diffusion-based World Foundation Models using the NeMo Framework for your custom physical AI tasks. | | [Cosmos Autoregressive Models](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/post_training/README.md) | This example shows how to post-train Cosmos Autoregressive-based World Foundation Models using the NeMo Framework for your custom physical AI tasks. | Speech AI[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#speech-ai "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------- Most NeMo Speech AI tutorials can be run on [Google’s Colab](https://colab.research.google.com/notebooks/intro.ipynb). ### Running Tutorials on Colab[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#running-tutorials-on-colab "Link to this heading") To run a tutorial: 1. Click the **Colab** link associated with the tutorial you are interested in from the table below. 2. Once in Colab, connect to an instance with a GPU by clicking **Runtime**>**Change runtime type** and selecting **GPU** as the hardware accelerator. 
### Speech AI Fundamentals[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#speech-ai-fundamentals "Link to this heading") | Title | GitHub / Colab URL | | --- | --- | | Getting Started: NeMo Fundamentals | [NeMo Fundamentals](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/00_NeMo_Primer.ipynb) | | Getting Started: Audio translator example | [Audio translator example](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/AudioTranslationSample.ipynb) | | Getting Started: Voice swap example | [Voice swap example](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/VoiceSwapSample.ipynb) | | Getting Started: NeMo Models | [NeMo Models](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/01_NeMo_Models.ipynb) | | Getting Started: NeMo Adapters | [NeMo Adapters](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/02_NeMo_Adapters.ipynb) | | Getting Started: NeMo Models on Hugging Face Hub | [NeMo Models on HF Hub](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/Publish_NeMo_Model_On_Hugging_Face_Hub.ipynb) | ### Automatic Speech Recognition (ASR) Tutorials[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#automatic-speech-recognition-asr-tutorials "Link to this heading") | Title | GitHub / Colab URL | | --- | --- | | ASR with NeMo | [ASR with NeMo](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_with_NeMo.ipynb) | | ASR with Subword Tokenization | [ASR with Subword Tokenization](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_with_Subword_Tokenization.ipynb) | | Offline ASR | [Offline ASR](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Offline_ASR.ipynb) | | Online ASR Microphone Cache Aware Streaming | [Online ASR Microphone Cache Aware Streaming](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/asr/Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb) | | Online ASR Microphone Buffered Streaming | [Online ASR Microphone Buffered Streaming](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/asr/Online_ASR_Microphone_Demo_Buffered_Streaming.ipynb) | | ASR CTC Language Fine-Tuning | [ASR CTC Language Fine-Tuning](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb) | | Intro to Transducers | [Intro to Transducers](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Intro_to_Transducers.ipynb) | | ASR with Transducers | [ASR with Transducers](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_with_Transducers.ipynb) | | ASR with Adapters | [ASR with Adapters](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb) | | Speech Commands | [Speech Commands](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Speech_Commands.ipynb) | | Online Offline Microphone Speech Commands | [Online Offline Microphone Speech Commands](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb) | | Voice Activity Detection | [Voice Activity Detection](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Voice_Activity_Detection.ipynb) | | Online Offline Microphone VAD | [Online Offline Microphone 
VAD](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb) | | Speaker Recognition and Verification | [Speaker Recognition and Verification](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb) | | Speaker Diarization Inference | [Speaker Diarization Inference](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb) | | ASR with Speaker Diarization | [ASR with Speaker Diarization](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb) | | Online Noise Augmentation | [Online Noise Augmentation](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Online_Noise_Augmentation.ipynb) | | ASR for Telephony Speech | [ASR for Telephony Speech](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_for_telephony_speech.ipynb) | | Streaming inference | [Streaming inference](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/asr/Streaming_ASR.ipynb) | | Buffered Transducer inference | [Buffered Transducer inference](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Buffered_Transducer_Inference.ipynb) | | Buffered Transducer inference with LCS Merge | [Buffered Transducer inference with LCS Merge](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Buffered_Transducer_Inference_with_LCS_Merge.ipynb) | | Offline ASR with VAD for CTC models | [Offline ASR with VAD for CTC models](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Offline_ASR_with_VAD_for_CTC_models.ipynb) | | Self-supervised Pre-training for ASR | [Self-supervised Pre-training for ASR](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Self_Supervised_Pre_Training.ipynb) | | Multi-lingual ASR | [Multi-lingual ASR](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Multilang_ASR.ipynb) | | Hybrid ASR-TTS Models | [Hybrid ASR-TTS Models](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_TTS_Tutorial.ipynb) | | ASR Confidence Estimation | [ASR Confidence Estimation](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_Confidence_Estimation.ipynb) | | Confidence-based Ensembles | [Confidence-based Ensembles](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Confidence_Ensembles.ipynb) | ### Text-to-Speech (TTS) Tutorials[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#text-to-speech-tts-tutorials "Link to this heading") | Title | GitHub / Colab URL | | --- | --- | | Basic and Advanced: NeMo TTS Primer | [NeMo TTS Primer](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/NeMo_TTS_Primer.ipynb) | | Basic and Advanced: TTS Speech/Text Aligner Inference | [TTS Speech/Text Aligner Inference](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/Aligner_Inference_Examples.ipynb) | | Basic and Advanced: FastPitch and MixerTTS Model Training | [FastPitch and MixerTTS Model Training](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/FastPitch_MixerTTS_Training.ipynb) | | Basic and Advanced: FastPitch Finetuning | [FastPitch 
Finetuning](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/FastPitch_Finetuning.ipynb) | | Basic and Advanced: FastPitch and HiFiGAN Model Training for German | [FastPitch and HiFiGAN Model Training for German](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/FastPitch_GermanTTS_Training.ipynb) | | Basic and Advanced: Tacotron2 Model Training | [Tacotron2 Model Training](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/Tacotron2_Training.ipynb) | | Basic and Advanced: FastPitch Duration and Pitch Control | [FastPitch Duration and Pitch Control](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/Inference_DurationPitchControl.ipynb) | | Basic and Advanced: FastPitch Speaker Interpolation | [FastPitch Speaker Interpolation](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/FastPitch_Speaker_Interpolation.ipynb) | | Basic and Advanced: TTS Inference and Model Selection | [TTS Inference and Model Selection](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/Inference_ModelSelect.ipynb) | | Basic and Advanced: TTS Pronunciation Customization | [TTS Pronunciation Customization](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/Pronunciation_customization.ipynb) | ### Tools and Utilities[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#tools-and-utilities "Link to this heading") | Title | GitHub / Colab URL | | --- | --- | | Utility Tools for Speech and Text: NeMo Forced Aligner | [NeMo Forced Aligner](https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/tools/NeMo_Forced_Aligner_Tutorial.ipynb) | | Utility Tools for Speech and Text: Speech Data Explorer | [Speech Data Explorer](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tools/SDE_HowTo_v2.ipynb) | | Utility Tools for Speech and Text: CTC Segmentation | [CTC Segmentation](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tools/CTC_Segmentation_Tutorial.ipynb) | ### Text Processing (TN/ITN) Tutorials[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html.md#text-processing-tn-itn-tutorials "Link to this heading") | Title | GitHub / Colab URL | | --- | --- | | Text Normalization Techniques: Text Normalization | [Text Normalization](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_(Inverse)_Normalization.ipynb) | | Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | [Inverse Text Normalization with Thutmose Tagger](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/nlp/ITN_with_Thutmose_Tagger.ipynb) | | Text Normalization Techniques: WFST Tutorial | [WFST Tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb) |
--- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md Title: Resiliency Features — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html Published Time: Thu, 30 Oct 2025 07:07:34 GMT Markdown Content: Resiliency Features[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#resiliency-features "Link to this heading") ------------------------------------------------------------------------------------------- NeMo Framework incorporates resilient training features from the [NVIDIA Resiliency Extension](https://github.com/NVIDIA/nvidia-resiliency-ext). This extension provides fault-tolerant capabilities that help minimize downtime due to failures and interruptions during training. The key features include: * Fault Tolerance: Automatically resumes training from the last checkpoint in case of interruptions. * Straggler Detection: Identifies and mitigates slow-performing nodes to ensure efficient training. * Local Checkpointing: Saves checkpoints directly to local storage on each node. For more information on the design and use of these features, please see the Resiliency Extension’s [documentation](https://nvidia.github.io/nvidia-resiliency-ext/).
Fault Tolerance[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#fault-tolerance "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------- The Resiliency Extension’s Fault Tolerance subpackage can detect hangs during training and automatically restart a workload after a hang or error. This is useful when transient faults are common, for example, when training on unreliable hardware or at a very large scale. ### Use Fault Tolerance Features[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#use-fault-tolerance-features "Link to this heading") Warning: This plugin is currently only supported on Slurm-based clusters. The package contains a PyTorch Lightning callback, `FaultToleranceCallback`, and the `ft_launcher`, a launcher similar to `torchrun`. To use the features mentioned above, the callback must be added to the trainer and the workload must be launched with the `ft_launcher`. We’ve provided a NeMo-Run plugin to simplify this integration to one step. Please note that [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) must be installed to use this plugin. The following example adds the plugin to the LLaMA3 8B recipe:

```python
import nemo_run as run

from nemo import lightning as nl
from nemo.collections import llm
from nemo.lightning.run.plugins import FaultTolerancePlugin

recipe = llm.llama3_8b.pretrain_recipe(name="llama3_with_fault_tolerance", ...)  # fill in other recipe arguments
executor = ...  # set up your NeMo-Run executor

run_plugins = [FaultTolerancePlugin()]
run.run(recipe, plugins=run_plugins, executor=executor)
```

When using this feature, if a hang is encountered, you should see log statements similar to the following:

```
[WARNING] [RankMonitorServer:34] Did not get subsequent heartbeat. Waited 171.92 seconds.
[WARNING] [RankMonitorServer:58] Did not get subsequent heartbeat. Waited 171.92 seconds.
torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 453152 closing signal SIGTERM
torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 453157 closing signal SIGTERM
```

### Default Settings[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#default-settings "Link to this heading") The NeMo-Run plugin will configure `FaultToleranceCallback` with `autoresume=True` and `calculate_timeout=True`. The `autoresume` setting is necessary to automatically launch another job in case of a fault or if training is not complete, which is expected to be useful for most users. This feature also makes training more hands-off when a long training session cannot complete within a single job’s time limit. The `calculate_timeout` setting automatically calculates the thresholds used to determine if the job is stuck in a hang, simplifying the user experience. Therefore, we have enabled it by default. We’ve also limited the default maximum successive in-job restarts (`num_in_job_restarts`) to 3 and job retries (`num_job_retries_on_failure`) to 2. In our experience, when failures occur more frequently than this, there is usually a non-transient application issue that needs to be addressed. These are arguments to the plugin, so you can adjust them as needed, as sketched below.
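For example, a workload that tolerates more transient failures can raise these limits when the plugin is constructed. This is a minimal sketch that only adjusts the two arguments named above and assumes the recipe and executor are set up as in the earlier example:

```python
from nemo.lightning.run.plugins import FaultTolerancePlugin

# Override the default restart limits discussed above
# (defaults: 3 in-job restarts, 2 job retries on failure).
run_plugins = [
    FaultTolerancePlugin(
        num_in_job_restarts=5,         # allow a few more in-job restarts
        num_job_retries_on_failure=1,  # retry the whole job at most once
    )
]
```

As noted above, if failures keep occurring beyond limits of this order, it is usually worth investigating a non-transient application issue rather than raising them further.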
Straggler Detection[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#straggler-detection "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------- The Resiliency Extension’s Straggler Detection functionality detects slow-performing ranks and terminates the training if the performance of any rank falls below a user-specified threshold. ### Use Straggler Detection Features[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#use-straggler-detection-features "Link to this heading") The package provides a PyTorch Lightning callback, which makes this feature easy to use with NeMo. We’ve provided a recipe for configuration of this callback. See the following usage example: from nemo import lightning as nl from nemo.collections import llm from nemo.collections.llm.recipes.callbacks import straggler_det_callback trainer = nl.Trainer() straggler_cb = straggler_det_callback() trainer.callbacks.append(straggler_cb) When using this feature, the training log should contain performance reports similar to the following: GPU relative performance: Worst performing 5/512 ranks: Rank=76 Node=h100-001-253-012 Score=0.94 Rank=13 Node=h100-001-010-003 Score=0.94 Rank=45 Node=h100-001-172-026 Score=0.94 Rank=433 Node=h100-004-141-026 Score=0.95 Rank=308 Node=h100-003-263-012 Score=0.95 Best performing 5/512 ranks: Rank=432 Node=h100-004-141-026 Score=0.99 Rank=376 Node=h100-004-005-003 Score=0.98 Rank=487 Node=h100-004-255-026 Score=0.98 Rank=369 Node=h100-004-004-033 Score=0.98 Rank=361 Node=h100-004-004-023 Score=0.98 GPU individual performance: Worst performing 5/512 ranks: Rank=76 Node=h100-001-253-012 Score=0.98 Rank=162 Node=h100-002-042-026 Score=0.98 Rank=79 Node=h100-001-253-012 Score=0.98 Rank=357 Node=h100-004-004-013 Score=0.98 Rank=85 Node=h100-001-253-026 Score=0.98 Best performing 5/512 ranks: Rank=297 Node=h100-003-095-026 Score=1.00 Rank=123 Node=h100-001-273-026 Score=1.00 Rank=21 Node=h100-001-010-013 Score=1.00 Rank=389 Node=h100-004-074-012 Score=1.00 Rank=489 Node=h100-004-269-026 Score=1.00 Straggler report processing time: 0.042 sec. If Weights and Biases logging is configured (e.g. by using `nemo.lightning.run.plugins.WandbPlugin`), the WandB run will contain plots for the minimum, maximum, and median scores across ranks, for both individual and relative performance. ![Image 1: _images/straggler_plots.png](https://docs.nvidia.com/nemo-framework/user-guide/latest/_images/straggler_plots.png) ### Default Settings[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#id2 "Link to this heading") The callback recipe exposes the following arguments: * `straggler_report_time_interval` is the performance score reporting frequency in seconds, with a default of 300 seconds. We do not see any significant impact on training throughput while training Llama 3.1 models on up to 1,000 H100 GPUs with `straggler_det_callback` enabled and the reporting time set to 300 seconds. Feel free to increase or decrease this frequency based on your workload and any observed overheads. * `stop_if_detected_straggler` decides whether to stop training if a straggler is detected. This is enabled to ensure that training is stopped if there are stragglers, but can be disabled by setting to False if training should proceed even with stragglers. 
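For example, both of these arguments can be overridden when the callback is created from the recipe. This is a minimal sketch using the argument names listed above; the rest of the trainer setup follows the earlier example:

```python
from nemo import lightning as nl
from nemo.collections.llm.recipes.callbacks import straggler_det_callback

trainer = nl.Trainer()
# Report performance scores every 10 minutes and keep training even if a straggler is found.
straggler_cb = straggler_det_callback(
    straggler_report_time_interval=600,
    stop_if_detected_straggler=False,
)
trainer.callbacks.append(straggler_cb)
```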
When using the callback recipe, both the individual GPU performance scores and the relative GPU performance scores are calculated and the top 5 scores for each are printed in the log, which is set by `num_gpu_perf_scores_to_print=5`. Also, a score below 0.7 means that the rank is a straggler, which is determined by `gpu_relative_perf_threshold` and `gpu_individual_perf_threshold`. This value of 0.7 is set based on the defaults in the nvidia-resiliency-extension package. If you would like more control over this behavior, you can always directly configure the `StragglerDetectionCallback`. Local Checkpointing[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#local-checkpointing "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------- Local checkpointing saves model checkpoints directly to storage on each node (e.g., local SSDs or RAM disks), instead of relying solely on a shared network filesystem. This approach can significantly speed up the saving process and reduce the load on shared storage infrastructure. Key features leveraged from the extension include: * Local Saving: Each node saves its part of the checkpoint locally. * Synchronous and Asynchronous Support: Saving can happen synchronously or asynchronously. In NeMo, this mirrors the configuration used for global checkpoints. * Automatic Cleanup: Handles the removal of outdated or incomplete local checkpoints. * Optional Replication: For multi-node jobs, checkpoints are replicated to other nodes (LazyCliqueReplicationStrategy) to allow recovery even if a node fails after saving. Single-node jobs do not use replication. * Automated Loading: When resuming, the framework automatically finds the latest valid checkpoint, comparing local and global checkpoints, and retrieves any needed parts across nodes. ### Use Local Checkpointing Features[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#use-local-checkpointing-features "Link to this heading") Note This integration currently only works with Megatron Core models and requires using the MegatronStrategy. To enable local checkpointing in NeMo, add the LocalCheckpointCallback from the Resiliency Extension to your PyTorch Lightning Trainer. from nemo import lightning as nl from nemo.lightning.pytorch.local_ckpt import update_trainer_local_checkpoint_io # Define a function to extract the iteration number from a globally saved checkpoint path def get_iteration_from_checkpoint(checkpoint_path: str) -> int: ... # Define the base directory for local checkpoints on each node's filesystem local_checkpoint_dir = "/path/to/local/node/storage/checkpoints" # Pass any additional kwargs to the update function # Trainer should have the local checkpoint callback added # e.g. trainer = nl.Trainer(callbacks=[LocalCheckpointCallback(every_n_train_steps=10)], ...) update_trainer_local_checkpoint_io(trainer, local_checkpoint_dir, get_iteration_from_checkpoint, **kwargs) # ... rest of the training Note An example implementation for extracting the iteration from a checkpoint path, suitable for use as `get_iteration_from_checkpoint`, can be found in [nemo/collections/llm/recipes/log/default.py](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/log/default.py) as `get_global_step_from_global_checkpoint_path`. This function is designed to work with the default NeMo checkpoint naming convention for recipes under the LLM collection. 
If you use a customized naming format, write a corresponding implementation. ### Configuration[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#configuration "Link to this heading") The primary configuration needed is the `local_checkpoint_base_dir` argument passed to `update_trainer_local_checkpoint_io`. This specifies the root directory _on each node’s local filesystem_ where checkpoints will be stored; on each node, the checkpoints are written to a `local_ckpt` subdirectory beneath this base directory. Ensure this path points to a fast local storage medium for best performance. Other aspects are configured automatically: * Asynchronous Saving: Local checkpoint saving will be asynchronous if asynchronous saving is enabled for global checkpoints (i.e., if `trainer.strategy.async_save` is True). * Replication: The replication strategy is automatically chosen based on the number of nodes used for training (Lazy Clique Replication for more than one node, none for a single node). Preemption[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#preemption "Link to this heading") ------------------------------------------------------------------------------------------------------------------------- Training a foundation model can take several hours or even days to complete. In some cases, training jobs must be halted preemptively due to cluster time limits, higher priority jobs, or other reasons. NeMo Framework provides functionality to gracefully perform a preemptive shutdown of training. This feature listens for a user-specified signal at the end of each training step. When the signal is sent, the job saves a checkpoint and exits. ### Use Preemption Features[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#use-preemption-features "Link to this heading") Warning: The `PreemptionPlugin` is currently only supported on Slurm-based clusters. To enable this feature for Slurm workloads, use the NeMo-Run plugin. Please note that [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) must be installed to use this plugin. The following example adds the plugin to the LLaMA3 8B recipe:

```python
import nemo_run as run

from nemo import lightning as nl
from nemo.collections import llm
from nemo.lightning.run.plugins import PreemptionPlugin

recipe = llm.llama3_8b.pretrain_recipe(name="llama3_with_preemption", ...)  # fill in other recipe arguments
executor = ...  # set up your NeMo-Run executor

run_plugins = [PreemptionPlugin()]
run.run(recipe, plugins=run_plugins, executor=executor)
```

The above plugin will configure a PyTorch Lightning callback to catch and handle a preemption signal. For non-Slurm workloads (e.g. training on a single device), you can directly configure this callback. See the following usage example:

```python
from nemo import lightning as nl
from nemo.lightning.pytorch.callbacks import PreemptionCallback

trainer = nl.Trainer()
trainer.callbacks.append(PreemptionCallback())
```

When the preemption signal is sent, the log should contain statements similar to the following:

```
Received Signals.SIGTERM death signal, shutting down workers
Sending process 404288 closing signal SIGTERM
Received signal 15, initiating graceful stop
Preemption detected, saving checkpoint and exiting
```

### Default Settings[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/resiliency.html.md#id3 "Link to this heading") The default signal the `PreemptionCallback` listens for is `SIGTERM` (set by `sig`), since this is the signal Slurm sends to all processes when the job time limit is reached. If your scheduler delivers a different signal, the callback can be pointed at it instead, as sketched below.
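This is a minimal sketch of overriding the `sig` argument mentioned above; the chosen signal (`SIGUSR1`) is only an illustration, and the trainer setup follows the example above:

```python
import signal

from nemo import lightning as nl
from nemo.lightning.pytorch.callbacks import PreemptionCallback

trainer = nl.Trainer()
# Listen for SIGUSR1 instead of the default SIGTERM described above.
trainer.callbacks.append(PreemptionCallback(sig=signal.SIGUSR1))
```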
The `PreemptionPlugin` is configured to send the `SIGTERM` signal 60 seconds before the actual job time limit (set by `preempt_time`) to ensure sufficient time for saving a checkpoint. You can adjust this as needed. --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/supported_methods.html.md Title: Supported PEFT Methods — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/supported_methods.html Published Time: Fri, 18 Jul 2025 19:27:33 GMT Markdown Content: Supported PEFT Methods[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/supported_methods.html.md#supported-peft-methods "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------- NeMo 2.0[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/supported_methods.html.md#nemo-2-0 "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------- NeMo 2.0 supports the following PEFT tuning methods: 1. **LoRA**: [LoRA: Low-Rank Adaptation of Large Language Models](http://arxiv.org/abs/2106.09685.md) * LoRA makes fine-tuning efficient by representing weight updates with two low-rank decomposition matrices. The original model weights remain frozen, while the low-rank decomposition matrices are updated to adapt to the new data, keeping the number of trainable parameters low. In contrast with adapters, the original model weights and the adapted weights can be combined at inference time, avoiding any architectural change or additional latency in the model. * In NeMo, you can customize the adapter bottleneck dimension and the target modules to apply LoRA (see the sketch at the end of this section). LoRA can be applied to any linear layer. In a transformer model, this includes 1) the Q, K, V attention projections, 2) the attention output projection layer, and 3) either or both of the two transformer MLP layers. For QKV, NeMo’s attention implementation fuses QKV into a single projection, so our LoRA implementation learns a single low-rank projection for QKV combined. 2. **DoRA**: [DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353.md) * DoRA decomposes the pre-trained weight into magnitude and direction. It learns a separate magnitude parameter while employing LoRA for directional updates, efficiently minimizing the number of trainable parameters. DoRA enhances both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA has been shown to consistently outperform LoRA on various downstream tasks. * In NeMo, DoRA leverages the same adapter structure as LoRA. NeMo adds support for Tensor Parallelism and Pipeline Parallelism for DoRA, enabling DoRA to be scaled to larger model variants.
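As a concrete illustration of the LoRA customization points mentioned above (the adapter bottleneck dimension and the target modules), the snippet below sketches how a LoRA transform can be configured through the NeMo 2.0 LLM collection. The module path and argument names (`llm.peft.LoRA`, `dim`, `target_modules`) reflect our understanding of the NeMo 2.0 PEFT API rather than anything stated on this page, so verify them against the current API reference:

```python
from nemo.collections import llm

# Apply LoRA only to the fused QKV projection and the attention output projection,
# with an adapter bottleneck dimension (rank) of 16.
lora = llm.peft.LoRA(
    dim=16,
    target_modules=["linear_qkv", "linear_proj"],
)

# The transform can then be passed to a fine-tuning entry point,
# e.g. llm.finetune(..., peft=lora).
```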
NeMo 1.0 (Legacy)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/supported_methods.html.md#nemo-1-0-legacy "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------- 1. **LoRA**: [LoRA: Low-Rank Adaptation of Large Language Models](http://arxiv.org/abs/2106.09685.md) 2. **QLoRA**: [QLoRA: Efficient Finetuning of Quantized LLMs](http://arxiv.org/abs/2305.14314.md) * Similar to LoRA, QLoRA keeps the original model weights frozen while introducing low-rank adapters for customization. However, QLoRA goes a step further by quantizing the frozen linear weights with a custom 4-bit data type called Normal Float 4 (NF4). The adapters are identical to those of LoRA and kept in BF16. * Compared to LoRA, QLoRA is up to 60% more memory-efficient, allowing for fine-tuning large models with fewer or smaller GPUs and/or a higher batch size. QLoRA is able to achieve the same accuracy, although a different convergence recipe may be required. However, the drawback is that QLoRA training is slower than LoRA by 50% to 200%. * For more details, please visit the NeMo QLoRA Guide. 3. **P-Tuning**: [GPT Understands, Too](https://arxiv.org/abs/2103.10385.md) * P-Tuning is an example of the prompt learning family of methods, in which trainable virtual tokens are inserted into the model input prompt to induce it to perform a task. Virtual tokens (also called “continuous” or “soft” tokens) are embeddings that have no concrete mapping to strings or characters within the model’s vocabulary. They are simply 1D vectors that match the dimensionality of the real tokens which make up the model’s vocabulary. * In P-Tuning, an intermediate MLP model is used to generate virtual token embeddings. We refer to this intermediate model as our `prompt_encoder`. The prompt encoder parameters are randomly initialized at the start of p-tuning. All base model parameters are frozen, and only the prompt encoder weights are updated at each training step. * In NeMo, you can customize the number of virtual tokens, as well as the embedding and MLP bottleneck dimensions. 4. **Adapters (Canonical)**: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751.md) * Adapters (Houlsby setup) is one of the first PEFT methods applied to NLP. Adapter tuning is more efficient than full fine-tuning because the base model weights are frozen, while only a small number of adapter module weights are updated. In this method, two linear layers with a bottleneck and a non-linear activation are inserted into each transformer layer via a residual connection. In each case, the output linear layer is initialized to 0 to ensure that an untrained adapter does not affect the normal forward pass of the transformer layer. * In NeMo, you can customize the adapter bottleneck dimension, the adapter dropout amount, as well as the type and position of the normalization layer. 5. **IA3**: [Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning](http://arxiv.org/abs/2205.05638.md) * IA3 makes fine-tuning efficient by rescaling activations with learned vectors. The rescaling layers are injected in the attention (for key and value) and feedforward modules in the base model. Similar to other PEFT methods, only the rescaling vectors are updated during fine-tuning to adapt to the new data, so the number of updated parameters is low.
However, since rescaling vectors are much smaller than low-rank matrices (LoRA) and bottleneck layers (Adapters), IA3 cuts down the number of trainable parameters further by an order of magnitude. The learned rescaling vectors can also be merged with the base weights, leading to no architectural change and no additional latency at inference time. * There is no hyperparameter to tune for the IA3 adapter. --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md Title: Use Auto Configurator to Find the Optimal Configuration# URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html Published Time: Fri, 18 Jul 2025 19:29:23 GMT Markdown Content: Auto Configurator searches for hyperparameters (HPs) that achieve the highest training throughput when working with Large Language Models (LLMs) utilizing the NeMo Framework. Note: Auto Configurator is supported for BERT, T5, and GPT-based models: GPT3, Llama, Mixtral, Mistral, Gemma, Nemotron, Starcoder, and Qwen. Auto Configurator Capabilities[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#auto-configurator-capabilities "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Auto Configurator is intended to iterate over different model configurations quickly and find the best configuration, that is, the configuration that minimizes both time and financial expenditure. It offers a range of features to facilitate this, as detailed in the list below. * Model size recommendation: finds the optimal model size if the parameter is not specified. * Training time estimation: estimates model training time based on input parameters. * Hyperparameters recommendation: finds the optimal set of hyperparameters to use for training. * Optimal configuration recommendation: calculates the performance after a short training of candidate configurations and finds the optimal model configuration. Note: Auto Configurator supports model size and hyperparameters recommendations only for pretrain mode. For finetune mode, the user is expected to specify lists of model parallelism parameters. ### Model Size Recommendation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#model-size-recommendation "Link to this heading") If you have not decided what model size you want to train, Auto Configurator can recommend a model size for your use case. If you know the number of GPUs, TFLOPS per GPU, the maximum time to train, and the number of tokens to train for, it can recommend a model size that can be trained with the specified hardware and time constraints.
For example, if you had 20 NVIDIA DGX nodes available (with 80 GB GPU memory), and wanted to train a GPT model for a maximum of 5 days, Auto Configurator would recommend using a 5B parameter GPT model. ### Training Time Estimation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#training-time-estimation "Link to this heading") Auto Configurator calculates the estimated training time for your model. It provides a projection of the training time in days, based on the input dataset and parameters you provide. ### Hyperparameters Recommendation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#hyperparameters-recommendation "Link to this heading") After Auto Configurator generates the base configuration, it searches over the critical hyperparameters that have a great impact on training throughput but do not affect model convergence. These hyperparameters include Tensor Parallelism (TP), Pipeline Parallelism (PP), Context Parallelism (CP), Expert Parallelism (EP), Micro Batch Size (MBS), and Activation Checkpointing Layers (ActCkpt). Auto Configurator will also provide an optimal Global Batch Size (GBS) if it’s not specified. Auto Configurator initially applies heuristics to identify suitable candidates for these key parameters, subsequently generating a grid of candidate configurations. It returns all of the candidate configurations in NeMo 2.0 format. Note: Some of the candidate configurations may not work due to high memory usage or other issues. Once the candidate configurations are generated, you can use NeMo Framework to launch the most promising candidates. When running the candidates on the cluster, you can limit job time and job max steps by using the `max_minutes_per_run` and `max_steps_per_run` parameters. During this search, the jobs will run with the number of nodes specified in the configuration files, using the `num_nodes` parameter. Once all of the jobs have finished running, you’ll need to run compare_throughput.py to get a .csv table with performance results for each successful job. ### Optimal Configuration Recommendation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#optimal-configuration-recommendation "Link to this heading") After all of the candidate jobs are done, Auto Configurator calculates performance parameters for each of the candidates. Auto Configurator generates two .csv files: one detailing the performance measures of the candidates and another listing the candidates that failed due to out-of-memory errors. ### Configurations Generation Example[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#configurations-generation-example "Link to this heading") The following list shows the required input parameters for the Auto Configurator runner: * `recipe`: model recipe based on NeMo 2.0. * `path_to_logs`: path to the directory where the logs will be stored. The following list shows the optional parameters for the Auto Configurator runner: * `mode`: a string, `pretrain` or `finetune` mode. * `tensor_parallel_sizes`: a list, such as `[1, 2, 4]`. * `pipeline_parallel_sizes`: a list, such as `[1, 2, 4]`. * `context_parallel_sizes`: a list, such as `[1, 2, 4]`. * `expert_parallel_sizes`: a list, such as `[1, 2, 4]`. * `micro_batch_sizes`: a list, such as `[1, 2, 4]`. * `min_model_parallel_size`: a value for the minimum desired parallelism. * `max_model_parallel_size`: a value for the maximum desired parallelism.
For each of the optional parameters, Auto Configurator will find the optimal value if the parameter is not specified. To view the full list of parameters, please visit [this page](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/tools/auto_configurator/runner.py.md#L68). We provide an example below on how to generate configurations for a given **pretrain** recipe:

```python
from functools import partial

from nemo.collections import llm
from nemo.collections.llm.tools.auto_configurator import AutoConfigurator, generate_configs

# Import the recipe and change the needed parameters
recipe = partial(llm.llama3_8b.pretrain_recipe, num_nodes=128, num_gpus_per_node=8)()
recipe.model.config.seq_length = recipe.data.seq_length = 8192
recipe.data.global_batch_size = 512

# Set this value to True if you want Auto Configurator
# to calculate the size for your model based on the given parameters.
# Auto Configurator will also change num_layers, num_attention_heads, hidden_size, ffn_hidden_size
# for the given recipe with respect to the calculated model size.
calculate_model_size = False

# Initialize the Auto Configurator runner
runner = AutoConfigurator(
    recipe=recipe,
    path_to_logs="/path/to/save/logs",
    gpu_memory_gb=80,
    tensor_parallel_sizes=[1, 2],
    pipeline_parallel_sizes="auto",
    context_parallel_sizes=[1, 2],
    micro_batch_sizes="auto",
    max_model_parallel_size=4,
    min_model_parallel_size=1,
    max_training_days=7,
    max_steps_per_run=50,
    max_minutes_per_run=20,
    num_tokens_in_b=840,
    vocab_size=32000,
    calculate_model_size=calculate_model_size,
)

# Generate configs (NeMo 2.0 recipes) with different model parallelism.
base_config, configs = generate_configs(runner)
```

Example output of the above script:

```
You can train a 8B parameter model in 4.34 days using 1024 GPUs. This result assumes you are training to 840B tokens, and each GPU achieves 140 TFLOPS.
Valid config: SeqLen=8192, GBS=512, MBS=1, TP=1, PP=2, CP=1, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=1, TP=1, PP=2, CP=2, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=1, TP=2, PP=1, CP=1, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=1, TP=2, PP=1, CP=2, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=1, TP=2, PP=2, CP=1, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=2, TP=2, PP=2, CP=1, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=1, TP=2, PP=2, CP=2, EP=1, VP=None. Adding to directory.
Valid config: SeqLen=8192, GBS=512, MBS=2, TP=2, PP=2, CP=2, EP=1, VP=None. Adding to directory.
All candidate configurations created correctly. Total number of configs: 8.
```

We also provide an example on how to generate configurations for a given **finetune** recipe:
```python
from functools import partial

from nemo.collections import llm
from nemo.collections.llm.tools.auto_configurator import AutoConfigurator, generate_configs

# Import the recipe and change the needed parameters
recipe = partial(
    llm.llama3_8b.finetune_recipe,
    num_nodes=16,
    num_gpus_per_node=8,
    dir='/path/to/pretrained/model',
    scheme='lora',
)()
recipe.model.config.seq_length = recipe.data.seq_length = 4096
recipe.data.global_batch_size = 128

# Initialize the Auto Configurator runner.
# Please make sure you specify all the model parallelism parameters,
# since 'finetune' mode doesn't support auto selection.
runner = AutoConfigurator(
    recipe=recipe,
    path_to_logs="/path/to/save/logs",
    mode="finetune",
    gpu_memory_gb=80,
    tensor_parallel_sizes=[1, 2, 4],
    pipeline_parallel_sizes=[1, 2],
    context_parallel_sizes=[1, 2],
    micro_batch_sizes=[1, 2, 4],
    max_model_parallel_size=8,
    min_model_parallel_size=1,
    max_steps_per_run=25,
    max_minutes_per_run=20,
    num_tokens_in_b=140,
    vocab_size=32000,
)

# Generate configs (NeMo 2.0 recipes) with different model parallelism.
base_config, configs = generate_configs(runner)
```

### Calculate Performance[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#calculate-performance "Link to this heading") We provide an example below on how to calculate the performance and get the optimal configuration after all the generated recipes have been trained:

```python
from nemo.collections.llm.tools.auto_configurator import get_results

# Generate the results
# The results will be saved in .csv format
get_results(
    base_config=base_config,               # base_config returned by the generate_configs function
    runner=runner,                         # runner which was used for config generation
    path_to_save="/path/to/save/results",  # Path where to save the results
    output_top_n=10,                       # Print out the top n configurations
    log_file_prefix="log",                 # Prefix of the log files
)
```

Example output of the above script:

```
All candidate configurations created correctly. Total number of configs: 3.
Top 3 configs sorted from fastest to slowest:
Config 1: 53.62 TFLOPS per GPU with 0.5400s per global step.
Config 2: 50.79 TFLOPS per GPU with 0.5700s per global step.
Config 3: 46.7 TFLOPS per GPU with 0.6200s per global step.
==================================================
Optimal config: llama_0.145b_1nodes_tp_1_pp_1_cp_1_ep_1_mbs_4_vp_None with 0.5400s per global step.
==================================================
The results were successfully saved to /home/llama_auto_conf.
```

### End-To-End Example[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/usingautoconfigurator.html.md#end-to-end-example "Link to this heading") To view an end-to-end example of how to generate candidate configs, train them, and calculate the performance using Auto Configurator with NeMo Framework, please visit [this page](https://github.com/NVIDIA/NeMo/blob/main/examples/llm/auto_configurator/auto_config.py.md). --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md Title: Audio-Vision Language Model — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html Published Time: Fri, 05 Sep 2025 19:01:37 GMT Markdown Content: Audio-Vision Language Model[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#audio-vision-language-model "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------------------------- AVLM (Audio-Vision Language Model) is an extension of the VLM (Vision Language Model) framework designed to handle both audio and visual modalities in a unified manner. It enables users to perform multimodal reasoning across audio and vision inputs, such as speech transcription, video captioning, and image-audio question answering.
We have extended the NeVA model to support AVLM by providing a separate audio encoder to process input speech and sound. To migrate from NeVA to AVLM, users need to: (1) replace the task encoder with AvlmTaskEncoder, which is designed to jointly encode audio and vision features while maintaining compatibility with the AVLM architecture; and (2) set up the AVLM model’s AVLMConfig architecture configuration, which is similar to NeVA’s NevaConfig but now supports both vision and audio encoders and projection layers. To get started with AVLM, follow these steps, which are similar to those for NeVA, with minor adjustments to accommodate audio inputs. We provide default training recipes for AVLM in [avlm_8b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/avlm/recipes/avlm_8b.py). These pretraining and fine-tuning recipes work together with the AVLM-8B model architecture configuration defined in [avlm.py](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/avlm/model/avlm.py). Additionally, users can refer to example scripts such as [avlm_pretrain.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_pretrain.py), which combines the model configuration, data setup, and training recipe in one file. Configure AVLM Model[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#configure-avlm-model "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------- We can configure each component of the AVLM model (language model, vision encoder, vision projector, audio encoder, audio projector), as shown in the default model architecture configuration in [avlm.py](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/avlm/model/avlm.py).

```python
@dataclass
class AVLMConfig8B(AVLMConfig):
    """
    Configuration class for the 8B parameter variant of the AVLM model.
    """

    from transformers import PretrainedConfig

    language_transformer_config: TransformerConfig = field(default_factory=lambda: Llama3Config8B())
    vision_transformer_config: Union[TransformerConfig, PretrainedConfig] = field(
        default_factory=lambda: HFCLIPVisionConfig(
            pretrained_model_name_or_path="openai/clip-vit-large-patch14-336",
        )
    )
    vision_projection_config: TransformerConfig = field(
        default_factory=lambda: MultimodalProjectorConfig(
            projector_type="mlp2x_gelu", input_size=1024, hidden_size=4096, ffn_hidden_size=4096
        )
    )
    audio_transformer_config: TransformerConfig = field(
        default_factory=lambda: ASRModuleConfig(
            _target_="nemo.collections.speechlm.modules.asr_module.ASRModuleConfig",
            use_hf_auto_model=True,
            hf_trust_remote_code=False,
            hf_load_pretrained_weights=True,
            pretrained_model="openai/whisper-large-v3",
            hidden_size=1280,
            target_module="model.encoder",
        )
    )
    audio_projection_config: TransformerConfig = field(
        default_factory=lambda: MultimodalProjectorConfig(
            projector_type="mlp2x_gelu", input_size=1280, hidden_size=4096, ffn_hidden_size=4096
        )
    )
```

NeMo 2.0 Modalities Alignment Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#nemo-2-0-modalities-alignment-recipes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Similar to the NeVA model, the first stage of training an AVLM model is aligning the modality embeddings with the text embeddings. This includes vision alignment and audio alignment.
Vision Alignment[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#vision-alignment "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------ In this step, we only tune the vision projection layers to align vision embeddings with text embeddings. This step uses image/video-text datasets, such as image/video captioning, image question answering, etc. The model’s components are initialized from pretrained models: Llama-3-8B for the language model, CLIP for the vision encoder, and Whisper-large-v3 for the audio encoder. The vision and audio projection layers are initialized from scratch. We can control which components will be trained through the freeze_modules argument. from nemo.collections import avlm finetune = avlm.avlm_8b.pretrain_recipe( name="avlm_8b_pretrain", dir=f"/path/to/avlm_vision_alignment_checkpoints", num_nodes=1, num_gpus_per_node=8, language_model_from_pretrained='/root/.cache/nemo/models/meta/llama3_8b', # Can be None or change based on local checkpoint path freeze_modules={ "freeze_language_model": True, "freeze_vision_model": True, "freeze_audio_model": True, "freeze_vision_projection": False, "freeze_audio_projection": True, } ) Audio Alignment[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#audio-alignment "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------- In this step, we only tune the audio projection layers to align audio embeddings with text embeddings. This step uses audio-text datasets, such as sound captioning, speech transcription, speech translation, etc. The model is initialized with the previous step’s checkpoint. from nemo.collections import avlm finetune = avlm.avlm_8b.pretrain_recipe( name="avlm_8b_pretrain", dir=f"/path/to/avlm_audio_alignment_checkpoints", num_nodes=1, num_gpus_per_node=8, checkpoint_path=f"/path/to/avlm_vision_alignment_checkpoints", # Can be None or change based on local checkpoint path freeze_modules={ "freeze_language_model": True, "freeze_vision_model": True, "freeze_audio_model": True, "freeze_vision_projection": True, "freeze_audio_projection": False, } ) NeMo 2.0 Fine-Tuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#nemo-2-0-fine-tuning-recipes "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------ Here, depending on our choices, we can tune all or some components of the model (vision encoder, audio encoder, vision projector, audio projector, language model). This stage uses datasets involving both modalities in one sample, such as video (with audio) captioning, image-audio question answering, etc. When fine-tuning, we can also enable PEFT (Parameter-Efficient Fine-Tuning) for efficient tuning of the large language model, avoiding out-of-memory problems when training all components of the AVLM model. Users can enable this by setting the peft_scheme argument to “lora”.
from nemo.collections import avlm finetune = avlm.avlm_8b.finetune_recipe( name="avlm_8b_finetune", dir=f"/path/to/avlm_multimodals_finetune_checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='none', # 'lora', 'none', checkpoint_path=f"/path/to/avlm_audio_alignment_checkpoints", freeze_modules={ "freeze_language_model": False, "freeze_vision_model": True, "freeze_audio_model": True, "freeze_vision_projection": False, "freeze_audio_projection": False, } ) Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides) to learn more about its configuration and execution system. Note The recipes use the `MockDataModule` for the `data` argument. You are expected to replace the `MockDataModule` with your custom dataset. Once you have your final configuration ready, you can execute it using any of the NeMo-Run supported executors. The simplest option is the local executor, which runs the pretraining locally in a separate process. You can use it as follows: import nemo_run as run run.run(finetune, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(finetune, direct=True) Use the Energon Dataloader[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#use-the-energon-dataloader "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------- Below is an example of how to set up the [Energon](https://github.com/NVIDIA/Megatron-Energon) data module for AVLM training. Note that, for AVLM’s Energon data module, we provide the vision and audio encoders’ configs to AVLMSampleConfig. This is so that, when processing data, we can directly calculate the exact number of tokens representing an image or audio clip and insert their corresponding placeholders into the input tokens in advance. This helps simplify the model’s logic when using certain parallelisms, such as pipeline and context parallelism.
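To make that placeholder-token calculation concrete, here is a rough, back-of-the-envelope illustration (not a NeMo API) using the default values from the sample configuration shown below; the 30-second audio clip length is an assumption matching the segment size Whisper encoders typically consume.

# Rough illustration only; variable names mirror the AVLMSampleConfig fields below.
img_width, img_height, patch_size = 336, 336, 14
image_placeholder_tokens = (img_width // patch_size) * (img_height // patch_size)  # 24 * 24 = 576

audio_seconds = 30.0          # assumed clip length (Whisper encoders consume 30 s segments)
window_stride = 0.01          # seconds of audio per encoder input frame
encoder_down_sampling = 2     # downsampling applied inside the encoder
audio_placeholder_tokens = int(audio_seconds / window_stride / encoder_down_sampling)  # 1500

With those counts known up front, the data module below can insert the right number of placeholder tokens when it builds each sample.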
from nemo.collections.avlm.data.energon import AVLMSampleConfig from nemo.collections.avlm import AVLMTaskEncoder from nemo.collections.avlm.data.energon import AVLMDataModule from nemo.collections.common.tokenizers.huggingface.auto_tokenizer import AutoTokenizer from transformers import AutoProcessor # Load the processor and tokenizer image_processor = AutoProcessor.from_pretrained("openai/clip-vit-large-patch14") tokenizer = AutoTokenizer("meta-llama/Meta-Llama-3-8B") # Paths and configuration data_path = "" # Example batch and sequence settings; adjust for your setup num_workers = 8 mbs = 1 gbs = 8 decoder_seq_length = 8192 # Define multimodal sample configuration avlm_sample_config = AVLMSampleConfig( audio_encoder_config={ "model_type": "whisper", "window_stride": 0.01, "sample_rate": 16000, "fixed_max_audio_length": None, "encoder_down_sampling": 2, }, image_encoder_config={ "model_type": "vit", "img_width": 336, "img_height": 336, "patch_size": 14, "projection_downsample_factor": None, }, ) # Initialize the AVLM task encoder task_encoder = AVLMTaskEncoder( tokenizer=tokenizer, audio_processor=None, image_processor=image_processor, multimodal_sample_config=avlm_sample_config, ) # Create the data module data = AVLMDataModule( path=data_path, num_workers=num_workers, micro_batch_size=mbs, global_batch_size=gbs, seq_length=decoder_seq_length, tokenizer=tokenizer, audio_processor=None, image_processor=image_processor, multimodal_sample_config=avlm_sample_config, task_encoder=task_encoder, ) Replace the `MockDataModule` in the default recipes with the above data. from nemo.collections import avlm # Define the fine-tuning recipe finetune = avlm.avlm_8b.finetune_recipe( name="avlm_8b_finetune", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='none', # 'lora', 'none' ) # Assign the above data module to the finetuning recipe finetune.data = data We have also included additional example scripts to further customize AVLM training: * **Pretraining**: [avlm_pretrain.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_pretrain.py) * **Generation**: [avlm_generation.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_generate.py) * **NeMo Run**: [avlm_nemo_run.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_nemo_run.py) These scripts allow for flexible and comprehensive training workflows tailored to your requirements.
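Putting the pieces on this page together, the following sketch attaches the Energon data module built above to a LoRA fine-tuning run and launches it with the local executor. The paths are placeholders, and peft_scheme="lora" is the PEFT option described in the fine-tuning section; everything else reuses the APIs already shown on this page.

import nemo_run as run
from nemo.collections import avlm

# Build the fine-tuning recipe with LoRA enabled instead of full fine-tuning.
finetune = avlm.avlm_8b.finetune_recipe(
    name="avlm_8b_finetune_lora",
    dir="/path/to/checkpoints",   # placeholder output directory
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme="lora",
)

# Swap in the AVLMDataModule created in the Energon example above.
finetune.data = data

# Launch locally in a separate process, as shown earlier on this page.
run.run(finetune, executor=run.LocalExecutor())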
Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/avlm.html.md#use-the-energon-dataloader) - [avlm_8b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/avlm/recipes/avlm_8b.py) - [avlm.py](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/avlm/model/avlm.py) - [avlm_pretrain.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_pretrain.py) - [documentation](https://github.com/NVIDIA/NeMo-Run/tree/main/docs/source/guides) - [Energon](https://github.com/NVIDIA/Megatron-Energon) - [avlm_generation.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_generate.py) - [avlm_nemo_run.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/avlm/avlm_nemo_run.py) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md Title: Gemma 3 Models — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html Published Time: Thu, 30 Oct 2025 07:07:34 GMT Markdown Content: Gemma 3 Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md#gemma-3-models "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------- Gemma 3 introduces powerful and efficient open models in 1B/4B/12B/27B sizes, available as both large language models (LLMs) and vision-language models (VLMs). It is optimized for deployment across cloud, edge, and mobile devices. It builds on the transformer decoder architecture with improvements like grouped-query attention, advanced positional embeddings, and training techniques aligned with the Gemini family. More details are available in Google’s official release. * **Gemma3 1B**: A 1B parameter text-only model. * **Gemma3 4B/12B/27B**: A 4B/12B/27B parameter model with vision encoder. **Resources:** * **Hugging Face Gemma Collection:**[HF Gemma collection](https://huggingface.co/collections/google/googles-gemma-models-family-675bfd70e574a62dd0e406bd) * **Google Gemma Source Code:**[GitHub Repository](https://github.com/google/gemma_pytorch) Import from Hugging Face to NeMo 2.0[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md#import-from-hugging-face-to-nemo-2-0 "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ To import the Hugging Face (HF) model and convert it to NeMo 2.0 format, run the following command. This step only needs to be performed once: To convert Gemma 1B (LLM Only) model, or if you only want to convert the LLM component of the 4B/12B/27B model: from nemo.collections import llm if __name__ == '__main__': # Specify the Hugging Face model ID (e.g., Gemma3 1B Instruct Model) hf_model_id = 'google/gemma-3-1b-it' # Import the model and convert to NeMo 2.0 format llm.import_ckpt( model=llm.Gemma3Model(config=llm.Gemma3Config1B()), source=f"hf://{hf_model_id}", ) To convert Gemma 4B/12B/27B VLM model: from nemo.collections import llm, vlm if __name__ == '__main__': # Specify the Hugging Face model ID (e.g., Gemma3 4B Instruct Model) hf_model_id = 'google/gemma-3-4b-it' # Import the model and convert to NeMo 2.0 format llm.import_ckpt( model=vlm.Gemma3VLModel(config=vlm.Gemma3VLConfig4B()), source=f"hf://{hf_model_id}", ) The command above saves the converted file in the NeMo cache folder, located at: `~/.cache/nemo`. 
If needed, you can change the default cache directory by setting the `NEMO_CACHE_DIR` environment variable before running the script. NeMo 2.0 Gemma 3 Scripts[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md#nemo-2-0-gemma-3-scripts "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------ The scripts for working with Gemma 3 models within the NeMo Framework are located in [scripts/vlm/](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm) and [scripts/llm/](https://github.com/NVIDIA/NeMo/tree/main/scripts/llm) . * **gemma3vl_generate.py**: Performs inference (generation) using a fine-tuned or pre-converted Gemma 3 NeMo 2.0 VLM model. **Usage:** python scripts/vlm/gemma3vl_generate.py \ --local_model_path= * **gemma3_generate.py**: Performs inference (generation) using a fine-tuned or pre-converted Gemma 3 NeMo 2.0 LLM 1B model. **Usage:** python scripts/llm/gemma3_generate.py * **Multi-Node Usage (Example with SLURM and Pyxis):** The following example demonstrates how to run text generation inference on 4 nodes with 8 GPUs each (total 32 GPUs) using SLURM. It assumes a containerized environment managed by Pyxis. srun --mpi=pmix --no-kill \ --container-image \ --container-mounts \ -N 4 --ntasks-per-node=8 -p --pty \ bash -c " \ python scripts/vlm/gemma3vl_generate.py \ --local_model_path= \ --tp 8 \ --pp 4 \ " * **gemma3vl_finetune.py**: Fine-tunes a Gemma 3 4B model on a given dataset. **Usage:** torchrun --nproc_per_node=2 scripts/vlm/gemma3vl_finetune.py * **gemma3_pretrain.py**: Pretrains a Gemma 3 1B LLM model from scratch. **Usage:** torchrun scripts/llm/gemma3_pretrain.py NeMo 2.0 Fine-Tuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md#nemo-2-0-fine-tuning-recipes "Link to this heading") -------------------------------------------------------------------------------------------------------------------------------------------------------------- We provide pre-defined recipes for fine-tuning Gemma 3 VLM models (Gemma3VLModel) using NeMo 2.0 and [NeMo-Run](https://github.com/NVIDIA/NeMo-Run). These recipes configure a `run.Partial` for one of the [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) api functions introduced in NeMo 2.0. The recipes use the `Gemma3VLMockDataModule` for the `data` argument by default. The recipes are hosted in [gemma3vl_4b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/gemma3vl_4b.py), [gemma3vl_12b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/gemma3vl_12b.py), and [gemma3vl_27b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/gemma3vl_27b.py) files. The Gemma 3 1B LLM model recipe is hosted in [gemma3_1b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma3_1b.py). Note The recipes use the `Gemma3VLMockDataModule` for the `data` argument. You are expected to replace the `Gemma3VLMockDataModule` with your custom dataset module. By default, the instruct version of the model is loaded. 
To load a different model, set resume_path args in the recipe We provide an example below on how to invoke the default recipe and override the data argument: from nemo.collections import vlm, llm # Get the fine-tuning recipe function (adjust for the specific Gemma VLM model) finetune = vlm.gemma3vl_4b.finetune_recipe( name="gemma3_vl_4b_finetune", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # or 'none' for full fine-tuning ) # Finetune LLM recipe for Gemma finetune = llm.gemma3_1b.finetune_recipe( name="gemma3_llm_1b_finetune", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # or 'none' for full fine-tuning ) By default, the fine-tuning recipe applies LoRA to all linear layers in the language model, including cross-attention layers, while keeping the vision model unfrozen. * **To configure which layers to apply LoRA**: Set `finetune.peft.target_modules`. For example, to apply LoRA only on the self-attention qkv projection layers, set `finetune.peft.target_modules=["*.language_model.*.linear_qkv"]`. * **To freeze the vision model**: Set `finetune.peft.freeze_vision_model=True`. * **To fine-tune the entire model without LoRA**: Set `peft_scheme='none'` in the recipe argument. Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run [documentation](https://docs.nvidia.com/nemo/run/latest/guides/) to learn more about its configuration and execution system. Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows: import nemo_run as run run.run(finetune, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(finetune, direct=True) ### Bring Your Own Data[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md#bring-your-own-data "Link to this heading") Replace the `Gemma3VLMockDataModule` in default recipes with your custom dataset module. Please refer to the [Data Preparation to Use Megatron-Energon Dataloader](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/energondataprep.html.md#energondataprep) for how to prepare your llava-like data for fine-tuning. 
from nemo.collections import vlm # Import your custom Gemma 3 data module and necessary configs from nemo.collections.vlm.data.data_module import EnergonDataModule from nemo.collections.vlm.gemma3vl.data.task_encoder import TaskEncoder as Gemma3VLTaskEncoder from nemo.collections.vlm.gemma3vl.data.task_encoder import TaskEncoderConfig as Gemma3VLTaskEncoderConfig # Define the fine-tuning recipe using the appropriate Gemma 3 recipe (adjust name if needed) finetune = vlm.recipes.gemma3vl_4b.finetune_recipe( name="gemma3vl_4b_finetune", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='lora', # or 'none' ) # Example custom dataset configuration (replace with your actual data setup) # Gemma 3 VL specific data configuration might be required here task_encoder = Gemma3VLTaskEncoder( config=Gemma3VLTaskEncoderConfig( hf_path='google/gemma-3-4b-it', # Use the appropriate model path ) ) custom_data = EnergonDataModule( path="/path/to/energon/dataset", # Path to your Energon dataset train_encoder=task_encoder, seq_length=8192, # Adjust as needed global_batch_size=16, # Adjust based on GPU memory micro_batch_size=1, # Adjust based on GPU memory num_workers=8, # Adjust based on system capabilities ) # Assign custom data to the fine-tuning recipe finetune.data = custom_data A comprehensive list of recipes that we currently support or plan to support soon is provided below for reference:
| Recipe | Status |
| --- | --- |
| Gemma 3 VLM 4B Pretrain/LoRA/Full Fine-tuning | Yes |
| Gemma 3 VLM 12B Pretrain/LoRA/Full Fine-tuning | Yes |
| Gemma 3 VLM 27B Pretrain/LoRA/Full Fine-tuning | Yes |
| Gemma 3 LLM 1B Pretrain/LoRA/Full Fine-tuning | Yes |
Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/gemma3.html.md#bring-your-own-data) - [HF Gemma collection](https://huggingface.co/collections/google/googles-gemma-models-family-675bfd70e574a62dd0e406bd) - [GitHub Repository](https://github.com/google/gemma_pytorch) - [scripts/vlm/](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm) - [scripts/llm/](https://github.com/NVIDIA/NeMo/tree/main/scripts/llm) - [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) - [nemo.collections.llm](https://docs.nvidia.com/nemo-framework/user-guide/nemo-2.0/index.html) - [gemma3vl_4b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/gemma3vl_4b.py) - [gemma3vl_12b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/gemma3vl_12b.py) - [gemma3vl_27b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/gemma3vl_27b.py) - [gemma3_1b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/gemma3_1b.py) - [documentation](https://docs.nvidia.com/nemo/run/latest/guides/) - [Data Preparation to Use Megatron-Energon Dataloader](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/energondataprep.html.md#energondataprep) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md Title: LLaVA-Next — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html Published Time: Thu, 30 Oct 2025 07:07:34 GMT Markdown Content: LLaVA-Next[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#llava-next "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------- LLaVA-Next is an extension of the LLaVA model designed to handle high-resolution images
efficiently through tiling. This enables users to work with larger image sizes for improved accuracy in various VL tasks. For more details about LLaVA-Next, refer to the [LLaVA-Next blog](https://llava-vl.github.io/blog/2024-01-30-llava-next/). We have extended the NeVA model to support LLaVA-Next. Users can easily switch to LLaVA-Next for high-resolution image tiling support with minimal configuration changes. To switch to LLaVA-Next from NeVA (LLaVA), replace the task encoder with LlavaNextTaskEncoder. It is designed to handle image tiling, supporting the LLaVA-Next architecture. To get started with LLaVA-Next, follow these steps, which are similar to NeVA but with minor modifications. Import from Hugging Face to NeMo 2.0[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#import-from-hugging-face-to-nemo-2-0 "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- The following script downloads the checkpoint for LLM (vicuna - 7b) and converts it to NeMo format. The converted checkpoint is stored in the NeMo cache folder at: `~/.cache/nemo`. For example, when used with the NeMo container, the full path is `/root/.cache/nemo/models/lmsys/vicuna-7b-v1.5/`. The checkpoint can be used to initialize the LLM for pretraining LlaVa-Next. from nemo.collections.llm import import_ckpt from nemo.collections.llm import Llama2Config7B, LlamaModel if __name__ == "__main__": # Specify the Hugging Face model ID hf_model_id = "lmsys/vicuna-7b-v1.5" # Import the model and convert to NeMo 2.0 format import_ckpt( model=LlamaModel(Llama2Config7B()), source=f"hf://{hf_model_id}", ) This step is optional and is intended for users who want to fine-tune the LLaVA-Next model starting from a pretrained checkpoint from Hugging Face. To run the script, save it as import_llava_next.py and then execute it: python import_llava_next.py from nemo.collections.llm import import_ckpt from nemo.collections import vlm if __name__ == '__main__': # Specify the Hugging Face model ID hf_model_id = "llava-hf/llava-v1.6-vicuna-7b-hf" # Import the model and convert to NeMo 2.0 format import_ckpt( model=vlm.LlavaNextModel(vlm.LlavaNextConfig7B()), # Model configuration source=f"hf://{hf_model_id}", # Hugging Face model source ) NeMo 2.0 Pretraining Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#nemo-2-0-pretraining-recipes "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Similar to the NeVA model, we provide some default recipes for pretraining LLaVA-NEXT [llava_next_7b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/llava_next_7b.py). 
from nemo.collections import vlm finetune = vlm.llava_next_7b.pretrain_recipe( name="llava_next_7b_pretrain", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, language_model_from_pretrained='/root/.cache/nemo/models/lmsys/vicuna-7b-v1.5/', # Can be None or change based on local checkpoint path ) NeMo 2.0 Fine-Tuning Recipes[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#nemo-2-0-fine-tuning-recipes "Link to this heading") ----------------------------------------------------------------------------------------------------------------------------------------------------------------- We also provide a fine-tuning recipe - [llava_next_7b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/llava_next_7b.py) that you can use. from nemo.collections import vlm finetune = vlm.llava_next_7b.finetune_recipe( name="llava_next_7b_finetune", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='none', # 'lora', 'none' ) Note The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run [documentation](https://docs.nvidia.com/nemo/run/latest/guides/) to learn more about its configuration and execution system. Note The recipes use the `MockDataModule` for the `data` argument. You are expected to replace the `MockDataModule` with your custom dataset. Once you have your final configuration ready, you can execute it using any of the NeMo-Run supported executors. The simplest option is the local executor, which runs the pretraining locally in a separate process. You can use it as follows: import nemo_run as run run.run(finetune, executor=run.LocalExecutor()) Additionally, you can also run it directly in the same Python process as follows: run.run(finetune, direct=True) Use the Energon Dataloader[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#use-the-energon-dataloader "Link to this heading") ------------------------------------------------------------------------------------------------------------------------------------------------------------- Below is an example of how to set up the [Energon](https://github.com/NVIDIA/Megatron-Energon) data module for LLaVA-Next training: from nemo.collections.multimodal.data.energon.config import MultiModalSampleConfig from nemo.collections.vlm import LlavaNextTaskEncoder from nemo.collections.multimodal.data.energon import EnergonMultiModalDataModule from transformers import AutoProcessor # Load the processor from the pretrained LLaVA-Next model processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf") # Paths and configuration data_path = "" image_processor = processor.image_processor tokenizer = processor.tokenizer # Define multimodal sample configuration multimodal_sample_config = MultiModalSampleConfig() # Initialize the LLaVA-Next task encoder task_encoder = LlavaNextTaskEncoder( tokenizer=tokenizer, image_processor=image_processor, multimodal_sample_config=multimodal_sample_config, ) # Create the data module data = EnergonMultiModalDataModule( path=data_path, tokenizer=tokenizer, image_processor=image_processor, num_workers=8, micro_batch_size=4, global_batch_size=32, multimodal_sample_config=multimodal_sample_config, task_encoder=task_encoder, ) Replace the `MockDataModule` in the default recipes with the above data. 
from nemo.collections import vlm # Define the fine-tuning recipe finetune = vlm.llava_next_7b.finetune_recipe( name="llava_next_7b_finetune", dir=f"/path/to/checkpoints", num_nodes=1, num_gpus_per_node=8, peft_scheme='none', # 'lora', 'none' ) # Assign the above data module to the finetuning recipe finetune.data = data We have also included additional example scripts to further customize LLaVA-NeXT training: * **Pretraining**: [llava_next_pretrain.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_pretrain.py) * **Finetuning**: [llava_next_finetune.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_finetune.py) * **Generation**: [llava_next_generation.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_generation.py) * **NeMo Run**: [llava_next_nemo_run.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_nemo_run.py) These scripts allow for flexible and comprehensive training workflows tailored to your requirements. Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/vlms/llavanext.html.md#use-the-energon-dataloader) - [LLaVA-Next blog](https://llava-vl.github.io/blog/2024-01-30-llava-next/) - [llava_next_7b](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/vlm/recipes/llava_next_7b.py) - [documentation](https://docs.nvidia.com/nemo/run/latest/guides/) - [Energon](https://github.com/NVIDIA/Megatron-Energon) - [llava_next_pretrain.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_pretrain.py) - [llava_next_finetune.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_finetune.py) - [llava_next_generation.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_generation.py) - [llava_next_nemo_run.py](https://github.com/NVIDIA/NeMo/tree/main/scripts/vlm/llava_next_nemo_run.py) --- # Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md Title: Why NeMo Framework? — NVIDIA NeMo Framework User Guide URL Source: https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html Published Time: Fri, 05 Sep 2025 18:59:42 GMT Markdown Content: Why NeMo Framework?[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#why-nemo-framework "Link to this heading") ---------------------------------------------------------------------------------------------------------------------------------------- Developing deep learning models for Gen AI is a complex process, encompassing the design, construction, and training of models across specific domains. Achieving high accuracy requires extensive experimentation, fine-tuning for diverse tasks and domain-specific datasets, ensuring optimal training performance, and preparing models for deployment. NeMo simplifies this intricate development landscape through its modular approach. It introduces neural modules—logical blocks of AI applications with typed inputs and outputs—facilitating the seamless construction of models by chaining these blocks based on neural types. This methodology accelerates development, improves model accuracy on domain-specific data, and promotes modularity, flexibility, and reusability within AI workflows. Further enhancing its utility, NeMo provides collections of modules designed for core tasks in speech recognition, natural language processing, and speech synthesis. It supports the training of new models or fine-tuning of existing pre-trained modules, leveraging pre-trained weights to expedite the training process. 
The framework encompasses models trained and optimized for multiple languages, including Mandarin, and offers extensive tutorials for conversational AI development across these languages. NeMo’s emphasis on interoperability with other research tools broadens its applicability and ease of use. Large Language Models & Multimodal (LLM & MM)[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#large-language-models-multimodal-llm-mm "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NeMo excels in training large-scale LLM & MM, utilizing optimizations from the Megatron Core library and Transformer Engine to deliver state-of-the-art performance. It includes a comprehensive feature set for large-scale training: * Supports Multi-GPU and Multi-Node computing to enable scalability. * Precision options including FP32/TF32, FP16, BF16, and TransformerEngine/FP8. * Parallelism strategies: Data parallelism, Tensor parallelism, Pipeline parallelism, Interleaved Pipeline parallelism, Sequence parallelism and Context parallelism, Distributed Optimizer, and Fully Shared Data Parallel. * Optimized utilities such as Flash Attention, Activation Recomputation, and Communication Overlap. * Advanced checkpointing through the Distributed Checkpoint Format. Speech AI[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#speech-ai "Link to this heading") --------------------------------------------------------------------------------------------------------------------- ### Data Augmentation[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#data-augmentation "Link to this heading") Augmenting ASR data is essential but can be time-consuming during training. NeMo advocates for offline dataset preprocessing to conserve training time, illustrated in a tutorial covering speed perturbation and noise augmentation techniques. ### Speech Data Explorer[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#speech-data-explorer "Link to this heading") A Dash-based tool for interactive exploration of ASR/TTS datasets, providing insights into dataset statistics, utterance inspections, and error analysis. Installation instructions for this tool are available in NeMo’s GitHub repository. ### Using Kaldi Formatted Data[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#using-kaldi-formatted-data "Link to this heading") NeMo supports Kaldi-formatted datasets, enabling the development of models with existing Kaldi data by substituting the AudioToTextDataLayer with the KaldiFeatureDataLayer. ### Speech Command Recognition[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#speech-command-recognition "Link to this heading") Specialized training for speech command recognition is covered in a dedicated NeMo Jupyter notebook, guiding users through the process of training a QuartzNet model on a speech commands dataset. 
General Optimizations[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#general-optimizations "Link to this heading") --------------------------------------------------------------------------------------------------------------------------------------------- ### Mixed Precision Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#mixed-precision-training "Link to this heading") Utilizing NVIDIA’s Apex AMP, mixed precision training enhances training speeds with minimal precision loss, especially on hardware equipped with Tensor Cores. ### Multi-GPU Training[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#multi-gpu-training "Link to this heading") NeMo enables multi-GPU training, substantially reducing training durations for large models. This section clarifies the advantages of mixed precision and the distinctions between multi-GPU and multi-node training. ### NeMo, PyTorch Lightning, and Hydra[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#nemo-pytorch-lightning-and-hydra "Link to this heading") Integrating PyTorch Lightning for training efficiency and Hydra for configuration management, NeMo streamlines conversational AI research by organizing PyTorch code and automating training workflows. ### Optimized Pretrained Models[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#optimized-pretrained-models "Link to this heading") Through NVIDIA GPU Cloud (NGC), NeMo offers a collection of optimized, pre-trained models for various conversational AI applications, facilitating easy integration into research projects and providing a head start in conversational AI development. Resources[#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#resources "Link to this heading") --------------------------------------------------------------------------------------------------------------------- Ensure you are familiar with the following resources for NeMo. * Developer blogs * [How to Build Domain Specific Automatic Speech Recognition Models on GPUs](https://developer.nvidia.com/blog/how-to-build-domain-specific-automatic-speech-recognition-models-on-gpus.md/) * [Develop Smaller Speech Recognition Models with NVIDIA’s NeMo Framework](https://developer.nvidia.com/blog/develop-smaller-speech-recognition-models-with-nvidias-nemo-framework.md/) * [Neural Modules for Fast Development of Speech and Language Models](https://developer.nvidia.com/blog/neural-modules-for-speech-language-models.md/) * Domain specific, transfer learning, Docker container with Jupyter Notebooks * [Domain Specific NeMo ASR Application](https://ngc.nvidia.com/catalog/containers/nvidia:nemo_asr_app_img) Links/Buttons: - [#](https://docs.nvidia.com/nemo-framework/user-guide/latest/why-nemo.html.md#resources) - [How to Build Domain Specific Automatic Speech Recognition Models on GPUs](https://developer.nvidia.com/blog/how-to-build-domain-specific-automatic-speech-recognition-models-on-gpus.md/) - [Develop Smaller Speech Recognition Models with NVIDIA’s NeMo Framework](https://developer.nvidia.com/blog/develop-smaller-speech-recognition-models-with-nvidias-nemo-framework.md/) - [Neural Modules for Fast Development of Speech and Language Models](https://developer.nvidia.com/blog/neural-modules-for-speech-language-models.md/) - [Domain Specific NeMo ASR Application](https://ngc.nvidia.com/catalog/containers/nvidia:nemo_asr_app_img)