# LMDeploy

## Pages

- [inference pipeline](api-pipeline.md): inference pipeline
- [On Other Platforms](get-started.md): On Other Platforms
- [Welcome to LMDeploy's tutorials!](index.md): Welcome to LMDeploy's tutorials!
- [Vision-Language Models](multi-modal.md): Vision-Language Models
- [Customized chat template](advance-chat-template.md): The effect of the applied chat template can be observed by setting the log level to `INFO`.
- [Context Parallel](advance-context-parallel.md): When the memory on a single GPU is insufficient to deploy a model, it is often deployed using tensor parallelism (TP)...
- [How to debug Turbomind](advance-debug-turbomind.md): Turbomind is implemented in C++, which is not as easy to debug as Python. This document provides basic methods for de...
- [Context length extrapolation](advance-long-context.md): Long text extrapolation refers to the ability of an LLM to handle data longer than the training text during inference. T...
- [Production Metrics](advance-metrics.md): LMDeploy exposes a set of metrics via Prometheus and provides visualization via Grafana.
- [PyTorchEngine Multi-Node Deployment Guide](advance-pytorch-multinodes.md): To support larger-scale model deployment requirements, PyTorchEngine provides multi-node deployment support. Below ar...
- [PyTorchEngine Multithread](advance-pytorch-multithread.md): We have removed `thread_safe` mode from PytorchEngine since [PR2907](https://github.com/InternLM/lmdeploy/pull/2907)....
- [lmdeploy.pytorch New Model Support](advance-pytorch-new-model.md): lmdeploy.pytorch is designed to simplify the support for new models and the development of prototypes. Users can adap...
- [PyTorchEngine Profiling](advance-pytorch-profiling.md): We provide multiple profilers to analyze the performance of PyTorchEngine.
- [Speculative Decoding](advance-spec-decoding.md): Speculative decoding is an optimization technique that introduces a lightweight draft model to propose multiple next t...
- [Structured output](advance-structed-output.md): Structured output, also known as guided decoding, forces the model to generate text that exactly matches a user-suppl...
- [Update Weights](advance-update-weights.md): LMDeploy supports updating model weights online for scenarios such as RL training. Here are the steps to do so.
- [TurboMind Benchmark on A100](benchmark-a100-fp16.md): All the following results are tested on A100-80G (x8) with CUDA 11.8.
- [Benchmark](benchmark-benchmark.md): Please install the lmdeploy precompiled package and download the script and the test dataset:
- [Model Evaluation Guide](benchmark-evaluate-with-opencompass.md): This document describes how to evaluate a model's capabilities on academic datasets using OpenCompass and LMDeploy. T...
- [Multi-Modal Model Evaluation Guide](benchmark-evaluate-with-vlmevalkit.md): This document describes how to evaluate multi-modal models' capabilities using VLMEvalKit and LMDeploy.
- [FAQ](faq.md): There is probably a cached mmengine on your local host. Try installing its latest version.
- [Get Started with Huawei Ascend](get-started-ascend-get-started.md): We currently support running lmdeploy on **Atlas 800T A3, Atlas 800T A2 and Atlas 300I Duo**.
- [Cambricon](get-started-camb-get-started.md): The usage of lmdeploy on a Cambricon device is almost the same as its usage on CUDA with PytorchEngine in lmdeploy.
- [Quick Start](get-started-get-started.md): This tutorial shows the usage of LMDeploy on the CUDA platform:
- [Installation](get-started-installation.md): LMDeploy is a Python library for compressing, deploying, and serving Large Language Models (LLMs) and Vision-Language ...
- [MetaX-tech](get-started-maca-get-started.md): The usage of lmdeploy on a MetaX-tech device is almost the same as its usage on CUDA with PytorchEngine in lmdeploy.
- [Load huggingface model directly](inference-load-hf.md): Starting from v0.1.0, Turbomind adds the ability to pre-process the model parameters on-the-fly while loading them fr...
- [Architecture of lmdeploy.pytorch](inference-pytorch.md): `lmdeploy.pytorch` is an inference engine in LMDeploy that offers a developer-friendly framework to users interested ...
- [Architecture of TurboMind](inference-turbomind.md): TurboMind is an inference engine that supports high-throughput inference for conversational LLMs. It's based on NVIDI...
- [TurboMind Config](inference-turbomind-config.md): TurboMind is one of the inference engines of LMDeploy. When using it to do model inference, you need to convert the i...
- [OpenAI Compatible Server](llm-api-server.md): This article primarily discusses the deployment of a single LLM across multiple GPUs on a single node, providin... (see the client sketch after this list)
- [Serving LoRA](llm-api-server-lora.md): LoRA is currently only supported by the PyTorch backend. Its deployment process is similar to that of other models, a...
- [Reasoning Outputs](llm-api-server-reasoning.md): For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)...
- [Tools Calling](llm-api-server-tools.md): LMDeploy supports tools for InternLM2, InternLM2.5, llama3.1 and Qwen2.5 models. Please use `--tool-call-parser` to s...
- [codellama](llm-codellama.md): [codellama](https://github.com/facebookresearch/codellama) features enhanced coding capabilities. It can generate cod...
- [Offline Inference Pipeline](llm-pipeline.md): In this tutorial, we will present a list of examples to introduce the usage of `lmdeploy.pipeline`. A minimal usage sketch appears after this list.
- [Request Distributor Server](llm-proxy-server.md): The request distributor service can parallelize multiple api_server services. Users only need to access the proxy URL...
- [OpenAI Compatible Server](multi-modal-api-server-vl.md): This article primarily discusses the deployment of a single large vision-language model across multiple GPUs on a sin...
- [CogVLM](multi-modal-cogvlm.md): CogVLM is a powerful open-source visual language model (VLM). LMDeploy supports CogVLM-17B models like [THUDM/cogvlm-...
- [DeepSeek-VL2](multi-modal-deepseek-vl2.md): DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves...
- [Gemma3](multi-modal-gemma3.md): Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technolo...
- [InternVL](multi-modal-internvl.md): LMDeploy supports the following InternVL series of models, which are detailed in the table below:
- [LLaVA](multi-modal-llava.md): LMDeploy supports the following llava series of models, which are detailed in the table below:
- [MiniCPM-V](multi-modal-minicpmv.md): LMDeploy supports the following MiniCPM-V series of models, which are detailed in the table below:
- [Mllama](multi-modal-mllama.md): [Llama3.2-VL](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf) is a family of large l...
- [Molmo](multi-modal-molmo.md): LMDeploy supports the following molmo series of models, which are detailed in the table below:
- [Phi-3 Vision](multi-modal-phi3.md): [Phi-3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) is a family of small language an...
- [Qwen2.5-VL](multi-modal-qwen2-5-vl.md): LMDeploy supports the following Qwen-VL series of models, which are detailed in the table below:
- [Qwen2-VL](multi-modal-qwen2-vl.md): LMDeploy supports the following Qwen-VL series of models, which are detailed in the table below:
- [Offline Inference Pipeline](multi-modal-vl-pipeline.md): LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipe...
- [InternLM-XComposer-2.5](multi-modal-xcomposer2d5.md): [InternLM-XComposer-2.5](https://github.com/InternLM/InternLM-XComposer) excels in various text-image comprehension a...
- [INT4/INT8 KV Cache](quantization-kv-quant.md): Since v0.4.0, LMDeploy has supported **online** key-value (kv) cache quantization with int4 and int8 numerical precis... (see the `quant_policy` sketch after this list)
- [AWQ/GPTQ](quantization-w4a16.md): LMDeploy TurboMind engine supports the inference of 4-bit quantized models that are quantized both by [AWQ](https://ar...
- [SmoothQuant](quantization-w8a8.md): LMDeploy provides functions for quantization and inference of large language models using 8-bit integers (INT8). For G...
- [Reward Models](supported-models-reward-models.md): LMDeploy supports reward models, which are detailed in the table below:
- [Supported Models](supported-models-supported-models.md): The following tables detail the models supported by LMDeploy's TurboMind engine and PyTorch engine across different p...
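
The [Offline Inference Pipeline](llm-pipeline.md) pages revolve around `lmdeploy.pipeline`. A minimal sketch of its usage, assuming lmdeploy is installed and using `internlm/internlm2_5-7b-chat` as a stand-in model ID (any supported model works):

```python
# Minimal offline inference with lmdeploy.pipeline.
# The model ID below is an assumption; substitute any supported model.
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline('internlm/internlm2_5-7b-chat')
gen_config = GenerationConfig(max_new_tokens=256, temperature=0.8)
responses = pipe(['Hi, please introduce yourself', 'Shanghai is'],
                 gen_config=gen_config)
for resp in responses:
    print(resp.text)
```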
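The [OpenAI Compatible Server](llm-api-server.md) pages serve models via `lmdeploy serve api_server`. A hedged sketch of querying such a server with the official `openai` client, assuming a server was already launched on LMDeploy's default port 23333 with the same stand-in model:

```python
# Query an LMDeploy OpenAI-compatible server.
# Assumes a server is already running, e.g.:
#   lmdeploy serve api_server internlm/internlm2_5-7b-chat --server-port 23333
from openai import OpenAI

client = OpenAI(api_key='none',  # any key works unless one is configured
                base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id  # name of the served model
response = client.chat.completions.create(
    model=model_name,
    messages=[{'role': 'user', 'content': 'Hello! How are you?'}],
    temperature=0.8)
print(response.choices[0].message.content)
```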
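[INT4/INT8 KV Cache](quantization-kv-quant.md) enables online kv-cache quantization through the engine config's `quant_policy` field. A minimal sketch, assuming `quant_policy=8` selects int8 and `quant_policy=4` selects int4, again with a stand-in model ID:

```python
# Enable online int8 kv-cache quantization via quant_policy.
# quant_policy=8 -> int8 kv cache; quant_policy=4 -> int4 kv cache.
from lmdeploy import TurbomindEngineConfig, pipeline

engine_config = TurbomindEngineConfig(quant_policy=8)
pipe = pipeline('internlm/internlm2_5-7b-chat',
                backend_config=engine_config)
print(pipe(['Hi, please introduce yourself'])[0].text)
```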