# Qdrant Documentation

> Documentation for Qdrant
>
> Source: https://qdrant.tech/llms-full.txt

## Overall Summary

> Qdrant is a cutting-edge platform focused on delivering exceptional performance and efficiency in vector similarity search. As a robust vector database, it specializes in managing, searching, and retrieving high-dimensional vector data, essential for enhancing AI applications, machine learning, and modern search engines. With a suite of powerful features such as state-of-the-art hybrid search capabilities, retrieval-augmented generation (RAG) applications, and dense and sparse vector support, Qdrant stands out as an industry leader. Its offerings include managed cloud services, enabling users to harness the robust functionality of Qdrant without the burden of maintaining infrastructure. The platform supports advanced data security measures and seamless integrations with popular platforms and frameworks, catering to diverse data handling and analytic needs. Additionally, Qdrant offers comprehensive solutions for complex searching requirements through its innovative Query API and multivector representations, allowing for precise matching and enhanced retrieval quality. With its commitment to open-source principles and continuous innovation, Qdrant tailors solutions to meet both small-scale projects and enterprise-level demands efficiently, helping organizations unlock profound insights from their unstructured data and optimize their AI capabilities.

---

# Backups

To create a one-time backup, create a `QdrantClusterSnapshot` resource:

```yaml
apiVersion: qdrant.io/v1
kind: QdrantClusterSnapshot
metadata:
  name: "qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-timestamp"
  labels:
    cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840"
    customer-id: "acme-industries"
spec:
  cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840"
  retention: 1h
```

You can also create a recurring backup with the `QdrantClusterScheduledSnapshot` resource:

```yaml
apiVersion: qdrant.io/v1
kind: QdrantClusterScheduledSnapshot
metadata:
  name: "qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-timestamp"
  labels:
    cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840"
    customer-id: "acme-industries"
spec:
  scheduleShortId: a7d8d973
  cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840"
  # every hour
  schedule: "0 * * * *"
  retention: 1h
```

To restore from a backup, create a `QdrantClusterRestore` resource:

```yaml
apiVersion: qdrant.io/v1
kind: QdrantClusterRestore
metadata:
  name: "qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-restore-01"
  labels:
    cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840"
    customer-id: "acme-industries"
spec:
  source:
    snapshotName: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-timestamp
    namespace: qdrant-private-cloud
  destination:
    name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840
    namespace: qdrant-private-cloud
```

Note that for all of these resources, the `cluster-id` and `customer-id` labels must be set to the values of the corresponding `QdrantCluster` resource.
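The resources above are usually applied with `kubectl apply -f`. If you prefer to create them programmatically, a minimal sketch with the official Kubernetes Python client could look like the following. The resource plural (`qdrantclustersnapshots`) and the target namespace are assumptions based on the usual CRD naming convention and the restore example above, so verify them against your installation:

```python
from kubernetes import client, config

# Load the local kubeconfig (use config.load_incluster_config() inside a cluster).
config.load_kube_config()

# Same one-time snapshot as the YAML example above, expressed as a dict.
snapshot = {
    "apiVersion": "qdrant.io/v1",
    "kind": "QdrantClusterSnapshot",
    "metadata": {
        "name": "qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-snapshot-timestamp",
        "labels": {
            "cluster-id": "a7d8d973-0cc5-42de-8d7b-c29d14d24840",
            "customer-id": "acme-industries",
        },
    },
    "spec": {
        "cluster-id": "a7d8d973-0cc5-42de-8d7b-c29d14d24840",
        "retention": "1h",
    },
}

api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="qdrant.io",
    version="v1",
    namespace="qdrant-private-cloud",   # assumption: namespace from the restore example above
    plural="qdrantclustersnapshots",    # assumption: lowercase plural of the CRD kind
    body=snapshot,
)
```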
---

# Benchmarks F.A.Q.

## Are we biased?

Probably, yes. Even if we try to be objective, we are not experts in using all the existing vector databases. We build Qdrant and know the most about it. Because of that, we could have missed some important tweaks in different vector search engines.

However, we tried our best, kept scrolling the docs up and down, experimented with combinations of different configurations, and gave all of them an equal chance to stand out. If you believe you can do it better than us, our **benchmarks are fully [open-sourced](https://github.com/qdrant/vector-db-benchmark), and contributions are welcome**!

## What do we measure?

There are several factors to consider when deciding which database to use. Of course, some of them support a different subset of functionalities, and those might be a key factor in the decision. But in general, we all care about search precision, speed, and the resources required to achieve them.

There is one important thing: **the speed of vector databases should be compared only if they achieve the same precision**. Otherwise, they could maximize the speed factors by providing inaccurate results, which everybody would rather avoid. Thus, our benchmark results are compared only at a specific search precision threshold.

## How do we select hardware?

In our experiments, we are not focusing on the absolute values of the metrics but rather on a relative comparison of different engines. What matters is that we used the same machine for all the tests. It was simply wiped between launches of different engines.

We selected an average machine, which you can easily rent from almost any cloud provider. No extra quota or custom configuration is required.

## Why are we not comparing with FAISS or Annoy?

Libraries like FAISS provide a great tool for experimenting with vector search. But they are far away from real usage in production environments. If you are using FAISS in production, in the best case, you never need to update it in real time. In the worst case, you have to create your own custom wrapper around it to support CRUD, high availability, horizontal scalability, concurrent access, and so on.

Some vector search engines even use FAISS under the hood, but a search engine is much more than just an indexing algorithm.

We do, however, use the same benchmark datasets as the famous [ann-benchmarks project](https://github.com/erikbern/ann-benchmarks), so you can align your expectations for any practical reasons.
### Why did we decide to test with the Python client?

There is no consensus when it comes to the best technology to run benchmarks. You're free to choose Go, Java, or Rust-based systems. But there are two main reasons for us to use Python for this:

1. While generating embeddings, you're most likely going to use Python and Python-based ML frameworks.
2. Based on GitHub stars, Python clients are among the most popular clients across all the engines.

From the user's perspective, the crucial thing is the latency perceived while using a specific library - in most cases a Python client. Nobody can, nor should, redefine their whole technology stack just because of a specific search tool. That's why we decided to focus primarily on the official Python libraries provided by the database authors. Those may use different protocols under the hood, but at the end of the day, we do not care how the data is transferred, as long as it ends up in the target location.

## What about closed-source SaaS platforms?

Some vector databases are available only as SaaS, so we couldn't test them on the same machine as the rest of the systems. That would make the comparison unfair. That's why we focused purely on testing open-source vector databases, so everybody can reproduce the benchmarks easily.

This is not the final list, and we'll continue benchmarking as many different engines as possible.

## How to reproduce the benchmark?

The source code is available on [GitHub](https://github.com/qdrant/vector-db-benchmark) and has a `README.md` file describing the process of running the benchmark for a specific engine.

## How to contribute?

We made the benchmark open source because we believe that it has to be transparent. We could have misconfigured one of the engines or just done it inefficiently. If you feel like you could help us out, check out our [benchmark repository](https://github.com/qdrant/vector-db-benchmark).

---

# High-Performance Vector Search at Scale

Powering the next generation of AI applications with advanced, open-source vector similarity search technology.

[Get Started](https://cloud.qdrant.io/signup) [Learn More](https://qdrant.tech/qdrant-vector-database/) [Star us (24.2k)](https://github.com/qdrant/qdrant)

![Hero image: an astronaut looking at a dark hole from the planet surface.](https://qdrant.tech/img/hero-home-illustration-x1.png)

Qdrant Powers Thousands of Top AI Solutions. [Customer Stories](https://qdrant.tech/customers/)

## AI Meets Advanced Vector Search

The leading open-source vector database and similarity search engine, designed to handle high-dimensional vectors for performance and massive-scale AI applications.

[All features](https://qdrant.tech/qdrant-vector-database/)

[**Cloud-Native Scalability & High-Availability**: Enterprise-grade Managed Cloud.
Vertical and horizontal scaling and zero-downtime upgrades. Qdrant Cloud](https://qdrant.tech/cloud/)

[**Ease of Use & Simple Deployment**: Quick deployment in any environment with Docker and a lean API for easy integration, ideal for local testing. Quick Start Guide](https://qdrant.tech/documentation/quick-start/)

[**Cost Efficiency with Storage Options**: Dramatically reduce memory usage with built-in compression options and offload data to disk. Quantization](https://qdrant.tech/documentation/guides/quantization/)

[**Rust-Powered Reliability & Performance**: Purpose-built in Rust for unmatched speed and reliability even when processing billions of vectors. Benchmarks](https://qdrant.tech/benchmarks/)

### Our Customers' Words

[Customer Stories](https://qdrant.tech/customers/)

![Cognizant](https://qdrant.tech/img/brands/cognizant.svg) "We LOVE Qdrant! The exceptional engineering, strong business value, and outstanding team behind the product drove our choice. Thank you for your great contribution to the technology community!"
Kyle Tobin, Principal, Cognizant

![Hubspot](https://qdrant.tech/img/brands/hubspot.svg) "Qdrant powers our demanding recommendation and RAG applications. We chose it for its ease of deployment and high performance at scale, and have been consistently impressed with its results."
Srubin Sethu Madhavan, Technical Lead II at Hubspot

![Bayer](https://qdrant.tech/img/brands/bayer.svg) "VectorStores are definitely here to stay, the objects in the world around us from image, sound, video and text become easily universal and searchable thanks to the embedding models. I personally recommend Qdrant. We have been using it for a while and couldn't be happier."
Hooman Sedghamiz, Director AI/ML, Bayer

![CB Insights](https://qdrant.tech/img/brands/cb-insights.svg) "We looked at all the big options out there right now for vector databases, with our focus on ease of use, performance, pricing, and communication. **Qdrant came out on top in each category... ** ultimately, it wasn't much of a contest."
Alex Webb, Director of Engineering, CB Insights

![Bosch](https://qdrant.tech/img/brands/bosch.svg) "With Qdrant, we found the missing piece to develop our own provider independent multimodal generative AI platform on enterprise scale."
Jeremy T. & Daly Singh, Generative AI Expert & Product Owner, Bosch

See what our community is saying on our [Vector Space Wall](https://testimonial.to/qdrant/all)

## Integrations

Qdrant integrates with all leading [embeddings](https://qdrant.tech/documentation/embeddings/) and [frameworks](https://qdrant.tech/documentation/frameworks/).

[See Integrations](https://qdrant.tech/documentation/frameworks/)

### Deploy Qdrant locally with Docker

Get started with our [Quick Start Guide](https://qdrant.tech/documentation/quick-start/), or our main [GitHub repository](https://github.com/qdrant/qdrant).

```bash
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
```

## Vectors in Action

Turn embeddings or neural network encoders into full-fledged applications for matching, searching, recommending, and more.

#### Advanced Search

Elevate your apps with advanced search capabilities. Qdrant excels in processing high-dimensional data, enabling nuanced similarity searches, and understanding semantics in depth. Qdrant also handles multimodal data with fast and accurate search algorithms. [Learn More](https://qdrant.tech/advanced-search/)

#### Recommendation Systems

Create highly responsive and personalized recommendation systems with tailored suggestions. Qdrant's Recommendation API offers great flexibility, featuring options such as the best-score recommendation strategy. This enables new scenarios of using multiple vectors in a single query to impact result relevancy. [Learn More](https://qdrant.tech/recommendations/)

#### Retrieval Augmented Generation (RAG)

Enhance the quality of AI-generated content. Leverage Qdrant's efficient nearest-neighbor search and payload filtering features for retrieval-augmented generation. You can then quickly access relevant vectors and integrate a vast array of data points. [Learn More](https://qdrant.tech/rag/)

#### Data Analysis and Anomaly Detection

Transform your approach to data analysis and anomaly detection. Leverage vectors to quickly identify patterns and outliers in complex datasets. This ensures robust and real-time anomaly detection for critical applications. [Learn More](https://qdrant.tech/data-analysis-anomaly-detection/)

#### AI Agents

Unlock the full potential of your AI agents with Qdrant's powerful vector search and scalable infrastructure, allowing them to handle complex tasks, adapt in real time, and drive smarter, data-driven outcomes across any environment. [Learn More](https://qdrant.tech/ai-agents/)

### Get started for free

Turn embeddings or neural network encoders into full-fledged applications for matching, searching, recommending, and more.
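If you started Qdrant locally with the Docker commands in the quick-start snippet above, a minimal sketch for confirming the instance is reachable with the Python client (assuming the `qdrant-client` package is installed) might look like this:

```python
from qdrant_client import QdrantClient

# Connect to the local instance exposed on port 6333 by the Docker command above.
client = QdrantClient(url="http://localhost:6333")

# List collections; a fresh instance simply returns an empty list.
print(client.get_collections())
```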
[Get Started](https://cloud.qdrant.io/signup)

---

# What is FastEmbed?

FastEmbed is a lightweight Python library built for embedding generation. It supports popular embedding models and offers a user-friendly experience for embedding data into vector space. By using FastEmbed, you can ensure that your embedding generation process is not only fast and efficient but also highly accurate, meeting the needs of various machine learning and natural language processing applications.

FastEmbed easily integrates with Qdrant for a variety of multimodal search purposes.

## How to get started with FastEmbed

| Beginner | Advanced |
| --- | --- |
| [Generate Text Embeddings with FastEmbed](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/) | [Combine FastEmbed with Qdrant for Vector Search](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/) |

## Why is FastEmbed useful?

- Light: Unlike other inference frameworks, such as PyTorch, FastEmbed requires very few external dependencies. Because it uses the ONNX runtime, it is perfect for serverless environments like AWS Lambda.
- Fast: By using ONNX, FastEmbed ensures high-performance inference across various hardware platforms.
- Accurate: FastEmbed aims for better accuracy and recall than models like OpenAI's `Ada-002`. It always uses models that demonstrate strong results on the MTEB leaderboard.
- Support: FastEmbed supports a wide range of models, including multilingual ones, to meet diverse use case needs.

---

# Qdrant Hybrid Cloud

Seamlessly deploy and manage your vector database across diverse environments, ensuring performance, security, and cost efficiency for AI-driven applications.

[Qdrant Hybrid Cloud](https://qdrant.tech/hybrid-cloud/) integrates Kubernetes clusters from any setting - cloud, on-premises, or edge - into a unified, enterprise-grade managed service. You can use [Qdrant Cloud's UI](https://qdrant.tech/documentation/cloud/create-cluster/) to create and manage your database clusters, while they still remain within your infrastructure. **All Qdrant databases will operate solely within your network, using your storage and compute resources.
All user data will stay securely within your environment and won't be accessible by the Qdrant Cloud platform, or anyone else outside your organization.**

Qdrant Hybrid Cloud ensures data privacy, deployment flexibility, low latency, and cost savings, elevating the standard for vector search and AI applications.

**How it works:** Qdrant Hybrid Cloud relies on Kubernetes and works with any standards-compliant Kubernetes distribution. When you onboard a Kubernetes cluster as a Hybrid Cloud Environment, you can deploy the Qdrant Kubernetes Operator and Cloud Agent into this cluster. These will manage Qdrant databases within your Kubernetes cluster and establish an outgoing connection to Qdrant Cloud to transport telemetry and receive management instructions. You can then benefit from the same cloud management features and telemetry that are available with any managed Qdrant Cloud cluster.

**Setup instructions:** To begin using Qdrant Hybrid Cloud, [read our installation guide](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/).

## Hybrid Cloud architecture

The Hybrid Cloud onboarding will install a Kubernetes Operator and Cloud Agent into your Kubernetes cluster.

The Cloud Agent will establish an outgoing connection to `cloud.qdrant.io` on port `443` to transport telemetry and receive management instructions. It will also interact with the Kubernetes API through a ServiceAccount to create, read, update and delete the necessary Qdrant CRs (Custom Resources) based on the configuration set up in the Qdrant Cloud Console.

The Qdrant Kubernetes Operator will manage the Qdrant databases within your Kubernetes cluster. Based on the Qdrant CRs, it will interact with the Kubernetes API through a ServiceAccount to create and manage the necessary resources to deploy and run Qdrant databases, such as Pods, Services, ConfigMaps, and Secrets.

Both components' access is limited to the Kubernetes namespace that you chose during the onboarding process.

The Cloud Agent only sends telemetry data and status information to the Qdrant Cloud platform. It does not send any user data or sensitive information. The telemetry data includes:

- The health status and resource (CPU, memory, disk and network) usage of the Qdrant databases and Qdrant control plane components.
- Information about the Qdrant databases, such as the number, name and configuration of collections, the number of vectors, the number of queries, and the number of indexing operations.
- Telemetry and notification data from the Qdrant databases.
- Kubernetes operations and scheduling events reported for the Qdrant databases and Qdrant control plane components.

After the initial onboarding, the lifecycle of these components will be controlled by the Qdrant Cloud platform via the built-in Helm controller.

You don't need to expose your Kubernetes cluster to the Qdrant Cloud platform, you don't need to open any ports for incoming traffic, and you don't need to provide any Kubernetes or cloud provider credentials to the Qdrant Cloud platform.

![hybrid-cloud-architecture](https://qdrant.tech/blog/hybrid-cloud/hybrid-cloud-architecture.png)
---

# About Qdrant Managed Cloud

Qdrant Managed Cloud is our SaaS (software-as-a-service) solution, providing managed Qdrant database clusters on the cloud. We provide you with the same fast and reliable similarity search engine, but without the need to maintain your own infrastructure.

Transitioning to the Managed Cloud version of Qdrant does not change how you interact with the service. All you need is a [Qdrant Cloud account](https://qdrant.to/cloud/) and an [API key](https://qdrant.tech/documentation/cloud/authentication/) for each request.

You can also attach your own infrastructure as a Hybrid Cloud Environment. For details, see our [Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/) documentation.

## Cluster Configuration

Each database cluster comes pre-configured with the following tools, features, and support services:

- Allows the creation of highly available clusters with automatic failover.
- Supports upgrades to later versions of Qdrant as they are released.
- Upgrades are zero-downtime on highly available clusters.
- Includes monitoring and logging to observe the health of each cluster.
- Horizontally and vertically scalable.
- Available natively on AWS, GCP, and Azure.
- Available on your own infrastructure and other providers if you use the Hybrid Cloud.

---

# Migration

Migrating data between vector databases, especially across regions, platforms, or deployment types, can be a hassle. That's where the [Qdrant Migration Tool](https://github.com/qdrant/migration) comes in. It supports a wide range of migration needs, including transferring data between Qdrant instances and migrating from other vector database providers to Qdrant.
You can run the migration tool on any machine that has connectivity to both the source and the target Qdrant databases. Direct connectivity between the two databases is not required. For optimal performance, run the tool on a machine with a fast network connection and low latency to both databases.

In this tutorial, we will learn how to use the migration tool and walk through a practical example of migrating from other vector databases to Qdrant.

## Why use this instead of Qdrant's Native Snapshotting?

Qdrant supports [snapshot-based backups](https://qdrant.tech/documentation/concepts/snapshots/), which are low-level disk operations built for same-cluster recovery or local backups. These snapshots:

- Require snapshot consistency across nodes.
- Can be hard to port across machines or cloud zones.

On the other hand, the Qdrant Migration Tool:

- Streams data in live batches.
- Can resume interrupted migrations.
- Works even while data is being inserted.
- Supports collection reconfiguration (e.g., changing replication or quantization settings).
- Supports migrating from other vector DBs (Pinecone, Chroma, Weaviate, etc.).

## How to Use the Qdrant Migration Tool

You can run the tool via Docker. Installation:

```shell
docker pull registry.cloud.qdrant.io/library/qdrant-migration
```

Here is an example of how to perform a Qdrant-to-Qdrant migration:

```bash
docker run --rm -it \
    -e SOURCE_API_KEY='your-source-key' \
    -e TARGET_API_KEY='your-target-key' \
    registry.cloud.qdrant.io/library/qdrant-migration qdrant \
    --source-url 'https://source-instance.cloud.qdrant.io' \
    --source-collection 'benchmark' \
    --target-url 'https://target-instance.cloud.qdrant.io' \
    --target-collection 'benchmark'
```

## Example: Migrate from Pinecone to Qdrant

Let's now walk through an example of migrating from Pinecone to Qdrant. Assuming your Pinecone index looks like this:

![Pinecone Dashboard showing index details](https://qdrant.tech/documentation/guides/pinecone-index.png)

The information you need from Pinecone is:

- Your Pinecone API key
- The index name
- The index host URL

With that information, you can migrate your vector database from Pinecone to Qdrant with the following command:

```bash
docker run --net=host --rm -it registry.cloud.qdrant.io/library/qdrant-migration pinecone \
    --pinecone.index-host 'https://sample-movies-efgjrye.svc.aped-4627-b74a.pinecone.io' \
    --pinecone.index-name 'sample-movies' \
    --pinecone.api-key 'pcsk_7Dh5MW_…' \
    --qdrant.url 'https://5f1a5c6c-7d47-45c3-8d47-d7389b1fad66.eu-west-1-0.aws.cloud.qdrant.io:6334' \
    --qdrant.api-key 'eyJhbGciOiJIUzI1NiIsInR5c…' \
    --qdrant.collection 'sample-movies' \
    --migration.batch-size 64
```

When the migration is complete, you will see the new collection in Qdrant with all the vectors.

## Conclusion

The **Qdrant Migration Tool** makes data transfer across vector database instances effortless. Whether you're moving between cloud regions, upgrading from self-hosted to Qdrant Cloud, or switching from other databases such as Pinecone, this tool saves you hours of manual effort.
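After a run like the Pinecone example above finishes, you may want to confirm that everything arrived in the target cluster. Here is a small sketch with the Python client; the collection name is taken from the example, while the URL and API key are placeholders for your own cluster:

```python
from qdrant_client import QdrantClient

# Placeholders: use your own Qdrant Cloud URL and API key.
client = QdrantClient(
    url="https://your-cluster.cloud.qdrant.io",
    api_key="your-qdrant-api-key",
)

# Exact point count in the migrated collection.
count = client.count(collection_name="sample-movies", exact=True)
print(f"Migrated points: {count.count}")
```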
[Try it today](https://github.com/qdrant/migration).

---

# Interfaces

Qdrant supports these "official" clients.

> **Note:** If you are using a language that is not listed here, you can use the REST API directly or generate a client for your language
> using the [OpenAPI](https://github.com/qdrant/qdrant/blob/master/docs/redoc/master/openapi.json)
> or [protobuf](https://github.com/qdrant/qdrant/tree/master/lib/api/src/grpc/proto) definitions.

## Client Libraries

| | Client Repository | Installation | Version |
| --- | --- | --- | --- |
| [![python](https://qdrant.tech/docs/misc/python.webp)](https://python-client.qdrant.tech/) | **[Python](https://github.com/qdrant/qdrant-client)** + **[(Client Docs)](https://python-client.qdrant.tech/)** | `pip install qdrant-client[fastembed]` | [Latest Release](https://github.com/qdrant/qdrant-client/releases) |
| ![typescript](https://qdrant.tech/docs/misc/ts.webp) | **[JavaScript / TypeScript](https://github.com/qdrant/qdrant-js)** | `npm install @qdrant/js-client-rest` | [Latest Release](https://github.com/qdrant/qdrant-js/releases) |
| ![rust](https://qdrant.tech/docs/misc/rust.png) | **[Rust](https://github.com/qdrant/rust-client)** | `cargo add qdrant-client` | [Latest Release](https://github.com/qdrant/rust-client/releases) |
| ![golang](https://qdrant.tech/docs/misc/go.webp) | **[Go](https://github.com/qdrant/go-client)** | `go get github.com/qdrant/go-client` | [Latest Release](https://github.com/qdrant/go-client/releases) |
| ![.net](https://qdrant.tech/docs/misc/dotnet.webp) | **[.NET](https://github.com/qdrant/qdrant-dotnet)** | `dotnet add package Qdrant.Client` | [Latest Release](https://github.com/qdrant/qdrant-dotnet/releases) |
| ![java](https://qdrant.tech/docs/misc/java.webp) | **[Java](https://github.com/qdrant/java-client)** | [Available on Maven Central](https://central.sonatype.com/artifact/io.qdrant/client) | [Latest Release](https://github.com/qdrant/java-client/releases) |

## API Reference

All interaction with Qdrant takes place via the REST API. We recommend using the REST API if you are using Qdrant for the first time or if you are working on a prototype.
| API | Documentation |
| --- | --- |
| REST API | [OpenAPI Specification](https://api.qdrant.tech/api-reference) |
| gRPC API | [gRPC Documentation](https://github.com/qdrant/qdrant/blob/master/docs/grpc/docs.md) |

### gRPC Interface

The gRPC methods follow the same principles as REST. For each REST endpoint, there is a corresponding gRPC method.

As per the [configuration file](https://github.com/qdrant/qdrant/blob/master/config/config.yaml), the gRPC interface is available on the specified port:

```yaml
service:
  grpc_port: 6334
```

Running the service inside Docker will look like this:

```bash
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant
```

**When to use gRPC:** The choice between gRPC and the REST API is a trade-off between convenience and speed. gRPC is a binary protocol and can be more challenging to debug. We recommend using gRPC if you are already familiar with Qdrant and are trying to optimize the performance of your application.

---

# Single node benchmarks (2022)

August 23, 2022

Results below are for the deep-image-96-angular dataset; reported metrics include RPS, latency, p95 latency, and index time.

| Engine | Setup | Dataset | Upload Time (m) | Upload + Index Time (m) | Latency (ms) | P95 (ms) | P99 (ms) | RPS | Precision |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qdrant | qdrant-rps-m-64-ef-512 | deep-image-96-angular | 14.096 | 149.32 | 24.73 | 55.75 | 63.73 | 1541.86 | 0.96 |
| weaviate | weaviate-m-16-ef-128 | deep-image-96-angular | 148.70 | 148.70 | 190.94 | 351.75 | 414.16 | 507.33 | 0.94 |
| milvus | milvus-m-16-ef-128 | deep-image-96-angular | 6.074 | 35.28 | 171.50 | 220.26 | 236.97 | 339.44 | 0.97 |
| elastic | elastic-m-16-ef-128 | deep-image-96-angular | 87.54 | 101.16 | 923.031 | 1116.83 | 1671.31 | 95.90 | 0.97 |

_Download raw data: [here](https://qdrant.tech/benchmarks/result-2022-08-10.json)_

This is an archived version of the single node benchmarks. Please refer to the new version [here](https://qdrant.tech/benchmarks/single-node-speed-benchmark/).
---

# How to Effectively Use Multivector Representations in Qdrant for Reranking

Multivector representations are one of the most powerful features of Qdrant. However, most people don't use them effectively, resulting in massive RAM overhead, slow inserts, and wasted compute. In this tutorial, you'll discover how to effectively use multivector representations in Qdrant.

## What are Multivector Representations?

In most vector engines, each document is represented by a single vector - an approach that works well for short texts but often struggles with longer documents. Single-vector representations pool the token-level embeddings, which inevitably loses some information.

Multivector representations offer a more fine-grained alternative, where a single document is represented by multiple vectors, often at the token or phrase level. This enables more precise matching between specific query terms and relevant parts of the document. Matching is especially effective in Late Interaction models like [ColBERT](https://qdrant.tech/documentation/fastembed/fastembed-colbert/), which retain token-level embeddings and perform the interaction at query time to produce relevance scores.

![Multivector Representations](https://qdrant.tech/documentation/advanced-tutorials/multivectors.png)

As you will see later in the tutorial, Qdrant supports multivectors, and thus late interaction models, natively.

## Why Token-level Vectors are Useful

With token-level vectors, models like ColBERT can match specific query tokens to the most relevant parts of a document, enabling high-accuracy retrieval through Late Interaction.

In late interaction, each document is converted into multiple token-level vectors instead of a single vector. The query is also tokenized and embedded into multiple vectors. Then, the query and document vectors are matched using a similarity function: MaxSim. You can see how it is calculated [here](https://qdrant.tech/documentation/concepts/vectors/#multivectors).

In traditional retrieval, the query and document are each converted into single embeddings, after which similarity is computed. This is an early interaction, because the information is compressed before retrieval.

## What is Rescoring, and Why is it Used?

Rescoring is a two-stage process:

- Retrieve relevant documents using a fast model.
- Rerank them using a more accurate but slower model such as ColBERT.
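To make the MaxSim scoring mentioned above concrete, here is a small illustrative NumPy sketch (shapes and values are toy data, not tied to any particular model): every query token keeps only its best-matching document token, and those maxima are summed.

```python
import numpy as np

def max_sim(query_vectors: np.ndarray, doc_vectors: np.ndarray) -> float:
    """MaxSim: for each query token, take the best dot product against any
    document token, then sum those maxima over all query tokens."""
    # Similarity matrix of shape (num_query_tokens, num_doc_tokens).
    similarities = query_vectors @ doc_vectors.T
    return float(similarities.max(axis=1).sum())

# Toy example: 4 query tokens and 30 document tokens, 128-dimensional (ColBERT-sized).
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(4, 128))
doc_tokens = rng.normal(size=(30, 128))
print(max_sim(query_tokens, doc_tokens))
```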
## Why Indexing Every Vector by Default is a Problem

In multivector representations (such as those used by Late Interaction models like ColBERT), a single logical document results in hundreds of token-level vectors. Indexing each of these vectors individually with HNSW in Qdrant can lead to:

- High RAM usage
- Slow insert times due to the complexity of maintaining the HNSW graph

However, because multivectors are typically used in the reranking stage (after a first-pass retrieval using dense vectors), there's often no need to index these token-level vectors with HNSW. Instead, they can be stored as multivector fields (without HNSW indexing) and used at query time for reranking, which reduces resource overhead and improves performance. For more on this, check out Qdrant's detailed breakdown in our [Scaling PDF Retrieval with Qdrant tutorial](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/#math-behind-the-scaling).

With Qdrant, you have full control over how indexing works. You can disable indexing by setting the HNSW `m` parameter to `0`:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("http://localhost:6333")

collection_name = "dense_multivector_demo"

client.create_collection(
    collection_name=collection_name,
    vectors_config={
        "dense": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE  # Leave HNSW indexing ON for dense
        ),
        "colbert": models.VectorParams(
            size=128,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
            hnsw_config=models.HnswConfigDiff(m=0)  # Disable HNSW for reranking
        )
    }
)
```

By disabling HNSW on multivectors, you:

- Save compute.
- Reduce memory usage.
- Speed up vector uploads.

## How to Generate Multivectors Using FastEmbed

Let's demonstrate how to effectively use multivectors using [FastEmbed](https://github.com/qdrant/fastembed), which wraps ColBERT into a simple API.

Install FastEmbed and Qdrant:

```bash
pip install "qdrant-client[fastembed]>=1.14.2"
```

## Step-by-Step: ColBERT + Qdrant Setup

Ensure that Qdrant is running and create a client:

```python
from qdrant_client import QdrantClient, models

# 1. Connect to Qdrant server
client = QdrantClient("http://localhost:6333")
```

### 1. Encode Documents

Next, encode your documents:

```python
from fastembed import TextEmbedding, LateInteractionTextEmbedding

# Example documents and query
documents = [
    "Artificial intelligence is used in hospitals for cancer diagnosis and treatment.",
    "Self-driving cars use AI to detect obstacles and make driving decisions.",
    "AI is transforming customer service through chatbots and automation.",
    # ...
]
query_text = "How does AI help in medicine?"
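# The texts above are embedded by wrapping them in models.Document objects below,
# which lets the Qdrant client run FastEmbed inference locally:
#   - "BAAI/bge-small-en" yields a single 384-dimensional dense vector per text
#   - "colbert-ir/colbertv2.0" yields one 128-dimensional vector per token (a multivector)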
dense_documents = [
    models.Document(text=doc, model="BAAI/bge-small-en")
    for doc in documents
]
dense_query = models.Document(text=query_text, model="BAAI/bge-small-en")

colbert_documents = [
    models.Document(text=doc, model="colbert-ir/colbertv2.0")
    for doc in documents
]
colbert_query = models.Document(text=query_text, model="colbert-ir/colbertv2.0")
```

### 2. Create a Qdrant collection

Then create a Qdrant collection with both vector types. Note that we leave indexing on for the `dense` vector but turn it off for the `colbert` vector that will be used for reranking.

```python
collection_name = "dense_multivector_demo"

client.create_collection(
    collection_name=collection_name,
    vectors_config={
        "dense": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE  # Leave HNSW indexing ON for dense
        ),
        "colbert": models.VectorParams(
            size=128,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
            hnsw_config=models.HnswConfigDiff(m=0)  # Disable HNSW for reranking
        )
    }
)
```

### 3. Upload Documents (Dense + Multivector)

Now upload the vectors:

```python
points = [
    models.PointStruct(
        id=i,
        vector={
            "dense": dense_documents[i],
            "colbert": colbert_documents[i]
        },
        payload={"text": documents[i]}
    )
    for i in range(len(documents))
]

client.upsert(collection_name="dense_multivector_demo", points=points)
```

### Query with Retrieval + Reranking in One Call

Now let's run a search:

```python
results = client.query_points(
    collection_name="dense_multivector_demo",
    prefetch=models.Prefetch(
        query=dense_query,
        using="dense",
    ),
    query=colbert_query,
    using="colbert",
    limit=3,
    with_payload=True
)
```

- The dense vector retrieves the top candidates quickly.
- The ColBERT multivector reranks them using token-level `MaxSim` with fine-grained precision.
- The top 3 results are returned.

## Conclusion

Multivector search is one of the most powerful features of a vector database when used correctly. With this functionality in Qdrant, you can:

- Store token-level embeddings natively.
- Disable indexing to reduce overhead.
- Run fast retrieval and accurate reranking in one API call.
- Efficiently scale late interaction.

Combining FastEmbed and Qdrant gives you a production-ready pipeline for ColBERT-style reranking without wasting resources. You can do this locally or use Qdrant Cloud. Qdrant offers an easy-to-use API to get started with your search engine, so if you're ready to dive in, sign up for free at [Qdrant Cloud](https://qdrant.tech/cloud/) and start building.
---

# Qdrant Cloud API: Powerful gRPC and Flexible REST/JSON Interfaces

**Note:** This is not the Qdrant REST or gRPC API of the database itself. For database APIs & SDKs, see our list of [interfaces](https://qdrant.tech/documentation/interfaces/).

## Introduction

The Qdrant Cloud API lets you automate the Qdrant Cloud platform. You can use this API to manage your accounts, clusters, backup schedules, authentication methods, hybrid cloud environments, and more.

To cater to diverse integration needs, the Qdrant Cloud API offers two primary interaction models:

- **gRPC API**: For high-performance, low-latency, and type-safe communication. This is the recommended way for backend services and applications requiring maximum efficiency. The API is defined using Protocol Buffers.
- **REST/JSON API**: A conventional HTTP/1.1 (and HTTP/2) interface with JSON payloads. This API is provided via a gRPC Gateway, translating RESTful calls into gRPC messages, offering ease of use for web clients, scripts, and broader tool compatibility.

You can find the API definitions and generated client libraries in our Qdrant Cloud Public API [GitHub repository](https://github.com/qdrant/qdrant-cloud-public-api).

**Note:** The API is split into multiple services to make it easier to use.

### Qdrant Cloud API Endpoints

- **gRPC Endpoint**: grpc.cloud.qdrant.io:443
- **REST/JSON Endpoint**: [https://api.cloud.qdrant.io](https://api.cloud.qdrant.io/)

### Authentication

Most Qdrant Cloud API requests must be authenticated. Authentication is handled via API keys (so-called management keys), which should be passed in the Authorization header.

**Management Keys**: set the header to `Authorization: apikey ` followed by the actual API key obtained from your Qdrant Cloud dashboard or generated programmatically.

You can create a management key in the Cloud Console UI. Go to **Access Management** > **Cloud Management Keys**.

![Authentication](https://qdrant.tech/documentation/cloud/authentication.png)

**Note:** Ensure that the API key is kept secure and not exposed in public repositories or logs. Once authenticated, the API allows you to manage clusters, backup schedules, and perform other operations available to your account.

### Samples

For samples on how to use the API with a tool like grpcurl, curl, or any of the provided SDKs, please see the [Qdrant Cloud Public API](https://github.com/qdrant/qdrant-cloud-public-api) repository.
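As a minimal illustration of the REST/JSON flavor described above, here is a hedged Python sketch using the `requests` library. The `Authorization` header follows the format documented in this section; the account ID and the exact resource path are placeholders and assumptions, so check them against the API definitions in the public API repository before relying on them.

```python
import requests

API_KEY = "your-management-key"   # created under Access Management > Cloud Management Keys
ACCOUNT_ID = "your-account-id"    # placeholder: your Qdrant Cloud account ID

# Assumption: listing clusters lives under a path like this; verify the exact
# route against the Qdrant Cloud Public API definitions before use.
url = f"https://api.cloud.qdrant.io/api/cluster/v1/accounts/{ACCOUNT_ID}/clusters"

response = requests.get(
    url,
    headers={"Authorization": f"apikey {API_KEY}"},  # header format documented above
    timeout=30,
)
response.raise_for_status()
print(response.json())
```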
## Terraform Provider

Qdrant Cloud also provides a Terraform provider to manage your Qdrant Cloud resources. [Learn more](https://qdrant.tech/documentation/infrastructure/terraform/).

## Deprecated OpenAPI specification

We still support our deprecated OpenAPI endpoint, but it is scheduled to be removed on November 1st, 2025. We do _not_ recommend using this endpoint anymore; use the replacement described above instead.

| REST API | Documentation |
| --- | --- |
| v.0.1.0 | [OpenAPI Specification](https://cloud.qdrant.io/pa/v1/docs) |

---

# Private Cloud Configuration

The Qdrant Private Cloud Helm chart has several configuration options. The following YAML shows all configuration options with their default values:

```yaml
operator: # Amount of replicas for the Qdrant operator (v2) replicaCount: 1 image: # Image repository for the qdrant operator repository: registry.cloud.qdrant.io/qdrant/operator # Image pullPolicy pullPolicy: IfNotPresent # Overrides the image tag whose default is the chart appVersion. tag: "" # Optional image pull secrets imagePullSecrets: - name: qdrant-registry-creds nameOverride: "" fullnameOverride: "operator" # Service account configuration serviceAccount: create: true annotations: {} # Additional pod annotations podAnnotations: {} # pod security context podSecurityContext: runAsNonRoot: true runAsUser: 10001 runAsGroup: 20001 fsGroup: 30001 # container security context securityContext: capabilities: drop: - ALL readOnlyRootFilesystem: true runAsNonRoot: true runAsUser: 10001 runAsGroup: 20001 allowPrivilegeEscalation: false seccompProfile: type: RuntimeDefault # Configuration for the Qdrant operator service to expose metrics service: enabled: true type: ClusterIP metricsPort: 9290 # Configuration for the Qdrant operator service monitor to scrape metrics serviceMonitor: enabled: false # Resource requests and limits for the Qdrant operator resources: {} # Node selector for the Qdrant operator nodeSelector: {} # Tolerations for the Qdrant operator tolerations: [] # Affinity configuration for the Qdrant operator affinity: {} watch: # If true, watches only the namespace where the Qdrant operator is deployed, otherwise watches the namespaces in watch.namespaces onlyReleaseNamespace: true # an empty list watches all namespaces.
namespaces: [] limitRBAC: true # Configuration for the Qdrant operator (v2) settings: # Does the operator run inside of a Kubernetes cluster (kubernetes) or outside (local) appEnvironment: kubernetes # The log level for the operator # Available options: DEBUG | INFO | WARN | ERROR logLevel: INFO # Metrics contains the operator config related the metrics metrics: # The port used for metrics port: 9290 # Health contains the operator config related the health probe healthz: # The port used for the health probe port: 8285 # Controller related settings controller: # The period a forced recync is done by the controller (if watches are missed / nothing happened) forceResyncPeriod: 10h # QPS indicates the maximum QPS to the master from this client. # Default is 200 qps: 200 # Maximum burst for throttle. # Default is 500. burst: 500 # Features contains the settings for enabling / disabling the individual features of the operator features: # ClusterManagement contains the settings for qdrant (database) cluster management clusterManagement: # Whether or not the Qdrant cluster features are enabled. # If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected. # Default is true. enable: true # The StorageClass used to make database and snapshot PVCs. # Default is nil, meaning the default storage class of Kubernetes. storageClass: # The StorageClass used to make database PVCs. # Default is nil, meaning the default storage class of Kubernetes. #database: # The StorageClass used to make snapshot PVCs. # Default is nil, meaning the default storage class of Kubernetes. #snapshot: # Qdrant config contains settings specific for the database qdrant: # The config where to find the image for qdrant image: # The repository where to find the image for qdrant # Default is "qdrant/qdrant" repository: registry.cloud.qdrant.io/qdrant/qdrant # Docker image pull policy # Default "IfNotPresent", unless the tag is dev, master or latest. Then "Always" #pullPolicy: # Docker image pull secret name # This secret should be available in the namespace where the cluster is running # Default not set pullSecretName: qdrant-registry-creds # storage contains the settings for the storage of the Qdrant cluster storage: performance: # CPU budget, how many CPUs (threads) to allocate for an optimization job. # If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size # If negative - subtract this number of CPUs from the available CPUs. # If positive - use this exact number of CPUs. optimizerCpuBudget: 0 # Enable async scorer which uses io_uring when rescoring. # Only supported on Linux, must be enabled in your kernel. # See: asyncScorer: false # Qdrant DB log level # Available options: DEBUG | INFO | WARN | ERROR # Default is "INFO" logLevel: INFO # Default Qdrant security context configuration securityContext: # Enable default security context # Default is false enabled: false # Default user for qdrant container # Default not set #user: 1000 # Default fsGroup for qdrant container # Default not set #fsUser: 2000 # Default group for qdrant container # Default not set #group: 3000 # Network policies configuration for the Qdrant databases networkPolicies: # Whether or not NetworkPolicy management is enabled. # If set to false, no NetworkPolicies will be created. # Default is true. 
enable: true ingress: - ports: - protocol: TCP port: 6333 - protocol: TCP port: 6334 # Allow DNS resolution from qdrant pods at Kubernetes internal DNS server egress: - ports: - protocol: UDP port: 53 # Scheduling config contains the settings specific for scheduling scheduling: # Default topology spread constraints (list from type corev1.TopologySpreadConstraint) # Default is an empty list topologySpreadConstraints: [] # Default pod disruption budget (object from type policyv1.PodDisruptionBudgetSpec) # Default is not set podDisruptionBudget: {} # ClusterManager config contains the settings specific for cluster manager clusterManager: # Whether or not the cluster manager (on operator level). # If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected. # Default is false. enable: true # The endpoint address where the cluster manager can be reached endpointAddress: "http://qdrant-cluster-manager" # InvocationInterval is the interval between calls (started after the previous call is retured) # Default is 10 seconds invocationInterval: 10s # Timeout is the duration a single call to the cluster manager is allowed to take. # Default is 30 seconds timeout: 30s # Specifies overrides for the manage rules manageRulesOverrides: #dry_run: #max_transfers: #max_transfers_per_collection: #rebalance: #replicate: # Ingress config contains the settings specific for ingress ingress: # Whether or not the Ingress feature is enabled. # Default is true. enable: false # Which specific ingress provider should be used # Default is KubernetesIngress provider: KubernetesIngress # The specific settings when the Provider is QdrantCloudTraefik qdrantCloudTraefik: # Enable tls # Default is false tls: false # Secret with TLS certificate # Default is None secretName: "" # List of Traefik middlewares to apply # Default is an empty list middlewares: [] # IP Allowlist Strategy for Traefik # Default is None ipAllowlistStrategy: # Enable body validator plugin and matching ingressroute rules # Default is false enableBodyValidatorPlugin: false # The specific settings when the Provider is KubernetesIngress kubernetesIngress: # Name of the ingress class # Default is None #ingressClassName: # TelemetryTimeout is the duration a single call to the cluster telemetry endpoint is allowed to take. # Default is 3 seconds telemetryTimeout: 3s # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 20. maxConcurrentReconciles: 20 # VolumeExpansionMode specifies the expansion mode, which can be online or offline (e.g. in case of Azure). # Available options: Online, Offline # Default is Online volumeExpansionMode: Online # BackupManagementConfig contains the settings for backup management backupManagement: # Whether or not the backup features are enabled. # If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected. # Default is true. enable: true # Snapshots contains the settings for snapshots as part of backup management. snapshots: # Whether or not the Snapshot feature is enabled. # Default is true. enable: true # The VolumeSnapshotClass used to make VolumeSnapshots. # Default is "csi-snapclass". volumeSnapshotClass: "csi-snapclass" # The duration a snapshot is retained when the phase becomes Failed or Skipped # Default is 72h (3d). retainUnsuccessful: 72h # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1. 
maxConcurrentReconciles: 1 # ScheduledSnapshots contains the settings for scheduled snapshot as part of backup management. scheduledSnapshots: # Whether or not the ScheduledSnapshot feature is enabled. # Default is true. enable: true # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1. maxConcurrentReconciles: 1 # Restores contains the settings for restoring (a snapshot) as part of backup management. restores: # Whether or not the Restore feature is enabled. # Default is true. enable: true # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1. maxConcurrentReconciles: 1 qdrant-cluster-manager: replicaCount: 1 image: repository: registry.cloud.qdrant.io/qdrant/cluster-manager pullPolicy: IfNotPresent # Overrides the image tag whose default is the chart appVersion. tag: "" imagePullSecrets: - name: qdrant-registry-creds nameOverride: "" fullnameOverride: "qdrant-cluster-manager" serviceAccount: # Specifies whether a service account should be created create: true # Automatically mount a ServiceAccount's API credentials? automount: true # Annotations to add to the service account annotations: {} # The name of the service account to use. # If not set and create is true, a name is generated using the fullname template name: "" podAnnotations: {} podLabels: {} podSecurityContext: runAsNonRoot: true runAsUser: 10001 runAsGroup: 20001 fsGroup: 30001 securityContext: capabilities: drop: - ALL readOnlyRootFilesystem: true runAsNonRoot: true runAsUser: 10001 runAsGroup: 20001 allowPrivilegeEscalation: false seccompProfile: type: RuntimeDefault service: type: ClusterIP networkPolicy: create: true resources: {} # We usually recommend not to specify default resources and to leave this as a conscious # choice for the user. This also increases chances charts run on environments with little # resources, such as Minikube. If you do want to specify resources, uncomment the following # lines, adjust them as necessary, and remove the curly braces after 'resources:'. # limits: # cpu: 100m # memory: 128Mi # requests: # cpu: 100m # memory: 128Mi nodeSelector: {} tolerations: [] affinity: {} ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/configuration.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/configuration.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-13-lllmstxt|> ## sparse-vectors - [Articles](https://qdrant.tech/articles/) - What is a Sparse Vector? How to Achieve Vector-based Hybrid Search [Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/) --- # What is a Sparse Vector? How to Achieve Vector-based Hybrid Search Nirant Kasliwal · December 09, 2023 ![What is a Sparse Vector? How to Achieve Vector-based Hybrid Search](https://qdrant.tech/articles_data/sparse-vectors/preview/title.jpg) Think of a library with a vast index card system. 
Each index card only has a few keywords marked out (sparse vector) of a large possible set for each book (document). This is what sparse vectors enable for text.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#what-are-sparse-and-dense-vectors) What are sparse and dense vectors?

Sparse vectors are like the Marie Kondo of data—keeping only what sparks joy (or relevance, in this case). Consider a simplified example of 2 documents, each with 200 words. A dense vector would have several hundred non-zero values, whereas a sparse vector could have much fewer, say only 20 non-zero values. In this example, we assume the model selects only 2 words or tokens from each document; the rest of the values are zero, which is why it’s called a sparse vector.

```python
dense = [0.2, 0.3, 0.5, 0.7, ...]  # several hundred floats
sparse = [{331: 0.5}, {14136: 0.7}]  # only a few (token id: weight) pairs
```

The numbers 331 and 14136 map to specific tokens in the vocabulary, e.g. `['chocolate', 'icecream']`. The tokens aren’t always words though, sometimes they can be sub-words: `['ch', 'ocolate']` too.

They’re pivotal in information retrieval, especially in ranking and search systems. BM25, a standard ranking function used by search engines like [Elasticsearch](https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors), exemplifies this. BM25 calculates the relevance of documents to a given search query.

BM25’s capabilities are well-established, yet it has its limitations. BM25 relies solely on the frequency of words in a document and does not attempt to comprehend the meaning or the contextual importance of the words. Additionally, it requires the computation of the entire corpus’s statistics in advance, posing a challenge for large datasets.

Sparse vectors harness the power of neural networks to surmount these limitations while retaining the ability to query exact words and phrases. They excel in handling large text data, making them crucial in modern data processing and marking an advancement over traditional methods such as BM25.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#understanding-sparse-vectors) Understanding sparse vectors

Sparse vectors are a representation where each dimension corresponds to a word or subword, greatly aiding in interpreting document rankings. This clarity is why sparse vectors are essential in modern search and recommendation systems, complementing the meaning-rich embedding or dense vectors. Dense vectors from models like OpenAI Ada-002 or Sentence Transformers contain non-zero values for every element. In contrast, sparse vectors focus on relative word weights per document, with most values being zero. This results in a more efficient and interpretable system, especially in text-heavy applications like search.

Sparse vectors shine in domains and scenarios where many rare keywords or specialized terms are present. For example, in the medical domain, many rare terms are not present in the general vocabulary, so general-purpose dense vectors cannot capture the nuances of the domain.
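To make the “mostly zeros” point concrete, here is a toy sketch (the vocabulary size and token ids below are made up for illustration), relating the compact `{token id: weight}` form from the snippet above to the equivalent, almost entirely zero, dense vector:

```python
# Illustrative only: vocabulary size and token ids are invented for this sketch.
vocab_size = 30_000
sparse = {331: 0.5, 14136: 0.7}  # store only the non-zero (token id -> weight) entries

# The equivalent dense view materializes every dimension; almost all of them are zero.
dense_view = [sparse.get(i, 0.0) for i in range(vocab_size)]

non_zero = sum(1 for v in dense_view if v != 0.0)
print(non_zero, "non-zero values out of", vocab_size)  # 2 non-zero values out of 30000
```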
| Feature | Sparse Vectors | Dense Vectors |
| --- | --- | --- |
| **Data Representation** | Majority of elements are zero | All elements are non-zero |
| **Computational Efficiency** | Generally higher, especially in operations involving zero elements | Lower, as operations are performed on all elements |
| **Information Density** | Less dense, focuses on key features | Highly dense, capturing nuanced relationships |
| **Example Applications** | Text search, Hybrid search | [RAG](https://qdrant.tech/articles/what-is-rag-in-ai/), many general machine learning tasks |

Where do sparse vectors fail though? They’re not great at capturing nuanced relationships between words. For example, they can’t capture the relationship between “king” and “queen” as well as dense vectors.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#splade) SPLADE

Let’s check out [SPLADE](https://europe.naverlabs.com/research/computer-science/splade-a-sparse-bi-encoder-bert-based-model-achieves-effective-and-efficient-full-text-document-ranking/?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors), an excellent way to make sparse vectors. Let’s look at some numbers first. Higher is better:

| Model | MRR@10 (MS MARCO Dev) | Type |
| --- | --- | --- |
| BM25 | 0.184 | Sparse |
| TCT-ColBERT | 0.359 | Dense |
| doc2query-T5 [link](https://github.com/castorini/docTTTTTquery) | 0.277 | Sparse |
| SPLADE | 0.322 | Sparse |
| SPLADE-max | 0.340 | Sparse |
| SPLADE-doc | 0.322 | Sparse |
| DistilSPLADE-max | 0.368 | Sparse |

All numbers are from [SPLADEv2](https://arxiv.org/abs/2109.10086). MRR is [Mean Reciprocal Rank](https://www.wikiwand.com/en/Mean_reciprocal_rank#References), a standard metric for ranking. [MS MARCO](https://microsoft.github.io/MSMARCO-Passage-Ranking/?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors) is a dataset for evaluating ranking and retrieval for passages.

SPLADE is quite flexible as a method, with regularization knobs that can be tuned to obtain [different models](https://github.com/naver/splade) as well:

> SPLADE is more a class of models rather than a model per se: depending on the regularization magnitude, we can obtain different models (from very sparse to models doing intense query/doc expansion) with different properties and performance.

First, let’s look at how to create a sparse vector. Then, we’ll look at the concepts behind SPLADE.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#creating-a-sparse-vector) Creating a sparse vector

We’ll explore two different ways to create a sparse vector. The higher-performance way is to use dedicated document and query encoders. Here, we’ll take the simpler approach and use the same model for both document and query. We will get a dictionary of token ids and their corresponding weights for a sample text representing a document.

If you’d like to follow along, here’s a [Colab Notebook](https://colab.research.google.com/gist/NirantK/ad658be3abefc09b17ce29f45255e14e/splade-single-encoder.ipynb), [alternate link](https://gist.github.com/NirantK/ad658be3abefc09b17ce29f45255e14e) with all the code.
### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#setting-up) Setting Up

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = """Arthur Robert Ashe Jr. (July 10, 1943 – February 6, 1993) was an American professional tennis player. He won three Grand Slam titles in singles and two in doubles."""
```

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#computing-the-sparse-vector) Computing the sparse vector

```python
import torch

def compute_vector(text):
    """
    Computes a vector from logits and attention mask using ReLU, log, and max operations.
    """
    tokens = tokenizer(text, return_tensors="pt")
    output = model(**tokens)
    logits, attention_mask = output.logits, tokens.attention_mask
    relu_log = torch.log(1 + torch.relu(logits))
    weighted_log = relu_log * attention_mask.unsqueeze(-1)
    max_val, _ = torch.max(weighted_log, dim=1)
    vec = max_val.squeeze()

    return vec, tokens

vec, tokens = compute_vector(text)
print(vec.shape)
```

You’ll notice that there are 38 tokens in the text based on this tokenizer. This will be different from the number of tokens in the vector. In a TF-IDF, we’d assign weights only to these tokens or words. In SPLADE, we use this vector from our learned model to assign weights to all the tokens in the vocabulary.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#term-expansion-and-weights) Term expansion and weights

```python
def extract_and_map_sparse_vector(vector, tokenizer):
    """
    Extracts non-zero elements from a given vector and maps these elements to their
    human-readable tokens using a tokenizer. The function creates and returns a sorted
    dictionary where keys are the tokens corresponding to non-zero elements in the vector,
    and values are the weights of these elements, sorted in descending order of weights.

    This function is useful in NLP tasks where you need to understand the significance of
    different tokens based on a model's output vector. It first identifies non-zero values
    in the vector, maps them to tokens, and sorts them by weight for better interpretability.

    Args:
    vector (torch.Tensor): A PyTorch tensor from which to extract non-zero elements.
    tokenizer: The tokenizer used for tokenization in the model, providing the mapping from tokens to indices.

    Returns:
    dict: A sorted dictionary mapping human-readable tokens to their corresponding non-zero weights.
    """

    # Extract indices and values of non-zero elements in the vector
    cols = vector.nonzero().squeeze().cpu().tolist()
    weights = vector[cols].cpu().tolist()

    # Map indices to tokens and create a dictionary
    idx2token = {idx: token for token, idx in tokenizer.get_vocab().items()}
    token_weight_dict = {
        idx2token[idx]: round(weight, 2) for idx, weight in zip(cols, weights)
    }

    # Sort the dictionary by weights in descending order
    sorted_token_weight_dict = {
        k: v
        for k, v in sorted(
            token_weight_dict.items(), key=lambda item: item[1], reverse=True
        )
    }

    return sorted_token_weight_dict

# Usage example
sorted_tokens = extract_and_map_sparse_vector(vec, tokenizer)
sorted_tokens
```

There will be 102 sorted tokens in total. This has expanded to include tokens that weren’t in the original text. This is the term expansion we will talk about next.
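One practical aside before we look at the expanded terms: the Qdrant upsert example later in this article works with plain `indices` and `values` arrays rather than a token dictionary. A minimal sketch for deriving them from `vec`, mirroring the query-side preparation shown later in this article:

```python
# Keep only the non-zero dimensions as parallel index/weight arrays,
# the same pattern used for the query vector later in this article.
indices = vec.nonzero().numpy().flatten()
values = vec.detach().numpy()[indices]
print(indices.shape, values.shape)
```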
Here are some terms that were added: “Berlin” and “founder”, despite the original text having no mention of Arthur’s race (which leads to Owen’s Berlin win) or of his work as the founder of the Arthur Ashe Institute for Urban Health. Here are the top few `sorted_tokens` with a weight of more than 1:

```python
{
    "ashe": 2.95,
    "arthur": 2.61,
    "tennis": 2.22,
    "robert": 1.74,
    "jr": 1.55,
    "he": 1.39,
    "founder": 1.36,
    "doubles": 1.24,
    "won": 1.22,
    "slam": 1.22,
    "died": 1.19,
    "singles": 1.1,
    "was": 1.07,
    "player": 1.06,
    "titles": 0.99,
    ...
}
```

If you’re interested in using the higher-performance approach, check out the following models:

1. [naver/efficient-splade-VI-BT-large-doc](https://huggingface.co/naver/efficient-splade-vi-bt-large-doc)
2. [naver/efficient-splade-VI-BT-large-query](https://huggingface.co/naver/efficient-splade-vi-bt-large-doc)

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#why-splade-works-term-expansion) Why SPLADE works: term expansion

Consider a query “solar energy advantages”. SPLADE might expand this to include terms like “renewable,” “sustainable,” and “photovoltaic,” which are contextually relevant but not explicitly mentioned. This process is called term expansion, and it’s a key component of SPLADE.

SPLADE learns the query/document expansion to include other relevant terms. This is a crucial advantage over other sparse methods which include the exact word, but completely miss the contextually relevant ones. This expansion has a direct relationship with what we can control when making a SPLADE model: sparsity via regularization, i.e. the number of tokens (BERT wordpieces) we use to represent each document. If we use more tokens, we can represent more terms, but the vectors become denser. This number is typically between 20 to 200 per document. As a reference point, the dense BERT vector is 768 dimensions, OpenAI Embedding is 1536 dimensions, and the sparse vector is 30 dimensions.

For example, assume a 1M document corpus. Say, we use 100 sparse token ids + weights per document. Correspondingly, the dense BERT vectors would be 768M floats, the OpenAI Embeddings would be 1.536B floats, and the sparse vectors would be a maximum of 100M integers + 100M floats. This could mean a **10x reduction in memory usage**, which is a huge win for large-scale systems:

| Vector Type | Memory (GB) |
| --- | --- |
| Dense BERT Vector | 6.144 |
| OpenAI Embedding | 12.288 |
| Sparse Vector | 1.12 |

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#how-splade-works-leveraging-bert) How SPLADE works: leveraging BERT

SPLADE leverages a transformer architecture to generate sparse representations of documents and queries, enabling efficient retrieval. Let’s dive into the process.

The output logits from the transformer backbone are inputs upon which SPLADE builds. The transformer architecture can be something familiar like BERT. Rather than producing dense probability distributions, SPLADE utilizes these logits to construct sparse vectors—think of them as a distilled essence of tokens, where each dimension corresponds to a term from the vocabulary and its associated weight in the context of the given document or query. This sparsity is critical; it mirrors the probability distributions from a typical [Masked Language Modeling](http://jalammar.github.io/illustrated-bert/?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors) task but is tuned for retrieval effectiveness, emphasizing terms that are both:
1. Contextually relevant: Terms that represent a document well should be given more weight.
2. Discriminative across documents: Terms that a document has, and other documents don’t, should be given more weight.

The token-level distributions that you’d expect in a standard transformer model are now transformed into token-level importance scores in SPLADE. These scores reflect the significance of each term in the context of the document or query, guiding the model to allocate more weight to terms that are likely to be more meaningful for retrieval purposes. The resulting sparse vectors are not only memory-efficient but also tailored for precise matching in the high-dimensional space of a search engine like Qdrant.

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#interpreting-splade) Interpreting SPLADE

A downside of dense vectors is that they are not interpretable, making it difficult to understand why a document is relevant to a query. SPLADE importance estimation can provide insights into the ‘why’ behind a document’s relevance to a query. By shedding light on which tokens contribute most to the retrieval score, SPLADE offers some degree of interpretability alongside performance, a rare feat in the realm of neural IR systems. For engineers working on search, this transparency is invaluable.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#known-limitations-of-splade) Known limitations of SPLADE

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#pooling-strategy) Pooling strategy

The switch to max pooling in SPLADE improved its performance on the MS MARCO and TREC datasets. However, this indicates a potential limitation of the baseline SPLADE pooling method, suggesting that SPLADE’s performance is sensitive to the choice of pooling strategy.

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#document-and-query-eecoder) Document and query encoder

The SPLADE model variant that uses a document encoder with max pooling but no query encoder reaches the same performance level as the prior SPLADE model. This suggests that a separate query encoder may not always be necessary, which has implications for the efficiency of the model.

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#other-sparse-vector-methods) Other sparse vector methods

SPLADE is not the only method to create sparse vectors. Essentially, sparse vectors are a superset of TF-IDF and BM25, which are the most popular text retrieval methods. In other words, you can create a sparse vector using the term frequency and inverse document frequency (TF-IDF) to reproduce the BM25 score exactly.

Additionally, attention weights from Sentence Transformers can be used to create sparse vectors. This method preserves the ability to query exact words and phrases but avoids the computational overhead of query expansion used in SPLADE. We will cover these methods in detail in a future article.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#leveraging-sparse-vectors-in-qdrant-for-hybrid-search) Leveraging sparse vectors in Qdrant for hybrid search

Qdrant supports a separate index for sparse vectors. This enables you to use the same collection for both dense and sparse vectors. Each “Point” in Qdrant can have both dense and sparse vectors. But let’s first take a look at how you can work with sparse vectors in Qdrant.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#practical-implementation-in-python) Practical implementation in Python

Let’s dive into how Qdrant handles sparse vectors with an example.
Here is what we will cover:

1. Setting Up Qdrant Client: Initially, we establish a connection with Qdrant using the QdrantClient. This setup is crucial for subsequent operations.
2. Creating a Collection with Sparse Vector Support: In Qdrant, a collection is a container for your vectors. Here, we create a collection specifically designed to support sparse vectors. This is done using the `create_collection` method, where we define the parameters for sparse vectors, such as setting the index configuration.
3. Inserting Sparse Vectors: Once the collection is set up, we can insert sparse vectors into it. This involves defining the sparse vector with its indices and values, and then upserting this point into the collection.
4. Querying with Sparse Vectors: To perform a search, we first prepare a query vector. This involves computing the vector from a query text and extracting its indices and values. We then use these details to construct a query against our collection.
5. Retrieving and Interpreting Results: The search operation returns results that include the id of the matching document, its score, and other relevant details. The score is a crucial aspect, reflecting the similarity between the query and the documents in the collection.

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#1-set-up) 1\. Set up

```python
from qdrant_client import QdrantClient, models  # client and API models

# Qdrant client setup
client = QdrantClient(":memory:")

# Define collection name
COLLECTION_NAME = "example_collection"

# Insert sparse vector into Qdrant collection
point_id = 1  # Assign a unique ID for the point
```

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#2-create-a-collection-with-sparse-vector-support) 2\. Create a collection with sparse vector support

```python
client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config={},
    sparse_vectors_config={
        "text": models.SparseVectorParams(
            index=models.SparseIndexParams(
                on_disk=False,
            )
        )
    },
)
```

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#3-insert-sparse-vectors) 3\. Insert sparse vectors

Here, we see the process of inserting a sparse vector into the Qdrant collection. This step is key to building a dataset that can be quickly retrieved in the first stage of the retrieval process, utilizing the efficiency of sparse vectors. Since this is for demonstration purposes, we insert only one point, with a sparse vector and no dense vector.

```python
client.upsert(
    collection_name=COLLECTION_NAME,
    points=[
        models.PointStruct(
            id=point_id,
            payload={},  # Add any additional payload if necessary
            vector={
                "text": models.SparseVector(
                    # indices/values: the non-zero token ids and weights
                    # extracted from the SPLADE vector computed earlier
                    indices=indices.tolist(), values=values.tolist()
                )
            },
        )
    ],
)
```

By upserting points with sparse vectors, we prepare our dataset for rapid first-stage retrieval, laying the groundwork for subsequent detailed analysis using dense vectors.

Notice that we use “text” to denote the name of the sparse vector. Those familiar with the Qdrant API will notice the extra care taken to be consistent with the existing named vectors API – this is to make it easier to use sparse vectors in existing codebases. As always, you’re able to **apply payload filters**, shard keys, and other advanced features you’ve come to expect from Qdrant. To make things easier for you, the indices and values don’t have to be sorted before upsert. Qdrant will sort them when the index is persisted, e.g. on disk.
### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#4-query-with-sparse-vectors) 4\. Query with sparse vectors

We use the same process to prepare a query vector as well. This involves computing the vector from a query text and extracting its indices and values. We then use these details to construct a query against our collection.

```python
# Preparing a query vector
query_text = "Who was Arthur Ashe?"
query_vec, query_tokens = compute_vector(query_text)
query_vec.shape

query_indices = query_vec.nonzero().numpy().flatten()
query_values = query_vec.detach().numpy()[query_indices]
```

In this example, we use the same model for both document and query. This is not a requirement, but it’s a simpler approach.

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#5-retrieve-and-interpret-results) 5\. Retrieve and interpret results

After setting up the collection and inserting sparse vectors, the next critical step is retrieving and interpreting the results. This process involves executing a search query and then analyzing the returned results.

```python
# Searching for similar documents
result = client.search(
    collection_name=COLLECTION_NAME,
    query_vector=models.NamedSparseVector(
        name="text",
        vector=models.SparseVector(
            indices=query_indices,
            values=query_values,
        ),
    ),
    with_vectors=True,
)

result
```

In the above code, we execute a search against our collection using the prepared sparse vector query. The `client.search` method takes the collection name and the query vector as inputs. The query vector is constructed using the `models.NamedSparseVector`, which includes the indices and values derived from the query text. This is a crucial step in efficiently retrieving relevant documents.

```python
ScoredPoint(
    id=1,
    version=0,
    score=3.4292831420898438,
    payload={},
    vector={
        "text": SparseVector(
            indices=[2001, 2002, 2010, 2018, 2032, ...],
            values=[
                1.0660614967346191,
                1.391068458557129,
                0.8903818726539612,
                0.2502821087837219,
                ...,
            ],
        )
    },
)
```

The result, as shown above, is a `ScoredPoint` object containing the ID of the retrieved document, its version, a similarity score, and the sparse vector. The score is a key element as it quantifies the similarity between the query and the document, based on their respective vectors.

To understand how this scoring works, we use the familiar dot product method:

$$\text{Similarity}(\text{Query}, \text{Document}) = \sum_{i \in I} \text{Query}_i \times \text{Document}_i$$

This formula calculates the similarity score by multiplying corresponding elements of the query and document vectors and summing these products. This method is particularly effective with sparse vectors, where many elements are zero, leading to a computationally efficient process. The higher the score, the greater the similarity between the query and the document, making it a valuable metric for assessing the relevance of the retrieved documents.

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#hybrid-search-combining-sparse-and-dense-vectors) Hybrid search: combining sparse and dense vectors

By combining search results from both dense and sparse vectors, you can achieve a hybrid search that is both efficient and accurate. Results from sparse vectors will guarantee that all results with the required keywords are returned, while dense vectors will cover the semantically similar results. The mixture of dense and sparse results can be presented directly to the user, or used as a first stage of a two-stage retrieval process.

Let’s see how you can make a hybrid search query in Qdrant.
First, you need to create a collection with both dense and sparse vectors:

```python
client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config={
        "text-dense": models.VectorParams(
            size=1536,  # OpenAI Embeddings
            distance=models.Distance.COSINE,
        )
    },
    sparse_vectors_config={
        "text-sparse": models.SparseVectorParams(
            index=models.SparseIndexParams(
                on_disk=False,
            )
        )
    },
)
```

Then, assuming you have upserted both dense and sparse vectors, you can query them together:

```python
query_text = "Who was Arthur Ashe?"

# Compute sparse and dense vectors
query_indices, query_values = compute_sparse_vector(query_text)
query_dense_vector = compute_dense_vector(query_text)

client.search_batch(
    collection_name=COLLECTION_NAME,
    requests=[
        models.SearchRequest(
            vector=models.NamedVector(
                name="text-dense",
                vector=query_dense_vector,
            ),
            limit=10,
        ),
        models.SearchRequest(
            vector=models.NamedSparseVector(
                name="text-sparse",
                vector=models.SparseVector(
                    indices=query_indices,
                    values=query_values,
                ),
            ),
            limit=10,
        ),
    ],
)
```

The result will be a pair of result lists, one for dense and one for sparse vectors. Having those results, there are several ways to combine them:

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#mixing-or-fusion) Mixing or fusion

You can mix the results from both dense and sparse vectors, based purely on their relative scores. This is a simple and effective approach, but it doesn’t take into account the semantic similarity between the results. Among the [popular mixing methods](https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18) are:

- Reciprocal Rank Fusion (RRF)
- Relative Score Fusion (RSF)
- Distribution-Based Score Fusion (DBSF)

![Relative Score Fusion](https://qdrant.tech/articles_data/sparse-vectors/mixture.png)

Relative Score Fusion

[Ranx](https://github.com/AmenRa/ranx) is a great library for mixing results from different sources.

### [Anchor](https://qdrant.tech/articles/sparse-vectors/\#re-ranking) Re-ranking

You can use the obtained results as a first stage of a two-stage retrieval process. In the second stage, you can re-rank the results from the first stage using a more complex model, such as [Cross-Encoders](https://www.sbert.net/examples/applications/cross-encoder/README.html) or services like [Cohere Rerank](https://txt.cohere.com/rerank/).

And that’s it! You’ve successfully achieved hybrid search with Qdrant!

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#additional-resources) Additional resources

For those who want to dive deeper, here are the top papers on the topic, most of which have code available:

1. Problem Motivation: [Sparse Overcomplete Word Vector Representations](https://ar5iv.org/abs/1506.02004?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors)
2. [SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval](https://ar5iv.org/abs/2109.10086?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors)
3. [SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](https://ar5iv.org/abs/2107.05720?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors)
4. Late Interaction - [ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction](https://ar5iv.org/abs/2112.01488?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors)
5. [SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval](https://research.google/pubs/pub52289/?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors)

**Why just read when you can try it out?** We’ve packed an easy-to-use Colab for you on how to make a Sparse Vector: [Sparse Vectors Single Encoder Demo](https://colab.research.google.com/drive/1wa2Yr5BCOgV0MTOFFTude99BOXCLHXky?usp=sharing). Run it, tinker with it, and start seeing the magic unfold in your projects. We can’t wait to hear how you use it!

## [Anchor](https://qdrant.tech/articles/sparse-vectors/\#conclusion) Conclusion

Alright, folks, let’s wrap it up. Better search isn’t a ‘nice-to-have,’ it’s a game-changer, and Qdrant can get you there. Got questions? Our [Discord community](https://qdrant.to/discord?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors) is teeming with answers. If you enjoyed reading this, why not sign up for our [newsletter](https://qdrant.tech/subscribe/?utm_source=qdrant&utm_medium=website&utm_campaign=sparse-vectors&utm_content=article&utm_term=sparse-vectors) to stay ahead of the curve? And, of course, a big thanks to you, our readers, for pushing us to make ranking better for everyone.

<|page-14-lllmstxt|>

## rag-chatbot-red-hat-openshift-haystack

---

# [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#private-chatbot-for-interactive-learning) Private Chatbot for Interactive Learning

| Time: 120 min | Level: Advanced |
| --- | --- |

With chatbots, companies can scale their training programs to accommodate a large workforce, delivering consistent and standardized learning experiences across departments, locations, and time zones. Furthermore, having already completed their online training, corporate employees might want to refer back to old course materials. Most of this information is proprietary to the company, and manually searching through an entire library of materials takes time. However, a chatbot built on this knowledge can respond in the blink of an eye.

With a simple RAG pipeline, you can build a private chatbot. In this tutorial, you will combine open source tools inside a closed infrastructure and tie them together with a reliable framework.
This custom solution lets you run a chatbot without public internet access. You will be able to keep sensitive data secure without compromising privacy.

![OpenShift](https://qdrant.tech/documentation/examples/student-rag-haystack-red-hat-openshift-hc/openshift-diagram.png)

**Figure 1:** The LLM and Qdrant Hybrid Cloud are containerized as separate services. Haystack combines them into a RAG pipeline and exposes the API via Hayhooks.

## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#components) Components

To maintain complete data isolation, we need to limit ourselves to open-source tools and use them in a private environment, such as [Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift). The pipeline will run internally and will be inaccessible from the internet.

- **Dataset:** [Red Hat Interactive Learning Portal](https://developers.redhat.com/learn), an online library of Red Hat course materials.
- **LLM:** `mistralai/Mistral-7B-Instruct-v0.1`, deployed as a standalone service on OpenShift.
- **Embedding Model:** `BAAI/bge-base-en-v1.5`, a lightweight embedding model deployed from within the Haystack pipeline with [FastEmbed](https://github.com/qdrant/fastembed).
- **Vector DB:** [Qdrant Hybrid Cloud](https://hybrid-cloud.qdrant.tech/) running on OpenShift.
- **Framework:** [Haystack 2.x](https://haystack.deepset.ai/) to connect everything, and [Hayhooks](https://docs.haystack.deepset.ai/docs/hayhooks) to serve the app through HTTP endpoints.

### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#procedure) Procedure

The [Haystack](https://haystack.deepset.ai/) framework leverages two pipelines, which combine our components sequentially to process data.

1. The **Indexing Pipeline** will run offline in batches, when new data is added or updated.
2. The **Search Pipeline** will retrieve information from Qdrant and use an LLM to produce an answer.

> **Note:** We will define the pipelines in Python and then export them to YAML format, so that [Hayhooks](https://docs.haystack.deepset.ai/docs/hayhooks) can run them as a web service.

## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#prerequisites) Prerequisites

### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#deploy-the-llm-to-openshift) Deploy the LLM to OpenShift

Follow the steps in [Chapter 6. Serving large language models](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.5/html/working_on_data_science_projects/serving-large-language-models_serving-large-language-models#doc-wrapper). This will download the LLM from [HuggingFace](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and deploy it to OpenShift using a _single model serving platform_.

Your LLM service will have a URL, which you need to store as an environment variable.

```shell
export INFERENCE_ENDPOINT_URL="http://mistral-service.default.svc.cluster.local"
```

```python
import os

os.environ["INFERENCE_ENDPOINT_URL"] = "http://mistral-service.default.svc.cluster.local"
```

### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#launch-qdrant-hybrid-cloud) Launch Qdrant Hybrid Cloud

Complete **How to Set Up Qdrant on Red Hat OpenShift**. When in Hybrid Cloud, your Qdrant instance is private, and its nodes run on the same OpenShift infrastructure as your other components.
Retrieve your Qdrant URL and API key and store them as environment variables:

```shell
export QDRANT_URL="https://qdrant.example.com"
export QDRANT_API_KEY="your-api-key"
```

```python
os.environ["QDRANT_URL"] = "https://qdrant.example.com"
os.environ["QDRANT_API_KEY"] = "your-api-key"
```

## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#implementation) Implementation

We will first create an indexing pipeline to add documents to the system. Then, the search pipeline will retrieve relevant data from our documents. After the pipelines are tested, we will export them to YAML files.

### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#indexing-pipeline) Indexing pipeline

[Haystack 2.x](https://haystack.deepset.ai/) comes packed with a lot of useful components, from data fetching, through HTML parsing, up to the vector storage. Before we start, there are a few Python packages that we need to install:

```shell
pip install haystack-ai \
  qdrant-client \
  qdrant-haystack \
  fastembed-haystack
```

Our environment is now ready, so we can jump right into the code. Let’s define an empty pipeline and gradually add components to it:

```python
from haystack import Pipeline

indexing_pipeline = Pipeline()
```

#### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#data-fetching-and-conversion) Data fetching and conversion

In this step, we will use Haystack’s `LinkContentFetcher` to download course content from a list of URLs and store it in Qdrant for retrieval. As we don’t want to store raw HTML, a converter will extract the text content from each webpage, and a splitter will later divide the documents into digestible chunks, since they might be pretty long. Let’s start with data fetching and text conversion:

```python
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument

fetcher = LinkContentFetcher()
converter = HTMLToDocument()

indexing_pipeline.add_component("fetcher", fetcher)
indexing_pipeline.add_component("converter", converter)
```

Our pipeline knows there are two components, but they are not connected yet. We need to define the flow between them:

```python
indexing_pipeline.connect("fetcher.streams", "converter.sources")
```

Each component has a set of inputs and outputs which might be combined in a directed graph. The definitions of the inputs and outputs are usually provided in the documentation of the component. The `LinkContentFetcher` has the following parameters:

![Parameters of the LinkContentFetcher](https://qdrant.tech/documentation/examples/student-rag-haystack-red-hat-openshift-hc/haystack-link-content-fetcher.png)

_Source: [https://docs.haystack.deepset.ai/docs/linkcontentfetcher](https://docs.haystack.deepset.ai/docs/linkcontentfetcher)_

#### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#chunking-and-creating-the-embeddings) Chunking and creating the embeddings

We used `HTMLToDocument` to convert the HTML sources into `Document` instances of Haystack, which is a base class containing some data to be queried. However, a single document might be too long to be processed by the embedding model, and it also carries way too much information to make the search relevant. Therefore, we need to split the document into smaller parts and convert them into embeddings.
For this, we will use the `DocumentSplitter` and `FastembedDocumentEmbedder` pointed to our `BAAI/bge-base-en-v1.5` model: ```python from haystack.components.preprocessors import DocumentSplitter from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder splitter = DocumentSplitter(split_by="sentence", split_length=5, split_overlap=2) embedder = FastembedDocumentEmbedder(model="BAAI/bge-base-en-v1.5") embedder.warm_up() indexing_pipeline.add_component("splitter", splitter) indexing_pipeline.add_component("embedder", embedder) indexing_pipeline.connect("converter.documents", "splitter.documents") indexing_pipeline.connect("splitter.documents", "embedder.documents") ``` #### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#writing-data-to-qdrant) Writing data to Qdrant The splitter will be producing chunks with a maximum length of 5 sentences, with an overlap of 2 sentences. Then, these smaller portions will be converted into embeddings. Finally, we need to store our embeddings in Qdrant. ```python from haystack.utils import Secret from haystack_integrations.document_stores.qdrant import QdrantDocumentStore from haystack.components.writers import DocumentWriter document_store = QdrantDocumentStore( os.environ["QDRANT_URL"], api_key=Secret.from_env_var("QDRANT_API_KEY"), index="red-hat-learning", return_embedding=True, embedding_dim=768, ) writer = DocumentWriter(document_store=document_store) indexing_pipeline.add_component("writer", writer) indexing_pipeline.connect("embedder.documents", "writer.documents") ``` Our pipeline is now complete. Haystack comes with a handy visualization of the pipeline, so you can see and verify the connections between the components. It is displayed in the Jupyter notebook, but you can also export it to a file: ```python indexing_pipeline.draw("indexing_pipeline.png") ``` ![Structure of the indexing pipeline](https://qdrant.tech/documentation/examples/student-rag-haystack-red-hat-openshift-hc/indexing_pipeline.png) #### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#test-the-entire-pipeline) Test the entire pipeline We can finally run it on a list of URLs to index the content in Qdrant. 
We have a bunch of URLs to all the Red Hat OpenShift Foundations course lessons, so let’s use them: ```python course_urls = [\ "https://developers.redhat.com/learn/openshift/foundations-openshift",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:openshift-and-developer-sandbox",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:overview-web-console",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:use-terminal-window-within-red-hat-openshift-web-console",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:install-application-source-code-github-repository-using-openshift-web-console",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:install-application-linux-container-image-repository-using-openshift-web-console",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:install-application-linux-container-image-using-oc-cli-tool",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:install-application-source-code-using-oc-cli-tool",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:scale-applications-using-openshift-web-console",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:scale-applications-using-oc-cli-tool",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:work-databases-openshift-using-oc-cli-tool",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:work-databases-openshift-web-console",\ "https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:view-performance-information-using-openshift-web-console",\ ] indexing_pipeline.run(data={ "fetcher": { "urls": course_urls, } }) ``` The execution might take a while, as the model needs to process all the documents. After the process is finished, we should have all the documents stored in Qdrant, ready for search. You should see a short summary of processed documents: ```shell {'writer': {'documents_written': 381}} ``` ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#search-pipeline) Search pipeline Our documents are now indexed and ready for search. The next pipeline is a bit simpler, but we still need to define a few components. Let’s start again with an empty pipeline: ```python search_pipeline = Pipeline() ``` Our second process takes user input, converts it into embeddings and then searches for the most relevant documents using the query embedding. This might look familiar, but we arent working with `Document` instances anymore, since the query only accepts raw text. 
Thus, some of the components will be different, especially the embedder, as it has to accept a single string as an input and produce a single embedding as an output: ```python from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever query_embedder = FastembedTextEmbedder(model="BAAI/bge-base-en-v1.5") query_embedder.warm_up() retriever = QdrantEmbeddingRetriever( document_store=document_store, # The same document store as the one used for indexing top_k=3, # Number of documents to return ) search_pipeline.add_component("query_embedder", query_embedder) search_pipeline.add_component("retriever", retriever) search_pipeline.connect("query_embedder.embedding", "retriever.query_embedding") ``` #### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#run-a-test-query) Run a test query If our goal was to just retrieve the relevant documents, we could stop here. Let’s try the current pipeline on a simple query: ```python query = "How to install an application using the OpenShift web console?" search_pipeline.run(data={ "query_embedder": { "text": query } }) ``` We set the `top_k` parameter to 3, so the retriever should return the three most relevant documents. Your output should look like this: ```text { 'retriever': { 'documents': [\ Document(id=867b4aa4c37a91e72dc7ff452c47972c1a46a279a7531cd6af14169bcef1441b, content: 'Install a Node.js application from GitHub using the web console The following describes the steps r...', meta: {'content_type': 'text/html', 'source_id': 'f56e8f827dda86abe67c0ba3b4b11331d896e2d4f7b2b43c74d3ce973d07be0c', 'url': 'https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:work-databases-openshift-web-console'}, score: 0.9209432),\ Document(id=0c74381c178597dd91335ebfde790d13bf5989b682d73bf5573c7734e6765af7, content: 'How to remove an application from OpenShift using the web console. In addition to providing the cap...', meta: {'content_type': 'text/html', 'source_id': '2a0759f3ce4a37d9f5c2af9c0ffcc80879077c102fb8e41e576e04833c9d24ce', 'url': 'https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:install-application-linux-container-image-repository-using-openshift-web-console'}, score: 0.9132109500000001),\ Document(id=3e5f8923a34ab05611ef20783211e5543e880c709fd6534d9c1f63576edc4061, content: 'Path resource: Install an application from source code in a GitHub repository using the OpenShift w...', meta: {'content_type': 'text/html', 'source_id': 'a4c4cd62d07c0d9d240e3289d2a1cc0a3d1127ae70704529967f715601559089', 'url': 'https://developers.redhat.com/learning/learn:openshift:foundations-openshift/resource/resources:install-application-source-code-github-repository-using-openshift-web-console'}, score: 0.912748935)\ ] } } ``` #### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#generating-the-answer) Generating the answer Retrieval should serve more than just documents. Therefore, we will need to use an LLM to generate exact answers to our question. This is the final component of our second pipeline. Haystack will create a prompt which adds your documents to the model’s context. 
```python from haystack.components.builders.prompt_builder import PromptBuilder from haystack.components.generators import HuggingFaceTGIGenerator prompt_builder = PromptBuilder(""" Given the following information, answer the question. Context: {% for document in documents %} {{ document.content }} {% endfor %} Question: {{ query }} """) llm = HuggingFaceTGIGenerator( model="mistralai/Mistral-7B-Instruct-v0.1", url=os.environ["INFERENCE_ENDPOINT_URL"], generation_kwargs={ "max_new_tokens": 1000, # Allow longer responses }, ) search_pipeline.add_component("prompt_builder", prompt_builder) search_pipeline.add_component("llm", llm) search_pipeline.connect("retriever.documents", "prompt_builder.documents") search_pipeline.connect("prompt_builder.prompt", "llm.prompt") ``` The `PromptBuilder` is a Jinja2 template that will be filled with the documents and the query. The `HuggingFaceTGIGenerator` connects to the LLM service and generates the answer. Let’s run the pipeline again: ```python query = "How to install an application using the OpenShift web console?" response = search_pipeline.run(data={ "query_embedder": { "text": query }, "prompt_builder": { "query": query }, }) ``` The LLM may provide multiple replies, if asked to do so, so let’s iterate over and print them out: ```python for reply in response["llm"]["replies"]: print(reply.strip()) ``` In our case there is a single response, which should be the answer to the question: ```text Answer: To install an application using the OpenShift web console, follow these steps: 1. Select +Add on the left side of the web console. 2. Identify the container image to install. 3. Using your web browser, navigate to the Developer Sandbox for Red Hat OpenShift and select Start your Sandbox for free. 4. Install an application from source code stored in a GitHub repository using the OpenShift web console. ``` Our final search pipeline might also be visualized, so we can see how the components are glued together: ```python search_pipeline.draw("search_pipeline.png") ``` ![Structure of the search pipeline](https://qdrant.tech/documentation/examples/student-rag-haystack-red-hat-openshift-hc/search_pipeline.png) ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#deployment) Deployment The pipelines are now ready, and we can export them to YAML. Hayhooks will use these files to run the pipelines as HTTP endpoints. To do this, specify both file paths and your environment variables. > Note: The indexing pipeline might be run inside your ETL tool, but search should be definitely exposed as an HTTP endpoint. Let’s run it on the local machine: ```shell pip install hayhooks ``` First of all, we need to save the pipelines to the YAML file: ```python with open("search-pipeline.yaml", "w") as fp: search_pipeline.dump(fp) ``` And now we are able to run the Hayhooks service: ```shell hayhooks run ``` The command should start the service on the default port, so you can access it at `http://localhost:1416`. The pipeline is not deployed yet, but we can do it with just another command: ```shell hayhooks deploy search-pipeline.yaml ``` Once it’s finished, you should be able to see the OpenAPI documentation at [http://localhost:1416/docs](http://localhost:1416/docs), and test the newly created endpoint. 
![Search pipeline in the OpenAPI documentation](https://qdrant.tech/documentation/examples/student-rag-haystack-red-hat-openshift-hc/hayhooks-openapi.png) Our search is now accessible through the HTTP endpoint, so we can integrate it with any other service. We can even control the other parameters, like the number of documents to return: ```shell curl -X 'POST' \ 'http://localhost:1416/search-pipeline' \ -H 'Accept: application/json' \ -H 'Content-Type: application/json' \ -d '{ "llm": { }, "prompt_builder": { "query": "How can I remove an application?" }, "query_embedder": { "text": "How can I remove an application?" }, "retriever": { "top_k": 5 } }' ``` The response should be similar to the one we got in the Python before: ```json { "llm": { "replies": [\ "\n\nAnswer: You can remove an application running in OpenShift by right-clicking on the circular graphic representing the application in Topology view and selecting the Delete Application text from the dialog that appears when you click the graphic’s outer ring. Alternatively, you can use the oc CLI tool to delete an installed application using the oc delete all command."\ ], "meta": [\ {\ "model": "mistralai/Mistral-7B-Instruct-v0.1",\ "index": 0,\ "finish_reason": "eos_token",\ "usage": {\ "completion_tokens": 75,\ "prompt_tokens": 642,\ "total_tokens": 717\ }\ }\ ] } } ``` ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/\#next-steps) Next steps - In this example, [Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift) is the infrastructure of choice for proprietary chatbots. [Read more](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8) about how to host AI projects in their [extensive documentation](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8). - [Haystack’s documentation](https://docs.haystack.deepset.ai/docs/kubernetes) describes [how to deploy the Hayhooks service in a Kubernetes\\ environment](https://docs.haystack.deepset.ai/docs/kubernetes), so you can easily move it to your own OpenShift infrastructure. - If you are just getting started and need more guidance on Qdrant, read the [quickstart](https://qdrant.tech/documentation/quick-start/) or try out our [beginner tutorial](https://qdrant.tech/documentation/tutorials/neural-search/). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/rag-chatbot-red-hat-openshift-haystack.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/rag-chatbot-red-hat-openshift-haystack.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-15-lllmstxt|> ## qdrant-1.7.x - [Articles](https://qdrant.tech/articles/) - Qdrant 1.7.0 has just landed! [Back to Qdrant Articles](https://qdrant.tech/articles/) --- # Qdrant 1.7.0 has just landed! 
Kacper Łukawski · December 10, 2023 ![Qdrant 1.7.0 has just landed!](https://qdrant.tech/articles_data/qdrant-1.7.x/preview/title.jpg) Please welcome the long-awaited [Qdrant 1.7.0 release](https://github.com/qdrant/qdrant/releases/tag/v1.7.0). Except for a handful of minor fixes and improvements, this release brings some cool brand-new features that we are excited to share! The latest version of your favorite vector search engine finally supports **sparse vectors**. That’s the feature many of you requested, so why should we ignore it? We also decided to continue our journey with [vector similarity beyond search](https://qdrant.tech/articles/vector-similarity-beyond-search/). The new Discovery API covers some utterly new use cases. We’re more than excited to see what you will build with it! But there is more to it! Check out what’s new in **Qdrant 1.7.0**! 1. Sparse vectors: do you want to use keyword-based search? Support for sparse vectors is finally here! 2. Discovery API: an entirely new way of using vectors for restricted search and exploration. 3. User-defined sharding: you can now decide which points should be stored on which shard. 4. Snapshot-based shard transfer: a new option for moving shards between nodes. Do you see something missing? Your feedback drives the development of Qdrant, so do not hesitate to [join our Discord community](https://qdrant.to/discord) and help us build the best vector search engine out there! ## [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#new-features) New features Qdrant 1.7.0 brings a bunch of new features. Let’s take a closer look at them! ### [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#sparse-vectors) Sparse vectors Traditional keyword-based search mechanisms often rely on algorithms like TF-IDF, BM25, or comparable methods. While these techniques internally utilize vectors, they typically involve sparse vector representations. In these methods, the **vectors are predominantly filled with zeros, containing a relatively small number of non-zero values**. Those sparse vectors are theoretically high dimensional, definitely way higher than the dense vectors used in semantic search. However, since the majority of dimensions are usually zeros, we store them differently and just keep the non-zero dimensions. Until now, Qdrant has not been able to handle sparse vectors natively. Some were trying to convert them to dense vectors, but that was not the best solution or a suggested way. We even wrote a piece with [our thoughts on building a hybrid search](https://qdrant.tech/articles/hybrid-search/), and we encouraged you to use a different tool for keyword lookup. Things have changed since then, as so many of you wanted a single tool for sparse and dense vectors. And responding to this [popular](https://github.com/qdrant/qdrant/issues/1678) [demand](https://github.com/qdrant/qdrant/issues/1135), we’ve now introduced sparse vectors! If you’re coming across the topic of sparse vectors for the first time, our [Brief History of Search](https://qdrant.tech/documentation/overview/vector-search/) explains the difference between sparse and dense vectors. Check out the [sparse vectors article](https://qdrant.tech/articles/sparse-vectors/) and [sparse vectors index docs](https://qdrant.tech/documentation/concepts/indexing/#sparse-vector-index) for more details on what this new index means for Qdrant users. 
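For a quick feel of the new capability, here is a minimal sketch of how sparse vectors can be used from the Python client. It assumes `qdrant-client` >= 1.7; the collection name, vector name, and the indices/values are purely illustrative:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# A collection that only stores a named sparse vector called "text"
client.create_collection(
    collection_name="sparse-demo",
    vectors_config={},  # no dense vectors in this sketch
    sparse_vectors_config={"text": models.SparseVectorParams()},
)

# Only the non-zero dimensions are stored: pairs of (index, value)
client.upsert(
    collection_name="sparse-demo",
    points=[
        models.PointStruct(
            id=1,
            vector={"text": models.SparseVector(indices=[6, 57, 1024], values=[0.5, 0.3, 0.9])},
            payload={"source": "example"},
        )
    ],
)

# Query with another sparse vector
hits = client.search(
    collection_name="sparse-demo",
    query_vector=models.NamedSparseVector(
        name="text",
        vector=models.SparseVector(indices=[6, 1024], values=[0.4, 0.8]),
    ),
    limit=3,
)
```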
### [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#discovery-api) Discovery API

The recently launched [Discovery API](https://qdrant.tech/documentation/concepts/explore/#discovery-api) extends the range of scenarios for leveraging vectors. While its interface mirrors the [Recommendation API](https://qdrant.tech/documentation/concepts/explore/#recommendation-api), it focuses on refining the search parameters for greater precision. The concept of ‘context’ refers to a collection of positive-negative pairs that define zones within a space. Each pair effectively divides the space into positive or negative segments. This concept guides the search operation to prioritize points based on their inclusion within positive zones or their avoidance of negative zones. Essentially, the search algorithm favors points that fall within multiple positive zones or steer clear of negative ones.

The Discovery API can be used in two ways - either with or without a target point. The first case is called a **discovery search**, while the second is called a **context search**.

#### [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#discovery-search) Discovery search

_Discovery search_ is an operation that uses a target point to find the most relevant points in the collection, while performing the search in the preferred areas only. That is basically a search operation with more control over the search space.

![Discovery search visualization](https://qdrant.tech/articles_data/qdrant-1.7.x/discovery-search.png)

Please refer to the [Discovery API documentation on discovery search](https://qdrant.tech/documentation/concepts/explore/#discovery-search) for more details and the internal mechanics of the operation.

#### [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#context-search) Context search

The _context search_ mode is similar to discovery search, but it does not use a target point. Instead, the `context` is used to navigate the [HNSW graph](https://arxiv.org/abs/1603.09320) towards preferred zones. It is expected that the results in that mode will be diverse, and not centered around one point. _Context search_ could serve as a solution for individuals seeking a more exploratory approach to navigate the vector space.

![Context search visualization](https://qdrant.tech/articles_data/qdrant-1.7.x/context-search.png)

### [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#user-defined-sharding) User-defined sharding

Qdrant’s collections are divided into shards. A single **shard** is a self-contained store of points, which can be moved between nodes. Until now, the points were distributed among shards by using a consistent hashing algorithm, so that shards were managing non-intersecting subsets of points. The latter remains true, but now you can define your own sharding and decide which points should be stored on which shard. Sounds cool, right? But why would you need that? Well, there are multiple scenarios in which you may want to use custom sharding. For example, you may want to store some points on a dedicated node, or you may want to keep points from the same user together on the same shard.

While the existing behavior is still the default one, you can now define the shards when you create a collection. Then, you can assign each point to a shard by providing a `shard_key` in the `upsert` operation. What’s more, you can also search over the selected shards only, by providing the `shard_key` parameter in the search operation.
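As a rough sketch of how that can look from the Python client (assuming `qdrant-client` >= 1.7; the collection name, shard key, and vector size are made up for illustration), you create the collection with custom sharding, register a shard key, and then upsert points under that key. The HTTP request below then shows how a search can be restricted to selected shards:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Collection with user-defined sharding instead of the automatic hashing
client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
    sharding_method=models.ShardingMethod.CUSTOM,
)

# A shard key has to exist before points can be assigned to it
client.create_shard_key(collection_name="my_collection", shard_key="cats")

# All points in this request go to the "cats" shard
client.upsert(
    collection_name="my_collection",
    points=[
        models.PointStruct(id=1, vector=[0.29, 0.81, 0.75, 0.11], payload={"animal": "cat"}),
    ],
    shard_key_selector="cats",
)
```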
```http
POST /collections/my_collection/points/search
{
    "vector": [0.29, 0.81, 0.75, 0.11],
    "shard_key": ["cats", "dogs"],
    "limit": 10,
    "with_payload": true
}
```

If you want to know more about the user-defined sharding, please refer to the [sharding documentation](https://qdrant.tech/documentation/guides/distributed_deployment/#sharding).

### [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#snapshot-based-shard-transfer) Snapshot-based shard transfer

This one is a more technical, in-depth improvement for users of the distributed mode: we implemented a new option for the shard transfer mechanism. The new approach is based on a snapshot of the shard, which is transferred to the target node.

Moving shards is required for dynamic scaling of the cluster. Your data can migrate between nodes, and the way you move it is crucial for the performance of the whole system. The good old `stream_records` method (still the default one) transmits all the records between the machines and indexes them on the target node. In the case of moving the shard, it’s necessary to recreate the HNSW index each time. However, with the introduction of the new `snapshot` approach, the snapshot itself, inclusive of all data and potentially quantized content, is transferred to the target node. This comprehensive snapshot includes the entire index, enabling the target node to seamlessly load it and promptly begin handling requests without the need for index recreation.

There are multiple scenarios in which you may prefer one over the other. Please check out the docs of the [shard transfer method](https://qdrant.tech/documentation/guides/distributed_deployment/#shard-transfer-method) for more details and a head-to-head comparison. As for now, the old `stream_records` method is still the default one, but we may decide to change it in the future.

## [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#minor-improvements) Minor improvements

Beyond introducing new features, Qdrant 1.7.0 enhances performance and addresses various minor issues. Here’s a rundown of the key improvements:

1. Improvement of HNSW Index Building on High CPU Systems ( [PR#2869](https://github.com/qdrant/qdrant/pull/2869)).
2. Improving [Search Tail Latencies](https://github.com/qdrant/qdrant/pull/2931): improvement for high CPU systems with many parallel searches, directly impacting the user experience by reducing latency.
3. [Adding Index for Geo Map Payloads](https://github.com/qdrant/qdrant/pull/2768): index for geo map payloads can significantly improve search performance, especially for applications involving geographical data.
4. Stability of Consensus on Big High Load Clusters: enhancing the stability of consensus in large, high-load environments is critical for ensuring the reliability and scalability of the system ( [PR#3013](https://github.com/qdrant/qdrant/pull/3013), [PR#3026](https://github.com/qdrant/qdrant/pull/3026), [PR#2942](https://github.com/qdrant/qdrant/pull/2942), [PR#3103](https://github.com/qdrant/qdrant/pull/3103), [PR#3054](https://github.com/qdrant/qdrant/pull/3054)).
5. Configurable Timeout for Searches: allowing users to configure the timeout for searches provides greater flexibility and can help optimize system performance under different operational conditions ( [PR#2748](https://github.com/qdrant/qdrant/pull/2748), [PR#2771](https://github.com/qdrant/qdrant/pull/2771)) - see the sketch after this list.
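A hedged sketch of that last item, assuming a recent `qdrant-client` (the collection name and query vector are illustrative):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Abort this particular search if it takes longer than 5 seconds
hits = client.search(
    collection_name="my_collection",
    query_vector=[0.29, 0.81, 0.75, 0.11],
    limit=10,
    timeout=5,
)
```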
## [Anchor](https://qdrant.tech/articles/qdrant-1.7.x/\#release-notes) Release notes [Our release notes](https://github.com/qdrant/qdrant/releases/tag/v1.7.0) are a place to go if you are interested in more details. Please remember that Qdrant is an open source project, so feel free to [contribute](https://github.com/qdrant/qdrant/issues)! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/qdrant-1.7.x.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/qdrant-1.7.x.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-16-lllmstxt|> ## logging-monitoring - [Documentation](https://qdrant.tech/documentation/) - [Private cloud](https://qdrant.tech/documentation/private-cloud/) - Logging & Monitoring --- # [Anchor](https://qdrant.tech/documentation/private-cloud/logging-monitoring/\#configuring-logging--monitoring-in-qdrant-private-cloud) Configuring Logging & Monitoring in Qdrant Private Cloud ## [Anchor](https://qdrant.tech/documentation/private-cloud/logging-monitoring/\#logging) Logging You can access the logs with kubectl or the Kubernetes log management tool of your choice. For example: ```bash kubectl -n qdrant-private-cloud logs -l app=qdrant,cluster-id=a7d8d973-0cc5-42de-8d7b-c29d14d24840 ``` **Configuring log levels:** You can configure log levels for the databases individually through the QdrantCluster spec. Example: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840 labels: cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" customer-id: "acme-industries" spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.11.3" size: 1 resources: cpu: 100m memory: "1Gi" storage: "2Gi" config: log_level: "DEBUG" ``` ### [Anchor](https://qdrant.tech/documentation/private-cloud/logging-monitoring/\#integrating-with-a-log-management-system) Integrating with a log management system You can integrate the logs into any log management system that supports Kubernetes. There are no Qdrant specific configurations necessary. Just configure the agents of your system to collect the logs from all Pods in the Qdrant namespace. ## [Anchor](https://qdrant.tech/documentation/private-cloud/logging-monitoring/\#monitoring) Monitoring The Qdrant Cloud console gives you access to basic metrics about CPU, memory and disk usage of your Qdrant clusters. 
If you want to integrate the Qdrant metrics into your own monitoring system, you can instruct it to scrape the following endpoints, which provide metrics in a Prometheus/OpenTelemetry compatible format:

- `/metrics` on port 6333 of every Qdrant database Pod, this provides metrics about each database and its internals
- `/metrics` on port 9290 of the Qdrant Operator Pod, this provides metrics about the Operator, as well as the status of Qdrant Clusters and Snapshots
- For metrics about the state of Kubernetes resources like Pods and PersistentVolumes within the Qdrant Hybrid Cloud namespace, we recommend using [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)

### [Anchor](https://qdrant.tech/documentation/private-cloud/logging-monitoring/\#grafana-dashboard) Grafana dashboard

If you scrape the above metrics into your own monitoring system, and you are using Grafana, you can use our [Grafana dashboard](https://github.com/qdrant/qdrant-cloud-grafana-dashboard) to visualize these metrics.

![Grafana dashboard](https://qdrant.tech/documentation/cloud/cloud-grafana-dashboard.png)

<|page-20-lllmstxt|> placeholder-marker-will-be-corrected

--- # [Anchor](https://qdrant.tech/documentation/guides/administration/\#administration) Administration

Qdrant exposes administration tools which enable you to modify the behavior of a Qdrant instance at runtime without changing its configuration manually.

## [Anchor](https://qdrant.tech/documentation/guides/administration/\#locking) Locking

A locking API enables users to restrict the possible operations on a qdrant process. It is important to mention that:

- The configuration is not persistent, therefore it is necessary to lock again following a restart.
- Locking applies to a single node only. It is necessary to call lock on all the desired nodes in a distributed deployment setup.

Lock request sample:

```http
POST /locks
{
    "error_message": "write is forbidden",
    "write": true
}
```

The `write` flag enables/disables the write lock. If the write lock is set to true, Qdrant doesn’t allow creating new collections or adding new data to the existing storage. However, deletion operations or updates are not forbidden under the write lock. This feature enables administrators to prevent a Qdrant process from using more disk space while permitting users to search and delete unnecessary data.

You can optionally provide the error message that should be used for error responses to users.

## [Anchor](https://qdrant.tech/documentation/guides/administration/\#recovery-mode) Recovery mode

_Available as of v1.2.0_

Recovery mode can help in situations where Qdrant fails to start repeatedly.
When starting in recovery mode, Qdrant only loads collection metadata to prevent going out of memory. This allows you to resolve out of memory situations, for example, by deleting a collection. After resolving the issue, Qdrant can be restarted normally to continue operation.

In recovery mode, collection operations are limited to [deleting](https://qdrant.tech/documentation/concepts/collections/#delete-collection) a collection. That is because only collection metadata is loaded during recovery.

To enable recovery mode with the Qdrant Docker image you must set the environment variable `QDRANT_ALLOW_RECOVERY_MODE=true`. The container will try to start normally first, and restarts in recovery mode if initialisation fails due to an out of memory error. This behavior is disabled by default.

If using a Qdrant binary, recovery mode can be enabled by setting a recovery message in an environment variable, such as `QDRANT__STORAGE__RECOVERY_MODE="My recovery message"`.

## [Anchor](https://qdrant.tech/documentation/guides/administration/\#strict-mode) Strict mode

_Available as of v1.13.0_

Strict mode is a feature to restrict certain types of operations on a collection in order to protect it. The goal is to prevent inefficient usage patterns that could overload the collections. This configuration ensures a more predictable and responsive service when you do not have control over the queries that are being executed.

Here is a non-exhaustive list of operations that can be restricted using strict mode:

- Querying a non-indexed payload field, which can be very slow
- Maximum number of filtering conditions in a query
- Maximum batch size when inserting vectors
- Maximum collection size (in terms of vectors or payload size)

See the [schema definitions](https://api.qdrant.tech/api-reference/collections/create-collection#request.body.strict_mode_config) for all the `strict_mode_config` parameters.

Upon crossing a limit, the server will return a client-side error with information about the limit that was crossed.

As part of the config, the `enabled` field acts as a toggle to enable or disable strict mode dynamically.

The `strict_mode_config` can be enabled when [creating](https://qdrant.tech/documentation/guides/administration/#create-a-collection) a collection, for instance, as shown below, to activate the `unindexed_filtering_retrieve` limit. Setting `unindexed_filtering_retrieve` to false prevents filtering on a non-indexed payload key.
```http
PUT /collections/{collection_name}
{
    "strict_mode_config": {
        "enabled": true,
        "unindexed_filtering_retrieve": false
    }
}
```

```bash
curl -X PUT http://localhost:6333/collections/{collection_name} \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "strict_mode_config": {
      "enabled": true,
      "unindexed_filtering_retrieve": false
    }
  }'
```

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    strict_mode_config=models.StrictModeConfig(enabled=True, unindexed_filtering_retrieve=False),
)
```

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ host: "localhost", port: 6333 });

client.createCollection("{collection_name}", {
  strict_mode_config: {
    enabled: true,
    unindexed_filtering_retrieve: false,
  },
});
```

```rust
use qdrant_client::Qdrant;
use qdrant_client::qdrant::{CreateCollectionBuilder, StrictModeConfigBuilder};

let client = Qdrant::from_url("http://localhost:6334").build()?;

client
    .create_collection(
        CreateCollectionBuilder::new("{collection_name}")
            .strict_mode_config(StrictModeConfigBuilder::default().enabled(true).unindexed_filtering_retrieve(false)),
    )
    .await?;
```

```java
import io.qdrant.client.QdrantClient;
import io.qdrant.client.QdrantGrpcClient;
import io.qdrant.client.grpc.Collections.CreateCollection;
import io.qdrant.client.grpc.Collections.StrictModeConfig;

QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build());

client
    .createCollectionAsync(
        CreateCollection.newBuilder()
            .setCollectionName("{collection_name}")
            .setStrictModeConfig(
                StrictModeConfig.newBuilder().setEnabled(true).setUnindexedFilteringRetrieve(false).build())
            .build())
    .get();
```

```csharp
using Qdrant.Client;
using Qdrant.Client.Grpc;

var client = new QdrantClient("localhost", 6334);

await client.CreateCollectionAsync(
    collectionName: "{collection_name}",
    strictModeConfig: new StrictModeConfig { Enabled = true, UnindexedFilteringRetrieve = false }
);
```

```go
import (
    "context"

    "github.com/qdrant/go-client/qdrant"
)

client, err := qdrant.NewClient(&qdrant.Config{
    Host: "localhost",
    Port: 6334,
})

client.CreateCollection(context.Background(), &qdrant.CreateCollection{
    CollectionName: "{collection_name}",
    StrictModeConfig: &qdrant.StrictModeConfig{
        Enabled:                    qdrant.PtrOf(true),
        UnindexedFilteringRetrieve: qdrant.PtrOf(false),
    },
})
```

Or activate it later on an existing collection through the [collection update](https://qdrant.tech/documentation/guides/administration/#update-collection-parameters) API:

```http
PATCH /collections/{collection_name}
{
    "strict_mode_config": {
        "enabled": true,
        "unindexed_filtering_retrieve": false
    }
}
```

```bash
curl -X PATCH http://localhost:6333/collections/{collection_name} \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "strict_mode_config": {
      "enabled": true,
      "unindexed_filtering_retrieve": false
    }
  }'
```

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.update_collection(
    collection_name="{collection_name}",
    strict_mode_config=models.StrictModeConfig(enabled=True, unindexed_filtering_retrieve=False),
)
```

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ host: "localhost", port: 6333 });

client.updateCollection("{collection_name}", {
  strict_mode_config: {
    enabled: true,
    unindexed_filtering_retrieve: false,
  },
});
```

```rust
use qdrant_client::qdrant::{StrictModeConfigBuilder, UpdateCollectionBuilder};

client
    .update_collection(
        UpdateCollectionBuilder::new("{collection_name}").strict_mode_config(
            StrictModeConfigBuilder::default().enabled(true).unindexed_filtering_retrieve(false),
        ),
    )
    .await?;
```

```java
import io.qdrant.client.grpc.Collections.StrictModeConfig;
import io.qdrant.client.grpc.Collections.UpdateCollection;

client.updateCollectionAsync(
    UpdateCollection.newBuilder()
        .setCollectionName("{collection_name}")
        .setStrictModeConfig(
            StrictModeConfig.newBuilder().setEnabled(true).setUnindexedFilteringRetrieve(false).build())
        .build());
```

```csharp
using Qdrant.Client;
using Qdrant.Client.Grpc;

var client = new QdrantClient("localhost", 6334);

await client.UpdateCollectionAsync(
    collectionName: "{collection_name}",
    strictModeConfig: new StrictModeConfig { Enabled = true, UnindexedFilteringRetrieve = false }
);
```

```go
import (
    "context"

    "github.com/qdrant/go-client/qdrant"
)

client, err := qdrant.NewClient(&qdrant.Config{
    Host: "localhost",
    Port: 6334,
})

client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{
    CollectionName: "{collection_name}",
    StrictModeConfig: &qdrant.StrictModeConfig{
        Enabled:                    qdrant.PtrOf(true),
        UnindexedFilteringRetrieve: qdrant.PtrOf(false),
    },
})
```

To completely disable strict mode on an existing collection, use:

```http
PATCH /collections/{collection_name}
{
    "strict_mode_config": {
        "enabled": false
    }
}
```

```bash
curl -X PATCH http://localhost:6333/collections/{collection_name} \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "strict_mode_config": {
      "enabled": false
    }
  }'
```

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.update_collection(
    collection_name="{collection_name}",
    strict_mode_config=models.StrictModeConfig(enabled=False),
)
```

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ host: "localhost", port: 6333 });

client.updateCollection("{collection_name}", {
  strict_mode_config: {
    enabled: false,
  },
});
```

```rust
use qdrant_client::qdrant::{StrictModeConfigBuilder, UpdateCollectionBuilder};

client
    .update_collection(
        UpdateCollectionBuilder::new("{collection_name}").strict_mode_config(
            StrictModeConfigBuilder::default().enabled(false),
        ),
    )
    .await?;
```

```java
import io.qdrant.client.grpc.Collections.StrictModeConfig;
import io.qdrant.client.grpc.Collections.UpdateCollection;

client.updateCollectionAsync(
    UpdateCollection.newBuilder()
        .setCollectionName("{collection_name}")
        .setStrictModeConfig(
            StrictModeConfig.newBuilder().setEnabled(false).build())
        .build());
```

```csharp
using Qdrant.Client;
using Qdrant.Client.Grpc;

var client = new QdrantClient("localhost", 6334);

await client.UpdateCollectionAsync(
    collectionName: "{collection_name}",
    strictModeConfig: new StrictModeConfig { Enabled = false }
);
```

```go
import (
    "context"

    "github.com/qdrant/go-client/qdrant"
)

client, err := qdrant.NewClient(&qdrant.Config{
    Host: "localhost",
    Port: 6334,
})

client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{
    CollectionName: "{collection_name}",
    StrictModeConfig: &qdrant.StrictModeConfig{
        Enabled: qdrant.PtrOf(false),
    },
})
```

##### Was this page useful?
![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/administration.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/administration.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-18-lllmstxt|> ## new-recommendation-api - [Articles](https://qdrant.tech/articles/) - Deliver Better Recommendations with Qdrant’s new API [Back to Qdrant Articles](https://qdrant.tech/articles/) --- # Deliver Better Recommendations with Qdrant’s new API Kacper Łukawski · October 25, 2023 ![Deliver Better Recommendations with Qdrant’s new API](https://qdrant.tech/articles_data/new-recommendation-api/preview/title.jpg) The most popular use case for vector search engines, such as Qdrant, is Semantic search with a single query vector. Given the query, we can vectorize (embed) it and find the closest points in the index. But [Vector Similarity beyond Search](https://qdrant.tech/articles/vector-similarity-beyond-search/) does exist, and recommendation systems are a great example. Recommendations might be seen as a multi-aim search, where we want to find items close to positive and far from negative examples. This use of vector databases has many applications, including recommendation systems for e-commerce, content, or even dating apps. Qdrant has provided the [Recommendation API](https://qdrant.tech/documentation/concepts/search/#recommendation-api) for a while, and with the latest release, [Qdrant 1.6](https://github.com/qdrant/qdrant/releases/tag/v1.6.0), we’re glad to give you more flexibility and control over the Recommendation API. Here, we’ll discuss some internals and show how they may be used in practice. ### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#recap-of-the-old-recommendations-api) Recap of the old recommendations API The previous [Recommendation API](https://qdrant.tech/documentation/concepts/search/#recommendation-api) in Qdrant came with some limitations. First of all, it was required to pass vector IDs for both positive and negative example points. If you wanted to use vector embeddings directly, you had to either create a new point in a collection or mimic the behaviour of the Recommendation API by using the [Search API](https://qdrant.tech/documentation/concepts/search/#search-api). Moreover, in the previous releases of Qdrant, you were always asked to provide at least one positive example. This requirement was based on the algorithm used to combine multiple samples into a single query vector. It was a simple, yet effective approach. However, if the only information you had was that your user dislikes some items, you couldn’t use it directly. Qdrant 1.6 brings a more flexible API. You can now provide both IDs and vectors of positive and negative examples. You can even combine them within a single request. That makes the new implementation backward compatible, so you can easily upgrade an existing Qdrant instance without any changes in your code. And the default behaviour of the API is still the same as before. 
However, we extended the API, so **you can now choose the strategy of how to find the recommended points**. ```http POST /collections/{collection_name}/points/recommend { "positive": [100, 231], "negative": [718, [0.2, 0.3, 0.4, 0.5]], "filter": { "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ] }, "strategy": "average_vector", "limit": 3 } ``` There are two key changes in the request. First of all, we can adjust the strategy of search and set it to `average_vector` (the default) or `best_score`. Moreover, we can pass both IDs ( `718`) and embeddings ( `[0.2, 0.3, 0.4, 0.5]`) as both positive and negative examples. ## [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#hnsw-ann-example-and-strategy) HNSW ANN example and strategy Let’s start with an example to help you understand the [HNSW graph](https://qdrant.tech/articles/filtrable-hnsw/). Assume you want to travel to a small city on another continent: 1. You start from your hometown and take a bus to the local airport. 2. Then, take a flight to one of the closest hubs. 3. From there, you have to take another flight to a hub on your destination continent. 4. Hopefully, one last flight to your destination city. 5. You still have one more leg on local transport to get to your final address. This journey is similar to the HNSW graph’s use in Qdrant’s approximate nearest neighbours search. ![Transport network](https://qdrant.tech/articles_data/new-recommendation-api/example-transport-network.png) HNSW is a multilayer graph of vectors (embeddings), with connections based on vector proximity. The top layer has the least points, and the distances between those points are the biggest. The deeper we go, the more points we have, and the distances get closer. The graph is built in a way that the points are connected to their closest neighbours at every layer. All the points from a particular layer are also in the layer below, so switching the search layer while staying in the same location is possible. In the case of transport networks, the top layer would be the airline hubs, well-connected but with big distances between the airports. Local airports, along with railways and buses, with higher density and smaller distances, make up the middle layers. Lastly, our bottom layer consists of local means of transport, which is the densest and has the smallest distances between the points. You don’t have to check all the possible connections when you travel. You select an intercontinental flight, then a local one, and finally a bus or a taxi. All the decisions are made based on the distance between the points. The search process in HNSW is also based on similarly traversing the graph. Start from the entry point in the top layer, find its closest point and then use that point as the entry point into the next densest layer. This process repeats until we reach the bottom layer. Visited points and distances to the original query vector are kept in memory. If none of the neighbours of the current point is better than the best match, we can stop the traversal, as this is a local minimum. We start at the biggest scale, and then gradually zoom in. In this oversimplified example, we assumed that the distance between the points is the only factor that matters. In reality, we might want to consider other criteria, such as the ticket price, or avoid some specific locations due to certain restrictions. That means, there are various strategies for choosing the best match, which is also true in the case of vector recommendations. 
We can use different approaches to determine the path of traversing the HNSW graph by changing how we calculate the score of a candidate point during traversal. The default behaviour is based on pure distance, but Qdrant 1.6 exposes two strategies for the recommendation API.

### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#average-vector) Average vector

The default strategy, called `average_vector`, is the previous one, based on the average of positive and negative examples. It simplifies the recommendation process and converts it into a single vector search. It supports both point IDs and vectors as parameters. For example, you can get recommendations based on past interactions with existing points combined with a query vector embedding. Internally, that mechanism is based on the averages of positive and negative examples, and the query vector is calculated with the following formula:

`average_vector = avg(positive vectors) + (avg(positive vectors) - avg(negative vectors))`

### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#the-new-hotness---best-score) The new hotness - Best score

The new strategy is called `best_score`. It does not rely on averages and is more flexible. It allows you to pass just negative samples and uses a slightly more sophisticated algorithm under the hood.

The best score is chosen at every step of HNSW graph traversal. We separately calculate the distance between a traversed point and every positive and negative example. In the case of the best score strategy, **there is no single query vector anymore, but a bunch of positive and negative queries**. As a result, for each traversed point we have a set of distances, one for each example. In the next step, we simply take the best scores for positives and negatives, creating two separate values. Best scores are just the closest distances of the point to the positives and to the negatives. The idea is: **if a point is closer to any negative than to any positive example, we do not want it**. We penalize being close to the negatives, so instead of using the similarity value directly, we check if it’s closer to positives or negatives. The following formula is used to calculate the score of a traversed potential point:

```rust
if best_positive_score > best_negative_score {
    score = best_positive_score
} else {
    score = -(best_negative_score * best_negative_score)
}
```

If the point is closer to the negatives, we penalize it by taking the negative squared value of the best negative score. For a closer negative, the score of the candidate point will always be lower or equal to zero, making the chances of choosing that point significantly lower. However, if the best negative score is higher than the best positive score, we still prefer those points that are further away from the negatives. That procedure effectively **pulls the traversal procedure away from the negative examples**.

If you want to know more about the internals of HNSW, you can check out the article about the [Filtrable HNSW](https://qdrant.tech/articles/filtrable-hnsw/) that covers the topic thoroughly.

## [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#food-discovery-demo) Food Discovery demo

Our [Food Discovery demo](https://qdrant.tech/articles/food-discovery-demo/) is an application built on top of the new [Recommendation API](https://qdrant.tech/documentation/concepts/search/#recommendation-api).
It allows you to find a meal based on liked and disliked photos. There are some updates, enabled by the new Qdrant release:

- **Ability to include multiple textual queries in the recommendation request.** Previously, we only allowed passing a single query to solve the cold start problem. Right now, you can pass multiple queries and mix them with the liked/disliked photos. This became possible because of the new flexibility in parameters. We can pass both point IDs and embedding vectors in the same request, and user queries are obviously not a part of the collection.
- **Switch between the recommendation strategies.** You can now choose between the `average_vector` and the `best_score` scoring algorithms.

### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#differences-between-the-strategies) Differences between the strategies

The UI of the Food Discovery demo allows you to switch between the strategies. The `best_score` is the default one, but with just a single switch, you can see how the results differ when using the previous `average_vector` strategy. If you select just a single positive example, both algorithms work identically.

##### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#one-positive-example) One positive example

The difference only becomes apparent when you start adding more examples, especially if you choose some negatives.

##### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#one-positive-and-one-negative-example) One positive and one negative example

The more likes and dislikes we add, the more diverse the results of the `best_score` strategy will be. In the old strategy, there is just a single vector, so all the examples are similar to it. The new one takes into account all the examples separately, making the variety richer.

##### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#multiple-positive-and-negative-examples) Multiple positive and negative examples

Choosing the right strategy is dataset-dependent, and the embeddings play a significant role here. Thus, it’s always worth trying both of them and comparing the results in a particular case.

#### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#handling-the-negatives-only) Handling the negatives only

In the case of our Food Discovery demo, passing just the negative images can work as an outlier detection mechanism. While the dataset was supposed to contain only food photos, this is not actually true. A simple way to find these outliers is to pass in food item photos as negatives, leading to results that are the most “unlike” food images. In our case you will see pill bottles and books.

**The `average_vector` strategy still requires providing at least one positive example!** However, since cosine distance is set up for the collection used in the demo, we faked it using [a trick described in the previous article](https://qdrant.tech/articles/food-discovery-demo/#negative-feedback-only). In a nutshell, if you only pass negative examples, their vectors will be averaged, and the negated resulting vector will be used as a query to the search endpoint.

##### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#negatives-only) Negatives only

Still, both methods return different results, so they each have their place depending on the questions being asked and the datasets being used.
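For reference, here is a minimal sketch of a negatives-only request with the new strategy through the Python client. It assumes `qdrant-client` >= 1.6; the collection name, point ID, and vector are illustrative and not the demo’s actual code:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Negative-only feedback is only meaningful with the best_score strategy
results = client.recommend(
    collection_name="food",
    positive=[],                             # no positive examples at all
    negative=[4321, [0.2, 0.3, 0.4, 0.5]],   # a point ID and a raw embedding, mixed freely
    strategy=models.RecommendStrategy.BEST_SCORE,
    limit=10,
)
```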
#### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#challenges-with-multimodality) Challenges with multimodality Food Discovery uses the [CLIP embeddings model](https://huggingface.co/sentence-transformers/clip-ViT-B-32), which is multimodal, allowing both images and texts encoded into the same vector space. Using this model allows for image queries, text queries, or both of them combined. We utilized that mechanism in the updated demo, allowing you to pass the textual queries to filter the results further. ##### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#a-single-text-query) A single text query Text queries might be mixed with the liked and disliked photos, so you can combine them in a single request. However, you might be surprised by the results achieved with the new strategy, if you start adding the negative examples. ##### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#a-single-text-query-with-negative-example) A single text query with negative example This is an issue related to the embeddings themselves. Our dataset contains a bunch of image embeddings that are pretty close to each other. On the other hand, our text queries are quite far from most of the image embeddings, but relatively close to some of them, so the text-to-image search seems to work well. When all query items come from the same domain, such as only text, everything works fine. However, if we mix positive text and negative image embeddings, the results of the `best_score` are overwhelmed by the negative samples, which are simply closer to the dataset embeddings. If you experience such a problem, the `average_vector` strategy might be a better choice. ### [Anchor](https://qdrant.tech/articles/new-recommendation-api/\#check-out-the-demo) Check out the demo The [Food Discovery Demo](https://food-discovery.qdrant.tech/) is available online, so you can test and see the difference. This is an open source project, so you can easily deploy it on your own. The source code is available in the [GitHub repository](https://github.com/qdrant/demo-food-discovery/) and the [README](https://github.com/qdrant/demo-food-discovery/blob/main/README.md) describes the process of setting it up. Since calculating the embeddings takes a while, we precomputed them and exported them as a [snapshot](https://storage.googleapis.com/common-datasets-snapshots/wolt-clip-ViT-B-32.snapshot), which might be easily imported into any Qdrant instance. [Qdrant Cloud is the easiest way to start](https://cloud.qdrant.io/), though! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/new-recommendation-api.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/new-recommendation-api.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-19-lllmstxt|> ## hybrid-search-llamaindex-jinaai - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Chat With Product PDF Manuals Using Hybrid Search --- # [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#chat-with-product-pdf-manuals-using-hybrid-search) Chat With Product PDF Manuals Using Hybrid Search | Time: 120 min | Level: Advanced | Output: [GitHub](https://github.com/infoslack/qdrant-example/blob/main/HC-demo/HC-DO-LlamaIndex-Jina-v2.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/infoslack/qdrant-example/blob/main/HC-demo/HC-DO-LlamaIndex-Jina-v2.ipynb) | | --- | --- | --- | --- | With the proliferation of digital manuals and the increasing demand for quick and accurate customer support, having a chatbot capable of efficiently parsing through complex PDF documents and delivering precise information can be a game-changer for any business. In this tutorial, we’ll walk you through the process of building a RAG-based chatbot, designed specifically to assist users with understanding the operation of various household appliances. We’ll cover the essential steps required to build your system, including data ingestion, natural language understanding, and response generation for customer support use cases. ## [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#components) Components - **Embeddings:** Jina Embeddings, served via the [Jina Embeddings API](https://jina.ai/embeddings/#apiform) - **Database:** [Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/), deployed in a managed Kubernetes cluster on [DigitalOcean\\ (DOKS)](https://www.digitalocean.com/products/kubernetes) - **LLM:** [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) language model on HuggingFace - **Framework:** [LlamaIndex](https://www.llamaindex.ai/) for extended RAG functionality and [Hybrid Search support](https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/). - **Parser:** [LlamaParse](https://github.com/run-llama/llama_parse) as a way to parse complex documents with embedded objects such as tables and figures. ![Architecture diagram](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/architecture-diagram.png) ### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#procedure) Procedure Retrieval Augmented Generation (RAG) combines search with language generation. An external information retrieval system is used to identify documents likely to provide information relevant to the user’s query. These documents, along with the user’s request, are then passed on to a text-generating language model, producing a natural response. This method enables a language model to respond to questions and access information from a much larger set of documents than it could see otherwise. The language model only looks at a few relevant sections of the documents when generating responses, which also helps to reduce inexplicable errors. 
## [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#prerequisites) Prerequisites

### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#deploying-qdrant-hybrid-cloud-on-digitalocean) Deploying Qdrant Hybrid Cloud on DigitalOcean

[DigitalOcean Kubernetes (DOKS)](https://www.digitalocean.com/products/kubernetes) is a managed Kubernetes service that lets you deploy Kubernetes clusters without the complexities of handling the control plane and containerized infrastructure. Clusters are compatible with standard Kubernetes toolchains and integrate natively with DigitalOcean Load Balancers and volumes.

1. To start using managed Kubernetes on DigitalOcean, follow the [platform-specific documentation](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#digital-ocean).
2. Once your Kubernetes clusters are up, [you can begin deploying Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/).
3. Once it’s deployed, you should have a running Qdrant cluster with an API key.

### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#development-environment) Development environment

Then, install all dependencies:

```python
!pip install -U \
    llama-index \
    llama-parse \
    python-dotenv \
    llama-index-embeddings-jinaai \
    llama-index-llms-huggingface \
    llama-index-vector-stores-qdrant \
    "huggingface_hub[inference]" \
    datasets
```

Set up secret key values in a `.env` file:

```bash
JINAAI_API_KEY
HF_INFERENCE_API_KEY
LLAMA_CLOUD_API_KEY
QDRANT_HOST
QDRANT_API_KEY
```

Load all environment variables:

```python
import os
from dotenv import load_dotenv
load_dotenv('./.env')
```

## [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#implementation) Implementation

### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#connect-jina-embeddings-and-mixtral-llm) Connect Jina Embeddings and Mixtral LLM

LlamaIndex provides built-in support for the [Jina Embeddings API](https://jina.ai/embeddings/#apiform). To use it, you need to initialize the `JinaEmbedding` object with your API key and model name. For the LLM, you need to wrap it in a subclass of `llama_index.llms.CustomLLM` to make it compatible with LlamaIndex.

```python
# connect embeddings
from llama_index.embeddings.jinaai import JinaEmbedding

jina_embedding_model = JinaEmbedding(
    model="jina-embeddings-v2-base-en",
    api_key=os.getenv("JINAAI_API_KEY"),
)

# connect LLM
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

mixtral_llm = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
    token=os.getenv("HF_INFERENCE_API_KEY"),
)
```

### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#prepare-data-for-rag) Prepare data for RAG

This example will use household appliance manuals, which are generally available as PDF documents.
In the `data` folder, we have three documents, and we will use [LlamaParse](https://github.com/run-llama/llama_parse) to extract the textual content from the PDFs and use it as a knowledge base in a simple RAG. The free LlamaIndex Cloud plan is sufficient for our example:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse

llamaparse_api_key = os.getenv("LLAMA_CLOUD_API_KEY")

llama_parse_documents = LlamaParse(api_key=llamaparse_api_key, result_type="markdown").load_data([
    "data/DJ68-00682F_0.0.pdf",
    "data/F500E_WF80F5E_03445F_EN.pdf",
    "data/O_ME4000R_ME19R7041FS_AA_EN.pdf"
])
```

### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#store-data-into-qdrant) Store data into Qdrant

The code below does the following:

- creates a vector store with the Qdrant client;
- gets an embedding for each chunk using the Jina Embeddings API;
- combines `sparse` and `dense` vectors for hybrid search;
- stores all data into Qdrant.

Hybrid search with Qdrant must be enabled from the beginning - we can simply set `enable_hybrid=True`.

```python
# setting embed_model to Jina and llm model to Mixtral
from llama_index.core import Settings

Settings.embed_model = jina_embedding_model
Settings.llm = mixtral_llm

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_HOST"),
    api_key=os.getenv("QDRANT_API_KEY")
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name="demo",
    enable_hybrid=True,
    batch_size=20
)

Settings.chunk_size = 512

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents=llama_parse_documents,
    storage_context=storage_context
)
```

### [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#prepare-a-prompt) Prepare a prompt

Here we will create a custom prompt template. This prompt asks the LLM to use only the context information retrieved from Qdrant. When querying with hybrid mode, we can set `similarity_top_k` and `sparse_top_k` separately:

- `sparse_top_k` represents how many nodes will be retrieved from each dense and sparse query.
- `similarity_top_k` controls the final number of returned nodes. In the above setting, we end up with 10 nodes.

Then, we assemble the query engine using the prompt.

```python
from llama_index.core import PromptTemplate

qa_prompt_tmpl = (
    "Context information is below.\n"
    "-------------------------------"
    "{context_str}\n"
    "-------------------------------"
    "Given the context information and not prior knowledge,"
    "answer the query. Please be concise, and complete.\n"
    "If the context does not contain an answer to the query,"
    "respond with \"I don't know!\"."
"Query: {query_str}\n" "Answer: " ) qa_prompt = PromptTemplate(qa_prompt_tmpl) from llama_index.core.retrievers import VectorIndexRetriever from llama_index.core.query_engine import RetrieverQueryEngine from llama_index.core import get_response_synthesizer from llama_index.core import Settings Settings.embed_model = jina_embedding_model Settings.llm = mixtral_llm --- # retriever retriever = VectorIndexRetriever( index=index, similarity_top_k=2, sparse_top_k=12, vector_store_query_mode="hybrid" ) --- # response synthesizer response_synthesizer = get_response_synthesizer( llm=mixtral_llm, text_qa_template=qa_prompt, response_mode="compact", ) --- # query engine query_engine = RetrieverQueryEngine( retriever=retriever, response_synthesizer=response_synthesizer, ) ``` ## [Anchor](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/\#run-a-test-query) Run a test query Now you can ask questions and receive answers based on the data: **Question** ```python result = query_engine.query("What temperature should I use for my laundry?") print(result.response) ``` **Answer** ```text The water temperature is set to 70 ˚C during the Eco Drum Clean cycle. You cannot change the water temperature. However, the temperature for other cycles is not specified in the context. ``` And that’s it! Feel free to scale this up to as many documents and complex PDFs as you like. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/hybrid-search-llamaindex-jinaai.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/hybrid-search-llamaindex-jinaai.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-20-lllmstxt|> ## fastembed-rerankers - [Documentation](https://qdrant.tech/documentation/) - [Fastembed](https://qdrant.tech/documentation/fastembed/) - Reranking with FastEmbed --- # [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#how-to-use-rerankers-with-fastembed) How to use rerankers with FastEmbed ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#rerankers) Rerankers A reranker is a model that improves the ordering of search results. A subset of documents is initially retrieved using a fast, simple method (e.g., BM25 or dense embeddings). Then, a reranker – a more powerful, precise, but slower and heavier model – re-evaluates this subset to refine document relevance to the query. Rerankers analyze token-level interactions between the query and each document in depth, making them expensive to use but precise in defining relevance. They trade speed for accuracy, so they are best used on a limited candidate set rather than the entire corpus. ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#goal-of-this-tutorial) Goal of this Tutorial It’s common to use [cross-encoder](https://sbert.net/examples/applications/cross-encoder/README.html) models as rerankers. 
This tutorial uses [Jina Reranker v2 Base Multilingual](https://jina.ai/news/jina-reranker-v2-for-agentic-rag-ultra-fast-multilingual-function-calling-and-code-search/) (licensed under CC-BY-NC-4.0) – a cross-encoder reranker supported in FastEmbed. We use the `all-MiniLM-L6-v2` dense embedding model (also supported in FastEmbed) as a first-stage retriever and then refine results with `Jina Reranker v2`. ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#setup) Setup Install `qdrant-client` with `fastembed`. ```python pip install "qdrant-client[fastembed]>=1.14.1" ``` Import cross-encoders and text embeddings for the first-stage retrieval. ```python from fastembed import TextEmbedding from fastembed.rerank.cross_encoder import TextCrossEncoder ``` You can list the cross-encoder rerankers supported in FastEmbed using the following command. ```python TextCrossEncoder.list_supported_models() ``` This command displays the available models, including details such as output embedding dimensions, model description, model size, model sources, and model file. Avaliable models ```python [{'model': 'Xenova/ms-marco-MiniLM-L-6-v2',\ 'size_in_GB': 0.08,\ 'sources': {'hf': 'Xenova/ms-marco-MiniLM-L-6-v2'},\ 'model_file': 'onnx/model.onnx',\ 'description': 'MiniLM-L-6-v2 model optimized for re-ranking tasks.',\ 'license': 'apache-2.0'},\ {'model': 'Xenova/ms-marco-MiniLM-L-12-v2',\ 'size_in_GB': 0.12,\ 'sources': {'hf': 'Xenova/ms-marco-MiniLM-L-12-v2'},\ 'model_file': 'onnx/model.onnx',\ 'description': 'MiniLM-L-12-v2 model optimized for re-ranking tasks.',\ 'license': 'apache-2.0'},\ {'model': 'BAAI/bge-reranker-base',\ 'size_in_GB': 1.04,\ 'sources': {'hf': 'BAAI/bge-reranker-base'},\ 'model_file': 'onnx/model.onnx',\ 'description': 'BGE reranker base model for cross-encoder re-ranking.',\ 'license': 'mit'},\ {'model': 'jinaai/jina-reranker-v1-tiny-en',\ 'size_in_GB': 0.13,\ 'sources': {'hf': 'jinaai/jina-reranker-v1-tiny-en'},\ 'model_file': 'onnx/model.onnx',\ 'description': 'Designed for blazing-fast re-ranking with 8K context length and fewer parameters than jina-reranker-v1-turbo-en.',\ 'license': 'apache-2.0'},\ {'model': 'jinaai/jina-reranker-v1-turbo-en',\ 'size_in_GB': 0.15,\ 'sources': {'hf': 'jinaai/jina-reranker-v1-turbo-en'},\ 'model_file': 'onnx/model.onnx',\ 'description': 'Designed for blazing-fast re-ranking with 8K context length.',\ 'license': 'apache-2.0'},\ {'model': 'jinaai/jina-reranker-v2-base-multilingual',\ 'size_in_GB': 1.11,\ 'sources': {'hf': 'jinaai/jina-reranker-v2-base-multilingual'},\ 'model_file': 'onnx/model.onnx',\ 'description': 'A multi-lingual reranker model for cross-encoder re-ranking with 1K context length and sliding window',\ 'license': 'cc-by-nc-4.0'}] # some of the fields are omitted for brevity ``` Now, load the first-stage retriever and reranker. ```python encoder_name = "sentence-transformers/all-MiniLM-L6-v2" dense_embedding_model = TextEmbedding(model_name=encoder_name) reranker = TextCrossEncoder(model_name='jinaai/jina-reranker-v2-base-multilingual') ``` The model files will be fetched and downloaded, with progress displayed. ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#embed--index-data-for-the-first-stage-retrieval) Embed & index data for the first-stage retrieval We will vectorize a toy movie description dataset using the `all-MiniLM-L6-v2` model and save the embeddings in Qdrant for first-stage retrieval. 
Then, we will use a cross-encoder reranking model to rerank a small subset of data retrieved in the first stage. Movie description dataset ```python descriptions = ["In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions.",\ "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch.",\ "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist.",\ "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place.",\ "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past.",\ "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre.",\ "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. Only a young female therapist, Paprika, can stop it.",\ "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop.",\ "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline.",\ "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent.",\ "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995).",\ "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers.",\ "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home.",\ "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies.",\ "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in.",\ "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household.",\ "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops.",\ "Story of 40-man Turkish task force who must defend a relay station.",\ "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour.",\ "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."] ``` ```python descriptions_embeddings = list( dense_embedding_model.embed(descriptions) ) ``` Let’s upload the embeddings to Qdrant. Qdrant Client offers a simple in-memory mode, allowing you to experiment locally with small data volumes. 
Alternatively, you can use [a free cluster](https://qdrant.tech/documentation/cloud/create-cluster/#create-a-cluster) in Qdrant Cloud for experiments. ```python from qdrant_client import QdrantClient, models client = QdrantClient(":memory:") # Qdrant is running from RAM. ``` Let’s create a [collection](https://qdrant.tech/documentation/concepts/collections/) with our movie data. ```python client.create_collection( collection_name="movies", vectors_config={ "embedding": models.VectorParams( size=client.get_embedding_size("sentence-transformers/all-MiniLM-L6-v2"), distance=models.Distance.COSINE ) } ) ``` And upload the embeddings to it. ```python client.upload_points( collection_name="movies", points=[\ models.PointStruct(\ id=idx,\ payload={"description": description},\ vector={"embedding": vector}\ )\ for idx, (description, vector) in enumerate(\ zip(descriptions, descriptions_embeddings)\ )\ ], ) ``` Upload with implicit embeddings computation ```python client.upload_points( collection_name="movies", points=[\ models.PointStruct(\ id=idx,\ payload={"description": description},\ vector={"embedding": models.Document(text=description, model=encoder_name)},\ )\ for idx, description in enumerate(descriptions)\ ], ) ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#first-stage-retrieval) First-stage retrieval Let’s see how relevant the results will be using only an `all-MiniLM-L6-v2`-based dense retriever. ```python query = "A story about a strong historically significant female figure." query_embedded = list(dense_embedding_model.query_embed(query))[0] initial_retrieval = client.query_points( collection_name="movies", using="embedding", query=query_embedded, with_payload=True, limit=10 ) description_hits = [] for i, hit in enumerate(initial_retrieval.points): print(f'Result number {i+1} is \"{hit.payload["description"]}\"') description_hits.append(hit.payload["description"]) ``` Query points with implicit embeddings computation ```python query = "A story about a strong historically significant female figure." initial_retrieval = client.query_points( collection_name="movies", using="embedding", query=models.Document(text=query, model=encoder_name), with_payload=True, limit=10 ) ``` The result is as follows: ```bash Result number 1 is "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent." Result number 2 is "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household." ... Result number 9 is "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre." Result number 10 is "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions." ``` We can see that the description of _“The Messenger: The Story of Joan of Arc”_, which is the most fitting, appears 10th in the results. Let’s try refining the order of the retrieved subset with `Jina Reranker v2`. It takes a query and a set of documents (movie descriptions) as input and calculates a relevance score based on token-level interactions between the query and each document. 
```python new_scores = list( reranker.rerank(query, description_hits) ) # returns scores between query and each document ranking = [\ (i, score) for i, score in enumerate(new_scores)\ ] # saving document indices ranking.sort( key=lambda x: x[1], reverse=True ) # sorting them in order of relevance defined by reranker for i, rank in enumerate(ranking): print(f'''Reranked result number {i+1} is \"{description_hits[rank[0]]}\"''') ``` The reranker moves the desired movie to the first position based on relevance. ```bash Reranked result number 1 is "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions." Reranked result number 2 is "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household." ... Reranked result number 9 is "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop." Reranked result number 10 is "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre." ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/\#conclusion) Conclusion Rerankers refine search results by reordering retrieved candidates through deeper semantic analysis. For efficiency, they should be applied **only to a small subset of retrieved results**. Balance speed and accuracy in search by leveraging the power of rerankers! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/fastembed/fastembed-rerankers.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/fastembed/fastembed-rerankers.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-21-lllmstxt|> ## late-interaction-models - [Articles](https://qdrant.tech/articles/) - Any\* Embedding Model Can Become a Late Interaction Model... If You Give It a Chance! [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # Any\* Embedding Model Can Become a Late Interaction Model... If You Give It a Chance! Kacper Łukawski · August 14, 2024 ![Any* Embedding Model Can Become a Late Interaction Model... If You Give It a Chance!](https://qdrant.tech/articles_data/late-interaction-models/preview/title.jpg) \\* At least any open-source model, since you need access to its internals. ## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#you-can-adapt-dense-embedding-models-for-late-interaction) You Can Adapt Dense Embedding Models for Late Interaction Qdrant 1.10 introduced support for multi-vector representations, with late interaction being a prominent example of this model. In essence, both documents and queries are represented by multiple vectors, and identifying the most relevant documents involves calculating a score based on the similarity between the corresponding query and document embeddings. 
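To make this scoring concrete, here is a minimal NumPy sketch of a MaxSim-style late interaction score (illustrative only; the shapes and random data are made up, and when multivectors are stored in Qdrant with the `MAX_SIM` comparator the engine performs this computation for you):

```python
import numpy as np

def max_sim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """For every query token embedding, take its best cosine similarity
    against all document token embeddings, then sum those maxima."""
    # Normalize rows so that the dot product equals cosine similarity
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    similarity = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(similarity.max(axis=1).sum())

# Toy example: 4 query tokens and 12 document tokens, 128-dimensional each
score = max_sim(np.random.rand(4, 128), np.random.rand(12, 128))
```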
If you’re not familiar with this paradigm, our updated [Hybrid Search](https://qdrant.tech/articles/hybrid-search/) article explains how multi-vector representations can enhance retrieval quality. **Figure 1:** We can visualize late interaction between corresponding document-query embedding pairs. ![Late interaction model](https://qdrant.tech/articles_data/late-interaction-models/late-interaction.png) There are many specialized late interaction models, such as [ColBERT](https://qdrant.tech/documentation/fastembed/fastembed-colbert/), but **it appears that regular dense embedding models can also be effectively utilized in this manner**. > In this study, we will demonstrate that standard dense embedding models, traditionally used for single-vector representations, can be effectively adapted for late interaction scenarios using output token embeddings as multi-vector representations. By testing out retrieval with Qdrant’s multi-vector feature, we will show that these models can rival or surpass specialized late interaction models in retrieval performance, while offering lower complexity and greater efficiency. This work redefines the potential of dense models in advanced search pipelines, presenting a new method for optimizing retrieval systems. ## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#understanding-embedding-models) Understanding Embedding Models The inner workings of embedding models might be surprising to some. The model doesn’t operate directly on the input text; instead, it requires a tokenization step to convert the text into a sequence of token identifiers. Each token identifier is then passed through an embedding layer, which transforms it into a dense vector. Essentially, the embedding layer acts as a lookup table that maps token identifiers to dense vectors. These vectors are then fed into the transformer model as input. **Figure 2:** The tokenization step, which takes place before vectors are added to the transformer model. ![Input token embeddings](https://qdrant.tech/articles_data/late-interaction-models/input-embeddings.png) The input token embeddings are context-free and are learned during the model’s training process. This means that each token always receives the same embedding, regardless of its position in the text. At this stage, the token embeddings are unaware of the context in which they appear. It is the transformer model’s role to contextualize these embeddings. Much has been discussed about the role of attention in transformer models, but in essence, this mechanism is responsible for capturing cross-token relationships. Each transformer module takes a sequence of token embeddings as input and produces a sequence of output token embeddings. Both sequences are of the same length, with each token embedding being enriched by information from the other token embeddings at the current step. **Figure 3:** The mechanism that produces a sequence of output token embeddings. ![Output token embeddings](https://qdrant.tech/articles_data/late-interaction-models/output-embeddings.png) **Figure 4:** The final step performed by the embedding model is pooling the output token embeddings to generate a single vector representation of the input text. ![Pooling](https://qdrant.tech/articles_data/late-interaction-models/pooling.png) There are several pooling strategies, but regardless of which one a model uses, the output is always a single vector representation, which inevitably loses some information about the input. 
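To make the pipeline in Figures 2-4 more tangible, here is a small sketch using the `sentence-transformers` library (a simplified illustration, not the exact code behind the experiments discussed below) showing that the same model can give you either the per-token output embeddings or the pooled single vector:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
text = "Qdrant supports multi-vector representations."

# Contextualized output token embeddings: one 384-dimensional vector per token
token_embeddings = model.encode(text, output_value="token_embeddings")

# Default behaviour: the token embeddings are pooled into a single 384-dimensional vector
sentence_embedding = model.encode(text)

# Mean pooling is one common strategy; the real model also respects the attention mask
pooled = token_embeddings.mean(0)
```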
It’s akin to giving someone detailed, step-by-step directions to the nearest grocery store versus simply pointing in the general direction. While the vague direction might suffice in some cases, the detailed instructions are more likely to lead to the desired outcome.

## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#using-output-token-embeddings-for-multi-vector-representations) Using Output Token Embeddings for Multi-Vector Representations

We often overlook the output token embeddings, but the fact is—they also serve as multi-vector representations of the input text. So, why not explore their use in a multi-vector retrieval model, similar to late interaction models?

### [Anchor](https://qdrant.tech/articles/late-interaction-models/\#experimental-findings) Experimental Findings

We conducted several experiments to determine whether output token embeddings could be effectively used in place of traditional late interaction models. The results are quite promising.

| Dataset | Model | Experiment | NDCG@10 |
| --- | --- | --- | --- |
| SciFact | `prithivida/Splade_PP_en_v1` | sparse vectors | 0.70928 |
| | `colbert-ir/colbertv2.0` | late interaction model | 0.69579 |
| | `all-MiniLM-L6-v2` | single dense vector representation | 0.64508 |
| | | output token embeddings | 0.70724 |
| | `BAAI/bge-small-en` | single dense vector representation | 0.68213 |
| | | output token embeddings | 0.73696 |
| NFCorpus | `prithivida/Splade_PP_en_v1` | sparse vectors | 0.34166 |
| | `colbert-ir/colbertv2.0` | late interaction model | 0.35036 |
| | `all-MiniLM-L6-v2` | single dense vector representation | 0.31594 |
| | | output token embeddings | 0.35779 |
| | `BAAI/bge-small-en` | single dense vector representation | 0.29696 |
| | | output token embeddings | 0.37502 |
| ArguAna | `prithivida/Splade_PP_en_v1` | sparse vectors | 0.47271 |
| | `colbert-ir/colbertv2.0` | late interaction model | 0.44534 |
| | `all-MiniLM-L6-v2` | single dense vector representation | 0.50167 |
| | | output token embeddings | 0.45997 |
| | `BAAI/bge-small-en` | single dense vector representation | 0.58857 |
| | | output token embeddings | 0.57648 |

The [source code for these experiments is open-source](https://github.com/kacperlukawski/beir-qdrant/blob/main/examples/retrieval/search/evaluate_all_exact.py) and utilizes [`beir-qdrant`](https://github.com/kacperlukawski/beir-qdrant), an integration of Qdrant with the [BeIR library](https://github.com/beir-cellar/beir). While this package is not officially maintained by the Qdrant team, it may prove useful for those interested in experimenting with various Qdrant configurations to see how they impact retrieval quality. All experiments were conducted using Qdrant in exact search mode, ensuring the results are not influenced by approximate search.

Even the simple `all-MiniLM-L6-v2` model can be applied in a late interaction model fashion, resulting in a positive impact on retrieval quality. However, the best results were achieved with the `BAAI/bge-small-en` model, which outperformed both sparse and late interaction models.

It’s important to note that ColBERT has not been trained on BeIR datasets, making its performance fully out of domain. Nevertheless, the `all-MiniLM-L6-v2` [training dataset](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) also lacks any BeIR data, yet it still performs remarkably well.

## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#comparative-analysis-of-dense-vs-late-interaction-models) Comparative Analysis of Dense vs. Late Interaction Models
The retrieval quality speaks for itself, but there are other important factors to consider.

The traditional dense embedding models we tested are less complex than late interaction or sparse models. With fewer parameters, these models are expected to be faster during inference and more cost-effective to maintain. Below is a comparison of the models used in the experiments:

| Model | Number of parameters |
| --- | --- |
| `prithivida/Splade_PP_en_v1` | 109,514,298 |
| `colbert-ir/colbertv2.0` | 109,580,544 |
| `BAAI/bge-small-en` | 33,360,000 |
| `all-MiniLM-L6-v2` | 22,713,216 |

One argument against using output token embeddings is the increased storage requirements compared to ColBERT-like models. For instance, the `all-MiniLM-L6-v2` model produces 384-dimensional output token embeddings, which is three times more than the 128-dimensional embeddings generated by ColBERT-like models. This increase not only leads to higher memory usage but also impacts the computational cost of retrieval, as calculating distances takes more time. Mitigating this issue through vector compression would make a lot of sense.

## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#exploring-quantization-for-multi-vector-representations) Exploring Quantization for Multi-Vector Representations

Binary quantization is generally more effective for high-dimensional vectors, making the `all-MiniLM-L6-v2` model, with its relatively low-dimensional outputs, less ideal for this approach. However, scalar quantization appeared to be a viable alternative. The table below summarizes the impact of quantization on retrieval quality.

| Dataset | Model | Experiment | NDCG@10 |
| --- | --- | --- | --- |
| SciFact | `all-MiniLM-L6-v2` | output token embeddings | 0.70724 |
| | | output token embeddings (uint8) | 0.70297 |
| NFCorpus | `all-MiniLM-L6-v2` | output token embeddings | 0.35779 |
| | | output token embeddings (uint8) | 0.35572 |

It’s important to note that quantization doesn’t always preserve retrieval quality at the same level, but in this case, scalar quantization appears to have minimal impact on retrieval performance. The effect is negligible, while the memory savings are substantial. We managed to maintain the original quality while using four times less memory. Additionally, a quantized vector requires 384 bytes, compared to ColBERT’s 512 bytes. This results in a 25% reduction in memory usage, with retrieval quality remaining nearly unchanged.

## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#practical-application-enhancing-retrieval-with-dense-models) Practical Application: Enhancing Retrieval with Dense Models

If you’re using one of the sentence transformer models, the output token embeddings are calculated by default. While a single vector representation is more efficient in terms of storage and computation, there’s no need to discard the output token embeddings. According to our experiments, these embeddings can significantly enhance retrieval quality. You can store both the single vector and the output token embeddings in Qdrant, using the single vector for the initial retrieval step and then reranking the results with the output token embeddings.

**Figure 5:** A single model pipeline that relies solely on the output token embeddings for reranking.

![Single model reranking](https://qdrant.tech/articles_data/late-interaction-models/single-model-reranking.png)

To demonstrate this concept, we implemented a simple reranking pipeline in Qdrant.
This pipeline uses a dense embedding model for the initial oversampled retrieval and then relies solely on the output token embeddings for the reranking step.

### [Anchor](https://qdrant.tech/articles/late-interaction-models/\#single-model-retrieval-and-reranking-benchmarks) Single Model Retrieval and Reranking Benchmarks

Our tests focused on using the same model for both retrieval and reranking. The reported metric is NDCG@10. In all tests, we applied an oversampling factor of 5x, meaning the retrieval step returned 50 results, which were then narrowed down to 10 during the reranking step. Below are the results for some of the BeIR datasets:

| Dataset | `all-MiniLM-L6-v2` dense only | `all-MiniLM-L6-v2` dense + reranking | `BAAI/bge-small-en` dense only | `BAAI/bge-small-en` dense + reranking |
| --- | --- | --- | --- | --- |
| SciFact | 0.64508 | 0.70293 | 0.68213 | 0.73053 |
| NFCorpus | 0.31594 | 0.34297 | 0.29696 | 0.35996 |
| ArguAna | 0.50167 | 0.45378 | 0.58857 | 0.57302 |
| Touche-2020 | 0.16904 | 0.19693 | 0.13055 | 0.19821 |
| TREC-COVID | 0.47246 | 0.6379 | 0.45788 | 0.53539 |
| FiQA-2018 | 0.36867 | 0.41587 | 0.31091 | 0.39067 |

The source code for the benchmark is publicly available, and [you can find it in the repository of the `beir-qdrant` package](https://github.com/kacperlukawski/beir-qdrant/blob/main/examples/retrieval/search/evaluate_reranking.py).

Overall, adding a reranking step using the same model typically improves retrieval quality. However, the quality of various late interaction models is [often reported based on their reranking performance when BM25 is used for the initial retrieval](https://huggingface.co/mixedbread-ai/mxbai-colbert-large-v1#1-reranking-performance). This experiment aimed to demonstrate how a single model can be effectively used for both retrieval and reranking, and the results are quite promising. Now, let’s explore how to implement this using the new Query API introduced in Qdrant 1.10.

## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#setting-up-qdrant-for-late-interaction) Setting Up Qdrant for Late Interaction

The new Query API in Qdrant 1.10 enables the construction of even more complex retrieval pipelines. We can use the single vector created after pooling for the initial retrieval step and then rerank the results using the output token embeddings.

Assuming the collection is named `my-collection` and is configured to store two named vectors: `dense-vector` and `output-token-embeddings`, here’s how such a collection could be created in Qdrant:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("http://localhost:6333")

client.create_collection(
    collection_name="my-collection",
    vectors_config={
        "dense-vector": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE,
        ),
        "output-token-embeddings": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    }
)
```

Both vectors are of the same size since they are produced by the same `all-MiniLM-L6-v2` model.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
```

Now, instead of using the search API with just a single dense vector, we can create a reranking pipeline. First, we retrieve 50 results using the dense vector, and then we rerank them using the output token embeddings to obtain the top 10 results.
```python query = "What else can be done with just all-MiniLM-L6-v2 model?" client.query_points( collection_name="my-collection", prefetch=[\ # Prefetch the dense embeddings of the top-50 documents\ models.Prefetch(\ query=model.encode(query).tolist(),\ using="dense-vector",\ limit=50,\ )\ ], # Rerank the top-50 documents retrieved by the dense embedding model # and return just the top-10. Please note we call the same model, but # we ask for the token embeddings by setting the output_value parameter. query=model.encode(query, output_value="token_embeddings").tolist(), using="output-token-embeddings", limit=10, ) ``` ## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#try-the-experiment-yourself) Try the Experiment Yourself In a real-world scenario, you might take it a step further by first calculating the token embeddings and then performing pooling to obtain the single vector representation. This approach allows you to complete everything in a single pass. The simplest way to start experimenting with building complex reranking pipelines in Qdrant is by using the forever-free cluster on [Qdrant Cloud](https://cloud.qdrant.io/) and reading [Qdrant’s documentation](https://qdrant.tech/documentation/). The [source code for these experiments is open-source](https://github.com/kacperlukawski/beir-qdrant/blob/main/examples/retrieval/search/evaluate_all_exact.py) and uses [`beir-qdrant`](https://github.com/kacperlukawski/beir-qdrant), an integration of Qdrant with the [BeIR library](https://github.com/beir-cellar/beir). ## [Anchor](https://qdrant.tech/articles/late-interaction-models/\#future-directions-and-research-opportunities) Future Directions and Research Opportunities The initial experiments using output token embeddings in the retrieval process have yielded promising results. However, we plan to conduct further benchmarks to validate these findings and explore the incorporation of sparse methods for the initial retrieval. Additionally, we aim to investigate the impact of quantization on multi-vector representations and its effects on retrieval quality. Finally, we will assess retrieval speed, a crucial factor for many applications. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/late-interaction-models.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/late-interaction-models.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-22-lllmstxt|> ## huggingface-datasets - [Documentation](https://qdrant.tech/documentation/) - [Database tutorials](https://qdrant.tech/documentation/database-tutorials/) - Load a HuggingFace Dataset --- # [Anchor](https://qdrant.tech/documentation/database-tutorials/huggingface-datasets/\#load-and-search-hugging-face-datasets-with-qdrant) Load and Search Hugging Face Datasets with Qdrant [Hugging Face](https://huggingface.co/) provides a platform for sharing and using ML models and datasets. 
[Qdrant](https://huggingface.co/Qdrant) also publishes datasets along with the embeddings that you can use to practice with Qdrant and build your applications based on semantic search. **Please [let us know](https://qdrant.to/discord) if you’d like to see a specific dataset!** ## [Anchor](https://qdrant.tech/documentation/database-tutorials/huggingface-datasets/\#arxiv-titles-instructorxl-embeddings) arxiv-titles-instructorxl-embeddings [This dataset](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings) contains embeddings generated from the paper titles only. Each vector has a payload with the title used to create it, along with the DOI (Digital Object Identifier). ```json { "title": "Nash Social Welfare for Indivisible Items under Separable, Piecewise-Linear Concave Utilities", "DOI": "1612.05191" } ``` You can find a detailed description of the dataset in the [Practice Datasets](https://qdrant.tech/documentation/datasets/#journal-article-titles) section. If you prefer loading the dataset from a Qdrant snapshot, it also linked there. Loading the dataset is as simple as using the `load_dataset` function from the `datasets` library: ```python from datasets import load_dataset dataset = load_dataset("Qdrant/arxiv-titles-instructorxl-embeddings") ``` The dataset contains 2,250,000 vectors. This is how you can check the list of the features in the dataset: ```python dataset.features ``` ### [Anchor](https://qdrant.tech/documentation/database-tutorials/huggingface-datasets/\#streaming-the-dataset) Streaming the dataset Dataset streaming lets you work with a dataset without downloading it. The data is streamed as you iterate over the dataset. You can read more about it in the [Hugging Face\\ documentation](https://huggingface.co/docs/datasets/stream). ```python from datasets import load_dataset dataset = load_dataset( "Qdrant/arxiv-titles-instructorxl-embeddings", split="train", streaming=True ) ``` ### [Anchor](https://qdrant.tech/documentation/database-tutorials/huggingface-datasets/\#loading-the-dataset-into-qdrant) Loading the dataset into Qdrant You can load the dataset into Qdrant using the [Python SDK](https://github.com/qdrant/qdrant-client). The embeddings are already precomputed, so you can store them in a collection, that we’re going to create in a second: ```python from qdrant_client import QdrantClient, models client = QdrantClient("http://localhost:6333") client.create_collection( collection_name="arxiv-titles-instructorxl-embeddings", vectors_config=models.VectorParams( size=768, distance=models.Distance.COSINE, ), ) ``` It is always a good idea to use batching, while loading a large dataset, so let’s do that. We are going to need a helper function to split the dataset into batches: ```python from itertools import islice def batched(iterable, n): iterator = iter(iterable) while batch := list(islice(iterator, n)): yield batch ``` If you are a happy user of Python 3.12+, you can use the [`batched` function from the `itertools`](https://docs.python.org/3/library/itertools.html#itertools.batched) package instead. 
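For example, on Python 3.12+ the helper above can be swapped for the built-in (a small sketch; note that `itertools.batched` yields tuples instead of lists):

```python
from itertools import batched  # available since Python 3.12

for batch in batched(dataset, 100):
    # `batch` is a tuple of up to 100 dataset records
    ...
```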
No matter what Python version you are using, you can use the `upsert` method to load the dataset, batch by batch, into Qdrant: ```python batch_size = 100 for batch in batched(dataset, batch_size): ids = [point.pop("id") for point in batch] vectors = [point.pop("vector") for point in batch] client.upsert( collection_name="arxiv-titles-instructorxl-embeddings", points=models.Batch( ids=ids, vectors=vectors, payloads=batch, ), ) ``` Your collection is ready to be used for search! Please [let us know using Discord](https://qdrant.to/discord) if you would like to see more datasets published on Hugging Face hub. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/database-tutorials/huggingface-datasets.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/database-tutorials/huggingface-datasets.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-23-lllmstxt|> ## reranking-hybrid-search - [Documentation](https://qdrant.tech/documentation/) - [Advanced tutorials](https://qdrant.tech/documentation/advanced-tutorials/) - Reranking in Hybrid Search --- # [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#reranking-hybrid-search-results-with-qdrant-vector-database) Reranking Hybrid Search Results with Qdrant Vector Database Hybrid search combines dense and sparse retrieval to deliver precise and comprehensive results. By adding reranking with ColBERT, you can further refine search outputs for maximum relevance. In this guide, we’ll show you how to implement hybrid search with reranking in Qdrant, leveraging dense, sparse, and late interaction embeddings to create an efficient, high-accuracy search system. Let’s get started! ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#overview) Overview Let’s start by breaking down the architecture: ![image3.png](https://qdrant.tech/documentation/examples/reranking-hybrid-search/image3.png) Processing Dense, Sparse, and Late Interaction Embeddings in Vector Databases (VDB) ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#ingestion-stage) Ingestion Stage Here’s how we’re going to set up the advanced hybrid search. The process is similar to what we did earlier but with a few powerful additions: 1. **Documents**: Just like before, we start with the raw input—our set of documents that need to be indexed for search. 2. **Dense Embeddings**: We’ll generate dense embeddings for each document, just like in the basic search. These embeddings capture the deeper, semantic meanings behind the text. 3. **Sparse Embeddings**: This is where it gets interesting. Alongside dense embeddings, we’ll create sparse embeddings using more traditional, keyword-based methods. Specifically, we’ll use BM25, a probabilistic retrieval model. BM25 ranks documents based on how relevant their terms are to a given query, taking into account how often terms appear, document length, and how common the term is across all documents. 
It’s perfect for keyword-heavy searches. 4. **Late Interaction Embeddings**: Now, we add the magic of ColBERT. ColBERT uses a two-stage approach. First, it generates contextualized embeddings for both queries and documents using BERT, and then it performs late interaction—matching those embeddings efficiently using a dot product to fine-tune relevance. This step allows for deeper, contextual understanding, making sure you get the most precise results. 5. **Vector Database**: All of these embeddings—dense, sparse, and late interaction—are stored in a vector database like Qdrant. This allows you to efficiently search, retrieve, and rerank your documents based on multiple layers of relevance. ![image2.png](https://qdrant.tech/documentation/examples/reranking-hybrid-search/image2.png) Query Retrieval and Reranking Process in Search Systems ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#retrieval-stage) Retrieval Stage Now, let’s talk about how we’re going to pull the best results once the user submits a query: 1. **User’s Query**: The user enters a query, and that query is transformed into multiple types of embeddings. We’re talking about representations that capture both the deeper meaning (dense) and specific keywords (sparse). 2. **Embeddings**: The query gets converted into various embeddings—some for understanding the semantics (dense embeddings) and others for focusing on keyword matches (sparse embeddings). 3. **Hybrid Search**: Our hybrid search uses both dense and sparse embeddings to find the most relevant documents. The dense embeddings ensure we capture the overall meaning of the query, while sparse embeddings make sure we don’t miss out on those key, important terms. 4. **Rerank**: Once we’ve got a set of documents, the final step is reranking. This is where late interaction embeddings come into play, giving you results that are not only relevant but tuned to your query by prioritizing the documents that truly meet the user’s intent. ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#implementation) Implementation Let’s see it in action in this section. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#additional-setup) Additional Setup This time around, we’re using FastEmbed—a lightweight Python library designed for generating embeddings, and it supports popular text models right out of the box. First things first, you’ll need to install it: ```python pip install fastembed ``` * * * Here are the models we’ll be pulling from FastEmbed: ```python from fastembed import TextEmbedding, LateInteractionTextEmbedding, SparseTextEmbedding ``` * * * ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#ingestion) Ingestion As before, we’ll convert our documents into embeddings, but thanks to FastEmbed, the process is even more straightforward because all the models you need are conveniently available in one location. 
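Note that the snippets below reuse a `documents` list and, later, a `query` string that this tutorial carries over from the earlier basic search example it builds on, together with an existing Qdrant `client`. If you are starting from scratch, a small stand-in such as the following is enough (the texts are taken from the ranking table shown later on this page; the query string and connection settings are illustrative assumptions):

```python
from qdrant_client import QdrantClient

# Use a local instance, or QdrantClient(":memory:") for a quick experiment
client = QdrantClient(url="http://localhost:6333")

documents = [
    "In machine learning, feature scaling is the process of normalizing the range of independent variables or features. The goal is to ensure that all features contribute equally to the model, especially in algorithms like SVM or k-nearest neighbors where distance calculations matter.",
    "Feature scaling is commonly used in data preprocessing to ensure that features are on the same scale. This is particularly important for gradient descent-based algorithms where features with larger scales could disproportionately impact the cost function.",
    "Unsupervised learning algorithms, such as clustering methods, may benefit from feature scaling, which ensures that features with larger numerical ranges don't dominate the learning process.",
    "Data preprocessing steps, including feature scaling, can significantly impact the performance of machine learning models, making it a crucial part of the modeling pipeline.",
]

query = "Why is feature scaling important for distance-based models?"  # illustrative query
```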
### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#embeddings) Embeddings First, let’s load the models we need: ```python dense_embedding_model = TextEmbedding("sentence-transformers/all-MiniLM-L6-v2") bm25_embedding_model = SparseTextEmbedding("Qdrant/bm25") late_interaction_embedding_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0") ``` * * * Now, let’s convert our documents into embeddings: ```python dense_embeddings = list(dense_embedding_model.embed(doc for doc in documents)) bm25_embeddings = list(bm25_embedding_model.embed(doc for doc in documents)) late_interaction_embeddings = list(late_interaction_embedding_model.embed(doc for doc in documents)) ``` * * * Since we’re dealing with multiple types of embeddings (dense, sparse, and late interaction), we’ll need to store them in a collection that supports a multi-vector setup. The previous collection we created won’t work here, so we’ll create a new one designed specifically for handling these different types of embeddings. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#create-collection) Create Collection Now, we’re setting up a new collection in Qdrant for our hybrid search with the right configurations to handle all the different vector types we’re working with. Here’s how you do it: ```python from qdrant_client.models import Distance, VectorParams, models client.create_collection( "hybrid-search", vectors_config={ "all-MiniLM-L6-v2": models.VectorParams( size=len(dense_embeddings[0]), distance=models.Distance.COSINE, ), "colbertv2.0": models.VectorParams( size=len(late_interaction_embeddings[0][0]), distance=models.Distance.COSINE, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM, ), hnsw_config=models.HnswConfigDiff(m=0) # Disable HNSW for reranking ), }, sparse_vectors_config={ "bm25": models.SparseVectorParams(modifier=models.Modifier.IDF ) } ) ``` * * * What’s happening here? We’re creating a collection called “hybrid-search”, and we’re configuring it to handle: - **Dense embeddings** from the model all-MiniLM-L6-v2 using cosine distance for comparisons. - **Late interaction embeddings** from colbertv2.0, also using cosine distance, but with a multivector configuration to use the maximum similarity comparator. Note that we set `m=0` in the `colbertv2.0` vector to prevent indexing since it’s not needed for reranking. - **Sparse embeddings** from BM25 for keyword-based searches. They use `dot_product` for similarity calculation. This setup ensures that all the different types of vectors are stored and compared correctly for your hybrid search. 
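If you are curious what a BM25 sparse embedding actually contains, you can inspect one before uploading anything (an optional check; the exact indices and weights depend on the tokenizer and the text):

```python
# A sparse embedding is a set of token-hash indices with their BM25 weights
sparse_example = bm25_embeddings[0].as_object()
print(sparse_example["indices"][:5])
print(sparse_example["values"][:5])
```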
### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#upsert-data) Upsert Data Next, we need to insert the documents along with their multiple embeddings into the **hybrid-search** collection: ```python from qdrant_client.models import PointStruct points = [] for idx, (dense_embedding, bm25_embedding, late_interaction_embedding, doc) in enumerate(zip(dense_embeddings, bm25_embeddings, late_interaction_embeddings, documents)): point = PointStruct( id=idx, vector={ "all-MiniLM-L6-v2": dense_embedding, "bm25": bm25_embedding.as_object(), "colbertv2.0": late_interaction_embedding, }, payload={"document": doc} ) points.append(point) operation_info = client.upsert( collection_name="hybrid-search", points=points ) ``` Upload with implicit embeddings computation ```python from qdrant_client.models import PointStruct points = [] for idx, doc in enumerate(documents): point = PointStruct( id=idx, vector={ "all-MiniLM-L6-v2": models.Document(text=doc, model="sentence-transformers/all-MiniLM-L6-v2"), "bm25": models.Document(text=doc, model="Qdrant/bm25"), "colbertv2.0": models.Document(text=doc, model="colbert-ir/colbertv2.0"), }, payload={"document": doc} ) points.append(point) operation_info = client.upsert( collection_name="hybrid-search", points=points ) ``` * * * This code pulls everything together by creating a list of **PointStruct** objects, each containing the embeddings and corresponding documents. For each document, it adds: - **Dense embeddings** for the deep, semantic meaning. - **BM25 embeddings** for powerful keyword-based search. - **ColBERT embeddings** for precise contextual interactions. Once that’s done, the points are uploaded into our **“hybrid-search”** collection using the upsert method, ensuring everything’s in place. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#retrieval) Retrieval For retrieval, it’s time to convert the user’s query into the required embeddings. Here’s how you can do it: ```python dense_vectors = next(dense_embedding_model.query_embed(query)) sparse_vectors = next(bm25_embedding_model.query_embed(query)) late_vectors = next(late_interaction_embedding_model.query_embed(query)) ``` * * * The real magic of hybrid search lies in the **prefetch** parameter. This lets you run multiple sub-queries in one go, combining the power of dense and sparse embeddings. Here’s how to set it up, after which we execute the hybrid search: ```python prefetch = [\ models.Prefetch(\ query=dense_vectors,\ using="all-MiniLM-L6-v2",\ limit=20,\ ),\ models.Prefetch(\ query=models.SparseVector(**sparse_vectors.as_object()),\ using="bm25",\ limit=20,\ ),\ ] ``` * * * This code kicks off a hybrid search by running two sub-queries: - One using dense embeddings from “all-MiniLM-L6-v2” to capture the semantic meaning of the query. - The other using sparse embeddings from BM25 for strong keyword matching. Each sub-query is limited to 20 results. These sub-queries are bundled together using the prefetch parameter, allowing them to run in parallel. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#rerank) Rerank Now that we’ve got our initial hybrid search results, it’s time to rerank them using late interaction embeddings for maximum precision. 
Here’s how you can do it:

```python
results = client.query_points(
    "hybrid-search",
    prefetch=prefetch,
    query=late_vectors,
    using="colbertv2.0",
    with_payload=True,
    limit=10,
)
```

Query points with implicit embeddings computation

```python
prefetch = [
    models.Prefetch(
        query=models.Document(text=query, model="sentence-transformers/all-MiniLM-L6-v2"),
        using="all-MiniLM-L6-v2",
        limit=20,
    ),
    models.Prefetch(
        query=models.Document(text=query, model="Qdrant/bm25"),
        using="bm25",
        limit=20,
    ),
]

results = client.query_points(
    "hybrid-search",
    prefetch=prefetch,
    query=models.Document(text=query, model="colbert-ir/colbertv2.0"),
    using="colbertv2.0",
    with_payload=True,
    limit=10,
)
```

* * *

Let’s look at how the positions change after applying reranking. Notice how some documents shift in rank based on their relevance according to the late interaction embeddings.

| **Document** | **First Query Rank** | **Second Query Rank** | **Rank Change** |
| --- | --- | --- | --- |
| In machine learning, feature scaling is the process of normalizing the range of independent variables or features. The goal is to ensure that all features contribute equally to the model, especially in algorithms like SVM or k-nearest neighbors where distance calculations matter. | 1 | 1 | No Change |
| Feature scaling is commonly used in data preprocessing to ensure that features are on the same scale. This is particularly important for gradient descent-based algorithms where features with larger scales could disproportionately impact the cost function. | 2 | 6 | Moved Down |
| Unsupervised learning algorithms, such as clustering methods, may benefit from feature scaling, which ensures that features with larger numerical ranges don’t dominate the learning process. | 3 | 4 | Moved Down |
| Data preprocessing steps, including feature scaling, can significantly impact the performance of machine learning models, making it a crucial part of the modeling pipeline. | 5 | 2 | Moved Up |

Great! We’ve now explored how reranking works and successfully implemented it.

## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#best-practices-in-reranking) Best Practices in Reranking

Reranking can dramatically improve the relevance of search results, especially when combined with hybrid search. Here are some best practices to keep in mind:

- **Implement Hybrid Reranking**: Blend keyword-based (sparse) and vector-based (dense) search results for a more comprehensive ranking system.
- **Continuous Testing and Monitoring**: Regularly evaluate your reranking models to avoid overfitting and make timely adjustments to maintain performance.
- **Balance Relevance and Latency**: Reranking can be computationally expensive, so aim for a balance between relevance and speed. First retrieve the relevant documents, then apply reranking to that smaller subset.

## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/\#conclusion) Conclusion

Reranking is a powerful tool that boosts the relevance of search results, especially when combined with hybrid search methods. While it can add some latency due to its complexity, applying it to a smaller, pre-filtered subset of results ensures both speed and relevance. Qdrant offers an easy-to-use API to get started with your own search engine, so if you’re ready to dive in, sign up for free at [Qdrant Cloud](https://qdrant.tech/) and start building.
<|page-24-lllmstxt|>

## code-search

- [Documentation](https://qdrant.tech/documentation/)
- [Advanced tutorials](https://qdrant.tech/documentation/advanced-tutorials/)
- Search Through Your Codebase

---

# [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#navigate-your-codebase-with-semantic-search-and-qdrant) Navigate Your Codebase with Semantic Search and Qdrant

| Time: 45 min | Level: Intermediate | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qdrant/examples/blob/master/code-search/code-search.ipynb) | |
| --- | --- | --- | --- |

You too can enrich your applications with Qdrant semantic search. In this tutorial, we describe how you can use Qdrant to navigate a codebase and find relevant code snippets. As an example, we will use the [Qdrant](https://github.com/qdrant/qdrant) source code itself, which is mostly written in Rust.

## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#the-approach) The approach

We want to search codebases using natural semantic queries and to search for code based on similar logic. You can set up these tasks with embeddings:

1. A general-purpose neural encoder for Natural Language Processing (NLP), in our case `sentence-transformers/all-MiniLM-L6-v2`.
2. Specialized embeddings for code-to-code similarity search. We use the `jina-embeddings-v2-base-code` model.

To prepare our code for `all-MiniLM-L6-v2`, we preprocess the code to text that more closely resembles natural language. The Jina embeddings model supports a variety of standard programming languages, so there is no need to preprocess the snippets. We can use the code as is.

NLP-based search is based on function signatures, but code search may return smaller pieces, such as loops. So, if we receive a particular function signature from the NLP model and part of its implementation from the code model, we merge the results and highlight the overlap.

## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#data-preparation) Data preparation

Chunking the application sources into smaller parts is a non-trivial task. In general, functions, class methods, structs, enums, and all the other language-specific constructs are good candidates for chunks. They are big enough to contain some meaningful information, but small enough to be processed by embedding models with a limited context window. You can also use docstrings, comments, and other metadata to enrich the chunks with additional information.
![Code chunking strategy](https://qdrant.tech/documentation/tutorials/code-search/data-chunking.png) ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#parsing-the-codebase) Parsing the codebase While our example uses Rust, you can use our approach with any other language. You can parse code with a [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) ( **LSP**) compatible tool. You can use an LSP to build a graph of the codebase, and then extract chunks. We did our work with the [rust-analyzer](https://rust-analyzer.github.io/). We exported the parsed codebase into the [LSIF](https://microsoft.github.io/language-server-protocol/specifications/lsif/0.4.0/specification/) format, a standard for code intelligence data. Next, we used the LSIF data to navigate the codebase and extract the chunks. For details, see our [code search\\ demo](https://github.com/qdrant/demo-code-search). We then exported the chunks into JSON documents with not only the code itself, but also context with the location of the code in the project. For example, see the description of the `await_ready_for_timeout` function from the `IsReady` struct in the `common` module: ```json { "name":"await_ready_for_timeout", "signature":"fn await_ready_for_timeout (& self , timeout : Duration) -> bool", "code_type":"Function", "docstring":"= \" Return `true` if ready, `false` if timed out.\"", "line":44, "line_from":43, "line_to":51, "context":{ "module":"common", "file_path":"lib/collection/src/common/is_ready.rs", "file_name":"is_ready.rs", "struct_name":"IsReady", "snippet":" /// Return `true` if ready, `false` if timed out.\n pub fn await_ready_for_timeout(&self, timeout: Duration) -> bool {\n let mut is_ready = self.value.lock();\n if !*is_ready {\n !self.condvar.wait_for(&mut is_ready, timeout).timed_out()\n } else {\n true\n }\n }\n" } } ``` You can examine the Qdrant structures, parsed in JSON, in the [`structures.jsonl`\\ file](https://storage.googleapis.com/tutorial-attachments/code-search/structures.jsonl) in our Google Cloud Storage bucket. Download it and use it as a source of data for our code search. ```shell wget https://storage.googleapis.com/tutorial-attachments/code-search/structures.jsonl ``` Next, load the file and parse the lines into a list of dictionaries: ```python import json structures = [] with open("structures.jsonl", "r") as fp: for i, row in enumerate(fp): entry = json.loads(row) structures.append(entry) ``` ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#code-to-natural-language-conversion) Code to _natural language_ conversion Each programming language has its own syntax which is not a part of the natural language. Thus, a general-purpose model probably does not understand the code as is. We can, however, normalize the data by removing code specifics and including additional context, such as module, class, function, and file name. We took the following steps: 1. Extract the signature of the function, method, or other code construct. 2. Divide camel case and snake case names into separate words. 3. Take the docstring, comments, and other important metadata. 4. Build a sentence from the extracted data using a predefined template. 5. Remove the special characters and replace them with spaces. As input, expect dictionaries with the same structure. Define a `textify` function to do the conversion. We’ll use an `inflection` library to convert with different naming conventions. 
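To get a feel for what these two helpers do before wiring them into `textify`, here is a tiny sketch (it assumes `inflection` is installed, as shown right below; the identifier is made up, but the output matches the naming transformations described above):

```python
import inflection

name = "awaitReadyForTimeout"
print(inflection.underscore(name))                       # await_ready_for_timeout
print(inflection.humanize(inflection.underscore(name)))  # Await ready for timeout
```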
```shell pip install inflection ``` Once all dependencies are installed, we define the `textify` function: ```python import inflection import re from typing import Dict, Any def textify(chunk: Dict[str, Any]) -> str: # Get rid of all the camel case / snake case # - inflection.underscore changes the camel case to snake case # - inflection.humanize converts the snake case to human readable form name = inflection.humanize(inflection.underscore(chunk["name"])) signature = inflection.humanize(inflection.underscore(chunk["signature"])) # Check if docstring is provided docstring = "" if chunk["docstring"]: docstring = f"that does {chunk['docstring']} " # Extract the location of that snippet of code context = ( f"module {chunk['context']['module']} " f"file {chunk['context']['file_name']}" ) if chunk["context"]["struct_name"]: struct_name = inflection.humanize( inflection.underscore(chunk["context"]["struct_name"]) ) context = f"defined in struct {struct_name} {context}" # Combine all the bits and pieces together text_representation = ( f"{chunk['code_type']} {name} " f"{docstring}" f"defined as {signature} " f"{context}" ) # Remove any special characters and concatenate the tokens tokens = re.split(r"\W", text_representation) tokens = filter(lambda x: x, tokens) return " ".join(tokens) ``` Now we can use `textify` to convert all chunks into text representations: ```python text_representations = list(map(textify, structures)) ``` This is how the `await_ready_for_timeout` function description appears: ```text Function Await ready for timeout that does Return true if ready false if timed out defined as Fn await ready for timeout self timeout duration bool defined in struct Is ready module common file is_ready rs ``` ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#ingestion-pipeline) Ingestion pipeline Next, we’ll build a pipeline for vectorizing the data and set up a semantic search mechanism for both embedding models. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#building-qdrant-collection) Building Qdrant collection We use the `qdrant-client` library with the `fastembed` extra to interact with the Qdrant server and generate vector embeddings locally. Let’s install it: ```shell pip install "qdrant-client[fastembed]" ``` Of course, we need a running Qdrant server for vector search. If you need one, you can [use a local Docker container](https://qdrant.tech/documentation/quick-start/) or deploy it using the [Qdrant Cloud](https://cloud.qdrant.io/). You can use either to follow this tutorial. Configure the connection parameters: ```python QDRANT_URL = "https://my-cluster.cloud.qdrant.io:6333" # http://localhost:6333 for local instance QDRANT_API_KEY = "THIS_IS_YOUR_API_KEY" # None for local instance ``` Then use the library to create a collection: ```python from qdrant_client import QdrantClient, models client = QdrantClient(QDRANT_URL, api_key=QDRANT_API_KEY) client.create_collection( "qdrant-sources", vectors_config={ "text": models.VectorParams( size=client.get_embedding_size( model_name="sentence-transformers/all-MiniLM-L6-v2" ), distance=models.Distance.COSINE, ), "code": models.VectorParams( size=client.get_embedding_size( model_name="jinaai/jina-embeddings-v2-base-code" ), distance=models.Distance.COSINE, ), }, ) ``` Our newly created collection is ready to accept the data. 
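If you want to verify the setup before ingesting anything, you can fetch the collection info and check that both named vectors are registered. This is an optional sanity check; the collection name matches the one created above:

```python
info = client.get_collection("qdrant-sources")

# Expect two named vectors, "text" and "code", both using cosine distance
print(info.config.params.vectors)
```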
Let’s upload the embeddings:

```python
import uuid

# Extract the code snippets from the structures to a separate list
code_snippets = [
    structure["context"]["snippet"] for structure in structures
]

points = [
    models.PointStruct(
        id=uuid.uuid4().hex,
        vector={
            "text": models.Document(
                text=text, model="sentence-transformers/all-MiniLM-L6-v2"
            ),
            "code": models.Document(
                text=code, model="jinaai/jina-embeddings-v2-base-code"
            ),
        },
        payload=structure,
    )
    for text, code, structure in zip(text_representations, code_snippets, structures)
]

# Note: This might take a while since inference happens implicitly.
# Too many parallel processes may trigger swap memory and hurt performance.
client.upload_points("qdrant-sources", points=points, batch_size=64)
```

Internally, `qdrant-client` uses [FastEmbed](https://github.com/qdrant/fastembed) to implicitly convert our documents into their vector representations. The uploaded points are immediately available for search. Next, query the collection to find relevant code snippets.

## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#querying-the-codebase) Querying the codebase

We use one of the models to search the collection. Start with text embeddings and run the following query: “_How do I count points in a collection?_”.

```python
query = "How do I count points in a collection?"

hits = client.query_points(
    "qdrant-sources",
    query=models.Document(text=query, model="sentence-transformers/all-MiniLM-L6-v2"),
    using="text",
    limit=5,
).points
```

Now, review the results. The following table lists the module, the file name, and the score. Each line includes a link to the signature, as a code block from the file.

| module | file\_name | score | signature |
| --- | --- | --- | --- |
| toc | point\_ops.rs | 0.59448624 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub async fn count`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/storage/src/content_manager/toc/point_ops.rs#L120) |
| operations | types.rs | 0.5493385 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub struct CountRequestInternal`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/collection/src/operations/types.rs#L831) |
| collection\_manager | segments\_updater.rs | 0.5121002 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub(crate) fn upsert_points<'a, T>`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/collection/src/collection_manager/segments_updater.rs#L339) |
| collection | point\_ops.rs | 0.5063539 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub async fn count`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/collection/src/collection/point_ops.rs#L213) |
| map\_index | mod.rs | 0.49973983 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn get_points_with_value_count`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/segment/src/index/field_index/map_index/mod.rs#L88) |

It seems we were able to find some relevant code structures.
Let’s try the same with the code embeddings:

```python
hits = client.query_points(
    "qdrant-sources",
    query=models.Document(text=query, model="jinaai/jina-embeddings-v2-base-code"),
    using="code",
    limit=5,
).points
```

Output:

| module | file\_name | score | signature |
| --- | --- | --- | --- |
| field\_index | geo\_index.rs | 0.73278356 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/segment/src/index/field_index/geo_index.rs#L612) |
| numeric\_index | mod.rs | 0.7254976 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/numeric_index/mod.rs#L322) |
| map\_index | mod.rs | 0.7124739 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/map_index/mod.rs#L315) |
| map\_index | mod.rs | 0.7124739 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/map_index/mod.rs#L429) |
| fixtures | payload\_context\_fixture.rs | 0.706204 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn total_point_count`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/fixtures/payload_context_fixture.rs#L122) |

While the scores retrieved by the two models are not directly comparable, we can see that the results differ. Code and text embeddings capture different aspects of the codebase. We can query the collection with both models and then combine the results to get the most relevant code snippets, all from a single batch request.
```python responses = client.query_batch_points( collection_name="qdrant-sources", requests=[\ models.QueryRequest(\ query=models.Document(\ text=query, model="sentence-transformers/all-MiniLM-L6-v2"\ ),\ using="text",\ with_payload=True,\ limit=5,\ ),\ models.QueryRequest(\ query=models.Document(\ text=query, model="jinaai/jina-embeddings-v2-base-code"\ ),\ using="code",\ with_payload=True,\ limit=5,\ ),\ ], ) results = [response.points for response in responses] ``` Output: | module | file\_name | score | signature | | --- | --- | --- | --- | | toc | point\_ops.rs | 0.59448624 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub async fn count`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/storage/src/content_manager/toc/point_ops.rs#L120) | | operations | types.rs | 0.5493385 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub struct CountRequestInternal`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/collection/src/operations/types.rs#L831) | | collection\_manager | segments\_updater.rs | 0.5121002 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub(crate) fn upsert_points<'a, T>`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/collection/src/collection_manager/segments_updater.rs#L339) | | collection | point\_ops.rs | 0.5063539 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`pub async fn count`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/collection/src/collection/point_ops.rs#L213) | | map\_index | mod.rs | 0.49973983 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn get_points_with_value_count`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/segment/src/index/field_index/map_index/mod.rs#L88) | | field\_index | geo\_index.rs | 0.73278356 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/segment/src/index/field_index/geo_index.rs#L612) | | numeric\_index | mod.rs | 0.7254976 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/numeric_index/mod.rs#L322) | | map\_index | mod.rs | 0.7124739 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/map_index/mod.rs#L315) | | map\_index | mod.rs | 0.7124739 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/map_index/mod.rs#L429) | | fixtures | payload\_context\_fixture.rs | 0.706204 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn total_point_count`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/fixtures/payload_context_fixture.rs#L122) | This is one example of how you can use different models and combine the results. 
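A simple way to combine the two result lists is to interleave them and deduplicate by point ID. The sketch below only illustrates that idea (the `merge_results` helper is not part of the client); since scores from different models live on different scales, it relies on the rank within each list rather than on the raw scores:

```python
def merge_results(text_hits, code_hits, limit=10):
    """Interleave hits from both models, keeping the first occurrence of each point."""
    merged, seen = [], set()
    for text_hit, code_hit in zip(text_hits, code_hits):
        for hit in (text_hit, code_hit):
            if hit.id not in seen:
                seen.add(hit.id)
                merged.append(hit)
    return merged[:limit]

text_hits, code_hits = results  # the two lists from the batch request above
for hit in merge_results(text_hits, code_hits):
    print(hit.payload["context"]["module"], hit.payload["name"], hit.score)
```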
In a real-world scenario, you might run some reranking and deduplication, as well as additional processing of the results. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#code-search-demo) Code search demo Our [Code search demo](https://code-search.qdrant.tech/) uses the following process: 1. The user sends a query. 2. Both models vectorize that query simultaneously. We get two different vectors. 3. Both vectors are used in parallel to find relevant snippets. We expect 5 examples from the NLP search and 20 examples from the code search. 4. Once we retrieve results for both vectors, we merge them in one of the following scenarios: 1. If both methods return different results, we prefer the results from the general usage model (NLP). 2. If there is an overlap between the search results, we merge overlapping snippets. In the screenshot, we search for `flush of wal`. The result shows relevant code, merged from both models. Note the highlighted code in lines 621-629. It’s where both models agree. ![Results from both models, with overlap](https://qdrant.tech/documentation/tutorials/code-search/code-search-demo-example.png) Now you see semantic code intelligence, in action. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#grouping-the-results) Grouping the results You can improve the search results, by grouping them by payload properties. In our case, we can group the results by the module. If we use code embeddings, we can see multiple results from the `map_index` module. Let’s group the results and assume a single result per module: ```python results = client.query_points_groups( collection_name="qdrant-sources", using="code", query=models.Document(text=query, model="jinaai/jina-embeddings-v2-base-code"), group_by="context.module", limit=5, group_size=1, ) ``` Output: | module | file\_name | score | signature | | --- | --- | --- | --- | | field\_index | geo\_index.rs | 0.73278356 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/7aa164bd2dda1c0fc9bf3a0da42e656c95c2e52a/lib/segment/src/index/field_index/geo_index.rs#L612) | | numeric\_index | mod.rs | 0.7254976 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/numeric_index/mod.rs#L322) | | map\_index | mod.rs | 0.7124739 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn count_indexed_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/field_index/map_index/mod.rs#L315) | | fixtures | payload\_context\_fixture.rs | 0.706204 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn total_point_count`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/fixtures/payload_context_fixture.rs#L122) | | hnsw\_index | graph\_links.rs | 0.6998417 | [![](https://qdrant.tech/documentation/tutorials/code-search/github-mark.png)`fn num_points`](https://github.com/qdrant/qdrant/blob/3fbe1cae6cb7f51a0c5bb4b45cfe6749ac76ed59/lib/segment/src/index/hnsw_index/graph_links.rs#L477) | With the grouping feature, we get more diverse results. 
## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/code-search/\#summary) Summary

This tutorial demonstrates how to use Qdrant to navigate a codebase. For an end-to-end implementation, review the [code search notebook](https://colab.research.google.com/github/qdrant/examples/blob/master/code-search/code-search.ipynb) and the [code-search-demo](https://github.com/qdrant/demo-code-search). You can also check out [a running version of the code search demo](https://code-search.qdrant.tech/), which exposes the Qdrant codebase for search with a web interface.

<|page-25-lllmstxt|> ## product-quantization

---

# Product Quantization in Vector Search \| Qdrant

Kacper Łukawski · May 30, 2023

![Product Quantization in Vector Search | Qdrant](https://qdrant.tech/articles_data/product-quantization/preview/title.jpg)

---

# [Anchor](https://qdrant.tech/articles/product-quantization/\#product-quantization-demystified-streamlining-efficiency-in-data-management) Product Quantization Demystified: Streamlining Efficiency in Data Management

Qdrant 1.1.0 introduced support for [Scalar Quantization](https://qdrant.tech/articles/scalar-quantization/), a technique that reduces the memory footprint by up to four times by using `int8` to represent values that would normally be stored as `float32`. The memory usage in [vector search](https://qdrant.tech/solutions/) can be reduced even further! Please welcome **Product Quantization**, a brand-new feature of Qdrant 1.2.0!

## [Anchor](https://qdrant.tech/articles/product-quantization/\#what-is-product-quantization) What is Product Quantization?

Like every other quantization method, Product Quantization converts floating-point numbers into integers. However, the process is slightly more complicated than [Scalar Quantization](https://qdrant.tech/articles/scalar-quantization/) and is more customizable, so you can find the sweet spot between memory usage and search precision. This article covers all the steps required to perform Product Quantization and the way it is implemented in Qdrant.

## [Anchor](https://qdrant.tech/articles/product-quantization/\#how-does-product-quantization-work) How Does Product Quantization Work?

Let’s assume we have a few vectors being added to the collection and that our optimizer decided to start creating a new segment.
![A list of raw vectors](https://qdrant.tech/articles_data/product-quantization/raw-vectors.png) ### [Anchor](https://qdrant.tech/articles/product-quantization/\#cutting-the-vector-into-pieces) Cutting the vector into pieces First of all, our vectors are going to be divided into **chunks** aka **subvectors**. The number of chunks is configurable, but as a rule of thumb - the lower it is, the higher the compression rate. That also comes with reduced search precision, but in some cases, you may prefer to keep the memory usage as low as possible. ![A list of chunked vectors](https://qdrant.tech/articles_data/product-quantization/chunked-vectors.png) Qdrant API allows choosing the compression ratio from 4x up to 64x. In our example, we selected 16x, so each subvector will consist of 4 floats (16 bytes), and it will eventually be represented by a single byte. ### [Anchor](https://qdrant.tech/articles/product-quantization/\#clustering) Clustering The chunks of our vectors are then used as input for clustering. Qdrant uses the K-means algorithm, with K=256. It was selected a priori, as this is the maximum number of values a single byte represents. As a result, we receive a list of 256 centroids for each chunk and assign each of them a unique id. **The clustering is done separately for each group of chunks.** ![Clustered chunks of vectors](https://qdrant.tech/articles_data/product-quantization/chunks-clustering.png) Each chunk of a vector might now be mapped to the closest centroid. That’s where we lose the precision, as a single point will only represent a whole subspace. Instead of using a subvector, we can store the id of the closest centroid. If we repeat that for each chunk, we can approximate the original embedding as a vector of subsequent ids of the centroids. The dimensionality of the created vector is equal to the number of chunks, in our case 2. ![A new vector built from the ids of the centroids](https://qdrant.tech/articles_data/product-quantization/vector-of-ids.png) ### [Anchor](https://qdrant.tech/articles/product-quantization/\#full-process) Full process All those steps build the following pipeline of Product Quantization: ![Full process of Product Quantization](https://qdrant.tech/articles_data/product-quantization/full-process.png) ## [Anchor](https://qdrant.tech/articles/product-quantization/\#measuring-the-distance) Measuring the distance Vector search relies on the distances between the points. Enabling Product Quantization slightly changes the way it has to be calculated. The query vector is divided into chunks, and then we figure the overall distance as a sum of distances between the subvectors and the centroids assigned to the specific id of the vector we compare to. We know the coordinates of the centroids, so that’s easy. ![Calculating the distance of between the query and the stored vector](https://qdrant.tech/articles_data/product-quantization/distance-calculation.png) #### [Anchor](https://qdrant.tech/articles/product-quantization/\#qdrant-implementation) Qdrant implementation Search operation requires calculating the distance to multiple points. Since we calculate the distance to a finite set of centroids, those might be precomputed and reused. Qdrant creates a lookup table for each query, so it can then simply sum up several terms to measure the distance between a query and all the centroids. 
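To make the whole encode-and-lookup pipeline more tangible, here is a toy NumPy sketch of the idea described above. It is an illustration only, not Qdrant's actual implementation: Qdrant runs real K-means, works with its configured distance metric, and stores the codes inside its own storage engine.

```python
import numpy as np

rng = np.random.default_rng(42)

dim, n_chunks = 8, 2              # 2 chunks of 4 floats each, as in the example above
sub_dim = dim // n_chunks
n_centroids = 256                 # a centroid id fits into a single byte

vectors = rng.normal(size=(1000, dim)).astype(np.float32)

# "Codebooks": one set of centroids per chunk. A real implementation runs K-means
# here; picking random existing chunks is enough for a toy illustration.
codebooks = np.stack([
    vectors[rng.choice(len(vectors), n_centroids, replace=False),
            i * sub_dim:(i + 1) * sub_dim]
    for i in range(n_chunks)
])                                # shape: (n_chunks, n_centroids, sub_dim)

def encode(vecs):
    """Replace every chunk of every vector with the id of its closest centroid."""
    codes = np.empty((len(vecs), n_chunks), dtype=np.uint8)
    for i in range(n_chunks):
        chunk = vecs[:, i * sub_dim:(i + 1) * sub_dim]
        dists = ((chunk[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(axis=-1)
        codes[:, i] = dists.argmin(axis=1)
    return codes

codes = encode(vectors)           # (1000, 2) uint8 -- 2 bytes per vector instead of 32

# Query time: precompute a lookup table of squared distances from each query chunk
# to every centroid, then score a stored vector by summing one lookup per chunk.
query = rng.normal(size=dim).astype(np.float32)
lookup = np.stack([
    ((query[i * sub_dim:(i + 1) * sub_dim] - codebooks[i]) ** 2).sum(axis=1)
    for i in range(n_chunks)
])                                # shape: (n_chunks, n_centroids)

approx_dists = lookup[np.arange(n_chunks), codes].sum(axis=1)
print(approx_dists.argsort()[:5]) # ids of the (approximately) 5 closest vectors
```

Summing per-chunk squared distances gives the squared Euclidean distance to the reconstructed vector, which is all that is needed for ranking.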
A lookup table for one query might look like this:

| | Centroid 0 | Centroid 1 | … |
| --- | --- | --- | --- |
| **Chunk 0** | 0.14213 | 0.51242 | |
| **Chunk 1** | 0.08421 | 0.00142 | |
| **…** | … | … | … |

## [Anchor](https://qdrant.tech/articles/product-quantization/\#product-quantization-benchmarks) Product Quantization Benchmarks

Product Quantization comes with a cost: there are additional operations to perform, so search performance might be reduced. However, memory usage can be reduced drastically as well. As usual, we ran some benchmarks to give you a brief understanding of what you may expect. Again, we reused the same pipeline as in [the other benchmarks we published](https://qdrant.tech/benchmarks/). We selected the [Arxiv-titles-384-angular-no-filters](https://github.com/qdrant/ann-filtering-benchmark-datasets) and [Glove-100](https://github.com/erikbern/ann-benchmarks/) datasets to measure the impact of Product Quantization on precision and time. Both experiments were launched with EF=128. The results are summarized in the tables:

#### [Anchor](https://qdrant.tech/articles/product-quantization/\#glove-100) Glove-100

| | Original | 1D clusters | 2D clusters | 3D clusters |
| --- | --- | --- | --- | --- |
| Mean precision | 0.7158 | 0.7143 | 0.6731 | 0.5854 |
| Mean search time | 2336 µs | 2750 µs | 2597 µs | 2534 µs |
| Compression | x1 | x4 | x8 | x12 |
| Upload & indexing time | 147 s | 339 s | 217 s | 178 s |

Product Quantization increases both indexing and searching time. The higher the compression ratio, the lower the search precision. The main benefit is undoubtedly the reduced usage of memory.

#### [Anchor](https://qdrant.tech/articles/product-quantization/\#arxiv-titles-384-angular-no-filters) Arxiv-titles-384-angular-no-filters

| | Original | 1D clusters | 2D clusters | 4D clusters | 8D clusters |
| --- | --- | --- | --- | --- | --- |
| Mean precision | 0.9837 | 0.9677 | 0.9143 | 0.8068 | 0.6618 |
| Mean search time | 2719 µs | 4134 µs | 2947 µs | 2175 µs | 2053 µs |
| Compression | x1 | x4 | x8 | x16 | x32 |
| Upload & indexing time | 332 s | 921 s | 597 s | 481 s | 474 s |

It turns out that in some cases, Product Quantization may reduce not only the memory usage, but also the search time.

## [Anchor](https://qdrant.tech/articles/product-quantization/\#product-quantization-vs-scalar-quantization) Product Quantization vs Scalar Quantization

Compared to [Scalar Quantization](https://qdrant.tech/articles/scalar-quantization/), Product Quantization offers a higher compression rate. However, this comes with considerable trade-offs in accuracy and, at times, in-RAM search speed. Product Quantization tends to be favored in certain specific scenarios:

- Deployment in a low-RAM environment where the limiting factor is the number of disk reads rather than the vector comparison itself
- Situations where the dimensionality of the original vectors is sufficiently high
- Cases where indexing speed is not a critical factor

In circumstances that do not align with the above, Scalar Quantization should be the preferred choice.

## [Anchor](https://qdrant.tech/articles/product-quantization/\#using-qdrant-for-product-quantization) Using Qdrant for Product Quantization

If you’re already a Qdrant user, we have documentation on [Product Quantization](https://qdrant.tech/documentation/guides/quantization/#setting-up-product-quantization) that will help you set up and configure the new quantization for your data and achieve up to 64x memory reduction. Ready to experience the power of Product Quantization?
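If you want to try it right away, here is a minimal sketch of enabling Product Quantization when creating a collection with the Python client. The collection name, vector size, and compression ratio below are placeholders; see the quantization guide linked above for the full set of options:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # or your Qdrant Cloud URL + API key

client.create_collection(
    collection_name="pq-example",                    # placeholder name
    vectors_config=models.VectorParams(
        size=384,
        distance=models.Distance.COSINE,
    ),
    quantization_config=models.ProductQuantization(
        product=models.ProductQuantizationConfig(
            compression=models.CompressionRatio.X16, # ratios from x4 up to x64 are available
            always_ram=True,                         # keep quantized vectors in RAM
        )
    ),
)
```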
[Sign up now](https://cloud.qdrant.io/signup) for a free Qdrant demo and optimize your data management today! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/product-quantization.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/product-quantization.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-26-lllmstxt|> ## platforms - [Documentation](https://qdrant.tech/documentation/) - Platforms ## [Anchor](https://qdrant.tech/documentation/platforms/\#platform-integrations) Platform Integrations | Platform | Description | | --- | --- | | [Apify](https://qdrant.tech/documentation/platforms/apify/) | Platform to build web scrapers and automate web browser tasks. | | [Bubble](https://qdrant.tech/documentation/platforms/bubble/) | Development platform for application development with a no-code interface | | [BuildShip](https://qdrant.tech/documentation/platforms/buildship/) | Low-code visual builder to create APIs, scheduled jobs, and backend workflows. | | [DocsGPT](https://qdrant.tech/documentation/platforms/docsgpt/) | Tool for ingesting documentation sources and enabling conversations and queries. | | [Keboola](https://qdrant.tech/documentation/platforms/keboola/) | Data operations platform that unifies data sources, transformations, and ML deployments. | | [Kotaemon](https://qdrant.tech/documentation/platforms/kotaemon/) | Open-source & customizable RAG UI for chatting with your documents. | | [Make](https://qdrant.tech/documentation/platforms/make/) | Cloud platform to build low-code workflows by integrating various software applications. | | [Mulesoft Anypoint](https://qdrant.tech/documentation/platforms/mulesoft/) | Integration platform to connect applications, data, and devices across environments. | | [N8N](https://qdrant.tech/documentation/platforms/n8n/) | Platform for node-based, low-code workflow automation. | | [Pipedream](https://qdrant.tech/documentation/platforms/pipedream/) | Platform for connecting apps and developing event-driven automation. | | [Portable.io](https://qdrant.tech/documentation/platforms/portable/) | Cloud platform for developing and deploying ELT transformations. | | [PrivateGPT](https://qdrant.tech/documentation/platforms/privategpt/) | Tool to ask questions about your documents using local LLMs emphasising privacy. | | [Rivet](https://qdrant.tech/documentation/platforms/rivet/) | A visual programming environment for building AI agents with LLMs. | | [ToolJet](https://qdrant.tech/documentation/platforms/tooljet/) | A low-code platform for business apps that connect to DBs, cloud storages and more. | | [Vectorize](https://qdrant.tech/documentation/platforms/vectorize/) | Platform to automate data extraction, RAG evaluation, deploy RAG pipelines. | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/platforms/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/platforms/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-27-lllmstxt|> ## qdrant-cluster-management - [Documentation](https://qdrant.tech/documentation/) - [Private cloud](https://qdrant.tech/documentation/private-cloud/) - Managing a Cluster --- # [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#managing-a-qdrant-cluster) Managing a Qdrant Cluster The most minimal QdrantCluster configuration is: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840 labels: cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" customer-id: "acme-industries" spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.11.3" size: 1 resources: cpu: 100m memory: "1Gi" storage: "2Gi" ``` The `id` should be unique across all Qdrant clusters in the same namespace, the `name` must follow the above pattern and the `cluster-id` and `customer-id` labels are mandatory. There are lots more configuration options to configure scheduling, security, networking, and more. For full details see the [Qdrant Private Cloud API Reference](https://qdrant.tech/documentation/private-cloud/api-reference/). ## [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#scaling-a-cluster) Scaling a Cluster To scale a cluster, update the CPU, memory and storage resources in the QdrantCluster spec. The Qdrant operator will automatically adjust the cluster configuration. This operation is highly available on a multi-node cluster with replicated collections. ## [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#upgrading-the-qdrant-version) Upgrading the Qdrant version To upgrade the Qdrant version of a database cluster, update the `version` field in the QdrantCluster spec. The Qdrant operator will automatically upgrade the cluster to the new version. The upgrade process is highly available on a multi-node cluster with replicated collections. Note, that you should not skip minor versions when upgrading. For example, if you are running version `v1.11.3`, you can upgrade to `v1.11.5` or `v1.12.6`, but not directly to `v1.13.0`. ## [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#exposing-a-cluster) Exposing a Cluster By default, a QdrantCluster will be exposed through an internal `ClusterIP` service. To expose the cluster to the outside world, you can create a `NodePort` service, a `LoadBalancer` service or an `Ingress` resource. 
This is an example on how to create a QdrantCluster with a `LoadBalancer` service: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840 labels: cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" customer-id: "acme-industries" spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.11.3" size: 1 resources: cpu: 100m memory: "1Gi" storage: "2Gi" service: type: LoadBalancer annotations: service.beta.kubernetes.io/aws-load-balancer-type: nlb ``` Especially if you create a LoadBalancer Service, you may need to provide annotations for the loadbalancer configration. Please refer to the documention of your cloud provider for more details. Examples: - [AWS EKS LoadBalancer annotations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/) - [Azure AKS Public LoadBalancer annotations](https://learn.microsoft.com/en-us/azure/aks/load-balancer-standard) - [Azure AKS Internal LoadBalancer annotations](https://learn.microsoft.com/en-us/azure/aks/internal-lb) - [GCP GKE LoadBalancer annotations](https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer-parameters) ## [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#authentication-and-authorization) Authentication and Authorization Authentication information is provided by Kubernetes secrets. One way to create a secret is with kubectl: ```shell kubectl create secret generic qdrant-api-key --from-literal=api-key=your-secret-api-key --from-literal=read-only-api-key=your-secret-read-only-api-key --namespace qdrant-private-cloud ``` The resulting secret will look like this: ```yaml apiVersion: v1 data: api-key: ... read-only-api-key: ... kind: Secret metadata: name: qdrant-api-key namespace: qdrant-private-cloud type: kubernetes.io/generic ``` You can reference the secret in the QdrantCluster spec: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840 labels: cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" customer-id: "acme-industries" spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.11.3" size: 1 resources: cpu: 100m memory: "1Gi" storage: "2Gi" config: service: api_key: secretKeyRef: name: qdrant-api-key key: api-key read_only_api_key: secretKeyRef: name: qdrant-api-key key: read-only-api-key jwt_rbac: true ``` If you set the `jwt_rbac` flag, you will also be able to create granular [JWT tokens for role based access control](https://qdrant.tech/documentation/guides/security/#granular-access-control-with-jwt). ### [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#configuring-tls-for-database-access) Configuring TLS for Database Access If you want to configure TLS for accessing your Qdrant database, there are two options: - You can offload TLS at the ingress or loadbalancer level. - You can configure TLS directly in the Qdrant database. If you want to configure TLS directly in the Qdrant database, you can provide this as a secret. To create such a secret, you can use `kubectl`: ```shell kubectl create secret tls qdrant-tls --cert=mydomain.com.crt --key=mydomain.com.key --namespace the-qdrant-namespace ``` The resulting secret will look like this: ```yaml apiVersion: v1 data: tls.crt: ... tls.key: ... 
kind: Secret metadata: name: qdrant-tls namespace: the-qdrant-namespace type: kubernetes.io/tls ``` You can reference the secret in the QdrantCluster spec: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: test-cluster spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.11.3" size: 1 resources: cpu: 100m memory: "1Gi" storage: "2Gi" config: service: enable_tls: true tls: cert: secretKeyRef: name: qdrant-tls key: tls.crt key: secretKeyRef: name: qdrant-tls key: tls.key ``` ### [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#configuring-tls-for-inter-cluster-communication) Configuring TLS for Inter-cluster Communication _Available as of Operator v2.2.0_ If you want to encrypt communication between Qdrant nodes, you need to enable TLS by providing certificate, key, and root CA certificate used for generating the former. Similar to the instruction stated in the previous section, you need to create a secret: ```shell kubectl create secret generic qdrant-p2p-tls \ --from-file=tls.crt=qdrant-nodes.crt \ --from-file=tls.key=qdrant-nodes.key \ --from-file=ca.crt=root-ca.crt --namespace the-qdrant-namespace ``` The resulting secret will look like this: ```yaml apiVersion: v1 data: tls.crt: ... tls.key: ... ca.crt: ... kind: Secret metadata: name: qdrant-p2p-tls namespace: the-qdrant-namespace type: Opaque ``` You can reference the secret in the QdrantCluster spec: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: test-cluster labels: cluster-id: "my-cluster" customer-id: "acme-industries" spec: id: "my-cluster" version: "v1.13.3" size: 2 resources: cpu: 100m memory: "1Gi" storage: "2Gi" config: service: enable_tls: true tls: caCert: secretKeyRef: name: qdrant-p2p-tls key: ca.crt cert: secretKeyRef: name: qdrant-p2p-tls key: tls.crt key: secretKeyRef: name: qdrant-p2p-tls key: tls.key ``` ## [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#gpu-support) GPU support Starting with Qdrant 1.13 and private-cloud version 1.6.1 you can create a cluster that uses GPUs to accelarate indexing. As a prerequisite, you need to have a Kubernetes cluster with GPU support. You can check the [Kubernetes documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/) for generic information on GPUs and Kubernetes, or the documentation of your specific Kubernetes distribution. 
Examples: - [AWS EKS GPU support](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/amazon-eks.html) - [Azure AKS GPU support](https://docs.microsoft.com/en-us/azure/aks/gpu-cluster) - [GCP GKE GPU support](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus) - [Vultr Kubernetes GPU support](https://blogs.vultr.com/whats-new-vultr-q2-2023) Once you have a Kubernetes cluster with GPU support, you can create a QdrantCluster with GPU support: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840 labels: cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" customer-id: "acme-industries" spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.13.4" size: 1 resources: cpu: 2 memory: "8Gi" storage: "40Gi" gpu: gpuType: "nvidia" ``` Once the cluster Pod has started, you can check in the logs if the GPU is detected: ```shell $ kubectl logs qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840-0 Starting initializing for pod 0 _ _ __ _ __| |_ __ __ _ _ __ | |_ / _` |/ _` | '__/ _` | '_ \| __| | (_| | (_| | | | (_| | | | | |_ \__, |\__,_|_| \__,_|_| |_|\__| |_| Version: 1.13.4, build: 7abc6843 Access web UI at http://localhost:6333/dashboard 2025-03-14T10:25:30.509636Z INFO gpu::instance: Found GPU device: NVIDIA A16-2Q 2025-03-14T10:25:30.509679Z INFO gpu::instance: Found GPU device: llvmpipe (LLVM 15.0.7, 256 bits) 2025-03-14T10:25:30.509734Z INFO gpu::device: Create GPU device NVIDIA A16-2Q ... ``` For more GPU configuration options, see the [Qdrant Private Cloud API Reference](https://qdrant.tech/documentation/private-cloud/api-reference/). ## [Anchor](https://qdrant.tech/documentation/private-cloud/qdrant-cluster-management/\#ephemeral-snapshot-volumes) Ephemeral Snapshot Volumes If you do not [create snapshots](https://api.qdrant.tech/api-reference/snapshots/create-snapshot), or there is no need to keep them available after cluster restart, the snapshot storage classname can be set to `emptyDir`: ```yaml apiVersion: qdrant.io/v1 kind: QdrantCluster metadata: name: qdrant-a7d8d973-0cc5-42de-8d7b-c29d14d24840 labels: cluster-id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" customer-id: "acme-industries" spec: id: "a7d8d973-0cc5-42de-8d7b-c29d14d24840" version: "v1.13.4" size: 1 resources: cpu: 2 memory: "8Gi" storage: "40Gi" storageClassNames: snapshots: emptyDir ``` See [Kubernetes docs on emptyDir volumes](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) for more details, on how k8s node ephemeral storage is allocated and used. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/qdrant-cluster-management.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
<|page-28-lllmstxt|> ## cloud-account-setup

---

# [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#setting-up-a-qdrant-cloud-account) Setting up a Qdrant Cloud Account

## [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#registration) Registration

There are different ways to register for a Qdrant Cloud account:

- With an email address and passwordless login via email
- With a Google account
- With a GitHub account
- By connecting an enterprise SSO solution

Every account is tied to an email address. You can invite additional users to your account and manage their permissions.

### [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#email-registration) Email registration

1. Register for a [Cloud account](https://cloud.qdrant.io/signup) with your email, Google or GitHub credentials.

## [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#inviting-additional-users-to-an-account) Inviting additional users to an account

You can invite additional users to your account, and manage their permissions, on the **Account -> Access Management** page in the Qdrant Cloud Console.

![Invitations](https://qdrant.tech/documentation/cloud/invitations.png)

Invited users will receive an email with an invitation link to join Qdrant Cloud. Once they have signed up, they can accept the invitation from the Overview page.

![Accepting invitation](https://qdrant.tech/documentation/cloud/accept-invitation.png)

## [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#switching-between-accounts) Switching between accounts

If you have access to multiple accounts, you can switch between them with the account switcher on the top menu bar of the Qdrant Cloud Console.

![Switching between accounts](https://qdrant.tech/documentation/cloud/account-switcher.png)

## [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#light--dark-mode) Light & Dark Mode

The Qdrant Cloud Console supports light and dark mode. You can switch between the two modes in the _Settings_ menu, by clicking on your account picture in the top right corner.

![Light & Dark Mode](https://qdrant.tech/documentation/cloud/light-dark-mode.png)

## [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#account-settings) Account settings

You can configure your account settings in the Qdrant Cloud Console on the **Account -> Settings** page. The following functionality is available.

### [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#renaming-an-account) Renaming an account

If you use multiple accounts for different purposes, it is a good idea to give them descriptive names, for example _Development_, _Production_, _Testing_. You can also choose which account should be the default one when you log in.

![Account management](https://qdrant.tech/documentation/cloud/account-management.png)

### [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#deleting-an-account) Deleting an account

When you delete an account, all database clusters and associated data will be deleted.
![Delete Account](https://qdrant.tech/documentation/cloud/account-delete.png) ## [Anchor](https://qdrant.tech/documentation/cloud-account-setup/\#enterprise-single-sign-on-sso) Enterprise Single-Sign-On (SSO) Qdrant Cloud supports Enterprise Single-Sign-On for Premium Tier customers. The following providers are supported: - Active Directory/LDAP - ADFS - Azure Active Directory Native - Google Workspace - OpenID Connect - Okta - PingFederate - SAML - Azure Active Directory Enterprise Sign-On is available as an add-on for [Premium Tier](https://qdrant.tech/documentation/cloud/premium/) customers. If you are interested in using SSO, please [contact us](https://qdrant.tech/contact-us/). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-account-setup.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-account-setup.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-29-lllmstxt|> ## rag-contract-management-stackit-aleph-alpha - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Region-Specific Contract Management System --- # [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#region-specific-contract-management-system) Region-Specific Contract Management System | Time: 90 min | Level: Advanced | | | | --- | --- | --- | --- | Contract management benefits greatly from Retrieval Augmented Generation (RAG), streamlining the handling of lengthy business contract texts. With AI assistance, complex questions can be asked and well-informed answers generated, facilitating efficient document management. This proves invaluable for businesses with extensive relationships, like shipping companies, construction firms, and consulting practices. Access to such contracts is often restricted to authorized team members due to security and regulatory requirements, such as GDPR in Europe, necessitating secure storage practices. Companies want their data to be kept and processed within specific geographical boundaries. For that reason, this RAG-centric tutorial focuses on dealing with a region-specific cloud provider. You will set up a contract management system using [Aleph Alpha’s](https://aleph-alpha.com/) embeddings and LLM. You will host everything on [STACKIT](https://www.stackit.de/), a German business cloud provider. On this platform, you will run Qdrant Hybrid Cloud as well as the rest of your RAG application. This setup will ensure that your data is stored and processed in Germany. ![Architecture diagram](https://qdrant.tech/documentation/examples/contract-management-stackit-aleph-alpha/architecture-diagram.png) ## [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#components) Components A contract management platform is not a simple CLI tool, but an application that should be available to all team members. It needs an interface to upload, search, and manage the documents. 
Ideally, the system should be integrated with org’s existing stack, and the permissions/access controls inherited from LDAP or Active Directory. > **Note:** In this tutorial, we are going to build a solid foundation for such a system. However, it is up to your organization’s setup to implement the entire solution. - **Dataset** \- a collection of documents, using different formats, such as PDF or DOCx, scraped from internet - **Asymmetric semantic embeddings** \- [Aleph Alpha embedding](https://docs.aleph-alpha.com/api/pharia-inference/semantic-embed/) to convert the queries and the documents into vectors - **Large Language Model** \- the [Luminous-extended-control\\ model](https://docs.aleph-alpha.com/api/pharia-inference/available-models/), but you can play with a different one from the Luminous family - **Qdrant Hybrid Cloud** \- a knowledge base to store the vectors and search over the documents - **STACKIT** \- a [German business cloud](https://www.stackit.de/) to run the Qdrant Hybrid Cloud and the application processes We will implement the process of uploading the documents, converting them into vectors, and storing them in Qdrant. Then, we will build a search interface to query the documents and get the answers. All that, assuming the user interacts with the system with some set of permissions, and can only access the documents they are allowed to. ## [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#prerequisites) Prerequisites ### [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#aleph-alpha-account) Aleph Alpha account Since you will be using Aleph Alpha’s models, [sign up](https://aleph-alpha.com/) with their managed service and obtain an API token. Once you have it ready, store it as an environment variable: shellpython ```shell export ALEPH_ALPHA_API_KEY="" ``` ```python import os os.environ["ALEPH_ALPHA_API_KEY"] = "" ``` ### [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#qdrant-hybrid-cloud-on-stackit) Qdrant Hybrid Cloud on STACKIT Please refer to our documentation to see [how to deploy Qdrant Hybrid Cloud on\\ STACKIT](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#stackit). Once you finish the deployment, you will have the API endpoint to interact with the Qdrant server. Let’s store it in the environment variable as well: shellpython ```shell export QDRANT_URL="https://qdrant.example.com" export QDRANT_API_KEY="your-api-key" ``` ```python os.environ["QDRANT_URL"] = "https://qdrant.example.com" os.environ["QDRANT_API_KEY"] = "your-api-key" ``` Qdrant will be running on a specific URL and access will be restricted by the API key. Make sure to store them both as environment variables as well: _Optional:_ Whenever you use LangChain, you can also [configure LangSmith](https://docs.smith.langchain.com/), which will help us trace, monitor and debug LangChain applications. You can sign up for LangSmith [here](https://smith.langchain.com/). ```shell export LANGCHAIN_TRACING_V2=true export LANGCHAIN_API_KEY="your-api-key" export LANGCHAIN_PROJECT="your-project" # if not specified, defaults to "default" ``` ## [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#implementation) Implementation To build the application, we can use the official SDKs of Aleph Alpha and Qdrant. 
However, to streamline the process let’s use [LangChain](https://python.langchain.com/docs/get_started/introduction). This framework is already integrated with both services, so we can focus our efforts on developing business logic. ### [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#qdrant-collection) Qdrant collection Aleph Alpha embeddings are high dimensional vectors by default, with a dimensionality of `5120`. However, a pretty unique feature of that model is that they might be compressed to a size of `128`, with a small drop in accuracy performance (4-6%, according to the docs). Qdrant can store even the original vectors easily, and this sounds like a good idea to enable [Binary Quantization](https://qdrant.tech/documentation/guides/quantization/#binary-quantization) to save space and make the retrieval faster. Let’s create a collection with such settings: ```python from qdrant_client import QdrantClient, models client = QdrantClient( location=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"], ) client.create_collection( collection_name="contracts", vectors_config=models.VectorParams( size=5120, distance=models.Distance.COSINE, quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, ) ) ), ) ``` We are going to use the `contracts` collection to store the vectors of the documents. The `always_ram` flag is set to `True` to keep the quantized vectors in RAM, which will speed up the search process. We also wanted to restrict access to the individual documents, so only users with the proper permissions can see them. In Qdrant that should be solved by adding a payload field that defines who can access the document. We’ll call this field `roles` and set it to an array of strings with the roles that can access the document. ```python client.create_payload_index( collection_name="contracts", field_name="metadata.roles", field_schema=models.PayloadSchemaType.KEYWORD, ) ``` Since we use Langchain, the `roles` field is a nested field of the `metadata`, so we have to define it as `metadata.roles`. The schema says that the field is a keyword, which means it is a string or an array of strings. We are going to use the name of the customers as the roles, so the access control will be based on the customer name. ### [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#ingestion-pipeline) Ingestion pipeline Semantic search systems rely on high-quality data as their foundation. With the [unstructured integration of Langchain](https://python.langchain.com/docs/integrations/providers/unstructured), ingestion of various document formats like PDFs, Microsoft Word files, and PowerPoint presentations becomes effortless. However, it’s crucial to split the text intelligently to avoid converting entire documents into vectors; instead, they should be divided into meaningful chunks. Subsequently, the extracted documents are converted into vectors using Aleph Alpha embeddings and stored in the Qdrant collection. Let’s start by defining the components and connecting them together: ```python embeddings = AlephAlphaAsymmetricSemanticEmbedding( model="luminous-base", aleph_alpha_api_key=os.environ["ALEPH_ALPHA_API_KEY"], normalize=True, ) qdrant = Qdrant( client=client, collection_name="contracts", embeddings=embeddings, ) ``` Now it’s high time to index our documents. 
Each of the documents is a separate file, and we also have to know the customer name to set the access control properly. There might be several roles for a single document, so let’s keep them in a list. ```python documents = { "data/Data-Processing-Agreement_STACKIT_Cloud_version-1.2.pdf": ["stackit"], "data/langchain-terms-of-service.pdf": ["langchain"], } ``` This is how the documents might look like: ![Example of the indexed document](https://qdrant.tech/documentation/examples/contract-management-stackit-aleph-alpha/indexed-document.png) Each has to be split into chunks first; there is no silver bullet. Our chunking algorithm will be simple and based on recursive splitting, with the maximum chunk size of 500 characters and the overlap of 100 characters. ```python from langchain_text_splitters import RecursiveCharacterTextSplitter text_splitter = RecursiveCharacterTextSplitter( chunk_size=500, chunk_overlap=100, ) ``` Now we can iterate over the documents, split them into chunks, convert them into vectors with Aleph Alpha embedding model, and store them in the Qdrant. ```python from langchain_community.document_loaders.unstructured import UnstructuredFileLoader for document_path, roles in documents.items(): document_loader = UnstructuredFileLoader(file_path=document_path) # Unstructured loads each file into a single Document object loaded_documents = document_loader.load() for doc in loaded_documents: doc.metadata["roles"] = roles # Chunks will have the same metadata as the original document document_chunks = text_splitter.split_documents(loaded_documents) # Add the documents to the Qdrant collection qdrant.add_documents(document_chunks, batch_size=20) ``` Our collection is filled with data, and we can start searching over it. In a real-world scenario, the ingestion process should be automated and triggered by the new documents uploaded to the system. Since we already use Qdrant Hybrid Cloud running on Kubernetes, we can easily deploy the ingestion pipeline as a job to the same environment. On STACKIT, you probably use the [STACKIT Kubernetes Engine (SKE)](https://www.stackit.de/en/product/kubernetes/) and launch it in a container. The [Compute Engine](https://www.stackit.de/en/product/stackit-compute-engine/) is also an option, but everything depends on the specifics of your organization. ### [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#search-application) Search application Specialized Document Management Systems have a lot of features, but semantic search is not yet a standard. We are going to build a simple search mechanism which could be possibly integrated with the existing system. The search process is quite simple: we convert the query into a vector using the same Aleph Alpha model, and then search for the most similar documents in the Qdrant collection. The access control is also applied, so the user can only see the documents they are allowed to. We start with creating an instance of the LLM of our choice, and set the maximum number of tokens to 200, as the default value is 64, which might be too low for our purposes. ```python from langchain.llms.aleph_alpha import AlephAlpha llm = AlephAlpha( model="luminous-extended-control", aleph_alpha_api_key=os.environ["ALEPH_ALPHA_API_KEY"], maximum_tokens=200, ) ``` Then, we can glue the components together and build the search process. `RetrievalQA` is a class that takes implements the Question Retrieval process, with a specified retriever and Large Language Model. 
The instance of `Qdrant` can be converted into a retriever, with an additional filter that will be passed to the `similarity_search` method. The filter is created as [in a regular Qdrant query](https://qdrant.tech/documentation/concepts/filtering/), with the `roles` field set to the user's roles.

```python
user_roles = ["stackit", "aleph-alpha"]

qdrant_retriever = qdrant.as_retriever(
    search_kwargs={
        "filter": models.Filter(
            must=[
                models.FieldCondition(
                    key="metadata.roles",
                    match=models.MatchAny(any=user_roles)
                )
            ]
        )
    }
)
```

We set the user roles to `stackit` and `aleph-alpha`, so the user can see the documents that are accessible to these customers, but not to the others. The final step is to create the `RetrievalQA` instance and use it to search over the documents, with a custom prompt.

```python
from langchain.prompts import PromptTemplate
from langchain.chains.retrieval_qa.base import RetrievalQA

prompt_template = """
Question: {question}

Answer the question using the Source. If there's no answer, say "NO ANSWER IN TEXT".

Source: {context}

### Response:
"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=qdrant_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

response = retrieval_qa.invoke({"query": "What are the rules of performing the audit?"})
print(response["result"])
```

Output:

```text
The rules for performing the audit are as follows:
1. The Customer must inform the Contractor in good time (usually at least two weeks in advance) about any and all circumstances related to the performance of the audit.
2. The Customer is entitled to perform one audit per calendar year. Any additional audits may be performed if agreed with the Contractor and are subject to reimbursement of expenses.
3. If the Customer engages a third party to perform the audit, the Customer must obtain the Contractor's consent and ensure that the confidentiality agreements with the third party are observed.
4. The Contractor may object to any third party deemed unsuitable.
```

There are some other parameters that might be tuned to optimize the search process. The `k` parameter defines how many documents should be returned, but LangChain also allows us to control the retrieval process by choosing the type of search operation. The default is `similarity`, which is just vector search, but we can also use `mmr`, which stands for Maximal Marginal Relevance. It is a technique to diversify the search results, so the user gets the most relevant documents, but also the most diverse ones. The `mmr` search is slower, but might be more user-friendly. A short sketch of such a configuration is included at the end of this page.

Our search application is ready, and we can deploy it to the same environment as the ingestion pipeline on STACKIT. The same rules apply here, so you can use SKE or the Compute Engine, depending on the specifics of your organization.

## [Anchor](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/\#next-steps) Next steps

We built a solid foundation for the contract management system, but there is still a lot to do. If you want to make the system production-ready, you should consider integrating the mechanism into your existing stack. If you have any questions, feel free to ask on our [Discord community](https://qdrant.to/discord).
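As mentioned in the retrieval tuning note above, here is a minimal sketch of an MMR-based retriever configuration. It reuses the `qdrant` vector store, `models`, and `user_roles` defined earlier; the `k` value of 5 and the `mmr_retriever` name are illustrative assumptions, not part of the original tutorial.

```python
# Sketch only: switch the retriever to MMR-based search and return 5 documents,
# while keeping the same role-based access control filter as before.
mmr_retriever = qdrant.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "filter": models.Filter(
            must=[
                models.FieldCondition(
                    key="metadata.roles",
                    match=models.MatchAny(any=user_roles),
                )
            ]
        ),
    },
)
```

Such a retriever can be passed to `RetrievalQA.from_chain_type` in exactly the same way as the `similarity`-based one.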
<|page-30-lllmstxt|> ## automate-filtering-with-llms

---

# [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#automate-filtering-with-llms) Automate filtering with LLMs

Our [complete guide to filtering in vector search](https://qdrant.tech/articles/vector-search-filtering/) describes why filtering is important and how to implement it with Qdrant. However, applying filters is easier when you build an application with a traditional interface. Your UI may contain a form with checkboxes, sliders, and other elements that users can use to set their criteria. But what if you want to build a RAG-powered application with just a conversational interface, or even voice commands? In this case, you need to automate the filtering process! LLMs seem to be particularly good at this task. They can understand natural language and generate structured output based on it. In this tutorial, we'll show you how to use LLMs to automate filtering in your vector search application.

## [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#few-notes-on-qdrant-filters) Few notes on Qdrant filters

The Qdrant Python SDK defines its models using [Pydantic](https://docs.pydantic.dev/latest/). This library is the de facto standard for data validation and serialization in Python. It allows you to define the structure of your data using Python type hints. For example, our `Filter` model is defined as follows:

```python
class Filter(BaseModel, extra="forbid"):
    should: Optional[Union[List["Condition"], "Condition"]] = Field(
        default=None,
        description="At least one of those conditions should match"
    )
    min_should: Optional["MinShould"] = Field(
        default=None,
        description="At least minimum amount of given conditions should match"
    )
    must: Optional[Union[List["Condition"], "Condition"]] = Field(
        default=None, description="All conditions must match"
    )
    must_not: Optional[Union[List["Condition"], "Condition"]] = Field(
        default=None, description="All conditions must NOT match"
    )
```

Qdrant filters may be nested, and you can express even the most complex conditions using the `must`, `should`, and `must_not` notation.

## [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#structured-output-from-llms) Structured output from LLMs

It is not an uncommon practice to use LLMs to generate structured output. It is primarily useful if their output is intended for further processing by a different application. For example, you can use LLMs to generate SQL queries, JSON objects, and, most importantly, Qdrant filters.
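To make the target structure concrete, here is a minimal, hand-written sketch of a nested Qdrant filter built with the same Pydantic models. The payload fields used here (`color` and `price`) are hypothetical and only illustrate the kind of object we want an LLM to produce.

```python
from qdrant_client import models

# "color must be red" AND ("price below 20" OR "price at least 100").
# A Filter is itself a valid condition, which is what makes nesting possible.
nested_filter = models.Filter(
    must=[
        models.FieldCondition(key="color", match=models.MatchValue(value="red")),
        models.Filter(
            should=[
                models.FieldCondition(key="price", range=models.Range(lt=20.0)),
                models.FieldCondition(key="price", range=models.Range(gte=100.0)),
            ]
        ),
    ]
)
print(nested_filter)
```

A filter like this can be passed directly to Qdrant's search and query methods.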
Pydantic has been adopted quite well by the LLM ecosystem, so there are plenty of libraries that use Pydantic models to define the structure of the output of Language Models. One of the interesting projects in this area is [Instructor](https://python.useinstructor.com/), which allows you to work with different LLM providers and restrict their output to a specific structure. Let's install the library and choose the provider we'll use in this tutorial:

```shell
pip install "instructor[anthropic]"
```

Anthropic is not the only option out there, as Instructor supports many other providers including OpenAI, Ollama, Llama, Gemini, Vertex AI, Groq, LiteLLM and others. You can choose the one that fits your needs the best, or the one you already use in your RAG.

## [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#using-instructor-to-generate-qdrant-filters) Using Instructor to generate Qdrant filters

Instructor has some helper methods to decorate the LLM APIs, so you can interact with them as if you were using their normal SDKs. In the case of Anthropic, you just pass an instance of the `Anthropic` class to the `from_anthropic` function:

```python
import instructor
from anthropic import Anthropic

anthropic_client = instructor.from_anthropic(
    client=Anthropic(
        api_key="YOUR_API_KEY",
    )
)
```

A decorated client slightly modifies the original API, so you can pass the `response_model` parameter to the `.messages.create` method. This parameter should be a Pydantic model that defines the structure of the output. In the case of Qdrant filters, it should be the `Filter` model:

```python
from qdrant_client import models

qdrant_filter = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    response_model=models.Filter,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "red T-shirt"
        }
    ],
)
```

The output of this code will be a Pydantic model that represents a Qdrant filter. Surprisingly, there is no need to pass any additional instructions for the model to figure out that the user wants to filter by the color and the type of the product. Here is what the output looks like:

```python
Filter(
    should=None,
    min_should=None,
    must=[
        FieldCondition(
            key="color",
            match=MatchValue(value="red"),
            range=None,
            geo_bounding_box=None,
            geo_radius=None,
            geo_polygon=None,
            values_count=None
        ),
        FieldCondition(
            key="type",
            match=MatchValue(value="t-shirt"),
            range=None,
            geo_bounding_box=None,
            geo_radius=None,
            geo_polygon=None,
            values_count=None
        )
    ],
    must_not=None
)
```

Obviously, giving the model complete freedom to generate the filter may lead to unexpected results, or no results at all. Your collection probably has payloads with a specific structure, so it doesn't make sense to use anything else. Moreover, **it's considered a good practice to filter by the fields that have been indexed**. That's why it makes sense to automatically determine the indexed fields and restrict the output to them.

### [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#restricting-the-available-fields) Restricting the available fields

The Qdrant collection info contains a list of the indexes created on a particular collection. You can use this information to automatically determine the fields that can be used for filtering.
Here is how you can do it:

```python
from qdrant_client import QdrantClient

client = QdrantClient("http://localhost:6333")

collection_info = client.get_collection(collection_name="test_filter")
indexes = collection_info.payload_schema
print(indexes)
```

Output:

```python
{
    "city.location": PayloadIndexInfo(
        data_type=PayloadSchemaType.GEO,
        ...
    ),
    "city.name": PayloadIndexInfo(
        data_type=PayloadSchemaType.KEYWORD,
        ...
    ),
    "color": PayloadIndexInfo(
        data_type=PayloadSchemaType.KEYWORD,
        ...
    ),
    "fabric": PayloadIndexInfo(
        data_type=PayloadSchemaType.KEYWORD,
        ...
    ),
    "price": PayloadIndexInfo(
        data_type=PayloadSchemaType.FLOAT,
        ...
    ),
}
```

Our LLM should know not only the names of the fields it can use, but also their types, as, for example, range filtering only makes sense for numerical fields, and geo filtering on non-geo fields won't yield anything meaningful. You can pass this information as a part of the prompt to the LLM, so let's encode it as a string:

```python
formatted_indexes = "\n".join([
    f"- {index_name} - {index.data_type.name}"
    for index_name, index in indexes.items()
])
print(formatted_indexes)
```

Output:

```text
- fabric - KEYWORD
- city.name - KEYWORD
- color - KEYWORD
- price - FLOAT
- city.location - GEO
```

**It's a good idea to cache the list of the available fields and their types**, as they are not supposed to change often. Our interactions with the LLM should be slightly different now:

```python
qdrant_filter = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    response_model=models.Filter,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "color is red"
                f"\n{formatted_indexes}\n"
            )
        }
    ],
)
```

Output:

```python
Filter(
    should=None,
    min_should=None,
    must=FieldCondition(
        key="color",
        match=MatchValue(value="red"),
        range=None,
        geo_bounding_box=None,
        geo_radius=None,
        geo_polygon=None,
        values_count=None
    ),
    must_not=None
)
```

The same query, restricted to the available fields, now generates better criteria, as it doesn't try to filter by fields that don't exist in the collection.

### [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#testing-the-llm-output) Testing the LLM output

Although LLMs are quite powerful, they are not perfect. If you plan to automate filtering, it makes sense to run some tests to see how well they perform, especially on edge cases, like queries that cannot be expressed as filters. Let's see how the LLM will handle the following query:

```python
qdrant_filter = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    response_model=models.Filter,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "fruit salad with no more than 100 calories"
                f"\n{formatted_indexes}\n"
            )
        }
    ],
)
```

Output:

```python
Filter(
    should=None,
    min_should=None,
    must=FieldCondition(
        key="price",
        match=None,
        range=Range(lt=None, gt=None, gte=None, lte=100.0),
        geo_bounding_box=None,
        geo_radius=None,
        geo_polygon=None,
        values_count=None
    ),
    must_not=None
)
```

Surprisingly, the LLM extracted the calorie information from the query and generated a filter based on the price field. It somehow extracts any numerical information from the query and tries to match it with the available fields. Generally, giving the model some more guidance on how to interpret the query may lead to better results. Adding a system prompt that defines the rules for query interpretation may help the model do a better job.
Here is how you can do it:

```python
SYSTEM_PROMPT = """
You are extracting filters from a text query. Please follow the following rules:
1. Query is provided in the form of a text enclosed in <query></query> tags.
2. Available indexes are put at the end of the text in the form of a list enclosed in <indexes></indexes> tags.
3. You cannot use any field that is not available in the indexes.
4. Generate a filter only if you are certain that user's intent matches the field name.
5. Prices are always in USD.
6. It's better not to generate a filter than to generate an incorrect one.
"""

qdrant_filter = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    response_model=models.Filter,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": SYSTEM_PROMPT.strip(),
        },
        {
            "role": "assistant",
            "content": "Okay, I will follow all the rules."
        },
        {
            "role": "user",
            "content": (
                "<query>fruit salad with no more than 100 calories</query>"
                f"\n<indexes>\n{formatted_indexes}\n</indexes>"
            )
        }
    ],
)
```

Current output:

```python
Filter(
    should=None,
    min_should=None,
    must=None,
    must_not=None
)
```

### [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#handling-complex-queries) Handling complex queries

We have a bunch of indexes created on the collection, and it is quite interesting to see how the LLM will handle more complex queries. For example, let's see how it will handle the following query:

```python
qdrant_filter = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    response_model=models.Filter,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": SYSTEM_PROMPT.strip(),
        },
        {
            "role": "assistant",
            "content": "Okay, I will follow all the rules."
        },
        {
            "role": "user",
            "content": (
                "<query>"
                "white T-shirt available no more than 30 miles from London, "
                "but not in the city itself, below $15.70, not made from polyester"
                "</query>\n"
                "<indexes>\n"
                f"{formatted_indexes}\n"
                "</indexes>"
            )
        },
    ],
)
```

It might be surprising, but Anthropic Claude is able to generate even such complex filters. Here is the output:

```python
Filter(
    should=None,
    min_should=None,
    must=[
        FieldCondition(
            key="color",
            match=MatchValue(value="white"),
            range=None,
            geo_bounding_box=None,
            geo_radius=None,
            geo_polygon=None,
            values_count=None
        ),
        FieldCondition(
            key="city.location",
            match=None,
            range=None,
            geo_bounding_box=None,
            geo_radius=GeoRadius(
                center=GeoPoint(lon=-0.1276, lat=51.5074),
                radius=48280.0
            ),
            geo_polygon=None,
            values_count=None
        ),
        FieldCondition(
            key="price",
            match=None,
            range=Range(lt=15.7, gt=None, gte=None, lte=None),
            geo_bounding_box=None,
            geo_radius=None,
            geo_polygon=None,
            values_count=None
        )
    ],
    must_not=[
        FieldCondition(
            key="city.name",
            match=MatchValue(value="London"),
            range=None,
            geo_bounding_box=None,
            geo_radius=None,
            geo_polygon=None,
            values_count=None
        ),
        FieldCondition(
            key="fabric",
            match=MatchValue(value="polyester"),
            range=None,
            geo_bounding_box=None,
            geo_radius=None,
            geo_polygon=None,
            values_count=None
        )
    ]
)
```

The model even knows the coordinates of London and uses them to generate the geo filter. It isn't the best idea to rely on the model to generate such complex filters, but it's quite impressive that it can do it.

## [Anchor](https://qdrant.tech/documentation/search-precision/automate-filtering-with-llms/\#further-steps) Further steps

Real production systems usually require more testing and validation of the LLM output. Building a ground truth dataset with queries and the expected filters would be a good idea.
You can use this dataset to evaluate the model performance and to see how it behaves in different scenarios.
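As a rough illustration of what such an evaluation could look like, here is a minimal, hypothetical sketch. It reuses the `anthropic_client`, `SYSTEM_PROMPT`, and `formatted_indexes` objects defined above; the `generate_filter` helper, the example queries, and the expected filters are assumptions made up for this sketch, not part of the tutorial.

```python
from qdrant_client import models

def generate_filter(query: str) -> models.Filter:
    # Hypothetical helper wrapping the Instructor call shown above.
    return anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",
        response_model=models.Filter,
        max_tokens=1024,
        messages=[
            {"role": "user", "content": SYSTEM_PROMPT.strip()},
            {"role": "assistant", "content": "Okay, I will follow all the rules."},
            {
                "role": "user",
                "content": f"<query>{query}</query>\n<indexes>\n{formatted_indexes}\n</indexes>",
            },
        ],
    )

# A tiny, hand-made ground truth dataset; a real one should be much larger.
ground_truth = [
    (
        "red T-shirt",
        models.Filter(
            must=[models.FieldCondition(key="color", match=models.MatchValue(value="red"))]
        ),
    ),
    # Queries that cannot be expressed with the indexed fields should produce an empty filter.
    ("fruit salad with no more than 100 calories", models.Filter()),
]

exact_matches = 0
for query, expected in ground_truth:
    generated = generate_filter(query)
    # Pydantic models compare field by field; stricter or fuzzier metrics are also possible.
    exact_matches += int(generated == expected)

print(f"Exact matches: {exact_matches}/{len(ground_truth)}")
```

Exact-match comparison is the simplest possible metric; depending on your use case, you might instead check whether the generated filter returns the same set of points as the expected one.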
https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/ 2024-08-23T22:48:27+05:30 ... https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/ 2025-02-18T21:01:07+05:30 ... https://qdrant.tech/documentation/cloud/cluster-access/ 2025-05-02T16:53:21+02:00 ... https://qdrant.tech/documentation/support/ 2025-04-08T10:25:18+02:00 ... https://qdrant.tech/documentation/send-data/databricks/ 2024-07-29T21:03:45+05:30 ... https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/ 2024-08-13T13:38:38+03:00 ... https://qdrant.tech/articles/machine-learning/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/documentation/concepts/points/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/concepts/vectors/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/concepts/payload/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/ 2024-07-22T17:09:17-07:00 ... https://qdrant.tech/articles/neural-search-tutorial/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/articles/rag-and-genai/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/documentation/cloud/cluster-scaling/ 2025-05-02T16:53:21+02:00 ... https://qdrant.tech/documentation/concepts/search/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/concepts/explore/ 2025-06-12T10:45:50-04:00 ... https://qdrant.tech/documentation/cloud/cluster-monitoring/ 2025-05-02T16:53:21+02:00 ... https://qdrant.tech/documentation/cloud/cluster-upgrades/ 2025-05-02T16:53:21+02:00 ... https://qdrant.tech/documentation/concepts/hybrid-queries/ 2025-04-23T11:15:58+02:00 ... https://qdrant.tech/articles/filtrable-hnsw/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/documentation/concepts/filtering/ 2025-06-09T18:30:19+03:30 ... https://qdrant.tech/articles/practicle-examples/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/documentation/cloud/backups/ 2025-05-02T16:53:21+02:00 ... https://qdrant.tech/articles/qdrant-0-11-release/ 2022-12-06T13:12:27+01:00 ... https://qdrant.tech/articles/qdrant-0-10-release/ 2024-05-15T18:01:28+02:00 ... https://qdrant.tech/documentation/concepts/optimizer/ 2024-11-27T16:59:34+01:00 ... https://qdrant.tech/documentation/concepts/storage/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/concepts/indexing/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/guides/distributed\_deployment/ 2025-02-03T17:33:39+06:00 ... https://qdrant.tech/documentation/concepts/snapshots/ 2025-06-12T09:02:54+03:00 ... https://qdrant.tech/documentation/guides/quantization/ 2025-04-07T00:40:39+02:00 ... https://qdrant.tech/documentation/guides/monitoring/ 2025-02-11T18:21:40+01:00 ... https://qdrant.tech/documentation/guides/configuration/ 2025-02-04T11:00:51+01:00 ... https://qdrant.tech/documentation/guides/security/ 2025-01-20T16:32:23+01:00 ... https://qdrant.tech/documentation/guides/usage-statistics/ 2024-12-03T17:03:30+01:00 ... https://qdrant.tech/documentation/guides/common-errors/ 2025-05-27T12:04:07+02:00 ... https://qdrant.tech/documentation/database-tutorials/migration/ 2025-06-11T18:57:35+03:00 ... https://qdrant.tech/blog/hybrid-cloud-vultr/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/articles/quantum-quantization/ 2023-07-13T01:45:36+02:00 ... https://qdrant.tech/blog/hybrid-cloud-stackit/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-scaleway/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-red-hat-openshift/ 2024-05-21T10:11:09+02:00 ... 
https://qdrant.tech/blog/hybrid-cloud-ovhcloud/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-llamaindex/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-langchain/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-jinaai/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-haystack/ 2024-09-24T14:30:20-04:00 ... https://qdrant.tech/blog/hybrid-cloud-digitalocean/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-aleph-alpha/ 2025-02-04T13:55:26+01:00 ... https://qdrant.tech/blog/hybrid-cloud-airbyte/ 2025-02-04T13:55:26+01:00 ... https://qdrant.tech/documentation/observability/openllmetry/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/observability/openlit/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/blog/case-study-lettria-v2/ 2025-06-16T22:38:02-07:00 ... https://qdrant.tech/ 2025-06-19T16:21:03+04:00 ... https://qdrant.tech/blog/beta-database-migration-tool/ 2025-06-18T11:55:05-04:00 ... https://qdrant.tech/blog/case-study-lawme/ 2025-06-11T09:42:37-07:00 ... https://qdrant.tech/blog/case-study-convosearch/ 2025-06-10T09:54:12-07:00 ... https://qdrant.tech/blog/legal-tech-builders-guide/ 2025-06-13T15:44:13-07:00 ... https://qdrant.tech/blog/soc-2-type-ii-hipaa/ 2025-06-17T16:48:22-07:00 ... https://qdrant.tech/blog/n8n-node/ 2025-06-09T15:38:39+02:00 ... https://qdrant.tech/blog/datatalks-course/ 2025-06-05T09:19:05-04:00 ... https://qdrant.tech/blog/case-study-qovery/ 2025-05-27T11:19:41-07:00 ... https://qdrant.tech/blog/case-study-tripadvisor/ 2025-05-13T23:15:13-07:00 ... https://qdrant.tech/blog/case-study-aracor/ 2025-05-13T11:23:13-07:00 ... https://qdrant.tech/blog/case-study-garden-intel/ 2025-05-09T11:56:26-07:00 ... https://qdrant.tech/blog/product-ui-changes/ 2025-05-08T09:28:12-04:00 ... https://qdrant.tech/blog/case-study-pariti/ 2025-05-01T10:05:43-07:00 ... https://qdrant.tech/articles/vector-search-production/ 2025-04-30T17:47:55+02:00 ... https://qdrant.tech/blog/case-study-dust-v2/ 2025-05-08T11:45:46-07:00 ... https://qdrant.tech/blog/case-study-sayone/ 2025-04-29T09:15:10-07:00 ... https://qdrant.tech/blog/superlinked-multimodal-search/ 2025-04-24T14:10:50+02:00 ... https://qdrant.tech/blog/qdrant-1.14.x/ 2025-05-02T15:26:42-03:00 ... https://qdrant.tech/blog/case-study-pathwork/ 2025-05-16T09:10:33-07:00 ... https://qdrant.tech/blog/case-study-lyzr/ 2025-05-16T09:10:33-07:00 ... https://qdrant.tech/blog/case-study-mixpeek/ 2025-05-16T09:10:33-07:00 ... https://qdrant.tech/blog/qdrant-n8n-beyond-simple-similarity-search/ 2025-04-08T11:38:52+02:00 ... https://qdrant.tech/blog/satellite-vector-broadcasting/ 2025-04-01T08:09:34+02:00 ... https://qdrant.tech/blog/case-study-hubspot/ 2025-05-16T09:10:33-07:00 ... https://qdrant.tech/blog/webinar-vibe-coding-rag/ 2025-03-21T16:36:29+01:00 ... https://qdrant.tech/blog/case-study-deutsche-telekom/ 2025-04-03T08:09:56-04:00 ... https://qdrant.tech/blog/enterprise-vector-search/ 2025-04-07T15:17:30-04:00 ... https://qdrant.tech/blog/metadata-deasy-labs/ 2025-02-24T15:04:44-03:00 ... https://qdrant.tech/blog/webinar-crewai-qdrant-obsidian/ 2025-01-24T16:10:16+01:00 ... https://qdrant.tech/blog/qdrant-1.13.x/ 2025-01-24T04:19:54-05:00 ... https://qdrant.tech/blog/static-embeddings/ 2025-01-17T14:53:25+01:00 ... https://qdrant.tech/blog/case-study-voiceflow/ 2024-12-10T10:26:56-08:00 ... https://qdrant.tech/blog/facial-recognition/ 2024-12-03T20:56:40-08:00 ... 
https://qdrant.tech/blog/colpali-qdrant-optimization/ 2024-11-30T18:57:48-03:00 ... https://qdrant.tech/blog/rag-evaluation-guide/ 2025-02-18T21:01:07+05:30 ... https://qdrant.tech/blog/case-study-qatech/ 2024-11-21T16:42:35-08:00 ... https://qdrant.tech/blog/qdrant-colpali/ 2024-11-06T17:18:48-08:00 ... https://qdrant.tech/blog/case-study-sprinklr/ 2024-10-18T09:03:19-07:00 ... https://qdrant.tech/blog/qdrant-1.12.x/ 2024-10-08T19:49:58-07:00 ... https://qdrant.tech/blog/qdrant-deeplearning-ai-course/ 2024-10-07T12:25:14-07:00 ... https://qdrant.tech/blog/qdrant-for-startups-launch/ 2024-10-02T19:07:16+05:30 ... https://qdrant.tech/blog/case-study-shakudo/ 2025-03-13T17:47:05+01:00 ... https://qdrant.tech/blog/qdrant-relari/ 2024-09-17T15:53:48-07:00 ... https://qdrant.tech/blog/case-study-nyris/ 2024-09-23T14:05:33-07:00 ... https://qdrant.tech/blog/case-study-kern/ 2024-09-23T14:05:33-07:00 ... https://qdrant.tech/blog/qdrant-1.11.x/ 2024-08-16T00:01:23+02:00 ... https://qdrant.tech/blog/case-study-kairoswealth/ 2024-09-11T14:59:00-07:00 ... https://qdrant.tech/blog/qdrant-1.10.x/ 2024-07-16T22:00:30+05:30 ... https://qdrant.tech/blog/community-highlights-1/ 2024-06-21T02:34:01-03:00 ... https://qdrant.tech/blog/cve-2024-3829-response/ 2024-06-10T12:42:49-04:00 ... https://qdrant.tech/blog/qdrant-soc2-type2-audit/ 2024-08-29T19:19:43+05:30 ... https://qdrant.tech/blog/qdrant-stars-announcement/ 2024-10-05T03:39:41+05:30 ... https://qdrant.tech/blog/qdrant-cpu-intel-benchmark/ 2024-10-08T12:41:46-07:00 ... https://qdrant.tech/blog/qsoc24-interns-announcement/ 2024-05-08T18:04:46-03:00 ... https://qdrant.tech/articles/semantic-cache-ai-data-retrieval/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/blog/are-you-vendor-locked/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/case-study-visua/ 2024-05-01T17:59:13-07:00 ... https://qdrant.tech/blog/qdrant-1.9.x/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud-launch-partners/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/hybrid-cloud/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/blog/rag-advancements-challenges/ 2024-04-12T14:45:02+00:00 ... https://qdrant.tech/blog/building-search-rag-open-api/ 2024-04-12T14:23:42+00:00 ... https://qdrant.tech/blog/gen-ai-and-vector-search/ 2024-07-07T19:32:50-07:00 ... https://qdrant.tech/blog/teaching-vector-db-at-scale/ 2024-04-09T11:06:17+00:00 ... https://qdrant.tech/blog/meow-with-cheshire-cat/ 2024-04-09T11:05:51+00:00 ... https://qdrant.tech/blog/cve-2024-2221-response/ 2024-08-15T17:31:04+02:00 ... https://qdrant.tech/blog/fastllm-announcement/ 2024-04-01T04:13:26-07:00 ... https://qdrant.tech/blog/virtualbrain-best-rag/ 2024-09-20T10:12:14-04:00 ... https://qdrant.tech/blog/youtube-without-paying-cent/ 2024-03-27T12:44:32+00:00 ... https://qdrant.tech/blog/azure-marketplace/ 2024-10-05T03:39:41+05:30 ... https://qdrant.tech/blog/real-time-news-distillation-rag/ 2024-03-25T08:49:27+00:00 ... https://qdrant.tech/blog/insight-generation-platform/ 2024-03-25T08:51:56+00:00 ... https://qdrant.tech/blog/llm-as-a-judge/ 2024-03-19T15:05:24+00:00 ... https://qdrant.tech/blog/vector-search-vector-recommendation/ 2024-03-19T14:22:15+00:00 ... https://qdrant.tech/blog/using-qdrant-and-langchain/ 2024-05-15T18:01:28+02:00 ... https://qdrant.tech/blog/iris-agent-qdrant/ 2024-03-06T09:17:19-08:00 ... https://qdrant.tech/blog/case-study-dailymotion/ 2024-03-07T20:31:05+01:00 ... 
https://qdrant.tech/blog/comparing-qdrant-vs-pinecone-vector-databases/ 2025-02-04T13:55:26+01:00 ... https://qdrant.tech/blog/what-is-vector-similarity/ 2024-09-05T13:07:07-07:00 ... https://qdrant.tech/blog/dspy-vs-langchain/ 2025-05-15T19:37:07+05:30 ... https://qdrant.tech/blog/qdrant-summer-of-code-24/ 2024-03-14T18:24:32+01:00 ... https://qdrant.tech/blog/dust-and-qdrant/ 2024-09-20T10:19:38-04:00 ... https://qdrant.tech/blog/bitter-lesson-generative-language-model/ 2024-01-29T16:31:02+00:00 ... https://qdrant.tech/blog/indexify-content-extraction-engine/ 2024-03-07T18:59:29+00:00 ... https://qdrant.tech/blog/qdrant-x-dust-vector-search/ 2024-07-07T19:40:44-07:00 ... https://qdrant.tech/blog/series-a-funding-round/ 2024-10-08T12:41:46-07:00 ... https://qdrant.tech/blog/qdrant-cloud-on-microsoft-azure/ 2024-03-07T20:31:05+01:00 ... https://qdrant.tech/blog/qdrant-benchmarks-2024/ 2024-03-07T20:31:05+01:00 ... https://qdrant.tech/blog/navigating-challenges-innovations/ 2024-05-21T09:57:56+02:00 ... https://qdrant.tech/blog/open-source-vector-search-engine-vector-database/ 2024-07-07T19:36:05-07:00 ... https://qdrant.tech/blog/vector-image-search-rag/ 2024-01-25T17:51:08+01:00 ... https://qdrant.tech/blog/semantic-search-vector-database/ 2024-07-07T19:46:08-07:00 ... https://qdrant.tech/blog/llm-complex-search-copilot/ 2024-01-10T11:42:02+00:00 ... https://qdrant.tech/blog/entity-matching-qdrant/ 2024-01-10T11:37:51+00:00 ... https://qdrant.tech/blog/fast-embed-models/ 2024-01-22T10:15:56-08:00 ... https://qdrant.tech/blog/human-language-ai-models/ 2024-01-10T10:31:15+00:00 ... https://qdrant.tech/blog/binary-quantization/ 2024-01-10T10:26:06+00:00 ... https://qdrant.tech/blog/qdrant-unstructured/ 2024-03-07T20:31:05+01:00 ... https://qdrant.tech/blog/qdrant-n8n/ 2024-03-07T20:31:05+01:00 ... https://qdrant.tech/blog/vector-search-and-applications-record/ 2024-09-06T13:14:12+02:00 ... https://qdrant.tech/blog/cohere-embedding-v3/ 2024-09-06T13:14:12+02:00 ... https://qdrant.tech/blog/case-study-pienso/ 2024-04-10T17:59:48-07:00 ... https://qdrant.tech/blog/case-study-bloop/ 2024-07-18T19:11:22-07:00 ... https://qdrant.tech/articles/qdrant-introduces-full-text-filters-and-indexes/ 2024-09-18T15:57:29-07:00 ... https://qdrant.tech/articles/storing-multiple-vectors-per-object-in-qdrant/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/articles/batch-vector-search-with-qdrant/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/blog/qdrant-supports-arm-architecture/ 2024-01-16T22:02:52+05:30 ... https://qdrant.tech/about-us/ 2024-05-21T09:57:56+02:00 ... https://qdrant.tech/data-analysis-anomaly-detection/ 2024-08-29T10:01:03-04:00 ... https://qdrant.tech/advanced-search/ 2024-08-21T16:31:41-07:00 ... https://qdrant.tech/ai-agents/ 2025-02-12T08:47:39-06:00 ... https://qdrant.tech/e-commerce/ 2025-05-22T20:23:57+02:00 ... https://qdrant.tech/documentation/data-management/airbyte/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/embeddings/aleph-alpha/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/get\_anonymous\_id/ 2025-03-05T11:26:52+00:00 ... https://qdrant.tech/documentation/data-management/airflow/ 2025-02-18T21:01:07+05:30 ... https://qdrant.tech/documentation/data-management/nifi/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/data-management/spark/ 2025-03-06T10:23:24+05:30 ... https://qdrant.tech/documentation/platforms/apify/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/autogen/ 2024-11-20T11:50:06+05:30 ... 
https://qdrant.tech/documentation/embeddings/bedrock/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/frameworks/lakechain/ 2024-10-17T11:42:14+05:30 ... https://qdrant.tech/about-us/about-us-resources/ 2025-05-30T14:14:31+03:00 ... https://qdrant.tech/brand-resources/ 2024-06-17T16:56:32+03:00 ... https://qdrant.tech/documentation/platforms/bubble/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/security/bug-bounty-program/ 2025-03-28T09:40:53+01:00 ... https://qdrant.tech/documentation/build/ 2024-11-18T14:53:02-08:00 ... https://qdrant.tech/documentation/platforms/buildship/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/camel/ 2024-12-20T13:31:09+05:30 ... https://qdrant.tech/documentation/frameworks/cheshire-cat/ 2025-01-24T11:47:11+01:00 ... https://qdrant.tech/documentation/data-management/cocoindex/ 2025-04-20T23:11:21-07:00 ... https://qdrant.tech/documentation/data-management/cognee/ 2025-05-31T22:06:39+02:00 ... https://qdrant.tech/documentation/embeddings/cohere/ 2025-02-19T10:27:39+03:00 ... https://qdrant.tech/community/ 2025-01-07T11:56:39-06:00 ... https://qdrant.tech/documentation/data-management/confluent/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/contact-us/ 2025-03-13T17:47:05+01:00 ... https://qdrant.tech/legal/credits/ 2022-04-25T15:19:19+02:00 ... https://qdrant.tech/documentation/frameworks/crewai/ 2025-02-27T09:21:41+01:00 ... https://qdrant.tech/customers/ 2024-06-17T16:56:32+03:00 ... https://qdrant.tech/documentation/frameworks/dagster/ 2025-04-15T18:20:05+05:30 ... https://qdrant.tech/documentation/observability/datadog/ 2024-10-31T05:56:39+05:30 ... https://qdrant.tech/documentation/frameworks/deepeval/ 2025-04-24T16:09:40+08:00 ... https://qdrant.tech/documentation/data-management/dlt/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/docarray/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/platforms/docsgpt/ 2025-02-18T21:01:07+05:30 ... https://qdrant.tech/documentation/frameworks/dsrag/ 2024-11-27T17:59:33+05:30 ... https://qdrant.tech/documentation/frameworks/dynamiq/ 2025-03-24T10:22:45+02:00 ... https://qdrant.tech/articles/ecosystem/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/enterprise-solutions/ 2024-08-20T14:08:09-04:00 ... https://qdrant.tech/documentation/frameworks/feast/ 2025-02-18T21:01:07+05:30 ... https://qdrant.tech/documentation/frameworks/fifty-one/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/genkit/ 2024-10-05T03:39:41+05:30 ... https://qdrant.tech/documentation/data-management/fondant/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/embeddings/gemini/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/frameworks/haystack/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/honeyhive/ 2025-05-09T04:07:10-03:00 ... https://qdrant.tech/hospitality-and-travel/ 2025-05-21T18:13:48+02:00 ... https://qdrant.tech/legal/impressum/ 2024-02-28T17:57:34+01:00 ... https://qdrant.tech/documentation/data-management/fluvio/ 2024-09-15T21:31:35+05:30 ... https://qdrant.tech/documentation/platforms/rivet/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/embeddings/jina-embeddings/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/about-us/about-us-get-started/ 2025-05-30T14:14:31+03:00 ... https://qdrant.tech/documentation/platforms/keboola/ 2025-05-14T07:24:10-04:00 ... https://qdrant.tech/documentation/platforms/kotaemon/ 2024-11-07T03:37:15+05:30 ... 
https://qdrant.tech/documentation/frameworks/langchain/ 2024-08-29T19:19:43+05:30 ... https://qdrant.tech/documentation/frameworks/langchain-go/ 2024-11-04T16:55:24+01:00 ... https://qdrant.tech/documentation/frameworks/langchain4j/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/langgraph/ 2024-11-20T19:27:09+05:30 ... https://qdrant.tech/legal-tech/ 2025-04-24T18:13:38+02:00 ... https://qdrant.tech/documentation/frameworks/llama-index/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/platforms/make/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/frameworks/mastra/ 2024-12-20T13:30:42+05:30 ... https://qdrant.tech/documentation/frameworks/mem0/ 2024-10-05T13:55:10+05:30 ... https://qdrant.tech/documentation/frameworks/nlweb/ 2025-05-19T21:26:59+05:30 ... https://qdrant.tech/documentation/data-management/mindsdb/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/embeddings/mistral/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/embeddings/mixedbread/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/embeddings/mixpeek/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/platforms/n8n/ 2025-06-06T22:10:24+05:30 ... https://qdrant.tech/documentation/frameworks/neo4j-graphrag/ 2024-11-07T02:58:58+05:30 ... https://qdrant.tech/documentation/embeddings/nomic/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/embeddings/nvidia/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/embeddings/ollama/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/embeddings/openai/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/documentation/frameworks/openai-agents/ 2025-04-30T14:10:48+05:30 ... https://qdrant.tech/about-us/about-us-engineering-culture/ 2025-05-30T14:14:31+03:00 ... https://qdrant.tech/documentation/frameworks/pandas-ai/ 2025-02-18T21:01:07+05:30 ... https://qdrant.tech/partners/ 2024-06-17T16:56:32+03:00 ... https://qdrant.tech/documentation/frameworks/canopy/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/platforms/pipedream/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/platforms/portable/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/platforms/powerapps/ 2025-01-10T21:05:50+05:30 ... https://qdrant.tech/documentation/embeddings/premai/ 2024-11-28T08:54:13+05:30 ... https://qdrant.tech/pricing/ 2024-08-20T12:47:35-07:00 ... https://qdrant.tech/legal/privacy-policy/ 2025-06-19T13:22:43+02:00 ... https://qdrant.tech/private-cloud/ 2024-05-21T09:57:56+02:00 ... https://qdrant.tech/documentation/platforms/privategpt/ 2024-08-15T08:50:37+05:30 ... https://qdrant.tech/documentation/cloud-tools/pulumi/ 2024-11-19T18:01:59-08:00 ... https://qdrant.tech/articles/ 2024-12-20T13:10:51+01:00 ... https://qdrant.tech/blog/ 2024-05-21T09:57:56+02:00 ... https://qdrant.tech/cloud/ 2024-08-20T11:44:59-07:00 ... https://qdrant.tech/demo/ 2024-09-06T13:14:12+02:00 ... https://qdrant.tech/qdrant-for-startups/ 2024-09-30T18:44:08+02:00 ... https://qdrant.tech/hybrid-cloud/ 2024-05-21T10:11:09+02:00 ... https://qdrant.tech/stars/ 2024-06-17T16:56:32+03:00 ... https://qdrant.tech/qdrant-vector-database/ 2024-08-29T08:43:52-04:00 ... https://qdrant.tech/rag/rag-evaluation-guide/ 2024-09-16T18:43:11+02:00 ... https://qdrant.tech/rag/ 2024-08-20T11:45:42-07:00 ... https://qdrant.tech/documentation/frameworks/ragbits/ 2024-11-07T08:29:10+05:30 ... 
<|page-32-lllmstxt|> ## cloud-tools

## [Anchor](https://qdrant.tech/documentation/cloud-tools/\#cloud-tools) Cloud Tools

| Integration | Description |
| --- | --- |
| [Pulumi](https://qdrant.tech/documentation/cloud-tools/pulumi/) | Infrastructure-as-code tool for creating, deploying, and managing cloud infrastructure. |
| [Terraform](https://qdrant.tech/documentation/cloud-tools/terraform/) | Infrastructure-as-code tool for defining resources in human-readable configuration files. |
<|page-33-lllmstxt|> ## common-errors

---

# [Anchor](https://qdrant.tech/documentation/guides/common-errors/\#solving-common-errors) Solving common errors

## [Anchor](https://qdrant.tech/documentation/guides/common-errors/\#too-many-files-open-os-error-24) Too many files open (OS error 24)

Each collection segment needs some files to be open. At some point you may encounter the following errors in your server log:

```text
Error: Too many files open (OS error 24)
```

In such a case you may need to increase the limit of open files. You can do this, for example, when launching the Docker container:

```bash
docker run --ulimit nofile=10000:10000 qdrant/qdrant:latest
```

The command above sets both the soft and the hard limit to `10000`.

If you are not using Docker, the following command will change the limit for the current user session:

```bash
ulimit -n 10000
```

Note that this command must be executed before you start the Qdrant server.

## [Anchor](https://qdrant.tech/documentation/guides/common-errors/\#cant-open-collections-meta-wal) Can’t open Collections meta Wal

When starting a Qdrant instance as part of a distributed deployment, you may come across an error message similar to this:

```bash
Can't open Collections meta Wal: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
```

It means that Qdrant cannot start because a collection cannot be loaded. Its associated [WAL](https://qdrant.tech/documentation/concepts/storage/#versioning) files are currently unavailable, likely because the same files are already being used by another Qdrant instance.

Each node must have its own separate storage directory, volume or mount.

The formed cluster will take care of sharing all data with each node, putting it all in the correct places for you. If using Kubernetes, each node must have its own volume. If using Docker, each node must have its own storage mount or volume. If using Qdrant directly, each node must have its own storage directory.

## [Anchor](https://qdrant.tech/documentation/guides/common-errors/\#using-python-grpc-client-with-multiprocessing) Using the Python gRPC client with `multiprocessing`

When using the Python gRPC client with `multiprocessing`, you may encounter an error like this:

```text
<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "sendmsg: Socket operation on non-socket (88)"
	debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"sendmsg: Socket operation on non-socket (88)", grpc_status:14, created_time:"....."}"
```

This error happens because `multiprocessing` creates copies of gRPC channels, which share the same socket. When the parent process closes the channel, it closes the socket, and the child processes try to use a closed socket.

To prevent this error, you can use the `forkserver` or `spawn` start methods for `multiprocessing`.
```python
import multiprocessing

multiprocessing.set_start_method("forkserver")  # or "spawn"
```

Alternatively, you can switch to the REST API, use the async client, or rely on the built-in parallelization in the Python client, such as `qdrant.upload_points(...)`.

<|page-34-lllmstxt|> ## qdrant-1.8.x

[Back to Qdrant Articles](https://qdrant.tech/articles/)

---

# Qdrant 1.8.0: Enhanced Search Capabilities for Better Results

David Myriel, Mike Jang · March 06, 2024

![Qdrant 1.8.0: Enhanced Search Capabilities for Better Results](https://qdrant.tech/articles_data/qdrant-1.8.x/preview/title.jpg)

---

# [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#unlocking-next-level-search-exploring-qdrant-180s-advanced-search-capabilities) Unlocking Next-Level Search: Exploring Qdrant 1.8.0’s Advanced Search Capabilities

[Qdrant 1.8.0 is out!](https://github.com/qdrant/qdrant/releases/tag/v1.8.0) This time around, we have focused on Qdrant’s internals. Our goal was to optimize performance so that your existing setup can run faster and save on compute. Here is what we’ve been up to:

- **Faster [sparse vectors](https://qdrant.tech/articles/sparse-vectors/):** [Hybrid search](https://qdrant.tech/articles/hybrid-search/) is up to 16x faster now!
- **CPU resource management:** You can allocate CPU threads for faster indexing.
- **Better indexing performance:** We optimized text [indexing](https://qdrant.tech/documentation/concepts/indexing/) on the backend.

## [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#faster-search-with-sparse-vectors) Faster search with sparse vectors

Search throughput is now up to 16 times faster for sparse vectors. If you are [using Qdrant for hybrid search](https://qdrant.tech/articles/sparse-vectors/), this means that you can now handle up to sixteen times as many queries. This improvement comes from extensive backend optimizations aimed at increasing efficiency and capacity.

What this means for your setup:

- **Query speed:** The time it takes to run a search query has been significantly reduced.
- **Search capacity:** Qdrant can now handle a much larger volume of search requests.
- **User experience:** Results will appear faster, leading to a smoother experience for the user.
- **Scalability:** You can easily accommodate rapidly growing users or an expanding dataset.

### [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#sparse-vectors-benchmark) Sparse vectors benchmark

Performance results are publicly available for you to test. Qdrant’s R&D team developed a dedicated [open-source benchmarking tool](https://github.com/qdrant/sparse-vectors-benchmark) just to test sparse vector performance.
A real-life simulation of sparse vector queries was run against the [NeurIPS 2023 dataset](https://big-ann-benchmarks.com/neurips23.html). All tests were done on an 8 CPU machine on Azure.

Latency (y-axis) has dropped significantly for queries. You can see the before/after here:

![dropping latency](https://qdrant.tech/articles_data/qdrant-1.8.x/benchmark.png)**Figure 1:** Dropping latency in sparse vector search queries across versions 1.7-1.8.

The colors within both scatter plots show the frequency of results. The red dots show that the highest concentration is around 2200ms (before) and 135ms (after). This tells us that latency for sparse vector queries dropped by about a factor of 16. Therefore, the time it takes to retrieve an answer with Qdrant is that much shorter.

This performance increase can have a dramatic effect on hybrid search implementations. [Read more about how to set this up.](https://qdrant.tech/articles/sparse-vectors/)

FYI, sparse vectors were released in [Qdrant v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/#sparse-vectors). They are stored using a different index, so first [check out the documentation](https://qdrant.tech/documentation/concepts/indexing/#sparse-vector-index) if you want to try an implementation.

## [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#cpu-resource-management) CPU resource management

Indexing is Qdrant’s most resource-intensive process. Now you can account for this by allocating compute specifically to indexing. You can assign a number of CPU resources to indexing and leave the rest for search. As a result, indexes will build faster, and search quality will remain unaffected.

This isn’t mandatory, as Qdrant is by default tuned to strike the right balance between indexing and search. However, if you wish to define specific CPU usage, you will need to do so from `config.yaml`. This version introduces an `optimizer_cpu_budget` parameter to control the maximum number of CPUs used for indexing.

> Read more about `config.yaml` in the [configuration file](https://qdrant.tech/documentation/guides/configuration/).

```yaml
---
# CPU budget, how many CPUs (threads) to allocate for an optimization job.
optimizer_cpu_budget: 0
```

- If left at 0, Qdrant will keep 1 or more CPUs unallocated - depending on CPU size.
- If the setting is positive, Qdrant will use this exact number of CPUs for indexing.
- If the setting is negative, Qdrant will subtract this number of CPUs from the available CPUs for indexing.

For most users, the default `optimizer_cpu_budget` setting will work well. We only recommend you use this if your indexing load is significant.

Our backend leverages dynamic CPU saturation to increase indexing speed. For that reason, the impact on search query performance ends up being minimal. Ultimately, you will be able to strike the best possible balance between indexing times and search performance.

This configuration can be done at any time, but it requires a restart of Qdrant. Changing it affects both existing and new collections.

> **Note:** This feature is not configurable on [Qdrant Cloud](https://qdrant.to/cloud).

## [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#better-indexing-for-text-data) Better indexing for text data

In order to [minimize your RAM expenditure](https://qdrant.tech/articles/memory-consumption/), we have developed a new way to index specific types of data. Please keep in mind that this is a backend improvement, and you won’t need to configure anything.
> Going forward, if you are indexing immutable text fields, we estimate a 10% reduction in RAM loads. Our benchmark result is based on a system that uses 64GB of RAM. If you are using less RAM, this reduction might be higher than 10%.

Immutable text fields are static and do not change once they are added to Qdrant. These entries usually represent some type of attribute, description or tag. Vectors associated with them can be indexed more efficiently, since you don’t need to re-index them anymore. Conversely, mutable fields are dynamic and can be modified after their initial creation. Please keep in mind that they will continue to require additional RAM.

This approach ensures stability in the [vector search](https://qdrant.tech/documentation/overview/vector-search/) index, with faster and more consistent operations. We achieved this by setting up a field index which helps minimize what is stored.

To improve search performance we have also optimized the way we load documents for searches with a text field index. Now our backend loads documents mostly sequentially and in increasing order.

## [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#minor-improvements-and-new-features) Minor improvements and new features

Beyond these enhancements, [Qdrant v1.8.0](https://github.com/qdrant/qdrant/releases/tag/v1.8.0) adds and improves on several smaller features (a short usage sketch for two of them follows the list):

1. **Order points by payload:** In addition to searching for semantic results, you might want to retrieve results by specific metadata (such as price). You can now use the Scroll API to [order points by payload key](https://qdrant.tech/documentation/concepts/points/#order-points-by-payload-key).
2. **Datetime support:** We have implemented [datetime support for the payload index](https://qdrant.tech/documentation/concepts/filtering/#datetime-range). Prior to this, if you wanted to search for a specific datetime range, you would have had to convert dates to UNIX timestamps. ( [PR#3320](https://github.com/qdrant/qdrant/issues/3320))
3. **Check collection existence:** You can check whether a collection exists via the `/exists` endpoint under `/collections/{collection_name}`. You will get a true/false response. ( [PR#3472](https://github.com/qdrant/qdrant/pull/3472))
4. **Find points** whose payloads satisfy at least a minimum number of conditions, using the new `min_should` match feature. ( [PR#3331](https://github.com/qdrant/qdrant/pull/3466/))
5. **Modify nested fields:** We have improved the `set_payload` API, adding the ability to update nested fields. ( [PR#3548](https://github.com/qdrant/qdrant/pull/3548))
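For illustration, here is a minimal, hedged sketch of how two of these additions can be used from the Python client. The `products` collection and its `price` payload field are hypothetical, and exact method signatures may vary slightly between client versions:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Check whether a collection exists (backed by the /exists endpoint).
if client.collection_exists(collection_name="products"):
    # Scroll through points ordered by a payload key instead of by point ID.
    points, next_page_offset = client.scroll(
        collection_name="products",
        order_by="price",      # assumes a payload index exists on the "price" field
        limit=10,
        with_payload=True,
    )
    for point in points:
        print(point.id, point.payload)
```

Ordering by a payload key assumes the field has a payload index of a suitable type; see the linked documentation for details.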
## [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#experience-the-power-of-qdrant-180) Experience the Power of Qdrant 1.8.0

Ready to experience the enhanced performance of Qdrant 1.8.0? Upgrade now and explore the major improvements, from faster sparse vectors to optimized CPU resource management and better indexing for text data. Take your search capabilities to the next level with Qdrant’s latest version. [Try a demo today](https://qdrant.tech/demo/) and see the difference firsthand!

## [Anchor](https://qdrant.tech/articles/qdrant-1.8.x/\#release-notes) Release notes

For more information, see [our release notes](https://github.com/qdrant/qdrant/releases/tag/v1.8.0). Qdrant is an open-source project. We welcome your contributions; raise [issues](https://github.com/qdrant/qdrant/issues), or contribute via [pull requests](https://github.com/qdrant/qdrant/pulls)!

<|page-35-lllmstxt|> ## what-is-a-vector-database

[Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/)

---

# What is a Vector Database?

Sabrina Aquino · October 09, 2024

![What is a Vector Database?](https://qdrant.tech/articles_data/what-is-a-vector-database/preview/title.jpg)

## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#what-is-a-vector-database) What Is a Vector Database?

![vector-database-architecture](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-1.jpeg)

Most of the millions of terabytes of data we generate each day is **unstructured**. Think of the meal photos you snap, the PDFs shared at work, or the podcasts you save but may never listen to. None of it fits neatly into rows and columns.

Unstructured data lacks a strict format or schema, making it challenging for conventional databases to manage. Yet, this unstructured data holds immense potential for **AI**, **machine learning**, and **modern search engines**.

> A [Vector Database](https://qdrant.tech/qdrant-vector-database/) is a specialized system designed to efficiently handle high-dimensional vector data. It excels at indexing, querying, and retrieving this data, enabling advanced analysis and similarity searches that traditional databases cannot easily perform.

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#the-challenge-with-traditional-databases) The Challenge with Traditional Databases

Traditional [OLTP](https://www.ibm.com/topics/oltp) and [OLAP](https://www.ibm.com/topics/olap) databases have been the backbone of data storage for decades. They are great at managing structured data with well-defined schemas, like `name`, `address`, `phone number`, and `purchase history`.

![Structure of OLTP and OLAP databases](https://qdrant.tech/articles_data/what-is-a-vector-database/oltp-and-olap.png)

But when data can’t be easily categorized, like the content inside a PDF file, things start to get complicated.

You can always store the PDF file as raw data, perhaps with some metadata attached. However, the database still wouldn’t be able to understand what’s inside the document, categorize it, or even search for the information it contains.

This applies to more than just PDF documents. Think about the vast amounts of text, audio, and image data you generate every day. If a database can’t grasp the **meaning** of this data, how can you search for it or find relationships within it?
![Structure of a Vector Database](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-db-structure.png)

Vector databases allow you to understand the **context** or **conceptual similarity** of unstructured data by representing it as vectors, enabling advanced analysis and retrieval based on data similarity.

## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#when-to-use-a-vector-database) When to Use a Vector Database

Not sure if you should use a vector database or a traditional database? This chart may help.

| **Feature** | **OLTP Database** | **OLAP Database** | **Vector Database** |
| --- | --- | --- | --- |
| **Data Structure** | Rows and columns | Rows and columns | Vectors |
| **Type of Data** | Structured | Structured/Partially Unstructured | Unstructured |
| **Query Method** | SQL-based (Transactional Queries) | SQL-based (Aggregations, Analytical Queries) | Vector Search (Similarity-Based) |
| **Storage Focus** | Schema-based, optimized for updates | Schema-based, optimized for reads | Context and Semantics |
| **Performance** | Optimized for high-volume transactions | Optimized for complex analytical queries | Optimized for unstructured data retrieval |
| **Use Cases** | Inventory, order processing, CRM | Business intelligence, data warehousing | Similarity search, recommendations, RAG, anomaly detection, etc. |

## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#what-is-a-vector) What Is a Vector?

![vector-database-vector](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-7.jpeg)

When a machine needs to process unstructured data, such as an image, a piece of text, or an audio file, it first has to translate that data into a format it can work with: **vectors**.

> A **vector** is a numerical representation of data that can capture the **context** and **semantics** of the data.

When you deal with unstructured data, traditional databases struggle to understand its meaning. However, a vector can translate that data into something a machine can process. For example, a vector generated from text can represent relationships and meaning between words, making it possible for a machine to compare and understand their context.

There are three key elements that define a vector in a vector database: the **ID**, the **dimensions**, and the **payload**. These components work together to represent a vector effectively within the system. Together, they form a **point**, which is the core unit of data stored and retrieved in a vector database.

![Representation of a Point in Qdrant](https://qdrant.tech/articles_data/what-is-a-vector-database/point.png)

Each one of these parts plays an important role in how vectors are stored, retrieved, and interpreted. Let’s see how.

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#1-the-id-your-vectors-unique-identifier) 1\. The ID: Your Vector’s Unique Identifier

Just like in a relational database, each vector in a vector database gets a unique ID. Think of it as your vector’s name tag, a **primary key** that ensures the vector can be easily found later. When a vector is added to the database, the ID is created automatically.

While the ID itself doesn’t play a part in the similarity search (which operates on the vector’s numerical data), it is essential for associating the vector with its corresponding “real-world” data, whether that’s a document, an image, or a sound file.

After a search is performed and similar vectors are found, their IDs are returned. These can then be used to **fetch additional details or metadata** tied to the result.
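As a small sketch (assuming the Python client, a running Qdrant instance, and a hypothetical `photos` collection), the IDs returned by a search can be passed back to fetch the stored payload, and optionally the vector itself:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Fetch the full records behind IDs returned by an earlier similarity search.
records = client.retrieve(
    collection_name="photos",
    ids=[17, 42],           # IDs returned by a previous search
    with_payload=True,      # include the metadata stored alongside each vector
    with_vectors=False,     # skip the raw vectors if you only need the payload
)

for record in records:
    print(record.id, record.payload)
```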
### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#2-the-dimensions-the-core-representation-of-the-data) 2\. The Dimensions: The Core Representation of the Data

At the core of every vector is a set of numbers, which together form a representation of the data in a **multi-dimensional** space.

#### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#from-text-to-vectors-how-does-it-work) From Text to Vectors: How Does It Work?

These numbers are generated by **embedding models**, such as deep learning algorithms, and capture the essential patterns or relationships within the data. That’s why the term **embedding** is often used interchangeably with vector when referring to the output of these models.

To represent textual data, for example, an embedding will encapsulate the nuances of language, such as semantics and context, within its dimensions.

![Creation of a vector based on a sentence with an embedding model](https://qdrant.tech/articles_data/what-is-a-vector-database/embedding-model.png)

For that reason, when comparing two similar sentences, their embeddings will turn out to be very similar, because they have similar **linguistic elements**.

![Comparison of the embeddings of 2 similar sentences](https://qdrant.tech/articles_data/what-is-a-vector-database/two-similar-vectors.png)

That’s the beauty of embeddings. The complexity of the data is distilled into something that can be compared across a multi-dimensional space.

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#3-the-payload-adding-context-with-metadata) 3\. The Payload: Adding Context with Metadata

Sometimes you’re going to need more than just numbers to fully understand or refine a search. While the dimensions capture the essence of the data, the payload holds **metadata** for structured information.

It could be textual data like descriptions, tags, categories, or it could be numerical values like dates or prices. This extra information is vital when you want to filter or rank search results based on criteria that aren’t directly encoded in the vector.

> This metadata is invaluable when you need to apply additional **filters** or **sorting** criteria.

For example, if you’re searching for a picture of a dog, the vector helps the database find images that are visually similar. But let’s say you want results showing only images taken within the last year, or those tagged with “vacation.”

![Filtering Example](https://qdrant.tech/articles_data/what-is-a-vector-database/filtering-example.png)

The payload can help you narrow down those results by ignoring vectors that don’t match your filtering criteria. If you want the full picture of how filtering works in Qdrant, check out our [Complete Guide to Filtering.](https://qdrant.tech/articles/vector-search-filtering/)
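To make this concrete, here is a minimal, hedged sketch of a filtered query with the Python client. The collection name, payload fields, and query vector are hypothetical; the point is that the query vector and the payload filter are combined in a single request:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hypothetical query vector produced by an image embedding model.
query_vector = [0.05] * 512

results = client.query_points(
    collection_name="photos",
    query=query_vector,
    query_filter=models.Filter(
        must=[
            # Keep only images tagged "vacation"...
            models.FieldCondition(key="tags", match=models.MatchValue(value="vacation")),
            # ...and taken after a given date (assuming a datetime payload field).
            models.FieldCondition(key="taken_at", range=models.DatetimeRange(gte="2024-01-01T00:00:00Z")),
        ]
    ),
    limit=5,
)
```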
## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#the-architecture-of-a-vector-database) The Architecture of a Vector Database

A vector database is made of multiple different entities and relations. Let’s understand a bit of what’s happening here:

![Architecture Diagram of a Vector Database](https://qdrant.tech/articles_data/what-is-a-vector-database/architecture-vector-db.png)

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#collections) Collections

A [collection](https://qdrant.tech/documentation/concepts/collections/) is essentially a group of **vectors** (or “ [points](https://qdrant.tech/documentation/concepts/points/)”) that are logically grouped together **based on similarity or a specific task**. Every vector within a collection shares the same dimensionality and can be compared using a single metric. Avoid creating multiple collections unless necessary; instead, consider techniques like **sharding** for scaling across nodes or **multitenancy** for handling different use cases within the same infrastructure.

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#distance-metrics) Distance Metrics

These metrics define how similarity between vectors is calculated. The choice of distance metric is made when creating a collection, and the right choice depends on the type of data you’re working with and how the vectors were created. Here are the three most common distance metrics:

- **Euclidean Distance:** The straight-line path. It’s like measuring the physical distance between two points in space. Pick this one when the actual distance (like spatial data) matters.
- **Cosine Similarity:** This one is about the angle, not the length. It measures how two vectors point in the same direction, so it works well for text or documents when you care more about meaning than magnitude. For example, if two things are _similar_, _opposite_, or _unrelated_:

![Cosine Similarity Example](https://qdrant.tech/articles_data/what-is-a-vector-database/cosine-similarity.png)

- **Dot Product:** This looks at how much two vectors align. It’s popular in recommendation systems where you’re interested in how much two things “agree” with each other.

A small numeric comparison of these three metrics is sketched at the end of this section.

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#ram-based-and-memmap-storage) RAM-Based and Memmap Storage

By default, Qdrant stores vectors in RAM, delivering incredibly fast access for datasets that fit comfortably in memory. But when your dataset exceeds RAM capacity, Qdrant offers Memmap as an alternative. Memmap allows you to store vectors **on disk**, yet still access them efficiently by mapping the data directly into memory if you have enough RAM.

To enable it, you only need to set `"on_disk": true` when you are **creating a collection:**

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url='http://localhost:6333')

client.create_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
        on_disk=True
    ),
)
```

For other configurations like `hnsw_config.on_disk` or `memmap_threshold`, see the Qdrant documentation for [Storage.](https://qdrant.tech/documentation/concepts/storage/)

### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#sdks) SDKs

Qdrant offers a range of SDKs. You can use the programming language you’re most comfortable with, whether you’re coding in [Python](https://github.com/qdrant/qdrant-client), [Go](https://github.com/qdrant/go-client), [Rust](https://github.com/qdrant/rust-client), [Javascript/Typescript](https://github.com/qdrant/qdrant-js), [C#](https://github.com/qdrant/qdrant-dotnet) or [Java](https://github.com/qdrant/java-client).
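As a back-of-the-envelope illustration of the distance metrics described above (plain NumPy, no Qdrant involved): cosine similarity looks only at direction, while the dot product and Euclidean distance are sensitive to vector length:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as `a`, but twice the length

dot = float(np.dot(a, b))                                # 28.0 - grows with magnitude
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # 1.0 - direction only
euclidean = float(np.linalg.norm(a - b))                 # ~3.74 - straight-line distance

print(dot, cosine, euclidean)
```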
## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#the-core-functionalities-of-vector-databases) The Core Functionalities of Vector Databases ![vector-database-functions](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-3.jpeg) When you think of a traditional database, the operations are familiar: you **create**, **read**, **update**, and **delete** records. These are the fundamentals. And guess what? In many ways, vector databases work the same way, but the operations are translated for the complexity of vectors. ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#1-indexing-hnsw-index-and-sending-data-to-qdrant) 1\. Indexing: HNSW Index and Sending Data to Qdrant Indexing your vectors is like creating an entry in a traditional database. But for vector databases, this step is very important. Vectors need to be indexed in a way that makes them easy to search later on. **HNSW** (Hierarchical Navigable Small World) is a powerful indexing algorithm that most vector databases rely on to organize vectors for fast and efficient search. It builds a multi-layered graph, where each vector is a node and connections represent similarity. The higher layers connect broadly similar vectors, while lower layers link vectors that are closely related, making searches progressively more refined as they go deeper. ![Indexing Data with the HNSW algorithm](https://qdrant.tech/articles_data/what-is-a-vector-database/hnsw.png) When you run a search, HNSW starts at the top, quickly narrowing down the search by hopping between layers. It focuses only on relevant vectors as it goes deeper, refining the search with each step. ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#11-payload-indexing) 1.1 Payload Indexing In Qdrant, indexing is modular. You can configure indexes for **both vectors and payloads independently**. The payload index is responsible for optimizing filtering based on metadata. Each payload index is built for a specific field and allows you to quickly filter vectors based on specific conditions. ![Searching Data with the HNSW algorithm](https://qdrant.tech/articles_data/what-is-a-vector-database/hnsw-search.png) You need to build the payload index for **each field** you’d like to search (a minimal sketch of creating one follows a little further below). The magic here is in the combination: HNSW finds similar vectors, and the payload index makes sure only the ones that fit your criteria come through. Learn more about Qdrant’s [Filtrable HNSW](https://qdrant.tech/articles/filtrable-hnsw/) and why it was built like this. > Combining [full-text search](https://qdrant.tech/documentation/concepts/indexing/#full-text-index) with vector-based search gives you even more versatility. You can simultaneously search for conceptually similar documents while ensuring specific keywords are present, all within the same query. ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#2-searching-approximate-nearest-neighbors-ann-search) 2\. Searching: Approximate Nearest Neighbors (ANN) Search Similarity search allows you to search by **meaning**. This way you can run searches such as finding similar songs that evoke the same mood, finding images that match your artistic vision, or even exploring emotional patterns in text. ![Similar words grouped together](https://qdrant.tech/articles_data/what-is-a-vector-database/similarity.png) When a user queries the database, the query is also converted into a vector.
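As a brief aside, here is what building the payload index from section 1.1 can look like in practice. This is a minimal sketch with the Python client; the collection name `photo_collection` and the field `tags` are assumptions for illustration:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Build a payload index on the "tags" field so that filters on it
# stay fast even as the collection grows.
client.create_payload_index(
    collection_name="photo_collection",             # assumed collection name
    field_name="tags",                              # payload field to index
    field_schema=models.PayloadSchemaType.KEYWORD,  # exact-match (keyword) index
)
```

With such an index in place, the search described next stays efficient: HNSW narrows down similar vectors while the payload index keeps the filtering cheap.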
The algorithm quickly identifies the area of the graph likely to contain vectors closest to the **query vector**. ![Approximate Nearest Neighbors (ANN) Search Graph](https://qdrant.tech/articles_data/what-is-a-vector-database/ann-search.png) The search then moves down through the layers, progressively narrowing in on more closely related and relevant vectors. Once the closest vectors are identified at the bottom layer, these points translate back to actual data, representing your **top-scored documents**. Here’s a high-level overview of this process: ![Vector Database Searching Functionality](https://qdrant.tech/articles_data/what-is-a-vector-database/simple-arquitecture.png) ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#3-updating-vectors-real-time-and-bulk-adjustments) 3\. Updating Vectors: Real-Time and Bulk Adjustments Data isn’t static, and neither are vectors. Keeping your vectors up to date is crucial for maintaining relevance in your searches. Vector updates don’t always need to happen instantly, but when they do, Qdrant handles real-time modifications efficiently with a simple API call: ```python from qdrant_client.models import PointStruct client.upsert( collection_name='product_collection', points=[PointStruct(id=product_id, vector=new_vector, payload=new_payload)] ) ``` For large-scale changes, like re-indexing vectors after a model update, batch updating allows you to update multiple vectors in one operation without impacting search performance: ```python batch_of_updates = [ PointStruct(id=product_id_1, vector=updated_vector_1, payload=new_payload_1), PointStruct(id=product_id_2, vector=updated_vector_2, payload=new_payload_2), # Add more points... ] client.upsert( collection_name='product_collection', points=batch_of_updates ) ``` ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#4-deleting-vectors-managing-outdated-and-duplicate-data) 4\. Deleting Vectors: Managing Outdated and Duplicate Data Efficient vector management is key to keeping your searches accurate and your database lean. Deleting vectors that represent outdated or irrelevant data, such as expired products, old news articles, or archived profiles, helps maintain both performance and relevance. In Qdrant, removing vectors is straightforward, requiring only the vector IDs to be specified: ```python client.delete( collection_name='data_collection', points_selector=[point_id_1, point_id_2] ) ``` You can use deletion to remove outdated data, clean up duplicates, and manage the lifecycle of vectors by automatically deleting them after a set period to keep your dataset relevant and focused. ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#dense-vs-sparse-vectors) Dense vs. Sparse Vectors ![vector-database-dense-sparse](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-4.jpeg) Now that you understand what vectors are and how they are created, let’s learn more about the two possible types of vectors you can use: **dense** or **sparse**. The main differences between the two are: ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#1-dense-vectors) 1\. Dense Vectors Dense vectors are, quite literally, dense with information. Every element in the vector contributes to the **semantic meaning**, **relationships** and **nuances** of the data. A dense vector representation of a sentence might look like this: ![Representation of a Dense Vector](https://qdrant.tech/articles_data/what-is-a-vector-database/dense-1.png) Each number holds weight.
Together, they convey the overall meaning of the sentence, and are better for identifying contextually similar items, even if the words don’t match exactly. ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#2-sparse-vectors) 2\. Sparse Vectors Sparse vectors operate differently. They focus only on the essentials. In most sparse vectors, a large number of elements are zeros. When a feature or token is present, it’s marked—otherwise, zero. In the image, you can see a sentence, _“I love Vector Similarity,”_ broken down into tokens like _“i,” “love,” “vector”_ through tokenization. Each token is assigned a unique `ID` from a large vocabulary. For example, _“i”_ becomes `193`, and _“vector”_ becomes `15012`. ![How Sparse Vectors are Created](https://qdrant.tech/articles_data/what-is-a-vector-database/sparse.png) Sparse vectors are used for **exact matching** and specific token-based identification. The values on the right, such as `193: 0.04` and `9182: 0.12`, are the scores or weights for each token, showing how relevant or important each token is in the context. The final result is a sparse vector: ```json { 193: 0.04, 9182: 0.12, 15012: 0.73, 6731: 0.69, 454: 0.21 } ``` Everything else in the vector space is assumed to be zero. Sparse vectors are ideal for tasks like **keyword search** or **metadata filtering**, where you need to check for the presence of specific tokens without needing to capture the full meaning or context. They are suited for exact matches within the **data itself**, rather than relying on external metadata, which is handled by payload filtering. ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#benefits-of-hybrid-search) Benefits of Hybrid Search ![vector-database-get-started](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-5.jpeg) Sometimes context alone isn’t enough. Sometimes you need precision, too. Dense vectors are fantastic when you need to retrieve results based on the context or meaning behind the data. Sparse vectors are useful when you also need **keyword or specific attribute matching**. > With hybrid search you don’t have to choose one over the other; you can use both to get searches that are more **relevant** and **filtered**. To achieve this balance, Qdrant uses **normalization** and **fusion** techniques to blend results from multiple search methods. One common approach is **Reciprocal Rank Fusion (RRF)**, where results from different methods are merged, giving higher importance to items ranked highly by both methods. This ensures that the best candidates, whether identified through dense or sparse vectors, appear at the top of the results. Qdrant combines dense and sparse vector results through a process of **normalization** and **fusion**. ![Hybrid Search API - How it works](https://qdrant.tech/articles_data/what-is-a-vector-database/hybrid-search-2.png) ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#how-to-use-hybrid-search-in-qdrant) How to Use Hybrid Search in Qdrant Qdrant makes it easy to implement hybrid search through its Query API.
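As a rough illustration, the sketch below issues a dense and a sparse query in a single Query API call with the Python client and fuses the results with RRF. The collection name `papers`, the vector names `dense` and `sparse`, and the toy values are assumptions for this example:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="papers",  # assumed collection with named dense and sparse vectors
    prefetch=[
        # Semantic candidates from the dense vector
        models.Prefetch(
            query=[0.01, 0.45, 0.67],  # toy dense query embedding
            using="dense",
            limit=20,
        ),
        # Keyword-style candidates from the sparse vector
        models.Prefetch(
            query=models.SparseVector(indices=[193, 15012], values=[0.4, 0.9]),
            using="sparse",
            limit=20,
        ),
    ],
    # Merge both candidate lists with Reciprocal Rank Fusion
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
```

Items that rank highly on both the dense and the sparse side end up at the top of the fused list.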
Here’s a simplified view of how you can make it happen in your own project: ![Hybrid Query Example](https://qdrant.tech/articles_data/what-is-a-vector-database/hybrid-query-1.png) **Example Hybrid Query:** Let’s say a researcher is looking for papers on NLP, but the paper must specifically mention “transformers” in the content: ```python search_query = { "vector": query_vector, # Dense vector for semantic search "filter": { # Filtering for specific terms "must": [ {"key": "text", "match": "transformers"} # Exact keyword match in the paper ] } } ``` In this query, the dense vector search finds papers related to the broad topic of NLP, and the keyword filtering ensures that the papers specifically mention “transformers”. This is just a simple example and there’s so much more you can do with it. See our complete [article on Hybrid Search](https://qdrant.tech/articles/hybrid-search/) to see what’s happening behind the scenes and all the possibilities when building a hybrid search system. ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#quantization-get-40x-faster-results) Quantization: Get 40x Faster Results ![vector-database-architecture](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-2.jpeg) As your vector dataset grows larger, so do the computational demands of searching through it. Quantized vectors are much smaller and easier to compare. With methods like [**Binary Quantization**](https://qdrant.tech/articles/binary-quantization/), you can see **search speeds improve by up to 40x while memory usage decreases by 32x**. These improvements can be decisive when dealing with large datasets or when low-latency results are needed. It works by converting high-dimensional vectors, which typically use `4 bytes` per dimension, into binary representations, using just `1 bit` per dimension. Values above zero become “1”, and everything else becomes “0”. ![ Binary Quantization example](https://qdrant.tech/articles_data/what-is-a-vector-database/binary-quantization.png) Quantization reduces data precision, and yes, this does lead to some loss of accuracy. However, with binary quantization, **OpenAI embeddings** achieve this performance improvement at a cost of only about 5% in accuracy. If you apply techniques like **oversampling** and **rescoring**, this loss can be brought down even further. However, binary quantization isn’t the only available option. Techniques like [**Scalar Quantization**](https://qdrant.tech/documentation/guides/quantization/#scalar-quantization) and [**Product Quantization**](https://qdrant.tech/documentation/guides/quantization/#product-quantization) are also popular alternatives when optimizing vector compression.
You can set up your chosen quantization method using the `quantization_config` parameter when creating a new collection: ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( size=1536, distance=models.Distance.COSINE ), # Choose your preferred quantization method quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, # Store the quantized vectors in RAM for faster access ), ), ) ``` You can store original vectors on disk within the `vectors_config` by setting `on_disk=True` to save RAM space, while keeping quantized vectors in RAM for faster access. We recommend checking out our [Vector Quantization guide](https://qdrant.tech/articles/what-is-vector-quantization/) for a full breakdown of methods and tips on **optimizing performance** for your specific use case. ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#distributed-deployment) Distributed Deployment When thinking about scaling, the key factors to consider are **fault tolerance**, **load balancing**, and **availability**. One node, no matter how powerful, can only take you so far. Eventually, you’ll need to spread the workload across multiple machines to ensure the system remains fast and stable. ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#sharding-distributing-data-across-nodes) Sharding: Distributing Data Across Nodes In a distributed Qdrant cluster, data is split into smaller units called **shards**, which are distributed across different nodes. This helps balance the load and ensures that queries can be processed in parallel. Each collection—a group of related data points—can be split into non-overlapping subsets, which are then managed by different nodes. ![ Distributed vector database with sharding and Raft consensus](https://qdrant.tech/articles_data/what-is-a-vector-database/sharding-raft.png) **Raft Consensus** keeps all the nodes in sync and gives them a consistent view of the data: each node knows where every shard is located. If one node fails, the others know where the missing data is located and can take over. By default, the number of shards in your Qdrant system matches the number of nodes in your cluster. But if you need more control, you can choose the `shard_number` manually when creating a collection. ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE), shard_number=4, # Custom number of shards ) ``` There are two main types of sharding: 1. **Automatic Sharding:** Points (vectors) are automatically distributed across shards using consistent hashing. Each shard contains non-overlapping subsets of the data. 2. **User-defined Sharding:** Specify how points are distributed, enabling more control over your data organization, especially for use cases like **multitenancy**, where each tenant (a user, client, or organization) has their own isolated data. Each shard is divided into **segments**. These are smaller storage units within a shard, each storing a subset of vectors and their associated payloads (metadata). When a query is executed, it targets only the relevant segments, processing them in parallel.
![Segments act as smaller storage units within a shard](https://qdrant.tech/articles_data/what-is-a-vector-database/segments.png) ### [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#replication-high-availability-and-data-integrity) Replication: High Availability and Data Integrity You don’t want a single failure to take down your system, right? Replication keeps multiple copies of the same data across different nodes to ensure **high availability**. In Qdrant, **Replica Sets** manage these copies of shards across different nodes. If one replica becomes unavailable, others are there to take over and keep the system running. Whether the data is local or remote is mainly influenced by how you’ve configured the cluster. ![ Replica Set and Replication diagram](https://qdrant.tech/articles_data/what-is-a-vector-database/replication.png) When a query is made, if the relevant data is stored locally, the local shard handles the operation. If the data is on a remote shard, it’s retrieved via gRPC. You can control how many copies you want with the `replication_factor`. For example, creating a collection with 4 shards and a replication factor of 2 will result in 8 physical shards distributed across the cluster: ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE), shard_number=4, replication_factor=2, ) ``` We recommend using sharding and replication together so that your data is both split across nodes and replicated for availability. For more details on features like **user-defined sharding, node failure recovery**, and **consistency guarantees**, see our guide on [Distributed Deployment.](https://qdrant.tech/documentation/guides/distributed_deployment/) ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#multitenancy-data-isolation-for-multi-tenant-architectures) Multitenancy: Data Isolation for Multi-Tenant Architectures ![vector-database-get-started](https://qdrant.tech/articles_data/what-is-a-vector-database/vector-database-6.png) Sharding efficiently distributes data across nodes, while replication guarantees redundancy and fault tolerance. But what happens when you’ve got multiple clients or user groups, and you need to keep their data isolated within the same infrastructure? **Multitenancy** allows you to keep data for different tenants (users, clients, or organizations) isolated within a single cluster. Instead of creating separate collections for `Tenant 1` and `Tenant 2`, you store their data in the same collection but tag each vector with a `group_id` to identify which tenant it belongs to. ![Multitenancy dividing data between 2 tenants](https://qdrant.tech/articles_data/what-is-a-vector-database/multitenancy-1.png) In the backend, Qdrant can store `Tenant 1`’s data in Shard 1 located in Canada (perhaps for compliance reasons like GDPR), while `Tenant 2`’s data is stored in Shard 2 located in Germany. The data will be physically separated but still within the same infrastructure. To implement this, you tag each vector with a tenant-specific `group_id` during the upsert operation: ```python client.upsert( collection_name="tenant_data", points=[models.PointStruct( id=2, payload={"group_id": "tenant_1"}, vector=[0.1, 0.9, 0.1] )], shard_key_selector="canada" ) ``` Each tenant’s data remains isolated while still benefiting from the shared infrastructure.
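When querying, the same `group_id` goes into the filter so a tenant only ever sees its own points. A minimal sketch with the Python client, reusing the `tenant_data` collection from the snippet above (the query vector is illustrative):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="tenant_data",
    query=[0.1, 0.8, 0.2],  # toy query embedding
    query_filter=models.Filter(
        must=[
            # Restrict the search to Tenant 1's points only
            models.FieldCondition(
                key="group_id",
                match=models.MatchValue(value="tenant_1"),
            )
        ]
    ),
    limit=10,
)
```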
This setup optimizes for data privacy, compliance with local regulations, and scalability, without the need to create excessive collections or maintain separate clusters for each tenant. If you want to learn more about working with a multitenant setup in Qdrant, you can check out our [Multitenancy and Custom Sharding dedicated guide.](https://qdrant.tech/articles/multitenancy/) ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#data-security-and-access-control) Data Security and Access Control A common security risk in vector databases is the possibility of **embedding inversion attacks**, where attackers could reconstruct the original data from embeddings. There are several layers of protection you can use to secure your instance, and it is very important to put them in place before getting your vector database into production. For quick security in simpler use cases, you can use **API key authentication**. To enable it, set up the API key in the configuration or environment variable. ```yaml service: api_key: your_secret_api_key_here enable_tls: true # Make sure to enable TLS to protect the API key from being exposed ``` Once this is set up, remember to include the API key in all your requests: ```python from qdrant_client import QdrantClient client = QdrantClient( url="https://localhost:6333", api_key="your_secret_api_key_here" ) ``` In more advanced setups, Qdrant uses **JWT (JSON Web Tokens)** to enforce **Role-Based Access Control (RBAC)**. RBAC defines roles and assigns permissions, while JWT securely encodes these roles into tokens. Each request is validated against the user’s JWT, ensuring they can only access or modify data based on their assigned permissions. You can easily set up your access tokens and secure access to sensitive data through the **Qdrant Web UI:** ![Qdrant Web UI for generating a new access token.](https://qdrant.tech/articles_data/what-is-a-vector-database/jwt-web-ui.png) By default, Qdrant instances are **unsecured**, so it’s important to configure security measures before moving to production. To learn more about how to configure security for your Qdrant instance and other advanced options, please check out the [official Qdrant documentation on security.](https://qdrant.tech/documentation/guides/security/) ## [Anchor](https://qdrant.tech/articles/what-is-a-vector-database/\#time-to-experiment) Time to Experiment As we’ve seen in this article, a vector database is definitely not **just** a database as we traditionally know it. It opens up a world of possibilities, from advanced similarity search to hybrid search that allows content retrieval with both context and precision. But there’s no better way to learn than by doing. Try building a [semantic search engine](https://qdrant.tech/documentation/tutorials/search-beginners/) or experiment with deploying a [hybrid search service](https://qdrant.tech/documentation/tutorials/hybrid-search-fastembed/) from scratch. You’ll realize there are endless ways you can take advantage of vectors.
| **Use Case** | **How It Works** | **Examples** | | --- | --- | --- | | **Similarity Search** | Finds similar data points using vector distances | Find similar product images, retrieve documents based on themes, discover related topics | | **Anomaly Detection** | Identifies outliers based on deviations in vector space | Detect unusual user behavior in banking, spot irregular patterns | | **Recommendation Systems** | Uses vector embeddings to learn and model user preferences | Personalized movie or music recommendations, e-commerce product suggestions | | **RAG (Retrieval-Augmented Generation)** | Combines vector search with large language models (LLMs) for contextually relevant answers | Customer support, auto-generate summaries of documents, research reports | | **Multimodal Search** | Search across different types of data like text, images, and audio in a single query. | Search for products with a description and image, retrieve images based on audio or text | | **Voice & Audio Recognition** | Uses vector representations to recognize and retrieve audio content | Speech-to-text transcription, voice-controlled smart devices, identify and categorize sounds | | **Knowledge Graph Augmentation** | Links unstructured data to concepts in knowledge graphs using vectors | Link research papers to related studies, connect customer reviews to product features, organize patents by innovation trends | You can also watch our video tutorial, [Getting Started with Qdrant](https://www.youtube.com/watch?v=LRcZ9pbGnno), to generate semantic search results and recommendations from a sample dataset. Phew! I hope you found some of the concepts here useful. If you have any questions feel free to send them in our [Discord Community](https://discord.com/invite/qdrant) where our team will be more than happy to help you out! > Remember, don’t get lost in vector space! 🚀
<|page-36-lllmstxt|> ## data-management - [Documentation](https://qdrant.tech/documentation/) - Data Management ## [Anchor](https://qdrant.tech/documentation/data-management/\#data-management-integrations) Data Management Integrations | Integration | Description | | --- | --- | | [Airbyte](https://qdrant.tech/documentation/data-management/airbyte/) | Data integration platform specialising in ELT pipelines. | | [Airflow](https://qdrant.tech/documentation/data-management/airflow/) | Platform designed for developing, scheduling, and monitoring batch-oriented workflows. | | [CocoIndex](https://qdrant.tech/documentation/data-management/cocoindex/) | High-performance ETL framework to transform data for AI, with real-time incremental processing. | | [Cognee](https://qdrant.tech/documentation/data-management/cognee/) | AI memory framework that allows loading from 30+ data sources into graph and vector stores. | | [Connect](https://qdrant.tech/documentation/data-management/redpanda/) | Declarative data-agnostic streaming service for efficient, stateless processing. | | [Confluent](https://qdrant.tech/documentation/data-management/confluent/) | Fully-managed data streaming platform with a cloud-native Apache Kafka engine. | | [DLT](https://qdrant.tech/documentation/data-management/dlt/) | Python library to simplify data loading processes between several sources and destinations. | | [Fluvio](https://qdrant.tech/documentation/data-management/fluvio/) | Rust-based platform for high speed, real-time data processing. | | [Fondant](https://qdrant.tech/documentation/data-management/fondant/) | Framework for developing datasets, sharing reusable operations and data processing trees. | | [MindsDB](https://qdrant.tech/documentation/data-management/mindsdb/) | Platform to deploy, serve, and fine-tune models with numerous data source integrations. | | [NiFi](https://qdrant.tech/documentation/data-management/nifi/) | Data ingestion platform to manage data transfer between different sources and destination systems. | | [Spark](https://qdrant.tech/documentation/data-management/spark/) | A unified analytics engine for large-scale data processing. | | [Unstructured](https://qdrant.tech/documentation/data-management/unstructured/) | Python library with components for ingesting and pre-processing data from numerous sources. |
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/data-management/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-37-lllmstxt|> ## configuration - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Configuration --- # [Anchor](https://qdrant.tech/documentation/guides/configuration/\#configuration) Configuration Qdrant ships with sensible defaults for collection and network settings that are suitable for most use cases. You can view these defaults in the [Qdrant source](https://github.com/qdrant/qdrant/blob/master/config/config.yaml). If you need to customize the settings, you can do so using configuration files and environment variables. ## [Anchor](https://qdrant.tech/documentation/guides/configuration/\#configuration-files) Configuration Files To customize Qdrant, you can mount your configuration file in any of the following locations. This guide uses `.yaml` files, but Qdrant also supports other formats such as `.toml`, `.json`, and `.ini`. 1. **Main Configuration: `qdrant/config/config.yaml`** Mount your custom `config.yaml` file to override default settings: ```bash docker run -p 6333:6333 \ -v $(pwd)/config.yaml:/qdrant/config/config.yaml \ qdrant/qdrant ``` 2. **Environment-Specific Configuration: `config/{RUN_MODE}.yaml`** Qdrant looks for an environment-specific configuration file based on the `RUN_MODE` variable. By default, the [official Docker image](https://hub.docker.com/r/qdrant/qdrant) uses `RUN_MODE=production`, meaning it will look for `config/production.yaml`. You can override this by setting `RUN_MODE` to another value (e.g., `dev`), and providing the corresponding file: ```bash docker run -p 6333:6333 \ -v $(pwd)/dev.yaml:/qdrant/config/dev.yaml \ -e RUN_MODE=dev \ qdrant/qdrant ``` 3. **Local Configuration: `config/local.yaml`** The `local.yaml` file is typically used for machine-specific settings that are not tracked in version control: ```bash docker run -p 6333:6333 \ -v $(pwd)/local.yaml:/qdrant/config/local.yaml \ qdrant/qdrant ``` 4. **Custom Configuration via `--config-path`** You can specify a custom configuration file path using the `--config-path` argument. This will override other configuration files: ```bash docker run -p 6333:6333 \ -v $(pwd)/config.yaml:/path/to/config.yaml \ qdrant/qdrant \ ./qdrant --config-path /path/to/config.yaml ``` For details on how these configurations are loaded and merged, see the [loading order and priority](https://qdrant.tech/documentation/guides/configuration/#loading-order-and-priority). The full list of available configuration options can be found [below](https://qdrant.tech/documentation/guides/configuration/#configuration-options). ## [Anchor](https://qdrant.tech/documentation/guides/configuration/\#environment-variables) Environment Variables You can also configure Qdrant using environment variables, which always take the highest priority and override any file-based settings. Environment variables follow this format: they should be prefixed with `QDRANT__`, and nested properties should be separated by double underscores ( `__`). 
For example: ```bash docker run -p 6333:6333 \ -e QDRANT__LOG_LEVEL=INFO \ -e QDRANT__SERVICE__API_KEY= \ -e QDRANT__SERVICE__ENABLE_TLS=1 \ -e QDRANT__TLS__CERT=./tls/cert.pem \ qdrant/qdrant ``` This results in the following configuration: ```yaml log_level: INFO service: enable_tls: true api_key: tls: cert: ./tls/cert.pem ``` ## [Anchor](https://qdrant.tech/documentation/guides/configuration/\#loading-order-and-priority) Loading Order and Priority During startup, Qdrant merges multiple configuration sources into a single effective configuration. The loading order is as follows (from least to most significant): 1. Embedded default configuration 2. `config/config.yaml` 3. `config/{RUN_MODE}.yaml` 4. `config/local.yaml` 5. Custom configuration file 6. Environment variables ### [Anchor](https://qdrant.tech/documentation/guides/configuration/\#overriding-behavior) Overriding Behavior Settings from later sources in the list override those from earlier sources: - Settings in `config/{RUN_MODE}.yaml` (3) will override those in `config/config.yaml` (2). - A custom configuration file provided via `--config-path` (5) will override all other file-based settings. - Environment variables (6) have the highest priority and will override any settings from files. ## [Anchor](https://qdrant.tech/documentation/guides/configuration/\#configuration-validation) Configuration Validation Qdrant validates the configuration during startup. If any issues are found, the server will terminate immediately, providing information about the error. For example: ```console Error: invalid type: 64-bit integer `-1`, expected an unsigned 64-bit or smaller integer for key `storage.hnsw_index.max_indexing_threads` in config/production.yaml ``` This ensures that misconfigurations are caught early, preventing Qdrant from running with invalid settings. ## [Anchor](https://qdrant.tech/documentation/guides/configuration/\#configuration-options) Configuration Options The following YAML example describes the available configuration options. ```yaml log_level: INFO --- # Qdrant logs to stdout. You may configure to also write logs to a file on disk. --- # # Logging format, supports `text` and `json` --- # format: text storage: # Where to store all the data storage_path: ./storage # Where to store snapshots snapshots_path: ./snapshots snapshots_config: # "local" or "s3" - where to store snapshots snapshots_storage: local # s3_config: # bucket: "" # region: "" # access_key: "" # secret_key: "" # Where to store temporary files # If null, temporary snapshots are stored in: storage/snapshots_temp/ temp_path: null # If true - point payloads will not be stored in memory. # It will be read from the disk every time it is requested. # This setting saves RAM by (slightly) increasing the response time. # Note: those payload values that are involved in filtering and are indexed - remain in RAM. # # Default: true on_disk_payload: true # Maximum number of concurrent updates to shard replicas # If `null` - maximum concurrency is used. update_concurrency: null # Write-ahead-log related configuration wal: # Size of a single WAL segment wal_capacity_mb: 32 # Number of WAL segments to create ahead of actual data requirement wal_segments_ahead: 0 # Normal node - receives all updates and answers all queries node_type: "Normal" # Listener node - receives all updates, but does not answer search/read queries # Useful for setting up a dedicated backup node # node_type: "Listener" performance: # Number of parallel threads used for search operations. 
If 0 - auto selection. max_search_threads: 0 # Max number of threads (jobs) for running optimizations across all collections, each thread runs one job. # If 0 - have no limit and choose dynamically to saturate CPU. # Note: each optimization job will also use `max_indexing_threads` threads by itself for index building. max_optimization_threads: 0 # CPU budget, how many CPUs (threads) to allocate for an optimization job. # If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size # If negative - subtract this number of CPUs from the available CPUs. # If positive - use this exact number of CPUs. optimizer_cpu_budget: 0 # Prevent DDoS of too many concurrent updates in distributed mode. # One external update usually triggers multiple internal updates, which breaks internal # timings. For example, the health check timing and consensus timing. # If null - auto selection. update_rate_limit: null # Limit for number of incoming automatic shard transfers per collection on this node, does not affect user-requested transfers. # The same value should be used on all nodes in a cluster. # Default is to allow 1 transfer. # If null - allow unlimited transfers. #incoming_shard_transfers_limit: 1 # Limit for number of outgoing automatic shard transfers per collection on this node, does not affect user-requested transfers. # The same value should be used on all nodes in a cluster. # Default is to allow 1 transfer. # If null - allow unlimited transfers. #outgoing_shard_transfers_limit: 1 # Enable async scorer which uses io_uring when rescoring. # Only supported on Linux, must be enabled in your kernel. # See: #async_scorer: false optimizers: # The minimal fraction of deleted vectors in a segment, required to perform segment optimization deleted_threshold: 0.2 # The minimal number of vectors in a segment, required to perform segment optimization vacuum_min_vector_number: 1000 # Target amount of segments optimizer will try to keep. # Real amount of segments may vary depending on multiple parameters: # - Amount of stored points # - Current write RPS # # It is recommended to select default number of segments as a factor of the number of search threads, # so that each segment would be handled evenly by one of the threads. # If `default_segment_number = 0`, will be automatically selected by the number of available CPUs default_segment_number: 0 # Do not create segments larger this size (in KiloBytes). # Large segments might require disproportionately long indexation times, # therefore it makes sense to limit the size of segments. # # If indexation speed have more priority for your - make this parameter lower. # If search speed is more important - make this parameter higher. # Note: 1Kb = 1 vector of size 256 # If not set, will be automatically selected considering the number of available CPUs. max_segment_size_kb: null # Maximum size (in KiloBytes) of vectors to store in-memory per segment. # Segments larger than this threshold will be stored as read-only memmapped file. # To enable memmap storage, lower the threshold # Note: 1Kb = 1 vector of size 256 # To explicitly disable mmap optimization, set to `0`. # If not set, will be disabled by default. memmap_threshold_kb: null # Maximum size (in KiloBytes) of vectors allowed for plain index. # Default value based on https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md # Note: 1Kb = 1 vector of size 256 # To explicitly disable vector indexing, set to `0`. # If not set, the default value will be used. 
indexing_threshold_kb: 20000 # Interval between forced flushes. flush_interval_sec: 5 # Max number of threads (jobs) for running optimizations per shard. # Note: each optimization job will also use `max_indexing_threads` threads by itself for index building. # If null - have no limit and choose dynamically to saturate CPU. # If 0 - no optimization threads, optimizations will be disabled. max_optimization_threads: null # This section has the same options as 'optimizers' above. All values specified here will overwrite the collections # optimizers configs regardless of the config above and the options specified at collection creation. #optimizers_overwrite: # deleted_threshold: 0.2 # vacuum_min_vector_number: 1000 # default_segment_number: 0 # max_segment_size_kb: null # memmap_threshold_kb: null # indexing_threshold_kb: 20000 # flush_interval_sec: 5 # max_optimization_threads: null # Default parameters of HNSW Index. Could be overridden for each collection or named vector individually hnsw_index: # Number of edges per node in the index graph. Larger the value - more accurate the search, more space required. m: 16 # Number of neighbours to consider during the index building. Larger the value - more accurate the search, more time required to build index. ef_construct: 100 # Minimal size threshold (in KiloBytes) below which full-scan is preferred over HNSW search. # This measures the total size of vectors being queried against. # When the maximum estimated amount of points that a condition satisfies is smaller than # `full_scan_threshold_kb`, the query planner will use full-scan search instead of HNSW index # traversal for better performance. # Note: 1Kb = 1 vector of size 256 full_scan_threshold_kb: 10000 # Number of parallel threads used for background index building. # If 0 - automatically select. # Best to keep between 8 and 16 to prevent likelihood of building broken/inefficient HNSW graphs. # On small CPUs, less threads are used. max_indexing_threads: 0 # Store HNSW index on disk. If set to false, index will be stored in RAM. Default: false on_disk: false # Custom M param for hnsw graph built for payload index. If not set, default M will be used. payload_m: null # Default shard transfer method to use if none is defined. # If null - don't have a shard transfer preference, choose automatically. # If stream_records, snapshot or wal_delta - prefer this specific method. # More info: https://qdrant.tech/documentation/guides/distributed_deployment/#shard-transfer-method shard_transfer_method: null # Default parameters for collections collection: # Number of replicas of each shard that network tries to maintain replication_factor: 1 # How many replicas should apply the operation for us to consider it successful write_consistency_factor: 1 # Default parameters for vectors. vectors: # Whether vectors should be stored in memory or on disk. on_disk: null # shard_number_per_node: 1 # Default quantization configuration. # More info: https://qdrant.tech/documentation/guides/quantization quantization: null # Default strict mode parameters for newly created collections. strict_mode: # Whether strict mode is enabled for a collection or not. enabled: false # Max allowed `limit` parameter for all APIs that don't have their own max limit. max_query_limit: null # Max allowed `timeout` parameter. max_timeout: null # Allow usage of unindexed fields in retrieval based (eg. search) filters. unindexed_filtering_retrieve: null # Allow usage of unindexed fields in filtered updates (eg. delete by payload). 
unindexed_filtering_update: null # Max HNSW value allowed in search parameters. search_max_hnsw_ef: null # Whether exact search is allowed or not. search_allow_exact: null # Max oversampling value allowed in search. search_max_oversampling: null service: # Maximum size of POST data in a single request in megabytes max_request_size_mb: 32 # Number of parallel workers used for serving the api. If 0 - equal to the number of available cores. # If missing - Same as storage.max_search_threads max_workers: 0 # Host to bind the service on host: 0.0.0.0 # HTTP(S) port to bind the service on http_port: 6333 # gRPC port to bind the service on. # If `null` - gRPC is disabled. Default: null # Comment to disable gRPC: grpc_port: 6334 # Enable CORS headers in REST API. # If enabled, browsers would be allowed to query REST endpoints regardless of query origin. # More info: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS # Default: true enable_cors: true # Enable HTTPS for the REST and gRPC API enable_tls: false # Check user HTTPS client certificate against CA file specified in tls config verify_https_client_certificate: false # Set an api-key. # If set, all requests must include a header with the api-key. # example header: `api-key: ` # # If you enable this you should also enable TLS. # (Either above or via an external service like nginx.) # Sending an api-key over an unencrypted channel is insecure. # # Uncomment to enable. # api_key: your_secret_api_key_here # Set an api-key for read-only operations. # If set, all requests must include a header with the api-key. # example header: `api-key: ` # # If you enable this you should also enable TLS. # (Either above or via an external service like nginx.) # Sending an api-key over an unencrypted channel is insecure. # # Uncomment to enable. # read_only_api_key: your_secret_read_only_api_key_here # Uncomment to enable JWT Role Based Access Control (RBAC). # If enabled, you can generate JWT tokens with fine-grained rules for access control. # Use generated token instead of API key. # # jwt_rbac: true # Hardware reporting adds information to the API responses with a # hint on how many resources were used to execute the request. # # Uncomment to enable. # hardware_reporting: true cluster: # Use `enabled: true` to run Qdrant in distributed deployment mode enabled: false # Configuration of the inter-cluster communication p2p: # Port for internal communication between peers port: 6335 # Use TLS for communication between peers enable_tls: false # Configuration related to distributed consensus algorithm consensus: # How frequently peers should ping each other. # Setting this parameter to lower value will allow consensus # to detect disconnected nodes earlier, but too frequent # tick period may create significant network and CPU overhead. # We encourage you NOT to change this parameter unless you know what you are doing. tick_period_ms: 100 --- # Set to true to prevent service from sending usage statistics to the developers. --- # Read more: https://qdrant.tech/documentation/guides/telemetry telemetry_disabled: false --- # Required if either service.enable_tls or cluster.p2p.enable_tls is true. tls: # Server certificate chain file cert: ./tls/cert.pem # Server private key file key: ./tls/key.pem # Certificate authority certificate file. # This certificate will be used to validate the certificates # presented by other nodes during inter-cluster communication. 
# # If verify_https_client_certificate is true, it will verify # HTTPS client certificate # # Required if cluster.p2p.enable_tls is true. ca_cert: ./tls/cacert.pem # TTL in seconds to reload certificate from disk, useful for certificate rotations. # Only works for HTTPS endpoints. Does not support gRPC (and intra-cluster communication). # If `null` - TTL is disabled. cert_ttl: 3600 ``` <|page-38-lllmstxt|> ## collections - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Collections --- # [Anchor](https://qdrant.tech/documentation/concepts/collections/\#collections) Collections A collection is a named set of points (vectors with a payload) among which you can search. The vector of each point within the same collection must have the same dimensionality and be compared by a single metric. [Named vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) can be used to have multiple vectors in a single point, each of which can have its own dimensionality and metric requirements. Distance metrics are used to measure similarities among vectors. The choice of metric depends on the way the vectors were obtained and, in particular, on the method of neural network encoder training. Qdrant supports the most popular types of metrics: - Dot product: `Dot` \- [\[wiki\]](https://en.wikipedia.org/wiki/Dot_product) - Cosine similarity: `Cosine` \- [\[wiki\]](https://en.wikipedia.org/wiki/Cosine_similarity) - Euclidean distance: `Euclid` \- [\[wiki\]](https://en.wikipedia.org/wiki/Euclidean_distance) - Manhattan distance: `Manhattan` \- [\[wiki\]](https://en.wikipedia.org/wiki/Taxicab_geometry) In addition to metrics and vector size, each collection uses its own set of parameters that controls collection optimization, index construction, and vacuum. These settings can be changed at any time by a corresponding request. ## [Anchor](https://qdrant.tech/documentation/concepts/collections/\#setting-up-multitenancy) Setting up multitenancy **How many collections should you create?** In most cases, you should only use a single collection with payload-based partitioning. This approach is called [multitenancy](https://en.wikipedia.org/wiki/Multitenancy). It is efficient for most users, but it requires additional configuration. [Learn how to set it up](https://qdrant.tech/documentation/tutorials/multiple-partitions/) **When should you create multiple collections?** When you have a limited number of users and you need isolation. This approach is flexible, but it may be more costly, since creating numerous collections may result in resource overhead. Also, you need to ensure that they do not affect each other in any way, including performance-wise.
## [Anchor](https://qdrant.tech/documentation/concepts/collections/\#create-a-collection) Create a collection httpbashpythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 300, "distance": "Cosine" } } ``` ```bash curl -X PUT http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "size": 300, "distance": "Cosine" } }' ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=100, distance=models.Distance.COSINE), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 100, distance: "Cosine" }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{CreateCollectionBuilder, VectorParamsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(100, Distance::Cosine)), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.createCollectionAsync("{collection_name}", VectorParams.newBuilder().setDistance(Distance.Cosine).setSize(100).build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 100, Distance = Distance.Cosine } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 100, Distance: qdrant.Distance_Cosine, }), }) ``` In addition to the required options, you can also specify custom values for the following collection options: - `hnsw_config` \- see [indexing](https://qdrant.tech/documentation/concepts/indexing/#vector-index) for details. - `wal_config` \- Write-Ahead-Log related configuration. See more details about [WAL](https://qdrant.tech/documentation/concepts/storage/#versioning) - `optimizers_config` \- see [optimizer](https://qdrant.tech/documentation/concepts/optimizer/) for details. - `shard_number` \- which defines how many shards the collection should have. See [distributed deployment](https://qdrant.tech/documentation/guides/distributed_deployment/#sharding) section for details. - `on_disk_payload` \- defines where to store payload data. If `true` \- payload will be stored on disk only. Might be useful for limiting the RAM usage in case of large payload. - `quantization_config` \- see [quantization](https://qdrant.tech/documentation/guides/quantization/#setting-up-quantization-in-qdrant) for details. - `strict_mode_config` \- see [strict mode](https://qdrant.tech/documentation/guides/administration/#strict-mode) for details. 
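As a rough sketch of how a few of these optional settings can be combined in the Python client (the collection name placeholder and the numeric values are illustrative assumptions, not recommendations):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE),
    # Tune the HNSW vector index for this collection
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
    # Override optimizer defaults, e.g. target number of segments
    optimizers_config=models.OptimizersConfigDiff(default_segment_number=2),
    # Split the collection into two shards
    shard_number=2,
    # Keep payloads on disk to reduce RAM usage
    on_disk_payload=True,
)
```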
Default parameters for the optional collection parameters are defined in [configuration file](https://github.com/qdrant/qdrant/blob/master/config/config.yaml). See [schema definitions](https://api.qdrant.tech/api-reference/collections/create-collection) and a [configuration file](https://github.com/qdrant/qdrant/blob/master/config/config.yaml) for more information about collection and vector parameters. _Available as of v1.2.0_ Vectors all live in RAM for very quick access. The `on_disk` parameter can be set in the vector configuration. If true, all vectors will live on disk. This will enable the use of [memmaps](https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage), which is suitable for ingesting a large amount of data. ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#create-collection-from-another-collection) Create collection from another collection _Available as of v1.0.0_ It is possible to initialize a collection from another existing collection. This might be useful for experimenting quickly with different configurations for the same data set. Make sure the vectors have the same `size` and `distance` function when setting up the vectors configuration in the new collection. If you used the previous sample code, `"size": 300` and `"distance": "Cosine"`. httpbashpythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 100, "distance": "Cosine" }, "init_from": { "collection": "{from_collection_name}" } } ``` ```bash curl -X PUT http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "size": 300, "distance": "Cosine" }, "init_from": { "collection": {from_collection_name} } }' ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=100, distance=models.Distance.COSINE), init_from=models.InitFrom(collection="{from_collection_name}"), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 100, distance: "Cosine" }, init_from: { collection: "{from_collection_name}" }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{CreateCollectionBuilder, Distance, VectorParamsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(100, Distance::Cosine)) .init_from_collection("{from_collection_name}"), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(100) .setDistance(Distance.Cosine) .build())) .setInitFromCollection("{from_collection_name}") .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = 
new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 100, Distance = Distance.Cosine }, initFromCollection: "{from_collection_name}" ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 100, Distance: qdrant.Distance_Cosine, }), InitFromCollection: qdrant.PtrOf("{from_collection_name}"), }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#collection-with-multiple-vectors) Collection with multiple vectors _Available as of v0.10.0_ It is possible to have multiple vectors per record. This feature allows for multiple vector storages per collection. To distinguish vectors in one record, they should have a unique name defined when creating the collection. Each named vector in this mode has its distance and size: httpbashpythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "image": { "size": 4, "distance": "Dot" }, "text": { "size": 8, "distance": "Cosine" } } } ``` ```bash curl -X PUT http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "image": { "size": 4, "distance": "Dot" }, "text": { "size": 8, "distance": "Cosine" } } }' ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config={ "image": models.VectorParams(size=4, distance=models.Distance.DOT), "text": models.VectorParams(size=8, distance=models.Distance.COSINE), }, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { image: { size: 4, distance: "Dot" }, text: { size: 8, distance: "Cosine" }, }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, VectorParamsBuilder, VectorsConfigBuilder, }; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut vectors_config = VectorsConfigBuilder::default(); vectors_config .add_named_vector_params("image", VectorParamsBuilder::new(4, Distance::Dot).build()); vectors_config.add_named_vector_params( "text", VectorParamsBuilder::new(8, Distance::Cosine).build(), ); client .create_collection( CreateCollectionBuilder::new("{collection_name}").vectors_config(vectors_config), ) .await?; ``` ```java import java.util.Map; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( "{collection_name}", Map.of( "image", VectorParams.newBuilder().setSize(4).setDistance(Distance.Dot).build(), "text", VectorParams.newBuilder().setSize(8).setDistance(Distance.Cosine).build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParamsMap { Map = { ["image"] = 
new VectorParams { Size = 4, Distance = Distance.Dot }, ["text"] = new VectorParams { Size = 8, Distance = Distance.Cosine }, } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfigMap( map[string]*qdrant.VectorParams{ "image": { Size: 4, Distance: qdrant.Distance_Dot, }, "text": { Size: 8, Distance: qdrant.Distance_Cosine, }, }), }) ``` For rare use cases, it is possible to create a collection without any vector storage. _Available as of v1.1.1_ For each named vector you can optionally specify [`hnsw_config`](https://qdrant.tech/documentation/concepts/indexing/#vector-index) or [`quantization_config`](https://qdrant.tech/documentation/guides/quantization/#setting-up-quantization-in-qdrant) to deviate from the collection configuration. This can be useful to fine-tune search performance on a vector level. _Available as of v1.2.0_ Vectors all live in RAM for very quick access. On a per-vector basis you can set `on_disk` to true to store all vectors on disk at all times. This will enable the use of [memmaps](https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage), which is suitable for ingesting a large amount of data. ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#vector-datatypes) Vector datatypes _Available as of v1.9.0_ Some embedding providers may provide embeddings in a pre-quantized format. One of the most notable examples is the [Cohere int8 & binary embeddings](https://cohere.com/blog/int8-binary-embeddings). Qdrant has direct support for uint8 embeddings, which you can also use in combination with binary quantization. 
To create a collection with uint8 embeddings, you can use the following configuration: httpbashpythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 1024, "distance": "Cosine", "datatype": "uint8" } } ``` ```bash curl -X PUT http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "size": 1024, "distance": "Cosine", "datatype": "uint8" } }' ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( size=1024, distance=models.Distance.COSINE, datatype=models.Datatype.UINT8, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { image: { size: 1024, distance: "Cosine", datatype: "uint8" }, }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{ CreateCollectionBuilder, Datatype, Distance, VectorParamsBuilder, }; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}").vectors_config( VectorParamsBuilder::new(1024, Distance::Cosine).datatype(Datatype::Uint8), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.grpc.Collections.Datatype; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync("{collection_name}", VectorParams.newBuilder() .setSize(1024) .setDistance(Distance.Cosine) .setDatatype(Datatype.Uint8) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 1024, Distance = Distance.Cosine, Datatype = Datatype.Uint8 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 1024, Distance: qdrant.Distance_Cosine, Datatype: qdrant.Datatype_Uint8.Enum(), }), }) ``` Vectors with `uint8` datatype are stored in a more compact format, which can save memory and improve search speed at the cost of some precision. If you choose to use the `uint8` datatype, elements of the vector will be stored as unsigned 8-bit integers, which can take values **from 0 to 255**. ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#collection-with-sparse-vectors) Collection with sparse vectors _Available as of v1.7.0_ Qdrant supports sparse vectors as a first-class citizen. Sparse vectors are useful for text search, where each word is represented as a separate dimension. Collections can contain sparse vectors as additional [named vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) along side regular dense vectors in a single point. Unlike dense vectors, sparse vectors must be named. And additionally, sparse vectors and dense vectors must have different names within a collection. 
httpbashpythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "sparse_vectors": { "text": { } } } ``` ```bash curl -X PUT http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "sparse_vectors": { "text": { } } }' ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config={}, sparse_vectors_config={ "text": models.SparseVectorParams(), }, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { sparse_vectors: { text: { }, }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{ CreateCollectionBuilder, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, }; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut sparse_vector_config = SparseVectorsConfigBuilder::default(); sparse_vector_config.add_named_vector_params("text", SparseVectorParamsBuilder::default()); client .create_collection( CreateCollectionBuilder::new("{collection_name}") .sparse_vectors_config(sparse_vector_config), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.SparseVectorConfig; import io.qdrant.client.grpc.Collections.SparseVectorParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setSparseVectorsConfig( SparseVectorConfig.newBuilder() .putMap("text", SparseVectorParams.getDefaultInstance())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", sparseVectorsConfig: ("text", new SparseVectorParams()) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "text": {}, }), }) ``` Outside of a unique name, there are no required configuration parameters for sparse vectors. The distance function for sparse vectors is always `Dot` and does not need to be specified. However, there are optional parameters to tune the underlying [sparse vector index](https://qdrant.tech/documentation/concepts/indexing/#sparse-vector-index). 
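For example, a minimal sketch of tuning these optional index parameters over the REST API (the `on_disk` and `full_scan_threshold` values below are illustrative, not recommendations):

```bash
curl -X PUT http://localhost:6333/collections/{collection_name} \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "sparse_vectors": {
      "text": {
        "index": {
          "on_disk": true,
          "full_scan_threshold": 5000
        }
      }
    }
  }'
```

Keeping the sparse index on disk trades some query latency for a smaller memory footprint, which can be a reasonable choice for large text collections.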
### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#check-collection-existence) Check collection existence _Available as of v1.8.0_ httpbashpythontypescriptrustjavacsharpgo ```http GET http://localhost:6333/collections/{collection_name}/exists ``` ```bash curl -X GET http://localhost:6333/collections/{collection_name}/exists ``` ```python client.collection_exists(collection_name="{collection_name}") ``` ```typescript client.collectionExists("{collection_name}"); ``` ```rust client.collection_exists("{collection_name}").await?; ``` ```java client.collectionExistsAsync("{collection_name}").get(); ``` ```csharp await client.CollectionExistsAsync("{collection_name}"); ``` ```go import "context" client.CollectionExists(context.Background(), "my_collection") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#delete-collection) Delete collection httpbashpythontypescriptrustjavacsharpgo ```http DELETE http://localhost:6333/collections/{collection_name} ``` ```bash curl -X DELETE http://localhost:6333/collections/{collection_name} ``` ```python client.delete_collection(collection_name="{collection_name}") ``` ```typescript client.deleteCollection("{collection_name}"); ``` ```rust client.delete_collection("{collection_name}").await?; ``` ```java client.deleteCollectionAsync("{collection_name}").get(); ``` ```csharp await client.DeleteCollectionAsync("{collection_name}"); ``` ```go import "context" client.DeleteCollection(context.Background(), "{collection_name}") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#update-collection-parameters) Update collection parameters Dynamic parameter updates may be helpful, for example, for more efficient initial loading of vectors. For example, you can disable indexing during the upload process, and enable it immediately after the upload is finished. As a result, you will not waste extra computation resources on rebuilding the index. 
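For example, a minimal sketch of that pattern over the REST API (setting `indexing_threshold` to `0` disables indexing; `20000` is the default value, also visible in the collection info example further down this page):

```bash
# Disable indexing before a bulk upload
curl -X PATCH http://localhost:6333/collections/{collection_name} \
  -H 'Content-Type: application/json' \
  --data-raw '{ "optimizers_config": { "indexing_threshold": 0 } }'

# ... upload your points here ...

# Re-enable indexing once the upload has finished
curl -X PATCH http://localhost:6333/collections/{collection_name} \
  -H 'Content-Type: application/json' \
  --data-raw '{ "optimizers_config": { "indexing_threshold": 20000 } }'
```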
The following command enables indexing for segments that have more than 10000 kB of vectors stored: httpbashpythontypescriptrustjavacsharpgo ```http PATCH /collections/{collection_name} { "optimizers_config": { "indexing_threshold": 10000 } } ``` ```bash curl -X PATCH http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "optimizers_config": { "indexing_threshold": 10000 } }' ``` ```python client.update_collection( collection_name="{collection_name}", optimizers_config=models.OptimizersConfigDiff(indexing_threshold=10000), ) ``` ```typescript client.updateCollection("{collection_name}", { optimizers_config: { indexing_threshold: 10000, }, }); ``` ```rust use qdrant_client::qdrant::{OptimizersConfigDiffBuilder, UpdateCollectionBuilder}; client .update_collection( UpdateCollectionBuilder::new("{collection_name}").optimizers_config( OptimizersConfigDiffBuilder::default().indexing_threshold(10000), ), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.UpdateCollection; client.updateCollectionAsync( UpdateCollection.newBuilder() .setCollectionName("{collection_name}") .setOptimizersConfig( OptimizersConfigDiff.newBuilder().setIndexingThreshold(10000).build()) .build()); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpdateCollectionAsync( collectionName: "{collection_name}", optimizersConfig: new OptimizersConfigDiff { IndexingThreshold = 10000 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{ CollectionName: "{collection_name}", OptimizersConfig: &qdrant.OptimizersConfigDiff{ IndexingThreshold: qdrant.PtrOf(uint64(10000)), }, }) ``` The following parameters can be updated: - `optimizers_config` \- see [optimizer](https://qdrant.tech/documentation/concepts/optimizer/) for details. - `hnsw_config` \- see [indexing](https://qdrant.tech/documentation/concepts/indexing/#vector-index) for details. - `quantization_config` \- see [quantization](https://qdrant.tech/documentation/guides/quantization/#setting-up-quantization-in-qdrant) for details. - `vectors_config` \- vector-specific configuration, including individual `hnsw_config`, `quantization_config` and `on_disk` settings. - `params` \- other collection parameters, including `write_consistency_factor` and `on_disk_payload`. - `strict_mode_config` \- see [strict mode](https://qdrant.tech/documentation/guides/administration/#strict-mode) for details. Full API specification is available in [schema definitions](https://api.qdrant.tech/api-reference/collections/update-collection). Calls to this endpoint may be blocking as it waits for existing optimizers to finish. We recommended against using this in a production database as it may introduce huge overhead due to the rebuilding of the index. #### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#update-vector-parameters) Update vector parameters _Available as of v1.4.0_ Qdrant 1.4 adds support for updating more collection parameters at runtime. HNSW index, quantization and disk configurations can now be changed without recreating a collection. Segments (with index and quantized data) will automatically be rebuilt in the background to match updated parameters. 
To put vector data on disk for a collection that **does not have** named vectors, use `""` as name: httpbash ```http PATCH /collections/{collection_name} { "vectors": { "": { "on_disk": true } } } ``` ```bash curl -X PATCH http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "": { "on_disk": true } } }' ``` To put vector data on disk for a collection that **does have** named vectors: Note: To create a vector name, follow the procedure from our [Points](https://qdrant.tech/documentation/concepts/points/#create-vector-name). httpbash ```http PATCH /collections/{collection_name} { "vectors": { "my_vector": { "on_disk": true } } } ``` ```bash curl -X PATCH http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "my_vector": { "on_disk": true } } }' ``` In the following example the HNSW index and quantization parameters are updated, both for the whole collection, and for `my_vector` specifically: httpbashpythontypescriptrustjavacsharpgo ```http PATCH /collections/{collection_name} { "vectors": { "my_vector": { "hnsw_config": { "m": 32, "ef_construct": 123 }, "quantization_config": { "product": { "compression": "x32", "always_ram": true } }, "on_disk": true } }, "hnsw_config": { "ef_construct": 123 }, "quantization_config": { "scalar": { "type": "int8", "quantile": 0.8, "always_ram": false } } } ``` ```bash curl -X PATCH http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "vectors": { "my_vector": { "hnsw_config": { "m": 32, "ef_construct": 123 }, "quantization_config": { "product": { "compression": "x32", "always_ram": true } }, "on_disk": true } }, "hnsw_config": { "ef_construct": 123 }, "quantization_config": { "scalar": { "type": "int8", "quantile": 0.8, "always_ram": false } } }' ``` ```python client.update_collection( collection_name="{collection_name}", vectors_config={ "my_vector": models.VectorParamsDiff( hnsw_config=models.HnswConfigDiff( m=32, ef_construct=123, ), quantization_config=models.ProductQuantization( product=models.ProductQuantizationConfig( compression=models.CompressionRatio.X32, always_ram=True, ), ), on_disk=True, ), }, hnsw_config=models.HnswConfigDiff( ef_construct=123, ), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, quantile=0.8, always_ram=False, ), ), ) ``` ```typescript client.updateCollection("{collection_name}", { vectors: { my_vector: { hnsw_config: { m: 32, ef_construct: 123, }, quantization_config: { product: { compression: "x32", always_ram: true, }, }, on_disk: true, }, }, hnsw_config: { ef_construct: 123, }, quantization_config: { scalar: { type: "int8", quantile: 0.8, always_ram: true, }, }, }); ``` ```rust use std::collections::HashMap; use qdrant_client::qdrant::{ quantization_config_diff::Quantization, vectors_config_diff::Config, HnswConfigDiffBuilder, QuantizationType, ScalarQuantizationBuilder, UpdateCollectionBuilder, VectorParamsDiffBuilder, VectorParamsDiffMap, }; client .update_collection( UpdateCollectionBuilder::new("{collection_name}") .hnsw_config(HnswConfigDiffBuilder::default().ef_construct(123)) .vectors_config(Config::ParamsMap(VectorParamsDiffMap { map: HashMap::from([(\ ("my_vector".into()),\ VectorParamsDiffBuilder::default()\ .hnsw_config(HnswConfigDiffBuilder::default().m(32).ef_construct(123))\ .build(),\ )]), })) .quantization_config(Quantization::Scalar( 
ScalarQuantizationBuilder::default() .r#type(QuantizationType::Int8.into()) .quantile(0.8) .always_ram(true) .build(), )), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.HnswConfigDiff; import io.qdrant.client.grpc.Collections.QuantizationConfigDiff; import io.qdrant.client.grpc.Collections.QuantizationType; import io.qdrant.client.grpc.Collections.ScalarQuantization; import io.qdrant.client.grpc.Collections.UpdateCollection; import io.qdrant.client.grpc.Collections.VectorParamsDiff; import io.qdrant.client.grpc.Collections.VectorParamsDiffMap; import io.qdrant.client.grpc.Collections.VectorsConfigDiff; client .updateCollectionAsync( UpdateCollection.newBuilder() .setCollectionName("{collection_name}") .setHnswConfig(HnswConfigDiff.newBuilder().setEfConstruct(123).build()) .setVectorsConfig( VectorsConfigDiff.newBuilder() .setParamsMap( VectorParamsDiffMap.newBuilder() .putMap( "my_vector", VectorParamsDiff.newBuilder() .setHnswConfig( HnswConfigDiff.newBuilder() .setM(3) .setEfConstruct(123) .build()) .build()))) .setQuantizationConfig( QuantizationConfigDiff.newBuilder() .setScalar( ScalarQuantization.newBuilder() .setType(QuantizationType.Int8) .setQuantile(0.8f) .setAlwaysRam(true) .build())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpdateCollectionAsync( collectionName: "{collection_name}", hnswConfig: new HnswConfigDiff { EfConstruct = 123 }, vectorsConfig: new VectorParamsDiffMap { Map = { { "my_vector", new VectorParamsDiff { HnswConfig = new HnswConfigDiff { M = 3, EfConstruct = 123 } } } } }, quantizationConfig: new QuantizationConfigDiff { Scalar = new ScalarQuantization { Type = QuantizationType.Int8, Quantile = 0.8f, AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfigDiffMap( map[string]*qdrant.VectorParamsDiff{ "my_vector": { HnswConfig: &qdrant.HnswConfigDiff{ M: qdrant.PtrOf(uint64(3)), EfConstruct: qdrant.PtrOf(uint64(123)), }, }, }), QuantizationConfig: qdrant.NewQuantizationDiffScalar( &qdrant.ScalarQuantization{ Type: qdrant.QuantizationType_Int8, Quantile: qdrant.PtrOf(float32(0.8)), AlwaysRam: qdrant.PtrOf(true), }), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/collections/\#collection-info) Collection info Qdrant allows determining the configuration parameters of an existing collection to better understand how the points are distributed and indexed. 
httpbashpythontypescriptrustjavacsharpgo ```http GET /collections/{collection_name} ``` ```bash curl -X GET http://localhost:6333/collections/{collection_name} ``` ```python client.get_collection(collection_name="{collection_name}") ``` ```typescript client.getCollection("{collection_name}"); ``` ```rust client.collection_info("{collection_name}").await?; ``` ```java client.getCollectionInfoAsync("{collection_name}").get(); ``` ```csharp await client.GetCollectionInfoAsync("{collection_name}"); ``` ```go import "context" client.GetCollectionInfo(context.Background(), "{collection_name}") ``` Expected result ```json { "result": { "status": "green", "optimizer_status": "ok", "vectors_count": 1068786, "indexed_vectors_count": 1024232, "points_count": 1068786, "segments_count": 31, "config": { "params": { "vectors": { "size": 384, "distance": "Cosine" }, "shard_number": 1, "replication_factor": 1, "write_consistency_factor": 1, "on_disk_payload": false }, "hnsw_config": { "m": 16, "ef_construct": 100, "full_scan_threshold": 10000, "max_indexing_threads": 0 }, "optimizer_config": { "deleted_threshold": 0.2, "vacuum_min_vector_number": 1000, "default_segment_number": 0, "max_segment_size": null, "memmap_threshold": null, "indexing_threshold": 20000, "flush_interval_sec": 5, "max_optimization_threads": 1 }, "wal_config": { "wal_capacity_mb": 32, "wal_segments_ahead": 0 } }, "payload_schema": {} }, "status": "ok", "time": 0.00010143 } ``` If you insert the vectors into the collection, the `status` field may become `yellow` whilst it is optimizing. It will become `green` once all the points are successfully processed. The following color statuses are possible: - 🟢 `green`: collection is ready - 🟡 `yellow`: collection is optimizing - ⚫ `grey`: collection is pending optimization ( [help](https://qdrant.tech/documentation/concepts/collections/#grey-collection-status)) - 🔴 `red`: an error occurred which the engine could not recover from ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#grey-collection-status) Grey collection status _Available as of v1.9.0_ A collection may have the grey ⚫ status or show “optimizations pending, awaiting update operation” as optimization status. This state is normally caused by restarting a Qdrant instance while optimizations were ongoing. It means the collection has optimizations pending, but they are paused. You must send any update operation to trigger and start the optimizations again. 
For example: httpbashpythontypescriptrustjavacsharpgo ```http PATCH /collections/{collection_name} { "optimizers_config": {} } ``` ```bash curl -X PATCH http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "optimizers_config": {} }' ``` ```python client.update_collection( collection_name="{collection_name}", optimizer_config=models.OptimizersConfigDiff(), ) ``` ```typescript client.updateCollection("{collection_name}", { optimizers_config: {}, }); ``` ```rust use qdrant_client::qdrant::{OptimizersConfigDiffBuilder, UpdateCollectionBuilder}; client .update_collection( UpdateCollectionBuilder::new("{collection_name}") .optimizers_config(OptimizersConfigDiffBuilder::default()), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.UpdateCollection; client.updateCollectionAsync( UpdateCollection.newBuilder() .setCollectionName("{collection_name}") .setOptimizersConfig( OptimizersConfigDiff.getDefaultInstance()) .build()); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpdateCollectionAsync( collectionName: "{collection_name}", optimizersConfig: new OptimizersConfigDiff { } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{ CollectionName: "{collection_name}", OptimizersConfig: &qdrant.OptimizersConfigDiff{}, }) ``` Alternatively you may use the `Trigger Optimizers` button in the [Qdrant Web UI](https://qdrant.tech/documentation/web-ui/). It is shown next to the grey collection status on the collection info page. ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#approximate-point-and-vector-counts) Approximate point and vector counts You may be interested in the count attributes: - `points_count` \- total number of objects (vectors and their payloads) stored in the collection - `vectors_count` \- total number of vectors in a collection, useful if you have multiple vectors per point - `indexed_vectors_count` \- total number of vectors stored in the HNSW or sparse index. Qdrant does not store all the vectors in the index, but only if an index segment might be created for a given configuration. The above counts are not exact, but should be considered approximate. Depending on how you use Qdrant these may give very different numbers than what you may expect. It’s therefore important **not** to rely on them. More specifically, these numbers represent the count of points and vectors in Qdrant’s internal storage. Internally, Qdrant may temporarily duplicate points as part of automatic optimizations. It may keep changed or deleted points for a bit. And it may delay indexing of new points. All of that is for optimization reasons. Updates you do are therefore not directly reflected in these numbers. If you see a wildly different count of points, it will likely resolve itself once a new round of automatic optimizations has completed. To clarify: these numbers don’t represent the exact amount of points or vectors you have inserted, nor does it represent the exact number of distinguishable points or vectors you can query. If you want to know exact counts, refer to the [count API](https://qdrant.tech/documentation/concepts/points/#counting-points). 
_Note: these numbers may be removed in a future version of Qdrant._ ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#indexing-vectors-in-hnsw) Indexing vectors in HNSW In some cases, you might be surprised the value of `indexed_vectors_count` is lower than `vectors_count`. This is an intended behaviour and depends on the [optimizer configuration](https://qdrant.tech/documentation/concepts/optimizer/). A new index segment is built if the size of non-indexed vectors is higher than the value of `indexing_threshold`(in kB). If your collection is very small or the dimensionality of the vectors is low, there might be no HNSW segment created and `indexed_vectors_count` might be equal to `0`. It is possible to reduce the `indexing_threshold` for an existing collection by [updating collection parameters](https://qdrant.tech/documentation/concepts/collections/#update-collection-parameters). ## [Anchor](https://qdrant.tech/documentation/concepts/collections/\#collection-aliases) Collection aliases In a production environment, it is sometimes necessary to switch different versions of vectors seamlessly. For example, when upgrading to a new version of the neural network. There is no way to stop the service and rebuild the collection with new vectors in these situations. Aliases are additional names for existing collections. All queries to the collection can also be done identically, using an alias instead of the collection name. Thus, it is possible to build a second collection in the background and then switch alias from the old to the new collection. Since all changes of aliases happen atomically, no concurrent requests will be affected during the switch. ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#create-alias) Create alias httpbashpythontypescriptrustjavacsharpgo ```http POST /collections/aliases { "actions": [\ {\ "create_alias": {\ "collection_name": "example_collection",\ "alias_name": "production_collection"\ }\ }\ ] } ``` ```bash curl -X POST http://localhost:6333/collections/aliases \ -H 'Content-Type: application/json' \ --data-raw '{ "actions": [\ {\ "create_alias": {\ "collection_name": "example_collection",\ "alias_name": "production_collection"\ }\ }\ ] }' ``` ```python client.update_collection_aliases( change_aliases_operations=[\ models.CreateAliasOperation(\ create_alias=models.CreateAlias(\ collection_name="example_collection", alias_name="production_collection"\ )\ )\ ] ) ``` ```typescript client.updateCollectionAliases({ actions: [\ {\ create_alias: {\ collection_name: "example_collection",\ alias_name: "production_collection",\ },\ },\ ], }); ``` ```rust use qdrant_client::qdrant::CreateAliasBuilder; client .create_alias(CreateAliasBuilder::new( "example_collection", "production_collection", )) .await?; ``` ```java client.createAliasAsync("production_collection", "example_collection").get(); ``` ```csharp await client.CreateAliasAsync(aliasName: "production_collection", collectionName: "example_collection"); ``` ```go import "context" client.CreateAlias(context.Background(), "production_collection", "example_collection") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#remove-alias) Remove alias httpbashpythontypescriptrustjavacsharpgo ```http POST /collections/aliases { "actions": [\ {\ "delete_alias": {\ "alias_name": "production_collection"\ }\ }\ ] } ``` ```bash curl -X POST http://localhost:6333/collections/aliases \ -H 'Content-Type: application/json' \ --data-raw '{ "actions": [\ {\ "delete_alias": {\ 
"alias_name": "production_collection"\ }\ }\ ] }' ``` ```python client.update_collection_aliases( change_aliases_operations=[\ models.DeleteAliasOperation(\ delete_alias=models.DeleteAlias(alias_name="production_collection")\ ),\ ] ) ``` ```typescript client.updateCollectionAliases({ actions: [\ {\ delete_alias: {\ alias_name: "production_collection",\ },\ },\ ], }); ``` ```rust client.delete_alias("production_collection").await?; ``` ```java client.deleteAliasAsync("production_collection").get(); ``` ```csharp await client.DeleteAliasAsync("production_collection"); ``` ```go import "context" client.DeleteAlias(context.Background(), "production_collection") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#switch-collection) Switch collection Multiple alias actions are performed atomically. For example, you can switch underlying collection with the following command: httpbashpythontypescriptrustjavacsharpgo ```http POST /collections/aliases { "actions": [\ {\ "delete_alias": {\ "alias_name": "production_collection"\ }\ },\ {\ "create_alias": {\ "collection_name": "example_collection",\ "alias_name": "production_collection"\ }\ }\ ] } ``` ```bash curl -X POST http://localhost:6333/collections/aliases \ -H 'Content-Type: application/json' \ --data-raw '{ "actions": [\ {\ "delete_alias": {\ "alias_name": "production_collection"\ }\ },\ {\ "create_alias": {\ "collection_name": "example_collection",\ "alias_name": "production_collection"\ }\ }\ ] }' ``` ```python client.update_collection_aliases( change_aliases_operations=[\ models.DeleteAliasOperation(\ delete_alias=models.DeleteAlias(alias_name="production_collection")\ ),\ models.CreateAliasOperation(\ create_alias=models.CreateAlias(\ collection_name="example_collection", alias_name="production_collection"\ )\ ),\ ] ) ``` ```typescript client.updateCollectionAliases({ actions: [\ {\ delete_alias: {\ alias_name: "production_collection",\ },\ },\ {\ create_alias: {\ collection_name: "example_collection",\ alias_name: "production_collection",\ },\ },\ ], }); ``` ```rust use qdrant_client::qdrant::CreateAliasBuilder; client.delete_alias("production_collection").await?; client .create_alias(CreateAliasBuilder::new( "example_collection", "production_collection", )) .await?; ``` ```java client.deleteAliasAsync("production_collection").get(); client.createAliasAsync("production_collection", "example_collection").get(); ``` ```csharp await client.DeleteAliasAsync("production_collection"); await client.CreateAliasAsync(aliasName: "production_collection", collectionName: "example_collection"); ``` ```go import "context" client.DeleteAlias(context.Background(), "production_collection") client.CreateAlias(context.Background(), "production_collection", "example_collection") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#list-collection-aliases) List collection aliases httpbashpythontypescriptrustjavacsharpgo ```http GET /collections/{collection_name}/aliases ``` ```bash curl -X GET http://localhost:6333/collections/{collection_name}/aliases ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.get_collection_aliases(collection_name="{collection_name}") ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.getCollectionAliases("{collection_name}"); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; 
client.list_collection_aliases("{collection_name}").await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.listCollectionAliasesAsync("{collection_name}").get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.ListCollectionAliasesAsync("{collection_name}"); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.ListCollectionAliases(context.Background(), "{collection_name}") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#list-all-aliases) List all aliases httpbashpythontypescriptrustjavacsharpgo ```http GET /aliases ``` ```bash curl -X GET http://localhost:6333/aliases ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.get_aliases() ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.getAliases(); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.list_aliases().await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.listAliasesAsync().get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.ListAliasesAsync(); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.ListAliases(context.Background()) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/collections/\#list-all-collections) List all collections httpbashpythontypescriptrustjavacsharpgo ```http GET /collections ``` ```bash curl -X GET http://localhost:6333/collections ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.get_collections() ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.getCollections(); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.list_collections().await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.listCollectionsAsync().get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.ListCollectionsAsync(); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.ListCollections(context.Background()) ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/collections.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/collections.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-39-lllmstxt|> ## hybrid-search-fastembed - [Documentation](https://qdrant.tech/documentation/) - [Beginner tutorials](https://qdrant.tech/documentation/beginner-tutorials/) - Setup Hybrid Search with FastEmbed --- # [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#build-a-hybrid-search-service-with-fastembed-and-qdrant) Build a Hybrid Search Service with FastEmbed and Qdrant | Time: 20 min | Level: Beginner | Output: [GitHub](https://github.com/qdrant/qdrant_demo/) | | | --- | --- | --- | --- | This tutorial shows you how to build and deploy your own hybrid search service to look through descriptions of companies from [startups-list.com](https://www.startups-list.com/) and pick the most similar ones to your query. The website contains the company names, descriptions, locations, and a picture for each entry. As we have already written on our [blog](https://qdrant.tech/articles/hybrid-search/), there is no single definition of hybrid search. In this tutorial we are covering the case with a combination of dense and [sparse embeddings](https://qdrant.tech/articles/sparse-vectors/). The former ones refer to the embeddings generated by such well-known neural networks as BERT, while the latter ones are more related to a traditional full-text search approach. Our hybrid search service will use [Fastembed](https://github.com/qdrant/fastembed) package to generate embeddings of text descriptions and [FastAPI](https://fastapi.tiangolo.com/) to serve the search API. Fastembed natively integrates with Qdrant client, so you can easily upload the data into Qdrant and perform search queries. ![Hybrid Search Schema](https://qdrant.tech/documentation/tutorials/hybrid-search-with-fastembed/hybrid-search-schema.png) ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#workflow) Workflow To create a hybrid search service, you will need to transform your raw data and then create a search function to manipulate it. First, you will 1) download and prepare a sample dataset using a modified version of the BERT ML model. Then, you will 2) load the data into Qdrant, 3) create a hybrid search API and 4) serve it using FastAPI. ![Hybrid Search Workflow](https://qdrant.tech/docs/workflow-neural-search.png) ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#prerequisites) Prerequisites To complete this tutorial, you will need: - Docker - The easiest way to use Qdrant is to run a pre-built Docker image. - [Raw parsed data](https://storage.googleapis.com/generall-shared-data/startups_demo.json) from startups-list.com. - Python version >=3.9 ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#prepare-sample-dataset) Prepare sample dataset To conduct a hybrid search on startup descriptions, you must first encode the description data into vectors. Fastembed integration into qdrant client combines encoding and uploading into a single step. 
It also takes care of batching and parallelization, so you don’t have to worry about it. Let’s start by downloading the data and installing the necessary packages. 1. First you need to download the dataset. ```bash wget https://storage.googleapis.com/generall-shared-data/startups_demo.json ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#run-qdrant-in-docker) Run Qdrant in Docker Next, you need to manage all of your data using a vector engine. Qdrant lets you store, update or delete created vectors. Most importantly, it lets you search for the nearest vectors via a convenient API. > **Note:** Before you begin, create a project directory and a virtual python environment in it. 1. Download the Qdrant image from DockerHub. ```bash docker pull qdrant/qdrant ``` 2. Start Qdrant inside of Docker. ```bash docker run -p 6333:6333 \ -v $(pwd)/qdrant_storage:/qdrant/storage \ qdrant/qdrant ``` You should see output like this ```text ... [2021-02-05T00:08:51Z INFO actix_server::builder] Starting 12 workers [2021-02-05T00:08:51Z INFO actix_server::builder] Starting "actix-web-service-0.0.0.0:6333" service on 0.0.0.0:6333 ``` Test the service by going to [http://localhost:6333/](http://localhost:6333/). You should see the Qdrant version info in your browser. All data uploaded to Qdrant is saved inside the `./qdrant_storage` directory and will be persisted even if you recreate the container. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#upload-data-to-qdrant) Upload data to Qdrant 1. Install the official Python client to best interact with Qdrant. ```bash pip install "qdrant-client[fastembed]>=1.14.2" ``` > **Note:** This tutorial requires fastembed of version >=0.6.1. At this point, you should have startup records in the `startups_demo.json` file and Qdrant running on a local machine. Now you need to write a script to upload all startup data and vectors into the search engine. 2. Create a client object for Qdrant. ```python --- # Import client library from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") ``` 3. Choose models to encode your data and prepare collections. In this tutorial, we will be using two pre-trained models to compute dense and sparse vectors correspondingly The models are: `sentence-transformers/all-MiniLM-L6-v2` and `prithivida/Splade_PP_en_v1`. As soon as the choice is made, we need to configure a collection in Qdrant. ```python dense_vector_name = "dense" sparse_vector_name = "sparse" dense_model_name = "sentence-transformers/all-MiniLM-L6-v2" sparse_model_name = "prithivida/Splade_PP_en_v1" if not client.collection_exists("startups"): client.create_collection( collection_name="startups", vectors_config={ dense_vector_name: models.VectorParams( size=client.get_embedding_size(dense_model_name), distance=models.Distance.COSINE ) }, # size and distance are model dependent sparse_vectors_config={sparse_vector_name: models.SparseVectorParams()}, ) ``` Qdrant requires vectors to have their own names and configurations. Parameters `size` and `distance` are mandatory, however, you can additionaly specify extended configuration for your vectors, like `quantization_config` or `hnsw_config`. 4. Read data from the file. 
```python import json payload_path = "startups_demo.json" documents = [] metadata = [] with open(payload_path) as fd: for line in fd: obj = json.loads(line) description = obj["description"] dense_document = models.Document(text=description, model=dense_model_name) sparse_document = models.Document(text=description, model=sparse_model_name) documents.append( { dense_vector_name: dense_document, sparse_vector_name: sparse_document, } ) metadata.append(obj) ``` In this block of code, we read data from `startups_demo.json` file and split it into two list: `documents` and `metadata`. Documents are models with descriptions of startups and model names to embed data. Metadata is payload associated with each startup, such as the name, location, and picture. We will use `documents` to encode the data into vectors. 6. Encode and upload data. ```python client.upload_collection( collection_name="startups", vectors=tqdm.tqdm(documents), payload=metadata, parallel=4, # Use 4 CPU cores to encode data. # This will spawn a model per process, which might be memory expensive # Make sure that your system does not use swap, and reduce the amount # # of processes if it does. # Otherwise, it might significantly slow down the process. # Requires wrapping code into if __name__ == '__main__' block ) ``` Upload processed data Download and unpack the processed data from [here](https://storage.googleapis.com/dataset-startup-search/startup-list-com/startups_hybrid_search_processed_40k.tar.gz) or use the following script: ```bash wget https://storage.googleapis.com/dataset-startup-search/startup-list-com/startups_hybrid_search_processed_40k.tar.gz tar -xvf startups_hybrid_search_processed_40k.tar.gz ``` Then you can upload the data to Qdrant. ```python import json import numpy as np def named_vectors( vectors: list[float], sparse_vectors: list[models.SparseVector] ) -> dict: for vector, sparse_vector in zip(vectors, sparse_vectors): yield { dense_vector_name: vector, sparse_vector_name: models.SparseVector(**sparse_vector), } with open("dense_vectors.npy", "rb") as f: vectors = np.load(f) with open("sparse_vectors.json", "r") as f: sparse_vectors = json.load(f) with open("payload.json", "r") as f: payload = json.load(f) client.upload_collection( "startups", vectors=named_vectors(vectors, sparse_vectors), payload=payload ) ``` The `upload_collection` method will encode all documents and upload them to Qdrant. The `parallel` parameter enables data-parallelism instead of built-in ONNX parallelism. Additionally, you can specify ids for each document, if you want to use them later to update or delete documents. If you don’t specify ids, they will be generated automatically. You can monitor the progress of the encoding by passing tqdm progress bar to the `upload_collection` method. ```python from tqdm import tqdm client.upload_collection( collection_name="startups", vectors=documents, payload=metadata, ids=tqdm(range(len(documents))), ) ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#build-the-search-api) Build the search API Now that all the preparations are complete, let’s start building a neural search class. In order to process incoming requests, the hybrid search class will need 3 things: 1) models to convert the query into a vector, 2) the Qdrant client to perform search queries, 3) fusion function to re-rank dense and sparse search results. 
Qdrant supports 2 fusion functions for combining the results: [reciprocal rank fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) and [distribution based score fusion](https://qdrant.tech/documentation/concepts/hybrid-queries/?q=distribution+based+sc#:~:text=Distribution%2DBased%20Score%20Fusion) 1. Create a file named `hybrid_searcher.py` and specify the following. ```python from qdrant_client import QdrantClient, models class HybridSearcher: DENSE_MODEL = "sentence-transformers/all-MiniLM-L6-v2" SPARSE_MODEL = "prithivida/Splade_PP_en_v1" def __init__(self, collection_name): self.collection_name = collection_name self.qdrant_client = QdrantClient() ``` 2. Write the search function. ```python def search(self, text: str): search_result = self.qdrant_client.query_points( collection_name=self.collection_name, query=models.FusionQuery( fusion=models.Fusion.RRF # we are using reciprocal rank fusion here ), prefetch=[\ models.Prefetch(\ query=models.Document(text=text, model=self.DENSE_MODEL)\ ),\ models.Prefetch(\ query=models.Document(text=text, model=self.SPARSE_MODEL)\ ),\ ], query_filter=None, # If you don't want any filters for now limit=5, # 5 the closest results ).points # `search_result` contains models.QueryResponse structure # We can access list of scored points with the corresponding similarity scores, # vectors (if `with_vectors` was set to `True`), and payload via `points` attribute. # Select and return metadata metadata = [point.payload for point in search_result] return metadata ``` 3. Add search filters. With Qdrant it is also feasible to add some conditions to the search. For example, if you wanted to search for startups in a certain city, the search query could look like this: ```python ... city_of_interest = "Berlin" # Define a filter for cities city_filter = models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(value=city_of_interest)\ )\ ] ) # NOTE: it is not a hybrid search! It's just a dense query for simplicity search_result = self.qdrant_client.query_points( collection_name=self.collection_name, query=models.Document(text=text, model=self.DENSE_MODEL), query_filter=city_filter, limit=5 ).points ... ``` You have now created a class for neural search queries. Now wrap it up into a service. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/\#deploy-the-search-with-fastapi) Deploy the search with FastAPI To build the service you will use the FastAPI framework. 1. Install FastAPI. To install it, use the command ```bash pip install fastapi uvicorn ``` 2. Implement the service. Create a file named `service.py` and specify the following. The service will have only one API endpoint and will look like this: ```python from fastapi import FastAPI --- # The file where HybridSearcher is stored from hybrid_searcher import HybridSearcher app = FastAPI() --- # Create a neural searcher instance hybrid_searcher = HybridSearcher(collection_name="startups") @app.get("/api/search") def search_startup(q: str): return {"result": hybrid_searcher.search(text=q)} if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=8000) ``` 3. Run the service. ```bash python service.py ``` 4. Open your browser at [http://localhost:8000/docs](http://localhost:8000/docs). You should be able to see a debug interface for your service. 
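You can also query the endpoint straight from a terminal; for example (the query text is just an illustration, and the service is assumed to be running locally on port 8000 as configured above):

```bash
curl "http://localhost:8000/api/search?q=web%20platform%20for%20startups"
```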
![FastAPI Swagger interface](https://qdrant.tech/docs/fastapi_neural_search.png) Feel free to play around with it, make queries regarding the companies in our corpus, and check out the results. Join our [Discord community](https://qdrant.to/discord), where we talk about vector search and similarity learning, publish other examples of neural networks and neural search applications. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/beginner-tutorials/hybrid-search-fastembed.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/beginner-tutorials/hybrid-search-fastembed.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-40-lllmstxt|> ## hybrid-cloud-setup - [Documentation](https://qdrant.tech/documentation/) - [Hybrid cloud](https://qdrant.tech/documentation/hybrid-cloud/) - Setup Hybrid Cloud --- # [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#creating-a-hybrid-cloud-environment) Creating a Hybrid Cloud Environment The following instruction set will show you how to properly set up a **Qdrant cluster** in your **Hybrid Cloud Environment**. You can also watch a video demo on how to set up a Hybrid Cloud Environment: Deploy a Production-Ready Vector Database in 5 Minutes With Qdrant Hybrid Cloud - YouTube [Photo image of Qdrant - Vector Database & Search Engine](https://www.youtube.com/channel/UC6ftm8PwH1RU_LM1jwG0LQA?embeds_referring_euri=https%3A%2F%2Fqdrant.tech%2F) Qdrant - Vector Database & Search Engine 8.12K subscribers [Deploy a Production-Ready Vector Database in 5 Minutes With Qdrant Hybrid Cloud](https://www.youtube.com/watch?v=BF02jULGCfo) Qdrant - Vector Database & Search Engine Search Watch later Share Copy link Info Shopping Tap to unmute If playback doesn't begin shortly, try restarting your device. More videos ## More videos You're signed out Videos you watch may be added to the TV's watch history and influence TV recommendations. To avoid this, cancel and sign in to YouTube on your computer. CancelConfirm Share Include playlist An error occurred while retrieving sharing information. Please try again later. [Watch on](https://www.youtube.com/watch?v=BF02jULGCfo&embeds_referring_euri=https%3A%2F%2Fqdrant.tech%2F) 0:00 0:00 / 6:44 •Live • [Watch on YouTube](https://www.youtube.com/watch?v=BF02jULGCfo "Watch on YouTube") To learn how Hybrid Cloud works, [read the overview document](https://qdrant.tech/documentation/hybrid-cloud/). ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#prerequisites) Prerequisites - **Kubernetes cluster:** To create a Hybrid Cloud Environment, you need a [standard compliant](https://www.cncf.io/training/certification/software-conformance/) Kubernetes cluster. You can run this cluster in any cloud, on-premise or edge environment, with distributions that range from AWS EKS to VMWare vSphere. See [Deployment Platforms](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/) for more information. 
- **Storage:** For storage, you need to set up the Kubernetes cluster with a Container Storage Interface (CSI) driver that provides block storage. For vertical scaling, the CSI driver needs to support volume expansion. The `StorageClass` needs to be created beforehand. For backups and restores, the driver needs to support CSI snapshots and restores. The `VolumeSnapshotClass` needs to be created beforehand. See [Deployment Platforms](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/) for more information. - **Kubernetes nodes:** You need enough CPU and memory capacity for the Qdrant database clusters that you create. A small amount of resources is also needed for the Hybrid Cloud control plane components. Qdrant Hybrid Cloud supports x86\_64 and ARM64 architectures. - **Permissions:** To install the Qdrant Kubernetes Operator you need to have `cluster-admin` access in your Kubernetes cluster. - **Connection:** The Qdrant Kubernetes Operator in your cluster needs to be able to connect to Qdrant Cloud. It will create an outgoing connection to `cloud.qdrant.io` on port `443`. - **Locations:** By default, the Qdrant Cloud Agent and Operator pulls Helm charts and container images from `registry.cloud.qdrant.io`. The Qdrant database container image is pulled from `docker.io`. > **Note:** You can also mirror these images and charts into your own registry and pull them from there. ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#cli-tools) CLI tools During the onboarding, you will need to deploy the Qdrant Kubernetes Operator and Agent using Helm. Make sure you have the following tools installed: - [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) - [helm](https://helm.sh/docs/intro/install/) You will need to have access to the Kubernetes cluster with `kubectl` and `helm` configured to connect to it. Please refer the documentation of your Kubernetes distribution for more information. ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#installation) Installation 1. To set up Hybrid Cloud, open the Qdrant Cloud Console at [cloud.qdrant.io](https://cloud.qdrant.io/). On the dashboard, select **Hybrid Cloud**. 2. Before creating your first Hybrid Cloud Environment, you have to provide billing information and accept the Hybrid Cloud license agreement. The installation wizard will guide you through the process. > **Note:** You will only be charged for the Qdrant cluster you create in a Hybrid Cloud Environment, but not for the environment itself. 3. Now you can specify the following: - **Name:** A name for the Hybrid Cloud Environment - **Kubernetes Namespace:** The Kubernetes namespace for the operator and agent. Once you select a namespace, you can’t change it. You can also configure the StorageClass and VolumeSnapshotClass to use for the Qdrant databases, if you want to deviate from the default settings of your cluster. ![Create Hybrid Cloud Environment](https://qdrant.tech/documentation/cloud/hybrid_cloud_env_create.png) 4. You can then enter the YAML configuration for your Kubernetes operator. Qdrant supports a specific list of configuration options, as described in the [Qdrant Operator configuration](https://qdrant.tech/documentation/hybrid-cloud/operator-configuration/) section. 5. 
(Optional) If you have special requirements for any of the following, activate the **Show advanced configuration** option: - If you use a proxy to connect from your infrastructure to the Qdrant Cloud API, you can specify the proxy URL, credentials and certificates. - Container registry URL for Qdrant Operator and Agent images. The default is [https://registry.cloud.qdrant.io/qdrant/](https://registry.cloud.qdrant.io/qdrant/). - Helm chart repository URL for the Qdrant Operator and Agent. The default is [oci://registry.cloud.qdrant.io/qdrant-charts](oci://registry.cloud.qdrant.io/qdrant-charts). - An optional secret with credentials to access your own container registry. - Log level for the operator and agent - Node selectors and tolerations for the operator, agent and monitoring stack ![Create Hybrid Cloud Environment - Advanced Configuration](https://qdrant.tech/documentation/cloud/hybrid_cloud_advanced_configuration.png) 6. Once complete, click **Create**. > **Note:** All settings but the Kubernetes namespace can be changed later. ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#generate-installation-command) Generate Installation Command After creating your Hybrid Cloud, select **Generate Installation Command** to generate a script that you can run in your Kubernetes cluster, which will perform the initial installation of the Kubernetes operator and agent. ![Rotate Hybrid Cloud Secrets](https://qdrant.tech/documentation/cloud/hybrid_cloud_create_command.png) It will: - Create the Kubernetes namespace, if not present. - Set up the necessary secrets with credentials to access the Qdrant container registry and the Qdrant Cloud API. - Sign in to the Helm registry at `registry.cloud.qdrant.io`. - Install the Qdrant cloud agent and Kubernetes operator chart. You need this command only for the initial installation. After that, you can update the agent and operator using the Qdrant Cloud Console. > **Note:** If you generate the installation command a second time, it will re-generate the included secrets, and you will have to apply the command again to update them.
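The generated script is tailored to your environment and already contains the required credentials, so you should always run the command produced by the console rather than reproduce it by hand. Conceptually, though, it boils down to steps along these lines (a simplified, illustrative sketch with placeholder values, not the literal command):

```shell
# Create the namespace selected during onboarding (if it does not exist yet)
kubectl create namespace the-qdrant-namespace

# Log in to the Qdrant Cloud Helm registry with the generated credentials
helm registry login registry.cloud.qdrant.io \
  --username '<generated-username>' --password '<generated-password>'

# Install the Qdrant Kubernetes operator and cloud agent charts
# (the real script additionally creates the registry and Cloud API secrets described above)
helm install qdrant-operator oci://registry.cloud.qdrant.io/qdrant-charts/operator \
  --namespace the-qdrant-namespace
helm install qdrant-cloud-agent oci://registry.cloud.qdrant.io/qdrant-charts/qdrant-cloud-agent \
  --namespace the-qdrant-namespace
```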
## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#advanced-configuration) Advanced configuration ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#mirroring-images-and-charts) Mirroring images and charts #### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#required-artifacts) Required artifacts Container images: - `registry.cloud.qdrant.io/qdrant/qdrant` - `registry.cloud.qdrant.io/qdrant/qdrant-cloud-agent` - `registry.cloud.qdrant.io/qdrant/operator` - `registry.cloud.qdrant.io/qdrant/cluster-manager` - `registry.cloud.qdrant.io/qdrant/prometheus` - `registry.cloud.qdrant.io/qdrant/prometheus-config-reloader` - `registry.cloud.qdrant.io/qdrant/kube-state-metrics` - `registry.cloud.qdrant.io/qdrant/kubernetes-event-exporter` - `registry.cloud.qdrant.io/qdrant/qdrant-cluster-exporter` Open Containers Initiative (OCI) Helm charts: - `registry.cloud.qdrant.io/qdrant-charts/qdrant-cloud-agent` - `registry.cloud.qdrant.io/qdrant-charts/operator` - `registry.cloud.qdrant.io/qdrant-charts/qdrant-cluster-manager` - `registry.cloud.qdrant.io/qdrant-charts/prometheus` - `registry.cloud.qdrant.io/qdrant-charts/kubernetes-event-exporter` - `registry.cloud.qdrant.io/qdrant-charts/qdrant-cluster-exporter` To mirror all necessary container images and Helm charts into your own registry, you should use an automatic replication feature that your registry provides, so that new image versions become available automatically. Alternatively, you can manually sync the images with tools like [Skopeo](https://github.com/containers/skopeo). When syncing images manually, make sure that you sync them with all CPU architectures, or at least with the architecture you need. ##### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#automatic-replication) Automatic replication Ensure that you have both the container images in the `/qdrant/` repository and the Helm charts in the `/qdrant-charts/` repository synced. Then go to the advanced section of your Hybrid Cloud Environment and configure your registry locations: - Container registry URL: `your-registry.example.com/qdrant` (this will for example result in `your-registry.example.com/qdrant/qdrant-cloud-agent`) - Chart repository URL: `oci://your-registry.example.com/qdrant-charts` (this will for example result in `oci://your-registry.example.com/qdrant-charts/qdrant-cloud-agent`) If your registry requires authentication, you have to create a secret with the authentication information in your `the-qdrant-namespace` namespace. Example: ```shell kubectl --namespace the-qdrant-namespace create secret docker-registry my-creds --docker-server='your-registry.example.com' --docker-username='your-username' --docker-password='your-password' ``` You can then reference this secret in the advanced section of your Hybrid Cloud Environment. ##### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#manual-replication) Manual replication This example uses Skopeo. You can find your personal credentials for the Qdrant Cloud registry in the onboarding command, or you can fetch them with `kubectl`: ```shell kubectl get secrets qdrant-registry-creds --namespace the-qdrant-namespace -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode | jq -r '.'
``` First, log in to the source registry: ```shell skopeo login registry.cloud.qdrant.io ``` Then, log in to your own registry: ```shell skopeo login your-registry.example.com ``` To sync all container images: ```shell skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/operator your-registry.example.com/qdrant/operator skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/qdrant-cloud-agent your-registry.example.com/qdrant/qdrant-cloud-agent skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/prometheus your-registry.example.com/qdrant/prometheus skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/prometheus-config-reloader your-registry.example.com/qdrant/prometheus-config-reloader skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/kube-state-metrics your-registry.example.com/qdrant/kube-state-metrics skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/qdrant your-registry.example.com/qdrant/qdrant skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/cluster-manager your-registry.example.com/qdrant/cluster-manager skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/qdrant-cluster-exporter your-registry.example.com/qdrant/qdrant-cluster-exporter skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/kubernetes-event-exporter your-registry.example.com/qdrant/kubernetes-event-exporter ``` To sync all Helm charts: ```shell skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/prometheus your-registry.example.com/qdrant-charts/prometheus skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/operator your-registry.example.com/qdrant-charts/operator skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/qdrant-kubernetes-api your-registry.example.com/qdrant-charts/qdrant-kubernetes-api skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/qdrant-cloud-agent your-registry.example.com/qdrant-charts/qdrant-cloud-agent skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/qdrant-cluster-exporter your-registry.example.com/qdrant-charts/qdrant-cluster-exporter skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/kubernetes-event-exporter your-registry.example.com/qdrant-charts/kubernetes-event-exporter ``` With the above configuration, you can add the following values to the advanced section of your Hybrid Cloud Environment: - Container registry URL: `your-registry.example.com/qdrant` - Chart repository URL: `oci://your-registry.example.com/qdrant-charts` If your registry requires authentication, you can create and reference the secret the same way as described above. ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#rate-limits-at-dockerio) Rate limits at `docker.io` By default, the Qdrant database image will be fetched from Docker Hub, which is the main source of truth. Docker Hub has rate limits for anonymous users. If you have a larger setup and also fetch other images from there, you may run into these limits. To solve this, you can provide authentication information for Docker Hub.
First, create a secret with your Docker Hub credentials in your `the-qdrant-namespace` namespace: ```shell kubectl create secret docker-registry dockerhub-registry-secret --namespace the-qdrant-namespace --docker-server=https://index.docker.io/v1/ --docker-username=<your-username> --docker-password=<your-password> --docker-email=<your-email> ``` Then, you can reference this secret by adding the following configuration in the operator configuration YAML editor in the advanced section of the Hybrid Cloud Environment: ```yaml qdrant: image: pull_secret: "dockerhub-registry-secret" ``` ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#rotating-secrets) Rotating Secrets If you need to rotate the secrets used to pull container images and charts from the Qdrant registry and to authenticate at the Qdrant Cloud API, you can do so by following these steps: - Go to the Hybrid Cloud Environment list or the detail page of the environment. - In the actions menu, choose “Rotate Secrets”. - Confirm the action. - You will receive a new installation command that you can run in your Kubernetes cluster to update the secrets. If you don’t run the installation command, the secrets will not be updated, and the communication between your Hybrid Cloud Environment and the Qdrant Cloud API will not work. ![Rotate Hybrid Cloud Secrets](https://qdrant.tech/documentation/cloud/hybrid_cloud_rotate_secrets.png) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/\#deleting-a-hybrid-cloud-environment) Deleting a Hybrid Cloud Environment To delete a Hybrid Cloud Environment, first delete all Qdrant database clusters in it. Then you can delete the environment itself. To clean up your Kubernetes cluster after deleting the Hybrid Cloud Environment, you can download the script from [https://github.com/qdrant/qdrant-cloud-support-tools/tree/main/hybrid-cloud-cleanup](https://github.com/qdrant/qdrant-cloud-support-tools/tree/main/hybrid-cloud-cleanup) to remove all Qdrant-related resources. Run the following command while connected to your Kubernetes cluster. The script requires `kubectl` and `helm` to be installed. ```shell ./hybrid-cloud-cleanup.sh your-qdrant-namespace ``` <|page-41-lllmstxt|> ## points - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Points --- # [Anchor](https://qdrant.tech/documentation/concepts/points/\#points) Points Points are the central entity that Qdrant operates with. A point is a record consisting of a [vector](https://qdrant.tech/documentation/concepts/vectors/) and an optional [payload](https://qdrant.tech/documentation/concepts/payload/).
It looks like this: ```json // This is a simple point { "id": 129, "vector": [0.1, 0.2, 0.3, 0.4], "payload": {"color": "red"}, } ``` You can search among the points grouped in one [collection](https://qdrant.tech/documentation/concepts/collections/) based on vector similarity. This procedure is described in more detail in the [search](https://qdrant.tech/documentation/concepts/search/) and [filtering](https://qdrant.tech/documentation/concepts/filtering/) sections. This section explains how to create and manage vectors. Any point modification operation is asynchronous and takes place in 2 steps. At the first stage, the operation is written to the Write-ahead-log. After this moment, the service will not lose the data, even if the machine loses power supply. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#point-ids) Point IDs Qdrant supports using both `64-bit unsigned integers` and `UUID` as identifiers for points. Examples of UUID string representations: - simple: `936DA01F9ABD4d9d80C702AF85C822A8` - hyphenated: `550e8400-e29b-41d4-a716-446655440000` - urn: `urn:uuid:F9168C5E-CEB2-4faa-B6BF-329BF39FA1E4` That means that in every request UUID string could be used instead of numerical id. Example: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": "5c56c793-69f3-4fbf-87e6-c4bf54c28c26",\ "payload": {"color": "red"},\ "vector": [0.9, 0.1, 0.1]\ }\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id="5c56c793-69f3-4fbf-87e6-c4bf54c28c26",\ payload={\ "color": "red",\ },\ vector=[0.9, 0.1, 0.1],\ ),\ ], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.upsert("{collection_name}", { points: [\ {\ id: "5c56c793-69f3-4fbf-87e6-c4bf54c28c26",\ payload: {\ color: "red",\ },\ vector: [0.9, 0.1, 0.1],\ },\ ], }); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![PointStruct::new(\ "5c56c793-69f3-4fbf-87e6-c4bf54c28c26",\ vec![0.9, 0.1, 0.1],\ [("color", "Red".into())],\ )], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import java.util.UUID; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(UUID.fromString("5c56c793-69f3-4fbf-87e6-c4bf54c28c26"))) .setVectors(vectors(0.05f, 0.61f, 0.76f, 0.74f)) .putAllPayload(Map.of("color", value("Red"))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = Guid.Parse("5c56c793-69f3-4fbf-87e6-c4bf54c28c26"), Vectors = new[] { 0.05f, 0.61f, 0.76f, 0.74f }, Payload = { ["color"] = "Red" } } } ); ``` ```go import ( "context" 
"github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewID("5c56c793-69f3-4fbf-87e6-c4bf54c28c26"), Vectors: qdrant.NewVectors(0.05, 0.61, 0.76, 0.74), Payload: qdrant.NewValueMap(map[string]any{"color": "Red"}), }, }, }) ``` and httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "payload": {"color": "red"},\ "vector": [0.9, 0.1, 0.1]\ }\ ] } ``` ```python client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ payload={\ "color": "red",\ },\ vector=[0.9, 0.1, 0.1],\ ),\ ], ) ``` ```typescript client.upsert("{collection_name}", { points: [\ {\ id: 1,\ payload: {\ color: "red",\ },\ vector: [0.9, 0.1, 0.1],\ },\ ], }); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![PointStruct::new(\ 1,\ vec![0.9, 0.1, 0.1],\ [("color", "Red".into())],\ )], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(0.05f, 0.61f, 0.76f, 0.74f)) .putAllPayload(Map.of("color", value("Red"))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new[] { 0.05f, 0.61f, 0.76f, 0.74f }, Payload = { ["color"] = "Red" } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectors(0.05, 0.61, 0.76, 0.74), Payload: qdrant.NewValueMap(map[string]any{"color": "Red"}), }, }, }) ``` are both possible. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#vectors) Vectors Each point in qdrant may have one or more vectors. Vectors are the central component of the Qdrant architecture, qdrant relies on different types of vectors to provide different types of data exploration and search. Here is a list of supported vector types: | | | | --- | --- | | Dense Vectors | A regular vectors, generated by majority of the embedding models. | | Sparse Vectors | Vectors with no fixed length, but only a few non-zero elements.
Useful for exact token match and collaborative filtering recommendations. | | MultiVectors | Matrices of numbers with fixed length but variable height.
Usually obtained from late interaction models like ColBERT. | It is possible to attach more than one type of vector to a single point. In Qdrant we call these Named Vectors. Read more about vector types, how they are stored and optimized in the [vectors](https://qdrant.tech/documentation/concepts/vectors/) section. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#upload-points) Upload points To optimize performance, Qdrant supports batch loading of points. I.e., you can load several points into the service in one API call. Batching allows you to minimize the overhead of creating a network connection. The Qdrant API supports two ways of creating batches - record-oriented and column-oriented. Internally, these options do not differ and are made only for the convenience of interaction. Create points with batch: httppythontypescript ```http PUT /collections/{collection_name}/points { "batch": { "ids": [1, 2, 3], "payloads": [\ {"color": "red"},\ {"color": "green"},\ {"color": "blue"}\ ], "vectors": [\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ [0.1, 0.1, 0.9]\ ] } } ``` ```python client.upsert( collection_name="{collection_name}", points=models.Batch( ids=[1, 2, 3], payloads=[\ {"color": "red"},\ {"color": "green"},\ {"color": "blue"},\ ], vectors=[\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ [0.1, 0.1, 0.9],\ ], ), ) ``` ```typescript client.upsert("{collection_name}", { batch: { ids: [1, 2, 3], payloads: [{ color: "red" }, { color: "green" }, { color: "blue" }], vectors: [\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ [0.1, 0.1, 0.9],\ ], }, }); ``` or record-oriented equivalent: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "payload": {"color": "red"},\ "vector": [0.9, 0.1, 0.1]\ },\ {\ "id": 2,\ "payload": {"color": "green"},\ "vector": [0.1, 0.9, 0.1]\ },\ {\ "id": 3,\ "payload": {"color": "blue"},\ "vector": [0.1, 0.1, 0.9]\ }\ ] } ``` ```python client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ payload={\ "color": "red",\ },\ vector=[0.9, 0.1, 0.1],\ ),\ models.PointStruct(\ id=2,\ payload={\ "color": "green",\ },\ vector=[0.1, 0.9, 0.1],\ ),\ models.PointStruct(\ id=3,\ payload={\ "color": "blue",\ },\ vector=[0.1, 0.1, 0.9],\ ),\ ], ) ``` ```typescript client.upsert("{collection_name}", { points: [\ {\ id: 1,\ payload: { color: "red" },\ vector: [0.9, 0.1, 0.1],\ },\ {\ id: 2,\ payload: { color: "green" },\ vector: [0.1, 0.9, 0.1],\ },\ {\ id: 3,\ payload: { color: "blue" },\ vector: [0.1, 0.1, 0.9],\ },\ ], }); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![\ PointStruct::new(1, vec![0.9, 0.1, 0.1], [("city", "red".into())]),\ PointStruct::new(2, vec![0.1, 0.9, 0.1], [("city", "green".into())]),\ PointStruct::new(3, vec![0.1, 0.1, 0.9], [("city", "blue".into())]),\ ], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(0.9f, 0.1f, 0.1f)) 
.putAllPayload(Map.of("color", value("red"))) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors(vectors(0.1f, 0.9f, 0.1f)) .putAllPayload(Map.of("color", value("green"))) .build(), PointStruct.newBuilder() .setId(id(3)) .setVectors(vectors(0.1f, 0.1f, 0.9f)) .putAllPayload(Map.of("color", value("blue"))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new[] { 0.9f, 0.1f, 0.1f }, Payload = { ["color"] = "red" } }, new() { Id = 2, Vectors = new[] { 0.1f, 0.9f, 0.1f }, Payload = { ["color"] = "green" } }, new() { Id = 3, Vectors = new[] { 0.1f, 0.1f, 0.9f }, Payload = { ["color"] = "blue" } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectors(0.9, 0.1, 0.1), Payload: qdrant.NewValueMap(map[string]any{"color": "red"}), }, { Id: qdrant.NewIDNum(2), Vectors: qdrant.NewVectors(0.1, 0.9, 0.1), Payload: qdrant.NewValueMap(map[string]any{"color": "green"}), }, { Id: qdrant.NewIDNum(3), Vectors: qdrant.NewVectors(0.1, 0.1, 0.9), Payload: qdrant.NewValueMap(map[string]any{"color": "blue"}), }, }, }) ``` The Python client has additional features for loading points, which include: - Parallelization - A retry mechanism - Lazy batching support For example, you can read your data directly from hard drives, to avoid storing all data in RAM. You can use these features with the `upload_collection` and `upload_points` methods. Similar to the basic upsert API, these methods support both record-oriented and column-oriented formats. Column-oriented format: ```python client.upload_collection( collection_name="{collection_name}", ids=[1, 2], payload=[\ {"color": "red"},\ {"color": "green"},\ ], vectors=[\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ ], parallel=4, max_retries=3, ) ``` Record-oriented format: ```python client.upload_points( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ payload={\ "color": "red",\ },\ vector=[0.9, 0.1, 0.1],\ ),\ models.PointStruct(\ id=2,\ payload={\ "color": "green",\ },\ vector=[0.1, 0.9, 0.1],\ ),\ ], parallel=4, max_retries=3, ) ``` All APIs in Qdrant, including point loading, are idempotent. It means that executing the same method several times in a row is equivalent to a single execution. In this case, it means that points with the same id will be overwritten when re-uploaded. Idempotence property is useful if you use, for example, a message queue that doesn’t provide an exactly-ones guarantee. Even with such a system, Qdrant ensures data consistency. 
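As a small illustration of this idempotence (a sketch, not part of the API reference above; the point ID and values are made up), re-running the same upsert is safe because the point with the same ID is simply overwritten rather than duplicated:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

point = models.PointStruct(id=42, vector=[0.9, 0.1, 0.1], payload={"color": "red"})

# First delivery of the operation
client.upsert(collection_name="{collection_name}", points=[point])

# A redelivery (for example, after a message-queue retry) overwrites point 42
# instead of creating a duplicate, so the collection stays consistent.
client.upsert(collection_name="{collection_name}", points=[point])
```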
[_Available as of v0.10.0_](https://qdrant.tech/documentation/concepts/points/#create-vector-name) If the collection was created with multiple vectors, each vector data can be provided using the vector’s name: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "vector": {\ "image": [0.9, 0.1, 0.1, 0.2],\ "text": [0.4, 0.7, 0.1, 0.8, 0.1, 0.1, 0.9, 0.2]\ }\ },\ {\ "id": 2,\ "vector": {\ "image": [0.2, 0.1, 0.3, 0.9],\ "text": [0.5, 0.2, 0.7, 0.4, 0.7, 0.2, 0.3, 0.9]\ }\ }\ ] } ``` ```python client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ vector={\ "image": [0.9, 0.1, 0.1, 0.2],\ "text": [0.4, 0.7, 0.1, 0.8, 0.1, 0.1, 0.9, 0.2],\ },\ ),\ models.PointStruct(\ id=2,\ vector={\ "image": [0.2, 0.1, 0.3, 0.9],\ "text": [0.5, 0.2, 0.7, 0.4, 0.7, 0.2, 0.3, 0.9],\ },\ ),\ ], ) ``` ```typescript client.upsert("{collection_name}", { points: [\ {\ id: 1,\ vector: {\ image: [0.9, 0.1, 0.1, 0.2],\ text: [0.4, 0.7, 0.1, 0.8, 0.1, 0.1, 0.9, 0.2],\ },\ },\ {\ id: 2,\ vector: {\ image: [0.2, 0.1, 0.3, 0.9],\ text: [0.5, 0.2, 0.7, 0.4, 0.7, 0.2, 0.3, 0.9],\ },\ },\ ], }); ``` ```rust use std::collections::HashMap; use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; use qdrant_client::Payload; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![\ PointStruct::new(\ 1,\ HashMap::from([\ ("image".to_string(), vec![0.9, 0.1, 0.1, 0.2]),\ (\ "text".to_string(),\ vec![0.4, 0.7, 0.1, 0.8, 0.1, 0.1, 0.9, 0.2],\ ),\ ]),\ Payload::default(),\ ),\ PointStruct::new(\ 2,\ HashMap::from([\ ("image".to_string(), vec![0.2, 0.1, 0.3, 0.9]),\ (\ "text".to_string(),\ vec![0.5, 0.2, 0.7, 0.4, 0.7, 0.2, 0.3, 0.9],\ ),\ ]),\ Payload::default(),\ ),\ ], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.VectorFactory.vector; import static io.qdrant.client.VectorsFactory.namedVectors; import io.qdrant.client.grpc.Points.PointStruct; client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors( namedVectors( Map.of( "image", vector(List.of(0.9f, 0.1f, 0.1f, 0.2f)), "text", vector(List.of(0.4f, 0.7f, 0.1f, 0.8f, 0.1f, 0.1f, 0.9f, 0.2f))))) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors( namedVectors( Map.of( "image", List.of(0.2f, 0.1f, 0.3f, 0.9f), "text", List.of(0.5f, 0.2f, 0.7f, 0.4f, 0.7f, 0.2f, 0.3f, 0.9f)))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new Dictionary { ["image"] = [0.9f, 0.1f, 0.1f, 0.2f], ["text"] = [0.4f, 0.7f, 0.1f, 0.8f, 0.1f, 0.1f, 0.9f, 0.2f] } }, new() { Id = 2, Vectors = new Dictionary { ["image"] = [0.2f, 0.1f, 0.3f, 0.9f], ["text"] = [0.5f, 0.2f, 0.7f, 0.4f, 0.7f, 0.2f, 0.3f, 0.9f] } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "image": qdrant.NewVector(0.9, 0.1, 0.1, 0.2), "text": qdrant.NewVector(0.4, 0.7, 0.1, 0.8, 0.1, 0.1, 0.9, 0.2), }), }, { Id: qdrant.NewIDNum(2), Vectors: 
qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "image": qdrant.NewVector(0.2, 0.1, 0.3, 0.9), "text": qdrant.NewVector(0.5, 0.2, 0.7, 0.4, 0.7, 0.2, 0.3, 0.9), }), }, }, }) ``` _Available as of v1.2.0_ Named vectors are optional. When uploading points, some vectors may be omitted. For example, you can upload one point with only the `image` vector and a second one with only the `text` vector. When uploading a point with an existing ID, the existing point is deleted first, then it is inserted with just the specified vectors. In other words, the entire point is replaced, and any unspecified vectors are set to null. To keep existing vectors unchanged and only update specified vectors, see [update vectors](https://qdrant.tech/documentation/concepts/points/#update-vectors). _Available as of v1.7.0_ Points can contain dense and sparse vectors. A sparse vector is an array in which most of the elements have a value of zero. It is possible to take advantage of this property with an optimized representation; for this reason, sparse vectors have a different shape than dense vectors. They are represented as a list of `(index, value)` pairs, where `index` is an integer and `value` is a floating point number. The `index` is the position of the non-zero element in the vector, and the `value` is the value of that element. For example, the following vector: ``` [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0] ``` can be represented as a sparse vector: ``` [(6, 1.0), (7, 2.0)] ``` Qdrant uses the following JSON representation throughout its APIs: ```json { "indices": [6, 7], "values": [1.0, 2.0] } ``` The `indices` and `values` arrays must have the same length, and the `indices` must be unique. If the `indices` are not sorted, Qdrant will sort them internally, so you should not rely on the order of the elements. Sparse vectors must be named and can be uploaded in the same way as dense vectors.
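Before uploading, you may need to convert a dense array into the `(indices, values)` form. Here is a minimal Python illustration; the `to_sparse` helper is hypothetical and not part of the Qdrant client:

```python
def to_sparse(dense):
    # Keep only the non-zero elements, remembering their positions.
    indices = [i for i, value in enumerate(dense) if value != 0.0]
    values = [dense[i] for i in indices]
    return {"indices": indices, "values": values}

print(to_sparse([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0]))
# {'indices': [6, 7], 'values': [1.0, 2.0]}
```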
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "vector": {\ "text": {\ "indices": [6, 7],\ "values": [1.0, 2.0]\ }\ }\ },\ {\ "id": 2,\ "vector": {\ "text": {\ "indices": [1, 2, 4, 15, 33, 34],\ "values": [0.1, 0.2, 0.3, 0.4, 0.5]\ }\ }\ }\ ] } ``` ```python client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ vector={\ "text": models.SparseVector(\ indices=[6, 7],\ values=[1.0, 2.0],\ )\ },\ ),\ models.PointStruct(\ id=2,\ vector={\ "text": models.SparseVector(\ indices=[1, 2, 3, 4, 5],\ values=[0.1, 0.2, 0.3, 0.4, 0.5],\ )\ },\ ),\ ], ) ``` ```typescript client.upsert("{collection_name}", { points: [\ {\ id: 1,\ vector: {\ text: {\ indices: [6, 7],\ values: [1.0, 2.0],\ },\ },\ },\ {\ id: 2,\ vector: {\ text: {\ indices: [1, 2, 3, 4, 5],\ values: [0.1, 0.2, 0.3, 0.4, 0.5],\ },\ },\ },\ ], }); ``` ```rust use std::collections::HashMap; use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder, Vector}; use qdrant_client::Payload; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![\ PointStruct::new(\ 1,\ HashMap::from([("text".to_string(), vec![(6, 1.0), (7, 2.0)])]),\ Payload::default(),\ ),\ PointStruct::new(\ 2,\ HashMap::from([(\ "text".to_string(),\ vec![(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.4), (5, 0.5)],\ )]),\ Payload::default(),\ ),\ ], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.VectorFactory.vector; import io.qdrant.client.grpc.Points.NamedVectors; import io.qdrant.client.grpc.Points.PointStruct; import io.qdrant.client.grpc.Points.Vectors; client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors( Vectors.newBuilder() .setVectors( NamedVectors.newBuilder() .putAllVectors( Map.of( "text", vector(List.of(1.0f, 2.0f), List.of(6, 7)))) .build()) .build()) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors( Vectors.newBuilder() .setVectors( NamedVectors.newBuilder() .putAllVectors( Map.of( "text", vector( List.of(0.1f, 0.2f, 0.3f, 0.4f, 0.5f), List.of(1, 2, 3, 4, 5)))) .build()) .build()) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new Dictionary { ["text"] = ([1.0f, 2.0f], [6, 7]) } }, new() { Id = 2, Vectors = new Dictionary { ["text"] = ([0.1f, 0.2f, 0.3f, 0.4f, 0.5f], [1, 2, 3, 4, 5]) } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "text": qdrant.NewVectorSparse( []uint32{6, 7}, []float32{1.0, 2.0}), }), }, { Id: qdrant.NewIDNum(2), Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "text": qdrant.NewVectorSparse( []uint32{1, 2, 3, 4, 5}, []float32{0.1, 0.2, 0.3, 0.4, 0.5}), }), }, }, }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#modify-points) Modify points To change a point, you can modify its vectors or its payload. There are several ways to do this. 
### [Anchor](https://qdrant.tech/documentation/concepts/points/\#update-vectors) Update vectors _Available as of v1.2.0_ This method updates the specified vectors on the given points. Unspecified vectors are kept unchanged. All given points must exist. REST API ( [Schema](https://api.qdrant.tech/api-reference/points/update-vectors)): httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points/vectors { "points": [\ {\ "id": 1,\ "vector": {\ "image": [0.1, 0.2, 0.3, 0.4]\ }\ },\ {\ "id": 2,\ "vector": {\ "text": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]\ }\ }\ ] } ``` ```python client.update_vectors( collection_name="{collection_name}", points=[\ models.PointVectors(\ id=1,\ vector={\ "image": [0.1, 0.2, 0.3, 0.4],\ },\ ),\ models.PointVectors(\ id=2,\ vector={\ "text": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],\ },\ ),\ ], ) ``` ```typescript client.updateVectors("{collection_name}", { points: [\ {\ id: 1,\ vector: {\ image: [0.1, 0.2, 0.3, 0.4],\ },\ },\ {\ id: 2,\ vector: {\ text: [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],\ },\ },\ ], }); ``` ```rust use std::collections::HashMap; use qdrant_client::qdrant::{ PointVectors, UpdatePointVectorsBuilder, }; client .update_vectors( UpdatePointVectorsBuilder::new( "{collection_name}", vec![\ PointVectors {\ id: Some(1.into()),\ vectors: Some(\ HashMap::from([("image".to_string(), vec![0.1, 0.2, 0.3, 0.4])]).into(),\ ),\ },\ PointVectors {\ id: Some(2.into()),\ vectors: Some(\ HashMap::from([(\ "text".to_string(),\ vec![0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],\ )])\ .into(),\ ),\ },\ ], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.VectorFactory.vector; import static io.qdrant.client.VectorsFactory.namedVectors; client .updateVectorsAsync( "{collection_name}", List.of( PointVectors.newBuilder() .setId(id(1)) .setVectors(namedVectors(Map.of("image", vector(List.of(0.1f, 0.2f, 0.3f, 0.4f))))) .build(), PointVectors.newBuilder() .setId(id(2)) .setVectors( namedVectors( Map.of( "text", vector(List.of(0.9f, 0.8f, 0.7f, 0.6f, 0.5f, 0.4f, 0.3f, 0.2f))))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpdateVectorsAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = ("image", new float[] { 0.1f, 0.2f, 0.3f, 0.4f }) }, new() { Id = 2, Vectors = ("text", new float[] { 0.9f, 0.8f, 0.7f, 0.6f, 0.5f, 0.4f, 0.3f, 0.2f }) } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.UpdateVectors(context.Background(), &qdrant.UpdatePointVectors{ CollectionName: "{collection_name}", Points: []*qdrant.PointVectors{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "image": qdrant.NewVector(0.1, 0.2, 0.3, 0.4), }), }, { Id: qdrant.NewIDNum(2), Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "text": qdrant.NewVector(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2), }), }, }, }) ``` To update points and replace all of its vectors, see [uploading\\ points](https://qdrant.tech/documentation/concepts/points/#upload-points). ### [Anchor](https://qdrant.tech/documentation/concepts/points/\#delete-vectors) Delete vectors _Available as of v1.2.0_ This method deletes just the specified vectors from the given points. Other vectors are kept unchanged. 
Points are never deleted. REST API ( [Schema](https://api.qdrant.tech/api-reference/points/delete-vectors)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/vectors/delete { "points": [0, 3, 100], "vectors": ["text", "image"] } ``` ```python client.delete_vectors( collection_name="{collection_name}", points=[0, 3, 100], vectors=["text", "image"], ) ``` ```typescript client.deleteVectors("{collection_name}", { points: [0, 3, 10], vector: ["text", "image"], }); ``` ```rust use qdrant_client::qdrant::{ DeletePointVectorsBuilder, PointsIdsList, }; client .delete_vectors( DeletePointVectorsBuilder::new("{collection_name}") .points_selector(PointsIdsList { ids: vec![0.into(), 3.into(), 10.into()], }) .vectors(vec!["text".into(), "image".into()]) .wait(true), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; client .deleteVectorsAsync( "{collection_name}", List.of("text", "image"), List.of(id(0), id(3), id(10))) .get(); ``` ```csharp await client.DeleteVectorsAsync("{collection_name}", ["text", "image"], [0, 3, 10]); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client.DeleteVectors(context.Background(), &qdrant.DeletePointVectors{ CollectionName: "{collection_name}", PointsSelector: qdrant.NewPointsSelector( qdrant.NewIDNum(0), qdrant.NewIDNum(3), qdrant.NewIDNum(10)), Vectors: &qdrant.VectorsSelector{ Names: []string{"text", "image"}, }, }) ``` To delete entire points, see [deleting points](https://qdrant.tech/documentation/concepts/points/#delete-points). ### [Anchor](https://qdrant.tech/documentation/concepts/points/\#update-payload) Update payload Learn how to modify the payload of a point in the [Payload](https://qdrant.tech/documentation/concepts/payload/#update-payload) section. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#delete-points) Delete points REST API ( [Schema](https://api.qdrant.tech/api-reference/points/delete-points)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/delete { "points": [0, 3, 100] } ``` ```python client.delete( collection_name="{collection_name}", points_selector=models.PointIdsList( points=[0, 3, 100], ), ) ``` ```typescript client.delete("{collection_name}", { points: [0, 3, 100], }); ``` ```rust use qdrant_client::qdrant::{DeletePointsBuilder, PointsIdsList}; client .delete_points( DeletePointsBuilder::new("{collection_name}") .points(PointsIdsList { ids: vec![0.into(), 3.into(), 100.into()], }) .wait(true), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; client.deleteAsync("{collection_name}", List.of(id(0), id(3), id(100))); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.DeleteAsync(collectionName: "{collection_name}", ids: [0, 3, 100]); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Delete(context.Background(), &qdrant.DeletePoints{ CollectionName: "{collection_name}", Points: qdrant.NewPointsSelector( qdrant.NewIDNum(0), qdrant.NewIDNum(3), qdrant.NewIDNum(100), ), }) ``` Alternative way to specify which points to remove is to use filter. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/delete { "filter": { "must": [\ {\ "key": "color",\ "match": {\ "value": "red"\ }\ }\ ] } } ``` ```python client.delete( collection_name="{collection_name}", points_selector=models.FilterSelector( filter=models.Filter( must=[\ models.FieldCondition(\ key="color",\ match=models.MatchValue(value="red"),\ ),\ ], ) ), ) ``` ```typescript client.delete("{collection_name}", { filter: { must: [\ {\ key: "color",\ match: {\ value: "red",\ },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, DeletePointsBuilder, Filter}; client .delete_points( DeletePointsBuilder::new("{collection_name}") .points(Filter::must([Condition::matches(\ "color",\ "red".to_string(),\ )])) .wait(true), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; client .deleteAsync( "{collection_name}", Filter.newBuilder().addMust(matchKeyword("color", "red")).build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.DeleteAsync(collectionName: "{collection_name}", filter: MatchKeyword("color", "red")); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Delete(context.Background(), &qdrant.DeletePoints{ CollectionName: "{collection_name}", Points: qdrant.NewPointsSelectorFilter( &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("color", "red"), }, }, ), }) ``` This example removes all points with `{ "color": "red" }` from the collection. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#retrieve-points) Retrieve points There is a method for retrieving points by their ids. REST API ( [Schema](https://api.qdrant.tech/api-reference/points/get-points)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points { "ids": [0, 3, 100] } ``` ```python client.retrieve( collection_name="{collection_name}", ids=[0, 3, 100], ) ``` ```typescript client.retrieve("{collection_name}", { ids: [0, 3, 100], }); ``` ```rust use qdrant_client::qdrant::GetPointsBuilder; client .get_points(GetPointsBuilder::new( "{collection_name}", vec![0.into(), 30.into(), 100.into()], )) .await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; client .retrieveAsync("{collection_name}", List.of(id(0), id(30), id(100)), false, false, null) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.RetrieveAsync( collectionName: "{collection_name}", ids: [0, 30, 100], withPayload: false, withVectors: false ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Get(context.Background(), &qdrant.GetPoints{ CollectionName: "{collection_name}", Ids: []*qdrant.PointId{ qdrant.NewIDNum(0), qdrant.NewIDNum(3), qdrant.NewIDNum(100), }, }) ``` This method has additional parameters `with_vectors` and `with_payload`. Using these parameters, you can select parts of the point you want as a result. Excluding helps you not to waste traffic transmitting useless data. 
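For example, a minimal sketch with the Python client that fetches only the payloads of the same IDs as above, skipping vectors to keep the response small:

```python
records = client.retrieve(
    collection_name="{collection_name}",
    ids=[0, 3, 100],
    with_payload=True,   # include payloads in the response
    with_vectors=False,  # omit vectors to reduce transferred data
)
```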
The single point can also be retrieved via the API: REST API ( [Schema](https://api.qdrant.tech/api-reference/points/get-point)): ```http GET /collections/{collection_name}/points/{point_id} ``` ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#scroll-points) Scroll points Sometimes it might be necessary to get all stored points without knowing ids, or iterate over points that correspond to a filter. REST API ( [Schema](https://api.qdrant.tech/master/api-reference/points/scroll-points)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [\ {\ "key": "color",\ "match": {\ "value": "red"\ }\ }\ ] }, "limit": 1, "with_payload": true, "with_vector": false } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.FieldCondition(key="color", match=models.MatchValue(value="red")),\ ] ), limit=1, with_payload=True, with_vectors=False, ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ key: "color",\ match: {\ value: "red",\ },\ },\ ], }, limit: 1, with_payload: true, with_vector: false, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}") .filter(Filter::must([Condition::matches(\ "color",\ "red".to_string(),\ )])) .limit(1) .with_payload(true) .with_vectors(false), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.WithPayloadSelectorFactory.enable; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter(Filter.newBuilder().addMust(matchKeyword("color", "red")).build()) .setLimit(1) .setWithPayload(enable(true)) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: MatchKeyword("color", "red"), limit: 1, payloadSelector: true ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("color", "red"), }, }, Limit: qdrant.PtrOf(uint32(1)), WithPayload: qdrant.NewWithPayload(true), }) ``` Returns all point with `color` = `red`. ```json { "result": { "next_page_offset": 1, "points": [\ {\ "id": 0,\ "payload": {\ "color": "red"\ }\ }\ ] }, "status": "ok", "time": 0.0001 } ``` The Scroll API will return all points that match the filter in a page-by-page manner. All resulting points are sorted by ID. To query the next page it is necessary to specify the largest seen ID in the `offset` field. For convenience, this ID is also returned in the field `next_page_offset`. If the value of the `next_page_offset` field is `null` \- the last page is reached. ### [Anchor](https://qdrant.tech/documentation/concepts/points/\#order-points-by-payload-key) Order points by payload key _Available as of v1.8.0_ When using the [`scroll`](https://qdrant.tech/documentation/concepts/points/#scroll-points) API, you can sort the results by payload key. 
For example, you can retrieve points in chronological order if your payloads have a `"timestamp"` field, as is shown from the example below: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "limit": 15, "order_by": "timestamp", // <-- this! } ``` ```python client.scroll( collection_name="{collection_name}", limit=15, order_by="timestamp", # <-- this! ) ``` ```typescript client.scroll("{collection_name}", { limit: 15, order_by: "timestamp", // <-- this! }); ``` ```rust use qdrant_client::qdrant::{OrderByBuilder, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}") .limit(15) .order_by(OrderByBuilder::new("timestamp")), ) .await?; ``` ```java import io.qdrant.client.grpc.Points.OrderBy; import io.qdrant.client.grpc.Points.ScrollPoints; client.scrollAsync(ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setLimit(15) .setOrderBy(OrderBy.newBuilder().setKey("timestamp").build()) .build()).get(); ``` ```csharp await client.ScrollAsync("{collection_name}", limit: 15, orderBy: "timestamp"); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Limit: qdrant.PtrOf(uint32(15)), OrderBy: &qdrant.OrderBy{ Key: "timestamp", }, }) ``` You need to use the `order_by` `key` parameter to specify the payload key. Then you can add other fields to control the ordering, such as `direction` and `start_from`: httppythontypescriptrustjavacsharpgo ```http "order_by": { "key": "timestamp", "direction": "desc" // default is "asc" "start_from": 123, // start from this value } ``` ```python order_by=models.OrderBy( key="timestamp", direction="desc", # default is "asc" start_from=123, # start from this value ) ``` ```typescript order_by: { key: "timestamp", direction: "desc", // default is "asc" start_from: 123, // start from this value } ``` ```rust use qdrant_client::qdrant::{start_from::Value, Direction, OrderByBuilder}; OrderByBuilder::new("timestamp") .direction(Direction::Desc.into()) .start_from(Value::Integer(123)) .build(); ``` ```java import io.qdrant.client.grpc.Points.Direction; import io.qdrant.client.grpc.Points.OrderBy; import io.qdrant.client.grpc.Points.StartFrom; OrderBy.newBuilder() .setKey("timestamp") .setDirection(Direction.Desc) .setStartFrom(StartFrom.newBuilder() .setInteger(123) .build()) .build(); ``` ```csharp using Qdrant.Client.Grpc; new OrderBy { Key = "timestamp", Direction = Direction.Desc, StartFrom = 123 }; ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.OrderBy{ Key: "timestamp", Direction: qdrant.Direction_Desc.Enum(), StartFrom: qdrant.NewStartFromInt(123), } ``` When sorting is based on a non-unique value, it is not possible to rely on an ID offset. Thus, next\_page\_offset is not returned within the response. However, you can still do pagination by combining `"order_by": { "start_from": ... }` with a `{ "must_not": [{ "has_id": [...] }] }` filter. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#counting-points) Counting points _Available as of v0.8.4_ Sometimes it can be useful to know how many points fit the filter conditions without doing a real search. 
Among others, for example, we can highlight the following scenarios: - Evaluation of results size for faceted search - Determining the number of pages for pagination - Debugging the query execution speed REST API ( [Schema](https://api.qdrant.tech/master/api-reference/points/count-points)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/count { "filter": { "must": [\ {\ "key": "color",\ "match": {\ "value": "red"\ }\ }\ ] }, "exact": true } ``` ```python client.count( collection_name="{collection_name}", count_filter=models.Filter( must=[\ models.FieldCondition(key="color", match=models.MatchValue(value="red")),\ ] ), exact=True, ) ``` ```typescript client.count("{collection_name}", { filter: { must: [\ {\ key: "color",\ match: {\ value: "red",\ },\ },\ ], }, exact: true, }); ``` ```rust use qdrant_client::qdrant::{Condition, CountPointsBuilder, Filter}; client .count( CountPointsBuilder::new("{collection_name}") .filter(Filter::must([Condition::matches(\ "color",\ "red".to_string(),\ )])) .exact(true), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; client .countAsync( "{collection_name}", Filter.newBuilder().addMust(matchKeyword("color", "red")).build(), true) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.CountAsync( collectionName: "{collection_name}", filter: MatchKeyword("color", "red"), exact: true ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Count(context.Background(), &qdrant.CountPoints{ CollectionName: "midlib", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("color", "red"), }, }, }) ``` Returns number of counts matching given filtering conditions: ```json { "count": 3811 } ``` ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#batch-update) Batch update _Available as of v1.5.0_ You can batch multiple point update operations. This includes inserting, updating and deleting points, vectors and payload. A batch update request consists of a list of operations. These are executed in order. These operations can be batched: - [Upsert points](https://qdrant.tech/documentation/concepts/points/#upload-points): `upsert` or `UpsertOperation` - [Delete points](https://qdrant.tech/documentation/concepts/points/#delete-points): `delete_points` or `DeleteOperation` - [Update vectors](https://qdrant.tech/documentation/concepts/points/#update-vectors): `update_vectors` or `UpdateVectorsOperation` - [Delete vectors](https://qdrant.tech/documentation/concepts/points/#delete-vectors): `delete_vectors` or `DeleteVectorsOperation` - [Set payload](https://qdrant.tech/documentation/concepts/payload/#set-payload): `set_payload` or `SetPayloadOperation` - [Overwrite payload](https://qdrant.tech/documentation/concepts/payload/#overwrite-payload): `overwrite_payload` or `OverwritePayload` - [Delete payload](https://qdrant.tech/documentation/concepts/payload/#delete-payload-keys): `delete_payload` or `DeletePayloadOperation` - [Clear payload](https://qdrant.tech/documentation/concepts/payload/#clear-payload): `clear_payload` or `ClearPayloadOperation` The following example snippet makes use of all operations. 
REST API ( [Schema](https://api.qdrant.tech/master/api-reference/points/batch-update)): httppythontypescriptrustjava ```http POST /collections/{collection_name}/points/batch { "operations": [\ {\ "upsert": {\ "points": [\ {\ "id": 1,\ "vector": [1.0, 2.0, 3.0, 4.0],\ "payload": {}\ }\ ]\ }\ },\ {\ "update_vectors": {\ "points": [\ {\ "id": 1,\ "vector": [1.0, 2.0, 3.0, 4.0]\ }\ ]\ }\ },\ {\ "delete_vectors": {\ "points": [1],\ "vector": [""]\ }\ },\ {\ "overwrite_payload": {\ "payload": {\ "test_payload": "1"\ },\ "points": [1]\ }\ },\ {\ "set_payload": {\ "payload": {\ "test_payload_2": "2",\ "test_payload_3": "3"\ },\ "points": [1]\ }\ },\ {\ "delete_payload": {\ "keys": ["test_payload_2"],\ "points": [1]\ }\ },\ {\ "clear_payload": {\ "points": [1]\ }\ },\ {"delete": {"points": [1]}}\ ] } ``` ```python client.batch_update_points( collection_name="{collection_name}", update_operations=[\ models.UpsertOperation(\ upsert=models.PointsList(\ points=[\ models.PointStruct(\ id=1,\ vector=[1.0, 2.0, 3.0, 4.0],\ payload={},\ ),\ ]\ )\ ),\ models.UpdateVectorsOperation(\ update_vectors=models.UpdateVectors(\ points=[\ models.PointVectors(\ id=1,\ vector=[1.0, 2.0, 3.0, 4.0],\ )\ ]\ )\ ),\ models.DeleteVectorsOperation(\ delete_vectors=models.DeleteVectors(points=[1], vector=[""])\ ),\ models.OverwritePayloadOperation(\ overwrite_payload=models.SetPayload(\ payload={"test_payload": 1},\ points=[1],\ )\ ),\ models.SetPayloadOperation(\ set_payload=models.SetPayload(\ payload={\ "test_payload_2": 2,\ "test_payload_3": 3,\ },\ points=[1],\ )\ ),\ models.DeletePayloadOperation(\ delete_payload=models.DeletePayload(keys=["test_payload_2"], points=[1])\ ),\ models.ClearPayloadOperation(clear_payload=models.PointIdsList(points=[1])),\ models.DeleteOperation(delete=models.PointIdsList(points=[1])),\ ], ) ``` ```typescript client.batchUpdate("{collection_name}", { operations: [\ {\ upsert: {\ points: [\ {\ id: 1,\ vector: [1.0, 2.0, 3.0, 4.0],\ payload: {},\ },\ ],\ },\ },\ {\ update_vectors: {\ points: [\ {\ id: 1,\ vector: [1.0, 2.0, 3.0, 4.0],\ },\ ],\ },\ },\ {\ delete_vectors: {\ points: [1],\ vector: [""],\ },\ },\ {\ overwrite_payload: {\ payload: {\ test_payload: 1,\ },\ points: [1],\ },\ },\ {\ set_payload: {\ payload: {\ test_payload_2: 2,\ test_payload_3: 3,\ },\ points: [1],\ },\ },\ {\ delete_payload: {\ keys: ["test_payload_2"],\ points: [1],\ },\ },\ {\ clear_payload: {\ points: [1],\ },\ },\ {\ delete: {\ points: [1],\ },\ },\ ], }); ``` ```rust use std::collections::HashMap; use qdrant_client::qdrant::{ points_update_operation::{ ClearPayload, DeletePayload, DeletePoints, DeleteVectors, Operation, OverwritePayload, PointStructList, SetPayload, UpdateVectors, }, PointStruct, PointVectors, PointsUpdateOperation, UpdateBatchPointsBuilder, VectorsSelector, }; use qdrant_client::Payload; client .update_points_batch( UpdateBatchPointsBuilder::new( "{collection_name}", vec![\ PointsUpdateOperation {\ operation: Some(Operation::Upsert(PointStructList {\ points: vec![PointStruct::new(\ 1,\ vec![1.0, 2.0, 3.0, 4.0],\ Payload::default(),\ )],\ ..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::UpdateVectors(UpdateVectors {\ points: vec![PointVectors {\ id: Some(1.into()),\ vectors: Some(vec![1.0, 2.0, 3.0, 4.0].into()),\ }],\ ..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::DeleteVectors(DeleteVectors {\ points_selector: Some(vec![1.into()].into()),\ vectors: Some(VectorsSelector {\ names: vec!["".into()],\ }),\ 
..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::OverwritePayload(OverwritePayload {\ points_selector: Some(vec![1.into()].into()),\ payload: HashMap::from([("test_payload".to_string(), 1.into())]),\ ..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::SetPayload(SetPayload {\ points_selector: Some(vec![1.into()].into()),\ payload: HashMap::from([\ ("test_payload_2".to_string(), 2.into()),\ ("test_payload_3".to_string(), 3.into()),\ ]),\ ..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::DeletePayload(DeletePayload {\ points_selector: Some(vec![1.into()].into()),\ keys: vec!["test_payload_2".to_string()],\ ..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::ClearPayload(ClearPayload {\ points: Some(vec![1.into()].into()),\ ..Default::default()\ })),\ },\ PointsUpdateOperation {\ operation: Some(Operation::DeletePoints(DeletePoints {\ points: Some(vec![1.into()].into()),\ ..Default::default()\ })),\ },\ ], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.grpc.Points.PointStruct; import io.qdrant.client.grpc.Points.PointVectors; import io.qdrant.client.grpc.Points.PointsIdsList; import io.qdrant.client.grpc.Points.PointsSelector; import io.qdrant.client.grpc.Points.PointsUpdateOperation; import io.qdrant.client.grpc.Points.PointsUpdateOperation.ClearPayload; import io.qdrant.client.grpc.Points.PointsUpdateOperation.DeletePayload; import io.qdrant.client.grpc.Points.PointsUpdateOperation.DeletePoints; import io.qdrant.client.grpc.Points.PointsUpdateOperation.DeleteVectors; import io.qdrant.client.grpc.Points.PointsUpdateOperation.PointStructList; import io.qdrant.client.grpc.Points.PointsUpdateOperation.SetPayload; import io.qdrant.client.grpc.Points.PointsUpdateOperation.UpdateVectors; import io.qdrant.client.grpc.Points.VectorsSelector; client .batchUpdateAsync( "{collection_name}", List.of( PointsUpdateOperation.newBuilder() .setUpsert( PointStructList.newBuilder() .addPoints( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(1.0f, 2.0f, 3.0f, 4.0f)) .build()) .build()) .build(), PointsUpdateOperation.newBuilder() .setUpdateVectors( UpdateVectors.newBuilder() .addPoints( PointVectors.newBuilder() .setId(id(1)) .setVectors(vectors(1.0f, 2.0f, 3.0f, 4.0f)) .build()) .build()) .build(), PointsUpdateOperation.newBuilder() .setDeleteVectors( DeleteVectors.newBuilder() .setPointsSelector( PointsSelector.newBuilder() .setPoints(PointsIdsList.newBuilder().addIds(id(1)).build()) .build()) .setVectors(VectorsSelector.newBuilder().addNames("").build()) .build()) .build(), PointsUpdateOperation.newBuilder() .setOverwritePayload( SetPayload.newBuilder() .setPointsSelector( PointsSelector.newBuilder() .setPoints(PointsIdsList.newBuilder().addIds(id(1)).build()) .build()) .putAllPayload(Map.of("test_payload", value(1))) .build()) .build(), PointsUpdateOperation.newBuilder() .setSetPayload( SetPayload.newBuilder() .setPointsSelector( PointsSelector.newBuilder() .setPoints(PointsIdsList.newBuilder().addIds(id(1)).build()) .build()) .putAllPayload( Map.of("test_payload_2", value(2), "test_payload_3", value(3))) .build()) .build(), PointsUpdateOperation.newBuilder() .setDeletePayload( DeletePayload.newBuilder() .setPointsSelector( PointsSelector.newBuilder() 
.setPoints(PointsIdsList.newBuilder().addIds(id(1)).build()) .build()) .addKeys("test_payload_2") .build()) .build(), PointsUpdateOperation.newBuilder() .setClearPayload( ClearPayload.newBuilder() .setPoints( PointsSelector.newBuilder() .setPoints(PointsIdsList.newBuilder().addIds(id(1)).build()) .build()) .build()) .build(), PointsUpdateOperation.newBuilder() .setDeletePoints( DeletePoints.newBuilder() .setPoints( PointsSelector.newBuilder() .setPoints(PointsIdsList.newBuilder().addIds(id(1)).build()) .build()) .build()) .build())) .get(); ``` To batch many points with a single operation type, please use batching functionality in that operation directly. ## [Anchor](https://qdrant.tech/documentation/concepts/points/\#awaiting-result) Awaiting result If the API is called with the `&wait=false` parameter, or if it is not explicitly specified, the client will receive an acknowledgment of receiving data: ```json { "result": { "operation_id": 123, "status": "acknowledged" }, "status": "ok", "time": 0.000206061 } ``` This response does not mean that the data is available for retrieval yet. This uses a form of eventual consistency. It may take a short amount of time before it is actually processed as updating the collection happens in the background. In fact, it is possible that such request eventually fails. If inserting a lot of vectors, we also recommend using asynchronous requests to take advantage of pipelining. If the logic of your application requires a guarantee that the vector will be available for searching immediately after the API responds, then use the flag `?wait=true`. In this case, the API will return the result only after the operation is finished: ```json { "result": { "operation_id": 0, "status": "completed" }, "status": "ok", "time": 0.000206061 } ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/points.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/points.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-42-lllmstxt|> ## cohere-rag-connector - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Implement Cohere RAG connector --- # [Anchor](https://qdrant.tech/documentation/examples/cohere-rag-connector/\#implement-custom-connector-for-cohere-rag) Implement custom connector for Cohere RAG | Time: 45 min | Level: Intermediate | | | | --- | --- | --- | --- | The usual approach to implementing Retrieval Augmented Generation requires users to build their prompts with the relevant context the LLM may rely on, and manually sending them to the model. Cohere is quite unique here, as their models can now speak to the external tools and extract meaningful data on their own. You can virtually connect any data source and let the Cohere LLM know how to access it. Obviously, vector search goes well with LLMs, and enabling semantic search over your data is a typical case. 
Cohere RAG has lots of interesting features, such as inline citations, which help you to refer to the specific parts of the documents used to generate the response. ![Cohere RAG citations](https://qdrant.tech/documentation/tutorials/cohere-rag-connector/cohere-rag-citations.png) _Source: [https://docs.cohere.com/docs/retrieval-augmented-generation-rag](https://docs.cohere.com/docs/retrieval-augmented-generation-rag)_ The connectors have to implement a specific interface and expose the data source as HTTP REST API. Cohere documentation [describes a general process of creating a connector](https://docs.cohere.com/v1/docs/creating-and-deploying-a-connector). This tutorial guides you step by step on building such a service around Qdrant. ## [Anchor](https://qdrant.tech/documentation/examples/cohere-rag-connector/\#qdrant-connector) Qdrant connector You probably already have some collections you would like to bring to the LLM. Maybe your pipeline was set up using some of the popular libraries such as Langchain, Llama Index, or Haystack. Cohere connectors may implement even more complex logic, e.g. hybrid search. In our case, we are going to start with a fresh Qdrant collection, index data using Cohere Embed v3, build the connector, and finally connect it with the [Command-R model](https://txt.cohere.com/command-r/). ### [Anchor](https://qdrant.tech/documentation/examples/cohere-rag-connector/\#building-the-collection) Building the collection First things first, let’s build a collection and configure it for the Cohere `embed-multilingual-v3.0` model. It produces 1024-dimensional embeddings, and we can choose any of the distance metrics available in Qdrant. Our connector will act as a personal assistant of a software engineer, and it will expose our notes to suggest the priorities or actions to perform. ```python from qdrant_client import QdrantClient, models client = QdrantClient( "https://my-cluster.cloud.qdrant.io:6333", api_key="my-api-key", ) client.create_collection( collection_name="personal-notes", vectors_config=models.VectorParams( size=1024, distance=models.Distance.DOT, ), ) ``` Our notes will be represented as simple JSON objects with a `title` and `text` of the specific note. The embeddings will be created from the `text` field only. ```python notes = [\ {\ "title": "Project Alpha Review",\ "text": "Review the current progress of Project Alpha, focusing on the integration of the new API. Check for any compatibility issues with the existing system and document the steps needed to resolve them. Schedule a meeting with the development team to discuss the timeline and any potential roadblocks."\ },\ {\ "title": "Learning Path Update",\ "text": "Update the learning path document with the latest courses on React and Node.js from Pluralsight. Schedule at least 2 hours weekly to dedicate to these courses. Aim to complete the React course by the end of the month and the Node.js course by mid-next month."\ },\ {\ "title": "Weekly Team Meeting Agenda",\ "text": "Prepare the agenda for the weekly team meeting. Include the following topics: project updates, review of the sprint backlog, discussion on the new feature requests, and a brainstorming session for improving remote work practices. Send out the agenda and the Zoom link by Thursday afternoon."\ },\ {\ "title": "Code Review Process Improvement",\ "text": "Analyze the current code review process to identify inefficiencies. Consider adopting a new tool that integrates with our version control system. 
Explore options such as GitHub Actions for automating parts of the process. Draft a proposal with recommendations and share it with the team for feedback."\ },\ {\ "title": "Cloud Migration Strategy",\ "text": "Draft a plan for migrating our current on-premise infrastructure to the cloud. The plan should cover the selection of a cloud provider, cost analysis, and a phased migration approach. Identify critical applications for the first phase and any potential risks or challenges. Schedule a meeting with the IT department to discuss the plan."\ },\ {\ "title": "Quarterly Goals Review",\ "text": "Review the progress towards the quarterly goals. Update the documentation to reflect any completed objectives and outline steps for any remaining goals. Schedule individual meetings with team members to discuss their contributions and any support they might need to achieve their targets."\ },\ {\ "title": "Personal Development Plan",\ "text": "Reflect on the past quarter's achievements and areas for improvement. Update the personal development plan to include new technical skills to learn, certifications to pursue, and networking events to attend. Set realistic timelines and check-in points to monitor progress."\ },\ {\ "title": "End-of-Year Performance Reviews",\ "text": "Start preparing for the end-of-year performance reviews. Collect feedback from peers and managers, review project contributions, and document achievements. Consider areas for improvement and set goals for the next year. Schedule preliminary discussions with each team member to gather their self-assessments."\ },\ {\ "title": "Technology Stack Evaluation",\ "text": "Conduct an evaluation of our current technology stack to identify any outdated technologies or tools that could be replaced for better performance and productivity. Research emerging technologies that might benefit our projects. Prepare a report with findings and recommendations to present to the management team."\ },\ {\ "title": "Team Building Event Planning",\ "text": "Plan a team-building event for the next quarter. Consider activities that can be done remotely, such as virtual escape rooms or online game nights. Survey the team for their preferences and availability. Draft a budget proposal for the event and submit it for approval."\ }\ ] ``` Storing the embeddings along with the metadata is fairly simple. ```python import cohere import uuid cohere_client = cohere.Client(api_key="my-cohere-api-key") response = cohere_client.embed( texts=[\ note.get("text")\ for note in notes\ ], model="embed-multilingual-v3.0", input_type="search_document", ) client.upload_points( collection_name="personal-notes", points=[\ models.PointStruct(\ id=uuid.uuid4().hex,\ vector=embedding,\ payload=note,\ )\ for note, embedding in zip(notes, response.embeddings)\ ] ) ``` Our collection is now ready to be searched over. In the real world, the set of notes would be changing over time, so the ingestion process won’t be as straightforward. This data is not yet exposed to the LLM, but we will build the connector in the next step. ### [Anchor](https://qdrant.tech/documentation/examples/cohere-rag-connector/\#connector-web-service) Connector web service [FastAPI](https://fastapi.tiangolo.com/) is a modern web framework and perfect a choice for a simple HTTP API. We are going to use it for the purposes of our connector. There will be just one endpoint, as required by the model. It will accept POST requests at the `/search` path. There is a single `query` parameter required. 
Let’s define a corresponding model. ```python from pydantic import BaseModel class SearchQuery(BaseModel): query: str ``` RAG connector does not have to return the documents in any specific format. There are [some good practices to follow](https://docs.cohere.com/v1/docs/creating-and-deploying-a-connector#configure-the-connection-between-the-connector-and-the-chat-api), but Cohere models are quite flexible here. Results just have to be returned as JSON, with a list of objects in a `results` property of the output. We will use the same document structure as we did for the Qdrant payloads, so there is no conversion required. That requires two additional models to be created. ```python from typing import List class Document(BaseModel): title: str text: str class SearchResults(BaseModel): results: List[Document] ``` Once our model classes are ready, we can implement the logic that will get the query and provide the notes that are relevant to it. Please note the LLM is not going to define the number of documents to be returned. That’s completely up to you how many of them you want to bring to the context. There are two services we need to interact with - Qdrant server and Cohere API. FastAPI has a concept of a [dependency\\ injection](https://fastapi.tiangolo.com/tutorial/dependencies/#dependencies), and we will use it to provide both clients into the implementation. In case of queries, we need to set the `input_type` to `search_query` in the calls to Cohere API. ```python from fastapi import FastAPI, Depends from typing import Annotated app = FastAPI() def client() -> QdrantClient: return QdrantClient(config.QDRANT_URL, api_key=config.QDRANT_API_KEY) def cohere_client() -> cohere.Client: return cohere.Client(api_key=config.COHERE_API_KEY) @app.post("/search") def search( query: SearchQuery, client: Annotated[QdrantClient, Depends(client)], cohere_client: Annotated[cohere.Client, Depends(cohere_client)], ) -> SearchResults: response = cohere_client.embed( texts=[query.query], model="embed-multilingual-v3.0", input_type="search_query", ) results = client.query_points( collection_name="personal-notes", query=response.embeddings[0], limit=2, ).points return SearchResults( results=[\ Document(**point.payload)\ for point in results\ ] ) ``` Our app might be launched locally for the development purposes, given we have the `uvicorn` server installed: ```shell uvicorn main:app ``` FastAPI exposes an interactive documentation at `http://localhost:8000/docs`, where we can test our endpoint. The `/search` endpoint is available there. ![FastAPI documentation](https://qdrant.tech/documentation/tutorials/cohere-rag-connector/fastapi-openapi.png) We can interact with it and check the documents that will be returned for a specific query. For example, we want to know recall what we are supposed to do regarding the infrastructure for your projects. ```shell curl -X "POST" \ -H "Content-type: application/json" \ -d '{"query": "Is there anything I have to do regarding the project infrastructure?"}' \ "http://localhost:8000/search" ``` The output should look like following: ```json { "results": [\ {\ "title": "Cloud Migration Strategy",\ "text": "Draft a plan for migrating our current on-premise infrastructure to the cloud. The plan should cover the selection of a cloud provider, cost analysis, and a phased migration approach. Identify critical applications for the first phase and any potential risks or challenges. 
Schedule a meeting with the IT department to discuss the plan."\ },\ {\ "title": "Project Alpha Review",\ "text": "Review the current progress of Project Alpha, focusing on the integration of the new API. Check for any compatibility issues with the existing system and document the steps needed to resolve them. Schedule a meeting with the development team to discuss the timeline and any potential roadblocks."\ }\ ] } ``` ### [Anchor](https://qdrant.tech/documentation/examples/cohere-rag-connector/\#connecting-to-command-r) Connecting to Command-R Our web service is implemented, yet running only on our local machine. It has to be exposed to the public before Command-R can interact with it. For a quick experiment, it might be enough to set up tunneling using services such as [ngrok](https://ngrok.com/). We won’t cover all the details in the tutorial, but their [Quickstart](https://ngrok.com/docs/guides/getting-started/) is a great resource describing the process step-by-step. Alternatively, you can also deploy the service with a public URL. Once it’s done, we can create the connector first, and then tell the model to use it, while interacting through the chat API. Creating a connector is a single call to Cohere client: ```python connector_response = cohere_client.connectors.create( name="personal-notes", url="https:/this-is-my-domain.app/search", ) ``` The `connector_response.connector` will be a descriptor, with `id` being one of the attributes. We’ll use this identifier for our interactions like this: ```python response = cohere_client.chat( message=( "Is there anything I have to do regarding the project infrastructure? " "Please mention the tasks briefly." ), connectors=[\ cohere.ChatConnector(id=connector_response.connector.id)\ ], model="command-r", ) ``` We changed the `model` to `command-r`, as this is currently the best Cohere model available to public. The `response.text` is the output of the model: ```text Here are some of the tasks related to project infrastructure that you might have to perform: - You need to draft a plan for migrating your on-premise infrastructure to the cloud and come up with a plan for the selection of a cloud provider, cost analysis, and a gradual migration approach. - It's important to evaluate your current technology stack to identify any outdated technologies. You should also research emerging technologies and the benefits they could bring to your projects. ``` You only need to create a specific connector once! Please do not call `cohere_client.connectors.create` for every single message you send to the `chat` method. ## [Anchor](https://qdrant.tech/documentation/examples/cohere-rag-connector/\#wrapping-up) Wrapping up We have built a Cohere RAG connector that integrates with your existing knowledge base stored in Qdrant. We covered just the basic flow, but in real world scenarios, you should also consider e.g. [building the authentication\\ system](https://docs.cohere.com/docs/connector-authentication) to prevent unauthorized access. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/cohere-rag-connector.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
<|page-43-lllmstxt|> ## platform-deployment-options - [Documentation](https://qdrant.tech/documentation/) - [Hybrid cloud](https://qdrant.tech/documentation/hybrid-cloud/) - Deployment Platforms --- # [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#qdrant-hybrid-cloud-hosting-platforms--deployment-options) Qdrant Hybrid Cloud: Hosting Platforms & Deployment Options This page provides an overview of how to deploy Qdrant Hybrid Cloud on various managed Kubernetes platforms. For a general list of prerequisites and installation steps, see our [Hybrid Cloud setup guide](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). This platform-specific documentation also applies to Qdrant Private Cloud. ![Akamai](https://qdrant.tech/documentation/cloud/cloud-providers/akamai.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#akamai-linode) Akamai (Linode) [The Linode Kubernetes Engine (LKE)](https://www.linode.com/products/kubernetes/) is a managed container orchestration engine built on top of Kubernetes. LKE enables you to quickly deploy and manage your containerized applications without needing to build (and maintain) your own Kubernetes cluster. All LKE instances are equipped with a fully managed control plane at no additional cost. First, consult Linode’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on LKE**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-linode-kubernetes-engine) More on Linode Kubernetes Engine - [Getting Started with LKE](https://www.linode.com/docs/products/compute/kubernetes/get-started/) - [LKE Guides](https://www.linode.com/docs/products/compute/kubernetes/guides/) - [LKE API Reference](https://www.linode.com/docs/api/) At the time of writing, Linode [does not support CSI Volume Snapshots](https://github.com/linode/linode-blockstorage-csi-driver/issues/107). ![AWS](https://qdrant.tech/documentation/cloud/cloud-providers/aws.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#amazon-web-services-aws) Amazon Web Services (AWS) [Amazon Elastic Kubernetes Service (Amazon EKS)](https://aws.amazon.com/eks/) is a managed service to run Kubernetes in the AWS cloud and on-premises data centers, which can then be paired with Qdrant’s hybrid cloud. With Amazon EKS, you can take advantage of all the performance, scale, reliability, and availability of AWS infrastructure, as well as integrations with AWS networking and security services. First, consult AWS’ managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on AWS**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). For a good balance between performance and cost, we recommend: - Depending on your cluster resource configuration, either general purpose (m6\*, m7\*, or m8\*), memory optimized (r6\*, r7\*, or r8\*) or cpu optimized (c6\*, c7\*, or c8\*) instance types. Qdrant Hybrid Cloud also supports AWS Graviton ARM64 instances.
- At least gp3 EBS volumes for storage ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-amazon-elastic-kubernetes-service) More on Amazon Elastic Kubernetes Service - [Getting Started with Amazon EKS](https://docs.aws.amazon.com/eks/) - [Amazon EKS User Guide](https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html) - [Amazon EKS API Reference](https://docs.aws.amazon.com/eks/latest/APIReference/Welcome.html) Your EKS cluster needs the EKS EBS CSI driver or a similar storage driver: - [Amazon EBS CSI Driver](https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html) To allow vertical scaling, you need a StorageClass with volume expansion enabled: - [Amazon EBS CSI Volume Resizing](https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/examples/kubernetes/resizing/README.md) ```yaml apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: annotations: storageclass.kubernetes.io/is-default-class: "true" name: ebs-sc provisioner: ebs.csi.aws.com reclaimPolicy: Delete volumeBindingMode: WaitForFirstConsumer allowVolumeExpansion: true ``` To allow backups and restores, your EKS cluster needs the CSI snapshot controller: - [Amazon EBS CSI Snapshot Controller](https://docs.aws.amazon.com/eks/latest/userguide/csi-snapshot-controller.html) And you need to create a VolumeSnapshotClass: ```yaml apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotClass metadata: name: csi-snapclass deletionPolicy: Delete driver: ebs.csi.aws.com ``` ![Civo](https://qdrant.tech/documentation/cloud/cloud-providers/civo.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#civo) Civo [Civo Kubernetes](https://www.civo.com/kubernetes) is a robust, scalable, and managed Kubernetes service. Civo supplies a CNCF-compliant Kubernetes cluster and makes it easy to provide standard Kubernetes applications and containerized workloads. User-defined Kubernetes clusters can be created as self-service without complications using the Civo Portal. First, consult Civo’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on Civo**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-civo-kubernetes) More on Civo Kubernetes - [Getting Started with Civo Kubernetes](https://www.civo.com/docs/kubernetes) - [Civo Tutorials](https://www.civo.com/learn) - [Frequently Asked Questions on Civo](https://www.civo.com/docs/faq) To allow backups and restores, you need to create a VolumeSnapshotClass: ```yaml apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotClass metadata: name: csi-snapclass deletionPolicy: Delete driver: csi.civo.com ``` ![Digital Ocean](https://qdrant.tech/documentation/cloud/cloud-providers/digital-ocean.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#digital-ocean) Digital Ocean [DigitalOcean Kubernetes (DOKS)](https://www.digitalocean.com/products/kubernetes) is a managed Kubernetes service that lets you deploy Kubernetes clusters without the complexities of handling the control plane and containerized infrastructure. Clusters are compatible with standard Kubernetes toolchains and integrate natively with DigitalOcean Load Balancers and volumes. First, consult Digital Ocean’s managed Kubernetes instructions below. 
Then, **to set up Qdrant Hybrid Cloud on DigitalOcean**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-digitalocean-kubernetes) More on DigitalOcean Kubernetes - [Getting Started with DOKS](https://docs.digitalocean.com/products/kubernetes/getting-started/quickstart/) - [DOKS - How To Guides](https://docs.digitalocean.com/products/kubernetes/how-to/) - [DOKS - Reference Manual](https://docs.digitalocean.com/products/kubernetes/reference/) ![Gcore](https://qdrant.tech/documentation/cloud/cloud-providers/gcore.svg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#gcore) Gcore [Gcore Managed Kubernetes](https://gcore.com/cloud/managed-kubernetes) is a managed container orchestration engine built on top of Kubernetes. Gcore enables you to quickly deploy and manage your containerized applications without needing to build (and maintain) your own Kubernetes cluster. All Gcore instances are equipped with a fully managed control plane at no additional cost. First, consult Gcore’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on Gcore**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-gcore-kubernetes-engine) More on Gcore Kubernetes Engine - [Getting Started with Kubernetes on Gcore](https://gcore.com/docs/cloud/kubernetes/about-gcore-kubernetes) ![Google Cloud Platform](https://qdrant.tech/documentation/cloud/cloud-providers/gcp.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#google-cloud-platform-gcp) Google Cloud Platform (GCP) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) is a managed Kubernetes service that you can use to deploy and operate containerized applications at scale using Google’s infrastructure. GKE provides the operational power of Kubernetes while managing many of the underlying components, such as the control plane and nodes, for you. First, consult GCP’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on GCP**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). For a good balance between performance and cost, we recommend: - Depending on your cluster resource configuration, either general purpose (standard), memory optimized (highmem) or cpu optimized (highcpu) instance types of at least 2nd generation. Qdrant Hybrid Cloud also supports ARM64 instances.
- At least pd-balanced disks for storage ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-the-google-kubernetes-engine) More on the Google Kubernetes Engine - [Getting Started with GKE](https://cloud.google.com/kubernetes-engine/docs/quickstart) - [GKE Tutorials](https://cloud.google.com/kubernetes-engine/docs/tutorials) - [GKE Documentation](https://cloud.google.com/kubernetes-engine/docs/) To allow backups and restores, your GKE cluster needs the CSI VolumeSnapshot controller and class: - [Google GKE Volume Snapshots](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/volume-snapshots) ```yaml apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotClass metadata: name: csi-snapclass deletionPolicy: Delete driver: pd.csi.storage.gke.io ``` ![Microsoft Azure](https://qdrant.tech/documentation/cloud/cloud-providers/azure.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#mircrosoft-azure) Microsoft Azure With [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-in/products/kubernetes-service), you can start developing and deploying cloud-native apps in Azure, data centres, or at the edge. Get unified management and governance for on-premises, edge, and multi-cloud Kubernetes clusters. Interoperate with Azure security, identity, cost management, and migration services. First, consult Azure’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on Azure**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). For a good balance between performance and cost, we recommend: - Depending on your cluster resource configuration, either general purpose (D-family), memory optimized (E-family) or cpu optimized (F-family) instance types. Qdrant Hybrid Cloud also supports Azure Cobalt ARM64 instances. - At least Premium SSD v2 disks for storage ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-azure-kubernetes-service) More on Azure Kubernetes Service - [Getting Started with AKS](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/containers/aks-start-here) - [AKS Documentation](https://learn.microsoft.com/en-in/azure/aks/) - [Best Practices with AKS](https://learn.microsoft.com/en-in/azure/aks/best-practices) To allow backups and restores, your AKS cluster needs the CSI VolumeSnapshot controller and class: - [Azure AKS Volume Snapshots](https://learn.microsoft.com/en-us/azure/aks/azure-disk-csi#create-a-volume-snapshot) ```yaml apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotClass metadata: name: csi-snapclass deletionPolicy: Delete driver: disk.csi.azure.com ``` ![Oracle Cloud Infrastructure](https://qdrant.tech/documentation/cloud/cloud-providers/oracle.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#oracle-cloud-infrastructure) Oracle Cloud Infrastructure [Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE)](https://www.oracle.com/in/cloud/cloud-native/container-engine-kubernetes/) is a managed Kubernetes solution that enables you to deploy Kubernetes clusters while ensuring stable operations for both the control plane and the worker nodes through automatic scaling, upgrades, and security patching. Additionally, OKE offers a completely serverless Kubernetes experience with virtual nodes. First, consult OCI’s managed Kubernetes instructions below.
Then, **to set up Qdrant Hybrid Cloud on OCI**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-oci-container-engine) More on OCI Container Engine - [Getting Started with OCI](https://docs.oracle.com/en-us/iaas/Content/ContEng/home.htm) - [Frequently Asked Questions on OCI](https://www.oracle.com/in/cloud/cloud-native/container-engine-kubernetes/faq/) - [OCI Product Updates](https://docs.oracle.com/en-us/iaas/releasenotes/services/conteng/) To allow backups and restores, your OCI cluster needs the CSI VolumeSnapshot controller and class: - [Prerequisites for Creating Volume Snapshots](https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV.htm#contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV-PV_From_Snapshot_CSI__section_volume-snapshot-prerequisites) ```yaml apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotClass metadata: name: csi-snapclass deletionPolicy: Delete driver: blockvolume.csi.oraclecloud.com ``` ![OVHcloud](https://qdrant.tech/documentation/cloud/cloud-providers/ovh.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#ovhcloud) OVHcloud [Service Managed Kubernetes](https://www.ovhcloud.com/en-in/public-cloud/kubernetes/), powered by OVH Public Cloud Instances, a leading European cloud provider. With OVHcloud Load Balancers and disks built in. OVHcloud Managed Kubernetes provides high availability, compliance, and CNCF conformance, allowing you to focus on your containerized software layers with total reversibility. First, consult OVHcloud’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on OVHcloud**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-service-managed-kubernetes-by-ovhcloud) More on Service Managed Kubernetes by OVHcloud - [Getting Started with OVH Managed Kubernetes](https://help.ovhcloud.com/csm/en-in-documentation-public-cloud-containers-orchestration-managed-kubernetes-k8s-getting-started) - [OVH Managed Kubernetes Documentation](https://help.ovhcloud.com/csm/en-in-documentation-public-cloud-containers-orchestration-managed-kubernetes-k8s) - [OVH Managed Kubernetes Tutorials](https://help.ovhcloud.com/csm/en-in-documentation-public-cloud-containers-orchestration-managed-kubernetes-k8s-tutorials) ![Red Hat](https://qdrant.tech/documentation/cloud/cloud-providers/redhat.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#red-hat-openshift) Red Hat OpenShift [Red Hat OpenShift Kubernetes Engine](https://www.redhat.com/en/technologies/cloud-computing/openshift/kubernetes-engine) provides you with the basic functionality of Red Hat OpenShift. It offers a subset of the features that Red Hat OpenShift Container Platform offers, like full access to an enterprise-ready Kubernetes environment and an extensive compatibility test matrix with many of the software elements that you might use in your data centre. First, consult Red Hat’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on Red Hat OpenShift**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). 
### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-openshift-kubernetes-engine) More on OpenShift Kubernetes Engine - [Getting Started with Red Hat OpenShift Kubernetes](https://docs.openshift.com/container-platform/4.15/getting_started/kubernetes-overview.html) - [Red Hat OpenShift Kubernetes Documentation](https://docs.openshift.com/container-platform/4.15/welcome/index.html) - [Installing on Container Platforms](https://access.redhat.com/documentation/en-us/openshift_container_platform/4.5/html/installing/index) Qdrant databases need a persistent storage solution. See [Openshift Storage Overview](https://docs.openshift.com/container-platform/4.15/storage/index.html). To allow vertical scaling, you need a StorageClass with [volume expansion enabled](https://docs.openshift.com/container-platform/4.15/storage/expanding-persistent-volumes.html). To allow backups and restores, your OpenShift cluster needs the [CSI snapshot controller](https://docs.openshift.com/container-platform/4.15/storage/container_storage_interface/persistent-storage-csi-snapshots.html), and you need to create a VolumeSnapshotClass. ![Scaleway](https://qdrant.tech/documentation/cloud/cloud-providers/scaleway.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#scaleway) Scaleway [Scaleway Kapsule](https://www.scaleway.com/en/kubernetes-kapsule/) and [Kosmos](https://www.scaleway.com/en/kubernetes-kosmos/) are managed Kubernetes services from [Scaleway](https://www.scaleway.com/en/). They abstract away the complexities of managing and operating a Kubernetes cluster. The primary difference being, Kapsule clusters are composed solely of Scaleway Instances. Whereas, a Kosmos cluster is a managed multi-cloud Kubernetes engine that allows you to connect instances from any cloud provider to a single managed Control-Plane. First, consult Scaleway’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on Scaleway**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-scaleway-kubernetes) More on Scaleway Kubernetes - [Getting Started with Scaleway Kubernetes](https://www.scaleway.com/en/docs/containers/kubernetes/quickstart/#how-to-add-a-scaleway-pool-to-a-kubernetes-cluster) - [Scaleway Kubernetes Documentation](https://www.scaleway.com/en/docs/containers/kubernetes/) - [Frequently Asked Questions on Scaleway Kubernetes](https://www.scaleway.com/en/docs/faq/kubernetes/) ![STACKIT](https://qdrant.tech/documentation/cloud/cloud-providers/stackit.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#stackit) STACKIT [STACKIT Kubernetes Engine (SKE)](https://www.stackit.de/en/product/kubernetes/) is a robust, scalable, and managed Kubernetes service. SKE supplies a CNCF-compliant Kubernetes cluster and makes it easy to provide standard Kubernetes applications and containerized workloads. User-defined Kubernetes clusters can be created as self-service without complications using the STACKIT Portal. First, consult STACKIT’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on STACKIT**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). 
### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-stackit-kubernetes-engine) More on STACKIT Kubernetes Engine - [Getting Started with SKE](https://docs.stackit.cloud/stackit/en/getting-started-ske-10125565.html) - [SKE Tutorials](https://docs.stackit.cloud/stackit/en/tutorials-ske-66683162.html) - [Frequently Asked Questions on SKE](https://docs.stackit.cloud/stackit/en/faq-known-issues-of-ske-28476393.html) To allow backups and restores, you need to create a VolumeSnapshotClass: ```yaml apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshotClass metadata: name: csi-snapclass deletionPolicy: Delete driver: cinder.csi.openstack.org ``` ![Vultr](https://qdrant.tech/documentation/cloud/cloud-providers/vultr.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#vultr) Vultr [Vultr Kubernetes Engine (VKE)](https://www.vultr.com/kubernetes/) is a fully-managed product offering with predictable pricing that makes Kubernetes easy to use. Vultr manages the control plane and worker nodes and provides integration with other managed services such as Load Balancers, Block Storage, and DNS. First, consult Vultr’s managed Kubernetes instructions below. Then, **to set up Qdrant Hybrid Cloud on Vultr**, follow our [step-by-step documentation](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/). ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#more-on-vultr-kubernetes-engine) More on Vultr Kubernetes Engine - [VKE Guide](https://docs.vultr.com/vultr-kubernetes-engine) - [VKE Documentation](https://docs.vultr.com/) - [Frequently Asked Questions on VKE](https://docs.vultr.com/vultr-kubernetes-engine#frequently-asked-questions) At the time of writing, Vultr does not support CSI Volume Snapshots. ![Kubernetes](https://qdrant.tech/documentation/cloud/cloud-providers/kubernetes.jpg) ## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#generic-kubernetes-support-on-premises-cloud-edge) Generic Kubernetes Support (on-premises, cloud, edge) Qdrant Hybrid Cloud works with any Kubernetes cluster that meets the [standard compliance](https://www.cncf.io/training/certification/software-conformance/) requirements. This includes for example: - [VMWare Tanzu](https://tanzu.vmware.com/kubernetes-grid) - [Red Hat OpenShift](https://www.openshift.com/) - [SUSE Rancher](https://www.rancher.com/) - [Canonical Kubernetes](https://ubuntu.com/kubernetes) - [RKE](https://rancher.com/docs/rke/latest/en/) - [RKE2](https://docs.rke2.io/) - [K3s](https://k3s.io/) Qdrant databases need persistent block storage. Most storage solutions provide a CSI driver that can be used with Kubernetes. See [CSI drivers](https://kubernetes-csi.github.io/docs/drivers.html) for more information. To allow vertical scaling, you need a StorageClass with volume expansion enabled. See [Volume Expansion](https://kubernetes.io/docs/concepts/storage/storage-classes/#allow-volume-expansion) for more information. To allow backups and restores, your CSI driver needs to support volume snapshots, and your cluster needs the CSI VolumeSnapshot controller and a VolumeSnapshotClass. See [CSI Volume Snapshots](https://kubernetes-csi.github.io/docs/snapshot-controller.html) for more information.
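On such generic clusters, the VolumeSnapshotClass follows the same pattern as the provider-specific examples above; only the `driver` field differs. A minimal template is sketched below, assuming you replace the placeholder driver name with the snapshot-capable CSI driver actually installed in your cluster (for example, the one reported by `kubectl get csidrivers`):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass
deletionPolicy: Delete
# Placeholder value - use the name of the CSI driver deployed in your cluster
driver: replace-with-your-csi-driver
```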
## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/\#next-steps) Next Steps Once you’ve got a Kubernetes cluster deployed on a platform of your choosing, you can begin setting up Qdrant Hybrid Cloud. Head to our Qdrant Hybrid Cloud [setup guide](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-setup/) for instructions. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/hybrid-cloud/platform-deployment-options.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/hybrid-cloud/platform-deployment-options.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-44-lllmstxt|> ## bulk-upload - [Documentation](https://qdrant.tech/documentation/) - [Database tutorials](https://qdrant.tech/documentation/database-tutorials/) - Bulk Upload Vectors --- # [Anchor](https://qdrant.tech/documentation/database-tutorials/bulk-upload/\#bulk-upload-vectors-to-a-qdrant-collection) Bulk Upload Vectors to a Qdrant Collection Uploading a large-scale dataset fast might be a challenge, but Qdrant has a few tricks to help you with that. The first important detail about data uploading is that the bottleneck is usually located on the client side, not on the server side. This means that if you are uploading a large dataset, you should prefer a high-performance client library. We recommend using our [Rust client library](https://github.com/qdrant/rust-client) for this purpose, as it is the fastest client library available for Qdrant. If you are not using Rust, you might want to consider parallelizing your upload process. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/bulk-upload/\#choose-an-indexing-strategy) Choose an Indexing Strategy Qdrant incrementally builds an HNSW index for dense vectors as new data arrives. This ensures fast search, but indexing is memory- and CPU-intensive. During bulk ingestion, frequent index updates can reduce throughput and increase resource usage. To control this behavior and optimize for your system’s limits, adjust the following parameters: | Your Goal | What to Do | Configuration | | --- | --- | --- | | Fastest upload, tolerate high RAM usage | Disable indexing completely | `indexing_threshold: 0` | | Low memory usage during upload | Defer HNSW graph construction (recommended) | `m: 0` | | Faster index availability after upload | Keep indexing enabled (default behavior) | `m: 16`, `indexing_threshold: 20000` _(default)_ | Indexing must be re-enabled after upload to activate fast HNSW search if it was disabled during ingestion. ### [Anchor](https://qdrant.tech/documentation/database-tutorials/bulk-upload/\#defer-hnsw-graph-construction-m-0) Defer HNSW graph construction ( `m: 0`) For dense vectors, setting the HNSW `m` parameter to `0` disables index building entirely. Vectors will still be stored, but not indexed until you enable indexing later. 
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "hnsw_config": { "m": 0 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), hnsw_config=models.HnswConfigDiff( m=0, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, hnsw_config: { m: 0, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, HnswConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .hnsw_config(HnswConfigDiffBuilder::default().m(0)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.HnswConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setHnswConfig(HnswConfigDiff.newBuilder().setM(0).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, hnswConfig: new HnswConfigDiff { M = 0 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), HnswConfig: &qdrant.HnswConfigDiff{ M: qdrant.PtrOf(uint64(0)), }, }) ``` Once ingestion is complete, re-enable HNSW by setting `m` to your production value (usually 16 or 32). 
httppythontypescriptrustjavacsharpgo ```http PATCH /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "hnsw_config": { "m": 16 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.update_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), hnsw_config=models.HnswConfigDiff( m=16, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.updateCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, hnsw_config: { m: 16, }, }); ``` ```rust use qdrant_client::qdrant::{ UpdateCollectionBuilder, HnswConfigDiffBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .update_collection( UpdateCollectionBuilder::new("{collection_name}") .hnsw_config(HnswConfigDiffBuilder::default().m(16)), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.UpdateCollection; import io.qdrant.client.grpc.Collections.HnswConfigDiff; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.updateCollectionAsync( UpdateCollection.newBuilder() .setCollectionName("{collection_name}") .setHnswConfig(HnswConfigDiff.newBuilder().setM(16).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpdateCollectionAsync( collectionName: "{collection_name}", hnswConfig: new HnswConfigDiff { M = 16 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{ CollectionName: "{collection_name}", HnswConfig: &qdrant.HnswConfigDiff{ M: qdrant.PtrOf(uint64(16)), }, }) ``` ### [Anchor](https://qdrant.tech/documentation/database-tutorials/bulk-upload/\#disable-indexing-completely-indexing_threshold-0) Disable indexing completely ( `indexing_threshold: 0`) If you are doing an initial upload of a large dataset, you might want to disable indexing during upload. This lets you avoid unnecessarily indexing vectors that will be overwritten by the next batch.
Setting `indexing_threshold` to `0` disables indexing altogether: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "optimizers_config": { "indexing_threshold": 0 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), optimizers_config=models.OptimizersConfigDiff( indexing_threshold=0, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, optimizers_config: { indexing_threshold: 0, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, OptimizersConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .optimizers_config(OptimizersConfigDiffBuilder::default().indexing_threshold(0)), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setOptimizersConfig( OptimizersConfigDiff.newBuilder() .setIndexingThreshold(0) .build()) .build() ).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, optimizersConfig: new OptimizersConfigDiff { IndexingThreshold = 0 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), OptimizersConfig: &qdrant.OptimizersConfigDiff{ IndexingThreshold: qdrant.PtrOf(uint64(0)), }, }) ``` After the upload is done, you can enable indexing by setting `indexing_threshold` to a desired value (default is 20000): httppythontypescriptrustjavacsharpgo ```http PATCH /collections/{collection_name} { "optimizers_config": { "indexing_threshold": 20000 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.update_collection( collection_name="{collection_name}", optimizers_config=models.OptimizersConfigDiff(indexing_threshold=20000), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.updateCollection("{collection_name}", { optimizers_config: { indexing_threshold: 20000, }, }); ``` ```rust use 
qdrant_client::qdrant::{ OptimizersConfigDiffBuilder, UpdateCollectionBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .update_collection( UpdateCollectionBuilder::new("{collection_name}") .optimizers_config(OptimizersConfigDiffBuilder::default().indexing_threshold(20000)), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.UpdateCollection; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; client.updateCollectionAsync( UpdateCollection.newBuilder() .setCollectionName("{collection_name}") .setOptimizersConfig( OptimizersConfigDiff.newBuilder() .setIndexingThreshold(20000) .build() ) .build() ).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpdateCollectionAsync( collectionName: "{collection_name}", optimizersConfig: new OptimizersConfigDiff { IndexingThreshold = 20000 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.UpdateCollection(context.Background(), &qdrant.UpdateCollection{ CollectionName: "{collection_name}", OptimizersConfig: &qdrant.OptimizersConfigDiff{ IndexingThreshold: qdrant.PtrOf(uint64(20000)), }, }) ``` At this point, Qdrant will begin indexing new and previously unindexed segments in the background. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/bulk-upload/\#upload-directly-to-disk) Upload directly to disk When the vectors you upload do not all fit in RAM, you likely want to use [memmap](https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage) support. During collection [creation](https://qdrant.tech/documentation/concepts/collections/#create-collection), memmaps may be enabled on a per-vector basis using the `on_disk` parameter. This will store vector data directly on disk at all times. It is suitable for ingesting a large amount of data, essential for the billion scale benchmark. Using `memmap_threshold` is not recommended in this case. It would require the [optimizer](https://qdrant.tech/documentation/concepts/optimizer/) to constantly transform in-memory segments into memmap segments on disk. This process is slower, and the optimizer can be a bottleneck when ingesting a large amount of data. Read more about this in [Configuring Memmap Storage](https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage). ## [Anchor](https://qdrant.tech/documentation/database-tutorials/bulk-upload/\#parallel-upload-into-multiple-shards) Parallel upload into multiple shards In Qdrant, each collection is split into shards. Each shard has a separate Write-Ahead-Log (WAL), which is responsible for ordering operations. By creating multiple shards, you can parallelize upload of a large dataset. From 2 to 4 shards per one machine is a reasonable number. 
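The snippets below show how to set the shard number when creating a collection. To tie the pieces of this page together, here is a minimal, illustrative Python sketch: it creates a sharded collection with HNSW construction deferred, uploads points in parallel batches with the Python client's `upload_points` helper, and then re-enables HNSW once ingestion is finished. It assumes a local instance at `http://localhost:6333`, 768-dimensional vectors, and random data as a stand-in for real embeddings; the collection name and parameter values are examples, not recommendations.

```python
import random

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Create a sharded collection with HNSW graph construction deferred (m=0)
client.create_collection(
    collection_name="bulk_demo",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(m=0),
    shard_number=2,
)

# Random vectors as a stand-in for real embeddings
def example_points(n: int, dim: int = 768):
    for idx in range(n):
        yield models.PointStruct(
            id=idx,
            vector=[random.random() for _ in range(dim)],
            payload={"source": "bulk_demo"},
        )

# upload_points batches the stream and sends batches from parallel workers
client.upload_points(
    collection_name="bulk_demo",
    points=example_points(100_000),
    batch_size=256,  # points per request
    parallel=4,      # number of parallel upload workers
)

# After ingestion, re-enable HNSW so the index is built in the background
client.update_collection(
    collection_name="bulk_demo",
    hnsw_config=models.HnswConfigDiff(m=16),
)
```

The same idea applies with any of the clients shown below: keep indexing deferred while you push data in parallel, then restore your production HNSW settings once the upload is complete.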
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "shard_number": 2 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), shard_number=2, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, shard_number: 2, }); ``` ```rust use qdrant_client::qdrant::{CreateCollectionBuilder, Distance, VectorParamsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .shard_number(2), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setShardNumber(2) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, shardNumber: 2 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), ShardNumber: qdrant.PtrOf(uint32(2)), }) ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/database-tutorials/bulk-upload.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
<|page-45-lllmstxt|> ## dedicated-service

- [Articles](https://qdrant.tech/articles/)
- Vector Search as a dedicated service

[Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/)

---

# Vector Search as a dedicated service

Andrey Vasnetsov · November 30, 2023

![Vector Search as a dedicated service](https://qdrant.tech/articles_data/dedicated-service/preview/title.jpg)

Ever since the data science community discovered that vector search significantly improves LLM answers, various vendors and enthusiasts have been arguing over the proper solutions to store embeddings. Some say storing them in a specialized engine (aka vector database) is better. Others say that it’s enough to use plugins for existing databases. Here are [just](https://nextword.substack.com/p/vector-database-is-not-a-separate) a [few](https://stackoverflow.blog/2023/09/20/do-you-need-a-specialized-vector-database-to-implement-vector-search-well/) of [them](https://www.singlestore.com/blog/why-your-vector-database-should-not-be-a-vector-database/).

This article presents our vision and arguments on the topic. We will:

1. Explain why and when you actually need a dedicated vector solution
2. Debunk some ungrounded claims and anti-patterns to be avoided when building a vector search system.

A table of contents:

- _Each database vendor will sooner or later introduce vector capabilities…_ \[ [click](https://qdrant.tech/articles/dedicated-service/#each-database-vendor-will-sooner-or-later-introduce-vector-capabilities-that-will-make-every-database-a-vector-database)\]
- _Having a dedicated vector database requires duplication of data._ \[ [click](https://qdrant.tech/articles/dedicated-service/#having-a-dedicated-vector-database-requires-duplication-of-data)\]
- _Having a dedicated vector database requires complex data synchronization._ \[ [click](https://qdrant.tech/articles/dedicated-service/#having-a-dedicated-vector-database-requires-complex-data-synchronization)\]
- _You have to pay for a vector service uptime and data transfer._ \[ [click](https://qdrant.tech/articles/dedicated-service/#you-have-to-pay-for-a-vector-service-uptime-and-data-transfer-of-both-solutions)\]
- _What is more seamless than your current database adding vector search capability?_ \[ [click](https://qdrant.tech/articles/dedicated-service/#what-is-more-seamless-than-your-current-database-adding-vector-search-capability)\]
- _Databases can support RAG use-case end-to-end._ \[ [click](https://qdrant.tech/articles/dedicated-service/#databases-can-support-rag-use-case-end-to-end)\]

## [Anchor](https://qdrant.tech/articles/dedicated-service/\#responding-to-claims) Responding to claims

###### [Anchor](https://qdrant.tech/articles/dedicated-service/\#each-database-vendor-will-sooner-or-later-introduce-vector-capabilities-that-will-make-every-database-a-vector-database) Each database vendor will sooner or later introduce vector capabilities. That will make every database a Vector Database.

The origins of this misconception lie in the careless use of the term Vector _Database_. When we think of a _database_, we subconsciously envision a relational database like Postgres or MySQL.
Or, more scientifically, a service built on ACID principles that provides transactions, strong consistency guarantees, and atomicity.

The majority of vector databases are not _databases_ in this sense. It is more accurate to call them _search engines_, but unfortunately, the marketing term _vector database_ has already stuck, and it is unlikely to change.

_What makes search engines different, and why vector DBs are built as search engines?_

First of all, search engines assume different patterns of workloads and prioritize different properties of the system. The core architecture of such solutions is built around those priorities.

What types of properties do search engines prioritize?

- **Scalability**. Search engines are built to handle large amounts of data and queries. They are designed to be horizontally scalable and operate with more data than can fit into a single machine.
- **Search speed**. Search engines should guarantee low latency for queries, while the atomicity of updates is less important.
- **Availability**. Search engines must stay available if the majority of the nodes in a cluster are down. At the same time, they can tolerate the eventual consistency of updates.

![Database guarantees compass](https://qdrant.tech/articles_data/dedicated-service/compass.png)

Database guarantees compass

Those priorities lead to different architectural decisions that are not reproducible in a general-purpose database, even if it has vector index support.

###### [Anchor](https://qdrant.tech/articles/dedicated-service/\#having-a-dedicated-vector-database-requires-duplication-of-data) Having a dedicated vector database requires duplication of data.

By their very nature, vector embeddings are derivatives of the primary source data. In the vast majority of cases, embeddings are derived from some other data, such as text, images, or additional information stored in your system. So, in fact, all embeddings you have in your system can be considered transformations of some original source.

And the distinguishing feature of derivative data is that it will change when the transformation pipeline changes. In the case of vector embeddings, the scenario of those changes is quite simple: every time you update the encoder model, all the embeddings will change.

In systems where vector embeddings are fused with the primary data source, it is impossible to perform such migrations without significantly affecting the production system. As a result, even if you want to use a single database for storing all kinds of data, you would still need to duplicate data internally.

###### [Anchor](https://qdrant.tech/articles/dedicated-service/\#having-a-dedicated-vector-database-requires-complex-data-synchronization) Having a dedicated vector database requires complex data synchronization.

Most production systems prefer to isolate different types of workloads into separate services. In many cases, those isolated services are not even related to search use cases. For example, a database for analytics and a database for serving can be updated from the same source. Yet they can store and organize the data in a way that is optimal for their typical workloads.

Search engines are usually isolated for the same reason: you want to avoid creating a noisy neighbor problem and compromising the performance of your main database.

_To give you some intuition, let’s consider a practical example:_

Assume we have a database with 1 million records. This is a small database by modern standards of any relational database.
You can probably use the smallest free tier of any cloud provider to host it. But if we want to use this database for vector search, 1 million OpenAI `text-embedding-ada-002` embeddings will take **~6GB of RAM** (sic!). With 1536 dimensions stored as 32-bit floats, that is roughly 1,000,000 × 1536 × 4 bytes ≈ 6 GB. As you can see, the vector search use case completely overwhelms the main database resource requirements.

In practice, this means that your main database becomes burdened with high memory requirements and cannot scale efficiently, limited by the size of a single machine.

Fortunately, the data synchronization problem is not new and definitely not unique to vector search. There are many well-known solutions, starting with message queues and ending with specialized ETL tools.

For example, we recently released our [integration with Airbyte](https://qdrant.tech/documentation/integrations/airbyte/), allowing you to synchronize data from various sources into Qdrant incrementally.

###### [Anchor](https://qdrant.tech/articles/dedicated-service/\#you-have-to-pay-for-a-vector-service-uptime-and-data-transfer-of-both-solutions) You have to pay for a vector service uptime and data transfer of both solutions.

In the open-source world, you pay for the resources you use, not the number of different databases you run. The resources required depend on the optimal solution chosen for each use case. As a result, running a dedicated vector search engine can be even cheaper, as it allows optimization specifically for vector search use cases.

For instance, Qdrant implements a number of [quantization techniques](https://qdrant.tech/documentation/guides/quantization/) that can significantly reduce the memory footprint of embeddings.

In terms of data transfer costs, on most cloud providers, network use within a region is usually free. As long as you put the original source data and the vector store in the same region, there are no added data transfer costs.

###### [Anchor](https://qdrant.tech/articles/dedicated-service/\#what-is-more-seamless-than-your-current-database-adding-vector-search-capability) What is more seamless than your current database adding vector search capability?

In contrast to the short-term attractiveness of integrated solutions, dedicated search engines offer flexibility and a modular approach. You don’t need to update the whole production database each time some of the vector plugins are updated. Maintenance of a dedicated search engine is as isolated from the main database as the data itself.

In fact, integration of more complex scenarios, such as read/write segregation, is much easier with a dedicated vector solution. You can easily build cross-region replication to ensure low latency for your users.

![Read/Write segregation + cross-regional deployment](https://qdrant.tech/articles_data/dedicated-service/region-based-deploy.png)

Read/Write segregation + cross-regional deployment

It is especially important in large enterprise organizations, where the responsibility for different parts of the system is distributed among different teams. In those situations, it is much easier to maintain a dedicated search engine for the AI team than to convince the core team to update the whole primary database.

Finally, the vector capabilities of the all-in-one database are tied to the development and release cycle of the entire stack. Their long history of use also means that they need to pay a high price for backward compatibility.

###### [Anchor](https://qdrant.tech/articles/dedicated-service/\#databases-can-support-rag-use-case-end-to-end) Databases can support RAG use-case end-to-end.
Putting aside performance and scalability questions, the whole discussion about implementing RAG in the DBs assumes that the only detail missing in traditional databases is the vector index and the ability to make fast ANN queries.

In fact, the current capabilities of vector search have only scratched the surface of what is possible. For example, in our recent article, we discuss the possibility of building an [exploration API](https://qdrant.tech/articles/vector-similarity-beyond-search/) to fuel the discovery process - an alternative to kNN search, where you don’t even know what exactly you are looking for.

## [Anchor](https://qdrant.tech/articles/dedicated-service/\#summary) Summary

Ultimately, you do not need a vector database if you are looking for simple vector search functionality with a small amount of data. We genuinely recommend starting with whatever you already have in your stack to prototype. But you need one if vector search is central to your application and you want to get more out of it. It is just like the choice between using a multi-tool to make something quickly and using a dedicated instrument highly optimized for the use case.

Large-scale production systems usually consist of different specialized services and storage types for good reason: it is one of the best practices of modern software architecture, comparable to the orchestration of independent building blocks in a microservice architecture. When you stuff the database with a vector index, you compromise both the performance and scalability of the main database and the vector search capabilities. There is no one-size-fits-all approach that would not compromise on performance or flexibility. So if your use case utilizes vector search in any significant way, it is worth investing in a dedicated vector search engine, aka vector database.

<|page-46-lllmstxt|> ## data-streaming-kafka-qdrant

- [Documentation](https://qdrant.tech/documentation/)
- [Send data](https://qdrant.tech/documentation/send-data/)
- How to Setup Seamless Data Streaming with Kafka and Qdrant

---

# [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#setup-data-streaming-with-kafka-via-confluent) Setup Data Streaming with Kafka via Confluent

**Author:** [M K Pavan Kumar](https://www.linkedin.com/in/kameshwara-pavan-kumar-mantha-91678b21/), research scholar at [IIITDM, Kurnool](https://iiitk.ac.in/). Specialist in hallucination mitigation techniques and RAG methodologies.
• [GitHub](https://github.com/pavanjava) • [Medium](https://medium.com/@manthapavankumar11) ## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#introduction) Introduction This guide will walk you through the detailed steps of installing and setting up the [Qdrant Sink Connector](https://github.com/qdrant/qdrant-kafka), building the necessary infrastructure, and creating a practical playground application. By the end of this article, you will have a deep understanding of how to leverage this powerful integration to streamline your data workflows, ultimately enhancing the performance and capabilities of your data-driven real-time semantic search and RAG applications. In this example, original data will be sourced from Azure Blob Storage and MongoDB. ![1.webp](https://qdrant.tech/documentation/examples/data-streaming-kafka-qdrant/1.webp) Figure 1: [Real time Change Data Capture (CDC)](https://www.confluent.io/learn/change-data-capture/) with Kafka and Qdrant. ## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#the-architecture) The Architecture: ## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#source-systems) Source Systems The architecture begins with the **source systems**, represented by MongoDB and Azure Blob Storage. These systems are vital for storing and managing raw data. MongoDB, a popular NoSQL database, is known for its flexibility in handling various data formats and its capability to scale horizontally. It is widely used for applications that require high performance and scalability. Azure Blob Storage, on the other hand, is Microsoft’s object storage solution for the cloud. It is designed for storing massive amounts of unstructured data, such as text or binary data. The data from these sources is extracted using **source connectors**, which are responsible for capturing changes in real-time and streaming them into Kafka. ## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#kafka) Kafka At the heart of this architecture lies **Kafka**, a distributed event streaming platform capable of handling trillions of events a day. Kafka acts as a central hub where data from various sources can be ingested, processed, and distributed to various downstream systems. Its fault-tolerant and scalable design ensures that data can be reliably transmitted and processed in real-time. Kafka’s capability to handle high-throughput, low-latency data streams makes it an ideal choice for real-time data processing and analytics. The use of **Confluent** enhances Kafka’s functionalities, providing additional tools and services for managing Kafka clusters and stream processing. ## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#qdrant) Qdrant The processed data is then routed to **Qdrant**, a highly scalable vector search engine designed for similarity searches. Qdrant excels at managing and searching through high-dimensional vector data, which is essential for applications involving machine learning and AI, such as recommendation systems, image recognition, and natural language processing. The **Qdrant Sink Connector** for Kafka plays a pivotal role here, enabling seamless integration between Kafka and Qdrant. This connector allows for the real-time ingestion of vector data into Qdrant, ensuring that the data is always up-to-date and ready for high-performance similarity searches. 
## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#integration-and-pipeline-importance) Integration and Pipeline Importance

The integration of these components forms a powerful and efficient data streaming pipeline. The **Qdrant Sink Connector** ensures that the data flowing through Kafka is continuously ingested into Qdrant without any manual intervention. This real-time integration is crucial for applications that rely on the most current data for decision-making and analysis.

By combining the strengths of MongoDB and Azure Blob Storage for data storage, Kafka for data streaming, and Qdrant for vector search, this pipeline provides a robust solution for managing and processing large volumes of data in real-time. The architecture’s scalability, fault-tolerance, and real-time processing capabilities are key to its effectiveness, making it a versatile solution for modern data-driven applications.

## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#installation-of-confluent-kafka-platform) Installation of Confluent Kafka Platform

To install the Confluent Kafka Platform (self-managed locally), follow these 3 simple steps:

**Download and Extract the Distribution Files:**

- Visit [Confluent Installation Page](https://www.confluent.io/installation/).
- Download the distribution files (tar, zip, etc.).
- Extract the downloaded file using:

```bash
tar -xvf confluent-.tar.gz
```

or

```bash
unzip confluent-.zip
```

**Configure Environment Variables:**

```bash
# Set CONFLUENT_HOME to the installation directory:
export CONFLUENT_HOME=/path/to/confluent-

# Add Confluent binaries to your PATH
export PATH=$CONFLUENT_HOME/bin:$PATH
```

**Run Confluent Platform Locally:**

```bash
# Start the Confluent Platform services:
confluent local start

# Stop the Confluent Platform services:
confluent local stop
```

## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#installation-of-qdrant) Installation of Qdrant:

To install and run Qdrant (self-managed locally), you can use Docker, which simplifies the process. First, ensure you have Docker installed on your system. Then, you can pull the Qdrant image from Docker Hub and run it with the following commands:

```bash
docker pull qdrant/qdrant
docker run -p 6334:6334 -p 6333:6333 qdrant/qdrant
```

This will download the Qdrant image and start a Qdrant instance accessible at `http://localhost:6333`. For more detailed instructions and alternative installation methods, refer to the [Qdrant installation documentation](https://qdrant.tech/documentation/quick-start/).

## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#installation-of-qdrant-kafka-sink-connector) Installation of Qdrant-Kafka Sink Connector:

To install the Qdrant Kafka connector using [Confluent Hub](https://www.confluent.io/hub/), you can utilize the straightforward `confluent-hub install` command. This command simplifies the process by eliminating the need for manual configuration file manipulations. To install the Qdrant Kafka connector version 1.1.0, execute the following command in your terminal:

```bash
confluent-hub install qdrant/qdrant-kafka:1.1.0
```

This command downloads and installs the specified connector directly from Confluent Hub into your Confluent Platform or Kafka Connect environment.
The installation process ensures that all necessary dependencies are handled automatically, allowing for a seamless integration of the Qdrant Kafka connector with your existing setup. Once installed, the connector can be configured and managed using the Confluent Control Center or the Kafka Connect REST API, enabling efficient data streaming between Kafka and Qdrant without the need for intricate manual setup.

![2.webp](https://qdrant.tech/documentation/examples/data-streaming-kafka-qdrant/2.webp)

_Figure 2: Local Confluent platform showing the Source and Sink connectors after installation._

Once the connector is installed, make sure it is configured as shown below. Keep in mind that your `key.converter` and `value.converter` settings are essential for Kafka to safely deliver messages from the topic to Qdrant.

```json
{
  "name": "QdrantSinkConnectorConnector_0",
  "config": {
    "value.converter.schemas.enable": "false",
    "name": "QdrantSinkConnectorConnector_0",
    "connector.class": "io.qdrant.kafka.QdrantSinkConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "topics": "topic_62,qdrant_kafka.docs",
    "errors.deadletterqueue.topic.name": "dead_queue",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "qdrant.grpc.url": "http://localhost:6334",
    "qdrant.api.key": "************"
  }
}
```

## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#installation-of-mongodb) Installation of MongoDB

For Kafka to connect to MongoDB as a source, your MongoDB instance should be running in `replicaSet` mode. Below is the `docker compose` file that will spin up a single-node `replicaSet` instance of MongoDB.

```yaml
version: "3.8"
services:
  mongo1:
    image: mongo:7.0
    command: ["--replSet", "rs0", "--bind_ip_all", "--port", "27017"]
    ports:
      - 27017:27017
    healthcheck:
      test: echo "try { rs.status() } catch (err) { rs.initiate({_id:'rs0',members:[{_id:0,host:'host.docker.internal:27017'}]}) }" | mongosh --port 27017 --quiet
      interval: 5s
      timeout: 30s
      start_period: 0s
      start_interval: 1s
      retries: 30
    volumes:
      - "mongo1_data:/data/db"
      - "mongo1_config:/data/configdb"
volumes:
  mongo1_data:
  mongo1_config:
```

Similarly, install and configure the source connector as below.

```bash
confluent-hub install mongodb/kafka-connect-mongodb:latest
```

After installing the `MongoDB` connector, the connector configuration should look like this:

```json
{
  "name": "MongoSourceConnectorConnector_0",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "connection.uri": "mongodb://127.0.0.1:27017/?replicaSet=rs0&directConnection=true",
    "database": "qdrant_kafka",
    "collection": "docs",
    "publish.full.document.only": "true",
    "topic.namespace.map": "{\"*\":\"qdrant_kafka.docs\"}",
    "copy.existing": "true"
  }
}
```

## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#playground-application) Playground Application

Now that the infrastructure setup is complete, it is time to create a simple application and check our setup. The objective of the application is to insert data into MongoDB and have it ingested into Qdrant via [Change Data Capture (CDC)](https://www.confluent.io/learn/change-data-capture/).
`requirements.txt`

```bash
fastembed==0.3.1
pymongo==4.8.0
qdrant_client==1.10.1
```

`project_root_folder/main.py`

This is just sample code; nevertheless, it can be extended to millions of operations depending on your use case.

pythonpython

```python
from pymongo import MongoClient
from utils.app_utils import create_qdrant_collection
from fastembed import TextEmbedding

collection_name: str = 'test'
embed_model_name: str = 'snowflake/snowflake-arctic-embed-s'
```

```python
# Step 0: create qdrant_collection
create_qdrant_collection(collection_name=collection_name, embed_model=embed_model_name)

# Step 1: Connect to MongoDB
client = MongoClient('mongodb://127.0.0.1:27017/?replicaSet=rs0&directConnection=true')

# Step 2: Select Database
db = client['qdrant_kafka']

# Step 3: Select Collection
collection = db['docs']

# Step 4: Create a Document to Insert
description = "qdrant is a highly available vector search engine"
embedding_model = TextEmbedding(model_name=embed_model_name)
vector = next(embedding_model.embed(documents=description)).tolist()

document = {
    "collection_name": collection_name,
    "id": 1,
    "vector": vector,
    "payload": {
        "name": "qdrant",
        "description": description,
        "url": "https://qdrant.tech/documentation"
    }
}

# Step 5: Insert the Document into the Collection
result = collection.insert_one(document)

# Step 6: Print the Inserted Document's ID
print("Inserted document ID:", result.inserted_id)
```

`project_root_folder/utils/app_utils.py`

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333", api_key="")
dimension_dict = {"snowflake/snowflake-arctic-embed-s": 384}

def create_qdrant_collection(collection_name: str, embed_model: str):
    if not client.collection_exists(collection_name=collection_name):
        client.create_collection(
            collection_name=collection_name,
            vectors_config=models.VectorParams(size=dimension_dict.get(embed_model), distance=models.Distance.COSINE)
        )
```

Before we run the application, below is the state of the MongoDB and Qdrant databases.

![3.webp](https://qdrant.tech/documentation/examples/data-streaming-kafka-qdrant/3.webp)

Figure 3: Initial state: no collection named `test` in Qdrant and no data in the `docs` collection of MongoDB.

Once you run the code, the data goes into MongoDB, the CDC pipeline gets triggered, and eventually Qdrant receives this data.

![4.webp](https://qdrant.tech/documentation/examples/data-streaming-kafka-qdrant/4.webp)

Figure 4: The test Qdrant collection is created automatically.

![5.webp](https://qdrant.tech/documentation/examples/data-streaming-kafka-qdrant/5.webp)

Figure 5: Data is inserted into both MongoDB and Qdrant. A small programmatic verification sketch follows the conclusion below.

## [Anchor](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/\#conclusion) Conclusion:

In conclusion, the integration of **Kafka** with **Qdrant** using the **Qdrant Sink Connector** provides a seamless and efficient solution for real-time data streaming and processing. This setup not only enhances the capabilities of your data pipeline but also ensures that high-dimensional vector data is continuously indexed and readily available for similarity searches. By following the installation and setup guide, you can easily establish a robust data flow from your **source systems** like **MongoDB** and **Azure Blob Storage**, through **Kafka**, and into **Qdrant**. This architecture empowers modern applications to leverage real-time data insights and advanced search capabilities, paving the way for innovative data-driven solutions.
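Beyond the screenshots, you can also check the result programmatically. The following is a small verification sketch that is not part of the original walkthrough; it assumes the local Qdrant instance, the `test` collection, and the point `id` of `1` used in the playground code above:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# The playground inserted a MongoDB document with "id": 1; once the sink connector
# has processed the CDC event, a point with the same id should exist in Qdrant.
points = client.retrieve(collection_name="test", ids=[1], with_payload=True)
print(points)
```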
##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/send-data/data-streaming-kafka-qdrant.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/send-data/data-streaming-kafka-qdrant.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-47-lllmstxt|> ## fastembed-splade - [Documentation](https://qdrant.tech/documentation/) - [Fastembed](https://qdrant.tech/documentation/fastembed/) - Working with SPLADE --- # [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#how-to-generate-sparse-vectors-with-splade) How to Generate Sparse Vectors with SPLADE SPLADE is a novel method for learning sparse text representation vectors, outperforming BM25 in tasks like information retrieval and document classification. Its main advantage is generating efficient and interpretable sparse vectors, making it effective for large-scale text data. ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#setup) Setup First, install FastEmbed. ```python pip install -q fastembed ``` Next, import the required modules for sparse embeddings and Python’s typing module. ```python from fastembed import SparseTextEmbedding, SparseEmbedding ``` You may always check the list of all supported sparse embedding models. ```python SparseTextEmbedding.list_supported_models() ``` This will return a list of models, each with its details such as model name, vocabulary size, description, and sources. ```python [\ {\ 'model': 'prithivida/Splade_PP_en_v1',\ 'sources': {'hf': 'Qdrant/Splade_PP_en_v1', ...},\ 'model_file': 'model.onnx',\ 'description': 'Independent Implementation of SPLADE++ Model for English.',\ 'license': 'apache-2.0',\ 'size_in_GB': 0.532,\ 'vocab_size': 30522,\ ...\ },\ ...\ ] # part of the output was omitted ``` Now, load the model. ```python model_name = "prithivida/Splade_PP_en_v1" --- # This triggers the model download model = SparseTextEmbedding(model_name=model_name) ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#embed-data) Embed data You need to define a list of documents to be embedded. ```python documents: list[str] = [\ "Chandrayaan-3 is India's third lunar mission",\ "It aimed to land a rover on the Moon's surface - joining the US, China and Russia",\ "The mission is a follow-up to Chandrayaan-2, which had partial success",\ "Chandrayaan-3 will be launched by the Indian Space Research Organisation (ISRO)",\ "The estimated cost of the mission is around $35 million",\ "It will carry instruments to study the lunar surface and atmosphere",\ "Chandrayaan-3 landed on the Moon's surface on 23rd August 2023",\ "It consists of a lander named Vikram and a rover named Pragyan similar to Chandrayaan-2. 
Its propulsion module would act like an orbiter.",\ "The propulsion module carries the lander and rover configuration until the spacecraft is in a 100-kilometre (62 mi) lunar orbit",\ "The mission used GSLV Mk III rocket for its launch",\ "Chandrayaan-3 was launched from the Satish Dhawan Space Centre in Sriharikota",\ "Chandrayaan-3 was launched earlier in the year 2023",\ ] ``` Then, generate sparse embeddings for each document. Here, `batch_size` is optional and helps to process documents in batches. ```python sparse_embeddings_list: list[SparseEmbedding] = list( model.embed(documents, batch_size=6) ) ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#retrieve-embeddings) Retrieve embeddings `sparse_embeddings_list` contains sparse embeddings for the documents provided earlier. Each element in this list is a `SparseEmbedding` object that contains the sparse vector representation of a document. ```python index = 0 sparse_embeddings_list[index] ``` This output is a `SparseEmbedding` object for the first document in our list. It contains two arrays: `values` and `indices`. \- The `values` array represents the weights of the features (tokens) in the document. - The `indices` array represents the indices of these features in the model’s vocabulary. Each pair of corresponding `values` and `indices` represents a token and its weight in the document. ```python SparseEmbedding(values=array([0.05297208, 0.01963477, 0.36459631, 1.38508618, 0.71776593,\ 0.12667948, 0.46230844, 0.446771 , 0.26897505, 1.01519883,\ 1.5655334 , 0.29412213, 1.53102326, 0.59785569, 1.1001817 ,\ 0.02079751, 0.09955651, 0.44249091, 0.09747757, 1.53519952,\ 1.36765671, 0.15740395, 0.49882549, 0.38629025, 0.76612782,\ 1.25805044, 0.39058095, 0.27236196, 0.45152301, 0.48262018,\ 0.26085234, 1.35912788, 0.70710695, 1.71639752]), indices=array([ 1010, 1011, 1016, 1017, 2001, 2018, 2034, 2093, 2117,\ 2319, 2353, 2509, 2634, 2686, 2796, 2817, 2922, 2959,\ 3003, 3148, 3260, 3390, 3462, 3523, 3822, 4231, 4316,\ 4774, 5590, 5871, 6416, 11926, 12076, 16469])) ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#examine-weights) Examine weights Now, print the first 5 features and their weights for better understanding. ```python for i in range(5): print(f"Token at index {sparse_embeddings_list[0].indices[i]} has weight {sparse_embeddings_list[0].values[i]}") ``` The output will display the token indices and their corresponding weights for the first document. ```python Token at index 1010 has weight 0.05297207832336426 Token at index 1011 has weight 0.01963476650416851 Token at index 1016 has weight 0.36459630727767944 Token at index 1017 has weight 1.385086178779602 Token at index 2001 has weight 0.7177659273147583 ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#analyze-results) Analyze results Let’s use the tokenizer vocab to make sense of these indices. ```python import json from tokenizers import Tokenizer tokenizer = Tokenizer.from_pretrained("Qdrant/Splade_PP_en_v1") ``` The `get_tokens_and_weights` function takes a `SparseEmbedding` object and a `tokenizer` as input. It will construct a dictionary where the keys are the decoded tokens, and the values are their corresponding weights. 
```python
def get_tokens_and_weights(sparse_embedding, tokenizer):
    token_weight_dict = {}
    for i in range(len(sparse_embedding.indices)):
        token = tokenizer.decode([sparse_embedding.indices[i]])
        weight = sparse_embedding.values[i]
        token_weight_dict[token] = weight

    # Sort the dictionary by weights
    token_weight_dict = dict(sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True))
    return token_weight_dict

# Test the function with the first SparseEmbedding
print(json.dumps(get_tokens_and_weights(sparse_embeddings_list[index], tokenizer), indent=4))
```

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#dictionary-output) Dictionary output

The dictionary is then sorted by weights in descending order.

```python
{
    "chandra": 1.7163975238800049,
    "third": 1.5655333995819092,
    "##ya": 1.535199522972107,
    "india": 1.5310232639312744,
    "3": 1.385086178779602,
    "mission": 1.3676567077636719,
    "lunar": 1.3591278791427612,
    "moon": 1.2580504417419434,
    "indian": 1.1001816987991333,
    "##an": 1.015198826789856,
    "3rd": 0.7661278247833252,
    "was": 0.7177659273147583,
    "spacecraft": 0.7071069478988647,
    "space": 0.5978556871414185,
    "flight": 0.4988254904747009,
    "satellite": 0.4826201796531677,
    "first": 0.46230843663215637,
    "expedition": 0.4515230059623718,
    "three": 0.4467709958553314,
    "fourth": 0.44249090552330017,
    "vehicle": 0.390580952167511,
    "iii": 0.3862902522087097,
    "2": 0.36459630727767944,
    "##3": 0.2941221296787262,
    "planet": 0.27236196398735046,
    "second": 0.26897504925727844,
    "missions": 0.2608523368835449,
    "launched": 0.15740394592285156,
    "had": 0.12667948007583618,
    "largest": 0.09955651313066483,
    "leader": 0.09747757017612457,
    ",": 0.05297207832336426,
    "study": 0.02079751156270504,
    "-": 0.01963476650416851
}
```

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#observations) Observations

- The relative order of importance is quite useful. The most important tokens in the sentence have the highest weights.
- **Term Expansion:** The model can expand the terms in the document. This means that the model can generate weights for tokens that are not present in the document but are related to the tokens in the document. This is a powerful feature that allows the model to capture the context of the document. Here, you’ll see that the model has added the tokens ‘3’ from ’third’ and ‘moon’ from ’lunar’ to the sparse vector.

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-splade/\#design-choices) Design choices

- The weights are not normalized. This means that the sum of the weights is not 1 or 100. This is a common practice in sparse embeddings, as it allows the model to capture the importance of each token in the document.
- Tokens are included in the sparse vector only if they are present in the model’s vocabulary. This means that the model will not generate a weight for tokens that it has not seen during training.
- Tokens do not map to words directly – allowing you to gracefully handle typos and out-of-vocabulary tokens.
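As a possible next step, these sparse vectors can be stored in Qdrant and queried there. The snippet below is a minimal sketch that goes beyond this page: it assumes a local Qdrant instance and a hypothetical collection name `splade_demo`, and reuses the `documents` and `sparse_embeddings_list` objects defined above.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Sparse vectors are stored under a named sparse vector configuration;
# no fixed dimensionality is required.
client.create_collection(
    collection_name="splade_demo",
    vectors_config={},
    sparse_vectors_config={"text": models.SparseVectorParams()},
)

client.upsert(
    collection_name="splade_demo",
    points=[
        models.PointStruct(
            id=i,
            payload={"text": doc},
            vector={
                "text": models.SparseVector(
                    indices=emb.indices.tolist(),
                    values=emb.values.tolist(),
                )
            },
        )
        for i, (doc, emb) in enumerate(zip(documents, sparse_embeddings_list))
    ],
)
```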
<|page-48-lllmstxt|> ## authentication

- [Documentation](https://qdrant.tech/documentation/)
- [Cloud](https://qdrant.tech/documentation/cloud/)
- Authentication

---

# [Anchor](https://qdrant.tech/documentation/cloud/authentication/\#database-authentication-in-qdrant-managed-cloud) Database Authentication in Qdrant Managed Cloud

This page describes what Database API keys are and shows you how to use the Qdrant Cloud Console to create a Database API key for a cluster. You will learn how to connect to your cluster using the new API key.

Database API keys can be configured with granular access control. Database API keys with granular access control can be recognized by the `eyJhb` prefix. Please refer to the [Table of access](https://qdrant.tech/documentation/guides/security/#table-of-access) to understand what permissions you can configure.

Database API keys with granular access control are available for clusters using version **v1.11.0** and above.

## [Anchor](https://qdrant.tech/documentation/cloud/authentication/\#create-database-api-keys) Create Database API Keys

![API Key](https://qdrant.tech/documentation/cloud/create-api-key.png)

1. Go to the [Cloud Dashboard](https://qdrant.to/cloud).
2. Go to the **API Keys** section of the **Cluster Detail Page**.
3. Click **Create**.
4. Choose a name and an optional expiration (in days, the default is 90 days) for your API key. An empty expiration will result in no expiration.
5. By default, tokens are given cluster-wide permissions, with a choice between manage/write permissions (default) or read-only. To restrict a token to a subset of collections, you can select the Collections tab and choose from the collections available in your cluster.
6. Click **Create** and retrieve your API key.

![API Key](https://qdrant.tech/documentation/cloud/api-key.png)

We recommend configuring an expiration and rotating your API keys regularly as a security best practice.

Video: [How to Use Qdrant’s Database API Keys with Granular Access Control](https://www.youtube.com/watch?v=3c-8tcBIVdQ)

## [Anchor](https://qdrant.tech/documentation/cloud/authentication/\#admin-database-api-keys) Admin Database API Keys

The previous iteration of Database API keys, called Admin Database API keys, does not have granular access control. Clusters created before January 27, 2025 will still see the option to create Admin Database API keys. Older Admin Database API keys will continue to work, but we do recommend switching to Database API keys with granular access control to take advantage of better security controls.
To enable Database API keys with granular access control, click **Enable** in the **API Keys** section of the Cluster detail page.

After enabling Database API keys with granular access control for a cluster, existing Admin Database API keys will continue to work, but you will not be able to create new Admin Database API Keys.

## [Anchor](https://qdrant.tech/documentation/cloud/authentication/\#test-cluster-access) Test Cluster Access

After creation, you will receive a code snippet to access your cluster. Your generated request should look very similar to this one:

```bash
curl \
  -X GET 'https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6333' \
  --header 'api-key: '
```

Open Terminal and run the request. You should get a response that looks like this:

```bash
{"title":"qdrant - vector search engine","version":"1.13.0","commit":"ffda0b90c8c44fc43c99adab518b9787fe57bde6"}
```

> **Note:** You need to include the API key in the request header for every request over REST or gRPC.

## [Anchor](https://qdrant.tech/documentation/cloud/authentication/\#authenticate-via-sdk) Authenticate via SDK

Now that you have created your first cluster and key, you might want to access your database from within your application. Our [official Qdrant clients](https://qdrant.tech/documentation/interfaces/) for Python, TypeScript, Go, Rust, .NET and Java all support the API key parameter.

bashpythontypescriptrustjavacsharpgo

```bash
curl \
  -X GET https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6333 \
  --header 'api-key: '

# Alternatively, you can use the `Authorization` header with the `Bearer` prefix
curl \
  -X GET https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6333 \
  --header 'Authorization: Bearer '
```

```python
from qdrant_client import QdrantClient

qdrant_client = QdrantClient(
    "xyz-example.cloud-region.cloud-provider.cloud.qdrant.io",
    api_key="",
)
```

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({
  host: "xyz-example.cloud-region.cloud-provider.cloud.qdrant.io",
  apiKey: "",
});
```

```rust
use qdrant_client::Qdrant;

let client = Qdrant::from_url("https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6334")
    .api_key("")
    .build()?;
```

```java
import io.qdrant.client.QdrantClient;
import io.qdrant.client.QdrantGrpcClient;

QdrantClient client =
    new QdrantClient(
        QdrantGrpcClient.newBuilder(
                "xyz-example.cloud-region.cloud-provider.cloud.qdrant.io",
                6334,
                true)
            .withApiKey("")
            .build());
```

```csharp
using Qdrant.Client;

var client = new QdrantClient(
  host: "xyz-example.cloud-region.cloud-provider.cloud.qdrant.io",
  https: true,
  apiKey: ""
);
```

```go
import "github.com/qdrant/go-client/qdrant"

client, err := qdrant.NewClient(&qdrant.Config{
	Host:   "xyz-example.cloud-region.cloud-provider.cloud.qdrant.io",
	Port:   6334,
	APIKey: "",
	UseTLS: true,
})
```
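Once the client is configured, any authenticated call can serve as a quick smoke test that the key is accepted. Below is a minimal sketch with the Python client; the URL and API key are placeholders, as in the snippets above:

```python
from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6333",
    api_key="<your-database-api-key>",
)

# Listing collections is a cheap, read-only way to confirm the key works
print(client.get_collections())
```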
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/authentication.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-49-lllmstxt|> ## web-ui-gsoc - [Articles](https://qdrant.tech/articles/) - Google Summer of Code 2023 - Web UI for Visualization and Exploration [Back to Ecosystem](https://qdrant.tech/articles/ecosystem/) --- # Google Summer of Code 2023 - Web UI for Visualization and Exploration Kartik Gupta · August 28, 2023 ![Google Summer of Code 2023 - Web UI for Visualization and Exploration](https://qdrant.tech/articles_data/web-ui-gsoc/preview/title.jpg) ## [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#introduction) Introduction Hello everyone! My name is Kartik Gupta, and I am thrilled to share my coding journey as part of the Google Summer of Code 2023 program. This summer, I had the incredible opportunity to work on an exciting project titled “Web UI for Visualization and Exploration” for Qdrant, a vector search engine. In this article, I will take you through my experience, challenges, and achievements during this enriching coding journey. ## [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#project-overview) Project Overview Qdrant is a powerful vector search engine widely used for similarity search and clustering. However, it lacked a user-friendly web-based UI for data visualization and exploration. My project aimed to bridge this gap by developing a web-based user interface that allows users to easily interact with and explore their vector data. ## [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#milestones-and-achievements) Milestones and Achievements The project was divided into six milestones, each focusing on a specific aspect of the web UI development. Let’s go through each of them and my achievements during the coding period. **1\. Designing a friendly UI on Figma** I started by designing the user interface on Figma, ensuring it was easy to use, visually appealing, and responsive on different devices. I focused on usability and accessibility to create a seamless user experience. ( [Figma Design](https://www.figma.com/file/z54cAcOErNjlVBsZ1DrXyD/Qdant?type=design&node-id=0-1&mode=design&t=Pu22zO2AMFuGhklG-0)) **2\. Building the layout** The layout route served as a landing page with an overview of the application’s features and navigation links to other routes. **3\. Creating a view collection route** This route enabled users to view a list of collections available in the application. Users could click on a collection to see more details, including the data and vectors associated with it. ![Collection Page](https://qdrant.tech/articles_data/web-ui-gsoc/collections-page.png) Collection Page **4\. Developing a data page with “find similar” functionality** I implemented a data page where users could search for data and find similar data using a recommendation API. The recommendation API suggested similar data based on the Data’s selected ID, providing valuable insights. ![Points Page](https://qdrant.tech/articles_data/web-ui-gsoc/points-page.png) Points Page **5\. Developing query editor page libraries** This milestone involved creating a query editor page that allowed users to write queries in a custom language. The editor provided syntax highlighting, autocomplete, and error-checking features for a seamless query writing experience. 
![Query Editor Page](https://qdrant.tech/articles_data/web-ui-gsoc/console-page.png)

Query Editor Page

**6\. Developing a route for visualizing vector data points**

This is done by reducing the n-dimensional vectors to 2-D points, which are displayed with their respective payloads.

![visualization-page](https://qdrant.tech/articles_data/web-ui-gsoc/visualization-page.png)

Vector Visualization Page

## [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#challenges-and-learning) Challenges and Learning

Throughout the project, I encountered a series of challenges that stretched my engineering capabilities and provided unique growth opportunities. From mastering new libraries and technologies to ensuring the user interface (UI) was both visually appealing and user-friendly, every obstacle became a stepping stone toward enhancing my skills as a developer. However, each challenge provided an opportunity to learn and grow as a developer. I acquired valuable experience in vector search and dimension reduction techniques. The most significant learning for me was the importance of effective project management. Setting realistic timelines, collaborating with mentors, and staying proactive with feedback allowed me to complete the milestones efficiently.

### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#technical-learning-and-skill-development) Technical Learning and Skill Development

One of the most significant aspects of this journey was diving into the intricate world of vector search and dimension reduction techniques. These areas, previously unfamiliar to me, required rigorous study and exploration. Learning how to process vast amounts of data efficiently and extract meaningful insights through these techniques was both challenging and rewarding.

### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#effective-project-management) Effective Project Management

Undoubtedly, the most impactful lesson was the art of effective project management. I quickly grasped the importance of setting realistic timelines and goals. Collaborating closely with mentors and maintaining proactive communication proved indispensable. This approach enabled me to navigate the complex development process and successfully achieve the project’s milestones.

### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#overcoming-technical-challenges) Overcoming Technical Challenges

#### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#autocomplete-feature-in-console) Autocomplete Feature in Console

One particularly intriguing challenge emerged while working on the autocomplete feature within the console. Finding a solution was proving elusive until a breakthrough came from an unexpected direction. My mentor, Andrey, proposed creating a separate module that could support autocomplete based on OpenAPI for our custom language. This ingenious approach not only resolved the issue but also showcased the power of collaborative problem-solving.

#### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#optimization-with-web-workers) Optimization with Web Workers

The high-processing demands of vector reduction posed another significant challenge. Initially, this task was straining browsers and causing performance issues. The solution materialized in the form of web workers—an independent processing instance that alleviated the strain on browsers. However, a new question arose: how to terminate these workers effectively?
With invaluable insights from my mentor, I gained a deeper understanding of web worker dynamics and successfully tackled this challenge. #### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#console-integration-complexity) Console Integration Complexity Integrating the console interaction into the application presented multifaceted challenges. Crafting a custom language in Monaco, parsing text to make API requests, and synchronizing the entire process demanded meticulous attention to detail. Overcoming these hurdles was a testament to the complexity of real-world engineering endeavours. #### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#codelens-multiplicity-issue) Codelens Multiplicity Issue An unexpected issue cropped up during the development process: the codelen (run button) registered multiple times, leading to undesired behaviour. This hiccup underscored the importance of thorough testing and debugging, even in seemingly straightforward features. ### [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#key-learning-points) Key Learning Points Amidst these challenges, I garnered valuable insights that have significantly enriched my engineering prowess: **Vector Reduction Techniques**: Navigating the realm of vector reduction techniques provided a deep understanding of how to process and interpret data efficiently. This knowledge opens up new avenues for developing data-driven applications in the future. **Web Workers Efficiency**: Mastering the intricacies of web workers not only resolved performance concerns but also expanded my repertoire of optimization strategies. This newfound proficiency will undoubtedly find relevance in various future projects. **Monaco Editor and UI Frameworks**: Working extensively with the Monaco Editor, Material-UI (MUI), and Vite enriched my familiarity with these essential tools. I honed my skills in integrating complex UI components seamlessly into applications. ## [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#areas-for-improvement-and-future-enhancements) Areas for Improvement and Future Enhancements While reflecting on this transformative journey, I recognize several areas that offer room for improvement and future enhancements: 1. Enhanced Autocomplete: Further refining the autocomplete feature to support key-value suggestions in JSON structures could greatly enhance the user experience. 2. Error Detection in Console: Integrating the console’s error checker with OpenAPI could enhance its accuracy in identifying errors and offering precise suggestions for improvement. 3. Expanded Vector Visualization: Exploring additional visualization methods and optimizing their performance could elevate the utility of the vector visualization route. ## [Anchor](https://qdrant.tech/articles/web-ui-gsoc/\#conclusion) Conclusion Participating in the Google Summer of Code 2023 and working on the “Web UI for Visualization and Exploration” project has been an immensely rewarding experience. I am grateful for the opportunity to contribute to Qdrant and develop a user-friendly interface for vector data exploration. I want to express my gratitude to my mentors and the entire Qdrant community for their support and guidance throughout this journey. This experience has not only improved my coding skills but also instilled a deeper passion for web development and data analysis. As my coding journey continues beyond this project, I look forward to applying the knowledge and experience gained here to future endeavours. 
I am excited to see how Qdrant evolves with the newly developed web UI and how it positively impacts users worldwide. Thank you for joining me on this coding adventure, and I hope to share more exciting projects in the future! Happy coding! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/web-ui-gsoc.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/web-ui-gsoc.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-50-lllmstxt|> ## food-discovery-demo - [Articles](https://qdrant.tech/articles/) - Food Discovery Demo [Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/) --- # Food Discovery Demo Kacper Łukawski · September 05, 2023 ![Food Discovery Demo](https://qdrant.tech/articles_data/food-discovery-demo/preview/title.jpg) Not every search journey begins with a specific destination in mind. Sometimes, you just want to explore and see what’s out there and what you might like. This is especially true when it comes to food. You might be craving something sweet, but you don’t know what. You might be also looking for a new dish to try, and you just want to see the options available. In these cases, it’s impossible to express your needs in a textual query, as the thing you are looking for is not yet defined. Qdrant’s semantic search for images is useful when you have a hard time expressing your tastes in words. ## [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#general-architecture) General architecture We are happy to announce a refreshed version of our [Food Discovery Demo](https://food-discovery.qdrant.tech/). This time available as an open source project, so you can easily deploy it on your own and play with it. If you prefer to dive into the source code directly, then feel free to check out the [GitHub repository](https://github.com/qdrant/demo-food-discovery/). Otherwise, read on to learn more about the demo and how it works! In general, our application consists of three parts: a [FastAPI](https://fastapi.tiangolo.com/) backend, a [React](https://react.dev/) frontend, and a [Qdrant](https://qdrant.tech/) instance. The architecture diagram below shows how these components interact with each other: ![Archtecture diagram](https://qdrant.tech/articles_data/food-discovery-demo/architecture-diagram.png) ## [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#why-did-we-use-a-clip-model) Why did we use a CLIP model? CLIP is a neural network that can be used to encode both images and texts into vectors. And more importantly, both images and texts are vectorized into the same latent space, so we can compare them directly. This lets you perform semantic search on images using text queries and the other way around. For example, if you search for “flat bread with toppings”, you will get images of pizza. Or if you search for “pizza”, you will get images of some flat bread with toppings, even if they were not labeled as “pizza”. 
This is because CLIP embeddings capture the semantics of the images and texts and can find the similarities between them no matter the wording. ![CLIP model](https://qdrant.tech/articles_data/food-discovery-demo/clip-model.png) CLIP is available in many different ways. We used the pretrained `clip-ViT-B-32` model available in the [Sentence-Transformers](https://www.sbert.net/examples/applications/image-search/README.html) library, as this is the easiest way to get started. ## [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#the-dataset) The dataset The demo is based on the [Wolt](https://wolt.com/) dataset. It contains over 2M images of dishes from different restaurants along with some additional metadata. This is how a payload for a single dish looks like: ```json { "cafe": { "address": "VGX7+6R2 Vecchia Napoli, Valletta", "categories": ["italian", "pasta", "pizza", "burgers", "mediterranean"], "location": {"lat": 35.8980154, "lon": 14.5145106}, "menu_id": "610936a4ee8ea7a56f4a372a", "name": "Vecchia Napoli Is-Suq Tal-Belt", "rating": 9, "slug": "vecchia-napoli-skyparks-suq-tal-belt" }, "description": "Tomato sauce, mozzarella fior di latte, crispy guanciale, Pecorino Romano cheese and a hint of chilli", "image": "https://wolt-menu-images-cdn.wolt.com/menu-images/610936a4ee8ea7a56f4a372a/005dfeb2-e734-11ec-b667-ced7a78a5abd_l_amatriciana_pizza_joel_gueller1.jpeg", "name": "L'Amatriciana" } ``` Processing this amount of records takes some time, so we precomputed the CLIP embeddings, stored them in a Qdrant collection and exported the collection as a snapshot. You may [download it here](https://storage.googleapis.com/common-datasets-snapshots/wolt-clip-ViT-B-32.snapshot). ## [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#different-search-modes) Different search modes The FastAPI backend [exposes just a single endpoint](https://github.com/qdrant/demo-food-discovery/blob/6b49e11cfbd6412637d527cdd62fe9b9f74ac699/backend/main.py#L37), however it handles multiple scenarios. Let’s dive into them one by one and understand why they are needed. ### [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#cold-start) Cold start Recommendation systems struggle with a cold start problem. When a new user joins the system, there is no data about their preferences, so it’s hard to recommend anything. The same applies to our demo. When you open it, you will see a random selection of dishes, and it changes every time you refresh the page. Internally, the demo [chooses some random points](https://github.com/qdrant/demo-food-discovery/blob/6b49e11cfbd6412637d527cdd62fe9b9f74ac699/backend/discovery.py#L70) in the vector space. ![Random points selection](https://qdrant.tech/articles_data/food-discovery-demo/random-results.png) That procedure should result in returning diverse results, so we have a higher chance of showing something interesting to the user. ### [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#textual-search) Textual search Since the demo suffers from the cold start problem, we implemented a textual search mode that is useful to start exploring the data. You can type in any text query by clicking a search icon in the top right corner. The demo will use the CLIP model to encode the query into a vector and then search for the nearest neighbors in the vector space. 
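To make this concrete, here is a minimal sketch (not taken from the demo code) of how the same `clip-ViT-B-32` checkpoint from Sentence-Transformers can embed an image and a text query into a shared space; the file name `dish.jpg` and the query string are placeholders:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Load the same pretrained CLIP checkpoint the demo uses
model = SentenceTransformer("clip-ViT-B-32")

# Encode an image and a text query into the same 512-dimensional space
image_embedding = model.encode(Image.open("dish.jpg"))  # placeholder image path
text_embedding = model.encode("flat bread with toppings")

# Cosine similarity tells us how well the image matches the query
print(util.cos_sim(image_embedding, text_embedding))
```

At runtime the demo only needs to encode text queries, because the image vectors were precomputed and stored in Qdrant ahead of time.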
![Textual search results](https://qdrant.tech/articles_data/food-discovery-demo/textual-search.png) This is implemented as [a group search query to Qdrant](https://github.com/qdrant/demo-food-discovery/blob/6b49e11cfbd6412637d527cdd62fe9b9f74ac699/backend/discovery.py#L44). We didn’t use a simple search, but performed grouping by the restaurant to get more diverse results. [Search groups](https://qdrant.tech/documentation/concepts/search/#search-groups) is a mechanism similar to the `GROUP BY` clause in SQL, and it’s useful when you want to get a specific number of results per group (in our case just one).

```python
import settings

# Encode the query into a vector; `model` is an instance of
# sentence_transformers.SentenceTransformer that loaded the CLIP model
query_vector = model.encode(query).tolist()

# Search for nearest neighbors; `client` is an instance of
# qdrant_client.QdrantClient that has to be initialized beforehand
response = client.search_groups(
    settings.QDRANT_COLLECTION,
    query_vector=query_vector,
    group_by=settings.GROUP_BY_FIELD,
    limit=search_query.limit,
)
```

### [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#exploring-the-results) Exploring the results The main feature of the demo is the ability to explore the space of the dishes. You can click on any of them to see more details, but more importantly, you can like or dislike it, and the demo will update the search results accordingly. ![Recommendation results](https://qdrant.tech/articles_data/food-discovery-demo/recommendation-results.png) #### [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#negative-feedback-only) Negative feedback only Qdrant’s [Recommendation API](https://qdrant.tech/documentation/concepts/search/#recommendation-api) needs at least one positive example to work. However, in our demo we want to be able to provide only negative examples. This is because we want to be able to say “I don’t like this dish” without having to like anything first. To achieve this, we use a trick: we negate the vectors of the disliked dishes and use their mean as a query. This way, the disliked dishes are pushed away from the search results. **This works because the cosine distance is based on the angle between two vectors, and the angle between a vector and its negation is 180 degrees.** ![Negated vector](https://qdrant.tech/articles_data/food-discovery-demo/negated-vector.png) The Food Discovery Demo [implements that trick](https://github.com/qdrant/demo-food-discovery/blob/6b49e11cfbd6412637d527cdd62fe9b9f74ac699/backend/discovery.py#L122) by calling Qdrant twice. First, we use the [Scroll API](https://qdrant.tech/documentation/concepts/points/#scroll-points) to find the disliked items and calculate the negated mean of all their vectors. Then we use the [Search Groups API](https://qdrant.tech/documentation/concepts/search/#search-groups) to find the nearest neighbors of the negated mean vector.
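Before diving into the implementation, the bolded claim above is easy to verify numerically. Here is a tiny, self-contained NumPy check (not part of the demo code) showing that a vector and its negation have a cosine similarity of -1, i.e. they point in opposite directions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.random.rand(512) + 0.01   # any non-zero vector
print(cosine_similarity(v, v))   # 1.0  -> same direction
print(cosine_similarity(v, -v))  # -1.0 -> opposite direction (180 degrees)
```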
```python import numpy as np --- # Retrieve the disliked points based on their ids disliked_points, _ = client.scroll( settings.QDRANT_COLLECTION, scroll_filter=models.Filter( must=[\ models.HasIdCondition(has_id=search_query.negative),\ ] ), with_vectors=True, ) --- # Calculate a mean vector of disliked points disliked_vectors = np.array([point.vector for point in disliked_points]) mean_vector = np.mean(disliked_vectors, axis=0) negated_vector = -mean_vector --- # Search for nearest neighbors of the negated mean vector response = client.search_groups( settings.QDRANT_COLLECTION, query_vector=negated_vector.tolist(), group_by=settings.GROUP_BY_FIELD, limit=search_query.limit, ) ``` #### [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#positive-and-negative-feedback) Positive and negative feedback Since the [Recommendation API](https://qdrant.tech/documentation/concepts/search/#recommendation-api) requires at least one positive example, we can use it only when the user has liked at least one dish. We could theoretically use the same trick as above and negate the disliked dishes, but it would be a bit weird, as Qdrant has that feature already built-in, and we can call it just once to do the job. It’s always better to perform the search server-side. Thus, in this case [we just call\\ the Qdrant server with a list of positive and negative examples](https://github.com/qdrant/demo-food-discovery/blob/6b49e11cfbd6412637d527cdd62fe9b9f74ac699/backend/discovery.py#L166), so it can find some points which are close to the positive examples and far from the negative ones. ```python response = client.recommend_groups( settings.QDRANT_COLLECTION, positive=search_query.positive, negative=search_query.negative, group_by=settings.GROUP_BY_FIELD, limit=search_query.limit, ) ``` From the user perspective nothing changes comparing to the previous case. ### [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#location-based-search) Location-based search Last but not least, location plays an important role in the food discovery process. You are definitely looking for something you can find nearby, not on the other side of the globe. Therefore, your current location can be toggled as a filtering condition. You can enable it by clicking on “Find near me” icon in the top right. This way you can find the best pizza in your neighborhood, not in the whole world. Qdrant [geo radius filter](https://qdrant.tech/documentation/concepts/filtering/#geo-radius) is a perfect choice for this. It lets you filter the results by distance from a given point. ```python from qdrant_client import models --- # Create a geo radius filter query_filter = models.Filter( must=[\ models.FieldCondition(\ key="cafe.location",\ geo_radius=models.GeoRadius(\ center=models.GeoPoint(\ lon=location.longitude,\ lat=location.latitude,\ ),\ radius=location.radius_km * 1000,\ ),\ )\ ] ) ``` Such a filter needs [a payload index](https://qdrant.tech/documentation/concepts/indexing/#payload-index) to work efficiently, and it was created on a collection we used to create the snapshot. When you import it into your instance, the index will be already there. ## [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#using-the-demo) Using the demo The Food Discovery Demo [is available online](https://food-discovery.qdrant.tech/), but if you prefer to run it locally, you can do it with Docker. 
The [README](https://github.com/qdrant/demo-food-discovery/blob/main/README.md) describes all the steps more in detail, but here is a quick start: ```bash git clone git@github.com:qdrant/demo-food-discovery.git cd demo-food-discovery --- # Create .env file based on .env.example docker-compose up -d ``` The demo will be available at `http://localhost:8001`, but you won’t be able to search anything until you [import the snapshot into your Qdrant\\ instance](https://qdrant.tech/documentation/concepts/snapshots/#recover-via-api). If you don’t want to bother with hosting a local one, you can use the [Qdrant\\ Cloud](https://cloud.qdrant.io/) cluster. 4 GB RAM is enough to load all the 2 million entries. ## [Anchor](https://qdrant.tech/articles/food-discovery-demo/\#fork-and-reuse) Fork and reuse Our demo is completely open-source. Feel free to fork it, update with your own dataset or adapt the application to your use case. Whether you’re looking to understand the mechanics of semantic search or to have a foundation to build a larger project, this demo can serve as a starting point. Check out the [Food Discovery Demo repository](https://github.com/qdrant/demo-food-discovery/) to get started. If you have any questions, feel free to reach out [through Discord](https://qdrant.to/discord). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/food-discovery-demo.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/food-discovery-demo.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-51-lllmstxt|> ## capacity-planning - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Capacity Planning --- # [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#capacity-planning) Capacity Planning When setting up your cluster, you’ll need to figure out the right balance of **RAM** and **disk storage**. The best setup depends on a few things: - How many vectors you have and their dimensions. - The amount of payload data you’re using and their indexes. - What data you want to store in memory versus on disk. - Your cluster’s replication settings. - Whether you’re using quantization and how you’ve set it up. ## [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#calculating-ram-size) Calculating RAM size You should store frequently accessed data in RAM for faster retrieval. If you want to keep all vectors in memory for optimal performance, you can use this rough formula for estimation: ```text memory_size = number_of_vectors * vector_dimension * 4 bytes * 1.5 ``` At the end, we multiply everything by 1.5. This extra 50% accounts for metadata (such as indexes and point versions) and temporary segments created during optimization. Let’s say you want to store 1 million vectors with 1024 dimensions: ```text memory_size = 1,000,000 * 1024 * 4 bytes * 1.5 ``` The memory\_size is approximately 6,144,000,000 bytes, or about 5.72 GB. 
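If you prefer to calculate this estimate in code, the same formula can be expressed as a small Python helper. This is just a convenience sketch mirroring the estimate above; the function name and the 1.5 overhead factor are our own choices, not part of Qdrant:

```python
def estimate_vector_ram(num_vectors: int, dim: int, overhead: float = 1.5) -> float:
    """Rough RAM estimate in bytes for float32 vectors, including ~50% overhead
    for metadata, indexes, and temporary segments created during optimization."""
    return num_vectors * dim * 4 * overhead

ram_bytes = estimate_vector_ram(1_000_000, 1024)
print(f"{ram_bytes:,.0f} bytes ≈ {ram_bytes / 1024**3:.2f} GB")
# 6,144,000,000 bytes ≈ 5.72 GB
```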
Depending on the use case, large datasets can benefit from reduced memory requirements via [quantization](https://qdrant.tech/documentation/guides/quantization/). ## [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#calculating-payload-size) Calculating payload size Payload size varies from dataset to dataset: it depends on the [structure and content of your data](https://qdrant.tech/documentation/concepts/payload/#payload-types). For instance: - **Text fields** consume space based on length and encoding (e.g. a large chunk of text vs a few words). - **Numeric fields** such as `int64` or `float64` have a fixed size of 8 bytes. - **Boolean fields** typically consume 1 byte. Calculating the total payload size is similar to vectors: multiply by 1.5 to account for back-end indexing processes. ```text total_payload_size = number_of_points * payload_size * 1.5 ``` Let’s say you want to store 1 million points with JSON payloads of 5KB: ```text total_payload_size = 1,000,000 * 5KB * 1.5 ``` The total\_payload\_size is approximately 7,500,000,000 bytes, or about 7 GB. ## [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#choosing-disk-over-ram) Choosing disk over RAM For optimal performance, you should store only frequently accessed data in RAM. The rest should be offloaded to disk. For example, extra payload fields that you don’t use for filtering can be stored on disk. Only [indexed fields](https://qdrant.tech/documentation/concepts/indexing/#payload-index) should be stored in RAM. You can read more about payload storage in the [Storage](https://qdrant.tech/documentation/concepts/storage/#payload-storage) section. ### [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#storage-focused-configuration) Storage-focused configuration If your priority is to handle large volumes of vectors with average search latency, it’s recommended to configure [memory-mapped (mmap) storage](https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage). In this setup, vectors are stored on disk in memory-mapped files, while only the most frequently accessed vectors are cached in RAM. The amount of available RAM greatly impacts search performance. As a general rule, if you store half as many vectors in RAM, search latency will roughly double. Disk speed is also crucial. [Contact us](https://qdrant.tech/documentation/support/) if you have specific requirements for high-volume searches in our Cloud. ### [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#subgroup-oriented-configuration) Subgroup-oriented configuration If your use case involves splitting vectors into multiple collections or subgroups based on payload values (e.g., serving searches for multiple users, each with their own subset of vectors), memory-mapped storage is recommended. In this scenario, only the active subset of vectors will be cached in RAM, allowing for fast searches for the most recent and active users. You can estimate the required memory size as: ```text memory_size = number_of_active_vectors * vector_dimension * 4 bytes * 1.5 ``` Please refer to our [multitenancy](https://qdrant.tech/documentation/guides/multiple-partitions/) documentation for more details on partitioning data in Qdrant. ## [Anchor](https://qdrant.tech/documentation/guides/capacity-planning/\#scaling-disk-space-in-qdrant-cloud) Scaling disk space in Qdrant Cloud Clusters supporting vector search require substantial disk space compared to other search systems.
If you’re running low on disk space, you can use the UI at [cloud.qdrant.io](https://cloud.qdrant.io/) to **Scale Up** your cluster. When running low on disk space, consider the following benefits of scaling up: - **Larger Datasets**: Supports larger datasets, which can improve the relevance and quality of search results. - **Improved Indexing**: Enables the use of advanced indexing strategies like HNSW. - **Caching**: Enhances speed by having more RAM, allowing more frequently accessed data to be cached. - **Backups and Redundancy**: Facilitates more frequent backups, which is a key advantage for data safety. Always remember to add 50% of the vector size. This would account for things like indexes and auxiliary data used during operations such as vector insertion, deletion, and search. Thus, the estimated memory size including metadata is: ```text total_vector_size = number_of_dimensions * 4 bytes * 1.5 ``` **Disclaimer** The above calculations are estimates at best. If you’re looking for more accurate numbers, you should always test your data set in practice. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/capacity-planning.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/capacity-planning.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-52-lllmstxt|> ## machine-learning - [Articles](https://qdrant.tech/articles/) - Machine Learning #### Machine Learning Explore Machine Learning principles and practices which make modern semantic similarity search possible. Apply Qdrant and vector search capabilities to your ML projects. [![Preview](https://qdrant.tech/articles_data/minicoil/preview/preview.jpg)\\ **miniCOIL: on the Road to Usable Sparse Neural Retrieval** \\ Introducing miniCOIL, a lightweight sparse neural retriever capable of generalization.\\ \\ Evgeniya Sukhodolskaya\\ \\ May 13, 2025](https://qdrant.tech/articles/minicoil/)[![Preview](https://qdrant.tech/articles_data/search-feedback-loop/preview/preview.jpg)\\ **Relevance Feedback in Informational Retrieval** \\ Relerance feedback: from ancient history to LLMs. Why relevance feedback techniques are good on paper but not popular in neural search, and what we can do about it.\\ \\ Evgeniya Sukhodolskaya\\ \\ March 27, 2025](https://qdrant.tech/articles/search-feedback-loop/)[![Preview](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/preview/preview.jpg)\\ **Modern Sparse Neural Retrieval: From Theory to Practice** \\ A comprehensive guide to modern sparse neural retrievers: COIL, TILDEv2, SPLADE, and more. 
Find out how they work and learn how to use them effectively.\\ \\ Evgeniya Sukhodolskaya\\ \\ October 23, 2024](https://qdrant.tech/articles/modern-sparse-neural-retrieval/)[![Preview](https://qdrant.tech/articles_data/cross-encoder-integration-gsoc/preview/preview.jpg)\\ **Qdrant Summer of Code 2024 - ONNX Cross Encoders in Python** \\ A summary of my work and experience at Qdrant Summer of Code 2024.\\ \\ Huong (Celine) Hoang\\ \\ October 14, 2024](https://qdrant.tech/articles/cross-encoder-integration-gsoc/)[![Preview](https://qdrant.tech/articles_data/late-interaction-models/preview/preview.jpg)\\ **Any\* Embedding Model Can Become a Late Interaction Model... If You Give It a Chance!** \\ We recently discovered that embedding models can become late interaction models & can perform surprisingly well in some scenarios. See what we learned here.\\ \\ Kacper Łukawski\\ \\ August 14, 2024](https://qdrant.tech/articles/late-interaction-models/)[![Preview](https://qdrant.tech/articles_data/bm42/preview/preview.jpg)\\ **BM42: New Baseline for Hybrid Search** \\ Introducing BM42 - a new sparse embedding approach, which combines the benefits of exact keyword search with the intelligence of transformers.\\ \\ Andrey Vasnetsov\\ \\ July 01, 2024](https://qdrant.tech/articles/bm42/)[![Preview](https://qdrant.tech/articles_data/embedding-recycling/preview/preview.jpg)\\ **Layer Recycling and Fine-tuning Efficiency** \\ Learn when and how to use layer recycling to achieve different performance targets.\\ \\ Yusuf Sarıgöz\\ \\ August 23, 2022](https://qdrant.tech/articles/embedding-recycler/)[![Preview](https://qdrant.tech/articles_data/cars-recognition/preview/preview.jpg)\\ **Fine Tuning Similar Cars Search** \\ Learn how to train a similarity model that can retrieve similar car images in novel categories.\\ \\ Yusuf Sarıgöz\\ \\ June 28, 2022](https://qdrant.tech/articles/cars-recognition/)[![Preview](https://qdrant.tech/articles_data/detecting-coffee-anomalies/preview/preview.jpg)\\ **Metric Learning for Anomaly Detection** \\ Practical use of metric learning for anomaly detection. A way to match the results of a classification-based approach with only ~0.6% of the labeled data.\\ \\ Yusuf Sarıgöz\\ \\ May 04, 2022](https://qdrant.tech/articles/detecting-coffee-anomalies/)[![Preview](https://qdrant.tech/articles_data/triplet-loss/preview/preview.jpg)\\ **Triplet Loss - Advanced Intro** \\ What are the advantages of Triplet Loss over Contrastive loss and how to efficiently implement it?\\ \\ Yusuf Sarıgöz\\ \\ March 24, 2022](https://qdrant.tech/articles/triplet-loss/)[![Preview](https://qdrant.tech/articles_data/metric-learning-tips/preview/preview.jpg)\\ **Metric Learning Tips & Tricks** \\ Practical recommendations on how to train a matching model and serve it in production. Even with no labeled data.\\ \\ Andrei Vasnetsov\\ \\ May 15, 2021](https://qdrant.tech/articles/metric-learning-tips/) × [Powered by](https://qdrant.tech/) <|page-53-lllmstxt|> ## support - [Documentation](https://qdrant.tech/documentation/) - Support --- # [Anchor](https://qdrant.tech/documentation/support/\#qdrant-cloud-support-and-troubleshooting) Qdrant Cloud Support and Troubleshooting ## [Anchor](https://qdrant.tech/documentation/support/\#community-support) Community Support All Qdrant Cloud users are welcome to join our [Discord community](https://qdrant.to/discord/). 
![Discord](https://qdrant.tech/documentation/cloud/discord.png) ## [Anchor](https://qdrant.tech/documentation/support/\#qdrant-cloud-support) Qdrant Cloud Support Paying customers have access to our Support team. Links to the support portal are available in the Qdrant Cloud Console. ![Support Portal](https://qdrant.tech/documentation/cloud/support-portal.png) When creating a support ticket, please provide as much information as possible to help us understand your issue. This includes but is not limited to: - The ID of your Qdrant Cloud cluster, if it’s not filled out by the UI automatically. You can find the ID on your cluster’s detail page. - Which collection(s) are affected - Code examples on how you are interacting with the Qdrant API - Logs or error messages from your application - Relevant telemetry from your application You can also choose a severity, when creating a ticket. This helps us prioritize your issue correctly. Please refer to the [Qdrant Cloud SLA](https://qdrant.to/sla/) for a definition of these severity levels and their corresponding response time SLA for your respective [support tier](https://qdrant.tech/documentation/cloud/premium/). If you are opening a ticket for a Hybrid Cloud or Private Cloud environment, we may ask for additional information about your environment, such as detailed logs of the Qdrant databases or operator and the state of your Kubernetes cluster. We have prepared a support bundle script that can help you with collecting all this information. A support bundle will not contain any user data or sensitive information like api keys. It will contain the names and configuration of Qdrant collections though. For more information see the [support bundle documentation](https://github.com/qdrant/qdrant-cloud-support-tools/tree/main/support-bundle). We recommend creating one and attaching it to your support ticket, so that we can help you faster. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/support.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/support.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-54-lllmstxt|> ## filtered-search-intro --- # [Anchor](https://qdrant.tech/benchmarks/filtered-search-intro/\#filtered-search-benchmark) Filtered search benchmark Applying filters to search results brings a whole new level of complexity. It is no longer enough to apply one algorithm to plain data. With filtering, it becomes a matter of the _cross-integration_ of the different indices. To measure how well different search engines perform in this scenario, we have prepared a set of **Filtered ANN Benchmark Datasets** - [https://github.com/qdrant/ann-filtering-benchmark-datasets](https://github.com/qdrant/ann-filtering-benchmark-datasets) It is similar to the ones used in the [ann-benchmarks project](https://github.com/erikbern/ann-benchmarks/) but enriched with payload metadata and pre-generated filtering requests. 
It includes synthetic and real-world datasets with various filters, from keywords to geo-spatial queries. ### [Anchor](https://qdrant.tech/benchmarks/filtered-search-intro/\#why-filtering-is-not-trivial) Why is filtering not trivial? Not many ANN algorithms are compatible with filtering. HNSW is one of the few that are, but search engines approach its integration in different ways: - Some use **post-filtering**, which applies filters after the ANN search. It doesn’t scale well, as it either loses results or requires many candidates at the first stage. - Others use **pre-filtering**, which requires a binary mask of the whole dataset to be passed into the ANN algorithm. It is also not scalable, as the mask size grows linearly with the dataset size. On top of that, there is also a problem with search accuracy: if too many vectors are filtered out, the HNSW graph becomes disconnected. Qdrant uses a different approach that requires neither pre- nor post-filtering while addressing the accuracy problem. Read more about it in our [Filtrable HNSW](https://qdrant.tech/articles/filtrable-hnsw/) article. <|page-55-lllmstxt|> ## qdrant-internals - [Articles](https://qdrant.tech/articles/) - Qdrant Internals #### Qdrant Internals Take a look under the hood of Qdrant’s high-performance vector search engine. Explore the architecture, components, and design principles the Qdrant Vector Search Engine is built on. [![Preview](https://qdrant.tech/articles_data/dedicated-vector-search/preview/preview.jpg)\\ **Built for Vector Search** \\ Why add-on vector search looks good — until you actually use it.\\ \\ Evgeniya Sukhodolskaya & Andrey Vasnetsov\\ \\ February 17, 2025](https://qdrant.tech/articles/dedicated-vector-search/)[![Preview](https://qdrant.tech/articles_data/gridstore-key-value-storage/preview/preview.jpg)\\ **Introducing Gridstore: Qdrant's Custom Key-Value Store** \\ Why and how we built our own key-value store.
A short technical report on our procedure and results.\\ \\ Luis Cossio, Arnaud Gourlay & David Myriel\\ \\ February 05, 2025](https://qdrant.tech/articles/gridstore-key-value-storage/)[![Preview](https://qdrant.tech/articles_data/immutable-data-structures/preview/preview.jpg)\\ **Qdrant Internals: Immutable Data Structures** \\ Learn how immutable data structures improve vector search performance in Qdrant.\\ \\ Andrey Vasnetsov\\ \\ August 20, 2024](https://qdrant.tech/articles/immutable-data-structures/)[![Preview](https://qdrant.tech/articles_data/dedicated-service/preview/preview.jpg)\\ **Vector Search as a dedicated service** \\ Why vector search requires a dedicated service.\\ \\ Andrey Vasnetsov\\ \\ November 30, 2023](https://qdrant.tech/articles/dedicated-service/)[![Preview](https://qdrant.tech/articles_data/geo-polygon-filter-gsoc/preview/preview.jpg)\\ **Google Summer of Code 2023 - Polygon Geo Filter for Qdrant Vector Database** \\ A Summary of my work and experience at Qdrant's Gsoc '23.\\ \\ Zein Wen\\ \\ October 12, 2023](https://qdrant.tech/articles/geo-polygon-filter-gsoc/)[![Preview](https://qdrant.tech/articles_data/binary-quantization/preview/preview.jpg)\\ **Binary Quantization - Vector Search, 40x Faster** \\ Binary Quantization is a newly introduced mechanism of reducing the memory footprint and increasing performance\\ \\ Nirant Kasliwal\\ \\ September 18, 2023](https://qdrant.tech/articles/binary-quantization/)[![Preview](https://qdrant.tech/articles_data/io_uring/preview/preview.jpg)\\ **Qdrant under the hood: io\_uring** \\ Slow disk decelerating your Qdrant deployment? Get on top of IO overhead with this one trick!\\ \\ Andre Bogus\\ \\ June 21, 2023](https://qdrant.tech/articles/io_uring/)[![Preview](https://qdrant.tech/articles_data/product-quantization/preview/preview.jpg)\\ **Product Quantization in Vector Search \| Qdrant** \\ Discover product quantization in vector search technology. Learn how it optimizes storage and accelerates search processes for high-dimensional data.\\ \\ Kacper Łukawski\\ \\ May 30, 2023](https://qdrant.tech/articles/product-quantization/)[![Preview](https://qdrant.tech/articles_data/scalar-quantization/preview/preview.jpg)\\ **Scalar Quantization: Background, Practices & More \| Qdrant** \\ Discover the efficiency of scalar quantization for optimized data storage and enhanced performance. Learn about its data compression benefits and efficiency improvements.\\ \\ Kacper Łukawski\\ \\ March 27, 2023](https://qdrant.tech/articles/scalar-quantization/)[![Preview](https://qdrant.tech/articles_data/memory-consumption/preview/preview.jpg)\\ **Minimal RAM you need to serve a million vectors** \\ How to properly measure RAM usage and optimize Qdrant for memory consumption.\\ \\ Andrei Vasnetsov\\ \\ December 07, 2022](https://qdrant.tech/articles/memory-consumption/)[![Preview](https://qdrant.tech/articles_data/filtrable-hnsw/preview/preview.jpg)\\ **Filtrable HNSW** \\ How to make ANN search with custom filtering? 
Search in selected subsets without loosing the results.\\ \\ Andrei Vasnetsov\\ \\ November 24, 2019](https://qdrant.tech/articles/filtrable-hnsw/) × [Powered by](https://qdrant.tech/) <|page-56-lllmstxt|> ## send-data - [Documentation](https://qdrant.tech/documentation/) - Send Data to Qdrant ## [Anchor](https://qdrant.tech/documentation/send-data/\#how-to-send-your-data-to-a-qdrant-cluster) How to Send Your Data to a Qdrant Cluster | Example | Description | Stack | | --- | --- | --- | | [Pinecone to Qdrant Data Transfer](https://githubtocolab.com/qdrant/examples/blob/master/data-migration/from-pinecone-to-qdrant.ipynb) | Migrate your vector data from Pinecone to Qdrant. | Qdrant, Vector-io | | [Stream Data to Qdrant with Kafka](https://qdrant.tech/documentation/send-data/data-streaming-kafka-qdrant/) | Use Confluent to Stream Data to Qdrant via Managed Kafka. | Qdrant, Kafka | | [Qdrant on Databricks](https://qdrant.tech/documentation/send-data/databricks/) | Learn how to use Qdrant on Databricks using the Spark connector | Qdrant, Databricks, Apache Spark | | [Qdrant with Airflow and Astronomer](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/) | Build a semantic querying system using Airflow and Astronomer | Qdrant, Airflow, Astronomer | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/send-data/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/send-data/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-57-lllmstxt|> ## neural-search - [Documentation](https://qdrant.tech/documentation/) - [Beginner tutorials](https://qdrant.tech/documentation/beginner-tutorials/) - Build a Neural Search Service --- # [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#build-a-neural-search-service-with-sentence-transformers-and-qdrant) Build a Neural Search Service with Sentence Transformers and Qdrant | Time: 30 min | Level: Beginner | Output: [GitHub](https://github.com/qdrant/qdrant_demo/tree/sentense-transformers) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing) | | --- | --- | --- | --- | This tutorial shows you how to build and deploy your own neural search service to look through descriptions of companies from [startups-list.com](https://www.startups-list.com/) and pick the most similar ones to your query. The website contains the company names, descriptions, locations, and a picture for each entry. A neural search service uses artificial neural networks to improve the accuracy and relevance of search results. Besides offering simple keyword results, this system can retrieve results by meaning. It can understand and interpret complex search queries and provide more contextually relevant output, effectively enhancing the user’s search experience. 
## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#workflow) Workflow To create a neural search service, you will need to transform your raw data and then create a search function to manipulate it. First, you will 1) download and prepare a sample dataset using a modified version of the BERT ML model. Then, you will 2) load the data into Qdrant, 3) create a neural search API and 4) serve it using FastAPI. ![Neural Search Workflow](https://qdrant.tech/docs/workflow-neural-search.png) > **Note**: The code for this tutorial can be found here: \| [Step 1: Data Preparation Process](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing) \| [Step 2: Full Code for Neural Search](https://github.com/qdrant/qdrant_demo/tree/sentense-transformers). \| ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#prerequisites) Prerequisites To complete this tutorial, you will need: - Docker - The easiest way to use Qdrant is to run a pre-built Docker image. - [Raw parsed data](https://storage.googleapis.com/generall-shared-data/startups_demo.json) from startups-list.com. - Python version >=3.8 ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#prepare-sample-dataset) Prepare sample dataset To conduct a neural search on startup descriptions, you must first encode the description data into vectors. To process text, you can use a pre-trained models like [BERT](https://en.wikipedia.org/wiki/BERT_%28language_model%29) or sentence transformers. The [sentence-transformers](https://github.com/UKPLab/sentence-transformers) library lets you conveniently download and use many pre-trained models, such as DistilBERT, MPNet, etc. 1. First you need to download the dataset. ```bash wget https://storage.googleapis.com/generall-shared-data/startups_demo.json ``` 2. Install the SentenceTransformer library as well as other relevant packages. ```bash pip install sentence-transformers numpy pandas tqdm ``` 3. Import the required modules. ```python from sentence_transformers import SentenceTransformer import numpy as np import json import pandas as pd from tqdm.notebook import tqdm ``` You will be using a pre-trained model called `all-MiniLM-L6-v2`. This is a performance-optimized sentence embedding model and you can read more about it and other available models [here](https://www.sbert.net/docs/pretrained_models.html). 4. Download and create a pre-trained sentence encoder. ```python model = SentenceTransformer( "all-MiniLM-L6-v2", device="cuda" ) # or device="cpu" if you don't have a GPU ``` 5. Read the raw data file. ```python df = pd.read_json("./startups_demo.json", lines=True) ``` 6. Encode all startup descriptions to create an embedding vector for each. Internally, the `encode` function will split the input into batches, which will significantly speed up the process. ```python vectors = model.encode( [row.alt + ". " + row.description for row in df.itertuples()], show_progress_bar=True, ) ``` All of the descriptions are now converted into vectors. There are 40474 vectors of 384 dimensions. The output layer of the model has this dimension ```python vectors.shape --- # > (40474, 384) ``` 7. 
Download the saved vectors into a new file named `startup_vectors.npy` ```python np.save("startup_vectors.npy", vectors, allow_pickle=False) ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#run-qdrant-in-docker) Run Qdrant in Docker Next, you need to manage all of your data using a vector engine. Qdrant lets you store, update or delete created vectors. Most importantly, it lets you search for the nearest vectors via a convenient API. > **Note:** Before you begin, create a project directory and a virtual python environment in it. 1. Download the Qdrant image from DockerHub. ```bash docker pull qdrant/qdrant ``` 2. Start Qdrant inside of Docker. ```bash docker run -p 6333:6333 \ -v $(pwd)/qdrant_storage:/qdrant/storage \ qdrant/qdrant ``` You should see output like this ```text ... [2021-02-05T00:08:51Z INFO actix_server::builder] Starting 12 workers [2021-02-05T00:08:51Z INFO actix_server::builder] Starting "actix-web-service-0.0.0.0:6333" service on 0.0.0.0:6333 ``` Test the service by going to [http://localhost:6333/](http://localhost:6333/). You should see the Qdrant version info in your browser. All data uploaded to Qdrant is saved inside the `./qdrant_storage` directory and will be persisted even if you recreate the container. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#upload-data-to-qdrant) Upload data to Qdrant 1. Install the official Python client to best interact with Qdrant. ```bash pip install qdrant-client ``` At this point, you should have startup records in the `startups_demo.json` file, encoded vectors in `startup_vectors.npy` and Qdrant running on a local machine. Now you need to write a script to upload all startup data and vectors into the search engine. 2. Create a client object for Qdrant. ```python --- # Import client library from qdrant_client import QdrantClient from qdrant_client.models import VectorParams, Distance client = QdrantClient("http://localhost:6333") ``` 3. Related vectors need to be added to a collection. Create a new collection for your startup vectors. ```python if not client.collection_exists("startups"): client.create_collection( collection_name="startups", vectors_config=VectorParams(size=384, distance=Distance.COSINE), ) ``` 4. Create an iterator over the startup data and vectors. The Qdrant client library defines a special function that allows you to load datasets into the service. However, since there may be too much data to fit a single computer memory, the function takes an iterator over the data as input. ```python fd = open("./startups_demo.json") --- # payload is now an iterator over startup data payload = map(json.loads, fd) --- # Load all vectors into memory, numpy array works as iterable for itself. --- # Other option would be to use Mmap, if you don't want to load all data into RAM vectors = np.load("./startup_vectors.npy") ``` 5. Upload the data ```python client.upload_collection( collection_name="startups", vectors=vectors, payload=payload, ids=None, # Vector ids will be assigned automatically batch_size=256, # How many vectors will be uploaded in a single request? ) ``` Vectors are now uploaded to Qdrant. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#build-the-search-api) Build the search API Now that all the preparations are complete, let’s start building a neural search class. 
In order to process incoming requests, neural search will need 2 things: 1) a model to convert the query into a vector and 2) the Qdrant client to perform search queries. 1. Create a file named `neural_searcher.py` and specify the following. ```python from qdrant_client import QdrantClient from sentence_transformers import SentenceTransformer class NeuralSearcher: def __init__(self, collection_name): self.collection_name = collection_name # Initialize encoder model self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu") # initialize Qdrant client self.qdrant_client = QdrantClient("http://localhost:6333") ``` 2. Write the search function. ```python def search(self, text: str): # Convert text query into vector vector = self.model.encode(text).tolist() # Use `vector` for search for closest vectors in the collection search_result = self.qdrant_client.query_points( collection_name=self.collection_name, query=vector, query_filter=None, # If you don't want any filters for now limit=5, # 5 the most closest results is enough ).points # `search_result` contains found vector ids with similarity scores along with the stored payload # In this function you are interested in payload only payloads = [hit.payload for hit in search_result] return payloads ``` 3. Add search filters. With Qdrant it is also feasible to add some conditions to the search. For example, if you wanted to search for startups in a certain city, the search query could look like this: ```python from qdrant_client.models import Filter ... city_of_interest = "Berlin" # Define a filter for cities city_filter = Filter(**{ "must": [{\ "key": "city", # Store city information in a field of the same name\ "match": { # This condition checks if payload field has the requested value\ "value": city_of_interest\ }\ }] }) search_result = self.qdrant_client.query_points( collection_name=self.collection_name, query=vector, query_filter=city_filter, limit=5 ).points ... ``` You have now created a class for neural search queries. Now wrap it up into a service. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#deploy-the-search-with-fastapi) Deploy the search with FastAPI To build the service you will use the FastAPI framework. 1. Install FastAPI. To install it, use the command ```bash pip install fastapi uvicorn ``` 2. Implement the service. Create a file named `service.py` and specify the following. The service will have only one API endpoint and will look like this: ```python from fastapi import FastAPI --- # The file where NeuralSearcher is stored from neural_searcher import NeuralSearcher app = FastAPI() --- # Create a neural searcher instance neural_searcher = NeuralSearcher(collection_name="startups") @app.get("/api/search") def search_startup(q: str): return {"result": neural_searcher.search(text=q)} if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=8000) ``` 3. Run the service. ```bash python service.py ``` 4. Open your browser at [http://localhost:8000/docs](http://localhost:8000/docs). You should be able to see a debug interface for your service. ![FastAPI Swagger interface](https://qdrant.tech/docs/fastapi_neural_search.png) Feel free to play around with it, make queries regarding the companies in our corpus, and check out the results. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/neural-search/\#next-steps) Next steps The code from this tutorial has been used to develop a [live online demo](https://qdrant.to/semantic-search-demo). 
You can try it to get an intuition for cases when the neural search is useful. The demo contains a switch that selects between neural and full-text searches. You can turn the neural search on and off to compare your result with a regular full-text search. > **Note**: The code for this tutorial can be found here: \| [Step 1: Data Preparation Process](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing) \| [Step 2: Full Code for Neural Search](https://github.com/qdrant/qdrant_demo/tree/sentense-transformers). \| Join our [Discord community](https://qdrant.to/discord), where we talk about vector search and similarity learning, publish other examples of neural networks and neural search applications. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/beginner-tutorials/neural-search.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/beginner-tutorials/neural-search.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-58-lllmstxt|> ## observability - [Documentation](https://qdrant.tech/documentation/) - Observability ## [Anchor](https://qdrant.tech/documentation/observability/\#observability-integrations) Observability Integrations | Tool | Description | | --- | --- | | [OpenLIT](https://qdrant.tech/documentation/observability/openlit/) | Platform for OpenTelemetry-native Observability & Evals for LLMs and Vector Databases. | | [OpenLLMetry](https://qdrant.tech/documentation/observability/openllmetry/) | Set of OpenTelemetry extensions to add Observability for your LLM application. | | [Datadog](https://qdrant.tech/documentation/observability/datadog/) | Cloud-based monitoring and analytics platform. | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/observability/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
<|page-59-lllmstxt|> ## cloud-quickstart - [Documentation](https://qdrant.tech/documentation/) - Cloud Quickstart --- # [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#how-to-get-started-with-qdrant-cloud) How to Get Started With Qdrant Cloud Video walkthrough: [Watch on YouTube](https://www.youtube.com/watch?v=3hrQP3hh69Y "How to Get Started With Qdrant Cloud") You can try vector search on Qdrant Cloud in three steps. Instructions are below, but the video is faster. ## [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#setup-a-qdrant-cloud-cluster) Set Up a Qdrant Cloud Cluster 1. Register for a [Cloud account](https://cloud.qdrant.io/signup) with your email, Google or GitHub credentials. 2. Go to **Clusters** and follow the onboarding instructions under **Create First Cluster**. ![create a cluster](https://qdrant.tech/docs/gettingstarted/gui-quickstart/create-cluster.png) 3. When you create the cluster, you will receive an API key. Copy it and store it somewhere safe. It will not be displayed again. If you lose it, you can always create a new one on the **Cluster Detail Page** later. ![get api key](https://qdrant.tech/docs/gettingstarted/gui-quickstart/api-key.png) ## [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#access-the-cluster-ui) Access the Cluster UI 1. Click on **Cluster UI** on the **Cluster Detail Page** to access the cluster UI dashboard. 2. Paste your new API key here. You can revoke and create new API keys in the **API Keys** tab on your **Cluster Detail Page**. 3. The key will grant you access to your Qdrant instance. Now you can see the cluster Dashboard. ![access the dashboard](https://qdrant.tech/docs/gettingstarted/gui-quickstart/access-dashboard.png) ## [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#authenticate-via-sdks) Authenticate via SDKs Now that you have your cluster and key, you can use our official SDKs to access Qdrant Cloud from within your application.
bashpythontypescriptrustjavacsharpgo ```bash curl \ -X GET https://xyz-example.eu-central.aws.cloud.qdrant.io:6333 \ --header 'api-key: ' --- # Alternatively, you can use the `Authorization` header with the `Bearer` prefix curl \ -X GET https://xyz-example.eu-central.aws.cloud.qdrant.io:6333 \ --header 'Authorization: Bearer ' ``` ```python from qdrant_client import QdrantClient qdrant_client = QdrantClient( host="xyz-example.eu-central.aws.cloud.qdrant.io", api_key="", ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "xyz-example.eu-central.aws.cloud.qdrant.io", apiKey: "", }); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("https://xyz-example.eu-central.aws.cloud.qdrant.io:6334") .api_key("") .build()?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder( "xyz-example.eu-central.aws.cloud.qdrant.io", 6334, true) .withApiKey("") .build()); ``` ```csharp using Qdrant.Client; var client = new QdrantClient( host: "xyz-example.eu-central.aws.cloud.qdrant.io", https: true, apiKey: "" ); ``` ```go import "github.com/qdrant/go-client/qdrant" client, err := qdrant.NewClient(&qdrant.Config{ Host: "xyz-example.eu-central.aws.cloud.qdrant.io", Port: 6334, APIKey: "", UseTLS: true, }) ``` ## [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#try-the-tutorial-sandbox) Try the Tutorial Sandbox 1. Open the interactive **Tutorial**. Here, you can test basic Qdrant API requests. 2. Using the **Quickstart** instructions, create a collection, add vectors and run a search. 3. The output on the right will show you some basic semantic search results. ![interactive-tutorial](https://qdrant.tech/docs/gettingstarted/gui-quickstart/interactive-tutorial.png) ## [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#thats-vector-search) That’s Vector Search! You can stay in the sandbox and continue trying our different API calls. When ready, use the Console and our complete REST API to try other operations. ## [Anchor](https://qdrant.tech/documentation/cloud-quickstart/\#whats-next) What’s Next? Now that you have a Qdrant Cloud cluster up and running, you should [test remote access](https://qdrant.tech/documentation/cloud/authentication/#test-cluster-access) with a Qdrant Client. For more about Qdrant Cloud, check our [dedicated documentation](https://qdrant.tech/documentation/cloud-intro/). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-quickstart.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-quickstart.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-60-lllmstxt|> ## documentation --- # Qdrant Documentation Qdrant is an AI-native vector database and a semantic search engine. You can use it to extract meaningful information from unstructured data. 
[Clone this repo now](https://github.com/qdrant/qdrant_demo/) and build a search engine in five minutes. [Cloud Quickstart](https://qdrant.tech/documentation/quickstart-cloud/) [Local Quickstart](https://qdrant.tech/documentation/quickstart/) ## Ready to start developing? Qdrant is open-source and can be self-hosted. However, the quickest way to get started is with our [free tier](https://qdrant.to/cloud) on Qdrant Cloud. It scales easily and provides a UI where you can interact with data. ### Create your first Qdrant Cloud cluster today [Get Started](https://qdrant.to/cloud) ![](https://qdrant.tech/img/rocket.svg) ## Optimize Qdrant's performance Boost search speed, reduce latency, and improve the accuracy and memory usage of your Qdrant deployment. [Learn More](https://qdrant.tech/documentation/guides/optimize/) [![Documents](https://qdrant.tech/icons/outline/documentation-blue.svg)Documents\\ **Distributed Deployment** \\ Scale Qdrant beyond a single node and optimize for high availability, fault tolerance, and billion-scale performance.\\ Read More](https://qdrant.tech/documentation/guides/distributed_deployment/) [![Documents](https://qdrant.tech/icons/outline/documentation-blue.svg)Documents\\ **Multitenancy** \\ Build vector search apps that serve millions of users. Learn about data isolation, security, and performance tuning.\\ Read More](https://qdrant.tech/documentation/guides/multiple-partitions/) [![Blog](https://qdrant.tech/icons/outline/blog-purple.svg)Blog\\ **Vector Quantization** \\ Learn about cutting-edge techniques for vector quantization and how they can be used to improve search performance.\\ Read More](https://qdrant.tech/articles/what-is-vector-quantization/) × [Powered by](https://qdrant.tech/) <|page-61-lllmstxt|> ## cars-recognition - [Articles](https://qdrant.tech/articles/) - Fine Tuning Similar Cars Search [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # Fine Tuning Similar Cars Search Yusuf Sarıgöz · June 28, 2022 ![Fine Tuning Similar Cars Search](https://qdrant.tech/articles_data/cars-recognition/preview/title.jpg) Supervised classification is one of the most widely used training objectives in machine learning, but not every task can be defined as such. For example, 1. Your classes may change quickly —e.g., new classes may be added over time, 2. You may not have samples from every possible category, 3. It may be impossible to enumerate all the possible classes during the training time, 4. You may have an essentially different task, e.g., search or retrieval. All such problems may be efficiently solved with similarity learning. N.B.: If you are new to the similarity learning concept, checkout the [awesome-metric-learning](https://github.com/qdrant/awesome-metric-learning) repo for great resources and use case examples. However, similarity learning comes with its own difficulties such as: 1. Need for larger batch sizes usually, 2. More sophisticated loss functions, 3. Changing architectures between training and inference. Quaterion is a fine tuning framework built to tackle such problems in similarity learning. It uses [PyTorch Lightning](https://www.pytorchlightning.ai/) as a backend, which is advertized with the motto, “spend more time on research, less on engineering.” This is also true for Quaterion, and it includes: 1. Trainable and servable model classes, 2. Annotated built-in loss functions, and a wrapper over [pytorch-metric-learning](https://kevinmusgrave.github.io/pytorch-metric-learning/) when you need even more, 3. 
Sample, dataset and data loader classes to make it easier to work with similarity learning data, 4. A caching mechanism for faster iterations and a smaller memory footprint. ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#a-closer-look-at-quaterion) A closer look at Quaterion Let’s break down some important modules: - `TrainableModel`: A subclass of `pl.LightningModule` that has additional hook methods such as `configure_encoders`, `configure_head`, `configure_metrics` and others to define objects needed for training and evaluation — see below to learn more on these. - `SimilarityModel`: An inference-only export of the model, intended to ease code transfer and reduce dependencies at inference time. In fact, Quaterion is composed of two packages: 1. `quaterion_models`: the package that you need for inference. 2. `quaterion`: the package that defines objects needed for training and also depends on `quaterion_models`. - `Encoder` and `EncoderHead`: Two objects that form a `SimilarityModel`. In most cases, you may use a frozen pretrained encoder, e.g., ResNets from `torchvision`, or language modelling models from `transformers`, with a trainable `EncoderHead` stacked on top of it. `quaterion_models` offers several ready-to-use `EncoderHead` implementations, but you may also create your own by subclassing a parent class or easily listing PyTorch modules in a `SequentialHead`. Quaterion has other objects such as distance functions, evaluation metrics, evaluators, convenient dataset and data loader classes, but these are mostly self-explanatory. Thus, they will not be explained in detail in this article for brevity. However, you can always go check out the [documentation](https://quaterion.qdrant.tech/) to learn more about them. The focus of this tutorial is a step-by-step solution to a similarity learning problem with Quaterion. This will also help us better understand how the above-mentioned objects fit together in a real project. Let’s start walking through some of the important parts of the code. If you are looking for the complete source code instead, you can find it under the [examples](https://github.com/qdrant/quaterion/tree/master/examples/cars) directory in the Quaterion repo. ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#dataset) Dataset In this tutorial, we will use the [Stanford Cars](https://pytorch.org/vision/main/generated/torchvision.datasets.StanfordCars.html) dataset. ![Stanford Cars Dataset](https://storage.googleapis.com/quaterion/docs/class_montage.jpg) Stanford Cars Dataset It has 16,185 images of cars from 196 classes, and it is split into training and testing subsets with an almost 50-50 split. To make things even more interesting, however, we will first merge the training and testing subsets, then split them into two again in such a way that half of the 196 classes go into the training set and the other half into the testing set. This will let us test our model with samples from novel classes that it has never seen in the training phase, which is what supervised classification cannot achieve but similarity learning can. In the following code borrowed from [`data.py`](https://github.com/qdrant/quaterion/blob/master/examples/cars/data.py): - The `get_datasets()` function performs the splitting task described above. - The `get_dataloaders()` function creates `GroupSimilarityDataLoader` instances from the training and testing datasets. - Datasets are regular PyTorch datasets that emit `SimilarityGroupSample` instances.
N.B.: Currently, Quaterion has two data types to represent samples in a dataset. To learn more about `SimilarityPairSample`, check out the [NLP tutorial](https://quaterion.qdrant.tech/tutorials/nlp_tutorial.html) ```python import numpy as np import os import tqdm from torch.utils.data import Dataset, Subset from torchvision import datasets, transforms from typing import Callable from pytorch_lightning import seed_everything from quaterion.dataset import ( GroupSimilarityDataLoader, SimilarityGroupSample, ) --- # set seed to deterministically sample train and test categories later on seed_everything(seed=42) --- # dataset will be downloaded to this directory under local directory dataset_path = os.path.join(".", "torchvision", "datasets") def get_datasets(input_size: int): # Use Mean and std values for the ImageNet dataset as the base model was pretrained on it. # taken from https://www.geeksforgeeks.org/how-to-normalize-images-in-pytorch/ mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] # create train and test transforms transform = transforms.Compose( [\ transforms.Resize((input_size, input_size)),\ transforms.ToTensor(),\ transforms.Normalize(mean, std),\ ] ) # we need to merge train and test splits into a full dataset first, # and then we will split it to two subsets again with each one composed of distinct labels. full_dataset = datasets.StanfordCars( root=dataset_path, split="train", download=True ) + datasets.StanfordCars(root=dataset_path, split="test", download=True) # full_dataset contains examples from 196 categories labeled with an integer from 0 to 195 # randomly sample half of it to be used for training train_categories = np.random.choice(a=196, size=196 // 2, replace=False) # get a list of labels for all samples in the dataset labels_list = np.array([label for _, label in tqdm.tqdm(full_dataset)]) # get a mask for indices where label is included in train_categories labels_mask = np.isin(labels_list, train_categories) # get a list of indices to be used as train samples train_indices = np.argwhere(labels_mask).squeeze() # others will be used as test samples test_indices = np.argwhere(np.logical_not(labels_mask)).squeeze() # now that we have distinct indices for train and test sets, we can use `Subset` to create new datasets # from `full_dataset`, which contain only the samples at given indices. # finally, we apply transformations created above. 
train_dataset = CarsDataset( Subset(full_dataset, train_indices), transform=transform ) test_dataset = CarsDataset( Subset(full_dataset, test_indices), transform=transform ) return train_dataset, test_dataset def get_dataloaders( batch_size: int, input_size: int, shuffle: bool = False, ): train_dataset, test_dataset = get_datasets(input_size) train_dataloader = GroupSimilarityDataLoader( train_dataset, batch_size=batch_size, shuffle=shuffle ) test_dataloader = GroupSimilarityDataLoader( test_dataset, batch_size=batch_size, shuffle=False ) return train_dataloader, test_dataloader class CarsDataset(Dataset): def __init__(self, dataset: Dataset, transform: Callable): self._dataset = dataset self._transform = transform def __len__(self) -> int: return len(self._dataset) def __getitem__(self, index) -> SimilarityGroupSample: image, label = self._dataset[index] image = self._transform(image) return SimilarityGroupSample(obj=image, group=label) ``` ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#trainable-model) Trainable Model Now it’s time to review one of the most exciting building blocks of Quaterion: [TrainableModel](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html#module-quaterion.train.trainable_model). It is the base class for models you would like to configure for training, and it provides several hook methods starting with `configure_` to set up every aspect of the training phase just like [`pl.LightningModule`](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.core.LightningModule.html), its own base class. It is central to fine tuning with Quaterion, so we will break down this essential code in [`models.py`](https://github.com/qdrant/quaterion/blob/master/examples/cars/models.py) and review each method separately. Let’s begin with the imports: ```python import torch import torchvision from quaterion_models.encoders import Encoder from quaterion_models.heads import EncoderHead, SkipConnectionHead from torch import nn from typing import Dict, Union, Optional, List from quaterion import TrainableModel from quaterion.eval.attached_metric import AttachedMetric from quaterion.eval.group import RetrievalRPrecision from quaterion.loss import SimilarityLoss, TripletLoss from quaterion.train.cache import CacheConfig, CacheType from .encoders import CarsEncoder ``` In the following code snippet, we subclass `TrainableModel`. You may use `__init__()` to store some attributes to be used in various `configure_*` methods later on. The more interesting part is, however, in the [`configure_encoders()`](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html#quaterion.train.trainable_model.TrainableModel.configure_encoders) method. We need to return an instance of [`Encoder`](https://quaterion-models.qdrant.tech/quaterion_models.encoders.encoder.html#quaterion_models.encoders.encoder.Encoder) (or a dictionary with `Encoder` instances as values) from this method. In our case, it is an instance of `CarsEncoders`, which we will review soon. Notice now how it is created with a pretrained ResNet152 model whose classification layer is replaced by an identity function. 
```python class Model(TrainableModel): def __init__(self, lr: float, mining: str): self._lr = lr self._mining = mining super().__init__() def configure_encoders(self) -> Union[Encoder, Dict[str, Encoder]]: pre_trained_encoder = torchvision.models.resnet152(pretrained=True) pre_trained_encoder.fc = nn.Identity() return CarsEncoder(pre_trained_encoder) ``` In Quaterion, a [`SimilarityModel`](https://quaterion-models.qdrant.tech/quaterion_models.model.html#quaterion_models.model.SimilarityModel) is composed of one or more `Encoder`s and an [`EncoderHead`](https://quaterion-models.qdrant.tech/quaterion_models.heads.encoder_head.html#quaterion_models.heads.encoder_head.EncoderHead). `quaterion_models` has [several `EncoderHead` implementations](https://quaterion-models.qdrant.tech/quaterion_models.heads.html#module-quaterion_models.heads) with a unified API, such as a configurable dropout value. You may use one of them or create your own subclass of `EncoderHead`. In either case, you need to return an instance of it from [`configure_head`](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html#quaterion.train.trainable_model.TrainableModel.configure_head). In this example, we will use a `SkipConnectionHead`, which is lightweight and more resistant to overfitting. ```python def configure_head(self, input_embedding_size) -> EncoderHead: return SkipConnectionHead(input_embedding_size, dropout=0.1) ``` Quaterion has implementations of [some popular loss functions](https://quaterion.qdrant.tech/quaterion.loss.html) for similarity learning, all of which subclass either [`GroupLoss`](https://quaterion.qdrant.tech/quaterion.loss.group_loss.html#quaterion.loss.group_loss.GroupLoss) or [`PairwiseLoss`](https://quaterion.qdrant.tech/quaterion.loss.pairwise_loss.html#quaterion.loss.pairwise_loss.PairwiseLoss). In this example, we will use [`TripletLoss`](https://quaterion.qdrant.tech/quaterion.loss.triplet_loss.html#quaterion.loss.triplet_loss.TripletLoss), which is a subclass of `GroupLoss`. In general, subclasses of `GroupLoss` are used with datasets in which samples are assigned to some group (or label); in our example, the label is the make of the car. Those datasets should emit `SimilarityGroupSample`. Alternatively, implementations of `PairwiseLoss` consume `SimilarityPairSample` instances - pairs of objects for which similarity is specified individually. To see an example of the latter, check out the [NLP Tutorial](https://quaterion.qdrant.tech/tutorials/nlp_tutorial.html). ```python def configure_loss(self) -> SimilarityLoss: return TripletLoss(mining=self._mining, margin=0.5) ``` `configure_optimizers()` may be familiar to PyTorch Lightning users, but there is a novel `self.model` used inside that method. It is an instance of `SimilarityModel` and is automatically created by Quaterion from the return values of `configure_encoders()` and `configure_head()`. ```python def configure_optimizers(self): optimizer = torch.optim.Adam(self.model.parameters(), self._lr) return optimizer ``` Caching in Quaterion avoids recomputing the outputs of a frozen pretrained `Encoder` in every epoch. When it is configured, outputs are computed once and cached on the preferred device for direct use later on. This provides both a considerable speedup and a smaller memory footprint. However, it is quite versatile and has several knobs to tune.
To get the most out of its potential, it’s recommended that you check out the [cache tutorial](https://quaterion.qdrant.tech/tutorials/cache_tutorial.html). For the sake of making this article self-contained, you need to return a [`CacheConfig`](https://quaterion.qdrant.tech/quaterion.train.cache.cache_config.html#quaterion.train.cache.cache_config.CacheConfig) instance from [`configure_caches()`](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html#quaterion.train.trainable_model.TrainableModel.configure_caches) to specify cache-related preferences such as: - [`CacheType`](https://quaterion.qdrant.tech/quaterion.train.cache.cache_config.html#quaterion.train.cache.cache_config.CacheType), i.e., whether to store caches on CPU or GPU, - `save_dir`, i.e., where to persist caches for subsequent runs, - `batch_size`, i.e., batch size to be used only when creating caches - the batch size to be used during the actual training might be different. ```python def configure_caches(self) -> Optional[CacheConfig]: return CacheConfig( cache_type=CacheType.AUTO, save_dir="./cache_dir", batch_size=32 ) ``` We have just configured the training-related settings of a `TrainableModel`. However, evaluation is an integral part of experimentation in machine learning, and you may configure evaluation metrics by returning one or more [`AttachedMetric`](https://quaterion.qdrant.tech/quaterion.eval.attached_metric.html#quaterion.eval.attached_metric.AttachedMetric) instances from `configure_metrics()`. Quaterion has several built-in [group](https://quaterion.qdrant.tech/quaterion.eval.group.html) and [pairwise](https://quaterion.qdrant.tech/quaterion.eval.pair.html) evaluation metrics. ```python def configure_metrics(self) -> Union[AttachedMetric, List[AttachedMetric]]: return AttachedMetric( "rrp", metric=RetrievalRPrecision(), prog_bar=True, on_epoch=True, on_step=False, ) ``` ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#encoder) Encoder As previously stated, a `SimilarityModel` is composed of one or more `Encoder` s and an `EncoderHead`. Even if we freeze pretrained `Encoder` instances, `EncoderHead` is still trainable and has enough parameters to adapt to the new task at hand. It is recommended that you set the `trainable` property to `False` whenever possible, as it lets you benefit from the caching mechanism described above. Another important property is `embedding_size`, which will be passed to `TrainableModel.configure_head()` as `input_embedding_size` to let you properly initialize the head layer. Let’s see how an `Encoder` is implemented in the following code borrowed from [`encoders.py`](https://github.com/qdrant/quaterion/blob/master/examples/cars/encoders.py): ```python import os import torch import torch.nn as nn from quaterion_models.encoders import Encoder class CarsEncoder(Encoder): def __init__(self, encoder_model: nn.Module): super().__init__() self._encoder = encoder_model self._embedding_size = 2048 # last dimension from the ResNet model @property def trainable(self) -> bool: return False @property def embedding_size(self) -> int: return self._embedding_size ``` An `Encoder` is a regular `torch.nn.Module` subclass, and we need to implement the forward pass logic in the `forward` method. 
Depending on how you create your submodules, this method may be more complex; however, we simply pass the input through a pretrained ResNet152 backbone in this example: ```python def forward(self, images): embeddings = self._encoder.forward(images) return embeddings ``` An important step of machine learning development is proper saving and loading of models. Quaterion lets you save your `SimilarityModel` with [`TrainableModel.save_servable()`](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html#quaterion.train.trainable_model.TrainableModel.save_servable) and restore it with [`SimilarityModel.load()`](https://quaterion-models.qdrant.tech/quaterion_models.model.html#quaterion_models.model.SimilarityModel.load). To be able to use these two methods, you need to implement `save()` and `load()` methods in your `Encoder`. Additionally, it is also important that you define your subclass of `Encoder` outside the `__main__` namespace, i.e., in a separate file from your main entry point. It may not be restored properly otherwise. ```python def save(self, output_path: str): os.makedirs(output_path, exist_ok=True) torch.save(self._encoder, os.path.join(output_path, "encoder.pth")) @classmethod def load(cls, input_path): encoder_model = torch.load(os.path.join(input_path, "encoder.pth")) return CarsEncoder(encoder_model) ``` ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#training) Training With all essential objects implemented, it is easy to bring them all together and run a training loop with the [`Quaterion.fit()`](https://quaterion.qdrant.tech/quaterion.main.html#quaterion.main.Quaterion.fit) method. It expects: - A `TrainableModel`, - A [`pl.Trainer`](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html), - A [`SimilarityDataLoader`](https://quaterion.qdrant.tech/quaterion.dataset.similarity_data_loader.html#quaterion.dataset.similarity_data_loader.SimilarityDataLoader) for training data, - And optionally, another `SimilarityDataLoader` for evaluation data. We need to import a few objects to prepare all of these: ```python import os import pytorch_lightning as pl import torch from pytorch_lightning.callbacks import EarlyStopping, ModelSummary from quaterion import Quaterion from .data import get_dataloaders from .models import Model ``` The `train()` function in the following code snippet expects several hyperparameter values as arguments. They can be defined in a `config.py` or passed from the command line. However, that part of the code is omitted for brevity. Instead let’s focus on how all the building blocks are initialized and passed to `Quaterion.fit()`, which is responsible for running the whole loop. 
When the training loop is complete, you can simply call `TrainableModel.save_servable()` to save the current state of the `SimilarityModel` instance: ```python def train( lr: float, mining: str, batch_size: int, epochs: int, input_size: int, shuffle: bool, save_dir: str, ): model = Model( lr=lr, mining=mining, ) train_dataloader, val_dataloader = get_dataloaders( batch_size=batch_size, input_size=input_size, shuffle=shuffle ) early_stopping = EarlyStopping( monitor="validation_loss", patience=50, ) trainer = pl.Trainer( gpus=1 if torch.cuda.is_available() else 0, max_epochs=epochs, callbacks=[early_stopping, ModelSummary(max_depth=3)], enable_checkpointing=False, log_every_n_steps=1, ) Quaterion.fit( trainable_model=model, trainer=trainer, train_dataloader=train_dataloader, val_dataloader=val_dataloader, ) model.save_servable(save_dir) ``` ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#evaluation) Evaluation Let’s see what we have achieved with these simple steps. [`evaluate.py`](https://github.com/qdrant/quaterion/blob/master/examples/cars/evaluate.py) has two functions to evaluate both the baseline model and the tuned similarity model. We will review only the latter for brevity. In addition to the ease of restoring a `SimilarityModel`, this code snippet also shows how to use [`Evaluator`](https://quaterion.qdrant.tech/quaterion.eval.evaluator.html#quaterion.eval.evaluator.Evaluator) to evaluate the performance of a `SimilarityModel` on a given dataset with given evaluation metrics. ![Comparison of original and tuned models for retrieval](https://storage.googleapis.com/quaterion/docs/original_vs_tuned_cars.png) Comparison of original and tuned models for retrieval The cost of a full evaluation usually grows rapidly with the dataset size, and thus you may want to perform a partial evaluation on a sampled subset. In this case, you may use [samplers](https://quaterion.qdrant.tech/quaterion.eval.samplers.html) to limit the evaluation. Similar to `Quaterion.fit()` used for training, [`Quaterion.evaluate()`](https://quaterion.qdrant.tech/quaterion.main.html#quaterion.main.Quaterion.evaluate) runs a complete evaluation loop. It takes the following as arguments: - An `Evaluator` instance created with given evaluation metrics and a `Sampler`, - The `SimilarityModel` to be evaluated, - And the evaluation dataset. ```python def eval_tuned_encoder(dataset, device): print("Evaluating tuned encoder...") tuned_cars_model = SimilarityModel.load( os.path.join(os.path.dirname(__file__), "cars_encoders") ).to(device) tuned_cars_model.eval() result = Quaterion.evaluate( evaluator=Evaluator( metrics=RetrievalRPrecision(), sampler=GroupSampler(sample_size=1000, device=device, log_progress=True), ), model=tuned_cars_model, dataset=dataset, ) print(result) ``` ## [Anchor](https://qdrant.tech/articles/cars-recognition/\#conclusion) Conclusion In this tutorial, we trained a similarity model to search for similar cars from novel categories unseen in the training phase. Then, we evaluated it on a test dataset with the Retrieval R-Precision metric. The base model scored 0.1207, and our tuned model hit 0.2540 - more than twice the baseline score. These scores can be seen in the following figure: ![Metrics for the base and tuned models](https://qdrant.tech/articles_data/cars-recognition/cars_metrics.png) Metrics for the base and tuned models
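For readers unfamiliar with the metric: roughly speaking, Retrieval R-Precision measures, for each query, the fraction of relevant items among its top-R nearest neighbors, where R is the number of items sharing the query's group. The sketch below only illustrates that idea with hypothetical `candidates` and `labels` arrays; it is not Quaterion's built-in implementation.

```python
import numpy as np

def retrieval_r_precision(query_vec, query_label, candidates, labels):
    # cosine similarity between the query and every candidate embedding
    # (the query itself is assumed not to be among the candidates)
    sims = candidates @ query_vec / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    # R = number of relevant candidates, i.e. those sharing the query's group
    r = int((labels == query_label).sum())
    top_r = np.argsort(-sims)[:r]
    # fraction of the top-R results that actually share the query's label
    return float((labels[top_r] == query_label).mean())
```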
<|page-62-lllmstxt|> ## rag-chatbot-scaleway - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Blog-Reading Chatbot with GPT-4o --- # [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#blog-reading-chatbot-with-gpt-4o) Blog-Reading Chatbot with GPT-4o | Time: 90 min | Level: Advanced | [GitHub](https://github.com/qdrant/examples/blob/langchain-lcel-rag/langchain-lcel-rag/Langchain-LCEL-RAG-Demo.ipynb) | | | --- | --- | --- | --- | In this tutorial, you will build a RAG system that combines blog content ingestion with the capabilities of semantic search. **OpenAI’s GPT-4o LLM** is powerful, but scaling its use requires us to supply context systematically. RAG enhances the LLM’s generation of answers by retrieving relevant documents to aid the question-answering process. This setup showcases the integration of advanced search and AI language processing to improve information retrieval and generation tasks. A notebook for this tutorial is available on [GitHub](https://github.com/qdrant/examples/blob/langchain-lcel-rag/langchain-lcel-rag/Langchain-LCEL-RAG-Demo.ipynb). **Data Privacy and Sovereignty:** RAG applications often rely on sensitive or proprietary internal data. Running the entire stack within your own environment becomes crucial for maintaining control over this data. Qdrant Hybrid Cloud deployed on [Scaleway](https://www.scaleway.com/) addresses this need perfectly, offering a secure, scalable platform that still leverages the full potential of RAG. Scaleway offers serverless [Functions](https://www.scaleway.com/en/serverless-functions/) and serverless [Jobs](https://www.scaleway.com/en/serverless-jobs/), both of which are ideal for embedding creation in large-scale RAG cases. ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#components) Components - **Cloud Host:** [Scaleway on managed Kubernetes](https://www.scaleway.com/en/kubernetes-kapsule/) for compatibility with Qdrant Hybrid Cloud. - **Vector Database:** Qdrant Hybrid Cloud as the vector search engine for retrieval. - **LLM:** GPT-4o, developed by OpenAI, is used as the generator for producing answers. - **Framework:** [LangChain](https://www.langchain.com/) for extensive RAG capabilities. ![Architecture diagram](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/architecture-diagram.png) > Langchain [supports a wide range of LLMs](https://python.langchain.com/docs/integrations/chat/), and GPT-4o is used as the main generator in this tutorial. You can easily swap it out for your preferred model that might be launched on your premises to complete the fully private setup. For the sake of simplicity, we used the OpenAI APIs, but LangChain makes the transition seamless.
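For example, here is a minimal sketch of such a swap, assuming a self-hosted model served through an Ollama endpoint and the `langchain-community` integration (the model name and URL below are illustrative, and the package layout may differ between LangChain versions):

```python
# Hypothetical replacement for the ChatOpenAI generator used later in this tutorial.
# Assumes an Ollama server is reachable at base_url and the chosen model is pulled.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(
    model="llama3",                     # any chat model available on your server
    base_url="http://localhost:11434",  # your on-premises Ollama endpoint
)
```

Because LangChain chat models share a common interface, the rest of the chain built below stays unchanged.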
## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#deploying-qdrant-hybrid-cloud-on-scaleway) Deploying Qdrant Hybrid Cloud on Scaleway [Scaleway Kapsule](https://www.scaleway.com/en/kubernetes-kapsule/) and [Kosmos](https://www.scaleway.com/en/kubernetes-kosmos/) are managed Kubernetes services from [Scaleway](https://www.scaleway.com/en/). They abstract away the complexities of managing and operating a Kubernetes cluster. The primary difference being, Kapsule clusters are composed solely of Scaleway Instances. Whereas, a Kosmos cluster is a managed multi-cloud Kubernetes engine that allows you to connect instances from any cloud provider to a single managed Control-Plane. 1. To start using managed Kubernetes on Scaleway, follow the [platform-specific documentation](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#scaleway). 2. Once your Kubernetes clusters are up, [you can begin deploying Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/). ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#prerequisites) Prerequisites To prepare the environment for working with Qdrant and related libraries, it’s necessary to install all required Python packages. This can be done using Poetry, a tool for dependency management and packaging in Python. The code snippet imports various libraries essential for the tasks ahead, including `bs4` for parsing HTML and XML documents, `langchain` and its community extensions for working with language models and document loaders, and `Qdrant` for vector storage and retrieval. These imports lay the groundwork for utilizing Qdrant alongside other tools for natural language processing and machine learning tasks. Qdrant will be running on a specific URL and access will be restricted by the API key. Make sure to store them both as environment variables as well: ```shell export QDRANT_URL="https://qdrant.example.com" export QDRANT_API_KEY="your-api-key" ``` _Optional:_ Whenever you use LangChain, you can also [configure LangSmith](https://docs.smith.langchain.com/), which will help us trace, monitor and debug LangChain applications. You can sign up for LangSmith [here](https://smith.langchain.com/). ```shell export LANGCHAIN_TRACING_V2=true export LANGCHAIN_API_KEY="your-api-key" export LANGCHAIN_PROJECT="your-project" # if not specified, defaults to "default" ``` Now you can get started: ```python import getpass import os import bs4 from langchain import hub from langchain_community.document_loaders import WebBaseLoader from langchain_qdrant import Qdrant from langchain_core.output_parsers import StrOutputParser from langchain_core.runnables import RunnablePassthrough from langchain_openai import ChatOpenAI, OpenAIEmbeddings from langchain_text_splitters import RecursiveCharacterTextSplitter ``` Set up the OpenAI API key: ```python os.environ["OPENAI_API_KEY"] = getpass.getpass() ``` Initialize the language model: ```python llm = ChatOpenAI(model="gpt-4o") ``` It is here that we configure both the Embeddings and LLM. You can replace this with your own models using Ollama or other services. Scaleway has some great [L4 GPU Instances](https://www.scaleway.com/en/l4-gpu-instance/) you can use for compute here. ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#download-and-parse-data) Download and parse data To begin working with blog post contents, the process involves loading and parsing the HTML content. 
This is achieved using LangChain’s `WebBaseLoader` together with `BeautifulSoup`, which are tools designed for such tasks. After the content is loaded and parsed, it is indexed using Qdrant, a powerful tool for managing and querying vector data. The code snippet demonstrates how to load, chunk, and index the contents of a blog post by specifying the URL of the blog and the specific HTML elements to parse. This step is crucial for preparing the data for further processing and analysis with Qdrant. ```python # Load, chunk and index the contents of the blog. loader = WebBaseLoader( web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",), bs_kwargs=dict( parse_only=bs4.SoupStrainer( class_=("post-content", "post-title", "post-header") ) ), ) docs = loader.load() ``` ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#chunking-data) Chunking data When dealing with large documents, such as a blog post exceeding 42,000 characters, it’s crucial to manage the data efficiently for processing. Many models have a limited context window and struggle with long inputs, making it difficult to extract or find relevant information. To overcome this, the document is divided into smaller chunks. This approach enhances the model’s ability to process and retrieve the most pertinent sections of the document effectively. In this scenario, the document is split into chunks using the `RecursiveCharacterTextSplitter` with a specified chunk size and overlap. This method ensures that no critical information is lost between chunks. Following the splitting, these chunks are then indexed into Qdrant—a vector database for efficient similarity search and storage of embeddings. The `Qdrant.from_documents` function is utilized for indexing, with documents being the split chunks and embeddings generated through `OpenAIEmbeddings`. The chunks are stored in the Qdrant instance configured earlier through the `QDRANT_URL` and `QDRANT_API_KEY` environment variables, and the collection is named “lilianweng” for reference. This chunking and indexing strategy significantly improves the management and retrieval of information from large documents, making it a practical solution for handling extensive texts in data processing workflows. ```python text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200) splits = text_splitter.split_documents(docs) vectorstore = Qdrant.from_documents( documents=splits, embedding=OpenAIEmbeddings(), collection_name="lilianweng", url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"], ) ``` ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#retrieve-and-generate-content) Retrieve and generate content The `vectorstore` is used as a retriever to fetch relevant documents based on vector similarity. The `hub.pull("rlm/rag-prompt")` function is used to pull a specific prompt from a repository, which is designed to work with retrieved documents and a question to generate a response. The `format_docs` function formats the retrieved documents into a single string, preparing them for further processing. This formatted string, along with a question, is passed through a chain of operations. Firstly, the context (formatted documents) and the question are processed by the retriever and the prompt. Then, the result is fed into a large language model (`llm`) for content generation.
Finally, the output is parsed into a string format using `StrOutputParser()`. This chain of operations demonstrates a sophisticated approach to information retrieval and content generation, leveraging both the semantic understanding capabilities of vector search and the generative prowess of large language models. Now, retrieve and generate data using relevant snippets from the blog: ```python retriever = vectorstore.as_retriever() prompt = hub.pull("rlm/rag-prompt") def format_docs(docs): return "\n\n".join(doc.page_content for doc in docs) rag_chain = ( {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() ) ``` ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#invoking-the-rag-chain) Invoking the RAG Chain ```python rag_chain.invoke("What is Task Decomposition?") ``` ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/\#next-steps) Next steps We built a solid foundation for a simple chatbot, but there is still a lot to do. If you want to make the system production-ready, you should consider implementing the mechanism into your existing stack. Our vector database can easily be hosted on [Scaleway](https://www.scaleway.com/), our trusted [Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/) partner. This means that Qdrant can be run from your Scaleway region, but the database itself can still be managed from within Qdrant Cloud’s interface. Both products have been tested for compatibility and scalability, and we recommend their [managed Kubernetes](https://www.scaleway.com/en/kubernetes-kapsule/) service. Their French deployment regions are excellent for network latency and data sovereignty. For hosted GPUs, try [rendering with L4 GPU instances](https://www.scaleway.com/en/l4-gpu-instance/). If you have any questions, feel free to ask on our [Discord community](https://qdrant.to/discord). <|page-63-lllmstxt|> ## operator-configuration - [Documentation](https://qdrant.tech/documentation/) - [Hybrid cloud](https://qdrant.tech/documentation/hybrid-cloud/) - Configure the Qdrant Operator --- # [Anchor](https://qdrant.tech/documentation/hybrid-cloud/operator-configuration/\#configuring-qdrant-operator-advanced-options) Configuring Qdrant Operator: Advanced Options The Qdrant Operator has several configuration options, which can be set in the advanced section of your Hybrid Cloud Environment.
The following YAML shows all configuration options with their default values: ```yaml --- # Configuration for the Qdrant operator service monitor to scrape metrics serviceMonitor: enabled: false --- # Resource requests and limits for the Qdrant operator resources: {} --- # Node selector for the Qdrant operator nodeSelector: {} --- # Tolerations for the Qdrant operator tolerations: [] --- # Affinity configuration for the Qdrant operator affinity: {} --- # Configuration for the Qdrant operator (v2) settings: # The log level for the operator # Available options: DEBUG | INFO | WARN | ERROR logLevel: INFO # Controller related settings controller: # The period a forced recync is done by the controller (if watches are missed / nothing happened) forceResyncPeriod: 10h # QPS indicates the maximum QPS to the master from this client. # Default is 200 qps: 200 # Maximum burst for throttle. # Default is 500. burst: 500 # Features contains the settings for enabling / disabling the individual features of the operator features: # ClusterManagement contains the settings for qdrant (database) cluster management clusterManagement: # Whether or not the Qdrant cluster features are enabled. # If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected. # Default is true. enable: true # The StorageClass used to make database and snapshot PVCs. # Default is nil, meaning the default storage class of Kubernetes. storageClass: # The StorageClass used to make database PVCs. # Default is nil, meaning the default storage class of Kubernetes. #database: # The StorageClass used to make snapshot PVCs. # Default is nil, meaning the default storage class of Kubernetes. #snapshot: # Qdrant config contains settings specific for the database qdrant: # The config where to find the image for qdrant image: # The repository where to find the image for qdrant # Default is "qdrant/qdrant" repository: qdrant/qdrant # Docker image pull policy # Default "IfNotPresent", unless the tag is dev, master or latest. Then "Always" #pullPolicy: # Docker image pull secret name # This secret should be available in the namespace where the cluster is running # Default not set #pullSecretName: # storage contains the settings for the storage of the Qdrant cluster storage: performance: # CPU budget, how many CPUs (threads) to allocate for an optimization job. # If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size # If negative - subtract this number of CPUs from the available CPUs. # If positive - use this exact number of CPUs. optimizerCpuBudget: 0 # Enable async scorer which uses io_uring when rescoring. # Only supported on Linux, must be enabled in your kernel. 
# See: asyncScorer: false # Qdrant DB log level # Available options: DEBUG | INFO | WARN | ERROR # Default is "INFO" logLevel: INFO # Default Qdrant security context configuration securityContext: # Enable default security context # Default is false enabled: false # Default user for qdrant container # Default not set #user: 1000 # Default fsGroup for qdrant container # Default not set #fsUser: 2000 # Default group for qdrant container # Default not set #group: 3000 # Network policies configuration for the Qdrant databases networkPolicies: ingress: - ports: - protocol: TCP port: 6333 - protocol: TCP port: 6334 # Allow DNS resolution from qdrant pods at Kubernetes internal DNS server egress: - ports: - protocol: UDP port: 53 # Scheduling config contains the settings specific for scheduling scheduling: # Default topology spread constraints (list from type corev1.TopologySpreadConstraint) topologySpreadConstraints: - maxSkew: 1 topologyKey: "kubernetes.io/hostname" whenUnsatisfiable: "ScheduleAnyway" # Default pod disruption budget (object from type policyv1.PodDisruptionBudgetSpec) podDisruptionBudget: maxUnavailable: 1 # ClusterManager config contains the settings specific for cluster manager clusterManager: # Whether or not the cluster manager (on operator level). # If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected. # Default is false. enable: true # The endpoint address the cluster manager could be reached # If set, this should be a full URL like: http://cluster-manager.qdrant-cloud-ns.svc.cluster.local:7333 endpointAddress: http://qdrant-cluster-manager:80 # InvocationInterval is the interval between calls (started after the previous call is retured) # Default is 10 seconds invocationInterval: 10s # Timeout is the duration a single call to the cluster manager is allowed to take. # Default is 30 seconds timeout: 30s # Specifies overrides for the manage rules manageRulesOverrides: #dry_run: #max_transfers: #max_transfers_per_collection: #rebalance: #replicate: # Ingress config contains the settings specific for ingress ingress: # Whether or not the Ingress feature is enabled. # Default is true. enable: false # Which specific ingress provider should be used # Default is KubernetesIngress provider: KubernetesIngress # The specific settings when the Provider is QdrantCloudTraefik qdrantCloudTraefik: # Enable tls # Default is false tls: false # Secret with TLS certificate # Default is None secretName: "" # List of Traefik middlewares to apply # Default is an empty list middlewares: [] # IP Allowlist Strategy for Traefik # Default is None ipAllowlistStrategy: # Enable body validator plugin and matching ingressroute rules # Default is false enableBodyValidatorPlugin: false # The specific settings when the Provider is KubernetesIngress kubernetesIngress: # Name of the ingress class # Default is None #ingressClassName: # TelemetryTimeout is the duration a single call to the cluster telemetry endpoint is allowed to take. # Default is 3 seconds telemetryTimeout: 3s # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 20. maxConcurrentReconciles: 20 # VolumeExpansionMode specifies the expansion mode, which can be online or offline (e.g. in case of Azure). # Available options: Online, Offline # Default is Online volumeExpansionMode: Online # BackupManagementConfig contains the settings for backup management backupManagement: # Whether or not the backup features are enabled. 
# If disabled, all other properties in this struct are disregarded. Otherwise, the individual features will be inspected. # Default is true. enable: true # Snapshots contains the settings for snapshots as part of backup management. snapshots: # Whether or not the Snapshot feature is enabled. # Default is true. enable: true # The VolumeSnapshotClass used to make VolumeSnapshots. # Default is "csi-snapclass". volumeSnapshotClass: "csi-snapclass" # The duration a snapshot is retained when the phase becomes Failed or Skipped # Default is 72h (3d). retainUnsuccessful: 72h # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1. maxConcurrentReconciles: 1 # ScheduledSnapshots contains the settings for scheduled snapshot as part of backup management. scheduledSnapshots: # Whether or not the ScheduledSnapshot feature is enabled. # Default is true. enable: true # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1. maxConcurrentReconciles: 1 # Restores contains the settings for restoring (a snapshot) as part of backup management. restores: # Whether or not the Restore feature is enabled. # Default is true. enable: true # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1. maxConcurrentReconciles: 1 ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/hybrid-cloud/operator-configuration.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/hybrid-cloud/operator-configuration.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-64-lllmstxt|> ## fastembed-colbert - [Documentation](https://qdrant.tech/documentation/) - [Fastembed](https://qdrant.tech/documentation/fastembed/) - Working with ColBERT --- # [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#how-to-generate-colbert-multivectors-with-fastembed) How to Generate ColBERT Multivectors with FastEmbed ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#colbert) ColBERT ColBERT is an embedding model that produces a matrix (multivector) representation of input text, generating one vector per token (a token being a meaningful text unit for a machine learning model). This approach allows ColBERT to capture more nuanced input semantics than many dense embedding models, which represent an entire input with a single vector. By producing more granular input representations, ColBERT becomes a strong retriever. However, this advantage comes at the cost of increased resource consumption compared to traditional dense embedding models, both in terms of speed and memory. Despite ColBERT being a powerful retriever, its speed limitation might make it less suitable for large-scale retrieval. Therefore, we generally recommend using ColBERT for reranking a small set of already retrieved examples, rather than for first-stage retrieval. 
A simple dense retriever can initially retrieve around 100-500 candidates, which can then be reranked with ColBERT to bring the most relevant results to the top. ColBERT is a viable alternative to [cross-encoders](https://sbert.net/examples/applications/cross-encoder/README.html) as a reranking model, since it tends to be faster at inference time due to its `late interaction` mechanism. How does `late interaction` work? Cross-encoders ingest a query and a document glued together as one input. A cross-encoder model divides this input into meaningful (for the model) parts and checks how these parts relate. So, all interactions between the query and the document happen “early” inside the model. Late interaction models, such as ColBERT, only do the first part, generating document and query parts suitable for comparison. All interactions between these parts are expected to be done “later”, outside the model. ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#using-colbert-in-qdrant) Using ColBERT in Qdrant Qdrant supports [multivector representations](https://qdrant.tech/documentation/concepts/vectors/#multivectors) out of the box, so you can use any late interaction model, such as `ColBERT` or `ColPali`, in Qdrant without any additional pre/post-processing. This tutorial uses ColBERT as a first-stage retriever on a toy dataset. You can see how to use ColBERT as a reranker in our [multi-stage queries documentation](https://qdrant.tech/documentation/concepts/hybrid-queries/#multi-stage-queries). ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#setup) Setup Install `fastembed`. ```bash pip install fastembed ``` Import the late interaction text embedding class. ```python from fastembed import LateInteractionTextEmbedding ``` You can list which late interaction models are supported in FastEmbed. ```python LateInteractionTextEmbedding.list_supported_models() ``` This command displays the available models. The output shows details about each model, including output embedding dimensions, model description, model size, model sources, and model file. ```python [{'model': 'colbert-ir/colbertv2.0',\ 'dim': 128,\ 'description': 'Late interaction model',\ 'size_in_GB': 0.44,\ 'sources': {'hf': 'colbert-ir/colbertv2.0'},\ 'model_file': 'model.onnx'},\ {'model': 'answerdotai/answerai-colbert-small-v1',\ 'dim': 96,\ 'description': 'Text embeddings, Unimodal (text), Multilingual (~100 languages), 512 input tokens truncation, 2024 year',\ 'size_in_GB': 0.13,\ 'sources': {'hf': 'answerdotai/answerai-colbert-small-v1'},\ 'model_file': 'vespa_colbert.onnx'}] ``` Now, load the model. ```python model_name = "colbert-ir/colbertv2.0" embedding_model = LateInteractionTextEmbedding(model_name) ``` The model files will be fetched and downloaded, with progress showing. ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#embed-data) Embed data We will vectorize a toy movie description dataset with ColBERT: Movie description dataset ```python descriptions = ["In 1431, Jeanne d'Arc is placed on trial on charges of heresy.
The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions.",\ "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch.",\ "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist.",\ "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place.",\ "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past.",\ "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre.",\ "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. Only a young female therapist, Paprika, can stop it.",\ "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop.",\ "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline.",\ "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent.",\ "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995).",\ "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers.",\ "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home.",\ "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies.",\ "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in.",\ "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household.",\ "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops.",\ "Story of 40-man Turkish task force who must defend a relay station.",\ "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour.",\ "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."] ``` The vectorization is done with an `embed` generator function. ```python descriptions_embeddings = list( embedding_model.embed(descriptions) ) ``` Let’s check the size of one of the produced embeddings. ```python descriptions_embeddings[0].shape ``` We get the following result ```bash (48, 128) ``` That means that for the first description, we have **48** vectors of lengths **128** representing it. 
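To build intuition for how such a 48x128 matrix is compared against a query, the following sketch (not Qdrant's internal code) shows ColBERT-style MaxSim scoring with plain NumPy: every query token vector is matched to its most similar document token vector, and those maxima are summed into one relevance score. This is the same idea as the `MAX_SIM` multivector comparator configured in the next step; it assumes the token vectors are normalized, so dot products behave like cosine similarities.

```python
import numpy as np

def maxsim_score(query_matrix: np.ndarray, doc_matrix: np.ndarray) -> float:
    # (num_query_tokens, num_doc_tokens) matrix of token-to-token similarities
    token_similarities = query_matrix @ doc_matrix.T
    # for each query token keep its best-matching document token,
    # then sum the maxima into a single query-document relevance score
    return float(token_similarities.max(axis=1).sum())

# e.g. scoring a query multivector against the first description's 48x128 matrix:
# query_matrix = list(embedding_model.query_embed("heresy trial"))[0]
# print(maxsim_score(query_matrix, descriptions_embeddings[0]))
```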
## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#upload-embeddings-to-qdrant) Upload embeddings to Qdrant Install `qdrant-client` ```python pip install "qdrant-client>=1.14.2" ``` Qdrant Client has a simple in-memory mode that allows you to experiment locally on small data volumes. Alternatively, you could use for experiments [a free cluster](https://qdrant.tech/documentation/cloud/create-cluster/#create-a-cluster) in Qdrant Cloud. ```python from qdrant_client import QdrantClient, models qdrant_client = QdrantClient(":memory:") # Qdrant is running from RAM. ``` Now, let’s create a small [collection](https://qdrant.tech/documentation/concepts/collections/) with our movie data. For that, we will use the [multivectors](https://qdrant.tech/documentation/concepts/vectors/#multivectors) functionality supported in Qdrant. To configure multivector collection, we need to specify: - similarity metric between vectors; - the size of each vector (for ColBERT, it’s **128**); - similarity metric between multivectors (matrices), for example, `maximum`, so for vector from matrix A, we find the most similar vector from matrix B, and their similarity score will be out matrix similarity. ```python qdrant_client.create_collection( collection_name="movies", vectors_config=models.VectorParams( size=128, #size of each vector produced by ColBERT distance=models.Distance.COSINE, #similarity metric between each vector multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM #similarity metric between multivectors (matrices) ), ), ) ``` To make this collection human-readable, let’s save movie metadata (name, description in text form and movie’s length) together with an embedded description. Movie metadata ```python metadata = [{"movie_name": "The Passion of Joan of Arc", "movie_watch_time_min": 114, "movie_description": "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions."},\ {"movie_name": "Sherlock Jr.", "movie_watch_time_min": 45, "movie_description": "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch."},\ {"movie_name": "Heat", "movie_watch_time_min": 170, "movie_description": "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist."},\ {"movie_name": "Kagemusha", "movie_watch_time_min": 162, "movie_description": "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place."},\ {"movie_name": "Kubo and the Two Strings", "movie_watch_time_min": 101, "movie_description": "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past."},\ {"movie_name": "Sardar Udham", "movie_watch_time_min": 164, "movie_description": "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre."},\ {"movie_name": "Paprika", "movie_watch_time_min": 90, "movie_description": "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. 
Only a young female therapist, Paprika, can stop it."},\ {"movie_name": "After Hours", "movie_watch_time_min": 97, "movie_description": "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop."},\ {"movie_name": "Udta Punjab", "movie_watch_time_min": 148, "movie_description": "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline."},\ {"movie_name": "Philomena", "movie_watch_time_min": 98, "movie_description": "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent."},\ {"movie_name": "Neon Genesis Evangelion: The End of Evangelion", "movie_watch_time_min": 87, "movie_description": "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995)."},\ {"movie_name": "The Dirty Dozen", "movie_watch_time_min": 150, "movie_description": "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers."},\ {"movie_name": "Toy Story 3", "movie_watch_time_min": 103, "movie_description": "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home."},\ {"movie_name": "Edge of Tomorrow", "movie_watch_time_min": 113, "movie_description": "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies."},\ {"movie_name": "Some Like It Hot", "movie_watch_time_min": 121, "movie_description": "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in."},\ {"movie_name": "Snow White and the Seven Dwarfs", "movie_watch_time_min": 83, "movie_description": "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household."},\ {"movie_name": "It Happened One Night", "movie_watch_time_min": 105, "movie_description": "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops."},\ {"movie_name": "Nefes: Vatan Sagolsun", "movie_watch_time_min": 128, "movie_description": "Story of 40-man Turkish task force who must defend a relay station."},\ {"movie_name": "This Is Spinal Tap", "movie_watch_time_min": 82, "movie_description": "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour."},\ {"movie_name": "Let the Right One In", "movie_watch_time_min": 114, "movie_description": "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."}] ``` ```python qdrant_client.upload_points( collection_name="movies", points=[\ models.PointStruct(\ id=idx,\ payload=metadata[idx],\ vector=vector\ )\ for idx, vector in enumerate(descriptions_embeddings)\ ], ) ``` Upload with implicit embeddings computation ```python description_documents = [models.Document(text=description, model=model_name) for description in descriptions] qdrant_client.upload_points( 
    collection_name="movies",
    points=[
        models.PointStruct(
            id=idx,
            payload=metadata[idx],
            vector=description_document,
        )
        for idx, description_document in enumerate(description_documents)
    ],
)
```

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-colbert/\#querying) Querying

ColBERT uses two distinct methods for embedding documents and queries, and so does FastEmbed. However, we altered the query pre-processing used in ColBERT, so queries longer than 32 tokens are not truncated but are ingested directly.

```python
qdrant_client.query_points(
    collection_name="movies",
    query=list(embedding_model.query_embed("A movie for kids with fantasy elements and wonders"))[0],  # converting the generator object into a numpy.ndarray
    limit=1,  # how many of the movies closest to the query we would like to get
    # with_vectors=True,  # if this option is used, vectors will also be returned
    with_payload=True,  # so metadata is provided in the output
)
```

Query points with implicit embeddings computation

```python
query_document = models.Document(text="A movie for kids with fantasy elements and wonders", model=model_name)

qdrant_client.query_points(
    collection_name="movies",
    query=query_document,
    limit=1,
)
```

The result is the following:

```bash
QueryResponse(points=[ScoredPoint(id=4, version=0, score=12.063469, payload={'movie_name': 'Kubo and the Two Strings', 'movie_watch_time_min': 101, 'movie_description': 'A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past.'}, vector=None, shard_key=None, order_value=None)])
```

<|page-65-lllmstxt|> ## cross-encoder-integration-gsoc

- [Articles](https://qdrant.tech/articles/) - Qdrant Summer of Code 2024 - ONNX Cross Encoders in Python [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/)

---

# Qdrant Summer of Code 2024 - ONNX Cross Encoders in Python

Huong (Celine) Hoang · October 14, 2024

![Qdrant Summer of Code 2024 - ONNX Cross Encoders in Python](https://qdrant.tech/articles_data/cross-encoder-integration-gsoc/preview/title.jpg)

## [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#introduction) Introduction

Hi everyone! I’m Huong (Celine) Hoang, and I’m thrilled to share my experience working at Qdrant this summer as part of their Summer of Code 2024 program. During my internship, I worked on integrating cross-encoders into the FastEmbed library for re-ranking tasks. This enhancement widened the capabilities of the Qdrant ecosystem, enabling developers to build more context-aware search applications, such as question-answering systems, using Qdrant’s suite of libraries.
This project was both technically challenging and rewarding, pushing me to grow my skills in handling large-scale ONNX (Open Neural Network Exchange) model integrations, tokenization, and more. Let me take you through the journey, the lessons learned, and where things are headed next. ## [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#project-overview) Project Overview Qdrant is well known for its vector search capabilities, but my task was to go one step further — introducing cross-encoders for re-ranking. Traditionally, the FastEmbed library would generate embeddings, but cross-encoders don’t do that. Instead, they provide a list of scores based on how well a query matches a list of documents. This kind of re-ranking is critical when you want to refine search results and bring the most relevant answers to the top. The project revolved around creating a new input-output scheme: text data to scores. For this, I designed a family of classes to support ONNX models. Some of the key models I worked with included Xenova/ms-marco-MiniLM-L-6-v2, Xenova/ms-marco-MiniLM-L-12-v2, and BAAI/bge-reranker, all designed for re-ranking tasks. An important point to mention is that FastEmbed is a minimalistic library: it doesn’t have heavy dependencies like PyTorch or TensorFlow, and as a result, it is lightweight, occupying far less storage space. Below is a diagram that represents the overall workflow for this project, detailing the key steps from user interaction to the final output validation: ![Search workflow with reranking](https://qdrant.tech/articles_data/cross-encoder-integration-gsoc/rerank-workflow.png) Search workflow with reranking ## [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#technical-challenges) Technical Challenges ### [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#1-building-a-new-input-output-scheme) 1\. Building a New Input-Output Scheme FastEmbed already had support for embeddings, but re-ranking with cross-encoders meant building a completely new family of classes. These models accept a query and a set of documents, then return a list of relevance scores. For that, I created the base classes like `TextCrossEncoderBase` and `OnnxCrossEncoder`, taking inspiration from existing text embedding models. One thing I had to ensure was that the new class hierarchy was user-friendly. Users should be able to work with cross-encoders without needing to know the complexities of the underlying models. For instance, they should be able to just write: ```python from fastembed.rerank.cross_encoder import TextCrossEncoder encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2") scores = encoder.rerank(query, documents) ``` Meanwhile, behind the scenes, we manage all the model loading, tokenization, and scoring. ### [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#2-handling-tokenization-for-cross-encoders) 2\. Handling Tokenization for Cross-Encoders Cross-encoders require careful tokenization because they need to distinguish between the query and the documents. This is done using token type IDs, which help the model differentiate between the two. To implement this, I configured the tokenizer to handle pairs of inputs—concatenating the query with each document and assigning token types accordingly. Efficient tokenization is critical to ensure the performance of the models, and I optimized it specifically for ONNX models. 
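For illustration, pair tokenization with token type IDs can be sketched with the Hugging Face `tokenizers` package (an illustrative sketch, not FastEmbed's actual implementation; it assumes the model repository ships a `tokenizer.json`):

```python
from tokenizers import Tokenizer

# Load the tokenizer published alongside the ONNX model
# (assumes the Hugging Face repo contains a tokenizer.json).
tokenizer = Tokenizer.from_pretrained("Xenova/ms-marco-MiniLM-L-6-v2")

query = "What is the capital of France?"
documents = ["Paris is the capital of France.", "Berlin is the capital of Germany."]

# Encode each (query, document) pair. Token type IDs mark query tokens (0)
# and document tokens (1), which is how the cross-encoder tells them apart.
for doc in documents:
    encoding = tokenizer.encode(query, doc)
    print(encoding.ids[:10], encoding.type_ids[:10])
```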
### [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#3-model-loading-and-integration) 3\. Model Loading and Integration One of the most rewarding parts of the project was integrating the ONNX models into the FastEmbed library. ONNX models need to be loaded into a runtime environment that efficiently manages the computations. While PyTorch is a common framework for these types of tasks, FastEmbed exclusively supports ONNX models, making it both lightweight and efficient. I focused on extensive testing to ensure that the ONNX models performed equivalently to their PyTorch counterparts, ensuring users could trust the results. I added support for batching as well, allowing users to re-rank large sets of documents without compromising speed. ### [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#4-debugging-and-code-reviews) 4\. Debugging and Code Reviews During the project, I encountered a number of challenges, including issues with model configurations, tokenizers, and test cases. With the help of my mentor, George Panchuk, I was able to resolve these issues and improve my understanding of best practices, particularly around code readability, maintainability, and style. One notable lesson was the importance of keeping the code organized and maintainable, with a strong focus on readability. This included properly structuring modules and ensuring the entire codebase followed a clear, consistent style. ### [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#5-testing-and-validation) 5\. Testing and Validation To ensure the accuracy and performance of the models, I conducted extensive testing. I compared the output of ONNX models with their PyTorch counterparts, ensuring the conversion to ONNX was correct. A key part of this process was rigorous testing to verify the outputs and identify potential issues, such as incorrect conversions or bugs in our implementation. For instance, a test to validate the model’s output was structured as follows: ```python def test_rerank(): is_ci = os.getenv("CI") for model_desc in TextCrossEncoder.list_supported_models(): if not is_ci and model_desc["size_in_GB"] > 1: continue model_name = model_desc["model"] model = TextCrossEncoder(model_name=model_name) query = "What is the capital of France?" documents = ["Paris is the capital of France.", "Berlin is the capital of Germany."] scores = np.array(model.rerank(query, documents)) canonical_scores = CANONICAL_SCORE_VALUES[model_name] assert np.allclose( scores, canonical_scores, atol=1e-3 ), f"Model: {model_name}, Scores: {scores}, Expected: {canonical_scores}" ``` The `CANONICAL_SCORE_VALUES` were retrieved directly from the result of applying the original PyTorch models to the same input ## [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#outcomes-and-future-improvements) Outcomes and Future Improvements By the end of my project, I successfully added cross-encoders to the FastEmbed library, allowing users to re-rank search results based on relevance scores. This enhancement opens up new possibilities for applications that rely on contextual ranking, such as search engines and recommendation systems. This functionality will be available as of FastEmbed `0.4.0`. Some areas for future improvements include: - Expanding Model Support: We could add more cross-encoder models, especially from the sentence transformers library, to give users more options. 
- Parallelization: Optimizing batch processing to handle even larger datasets could further improve performance. - Custom Tokenization: For models with non-standard tokenization, like BAAI/bge-reranker, more specific tokenizer configurations could be added. ## [Anchor](https://qdrant.tech/articles/cross-encoder-integration-gsoc/\#overall-experience-and-wrapping-up) Overall Experience and Wrapping Up Looking back, this internship has been an incredibly valuable experience. I’ve grown not only as a developer but also as someone who can take on complex projects and see them through from start to finish. The Qdrant team has been so supportive, especially during the debugging and review stages. I’ve learned so much about model integration, ONNX, and how to build tools that are user-friendly and scalable. One key takeaway for me is the importance of understanding the user experience. It’s not just about getting the models to work but making sure they are easy to use and integrate into real-world applications. This experience has solidified my passion for building solutions that truly make an impact, and I’m excited to continue working on projects like this in the future. Thank you for taking the time to read about my journey with Qdrant and the FastEmbed library. I’m excited to see how this work will continue to improve search experiences for users! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/cross-encoder-integration-gsoc.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/cross-encoder-integration-gsoc.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-66-lllmstxt|> ## qdrant-0-10-release - [Articles](https://qdrant.tech/articles/) - Qdrant 0.10 released [Back to Qdrant Articles](https://qdrant.tech/articles/) --- # Qdrant 0.10 released Kacper Łukawski · September 19, 2022 ![Qdrant 0.10 released](https://qdrant.tech/articles_data/qdrant-0-10-release/preview/title.jpg) [Qdrant 0.10 is a new version](https://github.com/qdrant/qdrant/releases/tag/v0.10.0) that brings a lot of performance improvements, but also some new features which were heavily requested by our users. Here is an overview of what has changed. ## [Anchor](https://qdrant.tech/articles/qdrant-0-10-release/\#storing-multiple-vectors-per-object) Storing multiple vectors per object Previously, if you wanted to use semantic search with multiple vectors per object, you had to create separate collections for each vector type. This was even if the vectors shared some other attributes in the payload. With Qdrant 0.10, you can now store all of these vectors together in the same collection, which allows you to share a single copy of the payload. This makes it easier to use semantic search with multiple vector types, and reduces the amount of work you need to do to set up your collections. ## [Anchor](https://qdrant.tech/articles/qdrant-0-10-release/\#batch-vector-search) Batch vector search Previously, you had to send multiple requests to the Qdrant API to perform multiple non-related tasks. 
However, this can cause significant network overhead and slow down the process, especially if you have a poor connection speed. Fortunately, the [new batch search feature](https://qdrant.tech/documentation/concepts/search/#batch-search-api) allows you to avoid this issue. With just one API call, Qdrant will handle multiple search requests in the most efficient way possible. This means that you can perform multiple tasks simultaneously without having to worry about network overhead or slow performance. ## [Anchor](https://qdrant.tech/articles/qdrant-0-10-release/\#built-in-arm-support) Built-in ARM support To make our application accessible to ARM users, we have compiled it specifically for that platform. If it is not compiled for ARM, the device will have to emulate it, which can slow down performance. To ensure the best possible experience for ARM users, we have created Docker images specifically for that platform. Keep in mind that using a limited set of processor instructions may affect the performance of your vector search. Therefore, we have tested both ARM and non-ARM architectures using similar setups to understand the potential impact on performance. ## [Anchor](https://qdrant.tech/articles/qdrant-0-10-release/\#full-text-filtering) Full-text filtering Qdrant is a vector database that allows you to quickly search for the nearest neighbors. However, you may need to apply additional filters on top of the semantic search. Up until version 0.10, Qdrant only supported keyword filters. With the release of Qdrant 0.10, [you can now use full-text filters](https://qdrant.tech/documentation/concepts/filtering/#full-text-match) as well. This new filter type can be used on its own or in combination with other filter types to provide even more flexibility in your searches. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/qdrant-0-10-release.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/qdrant-0-10-release.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-67-lllmstxt|> ## installation - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Installation --- # [Anchor](https://qdrant.tech/documentation/guides/installation/\#installation-requirements) Installation requirements The following sections describe the requirements for deploying Qdrant. ## [Anchor](https://qdrant.tech/documentation/guides/installation/\#cpu-and-memory) CPU and memory The preferred size of your CPU and RAM depends on: - Number of vectors - Vector dimensions - [Payloads](https://qdrant.tech/documentation/concepts/payload/) and their indexes - Storage - Replication - How you configure quantization Our [Cloud Pricing Calculator](https://cloud.qdrant.io/calculator) can help you estimate required resources without payload or index data. 
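As a rough starting point before using the calculator, a commonly cited rule of thumb is to budget about 1.5 times the raw size of your float32 vectors in RAM; payloads and payload indexes come on top of that. A back-of-the-envelope sketch (the 1.5 overhead factor is an assumption, not a guarantee):

```python
def estimate_memory_gib(num_vectors: int, dim: int, overhead: float = 1.5) -> float:
    # Raw float32 vectors need num_vectors * dim * 4 bytes; the overhead factor
    # (assumed ~1.5x) roughly accounts for index structures. Payloads and payload
    # indexes are not included in this estimate.
    return num_vectors * dim * 4 * overhead / 1024**3

# Example: 1 million 768-dimensional vectors -> roughly 4.3 GiB
print(f"{estimate_memory_gib(1_000_000, 768):.1f} GiB")
```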
### [Anchor](https://qdrant.tech/documentation/guides/installation/\#supported-cpu-architectures) Supported CPU architectures

**64-bit system:**

- x86_64/amd64
- AArch64/arm64

**32-bit system:**

- Not supported

### [Anchor](https://qdrant.tech/documentation/guides/installation/\#storage) Storage

For persistent storage, Qdrant requires block-level access to storage devices with a [POSIX-compatible file system](https://www.quobyte.com/storage-explained/posix-filesystem/). Network systems such as [iSCSI](https://en.wikipedia.org/wiki/ISCSI) that provide block-level access are also acceptable. Qdrant won’t work with [Network file systems](https://en.wikipedia.org/wiki/File_system#Network_file_systems) such as NFS, or [Object storage](https://en.wikipedia.org/wiki/Object_storage) systems such as S3.

If you offload vectors to a local disk, we recommend you use a solid-state (SSD or NVMe) drive.

### [Anchor](https://qdrant.tech/documentation/guides/installation/\#networking) Networking

Each Qdrant instance requires three open ports:

- `6333` - For the HTTP API, for the [Monitoring](https://qdrant.tech/documentation/guides/monitoring/) health and metrics endpoints
- `6334` - For the [gRPC](https://qdrant.tech/documentation/interfaces/#grpc-interface) API
- `6335` - For [Distributed deployment](https://qdrant.tech/documentation/guides/distributed_deployment/)

All Qdrant instances in a cluster must be able to:

- Communicate with each other over these ports
- Allow incoming connections to ports `6333` and `6334` from clients that use Qdrant.

### [Anchor](https://qdrant.tech/documentation/guides/installation/\#security) Security

The default configuration of Qdrant might not be secure enough for every situation. Please see [our security documentation](https://qdrant.tech/documentation/guides/security/) for more information.

## [Anchor](https://qdrant.tech/documentation/guides/installation/\#installation-options) Installation options

Qdrant can be installed in different ways depending on your needs:

For production, you can use our Qdrant Cloud to run Qdrant either fully managed in our infrastructure or with Hybrid Cloud in yours.

If you want to run Qdrant in your own infrastructure, without any cloud connection, we recommend installing Qdrant in a Kubernetes cluster with our Qdrant Private Cloud Enterprise Operator.

For testing or development setups, you can run Qdrant as a container or as a binary executable. We also provide a Helm chart for an easy installation in Kubernetes.

## [Anchor](https://qdrant.tech/documentation/guides/installation/\#production) Production

### [Anchor](https://qdrant.tech/documentation/guides/installation/\#qdrant-cloud) Qdrant Cloud

You can set up production with [Qdrant Cloud](https://qdrant.to/cloud), which provides fully managed Qdrant databases. It offers horizontal and vertical scaling, one-click installation and upgrades, monitoring, logging, as well as backup and disaster recovery. For more information, see the [Qdrant Cloud documentation](https://qdrant.tech/documentation/cloud/).

### [Anchor](https://qdrant.tech/documentation/guides/installation/\#qdrant-kubernetes-operator) Qdrant Kubernetes Operator

We provide a Qdrant Enterprise Operator for Kubernetes installations as part of our [Qdrant Private Cloud](https://qdrant.tech/documentation/private-cloud/) offering. For more information, [use this form](https://qdrant.to/contact-us) to contact us.
### [Anchor](https://qdrant.tech/documentation/guides/installation/\#kubernetes) Kubernetes

You can use a ready-made [Helm Chart](https://helm.sh/docs/) to run Qdrant in your Kubernetes cluster. While it is possible to deploy Qdrant in a distributed setup with the Helm chart, it does not come with the same level of features for zero-downtime upgrades, up- and down-scaling, monitoring, logging, and backup and disaster recovery as the Qdrant Cloud offering or the Qdrant Private Cloud Enterprise Operator. Instead, you must manage and set this up [yourself](https://qdrant.tech/documentation/guides/distributed_deployment/). Support for the Helm chart is limited to community support.

The following table gives you an overview of the feature differences between Qdrant Cloud and the Helm chart:

| Feature | Qdrant Helm Chart | Qdrant Cloud |
| --- | --- | --- |
| Open-source | ✅ | |
| Community support only | ✅ | |
| Quick to get started | ✅ | ✅ |
| Vertical and horizontal scaling | ✅ | ✅ |
| API keys with granular access control | ✅ | ✅ |
| Qdrant version upgrades | ✅ | ✅ |
| Support for transit and storage encryption | ✅ | ✅ |
| Zero-downtime upgrades with optimized restart strategy | | ✅ |
| Production-ready out of the box | | ✅ |
| Data loss prevention on downscaling | | ✅ |
| Full cluster backup and disaster recovery | | ✅ |
| Automatic shard rebalancing | | ✅ |
| Re-sharding support | | ✅ |
| Automatic persistent volume scaling | | ✅ |
| Advanced telemetry | | ✅ |
| One-click API key revoking | | ✅ |
| Recreating nodes with new volumes in existing cluster | | ✅ |
| Enterprise support | | ✅ |

To install the Helm chart:

```bash
helm repo add qdrant https://qdrant.to/helm
helm install qdrant qdrant/qdrant
```

For more information, see the [qdrant-helm](https://github.com/qdrant/qdrant-helm/tree/main/charts/qdrant) README.

### [Anchor](https://qdrant.tech/documentation/guides/installation/\#docker-and-docker-compose) Docker and Docker Compose

Usually, we recommend running Qdrant in Kubernetes, or using Qdrant Cloud for production setups. This makes setting up highly available and scalable Qdrant clusters with backups and disaster recovery a lot easier.

However, you can also use Docker and Docker Compose to run Qdrant in production, by following the setup instructions in the [Docker](https://qdrant.tech/documentation/guides/installation/#docker) and [Docker Compose](https://qdrant.tech/documentation/guides/installation/#docker-compose) Development sections. In addition, you have to make sure:

- To use a performant [persistent storage](https://qdrant.tech/documentation/guides/installation/#storage) for your data
- To configure the [security settings](https://qdrant.tech/documentation/guides/security/) for your deployment
- To set up and configure Qdrant on multiple nodes for a highly available [distributed deployment](https://qdrant.tech/documentation/guides/distributed_deployment/)
- To set up a load balancer for your Qdrant cluster
- To create a [backup and disaster recovery strategy](https://qdrant.tech/documentation/concepts/snapshots/) for your data
- To integrate Qdrant with your [monitoring](https://qdrant.tech/documentation/guides/monitoring/) and logging solutions

## [Anchor](https://qdrant.tech/documentation/guides/installation/\#development) Development

For development and testing, we recommend that you set up Qdrant in Docker. We also have different client libraries.
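For example, once a local instance is running (see the Docker instructions below), a minimal connectivity check with the Python client could look like this (a sketch that assumes `qdrant-client` is installed):

```python
from qdrant_client import QdrantClient

# Connect to a locally running Qdrant instance over the HTTP API (port 6333).
client = QdrantClient(url="http://localhost:6333")

# List existing collections; an empty result on a fresh instance confirms connectivity.
print(client.get_collections())
```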
### [Anchor](https://qdrant.tech/documentation/guides/installation/\#docker) Docker The easiest way to start using Qdrant for testing or development is to run the Qdrant container image. The latest versions are always available on [DockerHub](https://hub.docker.com/r/qdrant/qdrant/tags?page=1&ordering=last_updated). Make sure that [Docker](https://docs.docker.com/engine/install/), [Podman](https://podman.io/docs/installation) or the container runtime of your choice is installed and running. The following instructions use Docker. Pull the image: ```bash docker pull qdrant/qdrant ``` In the following command, revise `$(pwd)/path/to/data` for your Docker configuration. Then use the updated command to run the container: ```bash docker run -p 6333:6333 \ -v $(pwd)/path/to/data:/qdrant/storage \ qdrant/qdrant ``` With this command, you start a Qdrant instance with the default configuration. It stores all data in the `./path/to/data` directory. By default, Qdrant uses port 6333, so at [localhost:6333](http://localhost:6333/) you should see the welcome message. To change the Qdrant configuration, you can overwrite the production configuration: ```bash docker run -p 6333:6333 \ -v $(pwd)/path/to/data:/qdrant/storage \ -v $(pwd)/path/to/custom_config.yaml:/qdrant/config/production.yaml \ qdrant/qdrant ``` Alternatively, you can use your own `custom_config.yaml` configuration file: ```bash docker run -p 6333:6333 \ -v $(pwd)/path/to/data:/qdrant/storage \ -v $(pwd)/path/to/custom_config.yaml:/qdrant/config/custom_config.yaml \ qdrant/qdrant \ ./qdrant --config-path config/custom_config.yaml ``` For more information, see the [Configuration](https://qdrant.tech/documentation/guides/configuration/) documentation. ### [Anchor](https://qdrant.tech/documentation/guides/installation/\#docker-compose) Docker Compose You can also use [Docker Compose](https://docs.docker.com/compose/) to run Qdrant. Here is an example customized compose file for a single node Qdrant cluster: ```yaml services: qdrant: image: qdrant/qdrant:latest restart: always container_name: qdrant ports: - 6333:6333 - 6334:6334 expose: - 6333 - 6334 - 6335 configs: - source: qdrant_config target: /qdrant/config/production.yaml volumes: - ./qdrant_data:/qdrant/storage configs: qdrant_config: content: | log_level: INFO ``` ### [Anchor](https://qdrant.tech/documentation/guides/installation/\#from-source) From source Qdrant is written in Rust and can be compiled into a binary executable. This installation method can be helpful if you want to compile Qdrant for a specific processor architecture or if you do not want to use Docker. Before compiling, make sure that the necessary libraries and the [rust toolchain](https://www.rust-lang.org/tools/install) are installed. The current list of required libraries can be found in the [Dockerfile](https://github.com/qdrant/qdrant/blob/master/Dockerfile). Build Qdrant with Cargo: ```bash cargo build --release --bin qdrant ``` After a successful build, you can find the binary in the following subdirectory `./target/release/qdrant`. ## [Anchor](https://qdrant.tech/documentation/guides/installation/\#client-libraries) Client libraries In addition to the service, Qdrant provides a variety of client libraries for different programming languages. For a full list, see our [Client libraries](https://qdrant.tech/documentation/interfaces/#client-libraries) documentation. ##### Was this page useful? 
![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/installation.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/installation.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-68-lllmstxt|> ## permission-reference - [Documentation](https://qdrant.tech/documentation/) - [Cloud rbac](https://qdrant.tech/documentation/cloud-rbac/) - Permission Reference --- # [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#permission-reference)**Permission Reference** This document outlines the permissions available in Qdrant Cloud. * * * > 💡 When enabling `write:*` permissions in the UI, the corresponding `read:*` permission will also be enabled and non-actionable. This guarantees access to resources after creating and/or updating them. ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#identity-and-access-management)**Identity and Access Management** Permissions for users, user roles, management keys, and invitations. | Permission | Description | | --- | --- | | `read:roles` | View roles in the Access Management page. | | `write:roles` | Create and modify roles in the Access Management page. | | `delete:roles` | Remove roles in the Access Management page. | | `read:management_keys` | View Cloud Management Keys in the Access Management page. | | `write:management_keys` | Create and manage Cloud Management Keys. | | `delete:management_keys` | Remove Cloud Management Keys in the Access Management page. | | `write:invites` | Invite new users to an account and revoke invitations. | | `read:invites` | View pending invites in an account. | | `delete:invites` | Remove an invitation. | | `read:users` | View user details in the profile page.
\- Also applicable in User Management and Role details (User tab). | | `delete:users` | Remove users from an account.
\- Applicable in User Management and Role details (User tab). | * * * ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#cluster)**Cluster** Permissions for API Keys, backups, clusters, and backup schedules. ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#api-keys)**API Keys** | Permission | Description | | --- | --- | | `read:api_keys` | View Database API Keys for Managed Cloud clusters. | | `write:api_keys` | Create new Database API Keys for Managed Cloud clusters. | | `delete:api_keys` | Remove Database API Keys for Managed Cloud clusters. | ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#backups)**Backups** | Permission | Description | | --- | --- | | `read:backups` | View backups in the **Backups page** and **Cluster details > Backups tab**. | | `write:backups` | Create backups from the **Backups page** and **Cluster details > Backups tab**. | | `delete:backups` | Remove backups from the **Backups page** and **Cluster details > Backups tab**. | ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#clusters)**Clusters** | Permission | Description | | --- | --- | | `read:clusters` | View cluster details. | | `write:clusters` | Modify cluster settings. | | `delete:clusters` | Delete clusters. | ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#backup-schedules)**Backup Schedules** | Permission | Description | | --- | --- | | `read:backup_schedules` | View backup schedules in the **Backups page** and **Cluster details > Backups tab**. | | `write:backup_schedules` | Create backup schedules from the **Backups page** and **Cluster details > Backups tab**. | | `delete:backup_schedules` | Remove backup schedules from the **Backups page** and **Cluster details > Backups tab**. | * * * ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#hybrid-cloud)**Hybrid Cloud** Permissions for Hybrid Cloud environments. | Permission | Description | | --- | --- | | `read:hybrid_cloud_environments` | View Hybrid Cloud environment details. | | `write:hybrid_cloud_environments` | Modify Hybrid Cloud environment settings. | | `delete:hybrid_cloud_environments` | Delete Hybrid Cloud environments. | * * * ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#payment--billing)**Payment & Billing** Permissions for payment methods and billing information. | Permission | Description | | --- | --- | | `read:payment_information` | View payment methods and billing details. | | `write:payment_information` | Modify or remove payment methods and billing details. | * * * ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#account-management)**Account Management** Permissions for managing user accounts. | Permission | Description | | --- | --- | | `read:account` | View account details that the user is a part of. | | `write:account` | Modify account details such as:
\- Editing the account name
\- Setting an account as default
\- Leaving an account
**(Only available to Owners)** | | `delete:account` | Remove an account from:
\- The **Profile page** (list of user accounts).
\- The **active account** (if the user is an owner/admin). | * * * ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/permission-reference/\#profile)**Profile** Permissions for accessing personal profile information. | Permission | Description | | --- | --- | | `read:profile` | View the user’s own profile information.
**(Assigned to all users by default)** | * * * ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-rbac/permission-reference.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-rbac/permission-reference.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-69-lllmstxt|> ## vector-search-manuals - [Articles](https://qdrant.tech/articles/) - Vector Search Manuals #### Vector Search Manuals Take full control of your vector data with Qdrant. Learn how to easily store, organize, and optimize vectors for high-performance similarity search. [![Preview](https://qdrant.tech/articles_data/vector-search-production/preview/preview.jpg)\\ **Vector Search in Production** \\ We gathered our most recommended tips and tricks to make your production deployment run smoothly.\\ \\ David Myriel\\ \\ April 30, 2025](https://qdrant.tech/articles/vector-search-production/)[![Preview](https://qdrant.tech/articles_data/indexing-optimization/preview/preview.jpg)\\ **Optimizing Memory for Bulk Uploads** \\ Efficient memory management is key when handling large-scale vector data. Learn how to optimize memory consumption during bulk uploads in Qdrant and keep your deployments performant under heavy load.\\ \\ Sabrina Aquino\\ \\ February 13, 2025](https://qdrant.tech/articles/indexing-optimization/)[![Preview](https://qdrant.tech/articles_data/vector-search-resource-optimization/preview/preview.jpg)\\ **Vector Search Resource Optimization Guide** \\ Learn how to get the most from Qdrant's optimization features. Discover key tricks and best practices to boost vector search performance and reduce Qdrant's resource usage.\\ \\ David Myriel\\ \\ February 09, 2025](https://qdrant.tech/articles/vector-search-resource-optimization/)[![Preview](https://qdrant.tech/articles_data/what-is-a-vector-database/preview/preview.jpg)\\ **What is a Vector Database?** \\ Discover what a vector database is, its core functionalities, and real-world applications.\\ \\ Sabrina Aquino\\ \\ October 09, 2024](https://qdrant.tech/articles/what-is-a-vector-database/)[![Preview](https://qdrant.tech/articles_data/what-is-vector-quantization/preview/preview.jpg)\\ **What is Vector Quantization?** \\ In this article, we'll teach you about compression methods like Scalar, Product, and Binary Quantization. Learn how to choose the best method for your specific application.\\ \\ Sabrina Aquino\\ \\ September 25, 2024](https://qdrant.tech/articles/what-is-vector-quantization/)[![Preview](https://qdrant.tech/articles_data/vector-search-filtering/preview/preview.jpg)\\ **A Complete Guide to Filtering in Vector Search** \\ Learn everything about filtering in Qdrant. 
Discover key tricks and best practices to boost semantic search performance and reduce Qdrant's resource usage.\\ \\ Sabrina Aquino, David Myriel\\ \\ September 10, 2024](https://qdrant.tech/articles/vector-search-filtering/)[![Preview](https://qdrant.tech/articles_data/hybrid-search/preview/preview.jpg)\\ **Hybrid Search Revamped - Building with Qdrant's Query API** \\ Our new Query API allows you to build a hybrid search system that uses different search methods to improve search quality & experience. Learn more here.\\ \\ Kacper Łukawski\\ \\ July 25, 2024](https://qdrant.tech/articles/hybrid-search/)[![Preview](https://qdrant.tech/articles_data/data-privacy/preview/preview.jpg)\\ **Data Privacy with Qdrant: Implementing Role-Based Access Control (RBAC)** \\ Discover how Qdrant's Role-Based Access Control (RBAC) ensures data privacy and compliance for your AI applications. Build secure and scalable systems with ease. Read more now!\\ \\ Qdrant Team\\ \\ June 18, 2024](https://qdrant.tech/articles/data-privacy/)[![Preview](https://qdrant.tech/articles_data/what-are-embeddings/preview/preview.jpg)\\ **What are Vector Embeddings? - Revolutionize Your Search Experience** \\ Discover the power of vector embeddings. Learn how to harness the potential of numerical machine learning representations to create a personalized Neural Search Service with FastEmbed.\\ \\ Sabrina Aquino\\ \\ February 06, 2024](https://qdrant.tech/articles/what-are-embeddings/)[![Preview](https://qdrant.tech/articles_data/multitenancy/preview/preview.jpg)\\ **How to Implement Multitenancy and Custom Sharding in Qdrant** \\ Discover how multitenancy and custom sharding in Qdrant can streamline your machine-learning operations. Learn how to scale efficiently and manage data securely.\\ \\ David Myriel\\ \\ February 06, 2024](https://qdrant.tech/articles/multitenancy/)[![Preview](https://qdrant.tech/articles_data/sparse-vectors/preview/preview.jpg)\\ **What is a Sparse Vector? How to Achieve Vector-based Hybrid Search** \\ Learn what sparse vectors are, how they work, and their importance in modern data processing. Explore methods like SPLADE for creating and leveraging sparse vectors efficiently.\\ \\ Nirant Kasliwal\\ \\ December 09, 2023](https://qdrant.tech/articles/sparse-vectors/)[![Preview](https://qdrant.tech/articles_data/storing-multiple-vectors-per-object-in-qdrant/preview/preview.jpg)\\ **Optimizing Semantic Search by Managing Multiple Vectors** \\ Discover the power of vector storage optimization and learn how to efficiently manage multiple vectors per object for enhanced semantic search capabilities.\\ \\ Kacper Łukawski\\ \\ October 05, 2022](https://qdrant.tech/articles/storing-multiple-vectors-per-object-in-qdrant/)[![Preview](https://qdrant.tech/articles_data/batch-vector-search-with-qdrant/preview/preview.jpg)\\ **Mastering Batch Search for Vector Optimization** \\ Discover how to optimize your vector search capabilities with efficient batch search. Learn optimization strategies for faster, more accurate results.\\ \\ Kacper Łukawski\\ \\ September 26, 2022](https://qdrant.tech/articles/batch-vector-search-with-qdrant/)[![Preview](https://qdrant.tech/articles_data/neural-search-tutorial/preview/preview.jpg)\\ **Neural Search 101: A Complete Guide and Step-by-Step Tutorial** \\ Discover the power of neural search. 
Learn what neural search is and follow our tutorial to build a neural search service using BERT, Qdrant, and FastAPI.\\ \\ Andrey Vasnetsov\\ \\ June 10, 2021](https://qdrant.tech/articles/neural-search-tutorial/) × [Powered by](https://qdrant.tech/) <|page-70-lllmstxt|> ## snapshots - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Snapshots --- # [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#snapshots) Snapshots _Available as of v0.8.4_ Snapshots are `tar` archive files that contain data and configuration of a specific collection on a specific node at a specific time. In a distributed setup, when you have multiple nodes in your cluster, you must create snapshots for each node separately when dealing with a single collection. This feature can be used to archive data or easily replicate an existing deployment. For disaster recovery, Qdrant Cloud users may prefer to use [Backups](https://qdrant.tech/documentation/cloud/backups/) instead, which are physical disk-level copies of your data. For a step-by-step guide on how to use snapshots, see our [tutorial](https://qdrant.tech/documentation/tutorials/create-snapshot/). ## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#create-snapshot) Create snapshot To create a new snapshot for an existing collection: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/snapshots ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.create_snapshot(collection_name="{collection_name}") ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createSnapshot("{collection_name}"); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.create_snapshot("{collection_name}").await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.createSnapshotAsync("{collection_name}").get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.CreateSnapshotAsync("{collection_name}"); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateSnapshot(context.Background(), "{collection_name}") ``` This is a synchronous operation for which a `tar` archive file will be generated into the `snapshot_path`. 
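In a distributed setup, the same call has to be repeated against every node that hosts shards of the collection; a minimal Python sketch (the node URLs are placeholders):

```python
from qdrant_client import QdrantClient

# Placeholder URLs: list every node of the cluster that hosts shards of the collection.
node_urls = ["http://qdrant-node-1:6333", "http://qdrant-node-2:6333"]

for url in node_urls:
    client = QdrantClient(url=url)
    # Creates a snapshot of this collection on the node the client is connected to.
    snapshot_info = client.create_snapshot(collection_name="{collection_name}")
    print(url, snapshot_info.name)
```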
### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#delete-snapshot) Delete snapshot _Available as of v1.0.0_ httppythontypescriptrustjavacsharpgo ```http DELETE /collections/{collection_name}/snapshots/{snapshot_name} ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.delete_snapshot( collection_name="{collection_name}", snapshot_name="{snapshot_name}" ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.deleteSnapshot("{collection_name}", "{snapshot_name}"); ``` ```rust use qdrant_client::qdrant::DeleteSnapshotRequestBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .delete_snapshot(DeleteSnapshotRequestBuilder::new( "{collection_name}", "{snapshot_name}", )) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.deleteSnapshotAsync("{collection_name}", "{snapshot_name}").get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.DeleteSnapshotAsync(collectionName: "{collection_name}", snapshotName: "{snapshot_name}"); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.DeleteSnapshot(context.Background(), "{collection_name}", "{snapshot_name}") ``` ## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#list-snapshot) List snapshot List of snapshots for a collection: httppythontypescriptrustjavacsharpgo ```http GET /collections/{collection_name}/snapshots ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.list_snapshots(collection_name="{collection_name}") ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.listSnapshots("{collection_name}"); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.list_snapshots("{collection_name}").await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.listSnapshotAsync("{collection_name}").get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.ListSnapshotsAsync("{collection_name}"); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.ListSnapshots(context.Background(), "{collection_name}") ``` ## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#retrieve-snapshot) Retrieve snapshot To download a specified snapshot from a collection as a file: httpshell ```http GET /collections/{collection_name}/snapshots/{snapshot_name} ``` ```shell curl 'http://{qdrant-url}:6333/collections/{collection_name}/snapshots/snapshot-2022-10-10.snapshot' \ -H 'api-key: ********' \ --output 'filename.snapshot' ``` ## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#restore-snapshot) Restore snapshot Snapshots can be restored in three possible ways: 1. 
[Recovering from a URL or local file](https://qdrant.tech/documentation/concepts/snapshots/#recover-from-a-url-or-local-file) (useful for restoring a snapshot file that is on a remote server or already stored on the node) 2. [Recovering from an uploaded file](https://qdrant.tech/documentation/concepts/snapshots/#recover-from-an-uploaded-file) (useful for migrating data to a new cluster) 3. [Recovering during start-up](https://qdrant.tech/documentation/concepts/snapshots/#recover-during-start-up) (useful when running a self-hosted single-node Qdrant instance) Regardless of the method used, Qdrant will extract the shard data from the snapshot and properly register shards in the cluster. If there are other active replicas of the recovered shards in the cluster, Qdrant will replicate them to the newly recovered node by default to maintain data consistency. ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#recover-from-a-url-or-local-file) Recover from a URL or local file _Available as of v0.11.3_ This method of recovery requires the snapshot file to be downloadable from a URL or exist as a local file on the node (like if you [created the snapshot](https://qdrant.tech/documentation/concepts/snapshots/#create-snapshot) on this node previously). If instead you need to upload a snapshot file, see the next section. To recover from a URL or local file use the [snapshot recovery endpoint](https://api.qdrant.tech/master/api-reference/snapshots/recover-from-snapshot). This endpoint accepts either a URL like `https://example.com` or a [file URI](https://en.wikipedia.org/wiki/File_URI_scheme) like `file:///tmp/snapshot-2022-10-10.snapshot`. If the target collection does not exist, it will be created. httppythontypescript ```http PUT /collections/{collection_name}/snapshots/recover { "location": "http://qdrant-node-1:6333/collections/{collection_name}/snapshots/snapshot-2022-10-10.shapshot" } ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://qdrant-node-2:6333") client.recover_snapshot( "{collection_name}", "http://qdrant-node-1:6333/collections/collection_name/snapshots/snapshot-2022-10-10.shapshot", ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.recoverSnapshot("{collection_name}", { location: "http://qdrant-node-1:6333/collections/{collection_name}/snapshots/snapshot-2022-10-10.shapshot", }); ``` ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#recover-from-an-uploaded-file) Recover from an uploaded file The snapshot file can also be uploaded as a file and restored using the [recover from uploaded snapshot](https://api.qdrant.tech/master/api-reference/snapshots/recover-from-uploaded-snapshot). This endpoint accepts the raw snapshot data in the request body. If the target collection does not exist, it will be created. ```bash curl -X POST 'http://{qdrant-url}:6333/collections/{collection_name}/snapshots/upload?priority=snapshot' \ -H 'api-key: ********' \ -H 'Content-Type:multipart/form-data' \ -F 'snapshot=@/path/to/snapshot-2022-10-10.shapshot' ``` This method is typically used to migrate data from one cluster to another, so we recommend setting the [priority](https://qdrant.tech/documentation/concepts/snapshots/#snapshot-priority) to “snapshot” for that use-case. 
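The same upload can be scripted, for instance with Python's `requests` library, mirroring the curl call above (a sketch; the URL, API key, and file path are placeholders):

```python
import requests

# Placeholders: Qdrant URL, API key, and snapshot file path.
with open("/path/to/snapshot-2022-10-10.snapshot", "rb") as f:
    response = requests.post(
        "http://qdrant-url:6333/collections/collection_name/snapshots/upload",
        params={"priority": "snapshot"},  # prefer the snapshot data over existing data
        headers={"api-key": "********"},
        files={"snapshot": f},  # multipart form field, same as -F 'snapshot=@...'
    )

print(response.json())
```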
### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#recover-during-start-up) Recover during start-up

If you have a single-node deployment, you can recover any collection at start-up and it will be immediately available. Restoring snapshots is done through the Qdrant CLI at start-up time via the `--snapshot` argument, which accepts a list of `<snapshot_file_path>:<target_collection_name>` pairs.

For example:

```bash
./qdrant --snapshot /snapshots/test-collection-archive.snapshot:test-collection --snapshot /snapshots/test-collection-archive.snapshot:test-copy-collection
```

The target collection **must** be absent; otherwise, the program will exit with an error. If you wish instead to overwrite an existing collection, use the `--force_snapshot` flag with caution.

### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#snapshot-priority) Snapshot priority

When recovering a snapshot to a non-empty node, there may be conflicts between the snapshot data and the existing data. The “priority” setting controls how Qdrant handles these conflicts. The priority setting is important because different priorities can give very different end results. The default priority may not be best for all situations.

The available snapshot recovery priorities are:

- `replica`: _(default)_ prefer existing data over the snapshot.
- `snapshot`: prefer snapshot data over existing data.
- `no_sync`: restore snapshot without any additional synchronization.

To recover a new collection from a snapshot, you need to set the priority to `snapshot`. With `snapshot` priority, all data from the snapshot will be recovered onto the cluster. With `replica` priority _(default)_, you’d end up with an empty collection because the collection on the cluster did not contain any points and that source was preferred.

`no_sync` is for specialized use cases and is not commonly used. It allows managing shards and transferring shards between clusters manually without any additional synchronization. Using it incorrectly will leave your cluster in a broken state.

To recover from a URL, you specify an additional parameter in the request body:

httpbashpythontypescript

```http
PUT /collections/{collection_name}/snapshots/recover
{
  "location": "http://qdrant-node-1:6333/collections/{collection_name}/snapshots/snapshot-2022-10-10.shapshot",
  "priority": "snapshot"
}
```

```bash
curl -X POST 'http://qdrant-node-1:6333/collections/{collection_name}/snapshots/upload?priority=snapshot' \
  -H 'api-key: ********' \
  -H 'Content-Type:multipart/form-data' \
  -F 'snapshot=@/path/to/snapshot-2022-10-10.shapshot'
```

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://qdrant-node-2:6333")

client.recover_snapshot(
    "{collection_name}",
    "http://qdrant-node-1:6333/collections/{collection_name}/snapshots/snapshot-2022-10-10.shapshot",
    priority=models.SnapshotPriority.SNAPSHOT,
)
```

```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ host: "localhost", port: 6333 });

client.recoverSnapshot("{collection_name}", {
  location: "http://qdrant-node-1:6333/collections/{collection_name}/snapshots/snapshot-2022-10-10.shapshot",
  priority: "snapshot",
});
```

## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#snapshots-for-the-whole-storage) Snapshots for the whole storage

_Available as of v0.8.5_

Sometimes it might be handy to create a snapshot not just of a single collection, but of the whole storage, including collection aliases. Qdrant provides a dedicated API for that as well.
It is similar to collection-level snapshots, but does not require `collection_name`. ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#create-full-storage-snapshot) Create full storage snapshot httppythontypescriptrustjavacsharpgo ```http POST /snapshots ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.create_full_snapshot() ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createFullSnapshot(); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.create_full_snapshot().await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.createFullSnapshotAsync().get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.CreateFullSnapshotAsync(); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFullSnapshot(context.Background()) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#delete-full-storage-snapshot) Delete full storage snapshot _Available as of v1.0.0_ httppythontypescriptrustjavacsharpgo ```http DELETE /snapshots/{snapshot_name} ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.delete_full_snapshot(snapshot_name="{snapshot_name}") ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.deleteFullSnapshot("{snapshot_name}"); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.delete_full_snapshot("{snapshot_name}").await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.deleteFullSnapshotAsync("{snapshot_name}").get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.DeleteFullSnapshotAsync("{snapshot_name}"); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.DeleteFullSnapshot(context.Background(), "{snapshot_name}") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#list-full-storage-snapshots) List full storage snapshots httppythontypescriptrustjavacsharpgo ```http GET /snapshots ``` ```python from qdrant_client import QdrantClient client = QdrantClient("localhost", port=6333) client.list_full_snapshots() ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.listFullSnapshots(); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client.list_full_snapshots().await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.listFullSnapshotAsync().get(); ``` ```csharp using Qdrant.Client; var client = new 
QdrantClient("localhost", 6334); await client.ListFullSnapshotsAsync(); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.ListFullSnapshots(context.Background()) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#download-full-storage-snapshot) Download full storage snapshot ```http GET /snapshots/{snapshot_name} ``` ## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#restore-full-storage-snapshot) Restore full storage snapshot Restoring snapshots can only be done through the Qdrant CLI at startup time. For example: ```bash ./qdrant --storage-snapshot /snapshots/full-snapshot-2022-07-18-11-20-51.snapshot ``` ## [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#storage) Storage Created, uploaded and recovered snapshots are stored as `.snapshot` files. By default, they’re stored on the [local file system](https://qdrant.tech/documentation/concepts/snapshots/#local-file-system). You may also configure to use an [S3 storage](https://qdrant.tech/documentation/concepts/snapshots/#s3) service for them. ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#local-file-system) Local file system By default, snapshots are stored at `./snapshots` or at `/qdrant/snapshots` when using our Docker image. The target directory can be controlled through the [configuration](https://qdrant.tech/documentation/guides/configuration/): ```yaml storage: # Specify where you want to store snapshots. snapshots_path: ./snapshots ``` Alternatively you may use the environment variable `QDRANT__STORAGE__SNAPSHOTS_PATH=./snapshots`. _Available as of v1.3.0_ While a snapshot is being created, temporary files are placed in the configured storage directory by default. In case of limited capacity or a slow network attached disk, you can specify a separate location for temporary files: ```yaml storage: # Where to store temporary files temp_path: /tmp ``` ### [Anchor](https://qdrant.tech/documentation/concepts/snapshots/\#s3) S3 _Available as of v1.10.0_ Rather than storing snapshots on the local file system, you may also configure to store snapshots in an S3-compatible storage service. To enable this, you must configure it in the [configuration](https://qdrant.tech/documentation/guides/configuration/) file. For example, to configure for AWS S3: ```yaml storage: snapshots_config: # Use 's3' to store snapshots on S3 snapshots_storage: s3 s3_config: # Bucket name bucket: your_bucket_here # Bucket region (e.g. eu-central-1) region: your_bucket_region_here # Storage access key # Can be specified either here or in the `QDRANT__STORAGE__SNAPSHOTS_CONFIG__S3_CONFIG__ACCESS_KEY` environment variable. access_key: your_access_key_here # Storage secret key # Can be specified either here or in the `QDRANT__STORAGE__SNAPSHOTS_CONFIG__S3_CONFIG__SECRET_KEY` environment variable. secret_key: your_secret_key_here # S3-Compatible Storage URL # Can be specified either here or in the `QDRANT__STORAGE__SNAPSHOTS_CONFIG__S3_CONFIG__ENDPOINT_URL` environment variable. endpoint_url: your_url_here ``` Apart from Snapshots, Qdrant also provides the [Qdrant Migration Tool](https://github.com/qdrant/migration) that supports: - Migration between Qdrant Cloud instances. - Migrating vectors from other providers into Qdrant. - Migrating from Qdrant OSS to Qdrant Cloud. 
Follow our [migration guide](https://qdrant.tech/documentation/database-tutorials/migration/) to learn how to effectively use the Qdrant Migration tool. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/snapshots.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/snapshots.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-71-lllmstxt|> ## examples - [Documentation](https://qdrant.tech/documentation/) - Build Prototypes --- # [Anchor](https://qdrant.tech/documentation/examples/\#examples) Examples | End-to-End Code Samples | Description | Stack | | --- | --- | --- | | [Multitenancy with LlamaIndex](https://qdrant.tech/documentation/examples/llama-index-multitenancy/) | Handle data coming from multiple users in LlamaIndex. | Qdrant, Python, LlamaIndex | | [Implement custom connector for Cohere RAG](https://qdrant.tech/documentation/examples/cohere-rag-connector/) | Bring data stored in Qdrant to Cohere RAG | Qdrant, Cohere, FastAPI | | [Chatbot for Interactive Learning](https://qdrant.tech/documentation/examples/rag-chatbot-red-hat-openshift-haystack/) | Build a Private RAG Chatbot for Interactive Learning | Qdrant, Haystack, OpenShift | | [Information Extraction Engine](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/) | Build a Private RAG Information Extraction Engine | Qdrant, Vultr, DSPy, Ollama | | [System for Employee Onboarding](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/) | Build a RAG System for Employee Onboarding | Qdrant, Cohere, LangChain | | [System for Contract Management](https://qdrant.tech/documentation/examples/rag-contract-management-stackit-aleph-alpha/) | Build a Region-Specific RAG System for Contract Management | Qdrant, Aleph Alpha, STACKIT | | [Question-Answering System for Customer Support](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/) | Build a RAG System for AI Customer Support | Qdrant, Cohere, Airbyte, AWS | | [Hybrid Search on PDF Documents](https://qdrant.tech/documentation/examples/hybrid-search-llamaindex-jinaai/) | Develop a Hybrid Search System for Product PDF Manuals | Qdrant, LlamaIndex, Jina AI | | [Blog-Reading RAG Chatbot](https://qdrant.tech/documentation/examples/rag-chatbot-scaleway/) | Develop a RAG-based Chatbot on Scaleway and with LangChain | Qdrant, LangChain, GPT-4o | | [Movie Recommendation System](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/) | Build a Movie Recommendation System with LlamaIndex and With JinaAI | Qdrant | | [GraphRAG Agent](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/) | Build a GraphRAG Agent with Neo4J and Qdrant | Qdrant, Neo4j | | [Building a Chain-of-Thought Medical Chatbot with Qdrant and DSPy](https://qdrant.tech/documentation/examples/Qdrant-DSPy-medicalbot/) | How to build a medical chatbot grounded in medical literature with Qdrant and DSPy. 
| Qdrant, DSPy | ## [Anchor](https://qdrant.tech/documentation/examples/\#notebooks) Notebooks Our Notebooks offer complex instructions that are supported with a throrough explanation. Follow along by trying out the code and get the most out of each example. | Example | Description | Stack | | --- | --- | --- | | [Intro to Semantic Search and Recommendations Systems](https://githubtocolab.com/qdrant/examples/blob/master/qdrant_101_getting_started/getting_started.ipynb) | Learn how to get started building semantic search and recommendation systems. | Qdrant | | [Search and Recommend Newspaper Articles](https://githubtocolab.com/qdrant/examples/blob/master/qdrant_101_text_data/qdrant_and_text_data.ipynb) | Work with text data to develop a semantic search and a recommendation engine for news articles. | Qdrant | | [Recommendation System for Songs](https://githubtocolab.com/qdrant/examples/blob/master/qdrant_101_audio_data/03_qdrant_101_audio.ipynb) | Use Qdrant to develop a music recommendation engine based on audio embeddings. | Qdrant | | [Image Comparison System for Skin Conditions](https://colab.research.google.com/github/qdrant/examples/blob/master/qdrant_101_image_data/04_qdrant_101_cv.ipynb) | Use Qdrant to compare challenging images with labels representing different skin diseases. | Qdrant | | [Question and Answer System with LlamaIndex](https://github.com/qdrant/examples/blob/949669f001a03131afebf2ecd1e0ce63cab01c81/llama_index_recency/Qdrant%20and%20LlamaIndex%20%E2%80%94%20A%20new%20way%20to%20keep%20your%20Q%26A%20systems%20up-to-date.ipynb) | Combine Qdrant and LlamaIndex to create a self-updating Q&A system. | Qdrant, LlamaIndex, Cohere | | [Extractive QA System](https://githubtocolab.com/qdrant/examples/blob/master/extractive_qa/extractive-question-answering.ipynb) | Extract answers directly from context to generate highly relevant answers. | Qdrant | | [Ecommerce Reverse Image Search](https://githubtocolab.com/qdrant/examples/blob/master/ecommerce_reverse_image_search/ecommerce-reverse-image-search.ipynb) | Accept images as search queries to receive semantically appropriate answers. | Qdrant | | [Basic RAG](https://githubtocolab.com/qdrant/examples/blob/master/rag-openai-qdrant/rag-openai-qdrant.ipynb) | Basic RAG pipeline with Qdrant and OpenAI SDKs. | OpenAI, Qdrant, FastEmbed | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
<|page-72-lllmstxt|> ## scalar-quantization - [Articles](https://qdrant.tech/articles/) - Scalar Quantization: Background, Practices & More \| Qdrant [Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/) --- # Scalar Quantization: Background, Practices & More \| Qdrant Kacper Łukawski · March 27, 2023 ![Scalar Quantization: Background, Practices & More | Qdrant](https://qdrant.tech/articles_data/scalar-quantization/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/scalar-quantization/\#efficiency-unleashed-the-power-of-scalar-quantization) Efficiency Unleashed: The Power of Scalar Quantization

High-dimensional vector embeddings can be memory-intensive, especially when working with large datasets consisting of millions of vectors. Memory footprint becomes a real concern as we scale up: the choice of data type used to store a single number is multiplied across billions of numbers and can inflate the memory requirements dramatically. The higher the precision of your type, the more accurately you can represent the numbers, and the more accurate your vectors, the more precise the distance calculation. But the advantages stop paying off when you need to order more and more memory.

Qdrant uses `float32` as the default type for storing the numbers of your embeddings. A single number therefore needs 4 bytes of memory, and a 512-dimensional vector occupies 2 kB. That is only the memory used to store the vector itself. There is also the overhead of the HNSW graph, so as a rule of thumb we estimate the memory size with the following formula:

```text
memory_size = 1.5 * number_of_vectors * vector_dimension * 4 bytes
```

While Qdrant offers various options to store some parts of the data on disk, starting from version 1.1.0 you can also optimize memory by compressing the embeddings. We've implemented the mechanism of **Scalar Quantization**! It turns out to have a positive impact not only on memory but also on performance.

## [Anchor](https://qdrant.tech/articles/scalar-quantization/\#scalar-quantization) Scalar quantization

Scalar quantization is a data compression technique that converts floating point values into integers. In the case of Qdrant, `float32` gets converted into `int8`, so a single number needs 75% less memory. It is not simple rounding, though! It is a process that makes the transformation partially reversible, so we can also revert integers back to floats with a small loss of precision.

### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#theoretical-background) Theoretical background

Assume we have a collection of `float32` vectors and denote a single value as `f32`. In reality, neural embeddings do not cover the whole range representable by floating point numbers, but rather a small subrange. Since we know all the vectors in advance, we can gather statistics over all the stored numbers. For example, the distribution of the values will typically be normal:

![A distribution of the vector values](https://qdrant.tech/articles_data/scalar-quantization/float32-distribution.png)

Our example shows that 99% of the values come from a `[-2.0, 5.0]` range.
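Estimating such a range is essentially a percentile computation over the stored values. Here is a minimal NumPy sketch, with purely illustrative sample data and cut-offs (this is not Qdrant's internal code):

```python
import numpy as np

# Pretend these are all values of all stored embeddings, flattened into one array.
values = np.random.normal(loc=1.5, scale=1.2, size=1_000_000).astype(np.float32)

# Keep the central 99% of the distribution and ignore the outliers.
lower, upper = np.percentile(values, [0.5, 99.5])
print(f"99% of the values fall into [{lower:.2f}, {upper:.2f}]")
```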
The conversion to `int8` will inevitably lose some precision, so we prefer to keep the representation accurate within the range covering 99% of the most probable values and to ignore the precision of the outliers. The width of that range is actually a choice: any value from the range `[0, 1]`, where `0` means an empty range and `1` keeps all the values. That is a hyperparameter of the procedure called `quantile`. A value of `0.95` or `0.99` is typically a reasonable choice, but in general `quantile ∈ [0, 1]`.

#### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#conversion-to-integers) Conversion to integers

Let's talk about the conversion to `int8`. Integers also have a finite set of values that can be represented. Within a single byte they may represent up to 256 different values, either from `[-128, 127]` or `[0, 255]`.

![Value ranges represented by int8](https://qdrant.tech/articles_data/scalar-quantization/int8-value-range.png)

Since we put boundaries on the numbers that can be represented by the `f32`, and `i8` has its own natural boundaries, converting values between those two ranges is straightforward:

$$
f32 = \alpha \times i8 + \text{offset}, \qquad i8 = \frac{f32 - \text{offset}}{\alpha}
$$

The parameters $\alpha$ and $\text{offset}$ have to be calculated for a given set of vectors, but that comes easily by matching the minimum and maximum of the represented range for both `f32` and `i8`.

![Float32 to int8 conversion](https://qdrant.tech/articles_data/scalar-quantization/float32-to-int8-conversion.png)

For the unsigned `int8` it goes as follows:

$$
\begin{cases}
-2 = \alpha \times 0 + \text{offset} \\
5 = \alpha \times 255 + \text{offset}
\end{cases}
$$

In the case of signed `int8`, we just change the represented range boundaries:

$$
\begin{cases}
-2 = \alpha \times (-128) + \text{offset} \\
5 = \alpha \times 127 + \text{offset}
\end{cases}
$$

For any set of vector values we can simply calculate $\alpha$ and $\text{offset}$, and those values have to be stored along with the collection to enable the conversion between the types.

#### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#distance-calculation) Distance calculation

We do not store the vectors in the collections as `int8` instead of `float32` just for the sake of compressing the memory. The coordinates are also used while we calculate the distance between the vectors. Both dot product and cosine distance require multiplying the corresponding coordinates of two vectors, so that is the operation we perform quite often on `float32`. Here is how it looks if we perform the conversion to `int8`:

$$
f32 \times f32' = (\alpha \times i8 + \text{offset}) \times (\alpha \times i8' + \text{offset}) = \alpha^2 \times i8 \times i8' + \underbrace{\text{offset} \times \alpha \times i8' + \text{offset} \times \alpha \times i8 + \text{offset}^2}_{\text{pre-compute}}
$$

The first term, $\alpha^2 \times i8 \times i8'$, has to be calculated when we measure the distance, as it depends on both vectors. However, the second and the third terms ($\text{offset} \times \alpha \times i8'$ and $\text{offset} \times \alpha \times i8$ respectively) depend only on a single vector, so they can be precomputed and kept for each vector. The last term, $\text{offset}^2$, does not depend on any of the values, so it can be computed once and reused. If we had to calculate all the terms to measure the distance, the performance could be even worse than without the conversion. But thanks to the fact that we can precompute the majority of the terms, things get simpler. And it turns out that scalar quantization has a positive impact not only on the memory usage but also on the performance. As usual, we performed some benchmarks to support this statement!

## [Anchor](https://qdrant.tech/articles/scalar-quantization/\#benchmarks) Benchmarks

We simply used the same approach as we use in all [the other benchmarks we publish](https://qdrant.tech/benchmarks/).
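Before looking at the numbers, here is a minimal NumPy sketch that ties the formulas above together: deriving α and offset from the chosen range, quantizing, and precomputing the per-vector terms of the dot product. All values are illustrative, and this is not Qdrant's internal implementation:

```python
import numpy as np

# Range covering ~99% of the float32 values (see the quantile discussion above).
f_min, f_max = -2.0, 5.0
i_min, i_max = 0, 255  # unsigned int8 in this example

alpha = (f_max - f_min) / (i_max - i_min)
offset = f_min - alpha * i_min

def quantize(v: np.ndarray) -> np.ndarray:
    # i8 = (f32 - offset) / alpha, clipped to the representable range
    return np.clip(np.round((v - offset) / alpha), i_min, i_max).astype(np.uint8)

v = np.random.normal(1.5, 1.2, size=512).astype(np.float32)
w = np.random.normal(1.5, 1.2, size=512).astype(np.float32)
qv, qw = quantize(v), quantize(w)

# Per-vector and constant terms of the dot product that can be precomputed and stored.
pre_v = offset * alpha * qv.sum()
pre_w = offset * alpha * qw.sum()
const = offset * offset * len(v)

approx_dot = alpha**2 * np.dot(qv.astype(np.int64), qw.astype(np.int64)) + pre_v + pre_w + const
print(float(np.dot(v, w)), float(approx_dot))  # close, up to the quantization error
```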
Both [Arxiv-titles-384-angular-no-filters](https://github.com/qdrant/ann-filtering-benchmark-datasets) and [Gist-960](https://github.com/erikbern/ann-benchmarks/) datasets were chosen to make the comparison between non-quantized and quantized vectors. The results are summarized in the tables:

#### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#arxiv-titles-384-angular-no-filters) Arxiv-titles-384-angular-no-filters

| | Upload and indexing time | Mean search precision (ef = 128) | Mean search time (ef = 128) | Mean search precision (ef = 256) | Mean search time (ef = 256) | Mean search precision (ef = 512) | Mean search time (ef = 512) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Non-quantized vectors | 649 s | 0.989 | 0.0094 | 0.994 | 0.0932 | 0.996 | 0.161 |
| Scalar Quantization | 496 s | 0.986 | 0.0037 | 0.993 | 0.060 | 0.996 | 0.115 |
| Difference | -23.57% | -0.3% | -60.64% | -0.1% | -35.62% | 0% | -28.57% |

A slight decrease in search precision results in a considerable improvement in the latency. Unless you aim for the highest precision possible, you should not notice the difference in your search quality.

#### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#gist-960) Gist-960

| | Upload and indexing time | Mean search precision (ef = 128) | Mean search time (ef = 128) | Mean search precision (ef = 256) | Mean search time (ef = 256) | Mean search precision (ef = 512) | Mean search time (ef = 512) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Non-quantized vectors | 452 | 0.802 | 0.077 | 0.887 | 0.135 | 0.941 | 0.231 |
| Scalar Quantization | 312 | 0.802 | 0.043 | 0.888 | 0.077 | 0.941 | 0.135 |
| Difference | -30.79% | 0% | -44.16% | +0.11% | -42.96% | 0% | -41.56% |

In all the cases, the decrease in search precision is negligible, but we keep a latency reduction of at least 28.57%, even up to 60.64%, while searching. As a rule of thumb, the higher the dimensionality of the vectors, the lower the precision loss.

### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#oversampling-and-rescoring) Oversampling and rescoring

A distinctive feature of the Qdrant architecture is the ability to combine the search for quantized and original vectors in a single query. This enables the best combination of speed, accuracy, and RAM usage. Qdrant stores the original vectors, so it is possible to rescore the top-k results with the original vectors after doing the neighbours search in quantized space. That obviously has some impact on the performance, but in order to measure how big it is, we made the comparison in different search scenarios. We used a machine with a very slow network-mounted disk and tested the following scenarios with different amounts of allowed RAM:

| Setup | RPS | Precision |
| --- | --- | --- |
| 4.5GB memory | 600 | 0.99 |
| 4.5GB memory + SQ + rescore | 1000 | 0.989 |

And another group with more strict memory limits:

| Setup | RPS | Precision |
| --- | --- | --- |
| 2GB memory | 2 | 0.99 |
| 2GB memory + SQ + rescore | 30 | 0.989 |
| 2GB memory + SQ + no rescore | 1200 | 0.974 |

In those experiments, throughput was mainly defined by the number of disk reads, and quantization efficiently reduces it by allowing more vectors in RAM. Read more about on-disk storage in Qdrant and how we measure its performance in our article: [Minimal RAM you need to serve a million vectors](https://qdrant.tech/articles/memory-consumption/). The mechanism of Scalar Quantization with rescoring disabled pushes the limits of low-end machines even further.
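For reference, here is a minimal sketch of how scalar quantization and rescoring can be switched on through the Python client. The collection name and vector size are made up, and the exact parameters available depend on the Qdrant server and client versions you run, so treat it as an illustration rather than a copy of the benchmark setup:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Collection with int8 scalar quantization, keeping quantized vectors in RAM.
client.create_collection(
    collection_name="sq-demo",
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,
            always_ram=True,
        )
    ),
)

# Query with oversampling in the quantized space and rescoring on the original vectors.
client.query_points(
    collection_name="sq-demo",
    query=[0.0] * 384,
    limit=10,
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(rescore=True, oversampling=2.0)
    ),
)
```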
It seems that handling lots of requests does not require an expensive setup if you can accept a small decrease in search precision.

### [Anchor](https://qdrant.tech/articles/scalar-quantization/\#accessing-best-practices) Accessing best practices

Qdrant documentation on [Scalar Quantization](https://qdrant.tech/documentation/quantization/#setting-up-quantization-in-qdrant) is a great resource describing different scenarios and strategies to achieve up to 4x lower memory footprint and even up to 2x performance increase.

<|page-73-lllmstxt|> ## faq-question-answering - [Articles](https://qdrant.tech/articles/) - Q&A with Similarity Learning [Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/) --- # Q&A with Similarity Learning George Panchuk · June 28, 2022 ![Q&A with Similarity Learning](https://qdrant.tech/articles_data/faq-question-answering/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/faq-question-answering/\#question-answering-system-with-similarity-learning-and-quaterion) Question-answering system with Similarity Learning and Quaterion

Many problems in modern machine learning are approached as classification tasks. Some are classification tasks by design, but others are artificially transformed into such. And when you try to apply an approach that does not naturally fit your problem, you risk coming up with over-complicated or bulky solutions. In some cases, you would even get worse performance.

Imagine that you got a new task and decided to solve it with the good old classification approach. Firstly, you will need labeled data. If it came on a plate with the task, you're lucky, but if it didn't, you might need to label it manually. And I guess you are already familiar with how painful that might be. Assume you somehow labeled all the required data and trained a model. It shows good performance - well done! But a day later, your manager tells you about a bunch of new data with new classes, which your model has to handle. You repeat your pipeline. Then, two days later, you are contacted one more time. You need to update the model again, and again, and again. Sounds tedious and expensive to me, doesn't it to you?

## [Anchor](https://qdrant.tech/articles/faq-question-answering/\#automating-customer-support) Automating customer support

Let's now take a look at a concrete example. There is a pressing problem with automating customer support. The service should be capable of answering user questions and retrieving relevant articles from the documentation without any human involvement. With the classification approach, you need to build a hierarchy of classification models to determine the question's topic. You have to collect and label a whole custom dataset of your private documentation topics to train that.
And then, each time you have a new topic in your documentation, you have to re-train the whole pile of classifiers with additionally labeled data. Can we make it easier?

## [Anchor](https://qdrant.tech/articles/faq-question-answering/\#similarity-option) Similarity option

One of the possible alternatives is Similarity Learning, which we are going to discuss in this article. It suggests getting rid of the classes and making decisions based on the similarity between objects instead. To do it quickly, we need some intermediate representation - embeddings. Embeddings are high-dimensional vectors with semantic information accumulated in them. As embeddings are vectors, one can apply a simple function to calculate the similarity score between them, for example, cosine or Euclidean distance. So with similarity learning, all we need to do is provide pairs of correct questions and answers. The model will then learn to distinguish proper answers by the similarity of embeddings.

> If you want to learn more about similarity learning and its applications, check out this [article](https://qdrant.tech/documentation/tutorials/neural-search/), which might be helpful.

## [Anchor](https://qdrant.tech/articles/faq-question-answering/\#lets-build) Let's build

The similarity learning approach seems a lot simpler than classification in this case, and if you still have some doubts, let me dispel them. As I had no resource with an exhaustive F.A.Q. that might serve as a dataset, I scraped one from the sites of popular cloud providers. The dataset consists of just 8.5k pairs of questions and answers; you can take a closer look at it [here](https://github.com/qdrant/demo-cloud-faq).

Once we have the data, we need to obtain embeddings for it. Representing texts as embeddings is not a novel technique in NLP, and there are plenty of algorithms and models to calculate them. You may have heard of Word2Vec, GloVe, ELMo, or BERT; all of these models can provide text embeddings. However, it is better to produce embeddings with a model trained for semantic similarity tasks. For instance, we can find such models at [sentence-transformers](https://www.sbert.net/docs/pretrained_models.html). The authors claim that `all-mpnet-base-v2` provides the best quality, but let's pick `all-MiniLM-L6-v2` for our tutorial, as it is 5x faster and still offers good results. Having all this, we can test our approach. We won't use the whole dataset at the moment, only a part of it. To measure the model's performance, we will use two metrics - [mean reciprocal rank](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) and [precision@1](https://en.wikipedia.org/wiki/Evaluation_measures_%28information_retrieval%29#Precision_at_k). We have a [ready script](https://github.com/qdrant/demo-cloud-faq/blob/experiments/faq/baseline.py) for this experiment, let's just launch it now.

| precision@1 | reciprocal\_rank |
| --- | --- |
| 0.564 | 0.663 |

That's already quite decent quality, but maybe we can do better?

## [Anchor](https://qdrant.tech/articles/faq-question-answering/\#improving-results-with-fine-tuning) Improving results with fine-tuning

Actually, we can! The model we used has good natural language understanding, but it has never seen our data. An approach called `fine-tuning` might be helpful to overcome this issue. With fine-tuning, you don't need to design a task-specific architecture; you take a model pre-trained on another task, apply a couple of layers on top, and train its parameters.
Sounds good, but as similarity learning is not as common as classification, it might be a bit inconvenient to fine-tune a model with traditional tools. For this reason we will use [Quaterion](https://github.com/qdrant/quaterion) \- a framework for fine-tuning similarity learning models. Let’s see how we can train models with it First, create our project and call it `faq`. > All project dependencies, utils scripts not covered in the tutorial can be found in the > [repository](https://github.com/qdrant/demo-cloud-faq/tree/tutorial). ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#configure-training) Configure training The main entity in Quaterion is [TrainableModel](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html). This class makes model’s building process fast and convenient. `TrainableModel` is a wrapper around [pytorch\_lightning.LightningModule](https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html). [Lightning](https://www.pytorchlightning.ai/) handles all the training process complexities, like training loop, device managing, etc. and saves user from a necessity to implement all this routine manually. Also Lightning’s modularity is worth to be mentioned. It improves separation of responsibilities, makes code more readable, robust and easy to write. All these features make Pytorch Lightning a perfect training backend for Quaterion. To use `TrainableModel` you need to inherit your model class from it. The same way you would use `LightningModule` in pure `pytorch_lightning`. Mandatory methods are `configure_loss`, `configure_encoders`, `configure_head`, `configure_optimizers`. The majority of mentioned methods are quite easy to implement, you’ll probably just need a couple of imports to do that. But `configure_encoders` requires some code:) Let’s create a `model.py` with model’s template and a placeholder for `configure_encoders` for the moment. ```python from typing import Union, Dict, Optional from torch.optim import Adam from quaterion import TrainableModel from quaterion.loss import MultipleNegativesRankingLoss, SimilarityLoss from quaterion_models.encoders import Encoder from quaterion_models.heads import EncoderHead from quaterion_models.heads.skip_connection_head import SkipConnectionHead class FAQModel(TrainableModel): def __init__(self, lr=10e-5, *args, **kwargs): self.lr = lr super().__init__(*args, **kwargs) def configure_optimizers(self): return Adam(self.model.parameters(), lr=self.lr) def configure_loss(self) -> SimilarityLoss: return MultipleNegativesRankingLoss(symmetric=True) def configure_encoders(self) -> Union[Encoder, Dict[str, Encoder]]: ... # ToDo def configure_head(self, input_embedding_size: int) -> EncoderHead: return SkipConnectionHead(input_embedding_size) ``` - `configure_optimizers` is a method provided by Lightning. An eagle-eye of you could notice mysterious `self.model`, it is actually a [SimilarityModel](https://quaterion-models.qdrant.tech/quaterion_models.model.html) instance. We will cover it later. - `configure_loss` is a loss function to be used during training. You can choose a ready-made implementation from Quaterion. However, since Quaterion’s purpose is not to cover all possible losses, or other entities and features of similarity learning, but to provide a convenient framework to build and use such models, there might not be a desired loss. 
In this case it is possible to use [PytorchMetricLearningWrapper](https://quaterion.qdrant.tech/quaterion.loss.extras.pytorch_metric_learning_wrapper.html) to bring required loss from [pytorch-metric-learning](https://kevinmusgrave.github.io/pytorch-metric-learning/) library, which has a rich collection of losses. You can also implement a custom loss yourself. - `configure_head` \- model built via Quaterion is a combination of encoders and a top layer - head. As with losses, some head implementations are provided. They can be found at [quaterion\_models.heads](https://quaterion-models.qdrant.tech/quaterion_models.heads.html). At our example we use [MultipleNegativesRankingLoss](https://quaterion.qdrant.tech/quaterion.loss.multiple_negatives_ranking_loss.html). This loss is especially good for training retrieval tasks. It assumes that we pass only positive pairs (similar objects) and considers all other objects as negative examples. `MultipleNegativesRankingLoss` use cosine to measure distance under the hood, but it is a configurable parameter. Quaterion provides implementation for other distances as well. You can find available ones at [quaterion.distances](https://quaterion.qdrant.tech/quaterion.distances.html). Now we can come back to `configure_encoders`:) ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#configure-encoder) Configure Encoder The encoder task is to convert objects into embeddings. They usually take advantage of some pre-trained models, in our case `all-MiniLM-L6-v2` from `sentence-transformers`. In order to use it in Quaterion, we need to create a wrapper inherited from the [Encoder](https://quaterion-models.qdrant.tech/quaterion_models.encoders.encoder.html) class. Let’s create our encoder in `encoder.py` ```python import os from torch import Tensor, nn from sentence_transformers.models import Transformer, Pooling from quaterion_models.encoders import Encoder from quaterion_models.types import TensorInterchange, CollateFnType class FAQEncoder(Encoder): def __init__(self, transformer, pooling): super().__init__() self.transformer = transformer self.pooling = pooling self.encoder = nn.Sequential(self.transformer, self.pooling) @property def trainable(self) -> bool: # Defines if we want to train encoder itself, or head layer only return False @property def embedding_size(self) -> int: return self.transformer.get_word_embedding_dimension() def forward(self, batch: TensorInterchange) -> Tensor: return self.encoder(batch)["sentence_embedding"] def get_collate_fn(self) -> CollateFnType: return self.transformer.tokenize @staticmethod def _transformer_path(path: str): return os.path.join(path, "transformer") @staticmethod def _pooling_path(path: str): return os.path.join(path, "pooling") def save(self, output_path: str): transformer_path = self._transformer_path(output_path) os.makedirs(transformer_path, exist_ok=True) pooling_path = self._pooling_path(output_path) os.makedirs(pooling_path, exist_ok=True) self.transformer.save(transformer_path) self.pooling.save(pooling_path) @classmethod def load(cls, input_path: str) -> Encoder: transformer = Transformer.load(cls._transformer_path(input_path)) pooling = Pooling.load(cls._pooling_path(input_path)) return cls(transformer=transformer, pooling=pooling) ``` As you can notice, there are more methods implemented, then we’ve already discussed. Let’s go through them now! 
- In `__init__` we register our pre-trained layers, similar as you do in [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) descendant. - `trainable` defines whether current `Encoder` layers should be updated during training or not. If `trainable=False`, then all layers will be frozen. - `embedding_size` is a size of encoder’s output, it is required for proper `head` configuration. - `get_collate_fn` is a tricky one. Here you should return a method which prepares a batch of raw data into the input, suitable for the encoder. If `get_collate_fn` is not overridden, then the [default\_collate](https://pytorch.org/docs/stable/data.html#torch.utils.data.default_collate) will be used. The remaining methods are considered self-describing. As our encoder is ready, we now are able to fill `configure_encoders`. Just insert the following code into `model.py`: ```python ... from sentence_transformers import SentenceTransformer from sentence_transformers.models import Transformer, Pooling from faq.encoder import FAQEncoder class FAQModel(TrainableModel): ... def configure_encoders(self) -> Union[Encoder, Dict[str, Encoder]]: pre_trained_model = SentenceTransformer("all-MiniLM-L6-v2") transformer: Transformer = pre_trained_model[0] pooling: Pooling = pre_trained_model[1] encoder = FAQEncoder(transformer, pooling) return encoder ``` ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#data-preparation) Data preparation Okay, we have raw data and a trainable model. But we don’t know yet how to feed this data to our model. Currently, Quaterion takes two types of similarity representation - pairs and groups. The groups format assumes that all objects split into groups of similar objects. All objects inside one group are similar, and all other objects outside this group considered dissimilar to them. But in the case of pairs, we can only assume similarity between explicitly specified pairs of objects. We can apply any of the approaches with our data, but pairs one seems more intuitive. The format in which Similarity is represented determines which loss can be used. For example, _ContrastiveLoss_ and _MultipleNegativesRankingLoss_ works with pairs format. [SimilarityPairSample](https://quaterion.qdrant.tech/quaterion.dataset.similarity_samples.html#quaterion.dataset.similarity_samples.SimilarityPairSample) could be used to represent pairs. Let’s take a look at it: ```python @dataclass class SimilarityPairSample: obj_a: Any obj_b: Any score: float = 1.0 subgroup: int = 0 ``` Here might be some questions: what `score` and `subgroup` are? Well, `score` is a measure of expected samples similarity. If you only need to specify if two samples are similar or not, you can use `1.0` and `0.0` respectively. `subgroups` parameter is required for more granular description of what negative examples could be. By default, all pairs belong the subgroup zero. That means that we would need to specify all negative examples manually. But in most cases, we can avoid this by enabling different subgroups. All objects from different subgroups will be considered as negative examples in loss, and thus it provides a way to set negative examples implicitly. 
With this knowledge, we now can create our `Dataset` class in `dataset.py` to feed our model: ```python import json from typing import List, Dict from torch.utils.data import Dataset from quaterion.dataset.similarity_samples import SimilarityPairSample class FAQDataset(Dataset): """Dataset class to process .jsonl files with FAQ from popular cloud providers.""" def __init__(self, dataset_path): self.dataset: List[Dict[str, str]] = self.read_dataset(dataset_path) def __getitem__(self, index) -> SimilarityPairSample: line = self.dataset[index] question = line["question"] # All questions have a unique subgroup # Meaning that all other answers are considered negative pairs subgroup = hash(question) return SimilarityPairSample( obj_a=question, obj_b=line["answer"], score=1, subgroup=subgroup ) def __len__(self): return len(self.dataset) @staticmethod def read_dataset(dataset_path) -> List[Dict[str, str]]: """Read jsonl-file into a memory.""" with open(dataset_path, "r") as fd: return [json.loads(json_line) for json_line in fd] ``` We assigned a unique subgroup for each question, so all other objects which have different question will be considered as negative examples. ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#evaluation-metric) Evaluation Metric We still haven’t added any metrics to the model. For this purpose Quaterion provides `configure_metrics`. We just need to override it and attach interested metrics. Quaterion has some popular retrieval metrics implemented - such as _precision @ k_ or _mean reciprocal rank_. They can be found in [quaterion.eval](https://quaterion.qdrant.tech/quaterion.eval.html) package. But there are just a few metrics, it is assumed that desirable ones will be made by user or taken from another libraries. You will probably need to inherit from `PairMetric` or `GroupMetric` to implement a new one. In `configure_metrics` we need to return a list of `AttachedMetric`. They are just wrappers around metric instances and helps to log metrics more easily. Under the hood `logging` is handled by `pytorch-lightning`. You can configure it as you want - pass required parameters as keyword arguments to `AttachedMetric`. For additional info visit [logging documentation page](https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html) Let’s add mentioned metrics for our `FAQModel`. Add this code to `model.py`: ```python ... from quaterion.eval.pair import RetrievalPrecision, RetrievalReciprocalRank from quaterion.eval.attached_metric import AttachedMetric class FAQModel(TrainableModel): def __init__(self, lr=10e-5, *args, **kwargs): self.lr = lr super().__init__(*args, **kwargs) ... def configure_metrics(self): return [\ AttachedMetric(\ "RetrievalPrecision",\ RetrievalPrecision(k=1),\ prog_bar=True,\ on_epoch=True,\ ),\ AttachedMetric(\ "RetrievalReciprocalRank",\ RetrievalReciprocalRank(),\ prog_bar=True,\ on_epoch=True\ ),\ ] ``` ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#fast-training-with-cache) Fast training with Cache Quaterion has one more cherry on top of the cake when it comes to non-trainable encoders. If encoders are frozen, they are deterministic and emit the exact embeddings for the same input data on each epoch. It provides a way to avoid repeated calculations and reduce training time. For this purpose Quaterion has a cache functionality. Before training starts, the cache runs one epoch to pre-calculate all embeddings with frozen encoders and then store them on a device you chose (currently CPU or GPU). 
Everything you need is to define which encoders are trainable or not and set cache settings. And that’s it: everything else Quaterion will handle for you. To configure cache you need to override `configure_cache` method in `TrainableModel`. This method should return an instance of [CacheConfig](https://quaterion.qdrant.tech/quaterion.train.cache.cache_config.html#quaterion.train.cache.cache_config.CacheConfig). Let’s add cache to our model: ```python ... from quaterion.train.cache import CacheConfig, CacheType ... class FAQModel(TrainableModel): ... def configure_caches(self) -> Optional[CacheConfig]: return CacheConfig(CacheType.AUTO) ... ``` [CacheType](https://quaterion.qdrant.tech/quaterion.train.cache.cache_config.html#quaterion.train.cache.cache_config.CacheType) determines how the cache will be stored in memory. ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#training) Training Now we need to combine all our code together in `train.py` and launch a training process. ```python import torch import pytorch_lightning as pl from quaterion import Quaterion from quaterion.dataset import PairsSimilarityDataLoader from faq.dataset import FAQDataset def train(model, train_dataset_path, val_dataset_path, params): use_gpu = params.get("cuda", torch.cuda.is_available()) trainer = pl.Trainer( min_epochs=params.get("min_epochs", 1), max_epochs=params.get("max_epochs", 500), auto_select_gpus=use_gpu, log_every_n_steps=params.get("log_every_n_steps", 1), gpus=int(use_gpu), ) train_dataset = FAQDataset(train_dataset_path) val_dataset = FAQDataset(val_dataset_path) train_dataloader = PairsSimilarityDataLoader( train_dataset, batch_size=1024 ) val_dataloader = PairsSimilarityDataLoader( val_dataset, batch_size=1024 ) Quaterion.fit(model, trainer, train_dataloader, val_dataloader) if __name__ == "__main__": import os from pytorch_lightning import seed_everything from faq.model import FAQModel from faq.config import DATA_DIR, ROOT_DIR seed_everything(42, workers=True) faq_model = FAQModel() train_path = os.path.join( DATA_DIR, "train_cloud_faq_dataset.jsonl" ) val_path = os.path.join( DATA_DIR, "val_cloud_faq_dataset.jsonl" ) train(faq_model, train_path, val_path, {}) faq_model.save_servable(os.path.join(ROOT_DIR, "servable")) ``` Here are a couple of unseen classes, `PairsSimilarityDataLoader`, which is a native dataloader for `SimilarityPairSample` objects, and `Quaterion` is an entry point to the training process. ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#dataset-wise-evaluation) Dataset-wise evaluation Up to this moment we’ve calculated only batch-wise metrics. Such metrics can fluctuate a lot depending on a batch size and can be misleading. It might be helpful if we can calculate a metric on a whole dataset or some large part of it. Raw data may consume a huge amount of memory, and usually we can’t fit it into one batch. Embeddings, on the contrary, most probably will consume less. That’s where `Evaluator` enters the scene. At first, having dataset of `SimilaritySample`, `Evaluator` encodes it via `SimilarityModel` and compute corresponding labels. After that, it calculates a metric value, which could be more representative than batch-wise ones. However, you still can find yourself in a situation where evaluation becomes too slow, or there is no enough space left in the memory. A bottleneck might be a squared distance matrix, which one needs to calculate to compute a retrieval metric. 
You can mitigate this bottleneck by calculating a rectangle matrix with reduced size. `Evaluator` accepts `sampler` with a sample size to select only specified amount of embeddings. If sample size is not specified, evaluation is performed on all embeddings. Fewer words! Let’s add evaluator to our code and finish `train.py`. ```python ... from quaterion.eval.evaluator import Evaluator from quaterion.eval.pair import RetrievalReciprocalRank, RetrievalPrecision from quaterion.eval.samplers.pair_sampler import PairSampler ... def train(model, train_dataset_path, val_dataset_path, params): ... metrics = { "rrk": RetrievalReciprocalRank(), "rp@1": RetrievalPrecision(k=1) } sampler = PairSampler() evaluator = Evaluator(metrics, sampler) results = Quaterion.evaluate(evaluator, val_dataset, model.model) print(f"results: {results}") ``` ### [Anchor](https://qdrant.tech/articles/faq-question-answering/\#train-results) Train Results At this point we can train our model, I do it via `python3 -m faq.train`. | epoch | train\_precision@1 | train\_reciprocal\_rank | val\_precision@1 | val\_reciprocal\_rank | | --- | --- | --- | --- | --- | | 0 | 0.650 | 0.732 | 0.659 | 0.741 | | 100 | 0.665 | 0.746 | 0.673 | 0.754 | | 200 | 0.677 | 0.757 | 0.682 | 0.763 | | 300 | 0.686 | 0.765 | 0.688 | 0.768 | | 400 | 0.695 | 0.772 | 0.694 | 0.773 | | 500 | 0.701 | 0.778 | 0.700 | 0.777 | Results obtained with `Evaluator`: | precision@1 | reciprocal\_rank | | --- | --- | | 0.577 | 0.675 | After training all the metrics have been increased. And this training was done in just 3 minutes on a single gpu! There is no overfitting and the results are steadily growing, although I think there is still room for improvement and experimentation. ## [Anchor](https://qdrant.tech/articles/faq-question-answering/\#model-serving) Model serving As you could already notice, Quaterion framework is split into two separate libraries: `quaterion` and [quaterion-models](https://quaterion-models.qdrant.tech/). The former one contains training related stuff like losses, cache, `pytorch-lightning` dependency, etc. While the latter one contains only modules necessary for serving: encoders, heads and `SimilarityModel` itself. The reasons for this separation are: - less amount of entities you need to operate in a production environment - reduced memory footprint It is essential to isolate training dependencies from the serving environment cause the training step is usually more complicated. Training dependencies are quickly going out of control, significantly slowing down the deployment and serving timings and increasing unnecessary resource usage. The very last row of `train.py` \- `faq_model.save_servable(...)` saves encoders and the model in a fashion that eliminates all Quaterion dependencies and stores only the most necessary data to run a model in production. 
In `serve.py` we load and encode all the answers and then look for the closest vectors to the questions we are interested in: ```python import os import json import torch from quaterion_models.model import SimilarityModel from quaterion.distances import Distance from faq.config import DATA_DIR, ROOT_DIR if __name__ == "__main__": device = "cuda:0" if torch.cuda.is_available() else "cpu" model = SimilarityModel.load(os.path.join(ROOT_DIR, "servable")) model.to(device) dataset_path = os.path.join(DATA_DIR, "val_cloud_faq_dataset.jsonl") with open(dataset_path) as fd: answers = [json.loads(json_line)["answer"] for json_line in fd] # everything is ready, let's encode our answers answer_embeddings = model.encode(answers, to_numpy=False) # Some prepared questions and answers to ensure that our model works as intended questions = [\ "what is the pricing of aws lambda functions powered by aws graviton2 processors?",\ "can i run a cluster or job for a long time?",\ "what is the dell open manage system administrator suite (omsa)?",\ "what are the differences between the event streams standard and event streams enterprise plans?",\ ] ground_truth_answers = [\ "aws lambda functions powered by aws graviton2 processors are 20% cheaper compared to x86-based lambda functions",\ "yes, you can run a cluster for as long as is required",\ "omsa enables you to perform certain hardware configuration tasks and to monitor the hardware directly via the operating system",\ "to find out more information about the different event streams plans, see choosing your plan",\ ] # encode our questions and find the closest to them answer embeddings question_embeddings = model.encode(questions, to_numpy=False) distance = Distance.get_by_name(Distance.COSINE) question_answers_distances = distance.distance_matrix( question_embeddings, answer_embeddings ) answers_indices = question_answers_distances.min(dim=1)[1] for q_ind, a_ind in enumerate(answers_indices): print("Q:", questions[q_ind]) print("A:", answers[a_ind], end="\n\n") assert ( answers[a_ind] == ground_truth_answers[q_ind] ), f"<{answers[a_ind]}> != <{ground_truth_answers[q_ind]}>" ``` We stored our collection of answer embeddings in memory and perform search directly in Python. For production purposes, it’s better to use some sort of vector search engine like [Qdrant](https://github.com/qdrant/qdrant). It provides durability, speed boost, and a bunch of other features. So far, we’ve implemented a whole training process, prepared model for serving and even applied a trained model today with `Quaterion`. Thank you for your time and attention! I hope you enjoyed this huge tutorial and will use `Quaterion` for your similarity learning projects. All ready to use code can be found [here](https://github.com/qdrant/demo-cloud-faq/tree/tutorial). Stay tuned!:) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/faq-question-answering.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/faq-question-answering.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-74-lllmstxt|> ## vector-search - [Documentation](https://qdrant.tech/documentation/) - [Overview](https://qdrant.tech/documentation/overview/) - Understanding Vector Search in Qdrant --- # [Anchor](https://qdrant.tech/documentation/overview/vector-search/\#how-does-vector-search-work-in-qdrant) How Does Vector Search Work in Qdrant? If you are still trying to figure out how vector search works, please read ahead. This document describes how vector search is used, covers Qdrant’s place in the larger ecosystem, and outlines how you can use Qdrant to augment your existing projects. For those who want to start writing code right away, visit our [Complete Beginners tutorial](https://qdrant.tech/documentation/tutorials/search-beginners/) to build a search engine in 5-15 minutes. ## [Anchor](https://qdrant.tech/documentation/overview/vector-search/\#a-brief-history-of-search) A Brief History of Search Human memory is unreliable. Thus, as long as we have been trying to collect ‘knowledge’ in written form, we had to figure out how to search for relevant content without rereading the same books repeatedly. That’s why some brilliant minds introduced the inverted index. In the simplest form, it’s an appendix to a book, typically put at its end, with a list of the essential terms-and links to pages they occur at. Terms are put in alphabetical order. Back in the day, that was a manually crafted list requiring lots of effort to prepare. Once digitalization started, it became a lot easier, but still, we kept the same general principles. That worked, and still, it does. If you are looking for a specific topic in a particular book, you can try to find a related phrase and quickly get to the correct page. Of course, assuming you know the proper term. If you don’t, you must try and fail several times or find somebody else to help you form the correct query. ![A simplified version of the inverted index.](https://qdrant.tech/docs/gettingstarted/inverted-index.png) A simplified version of the inverted index. Time passed, and we haven’t had much change in that area for quite a long time. But our textual data collection started to grow at a greater pace. So we also started building up many processes around those inverted indexes. For example, we allowed our users to provide many words and started splitting them into pieces. That allowed finding some documents which do not necessarily contain all the query words, but possibly part of them. We also started converting words into their root forms to cover more cases, removing stopwords, etc. Effectively we were becoming more and more user-friendly. Still, the idea behind the whole process is derived from the most straightforward keyword-based search known since the Middle Ages, with some tweaks. ![The process of tokenization with an additional stopwords removal and converstion to root form of a word.](https://qdrant.tech/docs/gettingstarted/tokenization.png) The process of tokenization with an additional stopwords removal and converstion to root form of a word. Technically speaking, we encode the documents and queries into so-called sparse vectors where each position has a corresponding word from the whole dictionary. If the input text contains a specific word, it gets a non-zero value at that position. 
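As a toy illustration of that encoding (the vocabulary and the sentence are invented), a sparse vector is just a mostly-zero array indexed by dictionary position:

```python
# A minimal sketch of a sparse bag-of-words encoding (illustrative only).
vocabulary = ["backup", "cloud", "restore", "snapshot", "vector"]  # the whole "dictionary"
text = "restore the snapshot from a snapshot backup"

sparse_vector = [text.split().count(word) for word in vocabulary]
print(sparse_vector)  # [1, 0, 1, 2, 0] - most positions stay zero
```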
But in reality, none of the texts will contain more than hundreds of different words. So the majority of vectors will have thousands of zeros and a few non-zero values. That’s why we call them sparse. And they might be already used to calculate some word-based similarity by finding the documents which have the biggest overlap. ![An example of a query vectorized to sparse format.](https://qdrant.tech/docs/gettingstarted/query.png) An example of a query vectorized to sparse format. Sparse vectors have relatively **high dimensionality**; equal to the size of the dictionary. And the dictionary is obtained automatically from the input data. So if we have a vector, we are able to partially reconstruct the words used in the text that created that vector. ## [Anchor](https://qdrant.tech/documentation/overview/vector-search/\#the-tower-of-babel) The Tower of Babel Every once in a while, when we discover new problems with inverted indexes, we come up with a new heuristic to tackle it, at least to some extent. Once we realized that people might describe the same concept with different words, we started building lists of synonyms to convert the query to a normalized form. But that won’t work for the cases we didn’t foresee. Still, we need to craft and maintain our dictionaries manually, so they can support the language that changes over time. Another difficult issue comes to light with multilingual scenarios. Old methods require setting up separate pipelines and keeping humans in the loop to maintain the quality. ![The Tower of Babel, Pieter Bruegel.](https://qdrant.tech/docs/gettingstarted/babel.jpg) The Tower of Babel, Pieter Bruegel. ## [Anchor](https://qdrant.tech/documentation/overview/vector-search/\#the-representation-revolution) The Representation Revolution The latest research in Machine Learning for NLP is heavily focused on training Deep Language Models. In this process, the neural network takes a large corpus of text as input and creates a mathematical representation of the words in the form of vectors. These vectors are created in such a way that words with similar meanings and occurring in similar contexts are grouped together and represented by similar vectors. And we can also take, for example, an average of all the word vectors to create the vector for a whole text (e.g query, sentence, or paragraph). ![deep neural](https://qdrant.tech/docs/gettingstarted/deep-neural.png) We can take those **dense vectors** produced by the network and use them as a **different data representation**. They are dense because neural networks will rarely produce zeros at any position. In contrary to sparse ones, they have a relatively low dimensionality — hundreds or a few thousand only. Unfortunately, if we want to have a look and understand the content of the document by looking at the vector it’s no longer possible. Dimensions are no longer representing the presence of specific words. Dense vectors can capture the meaning, not the words used in a text. That being said, **Large Language Models can automatically handle synonyms**. Moreso, since those neural networks might have been trained with multilingual corpora, they translate the same sentence, written in different languages, to similar vector representations, also called **embeddings**. And we can compare them to find similar pieces of text by calculating the distance to other vectors in our database. 
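Here is a minimal sketch of what that comparison looks like with the Qdrant Python client; the collection name, the tiny 4-dimensional vectors, and the payloads are made up, and real embeddings would come from a neural encoder:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Tiny 4-dimensional "embeddings" just to show the flow; real ones have hundreds of dimensions.
client.create_collection(
    collection_name="texts",
    vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
)
client.upsert(
    collection_name="texts",
    points=[
        models.PointStruct(id=1, vector=[0.1, 0.9, 0.2, 0.4], payload={"text": "how to restore a backup"}),
        models.PointStruct(id=2, vector=[0.8, 0.1, 0.7, 0.3], payload={"text": "pricing of the managed cloud"}),
    ],
)

# The stored vector closest to the query embedding belongs to the most similar text.
hits = client.query_points(collection_name="texts", query=[0.2, 0.8, 0.3, 0.5], limit=1)
print(hits.points[0].payload)
```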
![Input queries contain different words, but they are still converted into similar vector representations, because the neural encoder can capture the meaning of the sentences. That feature can capture synonyms but also different languages..](https://qdrant.tech/docs/gettingstarted/input.png) Input queries contain different words, but they are still converted into similar vector representations, because the neural encoder can capture the meaning of the sentences. That feature can capture synonyms but also different languages.. **Vector search** is a process of finding similar objects based on their embeddings similarity. The good thing is, you don’t have to design and train your neural network on your own. Many pre-trained models are available, either on **HuggingFace** or by using libraries like [SentenceTransformers](https://www.sbert.net/?ref=hackernoon.com). If you, however, prefer not to get your hands dirty with neural models, you can also create the embeddings with SaaS tools, like [co.embed API](https://docs.cohere.com/reference/embed?ref=hackernoon.com). ## [Anchor](https://qdrant.tech/documentation/overview/vector-search/\#why-qdrant) Why Qdrant? The challenge with vector search arises when we need to find similar documents in a big set of objects. If we want to find the closest examples, the naive approach would require calculating the distance to every document. That might work with dozens or even hundreds of examples but may become a bottleneck if we have more than that. When we work with relational data, we set up database indexes to speed things up and avoid full table scans. And the same is true for vector search. Qdrant is a fully-fledged vector database that speeds up the search process by using a graph-like structure to find the closest objects in sublinear time. So you don’t calculate the distance to every object from the database, but some candidates only. ![Vector search with Qdrant. Thanks to HNSW graph we are able to compare the distance to some of the objects from the database, not to all of them.](https://qdrant.tech/docs/gettingstarted/vector-search.png) Vector search with Qdrant. Thanks to HNSW graph we are able to compare the distance to some of the objects from the database, not to all of them. While doing a semantic search at scale, because this is what we sometimes call the vector search done on texts, we need a specialized tool to do it effectively — a tool like Qdrant. ## [Anchor](https://qdrant.tech/documentation/overview/vector-search/\#next-steps) Next Steps Vector search is an exciting alternative to sparse methods. It solves the issues we had with the keyword-based search without needing to maintain lots of heuristics manually. It requires an additional component, a neural encoder, to convert text into vectors. [**Tutorial 1 - Qdrant for Complete Beginners**](https://qdrant.tech/documentation/tutorials/search-beginners/) Despite its complicated background, vectors search is extraordinarily simple to set up. With Qdrant, you can have a search engine up-and-running in five minutes. Our [Complete Beginners tutorial](https://qdrant.tech/documentation/tutorials/search-beginners/) will show you how. [**Tutorial 2 - Question and Answer System**](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/) However, you can also choose SaaS tools to generate them and avoid building your model. 
Setting up a vector search project with Qdrant Cloud and Cohere co.embed API is fairly easy if you follow the [Question and Answer system tutorial](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/). There is another exciting thing about vector search. You can search for any kind of data as long as there is a neural network that would vectorize your data type. Do you think about a reverse image search? That’s also possible with vector embeddings. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/overview/vector-search.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/overview/vector-search.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-75-lllmstxt|> ## benchmarks --- # [Anchor](https://qdrant.tech/benchmarks/\#benchmarking-vector-databases) Benchmarking Vector Databases At Qdrant, performance is the top-most priority. We always make sure that we use system resources efficiently so you get the **fastest and most accurate results at the cheapest cloud costs**. So all of our decisions from [choosing Rust](https://qdrant.tech/articles/why-rust/), [io optimisations](https://qdrant.tech/articles/io_uring/), [serverless support](https://qdrant.tech/articles/serverless/), [binary quantization](https://qdrant.tech/articles/binary-quantization/), to our [fastembed library](https://qdrant.tech/articles/fastembed/) are all based on our principle. In this article, we will compare how Qdrant performs against the other vector search engines. Here are the principles we followed while designing these benchmarks: - We do comparative benchmarks, which means we focus on **relative numbers** rather than absolute numbers. - We use affordable hardware, so that you can reproduce the results easily. - We run benchmarks on the same exact machines to avoid any possible hardware bias. - All the benchmarks are [open-sourced](https://github.com/qdrant/vector-db-benchmark), so you can contribute and improve them. Scenarios we tested 1. Upload & Search benchmark on single node [Benchmark](https://qdrant.tech/benchmarks/single-node-speed-benchmark/) 2. Filtered search benchmark - [Benchmark](https://qdrant.tech/benchmarks/#filtered-search-benchmark) 3. Memory consumption benchmark - Coming soon 4. Cluster mode benchmark - Coming soon Some of our experiment design decisions are described in the [F.A.Q Section](https://qdrant.tech/benchmarks/#benchmarks-faq). Reach out to us on our [Discord channel](https://qdrant.to/discord) if you want to discuss anything related Qdrant or these benchmarks. ## [Anchor](https://qdrant.tech/benchmarks/\#single-node-benchmarks) Single node benchmarks We benchmarked several vector databases using various configurations of them on different datasets to check how the results may vary. Those datasets may have different vector dimensionality but also vary in terms of the distance function being used. 
We also tried to capture the difference we can expect while using some different configuration parameters, for both the engine itself and the search operation separately. **Updated: January/June 2024** Dataset:dbpedia-openai-1M-1536-angulardeep-image-96-angulargist-960-euclideanglove-100-angular Search threads:1001 Plot values: RPS Latency p95 latency Index time | Engine | Setup | Dataset | Upload Time(m) | Upload + Index Time(m) | Latency(ms) | P95(ms) | P99(ms) | RPS | Precision | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | qdrant | qdrant-sq-rps-m-64-ef-512 | dbpedia-openai-1M-1536-angular | 3.51 | 24.43 | 3.54 | 4.95 | 8.62 | 1238.0016 | 0.99 | | weaviate | latest-weaviate-m32 | dbpedia-openai-1M-1536-angular | 13.94 | 13.94 | 4.99 | 7.16 | 11.33 | 1142.13 | 0.97 | | elasticsearch | elasticsearch-m-32-ef-128 | dbpedia-openai-1M-1536-angular | 19.18 | 83.72 | 22.10 | 72.53 | 135.68 | 716.80 | 0.98 | | redis | redis-m-32-ef-256 | dbpedia-openai-1M-1536-angular | 92.49 | 92.49 | 140.65 | 160.85 | 167.35 | 625.27 | 0.97 | | milvus | milvus-m-16-ef-128 | dbpedia-openai-1M-1536-angular | 0.27 | 1.16 | 393.31 | 441.32 | 576.65 | 219.11 | 0.99 | _Download raw data: [here](https://qdrant.tech/benchmarks/results-1-100-thread-2024-06-15.json)_ ## [Anchor](https://qdrant.tech/benchmarks/\#observations) Observations Most of the engines have improved since [our last run](https://qdrant.tech/benchmarks/single-node-speed-benchmark-2022/). Both life and software have trade-offs but some clearly do better: - **`Qdrant` achives highest RPS and lowest latencies in almost all the scenarios, no matter the precision threshold and the metric we choose.** It has also shown 4x RPS gains on one of the datasets. - `Elasticsearch` has become considerably fast for many cases but it’s very slow in terms of indexing time. It can be 10x slower when storing 10M+ vectors of 96 dimensions! (32mins vs 5.5 hrs) - `Milvus` is the fastest when it comes to indexing time and maintains good precision. However, it’s not on-par with others when it comes to RPS or latency when you have higher dimension embeddings or more number of vectors. - `Redis` is able to achieve good RPS but mostly for lower precision. It also achieved low latency with single thread, however its latency goes up quickly with more parallel requests. Part of this speed gain comes from their custom protocol. - `Weaviate` has improved the least since our last run. ## [Anchor](https://qdrant.tech/benchmarks/\#how-to-read-the-results) How to read the results - Choose the dataset and the metric you want to check. - Select a precision threshold that would be satisfactory for your usecase. This is important because ANN search is all about trading precision for speed. This means in any vector search benchmark, **two results must be compared only when you have similar precision**. However most benchmarks miss this critical aspect. - The table is sorted by the value of the selected metric (RPS / Latency / p95 latency / Index time), and the first entry is always the winner of the category 🏆 ### [Anchor](https://qdrant.tech/benchmarks/\#latency-vs-rps) Latency vs RPS In our benchmark we test two main search usage scenarios that arise in practice. - **Requests-per-Second (RPS)**: Serve more requests per second in exchange of individual requests taking longer (i.e. higher latency). This is a typical scenario for a web application, where multiple users are searching at the same time. 
To simulate this scenario, we run client requests in parallel with multiple threads and measure how many requests the engine can handle per second. - **Latency**: React quickly to individual requests rather than serving more requests in parallel. This is a typical scenario for applications where server response time is critical. Self-driving cars, manufacturing robots, and other real-time systems are good examples of such applications. To simulate this scenario, we run client in a single thread and measure how long each request takes. ### [Anchor](https://qdrant.tech/benchmarks/\#tested-datasets) Tested datasets Our [benchmark tool](https://github.com/qdrant/vector-db-benchmark) is inspired by [github.com/erikbern/ann-benchmarks](https://github.com/erikbern/ann-benchmarks/). We used the following datasets to test the performance of the engines on ANN Search tasks: | Datasets | \# Vectors | Dimensions | Distance | | --- | --- | --- | --- | | [dbpedia-openai-1M-angular](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) | 1M | 1536 | cosine | | [deep-image-96-angular](http://sites.skoltech.ru/compvision/noimi/) | 10M | 96 | cosine | | [gist-960-euclidean](http://corpus-texmex.irisa.fr/) | 1M | 960 | euclidean | | [glove-100-angular](https://nlp.stanford.edu/projects/glove/) | 1.2M | 100 | cosine | ### [Anchor](https://qdrant.tech/benchmarks/\#setup) Setup ![Benchmarks configuration](https://qdrant.tech/benchmarks/client-server.png) Benchmarks configuration - This was our setup for this experiment: - Client: 8 vcpus, 16 GiB memory, 64GiB storage ( `Standard D8ls v5` on Azure Cloud) - Server: 8 vcpus, 32 GiB memory, 64GiB storage ( `Standard D8s v3` on Azure Cloud) - The Python client uploads data to the server, waits for all required indexes to be constructed, and then performs searches with configured number of threads. We repeat this process with different configurations for each engine, and then select the best one for a given precision. - We ran all the engines in docker and limited their memory to 25GB. This was used to ensure fairness by avoiding the case of some engine configs being too greedy with RAM usage. This 25 GB limit is completely fair because even to serve the largest `dbpedia-openai-1M-1536-angular` dataset, one hardly needs `1M * 1536 * 4bytes * 1.5 = 8.6GB` of RAM (including vectors + index). Hence, we decided to provide all the engines with ~3x the requirement. Please note that some of the configs of some engines crashed on some datasets because of the 25 GB memory limit. That’s why you might see fewer points for some engines on choosing higher precision thresholds. --- # [Anchor](https://qdrant.tech/benchmarks/\#filtered-search-benchmark) Filtered search benchmark Applying filters to search results brings a whole new level of complexity. It is no longer enough to apply one algorithm to plain data. With filtering, it becomes a matter of the _cross-integration_ of the different indices. To measure how well different search engines perform in this scenario, we have prepared a set of **Filtered ANN Benchmark Datasets** - [https://github.com/qdrant/ann-filtering-benchmark-datasets](https://github.com/qdrant/ann-filtering-benchmark-datasets) It is similar to the ones used in the [ann-benchmarks project](https://github.com/erikbern/ann-benchmarks/) but enriched with payload metadata and pre-generated filtering requests. It includes synthetic and real-world datasets with various filters, from keywords to geo-spatial queries. 
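For readers who have not seen one before, a filtered vector query in Qdrant looks roughly like the sketch below. The collection name, payload field, and query vector are placeholders; the point is that the payload condition and the vector search are evaluated together in a single request.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Nearest-neighbour search restricted to points whose payload matches the filter
hits = client.query_points(
    collection_name="benchmark",        # placeholder collection
    query=[0.05, 0.61, 0.76, 0.74],     # placeholder query vector
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="category",                       # placeholder keyword payload field
                match=models.MatchValue(value="laptops"),
            )
        ]
    ),
    limit=10,
).points
```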
### [Anchor](https://qdrant.tech/benchmarks/\#why-filtering-is-not-trivial) Why filtering is not trivial? Not many ANN algorithms are compatible with filtering. HNSW is one of the few of them, but search engines approach its integration in different ways: - Some use **post-filtering**, which applies filters after ANN search. It doesn’t scale well as it either loses results or requires many candidates on the first stage. - Others use **pre-filtering**, which requires a binary mask of the whole dataset to be passed into the ANN algorithm. It is also not scalable, as the mask size grows linearly with the dataset size. On top of it, there is also a problem with search accuracy. It appears if too many vectors are filtered out, so the HNSW graph becomes disconnected. Qdrant uses a different approach, not requiring pre- or post-filtering while addressing the accuracy problem. Read more about the Qdrant approach in our [Filtrable HNSW](https://qdrant.tech/articles/filtrable-hnsw/) article. ## [Anchor](https://qdrant.tech/benchmarks/\#) **Updated: Feb 2023** Dataset:keyword-100range-100int-2048100-kw-small-vocabkeyword-2048geo-radius-100range-2048geo-radius-2048int-100h-and-m-2048arxiv-titles-384 Plot values: Regular search Filter search _Download raw data: [here](https://qdrant.tech/benchmarks/filter-result-2023-02-03.json)_ ## [Anchor](https://qdrant.tech/benchmarks/\#filtered-results) Filtered Results As you can see from the charts, there are three main patterns: - **Speed boost** \- for some engines/queries, the filtered search is faster than the unfiltered one. It might happen if the filter is restrictive enough, to completely avoid the usage of the vector index. - **Speed downturn** \- some engines struggle to keep high RPS, it might be related to the requirement of building a filtering mask for the dataset, as described above. - **Accuracy collapse** \- some engines are loosing accuracy dramatically under some filters. It is related to the fact that the HNSW graph becomes disconnected, and the search becomes unreliable. Qdrant avoids all these problems and also benefits from the speed boost, as it implements an advanced [query planning strategy](https://qdrant.tech/documentation/search/#query-planning). --- # [Anchor](https://qdrant.tech/benchmarks/\#benchmarks-faq) Benchmarks F.A.Q. ## [Anchor](https://qdrant.tech/benchmarks/\#are-we-biased) Are we biased? Probably, yes. Even if we try to be objective, we are not experts in using all the existing vector databases. We build Qdrant and know the most about it. Due to that, we could have missed some important tweaks in different vector search engines. However, we tried our best, kept scrolling the docs up and down, experimented with combinations of different configurations, and gave all of them an equal chance to stand out. If you believe you can do it better than us, our **benchmarks are fully [open-sourced](https://github.com/qdrant/vector-db-benchmark), and contributions are welcome**! ## [Anchor](https://qdrant.tech/benchmarks/\#what-do-we-measure) What do we measure? There are several factors considered while deciding on which database to use. Of course, some of them support a different subset of functionalities, and those might be a key factor to make the decision. But in general, we all care about the search precision, speed, and resources required to achieve it. There is one important thing - **the speed of the vector databases should to be compared only if they achieve the same precision**. 
Otherwise, they could maximize the speed factors by providing inaccurate results, which everybody would rather avoid. Thus, our benchmark results are compared only at a specific search precision threshold. ## [Anchor](https://qdrant.tech/benchmarks/\#how-we-select-hardware) How we select hardware? In our experiments, we are not focusing on the absolute values of the metrics but rather on a relative comparison of different engines. What is important is the fact we used the same machine for all the tests. It was just wiped off between launching different engines. We selected an average machine, which you can easily rent from almost any cloud provider. No extra quota or custom configuration is required. ## [Anchor](https://qdrant.tech/benchmarks/\#why-you-are-not-comparing-with-faiss-or-annoy) Why you are not comparing with FAISS or Annoy? Libraries like FAISS provide a great tool to do experiments with vector search. But they are far away from real usage in production environments. If you are using FAISS in production, in the best case, you never need to update it in real-time. In the worst case, you have to create your custom wrapper around it to support CRUD, high availability, horizontal scalability, concurrent access, and so on. Some vector search engines even use FAISS under the hood, but a search engine is much more than just an indexing algorithm. We do, however, use the same benchmark datasets as the famous [ann-benchmarks project](https://github.com/erikbern/ann-benchmarks), so you can align your expectations for any practical reasons. ### [Anchor](https://qdrant.tech/benchmarks/\#why-we-decided-to-test-with-the-python-client) Why we decided to test with the Python client There is no consensus when it comes to the best technology to run benchmarks. You’re free to choose Go, Java or Rust-based systems. But there are two main reasons for us to use Python for this: 1. While generating embeddings you’re most likely going to use Python and python based ML frameworks. 2. Based on GitHub stars, python clients are one of the most popular clients across all the engines. From the user’s perspective, the crucial thing is the latency perceived while using a specific library - in most cases a Python client. Nobody can and even should redefine the whole technology stack, just because of using a specific search tool. That’s why we decided to focus primarily on official Python libraries, provided by the database authors. Those may use some different protocols under the hood, but at the end of the day, we do not care how the data is transferred, as long as it ends up in the target location. ## [Anchor](https://qdrant.tech/benchmarks/\#what-about-closed-source-saas-platforms) What about closed-source SaaS platforms? There are some vector databases available as SaaS only so that we couldn’t test them on the same machine as the rest of the systems. That makes the comparison unfair. That’s why we purely focused on testing the Open Source vector databases, so everybody may reproduce the benchmarks easily. This is not the final list, and we’ll continue benchmarking as many different engines as possible. ## [Anchor](https://qdrant.tech/benchmarks/\#how-to-reproduce-the-benchmark) How to reproduce the benchmark? The source code is available on [Github](https://github.com/qdrant/vector-db-benchmark) and has a `README.md` file describing the process of running the benchmark for a specific engine. ## [Anchor](https://qdrant.tech/benchmarks/\#how-to-contribute) How to contribute? 
We made the benchmark Open Source because we believe that it has to be transparent. We could have misconfigured one of the engines or just done it inefficiently. If you feel like you could help us out, check out our [benchmark repository](https://github.com/qdrant/vector-db-benchmark). Up! <|page-76-lllmstxt|> ## fastembed-quickstart - [Documentation](https://qdrant.tech/documentation/) - [Fastembed](https://qdrant.tech/documentation/fastembed/) - Quickstart --- # [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/\#how-to-generate-text-embedings-with-fastembed) How to Generate Text Embedings with FastEmbed ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/\#install-fastembed) Install FastEmbed ```python pip install fastembed ``` Just for demo purposes, you will use Lists and NumPy to work with sample data. ```python from typing import List import numpy as np ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/\#load-default-model) Load default model In this example, you will use the default text embedding model, `BAAI/bge-small-en-v1.5`. ```python from fastembed import TextEmbedding ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/\#add-sample-data) Add sample data Now, add two sample documents. Your documents must be in a list, and each document must be a string ```python documents: List[str] = [\ "FastEmbed is lighter than Transformers & Sentence-Transformers.",\ "FastEmbed is supported by and maintained by Qdrant.",\ ] ``` Download and initialize the model. Print a message to verify the process. ```python embedding_model = TextEmbedding() print("The model BAAI/bge-small-en-v1.5 is ready to use.") ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/\#embed-data) Embed data Generate embeddings for both documents. ```python embeddings_generator = embedding_model.embed(documents) embeddings_list = list(embeddings_generator) len(embeddings_list[0]) ``` Here is the sample document list. The default model creates vectors with 384 dimensions. ```bash Document: This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc. Vector of type: with shape: (384,) Document: fastembed is supported by and maintained by Qdrant. Vector of type: with shape: (384,) ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-quickstart/\#visualize-embeddings) Visualize embeddings ```python print("Embeddings:\n", embeddings_list) ``` The embeddings don’t look too interesting, but here is a visual. ```bash Embeddings: [[-0.11154681 0.00976555 0.00524559 0.01951888 -0.01934952 0.02943449\ -0.10519084 -0.00890122 0.01831438 0.01486796 -0.05642502 0.02561352\ -0.00120165 0.00637456 0.02633459 0.0089221 0.05313658 0.03955453\ -0.04400245 -0.02929407 0.04691846 -0.02515868 0.00778646 -0.05410657\ ...\ -0.00243012 -0.01820582 0.02938612 0.02108984 -0.02178085 0.02971899\ -0.00790564 0.03561783 0.0652488 -0.04371546 -0.05550042 0.02651665\ -0.01116153 -0.01682246 -0.05976734 -0.03143916 0.06522726 0.01801389\ -0.02611006 0.01627177 -0.0368538 0.03968835 0.027597 0.03305927]] ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
<|page-77-lllmstxt|> ## collaborative-filtering - [Documentation](https://qdrant.tech/documentation/) - [Advanced tutorials](https://qdrant.tech/documentation/advanced-tutorials/) - Build a Recommendation System with Collaborative Filtering --- # [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#use-collaborative-filtering-to-build-a-movie-recommendation-system-with-qdrant) Use Collaborative Filtering to Build a Movie Recommendation System with Qdrant | Time: 45 min | Level: Intermediate | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/qdrant/examples/blob/master/collaborative-filtering/collaborative-filtering.ipynb) | | | --- | --- | --- | --- | Every time Spotify recommends the next song from a band you’ve never heard of, it uses a recommendation algorithm based on other users’ interactions with that song. This type of algorithm is known as **collaborative filtering**. Unlike content-based recommendations, collaborative filtering excels when the objects’ semantics are loosely or unrelated to users’ preferences. This adaptability is what makes it so fascinating. Movie, music, or book recommendations are good examples of such use cases. After all, we rarely choose which book to read purely based on the plot twists. The traditional way to build a collaborative filtering engine involves training a model that converts the sparse matrix of user-to-item relations into a compressed, dense representation of user and item vectors. Some of the most commonly referenced algorithms for this purpose include [SVD (Singular Value Decomposition)](https://en.wikipedia.org/wiki/Singular_value_decomposition) and [Factorization Machines](https://en.wikipedia.org/wiki/Matrix_factorization_%28recommender_systems%29). However, the model training approach requires significant resource investments. Model training necessitates data, regular re-training, and a mature infrastructure. ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#methodology) Methodology Fortunately, there is a way to build collaborative filtering systems without any model training. You can obtain interpretable recommendations and have a scalable system using a technique based on similarity search. Let’s explore how this works with an example of building a movie recommendation system. Video: [Recommendation system with Qdrant and sparse vectors (Collaborative Filtering)](https://www.youtube.com/watch?v=9B7RrmQCQeQ)
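Before looking at the implementation, the core idea fits in a few lines of plain Python (toy numbers, purely illustrative): every user becomes a sparse vector of their ratings, and similar users are those whose rating vectors overlap.

```python
# Each user is a sparse vector: {movie_id: normalized rating}; unrated movies are simply absent
alice = {1: 1.0, 50: 0.8, 296: -0.5}
bob = {1: 0.9, 296: -0.4, 318: 1.0}

# Similarity is a dot product over the movies both users have rated
similarity = sum(rating * bob[movie] for movie, rating in alice.items() if movie in bob)
print(similarity)  # users who rate the same movies the same way score higher
```

The implementation below does exactly this, but lets Qdrant store the sparse vectors and run the similarity search at scale.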
## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#implementation) Implementation To implement this, you will use a simple yet powerful resource: [Qdrant with Sparse Vectors](https://qdrant.tech/articles/sparse-vectors/). Notebook: [You can try this code here](https://githubtocolab.com/qdrant/examples/blob/master/collaborative-filtering/collaborative-filtering.ipynb) ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#setup) Setup You have to first import the necessary libraries and define the environment. ```python import os import pandas as pd import requests from qdrant_client import QdrantClient, models from qdrant_client.models import PointStruct, SparseVector, NamedSparseVector from collections import defaultdict # OMDB API Key - for movie posters omdb_api_key = os.getenv("OMDB_API_KEY") # Set Qdrant Client qdrant_client = QdrantClient( os.getenv("QDRANT_HOST"), api_key=os.getenv("QDRANT_API_KEY") ) ``` ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#define-output) Define output Here, you will configure the recommendation engine to retrieve movie posters as output. ```python # Function to get movie poster using OMDB API def get_movie_poster(imdb_id, api_key): url = f"https://www.omdbapi.com/?i={imdb_id}&apikey={api_key}" data = requests.get(url).json() return data.get('Poster'), data ``` ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#prepare-the-data) Prepare the data Load the movie datasets. These include three main CSV files: user ratings, movie titles, and OMDB IDs. ```python # Load CSV files ratings_df = pd.read_csv('data/ratings.csv', low_memory=False) movies_df = pd.read_csv('data/movies.csv', low_memory=False) # Convert movieId in ratings_df and movies_df to string ratings_df['movieId'] = ratings_df['movieId'].astype(str) movies_df['movieId'] = movies_df['movieId'].astype(str) rating = ratings_df['rating'] # Normalize ratings ratings_df['rating'] = (rating - rating.mean()) / rating.std() # Merge ratings with movie metadata to get movie titles merged_df = ratings_df.merge( movies_df[['movieId', 'title']], left_on='movieId', right_on='movieId', how='inner' ) # Aggregate ratings to handle duplicate (userId, title) pairs ratings_agg_df = merged_df.groupby(['userId', 'movieId']).rating.mean().reset_index() ratings_agg_df.head() ``` | | userId | movieId | rating | | --- | --- | --- | --- | | 0 | 1 | 1 | 0.429960 | | 1 | 1 | 1036 | 1.369846 | | 2 | 1 | 1049 | -0.509926 | | 3 | 1 | 1066 | 0.429960 | | 4 | 1 | 110 | 0.429960 | ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#convert-to-sparse) Convert to sparse If you want to search across numerous reviews from different users, you can represent these reviews in a sparse matrix.
```python --- # Convert ratings to sparse vectors user_sparse_vectors = defaultdict(lambda: {"values": [], "indices": []}) for row in ratings_agg_df.itertuples(): user_sparse_vectors[row.userId]["values"].append(row.rating) user_sparse_vectors[row.userId]["indices"].append(int(row.movieId)) ``` ![collaborative-filtering](https://qdrant.tech/blog/collaborative-filtering/collaborative-filtering.png) ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#upload-the-data) Upload the data Here, you will initialize the Qdrant client and create a new collection to store the data. Convert the user ratings to sparse vectors and include the `movieId` in the payload. ```python --- # Define a data generator def data_generator(): for user_id, sparse_vector in user_sparse_vectors.items(): yield PointStruct( id=user_id, vector={"ratings": SparseVector( indices=sparse_vector["indices"], values=sparse_vector["values"] )}, payload={"user_id": user_id, "movie_id": sparse_vector["indices"]} ) --- # Upload points using the data generator qdrant_client.upload_points( collection_name=collection_name, points=data_generator() ) ``` ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#define-query) Define query In order to get recommendations, we need to find users with similar tastes to ours. Let’s describe our preferences by providing ratings for some of our favorite movies. `1` indicates that we like the movie, `-1` indicates that we dislike it. ```python my_ratings = { 603: 1, # Matrix 13475: 1, # Star Trek 11: 1, # Star Wars 1091: -1, # The Thing 862: 1, # Toy Story 597: -1, # Titanic 680: -1, # Pulp Fiction 13: 1, # Forrest Gump 120: 1, # Lord of the Rings 87: -1, # Indiana Jones 562: -1 # Die Hard } ``` Click to see the code for `to_vector` ```python --- # Create sparse vector from my_ratings def to_vector(ratings): vector = SparseVector( values=[], indices=[] ) for movie_id, rating in ratings.items(): vector.values.append(rating) vector.indices.append(movie_id) return vector ``` ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#run-the-query) Run the query From the uploaded list of movies with ratings, we can perform a search in Qdrant to get the top most similar users to us. ```python --- # Perform the search results = qdrant_client.query_points( collection_name=collection_name, query=to_vector(my_ratings), using="ratings", limit=20 ).points ``` Now we can find the movies liked by the other similar users, but we haven’t seen yet. Let’s combine the results from found users, filter out seen movies, and sort by the score. ```python --- # Convert results to scores and sort by score def results_to_scores(results): movie_scores = defaultdict(lambda: 0) for result in results: for movie_id in result.payload["movie_id"]: movie_scores[movie_id] += result.score return movie_scores --- # Convert results to scores and sort by score movie_scores = results_to_scores(results) top_movies = sorted(movie_scores.items(), key=lambda x: x[1], reverse=True) ``` Visualize results in Jupyter Notebook Finally, we display the top 5 recommended movies along with their posters and titles. ```python --- # Create HTML to display top 5 results html_content = "
" for movie_id, score in top_movies[:5]: imdb_id_row = links.loc[links['movieId'] == int(movie_id), 'imdbId'] if not imdb_id_row.empty: imdb_id = imdb_id_row.values[0] poster_url, movie_info = get_movie_poster(imdb_id, omdb_api_key) movie_title = movie_info.get('Title', 'Unknown Title') html_content += f"""
Poster
{movie_title}
Score: {score}
""" else: continue # Skip if imdb_id is not found html_content += "
" display(HTML(html_content)) ``` ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#recommendations) Recommendations For a complete display of movie posters, check the [notebook output](https://github.com/qdrant/examples/blob/master/collaborative-filtering/collaborative-filtering.ipynb). Here are the results without html content. ```text Toy Story, Score: 131.2033799 Monty Python and the Holy Grail, Score: 131.2033799 Star Wars: Episode V - The Empire Strikes Back, Score: 131.2033799 Star Wars: Episode VI - Return of the Jedi, Score: 131.2033799 Men in Black, Score: 131.2033799 ``` On top of collaborative filtering, we can further enhance the recommendation system by incorporating other features like user demographics, movie genres, or movie tags. Or, for example, only consider recent ratings via a time-based filter. This way, we can recommend movies that are currently popular among users. ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/\#conclusion) Conclusion As demonstrated, it is possible to build an interesting movie recommendation system without intensive model training using Qdrant and Sparse Vectors. This approach not only simplifies the recommendation process but also makes it scalable and interpretable. In future tutorials, we can experiment more with this combination to further enhance our recommendation systems. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/advanced-tutorials/collaborative-filtering.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/advanced-tutorials/collaborative-filtering.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-78-lllmstxt|> ## search-as-you-type - [Articles](https://qdrant.tech/articles/) - Semantic Search As You Type [Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/) --- # Semantic Search As You Type Andre Bogus · August 14, 2023 ![Semantic Search As You Type](https://qdrant.tech/articles_data/search-as-you-type/preview/title.jpg) Qdrant is one of the fastest vector search engines out there, so while looking for a demo to show off, we came upon the idea to do a search-as-you-type box with a fully semantic search backend. Now we already have a semantic/keyword hybrid search on our website. But that one is written in Python, which incurs some overhead for the interpreter. Naturally, I wanted to see how fast I could go using Rust. Since Qdrant doesn’t embed by itself, I had to decide on an embedding model. The prior version used the [SentenceTransformers](https://www.sbert.net/) package, which in turn employs Bert-based [All-MiniLM-L6-V2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/tree/main) model. This model is battle-tested and delivers fair results at speed, so not experimenting on this front I took an [ONNX version](https://huggingface.co/optimum/all-MiniLM-L6-v2/tree/main) and ran that within the service. 
The workflow looks like this: ![Search Qdrant by Embedding](https://qdrant.tech/articles_data/search-as-you-type/Qdrant_Search_by_Embedding.png) This will, after tokenizing and embedding send a `/collections/site/points/search` POST request to Qdrant, sending the following JSON: ```json POST collections/site/points/search { "vector": [-0.06716014,-0.056464013, ...(382 values omitted)], "limit": 5, "with_payload": true, } ``` Even with avoiding a network round-trip, the embedding still takes some time. As always in optimization, if you cannot do the work faster, a good solution is to avoid work altogether (please don’t tell my employer). This can be done by pre-computing common prefixes and calculating embeddings for them, then storing them in a `prefix_cache` collection. Now the [`recommend`](https://api.qdrant.tech/api-reference/search/recommend-points) API method can find the best matches without doing any embedding. For now, I use short (up to and including 5 letters) prefixes, but I can also parse the logs to get the most common search terms and add them to the cache later. ![Qdrant Recommendation](https://qdrant.tech/articles_data/search-as-you-type/Qdrant_Recommendation.png) Making that work requires setting up the `prefix_cache` collection with points that have the prefix as their `point_id` and the embedding as their `vector`, which lets us do the lookup with no search or index. The `prefix_to_id` function currently uses the `u64` variant of `PointId`, which can hold eight bytes, enough for this use. If the need arises, one could instead encode the names as UUID, hashing the input. Since I know all our prefixes are within 8 bytes, I decided against this for now. The `recommend` endpoint works roughly the same as `search_points`, but instead of searching for a vector, Qdrant searches for one or more points (you can also give negative example points the search engine will try to avoid in the results). It was built to help drive recommendation engines, saving the round-trip of sending the current point’s vector back to Qdrant to find more similar ones. However Qdrant goes a bit further by allowing us to select a different collection to lookup the points, which allows us to keep our `prefix_cache` collection separate from the site data. So in our case, Qdrant first looks up the point from the `prefix_cache`, takes its vector and searches for that in the `site` collection, using the precomputed embeddings from the cache. The API endpoint expects a POST of the following JSON to `/collections/site/points/recommend`: ```json POST collections/site/points/recommend { "positive": [1936024932], "limit": 5, "with_payload": true, "lookup_from": { "collection": "prefix_cache" } } ``` Now I have, in the best Rust tradition, a blazingly fast semantic search. To demo it, I used our [Qdrant documentation website](https://qdrant.tech/documentation/)’s page search, replacing our previous Python implementation. So in order to not just spew empty words, here is a benchmark, showing different queries that exercise different code paths. Since the operations themselves are far faster than the network whose fickle nature would have swamped most measurable differences, I benchmarked both the Python and Rust services locally. I’m measuring both versions on the same AMD Ryzen 9 5900HX with 16GB RAM running Linux. The table shows the average time and error bound in milliseconds. I only measured up to a thousand concurrent requests. None of the services showed any slowdown with more requests in that range. 
I do not expect our service to become DDOS’d, so I didn’t benchmark with more load. Without further ado, here are the results: | query length | Short | Long | | --- | --- | --- | | Python 🐍 | 16 ± 4 ms | 16 ± 4 ms | | Rust 🦀 | 1½ ± ½ ms | 5 ± 1 ms | The Rust version consistently outperforms the Python version and offers a semantic search even on few-character queries. If the prefix cache is hit (as in the short query length), the semantic search can even get more than ten times faster than the Python version. The general speed-up is due to both the relatively lower overhead of Rust + Actix Web compared to Python + FastAPI (even if that already performs admirably), as well as using ONNX Runtime instead of SentenceTransformers for the embedding. The prefix cache gives the Rust version a real boost by doing a semantic search without doing any embedding work. As an aside, while the millisecond differences shown here may mean relatively little for our users, whose latency will be dominated by the network in between, when typing, every millisecond more or less can make a difference in user perception. Also search-as-you-type generates between three and five times as much load as a plain search, so the service will experience more traffic. Less time per request means being able to handle more of them. Mission accomplished! But wait, there’s more! ### [Anchor](https://qdrant.tech/articles/search-as-you-type/\#prioritizing-exact-matches-and-headings) Prioritizing Exact Matches and Headings To improve on the quality of the results, Qdrant can do multiple searches in parallel, and then the service puts the results in sequence, taking the first best matches. The extended code searches: 1. Text matches in titles 2. Text matches in body (paragraphs or lists) 3. Semantic matches in titles 4. Any Semantic matches Those are put together by taking them in the above order, deduplicating as necessary. ![merge workflow](https://qdrant.tech/articles_data/search-as-you-type/sayt_merge.png) Instead of sending a `search` or `recommend` request, one can also send a `search/batch` or `recommend/batch` request, respectively. Each of those contain a `"searches"` property with any number of search/recommend JSON requests: ```json POST collections/site/points/search/batch { "searches": [\ {\ "vector": [-0.06716014,-0.056464013, ...],\ "filter": {\ "must": [\ { "key": "text", "match": { "text": }},\ { "key": "tag", "match": { "any": ["h1", "h2", "h3"] }},\ ]\ }\ ...,\ },\ {\ "vector": [-0.06716014,-0.056464013, ...],\ "filter": {\ "must": [ { "key": "body", "match": { "text": }} ]\ }\ ...,\ },\ {\ "vector": [-0.06716014,-0.056464013, ...],\ "filter": {\ "must": [ { "key": "tag", "match": { "any": ["h1", "h2", "h3"] }} ]\ }\ ...,\ },\ {\ "vector": [-0.06716014,-0.056464013, ...],\ ...,\ },\ ] } ``` As the queries are done in a batch request, there isn’t any additional network overhead and only very modest computation overhead, yet the results will be better in many cases. The only additional complexity is to flatten the result lists and take the first 5 results, deduplicating by point ID. Now there is one final problem: The query may be short enough to take the recommend code path, but still not be in the prefix cache. In that case, doing the search _sequentially_ would mean two round-trips between the service and the Qdrant instance. The solution is to _concurrently_ start both requests and take the first successful non-empty result. ![sequential vs. 
concurrent flow](https://qdrant.tech/articles_data/search-as-you-type/sayt_concurrency.png) While this means more load for the Qdrant vector search engine, this is not the limiting factor. The relevant data is already in cache in many cases, so the overhead stays within acceptable bounds, and the maximum latency in case of prefix cache misses is measurably reduced. The code is available on the [Qdrant github](https://github.com/qdrant/page-search) To sum up: Rust is fast, recommend lets us use precomputed embeddings, batch requests are awesome and one can do a semantic search in mere milliseconds. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/search-as-you-type.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/search-as-you-type.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-79-lllmstxt|> ## multimodal-search - [Documentation](https://qdrant.tech/documentation/) - Multilingual & Multimodal RAG with LlamaIndex --- # [Anchor](https://qdrant.tech/documentation/multimodal-search/\#multilingual--multimodal-search-with-llamaindex) Multilingual & Multimodal Search with LlamaIndex ![Snow prints](https://qdrant.tech/documentation/examples/multimodal-search/image-1.png) | Time: 15 min | Level: Beginner | Output: [GitHub](https://github.com/qdrant/examples/blob/master/multimodal-search/Multimodal_Search_with_LlamaIndex.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/qdrant/examples/blob/master/multimodal-search/Multimodal_Search_with_LlamaIndex.ipynb) | | --- | --- | --- | --- | ## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#overview) Overview We often understand and share information more effectively when combining different types of data. For example, the taste of comfort food can trigger childhood memories. We might describe a song with just “pam pam clap” sounds. Instead of writing paragraphs. Sometimes, we may use emojis and stickers to express how we feel or to share complex ideas. Modalities of data such as **text**, **images**, **video** and **audio** in various combinations form valuable use cases for Semantic Search applications. Vector databases, being **modality-agnostic**, are perfect for building these applications. In this simple tutorial, we are working with two simple modalities: **image** and **text** data. However, you can create a Semantic Search application with any combination of modalities if you choose the right embedding model to bridge the **semantic gap**. > The **semantic gap** refers to the difference between low-level features (aka brightness) and high-level concepts (aka cuteness). For example, the [vdr-2b-multi-v1 model](https://huggingface.co/llamaindex/vdr-2b-multi-v1) from LlamaIndex is designed for multilingual embedding, particularly effective for visual document retrieval across multiple languages and domains. It allows for searching and querying visually rich multilingual documents without the need for OCR or other data extraction pipelines. 
## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#setup) Setup First, install the required libraries `qdrant-client` and `llama-index-embeddings-huggingface`. ```bash pip install qdrant-client llama-index-embeddings-huggingface ``` ## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#dataset) Dataset To make the demonstration simple, we created a tiny dataset of images and their captions for you. Images can be downloaded from [here](https://github.com/qdrant/examples/tree/master/multimodal-search/images). It’s **important** to place them in the same folder as your code/notebook, in the folder named `images`. ## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#vectorize-data) Vectorize data `LlamaIndex`’s `vdr-2b-multi-v1` model supports cross-lingual retrieval, allowing for effective searches across languages and domains. It encodes document page screenshots into dense single-vector representations, eliminating the need for OCR and other complex data extraction processes. Let’s embed the images and their captions in the **shared embedding space**. ```python from llama_index.embeddings.huggingface import HuggingFaceEmbedding model = HuggingFaceEmbedding( model_name="llamaindex/vdr-2b-multi-v1", device="cpu", # "mps" for mac, "cuda" for nvidia GPUs trust_remote_code=True, ) documents = [\ {"caption": "An image about plane emergency safety.", "image": "images/image-1.png"},\ {"caption": "An image about airplane components.", "image": "images/image-2.png"},\ {"caption": "An image about COVID safety restrictions.", "image": "images/image-3.png"},\ {"caption": "An confidential image about UFO sightings.", "image": "images/image-4.png"},\ {"caption": "An image about unusual footprints on Aralar 2011.", "image": "images/image-5.png"},\ ] text_embeddings = model.get_text_embedding_batch([doc["caption"] for doc in documents]) image_embeddings = model.get_image_embedding_batch([doc["image"] for doc in documents]) ``` ## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#upload-data-to-qdrant) Upload data to Qdrant 1. **Create a client object for Qdrant**. ```python from qdrant_client import QdrantClient, models --- # docker run -p 6333:6333 qdrant/qdrant client = QdrantClient(url="http://localhost:6333/") ``` 2. **Create a new collection for the images with captions**. ```python COLLECTION_NAME = "llama-multi" if not client.collection_exists(COLLECTION_NAME): client.create_collection( collection_name=COLLECTION_NAME, vectors_config={ "image": models.VectorParams(size=len(image_embeddings[0]), distance=models.Distance.COSINE), "text": models.VectorParams(size=len(text_embeddings[0]), distance=models.Distance.COSINE), } ) ``` 3. **Upload our images with captions to the Collection**. ```python client.upload_points( collection_name=COLLECTION_NAME, points=[\ models.PointStruct(\ id=idx,\ vector={\ "text": text_embeddings[idx],\ "image": image_embeddings[idx],\ },\ payload=doc\ )\ for idx, doc in enumerate(documents)\ ] ) ``` ## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#search) Search ### [Anchor](https://qdrant.tech/documentation/multimodal-search/\#text-to-image) Text-to-Image Let’s see what image we will get to the query “ _Adventures on snow hills_”. 
```python from PIL import Image find_image = model.get_query_embedding("Adventures on snow hills") Image.open(client.query_points( collection_name=COLLECTION_NAME, query=find_image, using="image", with_payload=["image"], limit=1 ).points[0].payload['image']) ``` Let’s also run the same query in Italian and compare the results. ### [Anchor](https://qdrant.tech/documentation/multimodal-search/\#multilingual-search) Multilingual Search Now, let’s do a multilingual search using an Italian query: ```python Image.open(client.query_points( collection_name=COLLECTION_NAME, query=model.get_query_embedding("Avventure sulle colline innevate"), using="image", with_payload=["image"], limit=1 ).points[0].payload['image']) ``` **Response:** ![Snow prints](https://qdrant.tech/documentation/advanced-tutorials/snow-prints.png) ### [Anchor](https://qdrant.tech/documentation/multimodal-search/\#image-to-text) Image-to-Text Now, let’s do a reverse search with the following image: ![Airplane](https://qdrant.tech/documentation/advanced-tutorials/airplane.png) ```python client.query_points( collection_name=COLLECTION_NAME, query=model.get_image_embedding("images/image-2.png"), # Now we are searching only among text vectors with our image query using="text", with_payload=["caption"], limit=1 ).points[0].payload['caption'] ``` **Response:** ```text 'An image about plane emergency safety.' ``` ## [Anchor](https://qdrant.tech/documentation/multimodal-search/\#next-steps) Next steps Use cases of even just Image & Text Multimodal Search are countless: E-Commerce, Media Management, Content Recommendation, Emotion Recognition Systems, Biomedical Image Retrieval, Spoken Sign Language Transcription, etc. Imagine a scenario: a user wants to find a product similar to a picture they have, but they also have specific textual requirements, like “ _in beige colour_”. You can search using just texts or images and combine their embeddings in a **late fusion manner** (summing and weighting might work surprisingly well). Moreover, using [Discovery Search](https://qdrant.tech/articles/discovery-search/) with both modalities, you can provide users with information that is impossible to retrieve unimodally! Join our [Discord community](https://qdrant.to/discord), where we talk about vector search and similarity learning, experiment, and have fun! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/multimodal-search.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/multimodal-search.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-80-lllmstxt|> ## concepts - [Documentation](https://qdrant.tech/documentation/) - Concepts --- # [Anchor](https://qdrant.tech/documentation/concepts/\#concepts) Concepts Think of these concepts as a glossary. Each of these concepts include a link to detailed information, usually with examples. If you’re new to AI, these concepts can help you learn more about AI and the Qdrant approach. 
## [Anchor](https://qdrant.tech/documentation/concepts/\#collections) Collections [Collections](https://qdrant.tech/documentation/concepts/collections/) define a named set of points that you can use for your search. ## [Anchor](https://qdrant.tech/documentation/concepts/\#payload) Payload A [Payload](https://qdrant.tech/documentation/concepts/payload/) describes information that you can store with vectors. ## [Anchor](https://qdrant.tech/documentation/concepts/\#points) Points [Points](https://qdrant.tech/documentation/concepts/points/) are a record which consists of a vector and an optional payload. ## [Anchor](https://qdrant.tech/documentation/concepts/\#search) Search [Search](https://qdrant.tech/documentation/concepts/search/) describes _similarity search_, which set up related objects close to each other in vector space. ## [Anchor](https://qdrant.tech/documentation/concepts/\#explore) Explore [Explore](https://qdrant.tech/documentation/concepts/explore/) includes several APIs for exploring data in your collections. ## [Anchor](https://qdrant.tech/documentation/concepts/\#hybrid-queries) Hybrid Queries [Hybrid Queries](https://qdrant.tech/documentation/concepts/hybrid-queries/) combines multiple queries or performs them in more than one stage. ## [Anchor](https://qdrant.tech/documentation/concepts/\#filtering) Filtering [Filtering](https://qdrant.tech/documentation/concepts/filtering/) defines various database-style clauses, conditions, and more. ## [Anchor](https://qdrant.tech/documentation/concepts/\#optimizer) Optimizer [Optimizer](https://qdrant.tech/documentation/concepts/optimizer/) describes options to rebuild database structures for faster search. They include a vacuum, a merge, and an indexing optimizer. ## [Anchor](https://qdrant.tech/documentation/concepts/\#storage) Storage [Storage](https://qdrant.tech/documentation/concepts/storage/) describes the configuration of storage in segments, which include indexes and an ID mapper. ## [Anchor](https://qdrant.tech/documentation/concepts/\#indexing) Indexing [Indexing](https://qdrant.tech/documentation/concepts/indexing/) lists and describes available indexes. They include payload, vector, sparse vector, and a filterable index. ## [Anchor](https://qdrant.tech/documentation/concepts/\#snapshots) Snapshots [Snapshots](https://qdrant.tech/documentation/concepts/snapshots/) describe the backup/restore process (and more) for each node at specific times. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-81-lllmstxt|> ## what-is-vector-quantization - [Articles](https://qdrant.tech/articles/) - What is Vector Quantization? [Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/) --- # What is Vector Quantization? 
Sabrina Aquino · September 25, 2024 ![What is Vector Quantization?](https://qdrant.tech/articles_data/what-is-vector-quantization/preview/title.jpg) Vector quantization is a data compression technique used to reduce the size of high-dimensional data. Compressing vectors reduces memory usage while maintaining nearly all of the essential information. This method allows for more efficient storage and faster search operations, particularly in large datasets. When working with high-dimensional vectors, such as embeddings from providers like OpenAI, a single 1536-dimensional vector requires **6 KB of memory**. ![1536-dimensional vector size is 6 KB](https://qdrant.tech/articles_data/what-is-vector-quantization/vector-size.png) With 1 million vectors needing around 6 GB of memory, as your dataset grows to multiple **millions of vectors**, the memory and processing demands increase significantly. To understand why this process is so computationally demanding, let’s take a look at the nature of the [HNSW index](https://qdrant.tech/documentation/concepts/indexing/#vector-index). The **HNSW (Hierarchical Navigable Small World) index** organizes vectors in a layered graph, connecting each vector to its nearest neighbors. At each layer, the algorithm narrows down the search area until it reaches the lower layers, where it efficiently finds the closest matches to the query. ![HNSW Search visualization](https://qdrant.tech/articles_data/what-is-vector-quantization/hnsw.png) Each time a new vector is added, the system must determine its position in the existing graph, a process similar to searching. This makes both inserting and searching for vectors complex operations. One of the key challenges with the HNSW index is that it requires a lot of **random reads** and **sequential traversals** through the graph. This makes the process computationally expensive, especially when you’re dealing with millions of high-dimensional vectors. The system has to jump between various points in the graph in an unpredictable manner. This unpredictability makes optimization difficult, and as the dataset grows, the memory and processing requirements increase significantly. ![HNSW Search visualization](https://qdrant.tech/articles_data/what-is-vector-quantization/hnsw-search2.png) Since vectors need to be stored in **fast storage** like **RAM** or **SSD** for low-latency searches, as the size of the data grows, so does the cost of storing and processing it efficiently. **Quantization** offers a solution by compressing vectors to smaller memory sizes, making the process more efficient. There are several methods to achieve this, and here we will focus on three main ones: ![Types of Quantization: 1. Scalar Quantization, 2. Product Quantization, 3. Binary Quantization](https://qdrant.tech/articles_data/what-is-vector-quantization/types-of-quant.png) ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#1-what-is-scalar-quantization) 1\. What is Scalar Quantization? ![](https://qdrant.tech/articles_data/what-is-vector-quantization/astronaut-mars.jpg) In Qdrant, each dimension is represented by a `float32` value, which uses **4 bytes** of memory. When using [Scalar Quantization](https://qdrant.tech/documentation/guides/quantization/#scalar-quantization), we map our vectors to a range that the smaller `int8` type can represent. An `int8` is only **1 byte** and can represent 256 values (from -128 to 127, or 0 to 255). This results in a **75% reduction** in memory size. 
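As a rough illustration of the mapping described next, here is a NumPy sketch of linear float32-to-int8 quantization. It is only meant to show the idea; it is not Qdrant's internal implementation, which also applies the quantile bounds described below.

```python
import numpy as np

rng = np.random.default_rng(0)
vector = rng.uniform(-1.0, 1.0, size=1536).astype(np.float32)   # 1536 * 4 bytes = 6 KB

# Linear mapping of the observed float32 range onto the 256 int8 values
lo, hi = float(vector.min()), float(vector.max())
scale = 255.0 / (hi - lo)
quantized = np.round((vector - lo) * scale - 128.0).astype(np.int8)  # 1536 * 1 byte = 1.5 KB

# Approximate reconstruction shows how little information is lost
restored = (quantized.astype(np.float32) + 128.0) / scale + lo
print(float(np.abs(vector - restored).max()))  # error stays within half a quantization step
```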
For example, if our data lies in the range of -1.0 to 1.0, Scalar Quantization will transform these values to a range that `int8` can represent, i.e., within -128 to 127. The system **maps** the `float32` values into this range. Here’s a simple linear example of what this process looks like: ![Scalar Quantization example](https://qdrant.tech/articles_data/what-is-vector-quantization/scalar-quant.png) To set up Scalar Quantization in Qdrant, you need to include the `quantization_config` section when creating or updating a collection: httppython ```http PUT /collections/{collection_name} { "vectors": { "size": 128, "distance": "Cosine" }, "quantization_config": { "scalar": { "type": "int8", "quantile": 0.99, "always_ram": true } } } ``` ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=128, distance=models.Distance.COSINE), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, quantile=0.99, always_ram=True, ), ), ) ``` The `quantile` parameter is used to calculate the quantization bounds. For example, if you specify a `0.99` quantile, the most extreme 1% of values will be excluded from the quantization bounds. This parameter only affects the resulting precision, not the memory footprint. You can adjust it if you experience a significant decrease in search quality. Scalar Quantization is a great choice if you’re looking to boost search speed and compression without losing much accuracy. It also slightly improves performance, as distance calculations (such as dot product or cosine similarity) using `int8` values are computationally simpler than using `float32` values. While the performance gains of Scalar Quantization may not match those achieved with Binary Quantization (which we’ll discuss later), it remains an excellent default choice when Binary Quantization isn’t suitable for your use case. ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#2-what-is-binary-quantization) 2\. What is Binary Quantization? ![Astronaut in surreal white environment](https://qdrant.tech/articles_data/what-is-vector-quantization/astronaut-white-surreal.jpg) [Binary Quantization](https://qdrant.tech/documentation/guides/quantization/#binary-quantization) is an excellent option if you’re looking to **reduce memory** usage while also achieving a significant **boost in speed**. It works by converting high-dimensional vectors into simple binary (0 or 1) representations. - Values greater than zero are converted to 1. - Values less than or equal to zero are converted to 0. Let’s consider our initial example of a 1536-dimensional vector that requires **6 KB** of memory (4 bytes for each `float32` value). After Binary Quantization, each dimension is reduced to 1 bit (1/8 byte), so the memory required is: 1536 dimensions ÷ 8 bits per byte = 192 bytes. This leads to a **32x** memory reduction. ![Binary Quantization example](https://qdrant.tech/articles_data/what-is-vector-quantization/binary-quant.png) Qdrant automates the Binary Quantization process during indexing. As vectors are added to your collection, each 32-bit floating-point component is converted into a binary value according to the configuration you define.
Here’s how you can set it up: httppython ```http PUT /collections/{collection_name} { "vectors": { "size": 1536, "distance": "Cosine" }, "quantization_config": { "binary": { "always_ram": true } } } ``` ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE), quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, ), ), ) ``` Binary Quantization is by far the quantization method that provides the most significant processing **speed gains** compared to Scalar and Product Quantizations. This is because the binary representation allows the system to use highly optimized CPU instructions, such as [XOR](https://en.wikipedia.org/wiki/XOR_gate#:~:text=XOR%20represents%20the%20inequality%20function,the%20other%20but%20not%20both%22) and [Popcount](https://en.wikipedia.org/wiki/Hamming_weight), for fast distance computations. It can speed up search operations by **up to 40x**, depending on the dataset and hardware. Not all models are equally compatible with Binary Quantization, and in the comparison above, we are only using models that are compatible. Some models may experience a greater loss in accuracy when quantized. We recommend using Binary Quantization with models that have **at least 1024 dimensions** to minimize accuracy loss. The models that have shown the best compatibility with this method include: - **OpenAI text-embedding-ada-002** (1536 dimensions) - **Cohere AI embed-english-v2.0** (4096 dimensions) These models demonstrate minimal accuracy loss while still benefiting from substantial speed and memory gains. Even though Binary Quantization is incredibly fast and memory-efficient, the trade-offs are in **precision** and **model compatibility**, so you may need to ensure search quality using techniques like oversampling and rescoring. If you’re interested in exploring Binary Quantization in more detail—including implementation examples, benchmark results, and usage recommendations—check out our dedicated article on [Binary Quantization - Vector Search, 40x Faster](https://qdrant.tech/articles/binary-quantization/). ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#3-what-is-product-quantization) 3\. What is Product Quantization? ![](https://qdrant.tech/articles_data/what-is-vector-quantization/astronaut-centroids.jpg) [Product Quantization](https://qdrant.tech/documentation/guides/quantization/#product-quantization) is a method used to compress high-dimensional vectors by representing them with a smaller set of representative points. The process begins by splitting the original high-dimensional vectors into smaller **sub-vectors.** Each sub-vector represents a segment of the original vector, capturing different characteristics of the data. ![Creation of the Sub-vector](https://qdrant.tech/articles_data/what-is-vector-quantization/subvec.png) For each sub-vector, a separate **codebook** is created, representing regions in the data space where common patterns occur. The codebook in Qdrant is trained automatically during the indexing process. As vectors are added to the collection, Qdrant uses your specified quantization settings in the `quantization_config` to build the codebook and quantize the vectors. 
Here’s how you can set it up: httppython ```http PUT /collections/{collection_name} { "vectors": { "size": 1024, "distance": "Cosine" }, "quantization_config": { "product": { "compression": "x32", "always_ram": true } } } ``` ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=1024, distance=models.Distance.COSINE), quantization_config=models.ProductQuantization( product=models.ProductQuantizationConfig( compression=models.CompressionRatio.X32, always_ram=True, ), ), ) ``` Each region in the codebook is defined by a **centroid**, which serves as a representative point summarizing the characteristics of that region. Instead of treating every single data point as equally important, we can group similar sub-vectors together and represent them with a single centroid that captures the general characteristics of that group. The centroids used in Product Quantization are determined using the **[K-means clustering algorithm](https://en.wikipedia.org/wiki/K-means_clustering)**. ![Codebook and Centroids example](https://qdrant.tech/articles_data/what-is-vector-quantization/code-book.png) Qdrant always selects **K = 256** as the number of centroids in its implementation, based on the fact that 256 is the maximum number of unique values that can be represented by a single byte. This makes the compression process efficient because each centroid index can be stored in a single byte. The original high-dimensional vectors are quantized by mapping each sub-vector to the nearest centroid in its respective codebook. ![Vectors being mapped to their corresponding centroids example](https://qdrant.tech/articles_data/what-is-vector-quantization/mapping.png) The compressed vector stores the index of the closest centroid for each sub-vector. Here’s how a 1024-dimensional vector, originally taking up 4096 bytes, is reduced to just 128 bytes by representing it as 128 indexes, each pointing to the centroid of a sub-vector: ![Product Quantization example](https://qdrant.tech/articles_data/what-is-vector-quantization/product-quant.png) After setting up quantization and adding your vectors, you can perform searches as usual. Qdrant will automatically use the quantized vectors, optimizing both speed and memory usage. Optionally, you can enable rescoring for better accuracy. httppython ```http POST /collections/{collection_name}/points/search { "query": [0.22, -0.01, -0.98, 0.37], "params": { "quantization": { "rescore": true } }, "limit": 10 } ``` ```python client.query_points( collection_name="my_collection", query_vector=[0.22, -0.01, -0.98, 0.37], # Your query vector search_params=models.SearchParams( quantization=models.QuantizationSearchParams( rescore=True # Enables rescoring with original vectors ) ), limit=10 # Return the top 10 results ) ``` Product Quantization can significantly reduce memory usage, potentially offering up to **64x** compression in certain configurations. However, it’s important to note that this level of compression can lead to a noticeable drop in quality. If your application requires high precision or real-time performance, Product Quantization may not be the best choice. However, if **memory savings** are critical and some accuracy loss is acceptable, it could still be an ideal solution. 
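If it helps to see the mechanics end to end, the sketch below walks through the Product Quantization idea on toy data: split each vector into sub-vectors, learn a small codebook per sub-space with k-means, and keep only the centroid index for each sub-vector. It uses NumPy and scikit-learn, a toy codebook of 16 centroids instead of the 256 Qdrant always uses, and made-up dimensions; it illustrates the concept and is not Qdrant’s implementation.

```python
# Toy Product Quantization sketch: split vectors into sub-vectors, learn a
# small codebook per sub-space with k-means, and store only centroid indices.
# Illustrative only; Qdrant trains its codebooks internally during indexing.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)  # 1000 vectors, 64 dims

num_subvectors = 8                  # 64 dims -> 8 sub-vectors of 8 dims each
k = 16                              # toy codebook size; Qdrant always uses 256
subdim = vectors.shape[1] // num_subvectors

codebooks = []                      # one (k, subdim) centroid matrix per sub-space
codes = np.empty((len(vectors), num_subvectors), dtype=np.uint8)

for i in range(num_subvectors):
    sub = vectors[:, i * subdim:(i + 1) * subdim]
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)
    codes[:, i] = km.labels_        # each sub-vector is replaced by a centroid index

# Compressed size: one index per sub-vector instead of subdim float32 values.
print(vectors.nbytes, "bytes ->", codes.nbytes, "bytes")

# Approximate reconstruction of the first vector from its centroid indices.
reconstructed = np.concatenate(
    [codebooks[i][codes[0, i]] for i in range(num_subvectors)]
)
```

On the article’s 1024-dimensional example, the same idea with 128 sub-vectors and 256 centroids per codebook is what turns a 4096-byte vector into 128 one-byte centroid indexes.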
Here’s a comparison of speed, accuracy, and compression for all three methods, adapted from [Qdrant’s documentation](https://qdrant.tech/documentation/guides/quantization/#how-to-choose-the-right-quantization-method):

| Quantization method | Accuracy | Speed | Compression |
| --- | --- | --- | --- |
| Scalar | 0.99 | up to x2 | 4 |
| Product | 0.7 | 0.5 | up to 64 |
| Binary | 0.95\* | up to x40 | 32 |

\* - for compatible models

For a more in-depth understanding of the benchmarks you can expect, check out our dedicated article on [Product Quantization in Vector Search](https://qdrant.tech/articles/product-quantization/). ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#rescoring-oversampling-and-reranking) Rescoring, Oversampling, and Reranking When we use quantization methods like Scalar, Binary, or Product Quantization, we’re compressing our vectors to save memory and improve performance. However, this compression removes some detail from the original vectors. This can slightly reduce the accuracy of our similarity searches because the quantized vectors are approximations of the original data. To mitigate this loss of accuracy, you can use **oversampling** and **rescoring**, which help improve the accuracy of the final search results. The original vectors are never deleted during this process, and you can easily switch between quantization methods or parameters by updating the collection configuration at any time. Here’s how the process works, step by step: ### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#1-initial-quantized-search) 1\. Initial Quantized Search When you perform a search, Qdrant retrieves the top candidates using the quantized vectors based on their similarity to the query vector, as determined by the quantized data. This step is fast because we’re using the quantized vectors. ![ANN Search with Quantization](https://qdrant.tech/articles_data/what-is-vector-quantization/ann-search-quantized.png) ### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#2-oversampling) 2\. Oversampling Oversampling is a technique that helps compensate for any precision lost due to quantization. Since quantization simplifies vectors, some relevant matches could be missed in the initial search. To avoid this, you can **retrieve more candidates**, increasing the chances that the most relevant vectors make it into the final results. You can control the number of extra candidates by setting an `oversampling` parameter. For example, if your desired number of results (`limit`) is 4 and you set an `oversampling` factor of 2, Qdrant will retrieve 8 candidates (4 × 2). ![ANN Search with Quantization and Oversampling](https://qdrant.tech/articles_data/what-is-vector-quantization/ann-search-quantized-oversampling.png) You can adjust the oversampling factor to control how many extra vectors Qdrant includes in the initial pool. More candidates mean a better chance of obtaining high-quality top-K results, especially after rescoring with the original vectors. ### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#3-rescoring-with-original-vectors) 3\. Rescoring with Original Vectors After oversampling to gather more potential matches, each candidate is re-evaluated based on additional criteria to ensure higher accuracy and relevance to the query.
The rescoring process **maps** the quantized vectors to their corresponding original vectors, allowing you to consider factors like context, metadata, or additional relevance that wasn’t included in the initial search, leading to more accurate results. ![Rescoring with Original Vectors](https://qdrant.tech/articles_data/what-is-vector-quantization/rescoring.png) During rescoring, one of the lower-ranked candidates from oversampling might turn out to be a better match than some of the original top-K candidates. Even though rescoring uses the original, larger vectors, the process remains much faster because only a very small number of vectors are read. The initial quantized search already identifies the specific vectors to read, rescore, and rerank. ### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#4-reranking) 4\. Reranking With the new similarity scores from rescoring, **reranking** is where the final top-K candidates are determined based on the updated similarity scores. For example, in our case with a limit of 4, a candidate that ranked 6th in the initial quantized search might improve its score after rescoring because the original vectors capture more context or metadata. As a result, this candidate could move into the final top 4 after reranking, replacing a less relevant option from the initial search. ![Reranking with Original Vectors](https://qdrant.tech/articles_data/what-is-vector-quantization/reranking.png) Here’s how you can set it up: httppython ```http POST /collections/{collection_name}/points/search { "query": [0.22, -0.01, -0.98, 0.37], "params": { "quantization": { "rescore": true, "oversampling": 2 } }, "limit": 4 } ``` ```python client.query_points( collection_name="my_collection", query_vector=[0.22, -0.01, -0.98, 0.37], search_params=models.SearchParams( quantization=models.QuantizationSearchParams( rescore=True, # Enables rescoring with original vectors oversampling=2 # Retrieves extra candidates for rescoring ) ), limit=4 # Desired number of final results ) ``` You can adjust the `oversampling` factor to find the right balance between search speed and result accuracy. If quantization is impacting performance in an application that requires high accuracy, combining oversampling with rescoring is a great choice. However, if you need faster searches and can tolerate some loss in accuracy, you might choose to use oversampling without rescoring, or adjust the oversampling factor to a lower value. ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#distributing-resources-between-disk--memory) Distributing Resources Between Disk & Memory Qdrant stores both the quantized and original vectors. When you enable quantization, both the original and quantized vectors are stored in RAM by default. You can move the original vectors to disk to significantly reduce RAM usage and lower system costs. Simply enabling quantization is not enough—you need to explicitly move the original vectors to disk by setting `on_disk=True`. 
Here’s an example configuration: httppython ```http PUT /collections/{collection_name} { "vectors": { "size": 1536, "distance": "Cosine", "on_disk": true # Move original vectors to disk }, "quantization_config": { "binary": { "always_ram": true # Store only quantized vectors in RAM } } } ``` ```python client.update_collection( collection_name="my_collection", vectors_config=models.VectorParams( size=1536, distance=models.Distance.COSINE, on_disk=True # Move original vectors to disk ), quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True # Store only quantized vectors in RAM ) ) ) ``` Without explicitly setting `on_disk=True`, you won’t see any RAM savings, even with quantization enabled. So, make sure to configure both storage and quantization options based on your memory and performance needs. If your storage has high disk latency, consider disabling rescoring to maintain speed. ### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#speeding-up-rescoring-with-io_uring) Speeding Up Rescoring with io\_uring When dealing with large collections of quantized vectors, frequent disk reads are required to retrieve both original and compressed data for rescoring operations. While `mmap` helps with efficient I/O by reducing user-to-kernel transitions, rescoring can still be slowed down when working with large datasets on disk due to the need for frequent disk reads. On Linux-based systems, `io_uring` allows multiple disk operations to be processed in parallel, significantly reducing I/O overhead. This optimization is particularly effective during rescoring, where multiple vectors need to be re-evaluated after the initial search. With io\_uring, Qdrant can retrieve and rescore vectors from disk in the most efficient way, improving overall search performance. When you perform vector quantization and store data on disk, Qdrant often needs to access multiple vectors in parallel. Without io\_uring, this process can be slowed down due to the system’s limitations in handling many disk accesses. To enable `io_uring` in Qdrant, add the following to your storage configuration: ```yaml storage: async_scorer: true # Enable io_uring for async storage ``` Without this configuration, Qdrant will default to using `mmap` for disk I/O operations. For more information and benchmarks comparing io\_uring with traditional I/O approaches like mmap, check out [Qdrant’s io\_uring implementation article.](https://qdrant.tech/articles/io_uring/) ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#performance-of-quantized-vs-non-quantized-data) Performance of Quantized vs. Non-Quantized Data Qdrant uses the quantized vectors by default if they are available. If you want to evaluate how quantization affects your search results, you can temporarily disable it to compare results from quantized and non-quantized searches. To do this, set `ignore: true` in the query: httppython ```http POST /collections/{collection_name}/points/query { "query": [0.22, -0.01, -0.98, 0.37], "params": { "quantization": { "ignore": true, } }, "limit": 4 } ``` ```python client.query_points( collection_name="{collection_name}", query=[0.22, -0.01, -0.98, 0.37], search_params=models.SearchParams( quantization=models.QuantizationSearchParams( ignore=True ) ), ) ``` ### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#switching-between-quantization-methods) Switching Between Quantization Methods Not sure if you’ve chosen the right quantization method? 
In Qdrant, you have the flexibility to remove quantization and rely solely on the original vectors, adjust the quantization type, or change compression parameters at any time without affecting your original vectors. To switch to binary quantization and adjust the compression rate, for example, you can update the collection’s quantization configuration using the `update_collection` method: httppython ```http PUT /collections/{collection_name} { "vectors": { "size": 1536, "distance": "Cosine" }, "quantization_config": { "binary": { "always_ram": true, "compression_rate": 0.8 # Set the new compression rate } } } ``` ```python client.update_collection( collection_name="my_collection", quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, # Store only quantized vectors in RAM compression_rate=0.8 # Set the new compression rate ) ), ) ``` If you decide to **turn off quantization** and use only the original vectors, you can remove the quantization settings entirely with `quantization_config=None`: httppython ```http PUT /collections/my_collection { "vectors": { "size": 1536, "distance": "Cosine" }, "quantization_config": null # Remove quantization and use original vectors only } ``` ```python client.update_collection( collection_name="my_collection", quantization_config=None # Remove quantization and rely on original vectors only ) ``` ## [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#wrapping-up) Wrapping Up ![](https://qdrant.tech/articles_data/what-is-vector-quantization/astronaut-running.jpg) Quantization methods like Scalar, Product, and Binary Quantization offer powerful ways to optimize memory usage and improve search performance when dealing with large datasets of high-dimensional vectors. Each method comes with its own trade-offs between memory savings, computational speed, and accuracy. Here are some final thoughts to help you choose the right quantization method for your needs:

| **Quantization Method** | **Key Features** | **When to Use** |
| --- | --- | --- |
| **Binary Quantization** | • **Fastest method and most memory-efficient**<br>• Up to **40x** faster search and **32x** reduced memory footprint | • Use with tested models like OpenAI’s `text-embedding-ada-002` and Cohere’s `embed-english-v2.0`<br>• When speed and memory efficiency are critical |
| **Scalar Quantization** | • **Minimal loss of accuracy**<br>• Up to **4x** reduced memory footprint | • Safe default choice for most applications<br>• Offers a good balance between accuracy, speed, and compression |
| **Product Quantization** | • **Highest compression ratio**<br>• Up to **64x** reduced memory footprint | • When minimizing memory usage is the top priority<br>• Acceptable if some loss of accuracy and slower indexing is tolerable |

### [Anchor](https://qdrant.tech/articles/what-is-vector-quantization/\#learn-more) Learn More If you want to learn more about improving accuracy, memory efficiency, and speed when using quantization in Qdrant, we have a dedicated [Quantization tips](https://qdrant.tech/documentation/guides/quantization/#quantization-tips) section in our docs that explains all the quantization tips you can use to enhance your results. Learn more about optimizing real-time precision with oversampling in Binary Quantization by watching this interview with Qdrant’s CTO, Andrey Vasnetsov: [Binary Quantization - Andrey Vasnetsov \| Vector Space Talk #001](https://www.youtube.com/watch?v=4aUq5VnR_VI) Stay up-to-date on the latest in [vector search](https://qdrant.tech/advanced-search/) and quantization, share your projects, ask questions, [join our vector search community](https://discord.com/invite/qdrant)!
<|page-82-lllmstxt|> ## indexing - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Indexing --- # [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#indexing) Indexing A key feature of Qdrant is the effective combination of vector and traditional indexes. This combination is essential because, for vector search to work effectively with filters, a vector index alone is not enough. In simpler terms, a vector index speeds up vector search, and payload indexes speed up filtering. The indexes in the segments exist independently, but the parameters of the indexes themselves are configured for the whole collection. Not all segments automatically have indexes. Their necessity is determined by the [optimizer](https://qdrant.tech/documentation/concepts/optimizer/) settings and depends, as a rule, on the number of stored points. ## [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#payload-index) Payload Index A payload index in Qdrant is similar to an index in a conventional document-oriented database. This index is built for a specific field and type, and is used to quickly retrieve points by the corresponding filtering condition. The index is also used to accurately estimate the filter cardinality, which helps the [query planning](https://qdrant.tech/documentation/concepts/search/#query-planning) choose a search strategy. Creating an index requires additional computational resources and memory, so choosing which fields to index is essential. Qdrant does not make this choice but leaves it to the user.
To mark a field as indexable, you can use the following: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "name_of_the_field_to_index", "field_schema": "keyword" } ``` ```python client.create_payload_index( collection_name="{collection_name}", field_name="name_of_the_field_to_index", field_schema="keyword", ) ``` ```typescript client.createPayloadIndex("{collection_name}", { field_name: "name_of_the_field_to_index", field_schema: "keyword", }); ``` ```rust use qdrant_client::qdrant::{CreateFieldIndexCollectionBuilder, FieldType}; client .create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "name_of_the_field_to_index", FieldType::Keyword, ) .wait(true), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.PayloadSchemaType; client.createPayloadIndexAsync( "{collection_name}", "name_of_the_field_to_index", PayloadSchemaType.Keyword, null, true, null, null); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "name_of_the_field_to_index" ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "name_of_the_field_to_index", FieldType: qdrant.FieldType_FieldTypeKeyword.Enum(), }) ``` You can use dot notation to specify a nested field for indexing. Similar to specifying [nested filters](https://qdrant.tech/documentation/concepts/filtering/#nested-key). Available field types are: - `keyword` \- for [keyword](https://qdrant.tech/documentation/concepts/payload/#keyword) payload, affects [Match](https://qdrant.tech/documentation/concepts/filtering/#match) filtering conditions. - `integer` \- for [integer](https://qdrant.tech/documentation/concepts/payload/#integer) payload, affects [Match](https://qdrant.tech/documentation/concepts/filtering/#match) and [Range](https://qdrant.tech/documentation/concepts/filtering/#range) filtering conditions. - `float` \- for [float](https://qdrant.tech/documentation/concepts/payload/#float) payload, affects [Range](https://qdrant.tech/documentation/concepts/filtering/#range) filtering conditions. - `bool` \- for [bool](https://qdrant.tech/documentation/concepts/payload/#bool) payload, affects [Match](https://qdrant.tech/documentation/concepts/filtering/#match) filtering conditions (available as of v1.4.0). - `geo` \- for [geo](https://qdrant.tech/documentation/concepts/payload/#geo) payload, affects [Geo Bounding Box](https://qdrant.tech/documentation/concepts/filtering/#geo-bounding-box) and [Geo Radius](https://qdrant.tech/documentation/concepts/filtering/#geo-radius) filtering conditions. - `datetime` \- for [datetime](https://qdrant.tech/documentation/concepts/payload/#datetime) payload, affects [Range](https://qdrant.tech/documentation/concepts/filtering/#range) filtering conditions (available as of v1.8.0). - `text` \- a special kind of index, available for [keyword](https://qdrant.tech/documentation/concepts/payload/#keyword) / string payloads, affects [Full Text search](https://qdrant.tech/documentation/concepts/filtering/#full-text-match) filtering conditions. - `uuid` \- a special type of index, similar to `keyword`, but optimized for [UUID values](https://qdrant.tech/documentation/concepts/payload/#uuid). 
Affects [Match](https://qdrant.tech/documentation/concepts/filtering/#match) filtering conditions. (available as of v1.11.0) Payload index may occupy some additional memory, so it is recommended to only use index for those fields that are used in filtering conditions. If you need to filter by many fields and the memory limits does not allow to index all of them, it is recommended to choose the field that limits the search result the most. As a rule, the more different values a payload value has, the more efficiently the index will be used. ### [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#full-text-index) Full-text index _Available as of v0.10.0_ Qdrant supports full-text search for string payload. Full-text index allows you to filter points by the presence of a word or a phrase in the payload field. Full-text index configuration is a bit more complex than other indexes, as you can specify the tokenization parameters. Tokenization is the process of splitting a string into tokens, which are then indexed in the inverted index. To create a full-text index, you can use the following: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "name_of_the_field_to_index", "field_schema": { "type": "text", "tokenizer": "word", "min_token_len": 2, "max_token_len": 20, "lowercase": true } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_payload_index( collection_name="{collection_name}", field_name="name_of_the_field_to_index", field_schema=models.TextIndexParams( type="text", tokenizer=models.TokenizerType.WORD, min_token_len=2, max_token_len=15, lowercase=True, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createPayloadIndex("{collection_name}", { field_name: "name_of_the_field_to_index", field_schema: { type: "text", tokenizer: "word", min_token_len: 2, max_token_len: 15, lowercase: true, }, }); ``` ```rust use qdrant_client::qdrant::{ payload_index_params::IndexParams, CreateFieldIndexCollectionBuilder, FieldType, PayloadIndexParams, TextIndexParams, TokenizerType, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "name_of_the_field_to_index", FieldType::Text, ) .field_index_params(PayloadIndexParams { index_params: Some(IndexParams::TextIndexParams(TextIndexParams { tokenizer: TokenizerType::Word as i32, min_token_len: Some(2), max_token_len: Some(10), lowercase: Some(true), })), }), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.PayloadIndexParams; import io.qdrant.client.grpc.Collections.PayloadSchemaType; import io.qdrant.client.grpc.Collections.TextIndexParams; import io.qdrant.client.grpc.Collections.TokenizerType; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createPayloadIndexAsync( "{collection_name}", "name_of_the_field_to_index", PayloadSchemaType.Text, PayloadIndexParams.newBuilder() .setTextIndexParams( TextIndexParams.newBuilder() .setTokenizer(TokenizerType.Word) .setMinTokenLen(2) .setMaxTokenLen(10) .setLowercase(true) .build()) .build(), null, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new 
QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "name_of_the_field_to_index", schemaType: PayloadSchemaType.Text, indexParams: new PayloadIndexParams { TextIndexParams = new TextIndexParams { Tokenizer = TokenizerType.Word, MinTokenLen = 2, MaxTokenLen = 10, Lowercase = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "name_of_the_field_to_index", FieldType: qdrant.FieldType_FieldTypeText.Enum(), FieldIndexParams: qdrant.NewPayloadIndexParamsText( &qdrant.TextIndexParams{ Tokenizer: qdrant.TokenizerType_Whitespace, MinTokenLen: qdrant.PtrOf(uint64(2)), MaxTokenLen: qdrant.PtrOf(uint64(10)), Lowercase: qdrant.PtrOf(true), }), }) ``` Available tokenizers are: - `word` \- splits the string into words, separated by spaces, punctuation marks, and special characters. - `whitespace` \- splits the string into words, separated by spaces. - `prefix` \- splits the string into words, separated by spaces, punctuation marks, and special characters, and then creates a prefix index for each word. For example: `hello` will be indexed as `h`, `he`, `hel`, `hell`, `hello`. - `multilingual` \- special type of tokenizer based on [charabia](https://github.com/meilisearch/charabia) package. It allows proper tokenization and lemmatization for multiple languages, including those with non-latin alphabets and non-space delimiters. See [charabia documentation](https://github.com/meilisearch/charabia) for full list of supported languages supported normalization options. In the default build configuration, qdrant does not include support for all languages, due to the increasing size of the resulting binary. Chinese, Japanese and Korean languages are not enabled by default, but can be enabled by building qdrant from source with `--features multiling-chinese,multiling-japanese,multiling-korean` flags. See [Full Text match](https://qdrant.tech/documentation/concepts/filtering/#full-text-match) for examples of querying with full-text index. ### [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#parameterized-index) Parameterized index _Available as of v1.8.0_ We’ve added a parameterized variant to the `integer` index, which allows you to fine-tune indexing and search performance. Both the regular and parameterized `integer` indexes use the following flags: - `lookup`: enables support for direct lookup using [Match](https://qdrant.tech/documentation/concepts/filtering/#match) filters. - `range`: enables support for [Range](https://qdrant.tech/documentation/concepts/filtering/#range) filters. The regular `integer` index assumes both `lookup` and `range` are `true`. In contrast, to configure a parameterized index, you would set only one of these filters to `true`: | `lookup` | `range` | Result | | --- | --- | --- | | `true` | `true` | Regular integer index | | `true` | `false` | Parameterized integer index | | `false` | `true` | Parameterized integer index | | `false` | `false` | No integer index | The parameterized index can enhance performance in collections with millions of points. We encourage you to try it out. If it does not enhance performance in your use case, you can always restore the regular `integer` index. 
Note: If you set `"lookup": true` with a range filter, that may lead to significant performance issues. For example, the following code sets up a parameterized integer index which supports only range filters: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "name_of_the_field_to_index", "field_schema": { "type": "integer", "lookup": false, "range": true } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_payload_index( collection_name="{collection_name}", field_name="name_of_the_field_to_index", field_schema=models.IntegerIndexParams( type=models.IntegerIndexType.INTEGER, lookup=False, range=True, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createPayloadIndex("{collection_name}", { field_name: "name_of_the_field_to_index", field_schema: { type: "integer", lookup: false, range: true, }, }); ``` ```rust use qdrant_client::qdrant::{ payload_index_params::IndexParams, CreateFieldIndexCollectionBuilder, FieldType, IntegerIndexParams, PayloadIndexParams, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "name_of_the_field_to_index", FieldType::Integer, ) .field_index_params(PayloadIndexParams { index_params: Some(IndexParams::IntegerIndexParams(IntegerIndexParams { lookup: false, range: true, })), }), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.IntegerIndexParams; import io.qdrant.client.grpc.Collections.PayloadIndexParams; import io.qdrant.client.grpc.Collections.PayloadSchemaType; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createPayloadIndexAsync( "{collection_name}", "name_of_the_field_to_index", PayloadSchemaType.Integer, PayloadIndexParams.newBuilder() .setIntegerIndexParams( IntegerIndexParams.newBuilder().setLookup(false).setRange(true).build()) .build(), null, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "name_of_the_field_to_index", schemaType: PayloadSchemaType.Integer, indexParams: new PayloadIndexParams { IntegerIndexParams = new() { Lookup = false, Range = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "name_of_the_field_to_index", FieldType: qdrant.FieldType_FieldTypeInteger.Enum(), FieldIndexParams: qdrant.NewPayloadIndexParamsInt( &qdrant.IntegerIndexParams{ Lookup: false, Range: true, }), }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#on-disk-payload-index) On-disk payload index _Available as of v1.11.0_ By default all payload-related structures are stored in memory. In this way, the vector index can quickly access payload values during search. As latency in this case is critical, it is recommended to keep hot payload indexes in memory. There are, however, cases when payload indexes are too large or rarely used. 
In those cases, it is possible to store payload indexes on disk. To configure on-disk payload index, you can use the following index parameters: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "payload_field_name", "field_schema": { "type": "keyword", "on_disk": true } } ``` ```python client.create_payload_index( collection_name="{collection_name}", field_name="payload_field_name", field_schema=models.KeywordIndexParams( type=models.KeywordIndexType.KEYWORD, on_disk=True, ), ) ``` ```typescript client.createPayloadIndex("{collection_name}", { field_name: "payload_field_name", field_schema: { type: "keyword", on_disk: true }, }); ``` ```rust use qdrant_client::qdrant::{ CreateFieldIndexCollectionBuilder, KeywordIndexParamsBuilder, FieldType }; use qdrant_client::{Qdrant, QdrantError}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "payload_field_name", FieldType::Keyword, ) .field_index_params( KeywordIndexParamsBuilder::default() .on_disk(true), ), ); ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.PayloadIndexParams; import io.qdrant.client.grpc.Collections.PayloadSchemaType; import io.qdrant.client.grpc.Collections.KeywordIndexParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createPayloadIndexAsync( "{collection_name}", "payload_field_name", PayloadSchemaType.Keyword, PayloadIndexParams.newBuilder() .setKeywordIndexParams( KeywordIndexParams.newBuilder() .setOnDisk(true) .build()) .build(), null, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "payload_field_name", schemaType: PayloadSchemaType.Keyword, indexParams: new PayloadIndexParams { KeywordIndexParams = new KeywordIndexParams { OnDisk = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "name_of_the_field_to_index", FieldType: qdrant.FieldType_FieldTypeKeyword.Enum(), FieldIndexParams: qdrant.NewPayloadIndexParamsKeyword( &qdrant.KeywordIndexParams{ OnDisk: qdrant.PtrOf(true), }), }) ``` Payload index on-disk is supported for following types: - `keyword` - `integer` - `float` - `datetime` - `uuid` - `text` - `geo` The list will be extended in future versions. ### [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#tenant-index) Tenant Index _Available as of v1.11.0_ Many vector search use-cases require multitenancy. In a multi-tenant scenario the collection is expected to contain multiple subsets of data, where each subset belongs to a different tenant. Qdrant supports efficient multi-tenant search by enabling [special configuration](https://qdrant.tech/documentation/guides/multiple-partitions/) vector index, which disables global search and only builds sub-indexes for each tenant. However, knowing that the collection contains multiple tenants unlocks more opportunities for optimization. To optimize storage in Qdrant further, you can enable tenant indexing for payload fields. 
This option will tell Qdrant which fields are used for tenant identification and will allow Qdrant to structure storage for faster search of tenant-specific data. One example of such optimization is localizing tenant-specific data closer on disk, which will reduce the number of disk reads during search. To enable tenant index for a field, you can use the following index parameters: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "payload_field_name", "field_schema": { "type": "keyword", "is_tenant": true } } ``` ```python client.create_payload_index( collection_name="{collection_name}", field_name="payload_field_name", field_schema=models.KeywordIndexParams( type=models.KeywordIndexType.KEYWORD, is_tenant=True, ), ) ``` ```typescript client.createPayloadIndex("{collection_name}", { field_name: "payload_field_name", field_schema: { type: "keyword", is_tenant: true }, }); ``` ```rust use qdrant_client::qdrant::{ CreateFieldIndexCollectionBuilder, KeywordIndexParamsBuilder, FieldType }; use qdrant_client::{Qdrant, QdrantError}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "payload_field_name", FieldType::Keyword, ) .field_index_params( KeywordIndexParamsBuilder::default() .is_tenant(true), ), ); ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.PayloadIndexParams; import io.qdrant.client.grpc.Collections.PayloadSchemaType; import io.qdrant.client.grpc.Collections.KeywordIndexParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createPayloadIndexAsync( "{collection_name}", "payload_field_name", PayloadSchemaType.Keyword, PayloadIndexParams.newBuilder() .setKeywordIndexParams( KeywordIndexParams.newBuilder() .setIsTenant(true) .build()) .build(), null, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "payload_field_name", schemaType: PayloadSchemaType.Keyword, indexParams: new PayloadIndexParams { KeywordIndexParams = new KeywordIndexParams { IsTenant = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "name_of_the_field_to_index", FieldType: qdrant.FieldType_FieldTypeKeyword.Enum(), FieldIndexParams: qdrant.NewPayloadIndexParamsKeyword( &qdrant.KeywordIndexParams{ IsTenant: qdrant.PtrOf(true), }), }) ``` Tenant optimization is supported for the following datatypes: - `keyword` - `uuid` ### [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#principal-index) Principal Index _Available as of v1.11.0_ Similar to the tenant index, the principal index is used to optimize storage for faster search, assuming that the search request is primarily filtered by the principal field. A good example of a use case for the principal index is time-related data, where each point is associated with a timestamp. In this case, the principal index can be used to optimize storage for faster search with time-based filters. 
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "timestamp", "field_schema": { "type": "integer", "is_principal": true } } ``` ```python client.create_payload_index( collection_name="{collection_name}", field_name="timestamp", field_schema=models.IntegerIndexParams( type=models.IntegerIndexType.INTEGER, is_principal=True, ), ) ``` ```typescript client.createPayloadIndex("{collection_name}", { field_name: "timestamp", field_schema: { type: "integer", is_principal: true }, }); ``` ```rust use qdrant_client::qdrant::{ CreateFieldIndexCollectionBuilder, IntegerIndexParamsBuilder, FieldType }; use qdrant_client::{Qdrant, QdrantError}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "timestamp", FieldType::Integer, ) .field_index_params( IntegerIndexParamsBuilder::default() .is_principal(true), ), ); ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.PayloadIndexParams; import io.qdrant.client.grpc.Collections.PayloadSchemaType; import io.qdrant.client.grpc.Collections.IntegerIndexParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createPayloadIndexAsync( "{collection_name}", "timestamp", PayloadSchemaType.Integer, PayloadIndexParams.newBuilder() .setIntegerIndexParams( IntegerIndexParams.newBuilder() .setIsPrincipal(true) .build()) .build(), null, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "timestamp", schemaType: PayloadSchemaType.Integer, indexParams: new PayloadIndexParams { IntegerIndexParams = new IntegerIndexParams { IsPrincipal = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "timestamp", FieldType: qdrant.FieldType_FieldTypeInteger.Enum(), FieldIndexParams: qdrant.NewPayloadIndexParamsInt( &qdrant.IntegerIndexParams{ IsPrincipal: qdrant.PtrOf(true), }), }) ``` Principal optimization is supported for the following types: - `integer` - `float` - `datetime` ## [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#vector-index) Vector Index A vector index is a data structure built on vectors through a specific mathematical model. Through the vector index, we can efficiently query several vectors similar to the target vector. Qdrant currently only uses HNSW as a dense vector index. [HNSW](https://arxiv.org/abs/1603.09320) (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layer navigation structure over the stored vectors according to certain rules. In this structure, the upper layers are more sparse and the distances between nodes are farther. The lower layers are denser and the distances between nodes are closer. The search starts from the uppermost layer, finds the node closest to the target in this layer, and then enters the next layer to begin another search. After multiple iterations, it can quickly approach the target position.
In order to improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to `m`. In addition, you can use `ef_construct` (when building the index) or `ef` (when searching targets) to specify a search range. The corresponding parameters can be configured in the configuration file: ```yaml storage: # Default parameters of HNSW Index. Could be overridden for each collection or named vector individually hnsw_index: # Number of edges per node in the index graph. # Larger the value - more accurate the search, more space required. m: 16 # Number of neighbours to consider during the index building. # Larger the value - more accurate the search, more time required to build index. ef_construct: 100 # Minimal size threshold (in KiloBytes) below which full-scan is preferred over HNSW search. # This measures the total size of vectors being queried against. # When the maximum estimated amount of points that a condition satisfies is smaller than # `full_scan_threshold_kb`, the query planner will use full-scan search instead of HNSW index # traversal for better performance. # Note: 1Kb = 1 vector of size 256 full_scan_threshold: 10000 ``` The same parameters can also be set when creating a [collection](https://qdrant.tech/documentation/concepts/collections/). The `ef` parameter is configured during [the search](https://qdrant.tech/documentation/concepts/search/) and by default is equal to `ef_construct`. HNSW is chosen for several reasons. First, HNSW is well-compatible with the modification that allows Qdrant to use filters during a search. Second, it is one of the most accurate and fastest algorithms, according to [public benchmarks](https://github.com/erikbern/ann-benchmarks). _Available as of v1.1.1_ The HNSW parameters can also be configured on a collection and named vector level by setting [`hnsw_config`](https://qdrant.tech/documentation/concepts/indexing/#vector-index) to fine-tune search performance. ## [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#sparse-vector-index) Sparse Vector Index _Available as of v1.7.0_ Sparse vectors in Qdrant are indexed with a special data structure, which is optimized for vectors that have a high proportion of zeroes. In some ways, this indexing method is similar to the inverted index, which is used in text search engines. - A sparse vector index in Qdrant is exact, meaning it does not use any approximation algorithms. - All sparse vectors added to the collection are immediately indexed in the mutable version of a sparse index. With Qdrant, you can benefit from a more compact and efficient immutable sparse index, which is constructed during the same optimization process as the dense vector index. This approach is particularly useful for collections storing both dense and sparse vectors.
To configure a sparse vector index, create a collection with the following parameters: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "sparse_vectors": { "text": { "index": { "on_disk": false } } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config={}, sparse_vectors_config={ "text": models.SparseVectorParams( index=models.SparseIndexParams(on_disk=False), ) }, ) ``` ```typescript import { QdrantClient, Schemas } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { sparse_vectors: { "splade-model-name": { index: { on_disk: false } } } }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, SparseIndexConfigBuilder, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut sparse_vectors_config = SparseVectorsConfigBuilder::default(); sparse_vectors_config.add_named_vector_params( "splade-model-name", SparseVectorParamsBuilder::default() .index(SparseIndexConfigBuilder::default().on_disk(true)), ); client .create_collection( CreateCollectionBuilder::new("{collection_name}") .sparse_vectors_config(sparse_vectors_config), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.createCollectionAsync( Collections.CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setSparseVectorsConfig( Collections.SparseVectorConfig.newBuilder().putMap( "splade-model-name", Collections.SparseVectorParams.newBuilder() .setIndex( Collections.SparseIndexConfig .newBuilder() .setOnDisk(false) .build() ).build() ).build() ).build() ).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", sparseVectorsConfig: ("splade-model-name", new SparseVectorParams{ Index = new SparseIndexConfig { OnDisk = false, } }) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "splade-model-name": { Index: &qdrant.SparseIndexConfig{ OnDisk: qdrant.PtrOf(false), }}, }), }) ``` \` The following parameters may affect performance: - `on_disk: true` \- The index is stored on disk, which lets you save memory. This may slow down search performance. - `on_disk: false` \- The index is still persisted on disk, but it is also loaded into memory for faster search. Unlike a dense vector index, a sparse vector index does not require a pre-defined vector size. It automatically adjusts to the size of the vectors added to the collection. **Note:** A sparse vector index only supports dot-product similarity searches. It does not support other distance metrics. 
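For illustration, here is how a query against such a sparse vector might look with the Python client. This is a minimal sketch: the indices and values are placeholders, it uses the `text` sparse vector name from the REST and Python examples above, and it assumes a recent `qdrant-client` with the Query API:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Sparse queries are scored with the dot product over the matching indices.
result = client.query_points(
    collection_name="{collection_name}",
    query=models.SparseVector(indices=[1, 42, 512], values=[0.22, 0.8, 0.11]),
    using="text",  # name of the sparse vector to query
    limit=10,
)
```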
### [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#idf-modifier) IDF Modifier _Available as of v1.10.0_ For many search algorithms, it is important to consider how often an item occurs in a collection. Intuitively speaking, the less frequently an item appears in a collection, the more important it is in a search. This is also known as the Inverse Document Frequency (IDF). It is used in text search engines to rank search results based on the rarity of a word in a collection. IDF depends on the currently stored documents and therefore can’t be pre-computed in the sparse vectors in streaming inference mode. In order to support IDF in the sparse vector index, Qdrant provides an option to modify the sparse vector query with the IDF statistics automatically. The only requirement is to enable the IDF modifier in the collection configuration: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "sparse_vectors": { "text": { "modifier": "idf" } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config={}, sparse_vectors_config={ "text": models.SparseVectorParams( modifier=models.Modifier.IDF, ), }, ) ``` ```typescript import { QdrantClient, Schemas } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { sparse_vectors: { "text": { modifier: "idf" } } }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Modifier, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, }; use qdrant_client::{Qdrant, QdrantError}; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut sparse_vectors_config = SparseVectorsConfigBuilder::default(); sparse_vectors_config.add_named_vector_params( "text", SparseVectorParamsBuilder::default().modifier(Modifier::Idf), ); client .create_collection( CreateCollectionBuilder::new("{collection_name}") .sparse_vectors_config(sparse_vectors_config), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Modifier; import io.qdrant.client.grpc.Collections.SparseVectorConfig; import io.qdrant.client.grpc.Collections.SparseVectorParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setSparseVectorsConfig( SparseVectorConfig.newBuilder() .putMap("text", SparseVectorParams.newBuilder().setModifier(Modifier.Idf).build())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", sparseVectorsConfig: ("text", new SparseVectorParams { Modifier = Modifier.Idf, }) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "text": { Modifier: qdrant.Modifier_Idf.Enum(), }, }), }) ``` Qdrant uses the following formula to calculate the IDF modifier: 
$$\text{IDF}(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)$$

Where:

- `N` is the total number of documents in the collection.
- `n(q_i)` is the number of documents containing non-zero values for the given vector element.

## [Anchor](https://qdrant.tech/documentation/concepts/indexing/\#filtrable-index) Filtrable Index

A payload index and a vector index, each used separately, cannot fully solve the problem of filtered search. In the case of weak filters, you can use the HNSW index as it is. In the case of stringent filters, you can use the payload index and a complete rescore. However, for cases in the middle, this approach does not work well.

On the one hand, we cannot apply a full scan on too many vectors. On the other hand, the HNSW graph starts to fall apart when using too strict filters.

![HNSW fail](https://qdrant.tech/docs/precision_by_m.png)

![hnsw graph](https://qdrant.tech/docs/graph.gif)

You can find more information on why this happens in our [blog post](https://blog.vasnetsov.com/posts/categorical-hnsw/).

Qdrant solves this problem by extending the HNSW graph with additional edges based on the stored payload values. Extra edges allow you to efficiently search for nearby vectors using the HNSW index and apply filters as you search in the graph.

This approach minimizes the overhead on condition checks since you only need to calculate the conditions for a small fraction of the points involved in the search.

<|page-83-lllmstxt|>

## backups

- [Documentation](https://qdrant.tech/documentation/)
- [Cloud](https://qdrant.tech/documentation/cloud/)
- Backup Clusters

---

# [Anchor](https://qdrant.tech/documentation/cloud/backups/\#backing-up-qdrant-cloud-clusters) Backing up Qdrant Cloud Clusters

Qdrant organizes cloud instances as clusters. On occasion, you may need to restore your cluster because of application or system failure.

You may already have a source of truth for your data in a regular database. If you have a problem, you could reindex the data into your Qdrant vector search cluster. However, this process can take time. For projects with critical high-availability requirements, we recommend replication. It guarantees the proper cluster functionality as long as at least one replica is running. For other use cases, such as disaster recovery, you can set up automatic or self-service backups.

## [Anchor](https://qdrant.tech/documentation/cloud/backups/\#prerequisites) Prerequisites

You can back up your Qdrant clusters through the Qdrant Cloud Dashboard at [https://cloud.qdrant.io](https://cloud.qdrant.io/).
This section assumes that you’ve already set up your cluster, as described in the following sections:

- [Create a cluster](https://qdrant.tech/documentation/cloud/create-cluster/)
- Set up [Authentication](https://qdrant.tech/documentation/cloud/authentication/)
- Configure one or more [Collections](https://qdrant.tech/documentation/concepts/collections/)

## [Anchor](https://qdrant.tech/documentation/cloud/backups/\#automatic-backups) Automatic Backups

You can set up automatic backups of your clusters with our Cloud UI. With the procedures listed on this page, you can set up snapshots on a daily/weekly/monthly basis. You can keep as many snapshots as you need. You can restore a cluster from the snapshot of your choice.

> Note: When you restore a snapshot, consider the following:
>
> - The affected cluster is not available while a snapshot is being restored.
> - If you changed the cluster setup after the copy was created, the cluster
>   resets to the previous configuration.
> - The previous configuration includes:
>   - CPU
>   - Memory
>   - Node count
>   - Qdrant version

### [Anchor](https://qdrant.tech/documentation/cloud/backups/\#configure-a-backup) Configure a Backup

After you have taken the prerequisite steps, you can configure a backup with the [Qdrant Cloud Dashboard](https://cloud.qdrant.io/). To do so, take these steps:

1. Go to the **Cluster Detail Page** and select the **Backups** tab.
2. Set up a backup schedule. The **Days of Retention** value is the number of days after which a backup snapshot is deleted.
3. Alternatively, you can select **Backup now** to take an immediate snapshot.

![Configure a cluster backup](https://qdrant.tech/documentation/cloud/backup-schedule.png)

### [Anchor](https://qdrant.tech/documentation/cloud/backups/\#restore-a-backup) Restore a Backup

If you have a backup, it appears in the list of **Available Backups**. You can choose to restore or delete the backups of your choice.

![Restore or delete a cluster backup](https://qdrant.tech/documentation/cloud/restore-delete.png)

## [Anchor](https://qdrant.tech/documentation/cloud/backups/\#backups-with-a-snapshot) Backups With a Snapshot

Qdrant also offers a snapshot API which allows you to create a snapshot of a specific collection or your entire cluster. For more information, see our [snapshot documentation](https://qdrant.tech/documentation/concepts/snapshots/).

Here is how you can take a snapshot and recover a collection:

1. Take a snapshot:
   - For a single-node cluster, call the snapshot endpoint on the exposed URL.
   - For a multi-node cluster, call the snapshot endpoint on each node that hosts the collection. Specifically, prepend `node-{num}-` to your cluster URL. Then call the [snapshot endpoint](https://qdrant.tech/documentation/concepts/snapshots/#create-snapshot) on the individual hosts. Start with node 0.
   - In the response, you’ll see the name of the snapshot.
2. Delete and recreate the collection.
3. Recover the snapshot:
   - Call the [recover endpoint](https://qdrant.tech/documentation/concepts/snapshots/#recover-in-cluster-deployment). Set a location which points to the snapshot file ( `file:///qdrant/snapshots/{collection_name}/{snapshot_file_name}`) for each host.

## [Anchor](https://qdrant.tech/documentation/cloud/backups/\#backup-considerations) Backup Considerations

Backups are incremental for AWS and GCP clusters. For example, if you have two backups, backup number 2 contains only the data that changed since backup number 1. This reduces the total cost of your backups.
For Azure clusters, backups are based on total disk usage. The cost is calculated as half of the disk usage when the backup was taken.

You can create multiple backup schedules.

When you restore a snapshot, any changes made after the date of the snapshot are lost.

<|page-84-lllmstxt|>

## qdrant-1.3.x

- [Articles](https://qdrant.tech/articles/)
- Introducing Qdrant 1.3.0

[Back to Qdrant Articles](https://qdrant.tech/articles/)

---

# Introducing Qdrant 1.3.0

David Sertic · June 26, 2023

![Introducing Qdrant 1.3.0](https://qdrant.tech/articles_data/qdrant-1.3.x/preview/title.jpg)

A brand-new [Qdrant 1.3.0 release](https://github.com/qdrant/qdrant/releases/tag/v1.3.0) comes packed with a plethora of new features, performance improvements and bug fixes:

1. Asynchronous I/O interface: Reduce overhead by managing I/O operations asynchronously, thus minimizing context switches.
2. Oversampling for Quantization: Improve the accuracy and performance of your queries while using Scalar or Product Quantization.
3. Grouping API lookup: Storage optimization method that lets you look for points in another collection using group ids.
4. Qdrant Web UI: A convenient dashboard to help you manage data stored in Qdrant.
5. Temp directory for Snapshots: Set a separate storage directory for temporary snapshots on a faster disk.
6. Other important changes

Your feedback is valuable to us, and we are always trying to include some of your feature requests into our roadmap. Join [our Discord community](https://qdrant.to/discord) and help us build Qdrant!

## [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#new-features) New features

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#asychronous-io-interface) Asynchronous I/O interface

Going forward, we will support the `io_uring` asynchronous interface for storage devices on Linux-based systems. Since its introduction, `io_uring` has been proven to speed up slow-disk deployments as it decouples kernel work from the I/O process.

This interface uses two ring buffers to queue and manage I/O operations asynchronously, avoiding costly context switches and reducing overhead. Unlike mmap, it frees the user threads to do computations instead of waiting for the kernel to complete.

![io_uring](https://qdrant.tech/articles_data/qdrant-1.3.x/io-uring.png)

#### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#enable-the-interface-from-your-config-file) Enable the interface from your config file:

```yaml
storage:
  # enable the async scorer which uses io_uring
  async_scorer: true
```

You can return to the mmap-based backend by either deleting the `async_scorer` entry or setting the value to `false`.

This optimization will mainly benefit workloads with lots of disk IO (e.g. querying on-disk collections with rescoring).
Please keep in mind that this feature is experimental and that the interface may change in future versions.

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#oversampling-for-quantization) Oversampling for quantization

We are introducing [oversampling](https://qdrant.tech/documentation/guides/quantization/#oversampling) as a new way to help you improve the accuracy and performance of similarity search algorithms. With this method, you are able to significantly compress high-dimensional vectors in memory and then compensate for the accuracy loss by re-scoring additional points with the original vectors.

You will experience much faster performance with quantization due to parallel disk usage when reading vectors. Much better IO means that you can keep quantized vectors in RAM, so the pre-selection will be even faster. Finally, once pre-selection is done, you can use parallel IO to retrieve original vectors, which is significantly faster than traversing HNSW on slow disks.

#### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#set-the-oversampling-factor-via-query) Set the oversampling factor via query:

Here is how you can configure the oversampling factor - define how many extra vectors should be pre-selected using the quantized index, and then re-scored using the original vectors.

httppython

```http
POST /collections/{collection_name}/points/search
{
    "params": {
        "quantization": {
            "ignore": false,
            "rescore": true,
            "oversampling": 2.4
        }
    },
    "vector": [0.2, 0.1, 0.9, 0.7],
    "limit": 100
}
```

```python
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient("localhost", port=6333)

client.search(
    collection_name="{collection_name}",
    query_vector=[0.2, 0.1, 0.9, 0.7],
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(
            ignore=False,
            rescore=True,
            oversampling=2.4
        )
    )
)
```

In this case, if `oversampling` is 2.4 and `limit` is 100, then 240 vectors will be pre-selected using the quantized index, and the top 100 points will be returned after re-scoring with the unquantized vectors.

As you can see from the example above, this parameter is set during the query. This is a flexible method that will let you tune query accuracy. While the index is not changed, you can decide how many points you want to retrieve using quantized vectors.

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#grouping-api-lookup) Grouping API lookup

In version 1.2.0, we introduced a mechanism for requesting groups of points. Our new feature extends this functionality by giving you the option to look for points in another collection using the group ids. We wanted to add this feature, since having a single point for the shared data of the same item optimizes storage use, particularly if the payload is large.

This has the extra benefit of having a single point to update when the information shared by the points in a group changes.

![Group Lookup](https://qdrant.tech/articles_data/qdrant-1.3.x/group-lookup.png)

For example, if you have a collection of documents, you may want to chunk them and store the points for the chunks in a separate collection, making sure that you store the point id of the document each chunk belongs to in the payload of the chunk point.
#### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#adding-the-parameter-to-grouping-api-request) Adding the parameter to grouping API request:

When using the grouping API, add the `with_lookup` parameter to bring the information from those points into each group:

httppython

```http
POST /collections/chunks/points/search/groups
{
    // Same as in the regular search API
    "vector": [1.1],
    ...,

    // Grouping parameters
    "group_by": "document_id",
    "limit": 2,
    "group_size": 2,

    // Lookup parameters
    "with_lookup": {
        // Name of the collection to look up points in
        "collection_name": "documents",

        // Options for specifying what to bring from the payload
        // of the looked up point, true by default
        "with_payload": ["title", "text"],

        // Options for specifying what to bring from the vector(s)
        // of the looked up point, true by default
        "with_vectors": false
    }
}
```

```python
client.search_groups(
    collection_name="chunks",

    # Same as in the regular search() API
    query_vector=[1.1],
    ...,

    # Grouping parameters
    group_by="document_id",  # Path of the field to group by
    limit=2,                 # Max amount of groups
    group_size=2,            # Max amount of points per group

    # Lookup parameters
    with_lookup=models.WithLookup(
        # Name of the collection to look up points in
        collection_name="documents",

        # Options for specifying what to bring from the payload
        # of the looked up point, True by default
        with_payload=["title", "text"],

        # Options for specifying what to bring from the vector(s)
        # of the looked up point, True by default
        with_vectors=False,
    )
)
```

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#qdrant-web-user-interface) Qdrant web user interface

We are excited to announce a more user-friendly way to organize and work with your collections inside of Qdrant. Our dashboard’s design is simple, but very intuitive and easy to access.

Try it out now! If you have Docker running, you can [quickstart Qdrant](https://qdrant.tech/documentation/quick-start/) and access the Dashboard locally from [http://localhost:6333/dashboard](http://localhost:6333/dashboard). You should see this simple access point to Qdrant:

![Qdrant Web UI](https://qdrant.tech/articles_data/qdrant-1.3.x/web-ui.png)

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#temporary-directory-for-snapshots) Temporary directory for Snapshots

Currently, temporary snapshot files are created inside the `/storage` directory. Oftentimes `/storage` is a network-mounted disk. Therefore, we found this method suboptimal because `/storage` is limited in disk size and also because writing data to it may affect disk performance as it consumes bandwidth. This new feature allows you to specify a different directory on another disk that is faster. We expect this feature to significantly optimize cloud performance.

To change it, access `config.yaml` and set `storage.temp_path` to another directory location.

## [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#important-changes) Important changes

The latest release focuses not only on the new features but also introduces some changes making Qdrant even more reliable.

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#optimizing-group-requests) Optimizing group requests

Internally, `is_empty` was not using the index when it was called, so it had to deserialize the whole payload to see if the key had values or not. Our new update makes sure to check the index first, before confirming with the payload if it is actually `empty`/ `null`, so these changes improve performance only when the negated condition is true (e.g.
it improves when the field is not empty). Going forward, this will improve the way grouping API requests are handled.

### [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#faster-read-access-with-mmap) Faster read access with mmap

If you used mmap, you most likely found that segments were always created with cold caches. The first request to the database had to read from disk, which made startup slower despite plenty of RAM being available. We have implemented a way to ask the kernel to “heat up” the disk cache and make initialization much faster.

The function is expected to be used on startup and after segment optimization and reloading of newly indexed segments. So far this is only implemented for “immutable” memmaps.

## [Anchor](https://qdrant.tech/articles/qdrant-1.3.x/\#release-notes) Release notes

As usual, [our release notes](https://github.com/qdrant/qdrant/releases/tag/v1.3.0) describe all the changes introduced in the latest version.

<|page-85-lllmstxt|>

## distributed_deployment

- [Documentation](https://qdrant.tech/documentation/)
- [Guides](https://qdrant.tech/documentation/guides/)
- Distributed Deployment

---

# [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#distributed-deployment) Distributed deployment

Since version v0.8.0 Qdrant supports a distributed deployment mode. In this mode, multiple Qdrant services communicate with each other to distribute the data across the peers to extend the storage capabilities and increase stability.

## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#how-many-qdrant-nodes-should-i-run) How many Qdrant nodes should I run?

The ideal number of Qdrant nodes depends on how much you value cost-saving, resilience, and performance/scalability in relation to each other.

- **Prioritizing cost-saving**: If cost is most important to you, run a single Qdrant node. This is not recommended for production environments. Drawbacks:
  - Resilience: Users will experience downtime during node restarts, and recovery is not possible unless you have backups or snapshots.
  - Performance: Limited to the resources of a single server.
- **Prioritizing resilience**: If resilience is most important to you, run a Qdrant cluster with three or more nodes and two or more shard replicas. Clusters with three or more nodes and replication can perform all operations even while one node is down. Additionally, they gain performance benefits from load-balancing, and they can recover from the permanent loss of one node without the need for backups or snapshots (but backups are still strongly recommended). This is the recommended setup for production environments. Drawbacks:
  - Cost: Larger clusters are more costly than smaller clusters, which is the only drawback of this configuration.
- **Balancing cost, resilience, and performance**: Running a two-node Qdrant cluster with replicated shards allows the cluster to respond to most read/write requests even when one node is down, such as during maintenance events. Having two nodes also means greater performance than a single-node cluster while still being cheaper than a three-node cluster. Drawbacks: - Resilience (uptime): The cluster cannot perform operations on collections when one node is down. Those operations require >50% of nodes to be running, so this is only possible in a 3+ node cluster. Since creating, editing, and deleting collections are usually rare operations, many users find this drawback to be negligible. - Resilience (data integrity): If the data on one of the two nodes is permanently lost or corrupted, it cannot be recovered aside from snapshots or backups. Only 3+ node clusters can recover from the permanent loss of a single node since recovery operations require >50% of the cluster to be healthy. - Cost: Replicating your shards requires storing two copies of your data. - Performance: The maximum performance of a Qdrant cluster increases as you add more nodes. In summary, single-node clusters are best for non-production workloads, replicated 3+ node clusters are the gold standard, and replicated 2-node clusters strike a good balance. ## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#enabling-distributed-mode-in-self-hosted-qdrant) Enabling distributed mode in self-hosted Qdrant To enable distributed deployment - enable the cluster mode in the [configuration](https://qdrant.tech/documentation/guides/configuration/) or using the ENV variable: `QDRANT__CLUSTER__ENABLED=true`. ```yaml cluster: # Use `enabled: true` to run Qdrant in distributed deployment mode enabled: true # Configuration of the inter-cluster communication p2p: # Port for internal communication between peers port: 6335 # Configuration related to distributed consensus algorithm consensus: # How frequently peers should ping each other. # Setting this parameter to lower value will allow consensus # to detect disconnected node earlier, but too frequent # tick period may create significant network and CPU overhead. # We encourage you NOT to change this parameter unless you know what you are doing. tick_period_ms: 100 ``` By default, Qdrant will use port `6335` for its internal communication. All peers should be accessible on this port from within the cluster, but make sure to isolate this port from outside access, as it might be used to perform write operations. Additionally, you must provide the `--uri` flag to the first peer so it can tell other nodes how it should be reached: ```bash ./qdrant --uri 'http://qdrant_node_1:6335' ``` Subsequent peers in a cluster must know at least one node of the existing cluster to synchronize through it with the rest of the cluster. To do this, they need to be provided with a bootstrap URL: ```bash ./qdrant --bootstrap 'http://qdrant_node_1:6335' ``` The URL of the new peers themselves will be calculated automatically from the IP address of their request. But it is also possible to provide them individually using the `--uri` argument. ```text USAGE: qdrant [OPTIONS] OPTIONS: --bootstrap Uri of the peer to bootstrap from in case of multi-peer deployment. If not specified - this peer will be considered as a first in a new deployment --uri Uri of this peer. Other peers should be able to reach it by this uri. This value has to be supplied if this is the first peer in a new deployment. 
In case this is not the first peer and it bootstraps the value is optional. If not supplied then qdrant will take internal grpc port from config and derive the IP address of this peer on bootstrap peer (receiving side) ``` After a successful synchronization you can observe the state of the cluster through the [REST API](https://api.qdrant.tech/master/api-reference/distributed/cluster-status): ```http GET /cluster ``` Example result: ```json { "result": { "status": "enabled", "peer_id": 11532566549086892000, "peers": { "9834046559507417430": { "uri": "http://172.18.0.3:6335/" }, "11532566549086892528": { "uri": "http://qdrant_node_1:6335/" } }, "raft_info": { "term": 1, "commit": 4, "pending_operations": 1, "leader": 11532566549086892000, "role": "Leader" } }, "status": "ok", "time": 5.731e-06 } ``` Note that enabling distributed mode does not automatically replicate your data. See the section on [making use of a new distributed Qdrant cluster](https://qdrant.tech/documentation/guides/distributed_deployment/#making-use-of-a-new-distributed-qdrant-cluster) for the next steps. ## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#enabling-distributed-mode-in-qdrant-cloud) Enabling distributed mode in Qdrant Cloud For best results, first ensure your cluster is running Qdrant v1.7.4 or higher. Older versions of Qdrant do support distributed mode, but improvements in v1.7.4 make distributed clusters more resilient during outages. In the [Qdrant Cloud console](https://cloud.qdrant.io/), click “Scale Up” to increase your cluster size to >1. Qdrant Cloud configures the distributed mode settings automatically. After the scale-up process completes, you will have a new empty node running alongside your existing node(s). To replicate data into this new empty node, see the next section. ## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#making-use-of-a-new-distributed-qdrant-cluster) Making use of a new distributed Qdrant cluster When you enable distributed mode and scale up to two or more nodes, your data does not move to the new node automatically; it starts out empty. To make use of your new empty node, do one of the following: - Create a new replicated collection by setting the [replication\_factor](https://qdrant.tech/documentation/guides/distributed_deployment/#replication-factor) to 2 or more and setting the [number of shards](https://qdrant.tech/documentation/guides/distributed_deployment/#choosing-the-right-number-of-shards) to a multiple of your number of nodes. - If you have an existing collection which does not contain enough shards for each node, you must create a new collection as described in the previous bullet point. - If you already have enough shards for each node and you merely need to replicate your data, follow the directions for [creating new shard replicas](https://qdrant.tech/documentation/guides/distributed_deployment/#creating-new-shard-replicas). - If you already have enough shards for each node and your data is already replicated, you can move data (without replicating it) onto the new node(s) by [moving shards](https://qdrant.tech/documentation/guides/distributed_deployment/#moving-shards). ## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#raft) Raft Qdrant uses the [Raft](https://raft.github.io/) consensus protocol to maintain consistency regarding the cluster topology and the collections structure. Operations on points, on the other hand, do not go through the consensus infrastructure. 
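Because point operations bypass consensus, a client that needs to read its own writes can ask Qdrant to wait until an operation is applied before the call returns. A minimal sketch with the Python client (collection name and vector are placeholders):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# With wait=True the call returns only after the write has been applied,
# so a follow-up read will observe the new point.
client.upsert(
    collection_name="{collection_name}",
    points=[models.PointStruct(id=1, vector=[0.1, 0.2, 0.3])],
    wait=True,
)
```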
Qdrant is not intended to have strong transaction guarantees, which allows it to perform point operations with low overhead. In practice, it means that Qdrant does not guarantee atomic distributed updates but allows you to wait until the [operation is complete](https://qdrant.tech/documentation/concepts/points/#awaiting-result) to see the results of your writes. Operations on collections, on the contrary, are part of the consensus which guarantees that all operations are durable and eventually executed by all nodes. In practice it means that a majority of nodes agree on what operations should be applied before the service will perform them. Practically, it means that if the cluster is in a transition state - either electing a new leader after a failure or starting up, the collection update operations will be denied. You may use the cluster [REST API](https://api.qdrant.tech/master/api-reference/distributed/cluster-status) to check the state of the consensus. ## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#sharding) Sharding A Collection in Qdrant is made of one or more shards. A shard is an independent store of points which is able to perform all operations provided by collections. There are two methods of distributing points across shards: - **Automatic sharding**: Points are distributed among shards by using a [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing) algorithm, so that shards are managing non-intersecting subsets of points. This is the default behavior. - **User-defined sharding**: _Available as of v1.7.0_ \- Each point is uploaded to a specific shard, so that operations can hit only the shard or shards they need. Even with this distribution, shards still ensure having non-intersecting subsets of points. [See more…](https://qdrant.tech/documentation/guides/distributed_deployment/#user-defined-sharding) Each node knows where all parts of the collection are stored through the [consensus protocol](https://qdrant.tech/documentation/guides/distributed_deployment/#raft), so when you send a search request to one Qdrant node, it automatically queries all other nodes to obtain the full search result. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#choosing-the-right-number-of-shards) Choosing the right number of shards When you create a collection, Qdrant splits the collection into `shard_number` shards. If left unset, `shard_number` is set to the number of nodes in your cluster when the collection was created. The `shard_number` cannot be changed without recreating the collection. 
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 300, "distance": "Cosine" }, "shard_number": 6 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE), shard_number=6, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 300, distance: "Cosine", }, shard_number: 6, }); ``` ```rust use qdrant_client::qdrant::{CreateCollectionBuilder, Distance, VectorParamsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(300, Distance::Cosine)) .shard_number(6), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(300) .setDistance(Distance.Cosine) .build()) .build()) .setShardNumber(6) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 300, Distance = Distance.Cosine }, shardNumber: 6 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 300, Distance: qdrant.Distance_Cosine, }), ShardNumber: qdrant.PtrOf(uint32(6)), }) ``` To ensure all nodes in your cluster are evenly utilized, the number of shards must be a multiple of the number of nodes you are currently running in your cluster. > Aside: Advanced use cases such as multitenancy may require an uneven distribution of shards. See [Multitenancy](https://qdrant.tech/articles/multitenancy/). We recommend creating at least 2 shards per node to allow future expansion without having to re-shard. [Resharding](https://qdrant.tech/documentation/guides/distributed_deployment/#resharding) is possible when using our cloud offering, but should be avoided if hosting elsewhere as it would require creating a new collection. If you anticipate a lot of growth, we recommend 12 shards since you can expand from 1 node up to 2, 3, 6, and 12 nodes without having to re-shard. Having more than 12 shards in a small cluster may not be worth the performance overhead. 
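As a quick, throwaway illustration of the divisibility rule above (plain Python, not a Qdrant API): with 12 shards, any cluster size that divides 12 keeps the shards evenly spread.

```python
shard_number = 12

# Cluster sizes that keep 12 shards evenly distributed across nodes.
for nodes in range(1, shard_number + 1):
    if shard_number % nodes == 0:
        print(f"{nodes} node(s): {shard_number // nodes} shard(s) per node")
```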
Shards are evenly distributed across all existing nodes when a collection is first created, but Qdrant does not automatically rebalance shards if your cluster size or replication factor changes (since this is an expensive operation on large clusters). See the next section for how to move shards after scaling operations. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#resharding) Resharding _Available as of v1.13.0 in Cloud_ Resharding allows you to change the number of shards in your existing collections if you’re hosting with our [Cloud](https://qdrant.tech/documentation/cloud-intro/) offering. Resharding can change the number of shards both up and down, without having to recreate the collection from scratch. Please refer to the [Resharding](https://qdrant.tech/documentation/cloud/cluster-scaling/#resharding) section in our cloud documentation for more details. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#moving-shards) Moving shards _Available as of v0.9.0_ Qdrant allows moving shards between nodes in the cluster and removing nodes from the cluster. This functionality unlocks the ability to dynamically scale the cluster size without downtime. It also allows you to upgrade or migrate nodes without downtime. Qdrant provides the information regarding the current shard distribution in the cluster with the [Collection Cluster info API](https://api.qdrant.tech/master/api-reference/distributed/collection-cluster-info). Use the [Update collection cluster setup API](https://api.qdrant.tech/master/api-reference/distributed/update-collection-cluster) to initiate the shard transfer: ```http POST /collections/{collection_name}/cluster { "move_shard": { "shard_id": 0, "from_peer_id": 381894127, "to_peer_id": 467122995 } } ``` After the transfer is initiated, the service will process it based on the used [transfer method](https://qdrant.tech/documentation/guides/distributed_deployment/#shard-transfer-method) keeping both shards in sync. Once the transfer is completed, the old shard is deleted from the source node. In case you want to downscale the cluster, you can move all shards away from a peer and then remove the peer using the [remove peer API](https://api.qdrant.tech/master/api-reference/distributed/remove-peer). ```http DELETE /cluster/peer/{peer_id} ``` After that, Qdrant will exclude the node from the consensus, and the instance will be ready for shutdown. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#user-defined-sharding) User-defined sharding _Available as of v1.7.0_ Qdrant allows you to specify the shard for each point individually. This feature is useful if you want to control the shard placement of your data, so that operations can hit only the subset of shards they actually need. In big clusters, this can significantly improve the performance of operations that do not require the whole collection to be scanned. A clear use-case for this feature is managing a multi-tenant collection, where each tenant (let it be a user or organization) is assumed to be segregated, so they can have their data stored in separate shards. To enable user-defined sharding, set `sharding_method` to `custom` during collection creation: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "shard_number": 1, "sharding_method": "custom" // ... 
other collection parameters } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", shard_number=1, sharding_method=models.ShardingMethod.CUSTOM, # ... other collection parameters ) client.create_shard_key("{collection_name}", "{shard_key}") ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { shard_number: 1, sharding_method: "custom", // ... other collection parameters }); client.createShardKey("{collection_name}", { shard_key: "{shard_key}" }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, CreateShardKeyBuilder, CreateShardKeyRequestBuilder, Distance, ShardingMethod, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(300, Distance::Cosine)) .shard_number(1) .sharding_method(ShardingMethod::Custom.into()), ) .await?; client .create_shard_key( CreateShardKeyRequestBuilder::new("{collection_name}") .request(CreateShardKeyBuilder::default().shard_key("{shard_key".to_string())), ) .await?; ``` ```java import static io.qdrant.client.ShardKeyFactory.shardKey; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.ShardingMethod; import io.qdrant.client.grpc.Collections.CreateShardKey; import io.qdrant.client.grpc.Collections.CreateShardKeyRequest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") // ... other collection parameters .setShardNumber(1) .setShardingMethod(ShardingMethod.Custom) .build()) .get(); client.createShardKeyAsync(CreateShardKeyRequest.newBuilder() .setCollectionName("{collection_name}") .setRequest(CreateShardKey.newBuilder() .setShardKey(shardKey("{shard_key}")) .build()) .build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", // ... other collection parameters shardNumber: 1, shardingMethod: ShardingMethod.Custom ); await client.CreateShardKeyAsync( "{collection_name}", new CreateShardKey { ShardKey = new ShardKey { Keyword = "{shard_key}", } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", // ... other collection parameters ShardNumber: qdrant.PtrOf(uint32(1)), ShardingMethod: qdrant.ShardingMethod_Custom.Enum(), }) client.CreateShardKey(context.Background(), "{collection_name}", &qdrant.CreateShardKey{ ShardKey: qdrant.NewShardKey("{shard_key}"), }) ``` In this mode, the `shard_number` means the number of shards per shard key, where points will be distributed evenly. 
For example, if you have 10 shard keys and a collection config with these settings:

```json
{
    "shard_number": 1,
    "sharding_method": "custom",
    "replication_factor": 2
}
```

Then you will have `1 * 10 * 2 = 20` total physical shards in the collection.

Physical shards require a large amount of resources, so make sure your custom sharding key has a low cardinality. For large cardinality keys, it is recommended to use [partition by payload](https://qdrant.tech/documentation/guides/multiple-partitions/#partition-by-payload) instead.

To specify the shard for each point, you need to provide the `shard_key` field in the upsert request:

httppythontypescriptrustjavacsharpgo

```http
PUT /collections/{collection_name}/points
{
    "points": [
        {
            "id": 1111,
            "vector": [0.1, 0.2, 0.3]
        }
    ],
    "shard_key": "user_1"
}
```

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.upsert(
    collection_name="{collection_name}",
    points=[
        models.PointStruct(
            id=1111,
            vector=[0.1, 0.2, 0.3],
        ),
    ],
    shard_key_selector="user_1",
)
```

```typescript
client.upsert("{collection_name}", {
  points: [
    {
      id: 1111,
      vector: [0.1, 0.2, 0.3],
    },
  ],
  shard_key: "user_1",
});
```

```rust
use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder};
use qdrant_client::Payload;

client
    .upsert_points(
        UpsertPointsBuilder::new(
            "{collection_name}",
            vec![PointStruct::new(
                111,
                vec![0.1, 0.2, 0.3],
                Payload::default(),
            )],
        )
        .shard_key_selector("user_1".to_string()),
    )
    .await?;
```

```java
import java.util.List;

import static io.qdrant.client.PointIdFactory.id;
import static io.qdrant.client.ShardKeySelectorFactory.shardKeySelector;
import static io.qdrant.client.VectorsFactory.vectors;

import io.qdrant.client.QdrantClient;
import io.qdrant.client.QdrantGrpcClient;
import io.qdrant.client.grpc.Points.PointStruct;
import io.qdrant.client.grpc.Points.UpsertPoints;

QdrantClient client =
    new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build());

client
    .upsertAsync(
        UpsertPoints.newBuilder()
            .setCollectionName("{collection_name}")
            .addAllPoints(
                List.of(
                    PointStruct.newBuilder()
                        .setId(id(111))
                        .setVectors(vectors(0.1f, 0.2f, 0.3f))
                        .build()))
            .setShardKeySelector(shardKeySelector("user_1"))
            .build())
    .get();
```

```csharp
using Qdrant.Client;
using Qdrant.Client.Grpc;

var client = new QdrantClient("localhost", 6334);

await client.UpsertAsync(
    collectionName: "{collection_name}",
    points: new List<PointStruct>
    {
        new() { Id = 111, Vectors = new[] { 0.1f, 0.2f, 0.3f } }
    },
    shardKeySelector: new ShardKeySelector { ShardKeys = { new List<ShardKey> { "user_1" } } }
);
```

```go
import (
	"context"

	"github.com/qdrant/go-client/qdrant"
)

client, err := qdrant.NewClient(&qdrant.Config{
	Host: "localhost",
	Port: 6334,
})

client.Upsert(context.Background(), &qdrant.UpsertPoints{
	CollectionName: "{collection_name}",
	Points: []*qdrant.PointStruct{
		{
			Id:      qdrant.NewIDNum(111),
			Vectors: qdrant.NewVectors(0.1, 0.2, 0.3),
		},
	},
	ShardKeySelector: &qdrant.ShardKeySelector{
		ShardKeys: []*qdrant.ShardKey{
			qdrant.NewShardKey("user_1"),
		},
	},
})
```

**Note:** When using custom sharding, IDs are only enforced to be unique within a shard key. This means that you can have multiple points with the same ID, if they have different shard keys. This is a limitation of the current implementation, and is an anti-pattern that should be avoided because it can lead to points with the same ID having different contents. In the future, we plan to add a global ID uniqueness check.
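Reads can be scoped to a shard key in the same way. A minimal sketch with the Python client (assuming the collection and shard key from the example above, and a recent `qdrant-client` with the Query API):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Only the shards behind "user_1" are queried; omitting the selector
# fans the request out to all shards.
hits = client.query_points(
    collection_name="{collection_name}",
    query=[0.1, 0.2, 0.3],
    limit=10,
    shard_key_selector="user_1",
)
```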
Now you can target the operations to specific shard(s) by specifying the `shard_key` on any operation you do. Operations that do not specify the shard key will be executed on **all** shards.

Another use-case would be to have shards that track the data chronologically, so that you can do more complex itineraries like uploading live data in one shard and archiving it once a certain age has passed.

![Sharding per day](https://qdrant.tech/docs/sharding-per-day.png)

### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#shard-transfer-method) Shard transfer method

_Available as of v1.7.0_

There are different methods for transferring a shard, such as moving or replicating, to another node. Depending on what performance and guarantees you’d like to have and how you’d like to manage your cluster, you likely want to choose a specific method. Each method has its own pros and cons. Which is fastest depends on the size and state of a shard.

Available shard transfer methods are:

- `stream_records`: _(default)_ transfer by streaming just its records to the target node in batches.
- `snapshot`: transfer including its index and quantized data by utilizing a [snapshot](https://qdrant.tech/documentation/concepts/snapshots/) automatically.
- `wal_delta`: _(auto recovery default)_ transfer by resolving [WAL](https://qdrant.tech/documentation/concepts/storage/#versioning) difference; the operations that were missed.

Each has pros, cons and specific requirements, some of which are:

| Method: | Stream records | Snapshot | WAL delta |
| --- | --- | --- | --- |
| **Version** | v0.8.0+ | v1.7.0+ | v1.8.0+ |
| **Target** | New/existing shard | New/existing shard | Existing shard |
| **Connectivity** | Internal gRPC API (6335) | REST API (6333)<br>Internal gRPC API (6335) | Internal gRPC API (6335) |
| **HNSW index** | Doesn’t transfer, will reindex on target. | Does transfer, immediately ready on target. | Doesn’t transfer, may index on target. |
| **Quantization** | Doesn’t transfer, will requantize on target. | Does transfer, immediately ready on target. | Doesn’t transfer, may quantize on target. |
| **Ordering** | Unordered updates on target[1](https://qdrant.tech/documentation/guides/distributed_deployment/#fn:1) | Ordered updates on target[2](https://qdrant.tech/documentation/guides/distributed_deployment/#fn:2) | Ordered updates on target[2](https://qdrant.tech/documentation/guides/distributed_deployment/#fn:2) |
| **Disk space** | No extra required | Extra required for snapshot on both nodes | No extra required |

To select a shard transfer method, specify the `method` like:

```http
POST /collections/{collection_name}/cluster
{
    "move_shard": {
        "shard_id": 0,
        "from_peer_id": 381894127,
        "to_peer_id": 467122995,
        "method": "snapshot"
    }
}
```

The `stream_records` transfer method is the simplest available. It simply transfers all shard records in batches to the target node until it has transferred all of them, keeping both shards in sync. It will also make sure the transferred shard indexing process is keeping up before performing a final switch. The method has two common disadvantages:

1. It does not transfer index or quantization data, meaning that the shard has to be optimized again on the new node, which can be very expensive.
2. The ordering guarantees are `weak` [1](https://qdrant.tech/documentation/guides/distributed_deployment/#fn:1), which is not suitable for some applications.

Because it is so simple, it’s also very robust, making it a reliable choice if the above cons are acceptable in your use case. If your cluster is unstable and out of resources, it’s probably best to use the `stream_records` transfer method, because it is unlikely to fail.

The `snapshot` transfer method utilizes [snapshots](https://qdrant.tech/documentation/concepts/snapshots/) to transfer a shard. A snapshot is created automatically. It is then transferred and restored on the target node. After this is done, the snapshot is removed from both nodes.

While the snapshot/transfer/restore operation is happening, the source node queues up all new operations. All queued updates are then sent in order to the target shard to bring it into the same state as the source. There are two important benefits:

1. It transfers index and quantization data, so that the shard does not have to be optimized again on the target node, making them immediately available. This way, Qdrant ensures that there will be no degradation in performance at the end of the transfer. Especially on large shards, this can give a huge performance improvement.
2. The ordering guarantees can be `strong` [2](https://qdrant.tech/documentation/guides/distributed_deployment/#fn:2), required for some applications.

The `wal_delta` transfer method only transfers the difference between two shards. More specifically, it transfers all operations that were missed to the target shard. The [WAL](https://qdrant.tech/documentation/concepts/storage/#versioning) of both shards is used to resolve this.

There are two benefits:

1. It will be very fast because it only transfers the difference rather than all data.
2. The ordering guarantees can be `strong` [2](https://qdrant.tech/documentation/guides/distributed_deployment/#fn:2), required for some applications.

Two disadvantages are: 1.
It can only be used to transfer to a shard that already exists on the other node. 2. Applicability is limited because the WALs normally don’t hold more than 64MB of recent operations. But that should be enough for a node that quickly restarts, to upgrade for example. If a delta cannot be resolved, this method automatically falls back to `stream_records` which equals transferring the full shard. The `stream_records` method is currently used as default. This may change in the future. As of Qdrant 1.9.0 `wal_delta` is used for automatic shard replications to recover dead shards. ## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#replication) Replication Qdrant allows you to replicate shards between nodes in the cluster. Shard replication increases the reliability of the cluster by keeping several copies of a shard spread across the cluster. This ensures the availability of the data in case of node failures, except if all replicas are lost. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#replication-factor) Replication factor When you create a collection, you can control how many shard replicas you’d like to store by changing the `replication_factor`. By default, `replication_factor` is set to “1”, meaning no additional copy is maintained automatically. The default can be changed in the [Qdrant configuration](https://qdrant.tech/documentation/guides/configuration/#configuration-options). You can change that by setting the `replication_factor` when you create a collection. The `replication_factor` can be updated for an existing collection, but the effect of this depends on how you’re running Qdrant. If you’re hosting the open source version of Qdrant yourself, changing the replication factor after collection creation doesn’t do anything. You can manually [create](https://qdrant.tech/documentation/guides/distributed_deployment/#creating-new-shard-replicas) or drop shard replicas to achieve your desired replication factor. In Qdrant Cloud (including Hybrid Cloud, Private Cloud) your shards will automatically be replicated or dropped to match your configured replication factor. 
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 300, "distance": "Cosine" }, "shard_number": 6, "replication_factor": 2 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE), shard_number=6, replication_factor=2, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 300, distance: "Cosine", }, shard_number: 6, replication_factor: 2, }); ``` ```rust use qdrant_client::qdrant::{CreateCollectionBuilder, Distance, VectorParamsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(300, Distance::Cosine)) .shard_number(6) .replication_factor(2), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(300) .setDistance(Distance.Cosine) .build()) .build()) .setShardNumber(6) .setReplicationFactor(2) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 300, Distance = Distance.Cosine }, shardNumber: 6, replicationFactor: 2 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 300, Distance: qdrant.Distance_Cosine, }), ShardNumber: qdrant.PtrOf(uint32(6)), ReplicationFactor: qdrant.PtrOf(uint32(2)), }) ``` This code sample creates a collection with a total of 6 logical shards backed by a total of 12 physical shards. Since a replication factor of “2” would require twice as much storage space, it is advised to make sure the hardware can host the additional shard replicas beforehand. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#creating-new-shard-replicas) Creating new shard replicas It is possible to create or delete replicas manually on an existing collection using the [Update collection cluster setup API](https://api.qdrant.tech/master/api-reference/distributed/update-collection-cluster). This is usually only necessary if you run Qdrant open-source. In Qdrant Cloud shard replication is handled and updated automatically, matching the configured `replication_factor`. A replica can be added on a specific peer by specifying the peer from which to replicate. 
```http
POST /collections/{collection_name}/cluster
{
  "replicate_shard": {
    "shard_id": 0,
    "from_peer_id": 381894127,
    "to_peer_id": 467122995
  }
}
```

And a replica can be removed on a specific peer.

```http
POST /collections/{collection_name}/cluster
{
  "drop_replica": {
    "shard_id": 0,
    "peer_id": 381894127
  }
}
```

Keep in mind that a collection must contain at least one active replica of a shard.

### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#error-handling) Error handling

Replicas can be in different states:

- Active: healthy and ready to serve traffic
- Dead: unhealthy and not ready to serve traffic
- Partial: currently under resynchronization before activation

A replica is marked as dead if it does not respond to internal health checks or if it fails to serve traffic. A dead replica will not receive traffic from other peers and might require manual intervention if it does not recover automatically. This mechanism ensures data consistency and availability if a subset of the replicas fail during an update operation.

### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#node-failure-recovery) Node Failure Recovery

Sometimes hardware malfunctions might render some nodes of the Qdrant cluster unrecoverable. No system is immune to this. But several recovery scenarios allow Qdrant to stay available for requests and even avoid performance degradation. Let’s walk through them from best to worst.

**Recover with replicated collection**

If the number of failed nodes is less than the replication factor of the collection, then your cluster should still be able to perform read, search and update queries. Now, if the failed node restarts, consensus will trigger the replication process to update the recovering node with the newest updates it has missed. If the failed node never restarts, you can recover the lost shards if you have a 3+ node cluster. You cannot recover lost shards in smaller clusters because recovery operations go through [raft](https://qdrant.tech/documentation/guides/distributed_deployment/#raft), which requires >50% of the nodes to be healthy.

**Recreate node with replicated collections**

If a node fails and it is impossible to recover it, you should exclude the dead node from the consensus and create an empty node. To exclude failed nodes from the consensus, use the [remove peer](https://api.qdrant.tech/master/api-reference/distributed/remove-peer) API. Apply the `force` flag if necessary. When you create a new node, make sure to attach it to the existing cluster by specifying the `--bootstrap` CLI parameter with the URL of any of the running cluster nodes.

Once the new node is ready and synchronized with the cluster, you might want to ensure that the collection shards are replicated enough. Remember that Qdrant will not automatically balance shards since this is an expensive operation. Use the [Replicate Shard Operation](https://api.qdrant.tech/master/api-reference/distributed/update-collection-cluster) to create another copy of the shard on the newly connected node.

It’s worth mentioning that Qdrant only provides the necessary building blocks to create automated failure recovery. Building a completely automatic process of collection scaling would require control over the cluster machines themselves. Check out our [cloud solution](https://qdrant.to/cloud), where we do exactly that.

**Recover from snapshot**

If there are no copies of data in the cluster, it is still possible to recover from a snapshot.
Follow the same steps to detach the failed node and create a new one in the cluster:

- To exclude failed nodes from the consensus, use the [remove peer](https://api.qdrant.tech/master/api-reference/distributed/remove-peer) API. Apply the `force` flag if necessary.
- Create a new node, making sure to attach it to the existing cluster by specifying the `--bootstrap` CLI parameter with the URL of any of the running cluster nodes.

Snapshot recovery, used in single-node deployment, is different from the cluster one. Consensus manages all metadata about all collections and does not require snapshots to recover it. But you can use snapshots to recover missing shards of the collections.

Use the [Collection Snapshot Recovery API](https://qdrant.tech/documentation/concepts/snapshots/#recover-in-cluster-deployment) to do it. The service will download the specified snapshot of the collection and recover shards with data from it. Once all shards of the collection are recovered, the collection will become operational again.

### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#temporary-node-failure) Temporary node failure

If properly configured, running Qdrant in distributed mode can make your cluster resistant to outages when one node fails temporarily. Here is how differently configured Qdrant clusters respond:

- 1-node clusters: All operations time out or fail for up to a few minutes. It depends on how long it takes to restart and load data from disk.
- 2-node clusters where shards ARE NOT replicated: All operations will time out or fail for up to a few minutes. It depends on how long it takes to restart and load data from disk.
- 2-node clusters where all shards ARE replicated to both nodes: All requests except for operations on collections continue to work during the outage.
- 3+-node clusters where all shards are replicated to at least 2 nodes: All requests continue to work during the outage.

## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#consistency-guarantees) Consistency guarantees

By default, Qdrant focuses on availability and maximum throughput of search operations. For the majority of use cases, this is a preferable trade-off. During the normal state of operation, it is possible to search and modify data from any peers in the cluster. Before responding to the client, the peer handling the request dispatches all operations according to the current topology in order to keep the data synchronized across the cluster.

- reads use a partial fan-out strategy to optimize latency and availability
- writes are executed in parallel on all active sharded replicas

![Embeddings](https://qdrant.tech/docs/concurrent-operations-replicas.png)

However, in some cases, it is necessary to ensure additional guarantees during possible hardware instabilities, mass concurrent updates of the same documents, etc. Qdrant provides a few options to control consistency guarantees:

- `write_consistency_factor` - defines the number of replicas that must acknowledge a write operation before responding to the client. Increasing this value will make write operations tolerant to network partitions in the cluster, but will require a higher number of replicas to be active to perform write operations.
- The read `consistency` param can be used with search and retrieve operations to ensure that the results obtained from all replicas are the same. If this option is used, Qdrant will perform the read operation on multiple replicas and resolve the result according to the selected strategy.
This option is useful to avoid data inconsistency in case of concurrent updates of the same documents. This options is preferred if the update operations are frequent and the number of replicas is low. - Write `ordering` param, can be used with update and delete operations to ensure that the operations are executed in the same order on all replicas. If this option is used, Qdrant will route the operation to the leader replica of the shard and wait for the response before responding to the client. This option is useful to avoid data inconsistency in case of concurrent updates of the same documents. This options is preferred if read operations are more frequent than update and if search performance is critical. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#write-consistency-factor) Write consistency factor The `write_consistency_factor` represents the number of replicas that must acknowledge a write operation before responding to the client. It is set to 1 by default. It can be configured at the collection’s creation or when updating the collection parameters. This value can range from 1 to the number of replicas you have for each shard. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 300, "distance": "Cosine" }, "shard_number": 6, "replication_factor": 2, "write_consistency_factor": 2 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE), shard_number=6, replication_factor=2, write_consistency_factor=2, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 300, distance: "Cosine", }, shard_number: 6, replication_factor: 2, write_consistency_factor: 2, }); ``` ```rust use qdrant_client::qdrant::{CreateCollectionBuilder, Distance, VectorParamsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(300, Distance::Cosine)) .shard_number(6) .replication_factor(2) .write_consistency_factor(2), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(300) .setDistance(Distance.Cosine) .build()) .build()) .setShardNumber(6) .setReplicationFactor(2) .setWriteConsistencyFactor(2) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 300, Distance = Distance.Cosine }, shardNumber: 6, replicationFactor: 2, writeConsistencyFactor: 2 ); ``` ```go import ( "context" 
"github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 300, Distance: qdrant.Distance_Cosine, }), ShardNumber: qdrant.PtrOf(uint32(6)), ReplicationFactor: qdrant.PtrOf(uint32(2)), WriteConsistencyFactor: qdrant.PtrOf(uint32(2)), }) ``` Write operations will fail if the number of active replicas is less than the `write_consistency_factor`. In this case, the client is expected to send the operation again to ensure a consistent state is reached. Setting the `write_consistency_factor` to a lower value may allow accepting writes even if there are unresponsive nodes. Unresponsive nodes are marked as dead and will automatically be recovered once available to ensure data consistency. The configuration of the `write_consistency_factor` is important for adjusting the cluster’s behavior when some nodes go offline due to restarts, upgrades, or failures. By default, the cluster continues to accept updates as long as at least one replica of each shard is online. However, this behavior means that once an offline replica is restored, it will require additional synchronization with the rest of the cluster. In some cases, this synchronization can be resource-intensive and undesirable. Setting the `write_consistency_factor` to match the replication factor modifies the cluster’s behavior so that unreplicated updates are rejected, preventing the need for extra synchronization. If the update is applied to enough replicas - according to the `write_consistency_factor` \- the update will return a successful status. Any replicas that failed to apply the update will be temporarily disabled and are automatically recovered to keep data consistency. If the update could not be applied to enough replicas, it’ll return an error and may be partially applied. The user must submit the operation again to ensure data consistency. For asynchronous updates and injection pipelines capable of handling errors and retries, this strategy might be preferable. ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#read-consistency) Read consistency Read `consistency` can be specified for most read requests and will ensure that the returned result is consistent across cluster nodes. 
- `all` will query all nodes and return points, which present on all of them - `majority` will query all nodes and return points, which present on the majority of them - `quorum` will query randomly selected majority of nodes and return points, which present on all of them - `1`/ `2`/ `3`/etc - will query specified number of randomly selected nodes and return points which present on all of them - default `consistency` is `1` httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query?consistency=majority { "query": [0.2, 0.1, 0.9, 0.7], "filter": { "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ] }, "params": { "hnsw_ef": 128, "exact": false }, "limit": 3 } ``` ```python client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], query_filter=models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(\ value="London",\ ),\ )\ ] ), search_params=models.SearchParams(hnsw_ef=128, exact=False), limit=3, consistency="majority", ) ``` ```typescript client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], filter: { must: [{ key: "city", match: { value: "London" } }], }, params: { hnsw_ef: 128, exact: false, }, limit: 3, consistency: "majority", }); ``` ```rust use qdrant_client::qdrant::{ read_consistency::Value, Condition, Filter, QueryPointsBuilder, ReadConsistencyType, SearchParamsBuilder, }; use qdrant_client::{Qdrant, QdrantError}; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .filter(Filter::must([Condition::matches(\ "city",\ "London".to_string(),\ )])) .params(SearchParamsBuilder::default().hnsw_ef(128).exact(false)) .read_consistency(Value::Type(ReadConsistencyType::Majority.into())), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.ReadConsistency; import io.qdrant.client.grpc.Points.ReadConsistencyType; import io.qdrant.client.grpc.Points.SearchParams; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.ConditionFactory.matchKeyword; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter(Filter.newBuilder().addMust(matchKeyword("city", "London")).build()) .setQuery(nearest(.2f, 0.1f, 0.9f, 0.7f)) .setParams(SearchParams.newBuilder().setHnswEf(128).setExact(false).build()) .setLimit(3) .setReadConsistency( ReadConsistency.newBuilder().setType(ReadConsistencyType.Majority).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, filter: MatchKeyword("city", "London"), searchParams: new SearchParams { HnswEf = 128, Exact = false }, limit: 3, readConsistency: new ReadConsistency { Type = ReadConsistencyType.Majority } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: 
qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), }, }, Params: &qdrant.SearchParams{ HnswEf: qdrant.PtrOf(uint64(128)), }, Limit: qdrant.PtrOf(uint64(3)), ReadConsistency: qdrant.NewReadConsistencyType(qdrant.ReadConsistencyType_Majority), }) ``` ### [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#write-ordering) Write ordering Write `ordering` can be specified for any write request to serialize it through a single “leader” node, which ensures that all write operations (issued with the same `ordering`) are performed and observed sequentially. - `weak` _(default)_ ordering does not provide any additional guarantees, so write operations can be freely reordered. - `medium` ordering serializes all write operations through a dynamically elected leader, which might cause minor inconsistencies in case of leader change. - `strong` ordering serializes all write operations through the permanent leader, which provides strong consistency, but write operations may be unavailable if the leader is down. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points?ordering=strong { "batch": { "ids": [1, 2, 3], "payloads": [\ {"color": "red"},\ {"color": "green"},\ {"color": "blue"}\ ], "vectors": [\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ [0.1, 0.1, 0.9]\ ] } } ``` ```python client.upsert( collection_name="{collection_name}", points=models.Batch( ids=[1, 2, 3], payloads=[\ {"color": "red"},\ {"color": "green"},\ {"color": "blue"},\ ], vectors=[\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ [0.1, 0.1, 0.9],\ ], ), ordering=models.WriteOrdering.STRONG, ) ``` ```typescript client.upsert("{collection_name}", { batch: { ids: [1, 2, 3], payloads: [{ color: "red" }, { color: "green" }, { color: "blue" }], vectors: [\ [0.9, 0.1, 0.1],\ [0.1, 0.9, 0.1],\ [0.1, 0.1, 0.9],\ ], }, ordering: "strong", }); ``` ```rust use qdrant_client::qdrant::{ PointStruct, UpsertPointsBuilder, WriteOrdering, WriteOrderingType }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![\ PointStruct::new(1, vec![0.9, 0.1, 0.1], [("color", "red".into())]),\ PointStruct::new(2, vec![0.1, 0.9, 0.1], [("color", "green".into())]),\ PointStruct::new(3, vec![0.1, 0.1, 0.9], [("color", "blue".into())]),\ ], ) .ordering(WriteOrdering { r#type: WriteOrderingType::Strong.into(), }), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.grpc.Points.PointStruct; import io.qdrant.client.grpc.Points.UpsertPoints; import io.qdrant.client.grpc.Points.WriteOrdering; import io.qdrant.client.grpc.Points.WriteOrderingType; client .upsertAsync( UpsertPoints.newBuilder() .setCollectionName("{collection_name}") .addAllPoints( List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(0.9f, 0.1f, 0.1f)) .putAllPayload(Map.of("color", value("red"))) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors(vectors(0.1f, 0.9f, 0.1f)) .putAllPayload(Map.of("color", value("green"))) .build(), PointStruct.newBuilder() .setId(id(3)) .setVectors(vectors(0.1f, 0.1f, 0.94f)) .putAllPayload(Map.of("color", value("blue"))) .build())) .setOrdering(WriteOrdering.newBuilder().setType(WriteOrderingType.Strong).build()) .build()) .get(); ``` ```csharp 
using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List<PointStruct> { new() { Id = 1, Vectors = new[] { 0.9f, 0.1f, 0.1f }, Payload = { ["color"] = "red" } }, new() { Id = 2, Vectors = new[] { 0.1f, 0.9f, 0.1f }, Payload = { ["color"] = "green" } }, new() { Id = 3, Vectors = new[] { 0.1f, 0.1f, 0.9f }, Payload = { ["color"] = "blue" } } }, ordering: WriteOrderingType.Strong ); ```

```go
import (
    "context"

    "github.com/qdrant/go-client/qdrant"
)

client, err := qdrant.NewClient(&qdrant.Config{
    Host: "localhost",
    Port: 6334,
})

client.Upsert(context.Background(), &qdrant.UpsertPoints{
    CollectionName: "{collection_name}",
    Points: []*qdrant.PointStruct{
        {
            Id:      qdrant.NewIDNum(1),
            Vectors: qdrant.NewVectors(0.9, 0.1, 0.1),
            Payload: qdrant.NewValueMap(map[string]any{"color": "red"}),
        },
        {
            Id:      qdrant.NewIDNum(2),
            Vectors: qdrant.NewVectors(0.1, 0.9, 0.1),
            Payload: qdrant.NewValueMap(map[string]any{"color": "green"}),
        },
        {
            Id:      qdrant.NewIDNum(3),
            Vectors: qdrant.NewVectors(0.1, 0.1, 0.9),
            Payload: qdrant.NewValueMap(map[string]any{"color": "blue"}),
        },
    },
    Ordering: &qdrant.WriteOrdering{
        Type: qdrant.WriteOrderingType_Strong,
    },
})
```

## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#listener-mode) Listener mode

In some cases it might be useful to have a Qdrant node that only accumulates data and does not participate in search operations. There are several scenarios where this can be useful:

- The listener option can be used to store data in a separate node, which can be used for backup purposes or to store data for a long time.
- A listener node can be used to synchronize data into another region, while still performing search operations in the local region.

To enable listener mode, set `node_type` to `Listener` in the config file:

```yaml
storage:
  node_type: "Listener"
```

A listener node will not participate in search operations, but will still accept write operations and will store the data in the local storage. All shards stored on the listener node will be converted to the `Listener` state.

Additionally, all write requests sent to the listener node will be processed with the `wait=false` option, which means that write operations will be considered successful once they are written to the WAL. This mechanism should help minimize upsert latency in the case of parallel snapshotting.

## [Anchor](https://qdrant.tech/documentation/guides/distributed_deployment/\#consensus-checkpointing) Consensus Checkpointing

Consensus checkpointing is a technique used in Raft to improve performance and simplify log management by periodically creating a consistent snapshot of the system state. This snapshot represents a point in time where all nodes in the cluster have reached agreement on the state, and it can be used to truncate the log, reducing the amount of data that needs to be stored and transferred between nodes.

For example, if you attach a new node to the cluster, it should replay all the log entries to catch up with the current state. In long-running clusters, this can take a long time, and the log can grow very large. To prevent this, one can use a special checkpointing mechanism that truncates the log and creates a snapshot of the current state.
To use this feature, simply call the `/cluster/recover` API on the required node:

```http
POST /cluster/recover
```

This API can be triggered on any non-leader node. It will send a request to the current consensus leader to create a snapshot. The leader will in turn send the snapshot back to the requesting node for application.

In some cases, this API can be used to recover from an inconsistent cluster state by forcing a snapshot creation.

* * *

1. Weak ordering for updates: All records are streamed to the target node in order. New updates are received on the target node in parallel, while the transfer of records is still happening. We therefore have `weak` ordering, regardless of what [ordering](https://qdrant.tech/documentation/guides/distributed_deployment/#write-ordering) is used for updates.
2. Strong ordering for updates: A snapshot of the shard is created, transferred, and recovered on the target node. That ensures the state of the shard is kept consistent. New updates are queued on the source node, and transferred in order to the target node. Updates therefore have the same [ordering](https://qdrant.tech/documentation/guides/distributed_deployment/#write-ordering) as the user selects, making `strong` ordering possible.

<|page-86-lllmstxt|>

## cluster-access

---

# [Anchor](https://qdrant.tech/documentation/cloud/cluster-access/\#accessing-qdrant-cloud-clusters) Accessing Qdrant Cloud Clusters

Once you have [created](https://qdrant.tech/documentation/cloud/create-cluster/) a cluster and set up an [API key](https://qdrant.tech/documentation/cloud/authentication/), you can access your cluster through the integrated Cluster UI, the REST API, and the gRPC API.

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-access/\#cluster-ui) Cluster UI

There is a convenient link on the cluster detail page in the Qdrant Cloud Console to access the [Cluster UI](https://qdrant.tech/documentation/web-ui/).

![Cluster Cluster UI](https://qdrant.tech/documentation/cloud/cloud-db-dashboard.png)

The Overview tab also contains direct links to explore Qdrant tutorials and sample datasets.
![Cluster Cluster UI Tutorials](https://qdrant.tech/documentation/cloud/cloud-db-deeplinks.png)

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-access/\#api) API

The REST API is exposed on your cluster endpoint at port `6333`. The gRPC API is exposed on your cluster endpoint at port `6334`. When accessing the cluster endpoint, traffic is automatically load balanced across all healthy Qdrant nodes in the cluster.

For all operations, except the few mentioned in [Node specific endpoints](https://qdrant.tech/documentation/cloud/cluster-access/#node-specific-endpoints), you should use the cluster endpoint. It does not matter which node in the cluster you land on. All nodes can handle all search and write requests.

![Cluster cluster endpoint](https://qdrant.tech/documentation/cloud/cloud-endpoint.png)

Have a look at the [API reference](https://qdrant.tech/documentation/interfaces/#api-reference) and the official [client libraries](https://qdrant.tech/documentation/interfaces/#client-libraries) for more information on how to interact with the Qdrant Cloud API.

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-access/\#node-specific-endpoints) Node Specific Endpoints

Next to the cluster endpoint, which load balances requests across all healthy Qdrant nodes, each node in the cluster has its own endpoint as well. This is mainly useful for monitoring or manual shard management purposes. You can find the node specific endpoints on the cluster detail page in the Qdrant Cloud Console.

![Cluster node endpoints](https://qdrant.tech/documentation/cloud/cloud-node-endpoints.png)

<|page-87-lllmstxt|>

## gridstore-key-value-storage

---

# Introducing Gridstore: Qdrant's Custom Key-Value Store

Luis Cossio, Arnaud Gourlay & David Myriel · February 05, 2025

![Introducing Gridstore: Qdrant's Custom Key-Value Store](https://qdrant.tech/articles_data/gridstore-key-value-storage/preview/title.jpg)

## [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#why-we-built-our-own-storage-engine) Why We Built Our Own Storage Engine

Databases need a place to store and retrieve data. That’s what Qdrant’s [**key-value storage**](https://en.wikipedia.org/wiki/Key%e2%80%93value_database) does—it links keys to values. When we started building Qdrant, we needed to pick something ready for the task. So we chose [**RocksDB**](https://rocksdb.org/) as our embedded key-value store.

![RocksDB](https://qdrant.tech/articles_data/gridstore-key-value-storage/rocksdb.jpg)

It is mature, reliable, and well-documented. Over time, we ran into issues.
Its architecture requires compaction (it is based on an [LSM tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)), which caused random latency spikes. It handles generic keys, while we only use it for sequential IDs. Having lots of configuration options makes it versatile, but accurately tuning it was a headache. Finally, interoperating with C++ slowed us down (although we will still support it for quite some time 😭).

While there are already some good options written in Rust that we could leverage, we needed something custom. Nothing out there fit our needs in the way we wanted. We didn’t require generic keys. We wanted full control over when and which data was written and flushed. Our system already has crash recovery mechanisms built-in. Online compaction isn’t a priority; we already have optimizers for that. Debugging misconfigurations was not a great use of our time.

So we built our own storage. As of [**Qdrant Version 1.13**](https://qdrant.tech/blog/qdrant-1.13.x/), we are using Gridstore for **payload and sparse vector storages**.

![Gridstore](https://qdrant.tech/articles_data/gridstore-key-value-storage/gridstore.png)

Simple, efficient, and designed just for Qdrant.

#### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#in-this-article-youll-learn-about) In this article, you’ll learn about:

- **How Gridstore works** – a deep dive into its architecture and mechanics.
- **Why we built it this way** – the key design decisions that shaped it.
- **Rigorous testing** – how we ensured the new storage is production-ready.
- **Performance benchmarks** – official metrics that demonstrate its efficiency.

**Our first challenge?** Figuring out the best way to handle sequential keys and variable-sized data.

## [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#gridstore-architecture-three-main-components) Gridstore Architecture: Three Main Components

![gridstore](https://qdrant.tech/articles_data/gridstore-key-value-storage/gridstore-2.png)

Gridstore’s architecture is built around three key components that enable fast lookups and efficient space management:

| Component | Description |
| --- | --- |
| The Data Layer | Stores values in fixed-sized blocks and retrieves them using a pointer-based lookup system. |
| The Mask Layer | Uses a bitmask to track which blocks are in use and which are available. |
| The Gaps Layer | Manages block availability at a higher level, allowing for quick space allocation. |

### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#1-the-data-layer-for-fast-retrieval) 1. The Data Layer for Fast Retrieval

At the core of Gridstore is **The Data Layer**, which is designed to store and retrieve values quickly based on their keys. This layer allows us to do efficient reads and lets us store variable-sized data. The two main components of this layer are **The Tracker** and **The Data Grid**.

Since internal IDs are always sequential integers (0, 1, 2, 3, 4, …), the tracker is an array of pointers, where each pointer tells the system exactly where a value starts and how long it is.

![The Data Layer](https://qdrant.tech/articles_data/gridstore-key-value-storage/data-layer.png)

The Data Layer uses an array of pointers to quickly retrieve data.

This makes lookups incredibly fast. For example, finding key 3 is just a matter of jumping to the third position in the tracker, and following the pointer to find the value in the data grid.
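To make this lookup path concrete, here is a deliberately simplified toy sketch (written in Python for readability; Gridstore itself is in Rust, and this is not its actual code). It models the tracker as an array of `(start_block, length)` pointers over a single in-memory grid of fixed-size blocks, which the next paragraphs describe in more detail:

```python
from typing import List, Optional, Tuple

BLOCK_SIZE = 128  # bytes per block, matching the block size mentioned below

class ToyDataLayer:
    """Toy model of the Data Layer: a pointer tracker plus a grid of fixed-size blocks."""

    def __init__(self) -> None:
        self.grid = bytearray()                             # the "data grid"
        self.tracker: List[Optional[Tuple[int, int]]] = []  # key -> (start_block, value_length)

    def put(self, key: int, value: bytes) -> None:
        # Append-only for simplicity; the real engine reuses freed blocks
        # through the mask and gaps layers described below.
        n_blocks = -(-len(value) // BLOCK_SIZE)             # ceiling division
        start_block = len(self.grid) // BLOCK_SIZE
        self.grid.extend(value.ljust(n_blocks * BLOCK_SIZE, b"\x00"))
        while len(self.tracker) <= key:                     # keys are sequential internal IDs
            self.tracker.append(None)
        self.tracker[key] = (start_block, len(value))

    def get(self, key: int) -> Optional[bytes]:
        pointer = self.tracker[key] if key < len(self.tracker) else None
        if pointer is None:
            return None
        start_block, length = pointer
        offset = start_block * BLOCK_SIZE                   # jump straight to the value
        return bytes(self.grid[offset:offset + length])

store = ToyDataLayer()
store.put(3, b"payload of point 3")
print(store.get(3))  # b'payload of point 3'
```

The real engine additionally records which page file a value lives on and persists these structures on disk, but the lookup idea is the same: one array access into the tracker, then one read from the grid.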
However, because values are of variable size, the data itself is stored separately in a grid of fixed-sized blocks, which are grouped into larger page files. The fixed size of each block is usually 128 bytes. When inserting a value, Gridstore allocates one or more consecutive blocks to store it, ensuring that each block only holds data from a single value. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#2-the-mask-layer-reuses-space) 2\. The Mask Layer Reuses Space **The Mask Layer** helps Gridstore handle updates and deletions without the need for expensive data compaction. Instead of maintaining complex metadata for each block, Gridstore tracks usage with a bitmask, where each bit represents a block, with 1 for used, 0 for free. ![The Mask Layer](https://qdrant.tech/articles_data/gridstore-key-value-storage/mask-layer.png) The bitmask efficiently tracks block usage. This makes it easy to determine where new values can be written. When a value is removed, it gets soft-deleted at its pointer, and the corresponding blocks in the bitmask are marked as available. Similarly, when updating a value, the new version is written elsewhere, and the old blocks are freed at the bitmask. This approach ensures that Gridstore doesn’t waste space. As the storage grows, however, scanning for available blocks in the entire bitmask can become computationally expensive. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#3-the-gaps-layer-for-effective-updates) 3\. The Gaps Layer for Effective Updates To further optimize update handling, Gridstore introduces **The Gaps Layer**, which provides a higher-level view of block availability. Instead of scanning the entire bitmask, Gridstore splits the bitmask into regions and keeps track of the largest contiguous free space within each region, known as **The Region Gap**. By also storing the leading and trailing gaps of each region, the system can efficiently combine multiple regions when needed for storing large values. ![The Gaps Layer](https://qdrant.tech/articles_data/gridstore-key-value-storage/architecture.png) The complete architecture of Gridstore This layered approach allows Gridstore to locate available space quickly, scaling down the work required for scans while keeping memory overhead minimal. With this system, finding storage space for new values requires scanning only a tiny fraction of the total metadata, making updates and insertions highly efficient, even in large segments. Given the default configuration, the gaps layer is scoped out in a millionth fraction of the actual storage size. This means that for each 1GB of data, the gaps layer only requires scanning 6KB of metadata. With this mechanism, the other operations can be executed in virtually constant-time complexity. ## [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#gridstore-in-production-maintaining-data-integrity) Gridstore in Production: Maintaining Data Integrity ![gridstore](https://qdrant.tech/articles_data/gridstore-key-value-storage/gridstore-1.png) Gridstore’s architecture introduces multiple interdependent structures that must remain in sync to ensure data integrity: - **The Data Layer** holds the data and associates each key with its location in storage, including page ID, block offset, and the size of its value. - **The Mask Layer** keeps track of which blocks are occupied and which are free. - **The Gaps Layer** provides an indexed view of free blocks for efficient space allocation. 
Every time a new value is inserted or an existing value is updated, all these components need to be modified in a coordinated way. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#when-things-break-in-real-life) When Things Break in Real Life Real-world systems don’t operate in a vacuum. Failures happen: software bugs cause unexpected crashes, memory exhaustion forces processes to terminate, disks fail to persist data reliably, and power losses can interrupt operations at any moment. _The critical question is: what happens if a failure occurs while updating these structures?_ If one component is updated but another isn’t, the entire system could become inconsistent. Worse, if an operation is only partially written to disk, it could lead to orphaned data, unusable space, or even data corruption. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#stability-through-idempotency-recovering-with-wal) Stability Through Idempotency: Recovering With WAL To guard against these risks, Qdrant relies on a [**Write-Ahead Log (WAL)**](https://qdrant.tech/documentation/concepts/storage/). Before committing an operation, Qdrant ensures that it is at least recorded in the WAL. If a crash happens before all updates are flushed, the system can safely replay operations from the log. This recovery mechanism introduces another essential property: [**idempotence**](https://en.wikipedia.org/wiki/Idempotence). The storage system must be designed so that reapplying the same operation after a failure leads to the same final state as if the operation had been applied just once. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#the-grand-solution-lazy-updates) The Grand Solution: Lazy Updates To achieve this, **Gridstore completes updates lazily**, prioritizing the most critical part of the write: the data itself. | | | --- | | 👉 Instead of immediately updating all metadata structures, it writes the new value first while keeping lightweight pending changes in a buffer. | | 👉 The system only finalizes these updates when explicitly requested, ensuring that a crash never results in marking data as deleted before the update has been safely persisted. | | 👉 In the worst-case scenario, Gridstore may need to write the same data twice, leading to a minor space overhead, but it will never corrupt the storage by overwriting valid data. | ## [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#how-we-tested-the-final-product) How We Tested the Final Product ![gridstore](https://qdrant.tech/articles_data/gridstore-key-value-storage/gridstore-3.png) ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#first-model-testing) First… Model Testing Gridstore can be tested efficiently using model testing, which compares its behavior to a simple in-memory hash map. Since Gridstore should function like a persisted hash map, this method quickly detects inconsistencies. The process is straightforward: 1. Initialize a Gridstore instance and an empty hash map. 2. Run random operations (put, delete, update) on both. 3. Verify that results match after each operation. 4. Compare all keys and values to ensure consistency. This approach provides high test coverage, exposing issues like incorrect persistence or faulty deletions. Running large-scale model tests ensures Gridstore remains reliable in real-world use. Here is a naive way to generate operations in Rust. 
```rust enum Operation { Put(PointOffset, Payload), Delete(PointOffset), Update(PointOffset, Payload), } impl Operation { fn random(rng: &mut impl Rng, max_point_offset: u32) -> Self { let point_offset = rng.random_range(0..=max_point_offset); let operation = rng.gen_range(0..3); match operation { 0 => { let size_factor = rng.random_range(1..10); let payload = random_payload(rng, size_factor); Operation::Put(point_offset, payload) } 1 => Operation::Delete(point_offset), 2 => { let size_factor = rng.random_range(1..10); let payload = random_payload(rng, size_factor); Operation::Update(point_offset, payload) } _ => unreachable!(), } } } ``` Model testing is a high-value way to catch bugs, especially when your system mimics a well-defined component like a hash map. If your component behaves the same as another one, using model testing brings a lot of value for a bit of effort. We could have tested against RocksDB, but simplicity matters more. A simple hash map lets us run massive test sequences quickly, exposing issues faster. For even sharper debugging, Property-Based Testing adds automated test generation and shrinking. It pinpoints failures with minimalized test cases, making bug hunting faster and more effective. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#crash-testing-can-gridstore-handle-the-pressure) Crash Testing: Can Gridstore Handle the Pressure? Designing for crash resilience is one thing, and proving it works under stress is another. To push Qdrant’s data integrity to the limit, we built [**Crasher**](https://github.com/qdrant/crasher), a test bench that brutally kills and restarts Qdrant while it handles a heavy update workload. Crasher runs a loop that continuously writes data, then randomly crashes Qdrant. On each restart, Qdrant replays its [**Write-Ahead Log (WAL)**](https://qdrant.tech/documentation/concepts/storage/), and we verify if data integrity holds. Possible anomalies include: - Missing data (points, vectors, or payloads) - Corrupt payload values This aggressive yet simple approach has uncovered real-world issues when run for extended periods. While we also use chaos testing for distributed setups, Crasher excels at fast, repeatable failure testing in a local environment. ## [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#testing-gridstore-performance-benchmarks) Testing Gridstore Performance: Benchmarks ![gridstore](https://qdrant.tech/articles_data/gridstore-key-value-storage/gridstore-4.png) To measure the impact of our new storage engine, we used [**Bustle, a key-value storage benchmarking framework**](https://github.com/jonhoo/bustle), to compare Gridstore against RocksDB. We tested three workloads: | Workload Type | Operation Distribution | | --- | --- | | Read-heavy | 95% reads | | Insert-heavy | 80% inserts | | Update-heavy | 50% updates | #### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#the-results-speak-for-themselves) The results speak for themselves: Average latency for all kinds of workloads is lower across the board, particularly for inserts. ![image.png](https://qdrant.tech/articles_data/gridstore-key-value-storage/1.png) This shows a clear boost in performance. As we can see, the investment in Gridstore is paying off. ### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#end-to-end-benchmarking) End-to-End Benchmarking Now, let’s test the impact on a real Qdrant instance. 
So far, we’ve only integrated Gridstore for [**payloads**](https://qdrant.tech/documentation/concepts/payload/) and [**sparse vectors**](https://qdrant.tech/documentation/concepts/vectors/#sparse-vectors), but even this partial switch should show noticeable improvements.

For benchmarking, we used our in-house [**bfb tool**](https://github.com/qdrant/bfb) to generate a workload. Our configuration:

```bash
bfb -n 2000000 --max-id 1000000 \
  --sparse-vectors 0.02 \
  --set-payload \
  --on-disk-payload \
  --dim 1 \
  --sparse-dim 5000 \
  --bool-payloads \
  --keywords 100 \
  --float-payloads true \
  --int-payloads 100000 \
  --text-payloads \
  --text-payload-length 512 \
  --skip-field-indices \
  --jsonl-updates ./rps.jsonl
```

This benchmark upserts 1 million points twice. Each point has:

- A medium to large payload
- A tiny dense vector (dense vectors use a different storage type)
- A sparse vector

* * *

#### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#additional-configuration) Additional configuration:

1. The test we conducted updated payload data separately in another request.
2. There were no payload indices, which ensured we measured pure ingestion speed.
3. Finally, we gathered request latency metrics for analysis.

* * *

We ran this against Qdrant 1.12.6, toggling between the old and new storage backends.

### [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#final-result) Final Result

Data ingestion is **twice as fast, with smoother throughput** — a massive win! 😍

![image.png](https://qdrant.tech/articles_data/gridstore-key-value-storage/2.png)

We optimized for speed, and it paid off—but what about storage size?

- Gridstore: 2333MB
- RocksDB: 2319MB

Strictly speaking, RocksDB is slightly smaller, but the difference is negligible compared to the 2x faster ingestion and more stable throughput. A small trade-off for a big performance gain!

## [Anchor](https://qdrant.tech/articles/gridstore-key-value-storage/\#trying-out-gridstore) Trying Out Gridstore

Gridstore represents a significant advancement in how Qdrant manages its **key-value storage** needs. It offers great performance and streamlined updates tailored specifically for our use case. We have managed to achieve faster, more reliable data ingestion while maintaining data integrity, even under heavy workloads and unexpected failures. It is already used as a storage backend for on-disk payloads and sparse vectors.

👉 It’s important to note that Gridstore remains tightly integrated with Qdrant and, as such, has not been released as a standalone crate. Its API is still evolving, and we are focused on refining it within our ecosystem to ensure maximum stability and performance.

That said, we recognize the value this innovation could bring to the wider Rust community. In the future, once the API stabilizes and we decouple it enough from Qdrant, we will consider publishing it as a contribution to the community ❤️.

For now, Gridstore continues to drive improvements in Qdrant, demonstrating the benefits of a custom-tailored storage engine designed with modern demands in mind. Stay tuned for further updates and potential community releases as we keep pushing the boundaries of performance and reliability.

![Gridstore](https://qdrant.tech/articles_data/gridstore-key-value-storage/gridstore.png)

Simple, efficient, and designed just for Qdrant.
<|page-88-lllmstxt|>

## vector-search-filtering

---

# A Complete Guide to Filtering in Vector Search

Sabrina Aquino, David Myriel · September 10, 2024

![A Complete Guide to Filtering in Vector Search](https://qdrant.tech/articles_data/vector-search-filtering/preview/title.jpg)

Imagine you sell computer hardware. To help shoppers easily find products on your website, you need to have a **user-friendly [search engine](https://qdrant.tech/)**.

![vector-search-ecommerce](https://qdrant.tech/articles_data/vector-search-filtering/vector-search-ecommerce.png)

If you’re selling computers and have extensive data on laptops, desktops, and accessories, your search feature should guide customers to the exact device they want - or at least a **very similar** match.

When storing data in Qdrant, each product is a point, consisting of an `id`, a `vector`, and a `payload`:

```json
{
    "id": 1,
    "vector": [0.1, 0.2, 0.3, 0.4],
    "payload": {
        "price": 899.99,
        "category": "laptop"
    }
}
```

The `id` is a unique identifier for the point in your collection. The `vector` is a mathematical representation used to measure similarity to other points in the collection. Finally, the `payload` holds metadata that directly describes the point.

Though we may not be able to decipher the vector, we are able to derive additional information about the item from its metadata. In this specific case, **we are looking at a data point for a laptop that costs $899.99**.

## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#what-is-filtering) What is filtering?

When searching for the perfect computer, your customers may end up with results that are mathematically similar to the search entry, but not exact. For example, if they are searching for **laptops under $1000**, a simple [vector search](https://qdrant.tech/advanced-search/) without constraints might still show other laptops over $1000.

This is why [semantic search](https://qdrant.tech/advanced-search/) alone **may not be enough**. In order to get the exact result, you would need to enforce a payload filter on the `price`. Only then can you be sure that the search results abide by the chosen characteristic.

> This is called **filtering** and it is one of the key features of [vector databases](https://qdrant.tech/).

Here is how a **filtered vector search** looks behind the scenes. We’ll cover its mechanics in the following section.
```http POST /collections/online_store/points/search { "vector": [ 0.2, 0.1, 0.9, 0.7 ], "filter": { "must": [\ {\ "key": "category",\ "match": { "value": "laptop" }\ },\ {\ "key": "price",\ "range": {\ "gt": null,\ "gte": null,\ "lt": null,\ "lte": 1000\ }\ }\ ] }, "limit": 3, "with_payload": true, "with_vector": false } ``` The filtered result will be a combination of the semantic search and the filtering conditions imposed upon the query. In the following pages, we will show that **filtering is a key practice in vector search for two reasons:** 1. With filtering in Qdrant, you can **dramatically increase search precision**. More on this in the next section. 2. Filtering helps control resources and **reduce compute use**. More on this in [**Payload Indexing**](https://qdrant.tech/articles/vector-search-filtering/#filtering-with-the-payload-index). ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#what-you-will-learn-in-this-guide) What you will learn in this guide: In [vector search](https://qdrant.tech/advanced-search/), filtering and sorting are more interdependent than they are in traditional databases. While databases like SQL use commands such as `WHERE` and `ORDER BY`, the interplay between these processes in vector search is a bit more complex. Most people use default settings and build vector search apps that aren’t properly configured or even setup for precise retrieval. In this guide, we will show you how to **use filtering to get the most out of vector search** with some basic and advanced strategies that are easy to implement. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#remember-to-run-all-tutorial-code-in-qdrants-dashboard) Remember to run all tutorial code in Qdrant’s Dashboard The easiest way to reach that “Hello World” moment is to [**try filtering in a live cluster**](https://qdrant.tech/documentation/quickstart-cloud/). Our interactive tutorial will show you how to create a cluster, add data and try some filtering clauses. ![qdrant-filtering-tutorial](https://qdrant.tech/articles_data/vector-search-filtering/qdrant-filtering-tutorial.png) ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#qdrants-approach-to-filtering) Qdrant’s approach to filtering Qdrant follows a specific method of searching and filtering through dense vectors. Let’s take a look at this **3-stage diagram**. In this case, we are trying to find the nearest neighbour to the query vector **(green)**. Your search journey starts at the bottom **(orange)**. By default, Qdrant connects all your data points within the [**vector index**](https://qdrant.tech/documentation/concepts/indexing/). After you [**introduce filters**](https://qdrant.tech/documentation/concepts/filtering/), some data points become disconnected. Vector search can’t cross the grayed out area and it won’t reach the nearest neighbor. How can we bridge this gap? **Figure 1:** How Qdrant maintains a filterable vector index. ![filterable-vector-index](https://qdrant.tech/articles_data/vector-search-filtering/filterable-vector-index.png) [**Filterable vector index**](https://qdrant.tech/documentation/concepts/indexing/): This technique builds additional links **(orange)** between leftover data points. The filtered points which stay behind are now traversible once again. Qdrant uses special category-based methods to connect these data points. 
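For reference, the filtered query shown “behind the scenes” at the start of this section could be issued from the Python client roughly like this; the collection name, query vector, and price limit are the illustrative values from that example, not required names:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Semantic search constrained to laptops priced at $1000 or less,
# mirroring the REST request shown earlier.
results = client.query_points(
    collection_name="online_store",
    query=[0.2, 0.1, 0.9, 0.7],
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="category", match=models.MatchValue(value="laptop")),
            models.FieldCondition(key="price", range=models.Range(lte=1000)),
        ]
    ),
    limit=3,
    with_payload=True,
)
print(results.points)
```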
### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#qdrants-approach-vs-traditional-filtering-methods) Qdrant’s approach vs traditional filtering methods ![stepping-lens](https://qdrant.tech/articles_data/vector-search-filtering/stepping-lens.png) The filterable vector index is how Qdrant solves the pre- and post-filtering problems: it adds specialized links to the search graph. It aims to maintain the speed advantages of vector search while allowing for precise filtering, addressing the inefficiencies that can occur when applying filters after the vector search. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#pre-filtering) Pre-filtering In pre-filtering, a search engine first narrows down the dataset based on chosen metadata values, and then searches within that filtered subset. This reduces unnecessary computation over a dataset that is potentially much larger. The choice between pre-filtering and using the filterable HNSW index depends on filter cardinality. When metadata cardinality is too low, the filter becomes restrictive and it can disrupt the connections within the graph. This leads to fragmented search paths (as in **Figure 1**). When the semantic search process begins, it won’t be able to travel to those locations. However, Qdrant still benefits from pre-filtering **under certain conditions**. In cases of low cardinality, Qdrant’s query planner stops using HNSW and switches over to the payload index alone. This makes the search process much cheaper and faster than using HNSW. **Figure 2:** On the user side, this is how filtering looks. We start with five products with different prices. First, the $1000 price **filter** is applied, narrowing down the selection of laptops. Then, a vector search finds the relevant **results** within this filtered set. ![pre-filtering-vector-search](https://qdrant.tech/articles_data/vector-search-filtering/pre-filtering.png) In conclusion, pre-filtering is efficient in specific cases when you use small datasets with low cardinality metadata. However, pre-filtering should not be used over large datasets as it breaks too many links in the HNSW graph, causing lower accuracy. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#post-filtering) Post-filtering In post-filtering, a search engine first looks for similar vectors and retrieves a larger set of results. Then, it applies filters to those results based on metadata. The problem with post-filtering becomes apparent when using low-cardinality filters. > When you apply a low-cardinality filter after performing a vector search, you often end up discarding a large portion of the results that the vector search returned. **Figure 3:** In the same example, we have five laptops. First, the vector search finds the top two relevant **results**, but they may not match the price requirement. When the $1000 price **filter** is applied, other potential results are discarded. ![post-filtering-vector-search](https://qdrant.tech/articles_data/vector-search-filtering/post-filtering.png) The system will waste computational resources by first finding similar vectors and then discarding many that don’t meet the filter criteria. You’re also limited to filtering only from the initial set of [vector search](https://qdrant.tech/advanced-search/) results. If your desired items aren’t in this initial set, you won’t find them, even if they exist in the database.
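To make the contrast concrete, here is a small, purely illustrative sketch of manual post-filtering with the Python client, reusing the `online_store` example from above. It is not a recommended pattern, and the oversampling limit of 50 is an arbitrary guess rather than a value from the article.

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")  # placeholder URL

# Post-filtering by hand: over-fetch unfiltered candidates, then discard on the client side.
candidates = client.query_points(
    collection_name="online_store",
    query=[0.2, 0.1, 0.9, 0.7],
    limit=50,  # how much is enough? There is no reliable answer.
    with_payload=True,
).points

# Any matching laptop that did not make it into the 50 candidates is lost for good.
cheap_laptops = [
    point
    for point in candidates
    if point.payload.get("category") == "laptop"
    and point.payload.get("price", float("inf")) <= 1000
][:3]
```

Passing the same conditions as a filter in the request itself, as in the earlier example, avoids both the wasted similarity computations and the risk of missing valid results entirely.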
## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#basic-filtering-example-ecommerce-and-laptops) Basic filtering example: ecommerce and laptops We know that there are three possible laptops that suit our price point. Let’s see how Qdrant’s filterable vector index works and why it is the best method of capturing all available results. First, add five new laptops to your online store. Here is a sample input: ```python laptops = [\ (1, [0.1, 0.2, 0.3, 0.4], {"price": 899.99, "category": "laptop"}),\ (2, [0.2, 0.3, 0.4, 0.5], {"price": 1299.99, "category": "laptop"}),\ (3, [0.3, 0.4, 0.5, 0.6], {"price": 799.99, "category": "laptop"}),\ (4, [0.4, 0.5, 0.6, 0.7], {"price": 1099.99, "category": "laptop"}),\ (5, [0.5, 0.6, 0.7, 0.8], {"price": 949.99, "category": "laptop"})\ ] ``` The four-dimensional vector can represent features like laptop CPU, RAM or battery life, but that isn’t specified. The payload, however, specifies the exact price and product category. Now, set the filter to “price is less than $1000”: ```json { "key": "price", "range": { "gt": null, "gte": null, "lt": null, "lte": 1000 } } ``` When a price filter of equal to or less than $1000 is applied, vector search returns the following results: ```json [\ {\ "id": 3,\ "score": 0.9978443564622781,\ "payload": {\ "price": 799.99,\ "category": "laptop"\ }\ },\ {\ "id": 1,\ "score": 0.9938079894227599,\ "payload": {\ "price": 899.99,\ "category": "laptop"\ }\ },\ {\ "id": 5,\ "score": 0.9903751498208603,\ "payload": {\ "price": 949.99,\ "category": "laptop"\ }\ }\ ] ``` As you can see, Qdrant’s filtering method has a greater chance of capturing all possible search results. This specific example uses the `range` condition for filtering. Qdrant, however, offers many other possible ways to structure a filter. **For detailed usage examples, the [filtering](https://qdrant.tech/documentation/concepts/filtering/) docs are the best resource.** ### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#scrolling-instead-of-searching) Scrolling instead of searching You don’t need to use our `search` and `query` APIs to filter through data. The `scroll` API is another option that lets you retrieve lists of points that match your filters. If you aren’t interested in finding similar points, you can simply list the ones that match a given filter. While search gives you the most similar points based on some query vector, scroll will give you all points matching your filter, without considering similarity. In Qdrant, scrolling is used to iteratively **retrieve large sets of points from a collection**. It is particularly useful when you’re dealing with a large number of points and don’t want to load them all at once. Instead, Qdrant provides a way to scroll through the points **one page at a time**. You start by sending a scroll request to Qdrant with specific conditions like filtering by payload, vector search, or other criteria. Let’s retrieve a list of the top 10 laptops in the store, ordered by price: ```http POST /collections/online_store/points/scroll { "filter": { "must": [\ {\ "key": "category",\ "match": {\ "value": "laptop"\ }\ }\ ] }, "limit": 10, "with_payload": true, "with_vector": false, "order_by": [\ {\ "key": "price"\ }\ ] } ``` The response contains a batch of points that match the criteria and a reference (offset or next page token) to retrieve the next set of points. > [**Scrolling**](https://qdrant.tech/documentation/concepts/points/#scroll-points) is designed to be efficient.
It minimizes the load on the server and reduces memory consumption on the client side by returning only manageable chunks of data at a time. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#available-filtering-conditions) Available filtering conditions | **Condition** | **Usage** | **Condition** | **Usage** | | --- | --- | --- | --- | | **Match** | Exact value match. | **Range** | Filter by value range. | | **Match Any** | Match multiple values. | **Datetime Range** | Filter by date range. | | **Match Except** | Exclude specific values. | **UUID Match** | Filter by unique ID. | | **Nested Key** | Filter by nested data. | **Geo** | Filter by location. | | **Nested Object** | Filter by nested objects. | **Values Count** | Filter by element count. | | **Full Text Match** | Search in text fields. | **Is Empty** | Filter empty fields. | | **Has ID** | Filter by unique ID. | **Is Null** | Filter null values. | > All clauses and conditions are outlined in Qdrant’s [filtering](https://qdrant.tech/documentation/concepts/filtering/) documentation. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#filtering-clauses-to-remember) Filtering clauses to remember | **Clause** | **Description** | **Clause** | **Description** | | --- | --- | --- | --- | | **Must** | Includes items that meet the condition
(similar to `AND`). | **Should** | Filters if at least one condition is met
(similar to `OR`). | | **Must Not** | Excludes items that meet the condition
(similar to `NOT`). | **Clauses Combination** | Combines multiple clauses to refine filtering
(similar to `AND`). | ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#advanced-filtering-example-dinosaur-diets) Advanced filtering example: dinosaur diets ![advanced-payload-filtering](https://qdrant.tech/articles_data/vector-search-filtering/advanced-payload-filtering.png) We can also use nested filtering to query arrays of objects within the payload. In this example, we have two points. They each represent a dinosaur with a list of food preferences (diet) that indicate what type of food they like or dislike: ```json [\ {\ "id": 1,\ "dinosaur": "t-rex",\ "diet": [\ { "food": "leaves", "likes": false},\ { "food": "meat", "likes": true}\ ]\ },\ {\ "id": 2,\ "dinosaur": "diplodocus",\ "diet": [\ { "food": "leaves", "likes": true},\ { "food": "meat", "likes": false}\ ]\ }\ ] ``` To ensure that both conditions are applied to the same array element (e.g., food = meat and likes = true must refer to the same diet item), you need to use a nested filter. Nested filters are used to apply conditions within an array of objects. They ensure that the conditions are evaluated per array element, rather than across all elements. httppythontypescriptrustjavacsharp ```http POST /collections/dinosaurs/points/scroll { "filter": { "must": [\ {\ "key": "diet[].food",\ "match": {\ "value": "meat"\ }\ },\ {\ "key": "diet[].likes",\ "match": {\ "value": true\ }\ }\ ] } } ``` ```python client.scroll( collection_name="dinosaurs", scroll_filter=models.Filter( must=[\ models.FieldCondition(\ key="diet[].food", match=models.MatchValue(value="meat")\ ),\ models.FieldCondition(\ key="diet[].likes", match=models.MatchValue(value=True)\ ),\ ], ), ) ``` ```typescript client.scroll("dinosaurs", { filter: { must: [\ {\ key: "diet[].food",\ match: { value: "meat" },\ },\ {\ key: "diet[].likes",\ match: { value: true },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("dinosaurs").filter(Filter::must([\ Condition::matches("diet[].food", "meat".to_string()),\ Condition::matches("diet[].likes", true),\ ])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.match; import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("dinosaurs") .setFilter( Filter.newBuilder() .addAllMust( List.of(matchKeyword("diet[].food", "meat"), match("diet[].likes", true))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "dinosaurs", filter: MatchKeyword("diet[].food", "meat") & Match("diet[].likes", true) ); ``` This happens because both points are matching the two conditions: - the “t-rex” matches food=meat on `diet[1].food` and likes=true on `diet[1].likes` - the “diplodocus” matches food=meat on `diet[1].food` and likes=true on `diet[0].likes` To retrieve only the points where the conditions apply to a specific element within an array (such as the point with id 1 in this example), you need to use a nested object filter. 
Nested object filters enable querying arrays of objects independently, ensuring conditions are checked within individual array elements. This is done by using the `nested` condition type, which consists of a payload key that targets an array and a filter to apply. The key should reference an array of objects and can be written with or without bracket notation (e.g., “data” or “data\[\]”). httppythontypescriptrustjavacsharp ```http POST /collections/dinosaurs/points/scroll { "filter": { "must": [{\ "nested": {\ "key": "diet",\ "filter":{\ "must": [\ {\ "key": "food",\ "match": {\ "value": "meat"\ }\ },\ {\ "key": "likes",\ "match": {\ "value": true\ }\ }\ ]\ }\ }\ }] } } ``` ```python client.scroll( collection_name="dinosaurs", scroll_filter=models.Filter( must=[\ models.NestedCondition(\ nested=models.Nested(\ key="diet",\ filter=models.Filter(\ must=[\ models.FieldCondition(\ key="food", match=models.MatchValue(value="meat")\ ),\ models.FieldCondition(\ key="likes", match=models.MatchValue(value=True)\ ),\ ]\ ),\ )\ )\ ], ), ) ``` ```typescript client.scroll("dinosaurs", { filter: { must: [\ {\ nested: {\ key: "diet",\ filter: {\ must: [\ {\ key: "food",\ match: { value: "meat" },\ },\ {\ key: "likes",\ match: { value: true },\ },\ ],\ },\ },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, NestedCondition, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("dinosaurs").filter(Filter::must([NestedCondition {\ key: "diet".to_string(),\ filter: Some(Filter::must([\ Condition::matches("food", "meat".to_string()),\ Condition::matches("likes", true),\ ])),\ }\ .into()])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.match; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.ConditionFactory.nested; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("dinosaurs") .setFilter( Filter.newBuilder() .addMust( nested( "diet", Filter.newBuilder() .addAllMust( List.of( matchKeyword("food", "meat"), match("likes", true))) .build())) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "dinosaurs", filter: Nested("diet", MatchKeyword("food", "meat") & Match("likes", true)) ); ``` The matching logic is adjusted to operate at the level of individual elements within an array in the payload, rather than on all array elements together. Nested filters function as though each element of the array is evaluated separately. The parent document will be considered a match if at least one array element satisfies all the nested filter conditions. ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#other-creative-uses-for-filters) Other creative uses for filters You can use filters to retrieve data points without knowing their `id`. You can search through data and manage it, solely by using filters. Let’s take a look at some creative uses for filters: | Action | Description | Action | Description | | --- | --- | --- | --- | | [Delete Points](https://qdrant.tech/documentation/concepts/points/#delete-points) | Deletes all points matching the filter. | [Set Payload](https://qdrant.tech/documentation/concepts/payload/#set-payload) | Adds payload fields to all points matching the filter. 
| | [Scroll Points](https://qdrant.tech/documentation/concepts/points/#scroll-points) | Lists all points matching the filter. | [Update Payload](https://qdrant.tech/documentation/concepts/payload/#overwrite-payload) | Updates payload fields for points matching the filter. | | [Order Points](https://qdrant.tech/documentation/concepts/points/#order-points-by-payload-key) | Lists all points, sorted by the filter. | [Delete Payload](https://qdrant.tech/documentation/concepts/payload/#delete-payload-keys) | Deletes fields for points matching the filter. | | [Count Points](https://qdrant.tech/documentation/concepts/points/#counting-points) | Totals the points matching the filter. | | | ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#filtering-with-the-payload-index) Filtering with the payload index ![vector-search-filtering-vector-search](https://qdrant.tech/articles_data/vector-search-filtering/scanning-lens.png) When you start working with Qdrant, your data is by default organized in a vector index. In addition to this, we recommend adding a secondary data structure - **the payload index**. Just how the vector index organizes vectors, the payload index will structure your metadata. **Figure 4:** The payload index is an additional data structure that supports vector search. A payload index (in green) organizes candidate results by cardinality, so that semantic search (in red) can traverse the vector index quickly. ![payload-index-vector-search](https://qdrant.tech/articles_data/vector-search-filtering/payload-index-vector-search.png) On its own, semantic searching over terabytes of data can take up lots of RAM. [**Filtering**](https://qdrant.tech/documentation/concepts/filtering/) and [**Indexing**](https://qdrant.tech/documentation/concepts/indexing/) are two easy strategies to reduce your compute usage and still get the best results. Remember, this is only a guide. For an exhaustive list of filtering options, you should read the [filtering documentation](https://qdrant.tech/documentation/concepts/filtering/). Here is how you can create a single index for a metadata field “category”: httppython ```http PUT /collections/computers/index { "field_name": "category", "field_schema": "keyword" } ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.create_payload_index( collection_name="computers", field_name="category", field_schema="keyword", ) ``` Once you mark a field indexable, **you don’t need to do anything else**. Qdrant will handle all optimizations in the background. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#why-should-you-index-metadata) Why should you index metadata? ![payload-index-filtering](https://qdrant.tech/articles_data/vector-search-filtering/payload-index-filtering.png) The payload index acts as a secondary data structure that speeds up retrieval. Whenever you run vector search with a filter, Qdrant will consult a payload index - if there is one. As your dataset grows in complexity, Qdrant takes up additional resources to go through all data points. Without a proper data structure, the search can take longer - or run out of resources. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#payload-indexing-helps-evaluate-the-most-restrictive-filters) Payload indexing helps evaluate the most restrictive filters The payload index is also used to accurately estimate **filter cardinality**, which helps the query planning choose a search strategy. 
**Filter cardinality** refers to the number of distinct values that a filter can match within a dataset. Qdrant’s search strategy can switch from **HNSW search** to **payload index-based search** if the cardinality is too low. **How it affects your queries:** Depending on the filter used in the search, there are several possible scenarios for query execution. Qdrant chooses one of the query execution options depending on the available indexes, the complexity of the conditions and the cardinality of the filtering result. - The planner estimates the cardinality of a filtered result before selecting a strategy. - Qdrant retrieves points using the **payload index** if the cardinality is below a threshold. - Qdrant uses the **filterable vector index** if the cardinality is above a threshold. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#what-happens-if-you-dont-use-payload-indexes) What happens if you don’t use payload indexes? When using filters while querying, Qdrant needs to estimate the cardinality of those filters to define a proper query plan. If you don’t create a payload index, Qdrant will not be able to do this. It may end up choosing a sub-optimal way of searching, causing extremely slow search times or low accuracy results. If you only rely on **searching for the nearest vector**, Qdrant will have to go through the entire vector index. It will calculate similarities against each vector in the collection, relevant or not. Alternatively, when you filter with the help of a payload index, the HNSW algorithm won’t have to evaluate every point. Furthermore, the payload index will help HNSW construct the graph with additional links. ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#how-does-the-payload-index-look) How does the payload index look? A payload index is similar to an index in a conventional document-oriented database. It connects metadata fields with their corresponding point IDs for quick retrieval. In this example, you are indexing all of your computer hardware inside of the `computers` collection. Let’s take a look at a sample payload index for the field `category`. ```json Payload Index by keyword: +------------+-------------+ | category | id | +------------+-------------+ | laptop | 1, 4, 7 | | desktop | 2, 5, 9 | | speakers | 3, 6, 8 | | keyboard | 10, 11 | +------------+-------------+ ``` When fields are properly indexed, the search engine roughly knows where it can start its journey. It can start looking up points that contain relevant metadata, and it doesn’t need to scan the entire dataset. This reduces the engine’s workload by a lot. As a result, query results are faster and the system can easily scale. > You may create as many payload indexes as you want, and we recommend you do so for each field that you filter by. If your users are often filtering by **laptop** when looking up a product **category**, indexing all computer metadata will speed up retrieval and make the results more precise. #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#different-types-of-payload-indexes) Different types of payload indexes | Index Type | Description | | --- | --- | | [Full-text Index](https://qdrant.tech/documentation/concepts/indexing/#full-text-index) | Enables efficient text search in large datasets. | | [Tenant Index](https://qdrant.tech/documentation/concepts/indexing/#tenant-index) | For data isolation and retrieval efficiency in multi-tenant architectures.
| | [Principal Index](https://qdrant.tech/documentation/concepts/indexing/#principal-index) | Manages data based on primary entities like users or accounts. | | [On-Disk Index](https://qdrant.tech/documentation/concepts/indexing/#on-disk-payload-index) | Stores indexes on disk to manage large datasets without memory usage. | | [Parameterized Index](https://qdrant.tech/documentation/concepts/indexing/#parameterized-index) | Allows for dynamic querying, where the index can adapt based on different parameters or conditions provided by the user. Useful for numeric data like prices or timestamps. | ### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#indexing-payloads-in-multitenant-setups) Indexing payloads in multitenant setups Some applications need to have data segregated, whereby different users need to see different data inside of the same program. When setting up storage for such a complex application, many users think they need multiple databases for segregated users. We see this quite often. Users very frequently make the mistake of creating a separate collection for each tenant inside of the same cluster. This can quickly exhaust the cluster’s resources. Running vector search through too many collections can start using up too much RAM. You may start seeing out-of-memory (OOM) errors and degraded performance. To mitigate this, we offer extensive support for multitenant systems, so that you can build an entire global application in one single Qdrant collection. When creating or updating a collection, you can mark a metadata field as indexable. To mark `user_id` as a tenant in a shared collection, do the following: ```http PUT /collections/{collection_name}/index { "field_name": "user_id", "field_schema": { "type": "keyword", "is_tenant": true } } ``` Additionally, we offer a way of organizing data efficiently by means of the tenant index. This is another variant of the payload index that makes tenant data more accessible. This time, the request will specify the field as a tenant. This means that you can mark various customer types and user id’s as `is_tenant: true`. Read more about setting up [tenant defragmentation](https://qdrant.tech/documentation/concepts/indexing/?q=tenant#tenant-index) in multitenant environments, ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#key-takeaways-in-filtering-and-indexing) Key takeaways in filtering and indexing ![best-practices](https://qdrant.tech/articles_data/vector-search-filtering/best-practices.png) ### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#filtering-with-float-point-decimal-numbers) Filtering with float-point (decimal) numbers If you filter by the float data type, your search precision may be limited and inaccurate. Float Datatype numbers have a decimal point and are 64 bits in size. Here is an example: ```json { "price": 11.99 } ``` When you filter for a specific float number, such as 11.99, you may get a different result, like 11.98 or 12.00. With decimals, numbers are rounded differently, so logically identical values may appear different. Unfortunately, searching for exact matches can be unreliable in this case. To avoid inaccuracies, use a different filtering method. We recommend that you try Range Based Filtering instead of exact matches. This method accounts for minor variations in data, and it boosts performance - especially with large datasets. Here is a sample JSON range filter for values greater than or equal to 11.99 and less than or equal to the same number. 
This will retrieve any values within the range of 11.99, including those with additional decimal places. ```json { "key": "price", "range": { "gt": null, "gte": 11.99, "lt": null, "lte": 11.99 } } ``` ### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#working-with-pagination-in-queries) Working with pagination in queries When you’re implementing pagination in filtered queries, indexing becomes even more critical. When paginating results, you often need to exclude items you’ve already seen. This is typically managed by applying filters that specify which IDs should not be included in the next set of results. However, an interesting aspect of Qdrant’s data model is that a single point can have multiple values for the same field, such as different color options for a product. This means that during filtering, an ID might appear multiple times if it matches on different values of the same field. Proper indexing ensures that these queries are efficient, preventing duplicate results and making pagination smoother. ## [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#conclusion-real-life-use-cases-of-filtering) Conclusion: Real-life use cases of filtering Filtering in a [vector database](https://qdrant.tech/) like Qdrant can significantly enhance search capabilities by enabling more precise and efficient retrieval of data. As a conclusion to this guide, let’s look at some real-life use cases where filtering is crucial: | **Use Case** | **Vector Search** | **Filtering** | | --- | --- | --- | | [E-Commerce Product Search](https://qdrant.tech/advanced-search/) | Search for products by style or visual similarity | Filter by price, color, brand, size, ratings | | [Recommendation Systems](https://qdrant.tech/recommendations/) | Recommend similar content (e.g., movies, songs) | Filter by release date, genre, etc. (e.g., movies after 2020) | | [Geospatial Search in Ride-Sharing](https://qdrant.tech/articles/geo-polygon-filter-gsoc/) | Find similar drivers or delivery partners | Filter by rating, distance radius, vehicle type | | [Fraud & Anomaly Detection](https://qdrant.tech/data-analysis-anomaly-detection/) | Detect transactions similar to known fraud cases | Filter by amount, time, location | #### [Anchor](https://qdrant.tech/articles/vector-search-filtering/\#before-you-go---all-the-code-is-in-qdrants-dashboard) Before you go - all the code is in Qdrant’s Dashboard The easiest way to reach that “Hello World” moment is to [**try filtering in a live cluster**](https://qdrant.tech/documentation/quickstart-cloud/). Our interactive tutorial will show you how to create a cluster, add data and try some filtering clauses. **It’s all in your free cluster!** [![qdrant-hybrid-cloud](https://qdrant.tech/docs/homepage/cloud-cta.png)](https://qdrant.to/cloud) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/vector-search-filtering.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/vector-search-filtering.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-89-lllmstxt|> ## qdrant-airflow-astronomer - [Documentation](https://qdrant.tech/documentation/) - [Send data](https://qdrant.tech/documentation/send-data/) - Semantic Querying with Airflow and Astronomer --- # [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#semantic-querying-with-airflow-and-astronomer) Semantic Querying with Airflow and Astronomer | Time: 45 min | Level: Intermediate | | | | --- | --- | --- | --- | In this tutorial, you will use Qdrant as a [provider](https://airflow.apache.org/docs/apache-airflow-providers-qdrant/stable/index.html) in [Apache Airflow](https://airflow.apache.org/), an open-source tool that lets you setup data-engineering workflows. You will write the pipeline as a DAG (Directed Acyclic Graph) in Python. With this, you can leverage the powerful suite of Python’s capabilities and libraries to achieve almost anything your data pipeline needs. [Astronomer](https://www.astronomer.io/) is a managed platform that simplifies the process of developing and deploying Airflow projects via its easy-to-use CLI and extensive automation capabilities. Airflow is useful when running operations in Qdrant based on data events or building parallel tasks for generating vector embeddings. By using Airflow, you can set up monitoring and alerts for your pipelines for full observability. ## [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#prerequisites) Prerequisites Please make sure you have the following ready: - A running Qdrant instance. We’ll be using a free instance from [https://cloud.qdrant.io](https://cloud.qdrant.io/) - The Astronomer CLI. Find the installation instructions [here](https://docs.astronomer.io/astro/cli/install-cli). - A [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) to generate embeddings. ## [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#implementation) Implementation We’ll be building a DAG that generates embeddings in parallel for our data corpus and performs semantic retrieval based on user input. ### [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#set-up-the-project) Set up the project The Astronomer CLI makes it very straightforward to set up the Airflow project: ```console mkdir qdrant-airflow-tutorial && cd qdrant-airflow-tutorial astro dev init ``` This command generates all of the project files you need to run Airflow locally. You can find a directory called `dags`, which is where we can place our Python DAG files. To use Qdrant within Airflow, install the Qdrant Airflow provider by adding the following to the `requirements.txt` file ```text apache-airflow-providers-qdrant ``` ### [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#configure-credentials) Configure credentials We can set up provider connections using the Airflow UI, environment variables or the `airflow_settings.yml` file. Add the following to the `.env` file in the project. Replace the values as per your credentials. 
```env HUGGINGFACE_TOKEN="" AIRFLOW_CONN_QDRANT_DEFAULT='{ "conn_type": "qdrant", "host": "xyz-example.eu-central.aws.cloud.qdrant.io:6333", "password": "" }' ``` ### [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#add-the-data-corpus) Add the data corpus Let’s add some sample data to work with. Paste the following content into a file called `books.txt` file within the `include` directory. ```text 1 | To Kill a Mockingbird (1960) | fiction | Harper Lee's Pulitzer Prize-winning novel explores racial injustice and moral growth through the eyes of young Scout Finch in the Deep South. 2 | Harry Potter and the Sorcerer's Stone (1997) | fantasy | J.K. Rowling's magical tale follows Harry Potter as he discovers his wizarding heritage and attends Hogwarts School of Witchcraft and Wizardry. 3 | The Great Gatsby (1925) | fiction | F. Scott Fitzgerald's classic novel delves into the glitz, glamour, and moral decay of the Jazz Age through the eyes of narrator Nick Carraway and his enigmatic neighbour, Jay Gatsby. 4 | 1984 (1949) | dystopian | George Orwell's dystopian masterpiece paints a chilling picture of a totalitarian society where individuality is suppressed and the truth is manipulated by a powerful regime. 5 | The Catcher in the Rye (1951) | fiction | J.D. Salinger's iconic novel follows disillusioned teenager Holden Caulfield as he navigates the complexities of adulthood and society's expectations in post-World War II America. 6 | Pride and Prejudice (1813) | romance | Jane Austen's beloved novel revolves around the lively and independent Elizabeth Bennet as she navigates love, class, and societal expectations in Regency-era England. 7 | The Hobbit (1937) | fantasy | J.R.R. Tolkien's adventure follows Bilbo Baggins, a hobbit who embarks on a quest with a group of dwarves to reclaim their homeland from the dragon Smaug. 8 | The Lord of the Rings (1954-1955) | fantasy | J.R.R. Tolkien's epic fantasy trilogy follows the journey of Frodo Baggins to destroy the One Ring and defeat the Dark Lord Sauron in the land of Middle-earth. 9 | The Alchemist (1988) | fiction | Paulo Coelho's philosophical novel follows Santiago, an Andalusian shepherd boy, on a journey of self-discovery and spiritual awakening as he searches for a hidden treasure. 10 | The Da Vinci Code (2003) | mystery/thriller | Dan Brown's gripping thriller follows symbologist Robert Langdon as he unravels clues hidden in art and history while trying to solve a murder mystery with far-reaching implications. ``` Now, the hacking part - writing our Airflow DAG! ### [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#write-the-dag) Write the dag We’ll add the following content to a `books_recommend.py` file within the `dags` directory. Let’s go over what it does for each task. 
```python import os import requests from airflow.decorators import dag, task from airflow.models.baseoperator import chain from airflow.models.param import Param from airflow.providers.qdrant.hooks.qdrant import QdrantHook from airflow.providers.qdrant.operators.qdrant import QdrantIngestOperator from pendulum import datetime from qdrant_client import models QDRANT_CONNECTION_ID = "qdrant_default" DATA_FILE_PATH = "include/books.txt" COLLECTION_NAME = "airflow_tutorial_collection" EMBEDDING_MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2" EMBEDDING_DIMENSION = 384 SIMILARITY_METRIC = models.Distance.COSINE def embed(text: str) -> list: HUGGINGFACE_URL = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{EMBEDDING_MODEL_ID}" response = requests.post( HUGGINGFACE_URL, headers={"Authorization": f"Bearer {os.getenv('HUGGINGFACE_TOKEN')}"}, json={"inputs": [text], "options": {"wait_for_model": True}}, ) return response.json()[0] @dag( dag_id="books_recommend", start_date=datetime(2023, 10, 18), schedule=None, catchup=False, params={"preference": Param("Something suspenseful and thrilling.", type="string")}, ) def recommend_book(): @task def import_books(text_file_path: str) -> list: data = [] with open(text_file_path, "r") as f: for line in f: _, title, genre, description = line.split("|") data.append( { "title": title.strip(), "genre": genre.strip(), "description": description.strip(), } ) return data @task def init_collection(): hook = QdrantHook(conn_id=QDRANT_CONNECTION_ID) if not hook.conn.collection_exists(COLLECTION_NAME): hook.conn.create_collection( COLLECTION_NAME, vectors_config=models.VectorParams( size=EMBEDDING_DIMENSION, distance=SIMILARITY_METRIC ), ) @task def embed_description(data: dict) -> list: return embed(data["description"]) books = import_books(text_file_path=DATA_FILE_PATH) embeddings = embed_description.expand(data=books) qdrant_vector_ingest = QdrantIngestOperator( conn_id=QDRANT_CONNECTION_ID, task_id="qdrant_vector_ingest", collection_name=COLLECTION_NAME, payload=books, vectors=embeddings, ) @task def embed_preference(**context) -> list: user_mood = context["params"]["preference"] response = embed(text=user_mood) return response @task def search_qdrant( preference_embedding: list, ) -> None: hook = QdrantHook(conn_id=QDRANT_CONNECTION_ID) result = hook.conn.query_points( collection_name=COLLECTION_NAME, query=preference_embedding, limit=1, with_payload=True, ).points print("Book recommendation: " + result[0].payload["title"]) print("Description: " + result[0].payload["description"]) chain( init_collection(), qdrant_vector_ingest, search_qdrant(embed_preference()), ) recommend_book() ``` `import_books`: This task reads a text file containing information about the books (like title, genre, and description), and then returns the data as a list of dictionaries. `init_collection`: This task initializes a collection in the Qdrant database, where we will store the vector representations of the book descriptions. `embed_description`: This is a dynamic task that creates one mapped task instance for each book in the list. The task uses the `embed` function to generate vector embeddings for each description. To use a different embedding model, you can adjust the `EMBEDDING_MODEL_ID` and `EMBEDDING_DIMENSION` values. `embed_preference`: Here, we take a user’s input and convert it into a vector using the same pre-trained model used for the book descriptions.
`qdrant_vector_ingest`: This task ingests the book data into the Qdrant collection using the [QdrantIngestOperator](https://airflow.apache.org/docs/apache-airflow-providers-qdrant/1.0.0/), associating each book description with its corresponding vector embeddings. `search_qdrant`: Finally, this task performs a search in the Qdrant database using the vectorized user preference. It finds the most relevant book in the collection based on vector similarity. ### [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#run-the-dag) Run the DAG Head over to your terminal and run `astro dev start` A local Airflow container should spawn. You can now access the Airflow UI at [http://localhost:8080](http://localhost:8080/). Visit our DAG by clicking on `books_recommend`. ![DAG](https://qdrant.tech/documentation/examples/airflow/demo-dag.png) Hit the PLAY button on the right to run the DAG. You’ll be asked for input about your preference, with the default value already filled in. ![Preference](https://qdrant.tech/documentation/examples/airflow/preference-input.png) After your DAG run completes, you should be able to see the output of your search in the logs of the `search_qdrant` task. ![Output](https://qdrant.tech/documentation/examples/airflow/output.png) There you have it, an Airflow pipeline that interfaces with Qdrant! Feel free to fiddle around and explore Airflow. There are references below that might come in handy. ## [Anchor](https://qdrant.tech/documentation/send-data/qdrant-airflow-astronomer/\#further-reading) Further reading - [Introduction to Airflow](https://docs.astronomer.io/learn/intro-to-airflow) - [Airflow Concepts](https://docs.astronomer.io/learn/category/airflow-concepts) - [Airflow Reference](https://airflow.apache.org/docs/) - [Astronomer Documentation](https://docs.astronomer.io/) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/send-data/qdrant-airflow-astronomer.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/send-data/qdrant-airflow-astronomer.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-90-lllmstxt|> ## datasets - [Documentation](https://qdrant.tech/documentation/) - Practice Datasets --- # [Anchor](https://qdrant.tech/documentation/datasets/\#common-datasets-in-snapshot-format) Common Datasets in Snapshot Format You may find that creating embeddings from datasets is a very resource-intensive task. If you need a practice dataset, feel free to pick one of the ready-made snapshots on this page. These snapshots contain pre-computed vectors that you can easily import into your Qdrant instance. ## [Anchor](https://qdrant.tech/documentation/datasets/\#available-datasets) Available datasets Our snapshots are usually generated from publicly available datasets, which are often used for non-commercial or academic purposes. The following datasets are currently available. Please click on a dataset name to see its detailed description. 
| Dataset | Model | Vector size | Documents | Size | Qdrant snapshot | HF Hub | | --- | --- | --- | --- | --- | --- | --- | | [Arxiv.org titles](https://qdrant.tech/documentation/datasets/#arxivorg-titles) | [InstructorXL](https://huggingface.co/hkunlp/instructor-xl) | 768 | 2.3M | 7.1 GB | [Download](https://snapshots.qdrant.io/arxiv_titles-3083016565637815127-2023-05-29-13-56-22.snapshot) | [Open](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings) | | [Arxiv.org abstracts](https://qdrant.tech/documentation/datasets/#arxivorg-abstracts) | [InstructorXL](https://huggingface.co/hkunlp/instructor-xl) | 768 | 2.3M | 8.4 GB | [Download](https://snapshots.qdrant.io/arxiv_abstracts-3083016565637815127-2023-06-02-07-26-29.snapshot) | [Open](https://huggingface.co/datasets/Qdrant/arxiv-abstracts-instructorxl-embeddings) | | [Wolt food](https://qdrant.tech/documentation/datasets/#wolt-food) | [clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) | 512 | 1.7M | 7.9 GB | [Download](https://snapshots.qdrant.io/wolt-clip-ViT-B-32-2446808438011867-2023-12-14-15-55-26.snapshot) | [Open](https://huggingface.co/datasets/Qdrant/wolt-food-clip-ViT-B-32-embeddings) | Once you download a snapshot, you need to [restore it](https://qdrant.tech/documentation/concepts/snapshots/#restore-snapshot) using the Qdrant CLI upon startup or through the API. ## [Anchor](https://qdrant.tech/documentation/datasets/\#qdrant-on-hugging-face) Qdrant on Hugging Face [![HuggingFace](https://qdrant.tech/content/images/hf-logo-with-title.svg)](https://huggingface.co/Qdrant) [Hugging Face](https://huggingface.co/) provides a platform for sharing and using ML models and datasets. [Qdrant](https://huggingface.co/Qdrant) is one of the organizations there! We aim to provide you with datasets containing neural embeddings that you can use to practice with Qdrant and build your applications based on semantic search. **Please let us know if you’d like to see** **a specific dataset!** If you are not familiar with [Hugging Face datasets](https://huggingface.co/docs/datasets/index), or would like to know how to combine it with Qdrant, please refer to the [tutorial](https://qdrant.tech/documentation/tutorials/huggingface-datasets/). ## [Anchor](https://qdrant.tech/documentation/datasets/\#arxivorg) Arxiv.org [Arxiv.org](https://arxiv.org/) is a highly-regarded open-access repository of electronic preprints in multiple fields. Operated by Cornell University, arXiv allows researchers to share their findings with the scientific community and receive feedback before they undergo peer review for formal publication. Its archives host millions of scholarly articles, making it an invaluable resource for those looking to explore the cutting edge of scientific research. With a high frequency of daily submissions from scientists around the world, arXiv forms a comprehensive, evolving dataset that is ripe for mining, analysis, and the development of future innovations. ### [Anchor](https://qdrant.tech/documentation/datasets/\#arxivorg-titles) Arxiv.org titles This dataset contains embeddings generated from the paper titles only. Each vector has a payload with the title used to create it, along with the DOI (Digital Object Identifier). 
```json { "title": "Nash Social Welfare for Indivisible Items under Separable, Piecewise-Linear Concave Utilities", "DOI": "1612.05191" } ``` The embeddings generated with InstructorXL model have been generated using the following instruction: > Represent the Research Paper title for retrieval; Input: The following code snippet shows how to generate embeddings using the InstructorXL model: ```python from InstructorEmbedding import INSTRUCTOR model = INSTRUCTOR("hkunlp/instructor-xl") sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments" instruction = "Represent the Research Paper title for retrieval; Input:" embeddings = model.encode([[instruction, sentence]]) ``` The snapshot of the dataset might be downloaded [here](https://snapshots.qdrant.io/arxiv_titles-3083016565637815127-2023-05-29-13-56-22.snapshot). #### [Anchor](https://qdrant.tech/documentation/datasets/\#importing-the-dataset) Importing the dataset The easiest way to use the provided dataset is to recover it via the API by passing the URL as a location. It works also in [Qdrant Cloud](https://cloud.qdrant.io/). The following code snippet shows how to create a new collection and fill it with the snapshot data: ```http PUT /collections/{collection_name}/snapshots/recover { "location": "https://snapshots.qdrant.io/arxiv_titles-3083016565637815127-2023-05-29-13-56-22.snapshot" } ``` ### [Anchor](https://qdrant.tech/documentation/datasets/\#arxivorg-abstracts) Arxiv.org abstracts This dataset contains embeddings generated from the paper abstracts. Each vector has a payload with the abstract used to create it, along with the DOI (Digital Object Identifier). ```json { "abstract": "Recently Cole and Gkatzelis gave the first constant factor approximation\nalgorithm for the problem of allocating indivisible items to agents, under\nadditive valuations, so as to maximize the Nash Social Welfare. We give\nconstant factor algorithms for a substantial generalization of their problem --\nto the case of separable, piecewise-linear concave utility functions. We give\ntwo such algorithms, the first using market equilibria and the second using the\ntheory of stable polynomials.\n In AGT, there is a paucity of methods for the design of mechanisms for the\nallocation of indivisible goods and the result of Cole and Gkatzelis seemed to\nbe taking a major step towards filling this gap. Our result can be seen as\nanother step in this direction.\n", "DOI": "1612.05191" } ``` The embeddings generated with InstructorXL model have been generated using the following instruction: > Represent the Research Paper abstract for retrieval; Input: The following code snippet shows how to generate embeddings using the InstructorXL model: ```python from InstructorEmbedding import INSTRUCTOR model = INSTRUCTOR("hkunlp/instructor-xl") sentence = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train." 
instruction = "Represent the Research Paper abstract for retrieval; Input:" embeddings = model.encode([[instruction, sentence]]) ``` The snapshot of the dataset might be downloaded [here](https://snapshots.qdrant.io/arxiv_abstracts-3083016565637815127-2023-06-02-07-26-29.snapshot). #### [Anchor](https://qdrant.tech/documentation/datasets/\#importing-the-dataset-1) Importing the dataset The easiest way to use the provided dataset is to recover it via the API by passing the URL as a location. It works also in [Qdrant Cloud](https://cloud.qdrant.io/). The following code snippet shows how to create a new collection and fill it with the snapshot data: ```http PUT /collections/{collection_name}/snapshots/recover { "location": "https://snapshots.qdrant.io/arxiv_abstracts-3083016565637815127-2023-06-02-07-26-29.snapshot" } ``` ## [Anchor](https://qdrant.tech/documentation/datasets/\#wolt-food) Wolt food Our [Food Discovery demo](https://food-discovery.qdrant.tech/) relies on the dataset of food images from the Wolt app. Each point in the collection represents a dish with a single image. The image is represented as a vector of 512 float numbers. There is also a JSON payload attached to each point, which looks similar to this: ```json { "cafe": { "address": "VGX7+6R2 Vecchia Napoli, Valletta", "categories": ["italian", "pasta", "pizza", "burgers", "mediterranean"], "location": {"lat": 35.8980154, "lon": 14.5145106}, "menu_id": "610936a4ee8ea7a56f4a372a", "name": "Vecchia Napoli Is-Suq Tal-Belt", "rating": 9, "slug": "vecchia-napoli-skyparks-suq-tal-belt" }, "description": "Tomato sauce, mozzarella fior di latte, crispy guanciale, Pecorino Romano cheese and a hint of chilli", "image": "https://wolt-menu-images-cdn.wolt.com/menu-images/610936a4ee8ea7a56f4a372a/005dfeb2-e734-11ec-b667-ced7a78a5abd_l_amatriciana_pizza_joel_gueller1.jpeg", "name": "L'Amatriciana" } ``` The embeddings generated with clip-ViT-B-32 model have been generated using the following code snippet: ```python from PIL import Image from sentence_transformers import SentenceTransformer image_path = "5dbfd216-5cce-11eb-8122-de94874ad1c8_ns_takeaway_seelachs_ei_baguette.jpeg" model = SentenceTransformer("clip-ViT-B-32") embedding = model.encode(Image.open(image_path)) ``` The snapshot of the dataset might be downloaded [here](https://snapshots.qdrant.io/wolt-clip-ViT-B-32-2446808438011867-2023-12-14-15-55-26.snapshot). #### [Anchor](https://qdrant.tech/documentation/datasets/\#importing-the-dataset-2) Importing the dataset The easiest way to use the provided dataset is to recover it via the API by passing the URL as a location. It works also in [Qdrant Cloud](https://cloud.qdrant.io/). The following code snippet shows how to create a new collection and fill it with the snapshot data: ```http PUT /collections/{collection_name}/snapshots/recover { "location": "https://snapshots.qdrant.io/wolt-clip-ViT-B-32-2446808438011867-2023-12-14-15-55-26.snapshot" } ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/datasets.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/datasets.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-91-lllmstxt|> ## rapid-rag-optimization-with-qdrant-and-quotient - [Articles](https://qdrant.tech/articles/) - Optimizing RAG Through an Evaluation-Based Methodology [Back to RAG & GenAI](https://qdrant.tech/articles/rag-and-genai/) --- # Optimizing RAG Through an Evaluation-Based Methodology Atita Arora · June 12, 2024 ![Optimizing RAG Through an Evaluation-Based Methodology](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/preview/title.jpg) In today’s fast-paced, information-rich world, AI is revolutionizing knowledge management. The systematic process of capturing, distributing, and effectively using knowledge within an organization is one of the fields in which AI provides exceptional value today. > The potential for AI-powered knowledge management increases when leveraging [Retrieval Augmented Generation (RAG)](https://qdrant.tech/rag/rag-evaluation-guide/), a methodology that enables LLMs to access a vast, diverse repository of factual information from knowledge stores, such as vector databases. This process enhances the accuracy, relevance, and reliability of generated text, thereby mitigating the risk of faulty, incorrect, or nonsensical results sometimes associated with traditional LLMs. This method not only ensures that the answers are contextually relevant but also up-to-date, reflecting the latest insights and data available. While RAG enhances the accuracy, relevance, and reliability of traditional LLM solutions, **an evaluation strategy can further help teams ensure their AI products meet these benchmarks of success.** ## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#relevant-tools-for-this-experiment) Relevant tools for this experiment In this article, we’ll break down a RAG Optimization workflow experiment that demonstrates that evaluation is essential to build a successful RAG strategy. We will use Qdrant and Quotient for this experiment. [Qdrant](https://qdrant.tech/) is a vector database and vector similarity search engine designed for efficient storage and retrieval of high-dimensional vectors. Because Qdrant offers efficient indexing and searching capabilities, it is ideal for implementing RAG solutions, where quickly and accurately retrieving relevant information from extremely large datasets is crucial. Qdrant also offers a wealth of additional features, such as quantization, multivector support and multi-tenancy. Alongside Qdrant we will use Quotient, which provides a seamless way to evaluate your RAG implementation, accelerating and improving the experimentation process. [Quotient](https://www.quotientai.co/) is a platform that provides tooling for AI developers to build [evaluation frameworks](https://qdrant.tech/rag/rag-evaluation-guide/) and conduct experiments on their products. Evaluation is how teams surface the shortcomings of their applications and improve performance in key benchmarks such as faithfulness, and semantic similarity. Iteration is key to building innovative AI products that will deliver value to end users. > 💡 The [accompanying notebook](https://github.com/qdrant/qdrant-rag-eval/tree/master/workshop-rag-eval-qdrant-quotient) for this exercise can be found on GitHub for future reference. 
## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#summary-of-key-findings) Summary of key findings 1. **Irrelevance and Hallucinations**: When the documents retrieved are irrelevant, evidenced by low scores in both Chunk Relevance and Context Relevance, the model is prone to generating inaccurate or fabricated information. 2. **Optimizing Document Retrieval**: By retrieving a greater number of documents and reducing the chunk size, we observed improved outcomes in the model’s performance. 3. **Adaptive Retrieval Needs**: Certain queries may benefit from accessing more documents. Implementing a dynamic retrieval strategy that adjusts based on the query could enhance accuracy. 4. **Influence of Model and Prompt Variations**: Alterations in language models or the prompts used can significantly impact the quality of the generated responses, suggesting that fine-tuning these elements could optimize performance. Let us walk you through how we arrived at these findings! ## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#building-a-rag-pipeline) Building a RAG pipeline To evaluate a RAG pipeline, we will have to build a RAG pipeline first. In the interest of simplicity, we are building a Naive RAG in this article. There are certainly other versions of RAG: ![shades_of_rag.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/shades_of_rag.png) The illustration below depicts how we can leverage a [RAG Evaluation framework](https://qdrant.tech/rag/rag-evaluation-guide/) to assess the quality of a RAG application. ![qdrant_and_quotient.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/qdrant_and_quotient.png) We are going to build a RAG application using Qdrant’s Documentation and a prebuilt [Hugging Face dataset](https://huggingface.co/datasets/atitaarora/qdrant_doc). We will then assess our RAG application’s ability to answer questions about Qdrant. To prepare our knowledge store we will use Qdrant, which can be leveraged in three different ways. In this experiment, we connect to a remote instance: ```python import os import qdrant_client client = qdrant_client.QdrantClient( os.environ.get("QDRANT_URL"), api_key=os.environ.get("QDRANT_API_KEY"), ) ``` We will be using [Qdrant Cloud](https://cloud.qdrant.io/login) so it is a good idea to provide the `QDRANT_URL` and `QDRANT_API_KEY` as environment variables for easier access. Moving on, we will need to define the collection name as: ```python COLLECTION_NAME = "qdrant-docs-quotient" ``` In this case, we may need to create different collections based on the experiments we conduct. To provide seamless embedding creation throughout the experiment, we will use Qdrant’s own embeddings library [Fastembed](https://qdrant.github.io/fastembed/), which supports [many different models](https://qdrant.github.io/fastembed/examples/Supported_Models/) including dense as well as sparse vector models. Before implementing RAG, we need to prepare and index our data in Qdrant. This involves converting textual data into vectors using a suitable encoder (e.g., sentence transformers), and storing these vectors in Qdrant for retrieval.
```python
from datasets import load_dataset
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document as LangchainDocument

## Load the dataset with qdrant documentation
dataset = load_dataset("atitaarora/qdrant_doc", split="train")

## Dataset to langchain document
langchain_docs = [
    LangchainDocument(page_content=doc["text"], metadata={"source": doc["source"]})
    for doc in dataset
]

len(langchain_docs)

#Outputs
#240
```

You can preview documents in the dataset as below:

```python
## Here's an example of what a document in our dataset looks like
print(dataset[100]['text'])
```

## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#evaluation-dataset) Evaluation dataset

To measure the quality of our RAG setup, we will need a representative evaluation dataset. This dataset should contain realistic questions and the expected answers.

Additionally, including the expected contexts for which your RAG pipeline is designed to retrieve information would be beneficial.

We will be using a [prebuilt evaluation dataset](https://huggingface.co/datasets/atitaarora/qdrant_doc_qna).

If you are struggling to make an evaluation dataset for your use case, you can use your documents and some of the techniques described in this [notebook](https://github.com/qdrant/qdrant-rag-eval/blob/master/synthetic_qna/notebook/Synthetic_question_generation.ipynb).

### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#building-the-rag-pipeline) Building the RAG pipeline

We establish the data preprocessing parameters essential for the RAG pipeline and configure the Qdrant vector database according to the specified criteria.

Key parameters under consideration are:

- **Chunk size**
- **Chunk overlap**
- **Embedding model**
- **Number of documents retrieved (retrieval window)**

Following the ingestion of data in Qdrant, we proceed to retrieve pertinent documents corresponding to each query. These documents are then integrated into our evaluation dataset, enriching the contextual information within the designated **`context`** column to support the evaluation.

Next, we define the methods that handle adding documents to Qdrant

```python
import uuid

from qdrant_client import models

def add_documents(client, collection_name, chunk_size, chunk_overlap, embedding_model_name):
    """
    This function adds documents to the desired Qdrant collection given the specified RAG parameters.
""" ## Processing each document with desired TEXT_SPLITTER_ALGO, CHUNK_SIZE, CHUNK_OVERLAP text_splitter = RecursiveCharacterTextSplitter( chunk_size=chunk_size, chunk_overlap=chunk_overlap, add_start_index=True, separators=["\n\n", "\n", ".", " ", ""], ) docs_processed = [] for doc in langchain_docs: docs_processed += text_splitter.split_documents([doc]) ## Processing documents to be encoded by Fastembed docs_contents = [] docs_metadatas = [] for doc in docs_processed: if hasattr(doc, 'page_content') and hasattr(doc, 'metadata'): docs_contents.append(doc.page_content) docs_metadatas.append(doc.metadata) else: # Handle the case where attributes are missing print("Warning: Some documents do not have 'page_content' or 'metadata' attributes.") print("processed: ", len(docs_processed)) print("content: ", len(docs_contents)) print("metadata: ", len(docs_metadatas)) if not client.collection_exists(collection_name): client.create_collection( collection_name=collection_name, vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE), ) client.upsert( collection_name=collection_name, points=[\ models.PointStruct(\ id=uuid.uuid4().hex,\ vector=models.Document(text=content, model=embedding_model_name),\ payload={"metadata": metadata, "document": content},\ )\ for metadata, content in zip(docs_metadatas, docs_contents)\ ], ) ``` and retrieving documents from Qdrant during our RAG Pipeline assessment. ```python def get_documents(collection_name, query, num_documents=3): """ This function retrieves the desired number of documents from the Qdrant collection given a query. It returns a list of the retrieved documents. """ search_results = client.query_points( collection_name=collection_name, query=models.Document(text=query, model=embedding_model_name), limit=num_documents, ).points results = [r.payload["document"] for r in search_results] return results ``` ### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#setting-up-quotient) Setting up Quotient You will need an account log in, which you can get by requesting access on [Quotient’s website](https://www.quotientai.co/). Once you have an account, you can create an API key by running the `quotient authenticate` CLI command. **Once you have your API key, make sure to set it as an environment variable called `QUOTIENT_API_KEY`** ```python --- # Import QuotientAI client and connect to QuotientAI from quotientai.client import QuotientClient from quotientai.utils import show_job_progress --- # IMPORTANT: be sure to set your API key as an environment variable called QUOTIENT_API_KEY --- # You will need this set before running the code below. You may also uncomment the following line and insert your API key: --- # os.environ['QUOTIENT_API_KEY'] = "YOUR_API_KEY" quotient = QuotientClient() ``` **QuotientAI** provides a seamless way to integrate _RAG evaluation_ into your applications. Here, we’ll see how to use it to evaluate text generated from an LLM, based on retrieved knowledge from the Qdrant vector database. After retrieving the top similar documents and populating the `context` column, we can submit the evaluation dataset to Quotient and execute an evaluation job. To run a job, all you need is your evaluation dataset and a `recipe`. _**A recipe is a combination of a prompt template and a specified LLM.**_ **Quotient** orchestrates the evaluation run and handles version control and asset management throughout the experimentation process. 
_**Prior to assessing our RAG solution, it’s crucial to outline our optimization goals.**_

In the context of _question-answering on Qdrant documentation_, our focus extends beyond merely providing helpful responses. Ensuring the absence of any _inaccurate or misleading information_ is paramount. In other words, **we want to minimize hallucinations** in the LLM outputs.

For our evaluation, we will be considering the following metrics, with a focus on **Faithfulness**:

- **Context Relevance**
- **Chunk Relevance**
- **Faithfulness**
- **ROUGE-L**
- **BERT Sentence Similarity**
- **BERTScore**

### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#evaluation-in-action) Evaluation in action

The function below takes an evaluation dataset as input, which in this case contains questions and their corresponding answers. It retrieves relevant documents based on the questions in the dataset and populates the context field with this information from Qdrant.

The prepared dataset is then submitted to QuotientAI for evaluation on the chosen metrics. After the evaluation is complete, the function displays aggregated statistics on the evaluation metrics followed by the summarized evaluation results.

```python
import pandas as pd

def run_eval(eval_df, collection_name, recipe_id, num_docs=3, path="eval_dataset_qdrant_questions.csv"):
    """
    This function evaluates the performance of a complete RAG pipeline on a given evaluation dataset.

    Given an evaluation dataset (containing questions and ground truth answers),
    this function retrieves relevant documents, populates the context field,
    and submits the dataset to QuotientAI for evaluation.

    Once the evaluation is complete, aggregated statistics on the evaluation metrics are displayed.

    The evaluation results are returned as a pandas dataframe.
""" # Add context to each question by retrieving relevant documents eval_df['documents'] = eval_df.apply(lambda x: get_documents(collection_name=collection_name, query=x['input_text'], num_documents=num_docs), axis=1) eval_df['context'] = eval_df.apply(lambda x: "\n".join(x['documents']), axis=1) # Now we'll save the eval_df to a CSV eval_df.to_csv(path, index=False) # Upload the eval dataset to QuotientAI dataset = quotient.create_dataset( file_path=path, name="qdrant-questions-eval-v1", ) # Create a new task for the dataset task = quotient.create_task( dataset_id=dataset['id'], name='qdrant-questions-qa-v1', task_type='question_answering' ) # Run a job to evaluate the model job = quotient.create_job( task_id=task['id'], recipe_id=recipe_id, num_fewshot_examples=0, limit=500, metric_ids=[5, 7, 8, 11, 12, 13, 50], ) # Show the progress of the job show_job_progress(quotient, job['id']) # Once the job is complete, we can get our results data = quotient.get_eval_results(job_id=job['id']) # Add the results to a pandas dataframe to get statistics on performance df = pd.json_normalize(data, "results") df_stats = df[df.columns[df.columns.str.contains("metric|completion_time")]] df.columns = df.columns.str.replace("metric.", "") df_stats.columns = df_stats.columns.str.replace("metric.", "") metrics = { 'completion_time_ms':'Completion Time (ms)', 'chunk_relevance': 'Chunk Relevance', 'selfcheckgpt_nli_relevance':"Context Relevance", 'selfcheckgpt_nli':"Faithfulness", 'rougeL_fmeasure':"ROUGE-L", 'bert_score_f1':"BERTScore", 'bert_sentence_similarity': "BERT Sentence Similarity", 'completion_verbosity':"Completion Verbosity", 'verbosity_ratio':"Verbosity Ratio",} df = df.rename(columns=metrics) df_stats = df_stats.rename(columns=metrics) display(df_stats[metrics.values()].describe()) return df main_metrics = [\ 'Context Relevance',\ 'Chunk Relevance',\ 'Faithfulness',\ 'ROUGE-L',\ 'BERT Sentence Similarity',\ 'BERTScore',\ ] ``` ## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#experimentation) Experimentation Our approach is rooted in the belief that improvement thrives in an environment of exploration and discovery. By systematically testing and tweaking various components of the RAG pipeline, we aim to incrementally enhance its capabilities and performance. In the following section, we dive into the details of our experimentation process, outlining the specific experiments conducted and the insights gained. ### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#experiment-1---baseline) Experiment 1 - Baseline Parameters - **Embedding Model: `bge-small-en`** - **Chunk size: `512`** - **Chunk overlap: `64`** - **Number of docs retrieved (Retireval Window): `3`** - **LLM: `Mistral-7B-Instruct`** We’ll process our documents based on configuration above and ingest them into Qdrant using `add_documents` method introduced earlier ```python #experiment1 - base config chunk_size = 512 chunk_overlap = 64 embedding_model_name = "BAAI/bge-small-en" num_docs = 3 COLLECTION_NAME = f"experiment_{chunk_size}_{chunk_overlap}_{embedding_model_name.split('/')[1]}" add_documents(client, collection_name=COLLECTION_NAME, chunk_size=chunk_size, chunk_overlap=chunk_overlap, embedding_model_name=embedding_model_name) #Outputs #processed: 4504 #content: 4504 #metadata: 4504 ``` Notice the `COLLECTION_NAME` which helps us segregate and identify our collections based on the experiments conducted. 
To proceed with the evaluation, let’s create the evaluation `recipe` next:

```python
# Create a recipe for the generator model and prompt template
recipe_mistral = quotient.create_recipe(
    model_id=10,
    prompt_template_id=1,
    name='mistral-7b-instruct-qa-with-rag',
    description='Mistral-7b-instruct using a prompt template that includes context.'
)
recipe_mistral

#Outputs recipe JSON with the used prompt template
#'prompt_template': {'id': 1,
# 'template_string': 'Question: {input_text}\\n\\nContext: {context}\\n\\nAnswer:',
# 'owner_profile_id': None}
```

To get a list of your existing recipes, you can simply run:

```python
quotient.list_recipes()
```

Notice that the recipe uses the simplest possible prompt template: the `Question` comes from the evaluation dataset, the `Context` comes from the document chunks retrieved from Qdrant, and the `Answer` is generated by the pipeline.

To kick off the evaluation:

```python
# Kick off an evaluation job
experiment_1 = run_eval(eval_df,
                        collection_name=COLLECTION_NAME,
                        recipe_id=recipe_mistral['id'],
                        num_docs=num_docs,
                        path=f"{COLLECTION_NAME}_{num_docs}_mistral.csv")
```

This may take a few minutes (depending on the size of the evaluation dataset!).

We can look at the results from our first (baseline) experiment as below:

![experiment1_eval.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/experiment1_eval.png)

Notice that we have a pretty **low average Chunk Relevance** and **very large standard deviations for both Chunk Relevance and Context Relevance**.

Let’s take a look at some of the lower performing datapoints with **poor Faithfulness**:

```python
with pd.option_context('display.max_colwidth', 0):
    display(experiment_1[[
        'content.input_text', 'content.answer', 'content.documents',
        'Chunk Relevance', 'Context Relevance', 'Faithfulness'
    ]].sort_values(by='Faithfulness').head(2))
```

![experiment1_bad_examples.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/experiment1_bad_examples.png)

In instances where the retrieved documents are **irrelevant (where both Chunk Relevance and Context Relevance are low)**, the model also shows **tendencies to hallucinate** and **produce poor quality responses**.

The quality of the retrieved text directly impacts the quality of the LLM-generated answer. Therefore, our focus will be on enhancing the RAG setup by **adjusting the chunking parameters**.

### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#experiment-2---adjusting-the-chunk-parameter) Experiment 2 - Adjusting the chunk parameters

Keeping all other parameters constant, we changed the `chunk size` and `chunk overlap` to see if we can improve our results.
Parameters:

- **Embedding Model: `bge-small-en`**
- **Chunk size: `1024`**
- **Chunk overlap: `128`**
- **Number of docs retrieved (Retrieval Window): `3`**
- **LLM: `Mistral-7B-Instruct`**

We will reprocess the data with the updated parameters above:

```python
## for iteration 2 - let's modify the chunk configuration
## We will start by creating a separate collection to store the vectors
chunk_size = 1024
chunk_overlap = 128
embedding_model_name = "BAAI/bge-small-en"
num_docs = 3

COLLECTION_NAME = f"experiment_{chunk_size}_{chunk_overlap}_{embedding_model_name.split('/')[1]}"

add_documents(client,
              collection_name=COLLECTION_NAME,
              chunk_size=chunk_size,
              chunk_overlap=chunk_overlap,
              embedding_model_name=embedding_model_name)

#Outputs
#processed:  2152
#content:  2152
#metadata:  2152
```

Followed by running the evaluation:

![experiment2_eval.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/experiment2_eval.png)

and **comparing it with the results from Experiment 1:**

![graph_exp1_vs_exp2.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/graph_exp1_vs_exp2.png)

We observed slight enhancements in our LLM completion metrics (including BERT Sentence Similarity, BERTScore, ROUGE-L, and Knowledge F1) with the increase in _chunk size_. However, it’s noteworthy that there was a significant decrease in _Faithfulness_, which is the primary metric we are aiming to optimize.

Moreover, _Context Relevance_ demonstrated an increase, indicating that the RAG pipeline retrieved more relevant information required to address the query. Nonetheless, there was a considerable drop in _Chunk Relevance_, implying that a smaller portion of the retrieved documents contained pertinent information for answering the question.

**The correlation between the rise in Context Relevance and the decline in Chunk Relevance suggests that retrieving more documents using the smaller chunk size might yield improved results.**

### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#experiment-3---increasing-the-number-of-documents-retrieved-retrieval-window) Experiment 3 - Increasing the number of documents retrieved (retrieval window)

This time, we are using the same RAG setup as `Experiment 1`, but increasing the number of retrieved documents from **3** to **5**.
Parameters:

- **Embedding Model: `bge-small-en`**
- **Chunk size: `512`**
- **Chunk overlap: `64`**
- **Number of docs retrieved (Retrieval Window): `5`**
- **LLM: `Mistral-7B-Instruct`**

We can reuse the collection from Experiment 1 and run the evaluation with the modified `num_docs` parameter:

```python
#parameters from Experiment 1, with a larger retrieval window
chunk_size = 512
chunk_overlap = 64
embedding_model_name = "BAAI/bge-small-en"
num_docs = 5

#collection name from Experiment 1
COLLECTION_NAME = f"experiment_{chunk_size}_{chunk_overlap}_{embedding_model_name.split('/')[1]}"

#running eval for experiment 3
experiment_3 = run_eval(eval_df,
                        collection_name=COLLECTION_NAME,
                        recipe_id=recipe_mistral['id'],
                        num_docs=num_docs,
                        path=f"{COLLECTION_NAME}_{num_docs}_mistral.csv")
```

Observe the results below:

![experiment_3_eval.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/experiment_3_eval.png)

Comparing the results with Experiments 1 and 2:

![graph_exp1_exp2_exp3.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/graph_exp1_exp2_exp3.png)

As anticipated, employing the smaller chunk size while retrieving a larger number of documents resulted in achieving the highest levels of both _Context Relevance_ and _Chunk Relevance_. Additionally, it yielded the **best** (albeit marginal) _Faithfulness_ score, indicating a _reduced occurrence of inaccuracies or hallucinations_.

Looks like we have achieved a good hold on our chunking parameters, but it is worth testing another embedding model to see if we can get better results.

### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#experiment-4---changing-the-embedding-model) Experiment 4 - Changing the embedding model

Let us try using **MiniLM** for this experiment.

Parameters:

- **Embedding Model: `MiniLM-L6-v2`**
- **Chunk size: `512`**
- **Chunk overlap: `64`**
- **Number of docs retrieved (Retrieval Window): `5`**
- **LLM: `Mistral-7B-Instruct`**

We will have to create another collection for this experiment:

```python
#experiment-4
chunk_size = 512
chunk_overlap = 64
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
num_docs = 5

COLLECTION_NAME = f"experiment_{chunk_size}_{chunk_overlap}_{embedding_model_name.split('/')[1]}"

add_documents(client,
              collection_name=COLLECTION_NAME,
              chunk_size=chunk_size,
              chunk_overlap=chunk_overlap,
              embedding_model_name=embedding_model_name)

#Outputs
#processed:  4504
#content:  4504
#metadata:  4504
```

We can observe our evaluation results:

![experiment4_eval.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/experiment4_eval.png)

Comparing these with our previous experiments:

![graph_exp1_exp2_exp3_exp4.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/graph_exp1_exp2_exp3_exp4.png)

It appears that `bge-small` was more proficient in capturing the semantic nuances of the Qdrant documentation.

Up to this point, our experimentation has focused solely on the _retrieval aspect_ of our RAG pipeline. Now, let’s explore altering the _generation aspect_, or LLM, while retaining the optimal parameters identified in Experiment 3.
### [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#experiment-5---changing-the-llm) Experiment 5 - Changing the LLM

Parameters:

- **Embedding Model: `bge-small-en`**
- **Chunk size: `512`**
- **Chunk overlap: `64`**
- **Number of docs retrieved (Retrieval Window): `5`**
- **LLM: `GPT-3.5-turbo`**

For this, we can reuse the collection from Experiment 3 while updating the evaluation to use a new recipe based on the **GPT-3.5-turbo** model.

```python
#parameters matching Experiment 3 (bge-small-en, chunk size 512, overlap 64)
chunk_size = 512
chunk_overlap = 64
embedding_model_name = "BAAI/bge-small-en"
num_docs = 5

#collection name from Experiment 3
COLLECTION_NAME = f"experiment_{chunk_size}_{chunk_overlap}_{embedding_model_name.split('/')[1]}"

# We have to create a recipe using the same prompt template and GPT-3.5-turbo
recipe_gpt = quotient.create_recipe(
    model_id=5,
    prompt_template_id=1,
    name='gpt3.5-qa-with-rag-recipe-v1',
    description='GPT-3.5 using a prompt template that includes context.'
)
recipe_gpt

#Outputs
#{'id': 495,
# 'description': 'GPT-3.5 using a prompt template that includes context.',
# 'template_string': 'Question: {input_text}\\n\\nContext: {context}\\n\\nAnswer:',
# 'endpoint': 'https://api.openai.com/v1/chat/completions',
# 'description': 'Returns a maximum of 4K output tokens.',
# 'instruction_template_cls': 'NoneType'}}
```

Running the evaluation:

```python
experiment_5 = run_eval(eval_df,
                        collection_name=COLLECTION_NAME,
                        recipe_id=recipe_gpt['id'],
                        num_docs=num_docs,
                        path=f"{COLLECTION_NAME}_{num_docs}_gpt.csv")
```

We observe:

![experiment5_eval.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/experiment5_eval.png)

and compare all 5 experiments below:

![graph_exp1_exp2_exp3_exp4_exp5.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/graph_exp1_exp2_exp3_exp4_exp5.png)

**GPT-3.5 surpassed Mistral-7B in all metrics**! Notably, Experiment 5 exhibited the **lowest occurrence of hallucination**.

## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#conclusions) Conclusions

Let’s take a look at our results from all 5 experiments above.

![overall_eval_results.png](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/overall_eval_results.png)

We still have a long way to go in improving the retrieval performance of RAG, as indicated by our generally poor results thus far. It might be beneficial to **explore alternative embedding models** or **different retrieval strategies** to address this issue.

The significant variations in _Context Relevance_ suggest that **certain questions may necessitate retrieving more documents than others**. Therefore, investigating a **dynamic retrieval strategy** could be worthwhile.

Furthermore, there’s ongoing **exploration required on the generative aspect** of RAG. Modifying LLMs or prompts can substantially impact the overall quality of responses.

This iterative process demonstrates how, starting from scratch, continual evaluation and adjustments throughout experimentation can lead to the development of an enhanced RAG system.

## [Anchor](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/\#watch-this-workshop-on-youtube) Watch this workshop on YouTube

> A workshop version of this article is [available on YouTube](https://www.youtube.com/watch?v=3MEMPZR1aZA). Follow along using our [GitHub notebook](https://github.com/qdrant/qdrant-rag-eval/tree/master/workshop-rag-eval-qdrant-quotient).
<|page-92-lllmstxt|> ## hybrid-queries

- [Documentation](https://qdrant.tech/documentation/)
- [Concepts](https://qdrant.tech/documentation/concepts/)
- Hybrid Queries

---

# [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#hybrid-and-multi-stage-queries) Hybrid and Multi-Stage Queries

_Available as of v1.10.0_

With the introduction of [many named vectors per point](https://qdrant.tech/documentation/concepts/vectors/#named-vectors), there are use cases where the best search is obtained by combining multiple queries, or by performing the search in more than one stage.

Qdrant has a flexible and universal interface to make this possible, called the `Query API` ( [API reference](https://api.qdrant.tech/api-reference/search/query-points)).

The main component for making the combinations of queries possible is the `prefetch` parameter, which enables making sub-requests.

Specifically, whenever a query has at least one prefetch, Qdrant will:

1. Perform the prefetch query (or queries),
2. Apply the main query over the results of its prefetch(es).

Additionally, prefetches can have prefetches themselves, so you can have nested prefetches.

## [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#hybrid-search) Hybrid Search

One of the most common problems when you have different representations of the same data is to combine the queried points for each representation into a single result.
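The hybrid search example below assumes a collection that holds both a named dense vector and a named sparse vector. Here is a minimal Python sketch of creating such a collection; the names `dense` and `sparse` match the examples, while the vector size is illustrative:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    vectors_config={
        # dense semantic vectors, e.g. produced by a sentence encoder
        "dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        # sparse lexical vectors, e.g. produced by SPLADE or BM25-style weighting
        "sparse": models.SparseVectorParams(),
    },
)
```

With dense and sparse vectors stored side by side, the remaining question is how to fuse the results of both queries: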
![Fusing results from multiple queries](https://qdrant.tech/docs/fusion-idea.png) Fusing results from multiple queries For example, in text search, it is often useful to combine dense and sparse vectors get the best of semantics, plus the best of matching specific words. Qdrant currently has two ways of combining the results from different queries: - `rrf` - [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) Considers the positions of results within each query, and boosts the ones that appear closer to the top in multiple of them. - `dbsf` - [Distribution-Based Score Fusion](https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18) _(available as of v1.11.0)_ Normalizes the scores of the points in each query, using the mean +/- the 3rd standard deviation as limits, and then sums the scores of the same point across different queries. Here is an example of Reciprocal Rank Fusion for a query containing two prefetches against different named vectors configured to respectively hold sparse and dense vectors. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "prefetch": [\ {\ "query": {\ "indices": [1, 42], // <┐\ "values": [0.22, 0.8] // <┴─sparse vector\ },\ "using": "sparse",\ "limit": 20\ },\ {\ "query": [0.01, 0.45, 0.67, ...], // <-- dense vector\ "using": "dense",\ "limit": 20\ }\ ], "query": { "fusion": "rrf" }, // <--- reciprocal rank fusion "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", prefetch=[\ models.Prefetch(\ query=models.SparseVector(indices=[1, 42], values=[0.22, 0.8]),\ using="sparse",\ limit=20,\ ),\ models.Prefetch(\ query=[0.01, 0.45, 0.67], # <-- dense vector\ using="dense",\ limit=20,\ ),\ ], query=models.FusionQuery(fusion=models.Fusion.RRF), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { prefetch: [\ {\ query: {\ values: [0.22, 0.8],\ indices: [1, 42],\ },\ using: 'sparse',\ limit: 20,\ },\ {\ query: [0.01, 0.45, 0.67],\ using: 'dense',\ limit: 20,\ },\ ], query: { fusion: 'rrf', }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{Fusion, PrefetchQueryBuilder, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.query( QueryPointsBuilder::new("{collection_name}") .add_prefetch(PrefetchQueryBuilder::default() .query(Query::new_nearest([(1, 0.22), (42, 0.8)].as_slice())) .using("sparse") .limit(20u64) ) .add_prefetch(PrefetchQueryBuilder::default() .query(Query::new_nearest(vec![0.01, 0.45, 0.67])) .using("dense") .limit(20u64) ) .query(Query::new_fusion(Fusion::Rrf)) ).await?; ``` ```java import static io.qdrant.client.QueryFactory.nearest; import java.util.List; import static io.qdrant.client.QueryFactory.fusion; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Fusion; import io.qdrant.client.grpc.Points.PrefetchQuery; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .addPrefetch(PrefetchQuery.newBuilder() .setQuery(nearest(List.of(0.22f, 
0.8f), List.of(1, 42))) .setUsing("sparse") .setLimit(20) .build()) .addPrefetch(PrefetchQuery.newBuilder() .setQuery(nearest(List.of(0.01f, 0.45f, 0.67f))) .setUsing("dense") .setLimit(20) .build()) .setQuery(fusion(Fusion.RRF)) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", prefetch: new List < PrefetchQuery > { new() { Query = new(float, uint)[] { (0.22f, 1), (0.8f, 42), }, Using = "sparse", Limit = 20 }, new() { Query = new float[] { 0.01f, 0.45f, 0.67f }, Using = "dense", Limit = 20 } }, query: Fusion.Rrf ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Prefetch: []*qdrant.PrefetchQuery{ { Query: qdrant.NewQuerySparse([]uint32{1, 42}, []float32{0.22, 0.8}), Using: qdrant.PtrOf("sparse"), }, { Query: qdrant.NewQueryDense([]float32{0.01, 0.45, 0.67}), Using: qdrant.PtrOf("dense"), }, }, Query: qdrant.NewQueryFusion(qdrant.Fusion_RRF), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#multi-stage-queries) Multi-stage queries In many cases, the usage of a larger vector representation gives more accurate search results, but it is also more expensive to compute. Splitting the search into two stages is a known technique: - First, use a smaller and cheaper representation to get a large list of candidates. - Then, re-score the candidates using the larger and more accurate representation. There are a few ways to build search architectures around this idea: - The quantized vectors as a first stage, and the full-precision vectors as a second stage. - Leverage Matryoshka Representation Learning ( [MRL](https://arxiv.org/abs/2205.13147)) to generate candidate vectors with a shorter vector, and then refine them with a longer one. - Use regular dense vectors to pre-fetch the candidates, and then re-score them with a multi-vector model like [ColBERT](https://arxiv.org/abs/2112.01488). To get the best of all worlds, Qdrant has a convenient interface to perform the queries in stages, such that the coarse results are fetched first, and then they are refined later with larger vectors. ### [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#re-scoring-examples) Re-scoring examples Fetch 1000 results using a shorter MRL byte vector, then re-score them using the full vector and get the top 10. 
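The re-scoring examples below assume a collection with several named vectors: a small `mrl_byte` vector for cheap first-stage retrieval, a `full` dense vector for re-scoring, and a ColBERT-style multivector. A minimal Python sketch of such a collection definition follows; the vector names match the examples, while the sizes and datatype are illustrative (they simply match the toy vectors used below):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    vectors_config={
        # small, cheap vector used for first-stage candidate retrieval
        "mrl_byte": models.VectorParams(
            size=4, distance=models.Distance.COSINE, datatype=models.Datatype.UINT8
        ),
        # full-precision vector used for re-scoring
        "full": models.VectorParams(size=4, distance=models.Distance.COSINE),
        # multi-vector (e.g. ColBERT) used for late-interaction re-scoring
        "colbert": models.VectorParams(
            size=3,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
)
```

With a collection like this in place, the first re-scoring query looks as follows: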
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "prefetch": { "query": [1, 23, 45, 67], // <------------- small byte vector "using": "mrl_byte" "limit": 1000 }, "query": [0.01, 0.299, 0.45, 0.67, ...], // <-- full vector "using": "full", "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", prefetch=models.Prefetch( query=[1, 23, 45, 67], # <------------- small byte vector using="mrl_byte", limit=1000, ), query=[0.01, 0.299, 0.45, 0.67], # <-- full vector using="full", limit=10, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { prefetch: { query: [1, 23, 45, 67], // <------------- small byte vector using: 'mrl_byte', limit: 1000, }, query: [0.01, 0.299, 0.45, 0.67], // <-- full vector, using: 'full', limit: 10, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{PrefetchQueryBuilder, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.query( QueryPointsBuilder::new("{collection_name}") .add_prefetch(PrefetchQueryBuilder::default() .query(Query::new_nearest(vec![1.0, 23.0, 45.0, 67.0])) .using("mlr_byte") .limit(1000u64) ) .query(Query::new_nearest(vec![0.01, 0.299, 0.45, 0.67])) .using("full") .limit(10u64) ).await?; ``` ```java import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PrefetchQuery; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .addPrefetch( PrefetchQuery.newBuilder() .setQuery(nearest(1, 23, 45, 67)) // <------------- small byte vector .setLimit(1000) .setUsing("mrl_byte") .build()) .setQuery(nearest(0.01f, 0.299f, 0.45f, 0.67f)) // <-- full vector .setUsing("full") .setLimit(10) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", prefetch: new List { new() { Query = new float[] { 1,23, 45, 67 }, // <------------- small byte vector Using = "mrl_byte", Limit = 1000 } }, query: new float[] { 0.01f, 0.299f, 0.45f, 0.67f }, // <-- full vector usingVector: "full", limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Prefetch: []*qdrant.PrefetchQuery{ { Query: qdrant.NewQueryDense([]float32{1, 23, 45, 67}), Using: qdrant.PtrOf("mrl_byte"), Limit: qdrant.PtrOf(uint64(1000)), }, }, Query: qdrant.NewQueryDense([]float32{0.01, 0.299, 0.45, 0.67}), Using: qdrant.PtrOf("full"), }) ``` Fetch 100 results using the default vector, then re-score them using a multi-vector to get the top 10. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "prefetch": { "query": [0.01, 0.45, 0.67, ...], // <-- dense vector "limit": 100 }, "query": [ // <─┐\ [0.1, 0.2, ...], // < │\ [0.2, 0.1, ...], // < ├─ multi-vector\ [0.8, 0.9, ...] 
// < │\ ], // <─┘ "using": "colbert", "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", prefetch=models.Prefetch( query=[0.01, 0.45, 0.67, 0.53], # <-- dense vector limit=100, ), query=[\ [0.1, 0.2, 0.32], # <─┐\ [0.2, 0.1, 0.52], # < ├─ multi-vector\ [0.8, 0.9, 0.93], # < ┘\ ], using="colbert", limit=10, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { prefetch: { query: [1, 23, 45, 67], // <------------- small byte vector limit: 100, }, query: [\ [0.1, 0.2], // <─┐\ [0.2, 0.1], // < ├─ multi-vector\ [0.8, 0.9], // < ┘\ ], using: 'colbert', limit: 10, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{PrefetchQueryBuilder, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.query( QueryPointsBuilder::new("{collection_name}") .add_prefetch(PrefetchQueryBuilder::default() .query(Query::new_nearest(vec![0.01, 0.45, 0.67])) .limit(100u64) ) .query(Query::new_nearest(vec![\ vec![0.1, 0.2],\ vec![0.2, 0.1],\ vec![0.8, 0.9],\ ])) .using("colbert") .limit(10u64) ).await?; ``` ```java import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PrefetchQuery; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .addPrefetch( PrefetchQuery.newBuilder() .setQuery(nearest(0.01f, 0.45f, 0.67f)) // <-- dense vector .setLimit(100) .build()) .setQuery( nearest( new float[][] { {0.1f, 0.2f}, // <─┐ {0.2f, 0.1f}, // < ├─ multi-vector {0.8f, 0.9f} // < ┘ })) .setUsing("colbert") .setLimit(10) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", prefetch: new List { new() { Query = new float[] { 0.01f, 0.45f, 0.67f }, // <-- dense vector**** Limit = 100 } }, query: new float[][] { [0.1f, 0.2f], // <─┐ [0.2f, 0.1f], // < ├─ multi-vector [0.8f, 0.9f] // < ┘ }, usingVector: "colbert", limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Prefetch: []*qdrant.PrefetchQuery{ { Query: qdrant.NewQueryDense([]float32{0.01, 0.45, 0.67}), Limit: qdrant.PtrOf(uint64(100)), }, }, Query: qdrant.NewQueryMulti([][]float32{ {0.1, 0.2}, {0.2, 0.1}, {0.8, 0.9}, }), Using: qdrant.PtrOf("colbert"), }) ``` It is possible to combine all the above techniques in a single query: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "prefetch": { "prefetch": { "query": [1, 23, 45, 67], // <------ small byte vector "using": "mrl_byte" "limit": 1000 }, "query": [0.01, 0.45, 0.67, ...], // <-- full dense vector "using": "full" "limit": 100 }, "query": [ // <─┐\ [0.1, 0.2, ...], // < │\ [0.2, 0.1, ...], // < ├─ multi-vector\ [0.8, 0.9, ...] 
// < │\ ], // <─┘ "using": "colbert", "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", prefetch=models.Prefetch( prefetch=models.Prefetch( query=[1, 23, 45, 67], # <------ small byte vector using="mrl_byte", limit=1000, ), query=[0.01, 0.45, 0.67], # <-- full dense vector using="full", limit=100, ), query=[\ [0.17, 0.23, 0.52], # <─┐\ [0.22, 0.11, 0.63], # < ├─ multi-vector\ [0.86, 0.93, 0.12], # < ┘\ ], using="colbert", limit=10, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { prefetch: { prefetch: { query: [1, 23, 45, 67], // <------------- small byte vector using: 'mrl_byte', limit: 1000, }, query: [0.01, 0.45, 0.67], // <-- full dense vector using: 'full', limit: 100, }, query: [\ [0.1, 0.2], // <─┐\ [0.2, 0.1], // < ├─ multi-vector\ [0.8, 0.9], // < ┘\ ], using: 'colbert', limit: 10, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{PrefetchQueryBuilder, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.query( QueryPointsBuilder::new("{collection_name}") .add_prefetch(PrefetchQueryBuilder::default() .add_prefetch(PrefetchQueryBuilder::default() .query(Query::new_nearest(vec![1.0, 23.0, 45.0, 67.0])) .using("mlr_byte") .limit(1000u64) ) .query(Query::new_nearest(vec![0.01, 0.45, 0.67])) .using("full") .limit(100u64) ) .query(Query::new_nearest(vec![\ vec![0.1, 0.2],\ vec![0.2, 0.1],\ vec![0.8, 0.9],\ ])) .using("colbert") .limit(10u64) ).await?; ``` ```java import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PrefetchQuery; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .addPrefetch( PrefetchQuery.newBuilder() .addPrefetch( PrefetchQuery.newBuilder() .setQuery(nearest(1, 23, 45, 67)) // <------------- small byte vector .setUsing("mrl_byte") .setLimit(1000) .build()) .setQuery(nearest(0.01f, 0.45f, 0.67f)) // <-- dense vector .setUsing("full") .setLimit(100) .build()) .setQuery( nearest( new float[][] { {0.1f, 0.2f}, // <─┐ {0.2f, 0.1f}, // < ├─ multi-vector {0.8f, 0.9f} // < ┘ })) .setUsing("colbert") .setLimit(10) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", prefetch: new List { new() { Prefetch = { new List { new() { Query = new float[] { 1, 23, 45, 67 }, // <------------- small byte vector Using = "mrl_byte", Limit = 1000 }, } }, Query = new float[] {0.01f, 0.45f, 0.67f}, // <-- dense vector Using = "full", Limit = 100 } }, query: new float[][] { [0.1f, 0.2f], // <─┐ [0.2f, 0.1f], // < ├─ multi-vector [0.8f, 0.9f] // < ┘ }, usingVector: "colbert", limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Prefetch: []*qdrant.PrefetchQuery{ { Prefetch: []*qdrant.PrefetchQuery{ { Query: 
qdrant.NewQueryDense([]float32{1, 23, 45, 67}), Using: qdrant.PtrOf("mrl_byte"), Limit: qdrant.PtrOf(uint64(1000)), }, }, Query: qdrant.NewQueryDense([]float32{0.01, 0.45, 0.67}), Limit: qdrant.PtrOf(uint64(100)), Using: qdrant.PtrOf("full"), }, }, Query: qdrant.NewQueryMulti([][]float32{ {0.1, 0.2}, {0.2, 0.1}, {0.8, 0.9}, }), Using: qdrant.PtrOf("colbert"), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#score-boosting) Score boosting _Available as of v1.14.0_ When introducing vector search to specific applications, sometimes business logic needs to be considered for ranking the final list of results. A quick example is [our own documentation search bar](https://github.com/qdrant/page-search). It has vectors for every part of the documentation site. If one were to perform a search by “just” using the vectors, all kinds of elements would be equally considered good results. However, when searching for documentation, we can establish a hierarchy of importance: `title > content > snippets` One way to solve this is to weight the results based on the kind of element. For example, we can assign a higher weight to titles and content, and keep snippets unboosted. Pseudocode would be something like: `score = score + (is_title * 0.5) + (is_content * 0.25)` Query API can rescore points with custom formulas. They can be based on: - Dynamic payload values - Conditions - Scores of prefetches To express the formula, the syntax uses objects to identify each element. Taking the documentation example, the request would look like this: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "prefetch": { "query": [0.2, 0.8, ...], // <-- dense vector "limit": 50 } "query": { "formula": { "sum": [\ "$score",\ {\ "mult": [\ 0.5,\ {\ "key": "tag",\ "match": { "any": ["h1", "h2", "h3", "h4"] }\ }\ ]\ },\ {\ "mult": [\ 0.25,\ {\ "key": "tag",\ "match": { "any": ["p", "li"] }\ }\ ]\ }\ ] } } } ``` ```python from qdrant_client import models tag_boosted = client.query_points( collection_name="{collection_name}", prefetch=models.Prefetch( query=[0.2, 0.8, ...], # <-- dense vector limit=50 ), query=models.FormulaQuery( formula=models.SumExpression(sum=[\ "$score",\ models.MultExpression(mult=[0.5, models.FieldCondition(key="tag", match=models.MatchAny(any=["h1", "h2", "h3", "h4"]))]),\ models.MultExpression(mult=[0.25, models.FieldCondition(key="tag", match=models.MatchAny(any=["p", "li"]))])\ ] )) ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); const tag_boosted = await client.query(collectionName, { prefetch: { query: [0.2, 0.8, 0.1, 0.9], limit: 50 }, query: { formula: { sum: [\ "$score",\ {\ mult: [ 0.5, { key: "tag", match: { any: ["h1", "h2", "h3", "h4"] }} ]\ },\ {\ mult: [ 0.25, { key: "tag", match: { any: ["p", "li"] }} ]\ }\ ] } } }); ``` ```rust use qdrant_client::qdrant::{ Condition, Expression, FormulaBuilder, PrefetchQueryBuilder, QueryPointsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let _tag_boosted = client.query( QueryPointsBuilder::new("{collection_name}") .add_prefetch(PrefetchQueryBuilder::default() .query(vec![0.01, 0.45, 0.67]) .limit(100u64) ) .query(FormulaBuilder::new(Expression::sum_with([\ Expression::score(),\ Expression::mult_with([\ Expression::constant(0.5),\ Expression::condition(Condition::matches("tag", ["h1", "h2", "h3", "h4"])),\ ]),\ Expression::mult_with([\ 
Expression::constant(0.25),\ Expression::condition(Condition::matches("tag", ["p", "li"])),\ ]),\ ]))) .limit(10) ).await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.matchKeywords; import static io.qdrant.client.ExpressionFactory.condition; import static io.qdrant.client.ExpressionFactory.constant; import static io.qdrant.client.ExpressionFactory.mult; import static io.qdrant.client.ExpressionFactory.sum; import static io.qdrant.client.ExpressionFactory.variable; import static io.qdrant.client.QueryFactory.formula; import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Formula; import io.qdrant.client.grpc.Points.MultExpression; import io.qdrant.client.grpc.Points.PrefetchQuery; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SumExpression; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .addPrefetch( PrefetchQuery.newBuilder() .setQuery(nearest(0.01f, 0.45f, 0.67f)) .setLimit(100) .build()) .setQuery( formula( Formula.newBuilder() .setExpression( sum( SumExpression.newBuilder() .addSum(variable("$score")) .addSum( mult( MultExpression.newBuilder() .addMult(constant(0.5f)) .addMult( condition( matchKeywords( "tag", List.of("h1", "h2", "h3", "h4")))) .build())) .addSum(mult(MultExpression.newBuilder() .addMult(constant(0.25f)) .addMult( condition( matchKeywords( "tag", List.of("p", "li")))) .build())) .build())) .build())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", prefetch: [\ new PrefetchQuery { Query = new float[] { 0.01f, 0.45f, 0.67f }, Limit = 100 },\ ], query: new Formula { Expression = new SumExpression { Sum = { "$score", new MultExpression { Mult = { 0.5f, Match("tag", ["h1", "h2", "h3", "h4"]) }, }, new MultExpression { Mult = { 0.25f, Match("tag", ["p", "li"]) } }, }, }, }, limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Prefetch: []*qdrant.PrefetchQuery{ { Query: qdrant.NewQuery(0.01, 0.45, 0.67), }, }, Query: qdrant.NewQueryFormula(&qdrant.Formula{ Expression: qdrant.NewExpressionSum(&qdrant.SumExpression{ Sum: []*qdrant.Expression{ qdrant.NewExpressionVariable("$score"), qdrant.NewExpressionMult(&qdrant.MultExpression{ Mult: []*qdrant.Expression{ qdrant.NewExpressionConstant(0.5), qdrant.NewExpressionCondition(qdrant.NewMatchKeywords("tag", "h1", "h2", "h3", "h4")), }, }), qdrant.NewExpressionMult(&qdrant.MultExpression{ Mult: []*qdrant.Expression{ qdrant.NewExpressionConstant(0.25), qdrant.NewExpressionCondition(qdrant.NewMatchKeywords("tag", "p", "li")), }, }), }, }), }), }) ``` There are multiple expressions available, check the [API docs for specific details](https://api.qdrant.tech/v-1-14-x/api-reference/search/query-points#request.body.query.Query%20Interface.Query.Formula%20Query.formula). - **constant** \- A floating point number. e.g. `0.5`. - `"$score"` \- Reference to the score of the point in the prefetch. This is the same as `"$score[0]"`. 
- `"$score[0]"`, `"$score[1]"`, `"$score[2]"`, … \- When using multiple prefetches, you can reference specific prefetch with the index within the array of prefetches. - **payload key** \- Any plain string will refer to a payload key. This uses the jsonpath format used in every other place, e.g. `key` or `key.subkey`. It will try to extract a number from the given key. - **condition** \- A filtering condition. If the condition is met, it becomes `1.0`, otherwise `0.0`. - **mult** \- Multiply an array of expressions. - **sum** \- Sum an array of expressions. - **div** \- Divide an expression by another expression. - **abs** \- Absolute value of an expression. - **pow** \- Raise an expression to the power of another expression. - **sqrt** \- Square root of an expression. - **log10** \- Base 10 logarithm of an expression. - **ln** \- Natural logarithm of an expression. - **exp** \- Exponential function of an expression ( `e^x`). - **geo distance** \- Haversine distance between two geographic points. Values need to be `{ "lat": 0.0, "lon": 0.0 }` objects. - **decay** \- Apply a decay function to an expression, which clamps the output between 0 and 1. Available decay functions are **linear**, **exponential**, and **gaussian**. [See more](https://qdrant.tech/documentation/concepts/hybrid-queries/#boost-points-closer-to-user). - **datetime** \- Parse a datetime string (see formats [here](https://qdrant.tech/documentation/concepts/payload/#datetime)), and use it as a POSIX timestamp, in seconds. - **datetime key** \- Specify that a payload key contains a datetime string to be parsed into POSIX seconds. It is possible to define a default for when the variable (either from payload or prefetch score) is not found. This is given in the form of a mapping from variable to value. If there is no variable, and no defined default, a default value of `0.0` is used. ### [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#boost-points-closer-to-user) Boost points closer to user Another example. Combine the score with how close the result is to a user. Considering each point has an associated geo location, we can calculate the distance between the point and the request’s location. Assuming we have cosine scores in the prefetch, we can use a helper function to clamp the geographical distance between 0 and 1, by using a decay function. Once clamped, we can sum the score and the distance together. Pseudocode: `score = score + gauss_decay(distance)` In this case we use a **gauss\_decay** function. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "prefetch": { "query": [0.2, 0.8, ...], "limit": 50 }, "query": { "formula": { "sum": [\ "$score",\ {\ "gauss_decay": {\ "x": {\ "geo_distance": {\ "origin": { "lat": 52.504043, "lon": 13.393236 }\ "to": "geo.location"\ }\ },\ "scale": 5000 // 5km\ }\ }\ ] }, "defaults": { "geo.location": {"lat": 48.137154, "lon": 11.576124} } } } ``` ```python from qdrant_client import models geo_boosted = client.query_points( collection_name="{collection_name}", prefetch=models.Prefetch( query=[0.2, 0.8, ...], # <-- dense vector limit=50 ), query=models.FormulaQuery( formula=models.SumExpression(sum=[\ "$score",\ models.GaussDecayExpression(\ gauss_decay=models.DecayParamsExpression(\ x=models.GeoDistance(\ geo_distance=models.GeoDistanceParams(\ origin=models.GeoPoint(\ lat=52.504043,\ lon=13.393236\ ), # Berlin\ to="geo.location"\ )\ ),\ scale=5000 # 5km\ )\ )\ ]), defaults={"geo.location": models.GeoPoint(lat=48.137154, lon=11.576124)} # Munich ) ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); const distance_boosted = await client.query(collectionName, { prefetch: { query: [0.2, 0.8, ...], limit: 50 }, query: { formula: { sum: [\ "$score",\ {\ gauss_decay: {\ x: {\ geo_distance: {\ origin: { lat: 52.504043, lon: 13.393236 }, // Berlin\ to: "geo.location"\ }\ },\ scale: 5000 // 5km\ }\ }\ ] }, defaults: { "geo.location": { lat: 48.137154, lon: 11.576124 } } // Munich } }); ``` ```rust use qdrant_client::qdrant::{ GeoPoint, DecayParamsExpressionBuilder, Expression, FormulaBuilder, PrefetchQueryBuilder, QueryPointsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let _geo_boosted = client.query( QueryPointsBuilder::new("{collection_name}") .add_prefetch( PrefetchQueryBuilder::default() .query(vec![0.01, 0.45, 0.67]) .limit(100u64), ) .query( FormulaBuilder::new(Expression::sum_with([\ Expression::score(),\ Expression::exp_decay(\ DecayParamsExpressionBuilder::new(Expression::geo_distance_with(\ // Berlin\ GeoPoint { lat: 52.504043, lon: 13.393236 },\ "geo.location",\ ))\ .scale(5_000.0),\ ),\ ])) // Munich .add_default("geo.location", GeoPoint { lat: 48.137154, lon: 11.576124 }), ) .limit(10), ) .await?; ``` ```java import static io.qdrant.client.ExpressionFactory.expDecay; import static io.qdrant.client.ExpressionFactory.geoDistance; import static io.qdrant.client.ExpressionFactory.sum; import static io.qdrant.client.ExpressionFactory.variable; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.QueryFactory.formula; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.ValueFactory.value; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.DecayParamsExpression; import io.qdrant.client.grpc.Points.Formula; import io.qdrant.client.grpc.Points.GeoDistance; import io.qdrant.client.grpc.Points.GeoPoint; import io.qdrant.client.grpc.Points.PrefetchQuery; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SumExpression; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .addPrefetch( PrefetchQuery.newBuilder() .setQuery(nearest(0.01f, 0.45f, 0.67f)) .setLimit(100) .build()) 
.setQuery( formula( Formula.newBuilder() .setExpression( sum( SumExpression.newBuilder() .addSum(variable("$score")) .addSum( expDecay( DecayParamsExpression.newBuilder() .setX( geoDistance( GeoDistance.newBuilder() .setOrigin( GeoPoint.newBuilder() .setLat(52.504043) .setLon(13.393236) .build()) .setTo("geo.location") .build())) .setScale(5000) .build())) .build())) .putDefaults( "geo.location", value( Map.of( "lat", value(48.137154), "lon", value(11.576124)))) .build())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Expression; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", prefetch: [\ new PrefetchQuery { Query = new float[] { 0.01f, 0.45f, 0.67f }, Limit = 100 },\ ], query: new Formula { Expression = new SumExpression { Sum = { "$score", FromExpDecay( new() { X = new GeoDistance { Origin = new GeoPoint { Lat = 52.504043, Lon = 13.393236 }, To = "geo.location", }, Scale = 5000, } ), }, }, Defaults = { ["geo.location"] = new Dictionary { ["lat"] = 48.137154, ["lon"] = 11.576124, }, }, } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Prefetch: []*qdrant.PrefetchQuery{ { Query: qdrant.NewQuery(0.2, 0.8), }, }, Query: qdrant.NewQueryFormula(&qdrant.Formula{ Expression: qdrant.NewExpressionSum(&qdrant.SumExpression{ Sum: []*qdrant.Expression{ qdrant.NewExpressionVariable("$score"), qdrant.NewExpressionExpDecay(&qdrant.DecayParamsExpression{ X: qdrant.NewExpressionGeoDistance(&qdrant.GeoDistance{ Origin: &qdrant.GeoPoint{ Lat: 52.504043, Lon: 13.393236, }, To: "geo.location", }), }), }, }), Defaults: qdrant.NewValueMap(map[string]any{ "geo.location": map[string]any{ "lat": 48.137154, "lon": 11.576124, }, }), }), }) ``` For all decay functions, there are these parameters available | Parameter | Default | Description | | --- | --- | --- | | `x` | N/A | The value to decay | | `target` | 0.0 | The value at which the decay will be at its peak. For distances it is usually set at 0.0, but can be set to any value. | | `scale` | 1.0 | The value at which the decay function will be equal to `midpoint`. This is in terms of `x` units, for example, if `x` is in meters, `scale` of 5000 means 5km. Must be a non-zero positive number | | `midpoint` | 0.5 | Output is `midpoint` when `x` equals `scale`. Must be in the range (0.0, 1.0), exclusive | The formulas for each decay function are as follows: Loading... [edit graph on](https://www.desmos.com/calculator/idv5hknwb1) scale target midpoint "x"x "y"y "a" squareda2 "a" Superscript, "b" , Baselineab 77 88 99 over÷ functions (( )) less than< greater than> 44 55 66 times× \| "a" \|\|a\| ,, less than or equal to≤ greater than or equal to≥ 11 22 33 negative− ABC StartRoot, , EndRoot piπ 00 .. 
The formulas for each decay function are as follows:

#### [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#decay-functions) Decay functions

**`lin_decay`** (green), range: `[0, 1]`

lin_decay(x) = max(0, -((1 - midpoint) / scale) * |x - target| + 1)

**`exp_decay`** (red), range: `(0, 1]`

exp_decay(x) = exp((ln(midpoint) / scale) * |x - target|)

**`gauss_decay`** (purple), range: `(0, 1]`

gauss_decay(x) = exp((ln(midpoint) / scale²) * (x - target)²)

## [Anchor](https://qdrant.tech/documentation/concepts/hybrid-queries/\#grouping) Grouping

_Available as of v1.11.0_

It is possible to group results by a certain field. This is useful when you have multiple points for the same item, and you want to avoid redundancy of the same item in the results.

REST API ([Schema](https://api.qdrant.tech/master/api-reference/search/query-points-groups)):

```http
POST /collections/{collection_name}/points/query/groups
{
    // Same as in the regular query API
    "query": [1.1],

    // Grouping parameters
    "group_by": "document_id",  // Path of the field to group by
    "limit": 4,                 // Max amount of groups
    "group_size": 2             // Max amount of points per group
}
```

```python
client.query_points_groups(
    collection_name="{collection_name}",
    # Same as in the regular query_points() API
    query=[1.1],
    # Grouping parameters
    group_by="document_id",  # Path of the field to group by
    limit=4,                 # Max amount of groups
    group_size=2,            # Max amount of points per group
)
```

```typescript
client.queryGroups("{collection_name}", {
    query: [1.1],
    group_by: "document_id",
    limit: 4,
    group_size: 2,
});
```

```rust
use qdrant_client::qdrant::QueryPointGroupsBuilder;

client
    .query_groups(
        QueryPointGroupsBuilder::new("{collection_name}", "document_id")
            .query(vec![0.2, 0.1, 0.9, 0.7])
            .group_size(2u64)
            .with_payload(true)
            .with_vectors(true)
            .limit(4u64),
    )
    .await?;
```

```java
import io.qdrant.client.grpc.Points.QueryPointGroups;

client.queryGroupsAsync(
        QueryPointGroups.newBuilder()
            .setCollectionName("{collection_name}")
            .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f))
            .setGroupBy("document_id")
            .setLimit(4)
            .setGroupSize(2)
            .build())
    .get();
```

```csharp
using Qdrant.Client;

var client = new QdrantClient("localhost", 6334);

await client.QueryGroupsAsync(
    collectionName: "{collection_name}",
    query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f },
    groupBy: "document_id",
    limit: 4,
    groupSize: 2
);
```

```go
import (
	"context"

	"github.com/qdrant/go-client/qdrant"
)

client, err := qdrant.NewClient(&qdrant.Config{
	Host: "localhost",
	Port: 6334,
})

client.QueryGroups(context.Background(), &qdrant.QueryPointGroups{
	CollectionName: "{collection_name}",
	Query:          qdrant.NewQuery(0.2, 0.1, 0.9, 0.7),
	GroupBy:        "document_id",
	GroupSize:      qdrant.PtrOf(uint64(2)),
})
```

For more information on the grouping capabilities refer to the reference documentation for search with [grouping](https://qdrant.tech/documentation/concepts/search/#search-groups) and [lookup](https://qdrant.tech/documentation/concepts/search/#lookup-in-groups).
<|page-93-lllmstxt|>

## hybrid-search

---

# Hybrid Search Revamped - Building with Qdrant's Query API

Kacper Łukawski · July 25, 2024

![Hybrid Search Revamped - Building with Qdrant's Query API](https://qdrant.tech/articles_data/hybrid-search/preview/title.jpg)

It’s been over a year since we published the original article on how to build a hybrid search system with Qdrant. The idea was straightforward: combine the results from different search methods to improve retrieval quality.

Back in 2023, you still needed an additional service to bring lexical search capabilities and combine all the intermediate results. Things have changed since then. Once we introduced support for sparse vectors, [the additional search service became obsolete](https://qdrant.tech/articles/sparse-vectors/), but you were still required to combine the results from different methods on your end.

**Qdrant 1.10 introduces a new Query API that lets you build a search system by combining different search methods to improve retrieval quality.** Everything is now done on the server side, and you can focus on building the best search experience for your users. In this article, we will show you how to utilize the new [Query API](https://qdrant.tech/documentation/concepts/search/#query-api) to build a hybrid search system.

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#introducing-the-new-query-api) Introducing the new Query API

At Qdrant, we believe that vector search capabilities go well beyond a simple search for nearest neighbors. That’s why we provided separate methods for different search use cases, such as `search`, `recommend`, or `discover`. With the latest release, we are happy to introduce the new Query API, which combines all of these methods into a single endpoint and also supports creating nested multistage queries that can be used to build complex search pipelines.

If you are an existing Qdrant user, you probably have a running search mechanism that you want to improve, whether sparse or dense. Any changes should be preceded by a proper evaluation of their effectiveness.

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#how-effective-is-your-search-system) How effective is your search system?

None of the experiments make sense if you don’t measure the quality. How else would you compare which method works better for your use case? The most common way of doing that is by using standard metrics, such as `precision@k`, `MRR`, or `NDCG`. There are existing libraries, such as [ranx](https://amenra.github.io/ranx/), that can help you with that. We need a ground truth dataset to calculate any of these metrics, but curating one is a separate task.
```python
from ranx import Qrels, Run, evaluate

# Qrels, or query relevance judgments, keep the ground truth data
qrels_dict = {
    "q_1": {"d_12": 5, "d_25": 3},
    "q_2": {"d_11": 6, "d_22": 1},
}

# Runs are built from the search results
run_dict = {
    "q_1": {
        "d_12": 0.9,
        "d_23": 0.8,
        "d_25": 0.7,
        "d_36": 0.6,
        "d_32": 0.5,
        "d_35": 0.4,
    },
    "q_2": {
        "d_12": 0.9,
        "d_11": 0.8,
        "d_25": 0.7,
        "d_36": 0.6,
        "d_22": 0.5,
        "d_35": 0.4,
    },
}

# We need to create both objects, and then we can evaluate the run against the qrels
qrels = Qrels(qrels_dict)
run = Run(run_dict)

# Calculating the NDCG@5 metric is as simple as that
evaluate(qrels, run, "ndcg@5")
```

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#available-embedding-options-with-query-api) Available embedding options with Query API

Support for multiple vectors per point is nothing new in Qdrant, but the new Query API makes it even more powerful. The 1.10 release adds support for multivectors, allowing you to treat a list of embeddings as a single entity. There are many possible ways of utilizing this feature, and the most prominent one is support for late interaction models, such as [ColBERT](https://qdrant.tech/documentation/fastembed/fastembed-colbert/). Instead of having a single embedding for each document or query, this family of models creates a separate one for each token of text. In the search process, the final score is calculated based on the interaction between the tokens of the query and the document. Contrary to cross-encoders, document embeddings can be precomputed and stored in the database, which makes the search process much faster. If you are curious about the details, please check out [the article about ColBERT, written by our friends from Jina AI](https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter-in-search/).

![Late interaction](https://qdrant.tech/articles_data/hybrid-search/late-interaction.png)

Besides multivectors, you can use regular dense and sparse vectors, and experiment with smaller data types to reduce memory use. Named vectors can help you store embeddings of different dimensionalities, which is useful if you use multiple models to represent your data, or want to utilize Matryoshka embeddings.

![Multiple vectors per point](https://qdrant.tech/articles_data/hybrid-search/multiple-vectors.png)

There is no single way of building a hybrid search. The process of designing it is an exploratory exercise, where you need to test various setups and measure their effectiveness. Building a proper search experience is a complex task, and it’s better to keep it data-driven, not just rely on intuition.

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#fusion-vs-reranking) Fusion vs reranking

We can distinguish two main approaches to building a hybrid search system: fusion and reranking.

The former is about combining the results from different search methods, based solely on the scores returned by each method. That usually involves some normalization, as the scores returned by different methods might be in different ranges. After that, there is a formula that takes the relevancy measures and calculates the final score that we use later on to reorder the documents. Qdrant has built-in support for the Reciprocal Rank Fusion method, which is the de facto standard in the field.
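Since fusion happens entirely on the server side, a basic hybrid query needs nothing more than two prefetches and a fusion query on top. Here is a minimal sketch, assuming a collection called `my-collection` with named `dense` and `sparse` vectors and toy query values:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("http://localhost:6333")

fused = client.query_points(
    collection_name="my-collection",
    prefetch=[
        # Dense candidates
        models.Prefetch(
            query=[0.2, 0.8, 0.1, 0.9],  # toy dense query vector
            using="dense",
            limit=25,
        ),
        # Sparse (lexical-style) candidates
        models.Prefetch(
            query=models.SparseVector(indices=[125, 9325], values=[0.45, 0.72]),
            using="sparse",
            limit=25,
        ),
    ],
    # Combine both candidate lists with Reciprocal Rank Fusion on the server
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
```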
![Fusion](https://qdrant.tech/articles_data/hybrid-search/fusion.png)

Reranking, on the other hand, is about taking the results from different search methods and reordering them based on some additional processing using the content of the documents, not just the scores. This processing may rely on an additional neural model, such as a cross-encoder, which would be too inefficient to run on the whole dataset. Such methods are practically applicable only on a smaller subset of candidates returned by the faster search methods. Late interaction models, such as ColBERT, are far more efficient in this case, as they can rerank the candidates without needing access to all the documents in the collection.

![Reranking](https://qdrant.tech/articles_data/hybrid-search/reranking.png)

### [Anchor](https://qdrant.tech/articles/hybrid-search/\#why-not-a-linear-combination) Why not a linear combination?

It’s often proposed to combine full-text and vector search scores in a linear combination formula to rerank the results, like this:

`final_score = 0.7 * vector_score + 0.3 * full_text_score`

However, we didn’t even consider such a setup. Why? Those scores don’t make the problem linearly separable. We used the BM25 score along with cosine vector similarity as point coordinates in a 2-dimensional space. The chart shows how those points are distributed:

![A distribution of both Qdrant and BM25 scores mapped into 2D space.](https://qdrant.tech/articles_data/hybrid-search/linear-combination.png)

_A distribution of both Qdrant and BM25 scores mapped into 2D space. It clearly shows that relevant and non-relevant objects are not linearly separable in that space, so using a linear combination of both scores won’t give us a proper hybrid search._

Both relevant and non-relevant items are mixed. **No linear formula would be able to distinguish between them.** Thus, that’s not the way to solve it.

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#building-a-hybrid-search-system-in-qdrant) Building a hybrid search system in Qdrant

Ultimately, **any search mechanism might also be a reranking mechanism**. You can prefetch results with sparse vectors and then rerank them with the dense ones, or the other way around. Or, if you have Matryoshka embeddings, you can start with oversampling the candidates with the dense vectors of the lowest dimensionality and then gradually reduce the number of candidates by reranking them with the higher-dimensional embeddings. Nothing stops you from combining both fusion and reranking.

Let’s go a step further and build a hybrid search mechanism that combines the results from Matryoshka embeddings, dense vectors, and sparse vectors and then reranks them with a late interaction model. In the meantime, we will introduce additional reranking and fusion steps.

![Complex search pipeline](https://qdrant.tech/articles_data/hybrid-search/complex-search-pipeline.png)

Our search pipeline consists of two branches, each of them responsible for retrieving a subset of documents that we eventually want to rerank with the late interaction model.

Let’s connect to Qdrant first and then build the search pipeline.
```python
from qdrant_client import QdrantClient, models

client = QdrantClient("http://localhost:6333")
```

All the steps utilizing Matryoshka embeddings might be specified in the Query API as a nested structure:

```python
# The first branch of our search pipeline retrieves 25 documents
# using the Matryoshka embeddings with multistep retrieval.
matryoshka_prefetch = models.Prefetch(
    prefetch=[
        models.Prefetch(
            prefetch=[
                # The first prefetch operation retrieves 100 documents
                # using the Matryoshka embeddings with the lowest
                # dimensionality of 64.
                models.Prefetch(
                    query=[0.456, -0.789, ..., 0.239],
                    using="matryoshka-64dim",
                    limit=100,
                ),
            ],
            # Then, the retrieved documents are re-ranked using the
            # Matryoshka embeddings with the dimensionality of 128.
            query=[0.456, -0.789, ..., -0.789],
            using="matryoshka-128dim",
            limit=50,
        )
    ],
    # Finally, the results are re-ranked using the Matryoshka
    # embeddings with the dimensionality of 256.
    query=[0.456, -0.789, ..., 0.123],
    using="matryoshka-256dim",
    limit=25,
)
```

Similarly, we can build the second branch of our search pipeline, which retrieves the documents using the dense and sparse vectors and fuses them using the Reciprocal Rank Fusion method:

```python
# The second branch of our search pipeline also retrieves 25 documents,
# but uses the dense and sparse vectors, with their results combined
# using the Reciprocal Rank Fusion.
sparse_dense_rrf_prefetch = models.Prefetch(
    prefetch=[
        models.Prefetch(
            prefetch=[
                # The first prefetch operation retrieves 100 documents
                # using dense vectors using integer data type. Retrieval
                # is faster, but quality is lower.
                models.Prefetch(
                    query=[7, 63, ..., 92],
                    using="dense-uint8",
                    limit=100,
                )
            ],
            # Integer-based embeddings are then re-ranked using the
            # float-based embeddings. Here we just want to retrieve
            # 25 documents.
            query=[-1.234, 0.762, ..., 1.532],
            using="dense",
            limit=25,
        ),
        # Here we just add another 25 documents using the sparse
        # vectors only.
        models.Prefetch(
            query=models.SparseVector(
                indices=[125, 9325, 58214],
                values=[-0.164, 0.229, 0.731],
            ),
            using="sparse",
            limit=25,
        ),
    ],
    # RRF is activated below, so there is no need to specify the
    # query vector here, as fusion is done on the scores of the
    # retrieved documents.
    query=models.FusionQuery(
        fusion=models.Fusion.RRF,
    ),
)
```

The second branch could already be called hybrid, as it combines the results from the dense and sparse vectors with fusion. However, nothing stops us from building even more complex search pipelines. Here is how the target call to the Query API would look in Python:

```python
client.query_points(
    "my-collection",
    prefetch=[
        matryoshka_prefetch,
        sparse_dense_rrf_prefetch,
    ],
    # Finally rerank the results with the late interaction model. It only
    # considers the documents retrieved by all the prefetch operations above.
    # Return 10 final results.
    query=[
        [1.928, -0.654, ..., 0.213],
        [-1.197, 0.583, ..., 1.901],
        ...,
        [0.112, -1.473, ..., 1.786],
    ],
    using="late-interaction",
    with_payload=False,
    limit=10,
)
```

The options are endless, and the new Query API gives you the flexibility to experiment with different setups. **You rarely need to build such a complex search pipeline**, but it’s good to know that you can do that if needed.
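Whichever pipeline you settle on, it is worth closing the loop with the evaluation approach described earlier. As a rough sketch (assuming your point IDs match the document IDs used in your relevance judgments), the response of `query_points` can be converted into a `ranx` run:

```python
from ranx import Qrels, Run, evaluate

# Ground truth judgments curated beforehand (document id -> relevance grade)
qrels = Qrels({"q_1": {"d_12": 5, "d_25": 3}})

response = client.query_points(
    "my-collection",
    query=[0.2, 0.8, 0.1, 0.9],  # toy query vector
    limit=10,
)

# Map the returned point IDs and scores into a run for the same query
run = Run({"q_1": {str(point.id): point.score for point in response.points}})

print(evaluate(qrels, run, "ndcg@5"))
```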
## [Anchor](https://qdrant.tech/articles/hybrid-search/\#lessons-learned-multi-vector-representations) Lessons learned: multi-vector representations

Many of you have already started building hybrid search systems and reached out to us with questions and feedback. We’ve seen many different approaches; however, one recurring idea was to utilize **multi-vector representations with ColBERT-style models as a reranking step**, after retrieving candidates with single-vector dense and/or sparse methods. This reflects the latest trends in the field, as single-vector methods are still the most efficient, but multivectors capture the nuances of the text better.

![Reranking with late interaction models](https://qdrant.tech/articles_data/hybrid-search/late-interaction-reranking.png)

Assuming you never use late interaction models for retrieval alone, but only for reranking, this setup comes with a hidden cost. By default, each configured dense vector of the collection will have a corresponding HNSW graph created, even if it is a multivector.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(...)

client.create_collection(
    collection_name="my-collection",
    vectors_config={
        "dense": models.VectorParams(...),
        "late-interaction": models.VectorParams(
            size=128,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        )
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(...)
    },
)
```

Reranking will never use the created graph, as all the candidates are already retrieved. Multi-vector ranking will only be applied to the candidates retrieved by the previous steps, so no search operation is needed. HNSW becomes redundant, but the indexing process still has to be performed, and in this case it will be quite heavy. ColBERT-like models create hundreds of embeddings for each document, so the overhead is significant. **To avoid it, you can disable HNSW graph creation for this kind of vector**:

```python
client.create_collection(
    collection_name="my-collection",
    vectors_config={
        "dense": models.VectorParams(...),
        "late-interaction": models.VectorParams(
            size=128,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
            hnsw_config=models.HnswConfigDiff(
                m=0,  # Disable HNSW graph creation
            ),
        )
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(...)
    },
)
```

You won’t notice any difference in search performance, but resource usage will be significantly lower when you upload the embeddings to the collection.

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#some-anecdotal-observations) Some anecdotal observations

Neither of the approaches performs best in all cases. In some cases, keyword-based search will be the winner, and vice versa.
The following table shows some interesting examples we found in the [WANDS](https://github.com/wayfair/WANDS) dataset during experimentation:

| Query | BM25 Search | Vector Search |
| --- | --- | --- |
| cybersport desk | desk ❌ | gaming desk ✅ |
| plates for icecream | "eat" plates on wood wall décor ❌ | alicyn 8.5 '' melamine dessert plate ✅ |
| kitchen table with a thick board | craft kitchen acacia wood cutting board ❌ | industrial solid wood dining table ✅ |
| wooden bedside table | 30 '' bedside table lamp ❌ | portable bedside end table ✅ |

And here are some examples where keyword-based search did better:

| Query | BM25 Search | Vector Search |
| --- | --- | --- |
| computer chair | vibrant computer task chair ✅ | office chair ❌ |
| 64.2 inch console table | cervantez 64.2 '' console table ✅ | 69.5 '' console table ❌ |

## [Anchor](https://qdrant.tech/articles/hybrid-search/\#try-the-new-query-api-in-qdrant-110) Try the New Query API in Qdrant 1.10

The new Query API introduced in Qdrant 1.10 is a game-changer for building hybrid search systems. You don’t need any additional services to combine the results from different search methods, and you can even create more complex pipelines and serve them directly from Qdrant.

Our webinar on _Building the Ultimate Hybrid Search_ takes you through the process of building a hybrid search system with the Qdrant Query API. If you missed it, you can [watch the recording](https://www.youtube.com/watch?v=LAZOxqzceEU) or [check the notebooks](https://github.com/qdrant/workshop-ultimate-hybrid-search).

If you have any questions or need help with building your hybrid search system, don’t hesitate to reach out to us on [Discord](https://qdrant.to/discord).
<|page-94-lllmstxt|>

## why-rust

---

# Why Rust?

Andre Bogus · May 11, 2023

![Why Rust?](https://qdrant.tech/articles_data/why-rust/preview/title.jpg)

---

# [Anchor](https://qdrant.tech/articles/why-rust/\#building-qdrant-in-rust) Building Qdrant in Rust

Looking at the [GitHub repository](https://github.com/qdrant/qdrant), you can see that Qdrant is built in [Rust](https://rust-lang.org/). Other offerings may be written in C++, Go, Java or even Python. So why did Qdrant choose Rust? Our founder Andrey had built the first prototype in C++, but didn’t trust his command of the language to scale to a production system (to be frank, he likened it to cutting his leg off). He was well versed in Java and Scala and also knew some Python. However, he considered neither a good fit:

**Java** is more than 30 years old now. With a throughput-optimized VM it can often at least play in the same ballpark as native services, and the tooling is phenomenal. Portability is also surprisingly good, although the GC is not suited for low-memory applications and will generally take a good amount of RAM to deliver good performance. That said, the focus on throughput led to the dreaded GC pauses that cause latency spikes. Also, the fat runtime incurs high start-up delays, which need to be worked around.

**Scala** also builds on the JVM, and although there is a native compiler, there was the question of compatibility. So Scala shared the limitations of Java, and although it has some nice high-level amenities (of which Java only recently copied a subset), it still doesn’t offer the same level of control over memory layout as, say, C++, so it was similarly disqualified.

**Python**, being just a bit younger than Java, is ubiquitous in ML projects, mostly owing to its tooling (notably Jupyter notebooks), being easy to learn, and integration in most ML stacks. It doesn’t have a traditional garbage collector, opting for ubiquitous reference counting instead, which somewhat helps memory consumption. With that said, unless you only use it as glue code over high-performance modules, you may find yourself waiting for results. Also, getting complex Python services to perform stably under load is a serious technical challenge.

## [Anchor](https://qdrant.tech/articles/why-rust/\#into-the-unknown) Into the Unknown

So Andrey looked around at what younger languages would fit the challenge. After some searching, two contenders emerged: Go and Rust. Knowing neither, Andrey consulted the docs and found himself intrigued by Rust with its promise of systems programming without pervasive memory unsafety.

This early decision has been validated time and again. When first learning Rust, the compiler’s error messages are very helpful (and have only improved in the meantime). It’s easy to keep the memory profile low when one doesn’t have to wrestle a garbage collector and has complete control over stack and heap. Apart from the much-advertised memory safety, many footguns one can run into when writing C++ have been meticulously designed out. And it’s much easier to parallelize a task if one doesn’t have to fear data races.
With Qdrant written in Rust, we can offer cloud services that don’t keep us awake at night, thanks to Rust’s famed robustness. A current Qdrant Docker container comes in at just a bit over 50 MB — try that for size. As for performance… have some [benchmarks](https://qdrant.tech/benchmarks/). And we don’t have to compromise on ergonomics either, not for us nor for our users.

Of course, there are downsides: Rust compile times are usually similar to C++’s, and though the learning curve has been considerably softened in recent years, it’s still no match for easy-entry languages like Python or Go. But learning it is a one-time cost. Contrast this with Go, where you may find [the apparent simplicity is only skin-deep](https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-ride).

## [Anchor](https://qdrant.tech/articles/why-rust/\#smooth-is-fast) Smooth is Fast

The complexity of the type system pays large dividends in bugs that didn’t even make it to a commit. The ecosystem for web services is also already quite advanced, perhaps not at the same point as Java, but certainly matching or outcompeting Go.

Some people may think that the strict nature of Rust will slow down development, which is true only insofar as it won’t let you cut any corners. However, experience has conclusively shown that this is a net win. In fact, Rust lets us [ride the wall](https://the-race.com/nascar/bizarre-wall-riding-move-puts-chastain-into-nascar-folklore/), which makes us faster, not slower.

The job market for Rust programmers is certainly not as big as that for Java or Python programmers, but the language has finally reached the mainstream, and we don’t have any problems getting and retaining top talent. And being an open-source project, when we get contributions, we don’t have to check for a wide variety of errors that Rust already rules out.

## [Anchor](https://qdrant.tech/articles/why-rust/\#in-rust-we-trust) In Rust We Trust

Finally, the Rust community is a very friendly bunch, and we are delighted to be part of that. And we don’t seem to be alone. Most large IT companies (notably Amazon, Google, Huawei, Meta and Microsoft) have already started investing in Rust. It’s in the Windows font system already and in the process of coming to the Linux kernel (build support has already been included). In machine learning applications, Rust has been tried and proven by the likes of Aleph Alpha and Hugging Face, among many others.

To sum up, choosing Rust was a lucky guess that has brought huge benefits to Qdrant. Rust continues to be our not-so-secret weapon.

### [Anchor](https://qdrant.tech/articles/why-rust/\#key-takeaways) Key Takeaways:

- **Rust’s Advantages for Qdrant:** Rust provides memory safety and control without a garbage collector, which is crucial for Qdrant’s high-performance cloud services.
- **Low Overhead:** Qdrant’s Rust-based system offers efficiency, with small Docker container sizes and robust performance benchmarks.
- **Complexity vs. Simplicity:** Rust’s strict type system reduces bugs early in development, making it faster in the long run despite the initial learning curve.
- **Adoption by Major Players:** Large tech companies like Amazon, Google, and Microsoft are embracing Rust, further validating Qdrant’s choice.
- **Community and Talent:** The supportive Rust community and increasing availability of Rust developers make it easier for Qdrant to grow and innovate.
<|page-95-lllmstxt|>

## cloud-premium

---

# [Anchor](https://qdrant.tech/documentation/cloud-premium/\#qdrant-cloud-premium-tier) Qdrant Cloud Premium Tier

Qdrant Cloud offers an optional premium tier for customers who require additional features and higher SLA support levels. The premium tier includes:

- **24/7 Support**: Our support team is available around the clock to help you with any issues you may encounter (compared to 10x5 in the standard tier).
- **Shorter Response Times**: Premium customers receive priority support and can expect faster response times, with shorter SLAs.
- **99.9% Uptime SLA**: We guarantee 99.9% uptime for your Qdrant Cloud clusters (compared to 99.5% in the standard tier).
- **Single Sign-On (SSO)**: Premium customers can use their existing SSO provider to manage access to Qdrant Cloud.
- **VPC Private Links**: Premium customers can connect their Qdrant Cloud clusters to their VPCs using private links (AWS only).
- **Storage encryption with shared keys**: Premium customers can encrypt their data at rest using their own keys (AWS only).

Please refer to the [Qdrant Cloud SLA](https://qdrant.to/sla/) for a detailed definition of uptime and support SLAs.

If you are interested in switching to Qdrant Cloud Premium, please [contact us](https://qdrant.tech/contact-us/) for more information.
<|page-96-lllmstxt|>

## graphrag-qdrant-neo4j

---

# [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#build-a-graphrag-agent-with-neo4j-and-qdrant) Build a GraphRAG Agent with Neo4j and Qdrant

![image0](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/image0.png)

| Time: 30 min | Level: Intermediate | Output: [GitHub](https://github.com/qdrant/examples/blob/master/graphrag_neo4j/graphrag.py) |
| --- | --- | --- |

To make Artificial Intelligence (AI) systems more intelligent and reliable, we face a paradox: Large Language Models (LLMs) possess remarkable reasoning capabilities, yet they struggle to connect information in ways humans find intuitive. While groundbreaking, Retrieval-Augmented Generation (RAG) approaches often fall short when tasked with complex information synthesis. When asked to connect disparate pieces of information or understand holistic concepts across large documents, these systems frequently miss crucial connections that would be obvious to human experts.

To solve these problems, Microsoft introduced **GraphRAG**, which uses Knowledge Graphs (KGs) instead of vectors as context for LLMs. GraphRAG depends mainly on LLMs for creating KGs and querying them. However, this reliance on LLMs can lead to many problems. We will address these challenges by combining vector databases with graph-based databases.

This tutorial will demonstrate how to build a GraphRAG system with vector search using Neo4j and Qdrant.

| Additional Materials |
| --- |
| This advanced tutorial is based on our original integration doc: [**Neo4j - Qdrant Integration**](https://qdrant.tech/documentation/frameworks/neo4j-graphrag/) |
| The output for this tutorial is in our GitHub Examples repo: [**Neo4j - Qdrant Agent in Python**](https://github.com/qdrant/examples/blob/master/graphrag_neo4j/graphrag.py) |

## [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#watch-the-video) Watch the Video

[GraphRAG with Qdrant & Neo4j: Combining Vector Search and Knowledge Graphs (YouTube)](https://www.youtube.com/watch?v=o9pszzRuyjo)
---

# [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#rag--its-challenges) RAG & Its Challenges

[RAG](https://qdrant.tech/rag/) combines retrieval-based and generative AI to enhance LLMs with relevant, up-to-date information from a knowledge base, like a vector database. However, RAG faces several challenges:

1. **Understanding Context:** Models may misinterpret queries, particularly when the context is complex or ambiguous, leading to incorrect or irrelevant answers.
2. **Balancing Similarity vs. Relevance:** RAG systems can struggle to ensure that retrieved information is both similar and contextually relevant.
3. **Answer Completeness:** Traditional RAG might not capture all relevant details for complex queries that require LLMs to find relationships in the context that are not explicitly present.

---

# [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#introduction-to-graphrag) Introduction to GraphRAG

Unlike RAG, which typically relies on document retrieval, GraphRAG builds knowledge graphs (KGs) to capture entities and their relationships. For datasets or use cases that demand human-level intelligence from an AI system, GraphRAG offers a promising solution:

- It can follow chains of relationships to answer complex queries, making it suitable for reasoning beyond simple document retrieval.
- The graph structure allows a deeper understanding of the context, leading to more accurate and relevant responses.

The workflow of GraphRAG is as follows:

1. The LLM analyzes the dataset to identify entities (people, places, organizations) and their relationships, creating a comprehensive knowledge graph where entities are nodes and their connections form edges.
2. A bottom-up clustering algorithm organizes the KG into hierarchical semantic groups. This creates meaningful segments of related information, enabling understanding at different levels of abstraction.
3. GraphRAG uses both the KG and the semantic clusters to select a relevant context for the LLM when answering queries.

![image2](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/image2.png)

[Fig 1](https://arxiv.org/pdf/2404.16130): A Complete Picture of GraphRAG Ingestion and Retrieval

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#challenges-of-graphrag) Challenges of GraphRAG

Despite its advantages, the LLM-centric GraphRAG approach faces several challenges:

- **KG Construction with LLMs:** Since the LLM is responsible for constructing the knowledge graph, there are risks such as inconsistencies, propagation of biases or errors, and lack of control over the ontology used. (We still used an LLM to extract the ontology in our implementation.)
- **Querying KG with LLMs:** Once the graph is constructed, an LLM translates the human query into Cypher (Neo4j’s declarative query language). However, crafting complex queries in Cypher may result in inaccurate outcomes.
- **Scalability & Cost Considerations:** To be practical, applications must be both scalable and cost-effective. Relying on LLMs increases costs and decreases scalability, as they are used every time data is added, queried, or generated.
To address these challenges, a more controlled and structured knowledge representation system may be required for GraphRAG to function optimally at scale.

---

# [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#architecture-overview) Architecture Overview

The architecture has two main components: **Ingestion** and **Retrieval & Generation**. Ingestion processes raw data into structured knowledge and vector representations, while Retrieval and Generation enable efficient querying and response generation. The process is therefore divided into two steps: **Ingestion**, where data is prepared and stored, and **Retrieval and Generation**, where the prepared data is queried and utilized. Let’s start with Ingestion.

## [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#ingestion) Ingestion

The GraphRAG ingestion pipeline combines a **Graph Database** and a **Vector Database** to improve RAG workflows.

![image1](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/image1.png)

Fig 2: Overview of Ingestion Pipeline

Let’s break it down:

1. **Raw Data:** Serves as the foundation, comprising unstructured or structured content.
2. **Ontology Creation:** An **LLM** processes the raw data into an **ontology**, structuring entities, relationships, and hierarchies. More controlled approaches to extracting structured information from raw data also exist, such as using NER to identify the names of people, organizations, and places; unlike an LLM, this method produces a more consistent and predictable ontology.
3. **Graph Database:** The ontology is stored in a **Graph database** to capture complex relationships.
4. **Vector Embeddings:** An **Embedding model** converts the raw data into high-dimensional vectors capturing semantic similarities.
5. **Vector Database:** These embeddings are stored in a **Vector database** for similarity-based retrieval.
6. **Database Interlinking:** The **Graph database** (e.g., Neo4j) and **Vector database** (e.g., Qdrant) share unique IDs, enabling cross-referencing between ontology-based and vector-based results.

## [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#retrieval--generation) Retrieval & Generation

The **Retrieval and Generation** process is designed to handle user queries by leveraging both semantic search and graph-based context extraction.

![image3](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/image3.png)

Fig 3: Overview of Retrieval and Generation Pipeline

The architecture can be broken down into the following steps:

1. **Query Vectorization:** An embedding model converts the user query into a high-dimensional vector.
2. **Semantic Search:** The vector performs a similarity-based search in the **Vector database**, retrieving relevant documents or entries.
3. **ID Extraction:** Extracted IDs from the semantic search results are used to query the **Graph database**.
4. **Graph Context Retrieval:** The **Graph database** provides contextual information, including relationships and entities linked to the extracted IDs.
5. **Response Generation:** The context retrieved from the graph is passed to an LLM to generate a final response.
6. **Results:** The generated response is returned to the user.

This architecture combines the strengths of both databases:

1. **Semantic Search with Vector Database:** The user query is first processed semantically to identify the most relevant data points without needing explicit keyword matches.
2.
**Contextual Expansion with Graph Database:** IDs or entities retrieved from the vector database query the graph database for detailed relationships, enriching the retrieved data with structured context. 3. **Enhanced Generation:** The architecture combines semantic relevance (from the vector database) and graph-based context to enable the LLM to generate more informed, accurate, and contextually rich responses. --- # [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#implementation) Implementation We’ll walk through a complete pipeline that ingests data into Neo4j and Qdrant, retrieves relevant data, and generates responses using an LLM based on the retrieved graph context. The main components of this pipeline include data ingestion (to Neo4j and Qdrant), retrieval, and generation steps. ## [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#prerequisites) Prerequisites These are the tutorial prerequisites, which are divided into setup, imports, and initialization of the two DBs. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#setup) Setup Let’s start with setting up instances with Qdrant and Neo4j. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#qdrant-setup) Qdrant Setup To create a Qdrant instance, you can use their **managed service** (Qdrant Cloud) or set up a self-hosted cluster. For simplicity, we will use Qdrant cloud: - Go to [Qdrant Cloud](https://qdrant.tech/) and sign up or log in. - Once logged in, click on **Create New Cluster**. - Follow the on-screen instructions to create your cluster. - Once your cluster is created, you’ll be given a **Cluster URL** and **API Key**, which you will use in the client to interact with Qdrant. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#neo4j-setup) Neo4j Setup To set up a Neo4j instance, you can use **Neo4j Aura** (cloud service) or host it yourself. We will use Neo4j Aura: - Go to Neo4j Aura and sign up/log in. - After setting up, an instance will be created if it is the first time. - After the database is set up, you’ll receive a **connection URI**, **username**, and **password**. We can add the following in the .env file for security purposes. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#imports) Imports First, we import the required libraries for working with Neo4j, Qdrant, OpenAI, and other utility functions. ```python from neo4j import GraphDatabase from qdrant_client import QdrantClient, models from dotenv import load_dotenv from pydantic import BaseModel from openai import OpenAI from collections import defaultdict from neo4j_graphrag.retrievers import QdrantNeo4jRetriever import uuid import os ``` * * * - **Neo4j:** Used to store and query the graph database. - **Qdrant:** A vector database used for semantic similarity search. - **dotenv:** Loads environment variables for credentials and API keys. - **Pydantic:** Ensures data is structured properly when interacting with the graph data. - **OpenAI:** Interfaces with the OpenAI API to generate responses and embeddings. - **neo4j\_graphrag:** A helper package to retrieve data from both Qdrant and Neo4j. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#setting-up-environment-variables) Setting Up Environment Variables Before initializing the clients, we load the necessary credentials from environment variables. 
```python
# Get credentials from environment variables
qdrant_key = os.getenv("QDRANT_KEY")
qdrant_url = os.getenv("QDRANT_URL")
neo4j_uri = os.getenv("NEO4J_URI")
neo4j_username = os.getenv("NEO4J_USERNAME")
neo4j_password = os.getenv("NEO4J_PASSWORD")
openai_key = os.getenv("OPENAI_API_KEY")
```

* * *

This ensures that sensitive information (like API keys and database credentials) is securely stored in environment variables.

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#initializing-neo4j-and-qdrant-clients) Initializing Neo4j and Qdrant Clients

Now, we initialize the Neo4j and Qdrant clients using the credentials.

```python
# Initialize Neo4j driver
neo4j_driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_username, neo4j_password))

# Initialize Qdrant client
qdrant_client = QdrantClient(
    url=qdrant_url,
    api_key=qdrant_key
)
```

* * *

- **Neo4j:** We set up a connection to the Neo4j graph database.
- **Qdrant:** We initialize the connection to the Qdrant vector store.

This connects us to Neo4j and Qdrant, and we can now start with Ingestion.

## [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#ingestion-1) Ingestion

We will follow the workflow of the ingestion pipeline presented in the architecture section. Let’s examine it from an implementation perspective.

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#defining-output-parser) Defining Output Parser

The `single` and `GraphComponents` classes structure the LLM’s responses into a usable format.

```python
class single(BaseModel):
    node: str
    target_node: str
    relationship: str

class GraphComponents(BaseModel):
    graph: list[single]
```

* * *

These classes help ensure that data from the OpenAI LLM is parsed correctly into graph components (nodes and relationships).

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#defining-openai-client-and-llm-parser-function) Defining OpenAI Client and LLM Parser Function

We now initialize the OpenAI client and define a function to send prompts to the LLM and parse its responses.

```python
client = OpenAI()

def openai_llm_parser(prompt):
    completion = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content":
                    """You are a precise graph relationship extractor. Extract all
                    relationships from the text and format them as a JSON object
                    with this exact structure:
                    {
                        "graph": [
                            {"node": "Person/Entity",
                             "target_node": "Related Entity",
                             "relationship": "Type of Relationship"},
                            ...more relationships...
                        ]
                    }
                    Include ALL relationships mentioned in the text, including
                    implicit ones. Be thorough and precise."""
            },
            {
                "role": "user",
                "content": prompt
            }
        ]
    )

    return GraphComponents.model_validate_json(completion.choices[0].message.content)
```

* * *

This function sends a prompt to the LLM, asking it to extract graph components (nodes and relationships) from the provided text. The response is parsed into structured graph data.

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#extracting-graph-components) Extracting Graph Components

The function `extract_graph_components` processes raw data, extracting the nodes and relationships as graph components.
```python
def extract_graph_components(raw_data):
    prompt = f"Extract nodes and relationships from the following text:\n{raw_data}"

    parsed_response = openai_llm_parser(prompt)  # Returns a GraphComponents object
    parsed_response = parsed_response.graph  # The 'graph' attribute holds the list of relationships

    nodes = {}
    relationships = []

    for entry in parsed_response:
        node = entry.node
        target_node = entry.target_node  # Get target node if available
        relationship = entry.relationship  # Get relationship if available

        # Add nodes to the dictionary with a unique ID
        if node not in nodes:
            nodes[node] = str(uuid.uuid4())

        if target_node and target_node not in nodes:
            nodes[target_node] = str(uuid.uuid4())

        # Add relationship to the relationships list with node IDs
        if target_node and relationship:
            relationships.append({
                "source": nodes[node],
                "target": nodes[target_node],
                "type": relationship
            })

    return nodes, relationships
```

* * *

This function takes raw data, uses the LLM to parse it into graph components, and then assigns unique IDs to nodes and relationships.

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#ingesting-data-to-neo4j) Ingesting Data to Neo4j

The function `ingest_to_neo4j` ingests the extracted graph data (nodes and relationships) into Neo4j.

```python
def ingest_to_neo4j(nodes, relationships):
    """
    Ingest nodes and relationships into Neo4j.
    """

    with neo4j_driver.session() as session:
        # Create nodes in Neo4j
        for name, node_id in nodes.items():
            session.run(
                "CREATE (n:Entity {id: $id, name: $name})",
                id=node_id,
                name=name
            )

        # Create relationships in Neo4j
        for relationship in relationships:
            session.run(
                "MATCH (a:Entity {id: $source_id}), (b:Entity {id: $target_id}) "
                "CREATE (a)-[:RELATIONSHIP {type: $type}]->(b)",
                source_id=relationship["source"],
                target_id=relationship["target"],
                type=relationship["type"]
            )

    return nodes
```

* * *

Here, we create nodes and relationships in the Neo4j graph database. Nodes are entities, and relationships link these entities.

This ingests the data into Neo4j; on a sample dataset, the result looks something like this:

![image4](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/image4.png)

Fig 4: Visualization of the Knowledge Graph

Let’s explore how to map nodes to their IDs and integrate this information, along with vectors, into Qdrant. First, let’s create a Qdrant collection.

### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#creating-qdrant-collection) Creating Qdrant Collection

You can create a collection once you have set up your Qdrant instance. A collection in Qdrant holds vectors for search and retrieval.

```python
def create_collection(client, collection_name, vector_dimension):
    # Try to fetch the collection status
    try:
        collection_info = client.get_collection(collection_name)
        print(f"Skipping creating collection; '{collection_name}' already exists.")
    except Exception as e:
        # If the collection does not exist, an error will be thrown, so we create the collection
        if 'Not found: Collection' in str(e):
            print(f"Collection '{collection_name}' not found. Creating it now...")

            client.create_collection(
                collection_name=collection_name,
                vectors_config=models.VectorParams(size=vector_dimension, distance=models.Distance.COSINE)
            )

            print(f"Collection '{collection_name}' created successfully.")
        else:
            print(f"Error while checking collection: {e}")
```

* * *

- **Qdrant Client:** The QdrantClient is used to connect to the Qdrant instance.
- **Creating Collection:** The create\_collection function checks if a collection exists. If not, it creates one with a specified vector dimension and distance metric (cosine similarity in this case). ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#generating-embeddings) Generating Embeddings Next, we define a function that generates embeddings for text using OpenAI’s API. ```python def openai_embeddings(text): response = client.embeddings.create( input=text, model="text-embedding-3-small" ) return response.data[0].embedding ``` * * * This function uses OpenAI’s embedding model to transform input text into vector representations. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#ingesting-into-qdrant) Ingesting into Qdrant Let’s ingest the data into the vector database. ```python def ingest_to_qdrant(collection_name, raw_data, node_id_mapping): embeddings = [openai_embeddings(paragraph) for paragraph in raw_data.split("\n")] qdrant_client.upsert( collection_name=collection_name, points=[\ {\ "id": str(uuid.uuid4()),\ "vector": embedding,\ "payload": {"id": node_id}\ }\ for node_id, embedding in zip(node_id_mapping.values(), embeddings)\ ] ) ``` * * * The ingest\_to\_qdrant function generates embeddings for each paragraph in the raw data and stores them in a Qdrant collection. It associates each embedding with a unique ID and its corresponding node ID from the node\_id\_mapping dictionary, ensuring proper linkage for later retrieval. * * * ## [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#retrieval--generation-1) Retrieval & Generation In this section, we will create the retrieval and generation engine for the system. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#building-a-retriever) Building a Retriever The retriever integrates vector search and graph data, enabling semantic similarity searches with Qdrant and fetching relevant graph data from Neo4j. This enriches the RAG process and allows for more informed responses. ```python def retriever_search(neo4j_driver, qdrant_client, collection_name, query): retriever = QdrantNeo4jRetriever( driver=neo4j_driver, client=qdrant_client, collection_name=collection_name, id_property_external="id", id_property_neo4j="id", ) results = retriever.search(query_vector=openai_embeddings(query), top_k=5) return results ``` * * * The [QdrantNeo4jRetriever](https://qdrant.tech/documentation/frameworks/neo4j-graphrag/) handles both vector search and graph data fetching, combining Qdrant for vector-based retrieval and Neo4j for graph-based queries. **Vector Search:** - **`qdrant_client`** connects to Qdrant for efficient vector similarity search. - **`collection_name`** specifies where vectors are stored. - **`id_property_external="id"`** maps the external entity’s ID for retrieval. **Graph Fetching:** - **`neo4j_driver`** connects to Neo4j for querying graph data. - **`id_property_neo4j="id"`** ensures the entity IDs from Qdrant match the graph nodes in Neo4j. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#querying-neo4j-for-related-graph-data) Querying Neo4j for Related Graph Data We need to fetch subgraph data from a Neo4j database based on specific entity IDs after the retriever has provided the relevant IDs. 
```python def fetch_related_graph(neo4j_client, entity_ids): query = """ MATCH (e:Entity)-[r1]-(n1)-[r2]-(n2) WHERE e.id IN $entity_ids RETURN e, r1 as r, n1 as related, r2, n2 UNION MATCH (e:Entity)-[r]-(related) WHERE e.id IN $entity_ids RETURN e, r, related, null as r2, null as n2 """ with neo4j_client.session() as session: result = session.run(query, entity_ids=entity_ids) subgraph = [] for record in result: subgraph.append({ "entity": record["e"], "relationship": record["r"], "related_node": record["related"] }) if record["r2"] and record["n2"]: subgraph.append({ "entity": record["related"], "relationship": record["r2"], "related_node": record["n2"] }) return subgraph ``` * * * The function fetch\_related\_graph takes in a Neo4j client and a list of entity\_ids. It runs a Cypher query to find related nodes (entities) and their relationships based on the given entity IDs. The query matches entities (e:Entity) and finds related nodes through any relationship \[r\]. The function returns a list of subgraph data, where each record contains the entity, relationship, and related\_node. This subgraph is essential for generating context to answer user queries. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#setting-up-the-graph-context) Setting up the Graph Context The second part of the implementation involves preparing a graph context. We’ll fetch relevant subgraph data from a Neo4j database and format it for the model. Let’s break it down. ```python def format_graph_context(subgraph): nodes = set() edges = [] for entry in subgraph: entity = entry["entity"] related = entry["related_node"] relationship = entry["relationship"] nodes.add(entity["name"]) nodes.add(related["name"]) edges.append(f"{entity['name']} {relationship['type']} {related['name']}") return {"nodes": list(nodes), "edges": edges} ``` * * * The function format\_graph\_context processes a subgraph returned by a Neo4j query. It extracts the graph’s entities (nodes) and relationships (edges). The nodes set ensures each entity is added only once. The edges list captures the relationships in a readable format: _Entity1 relationship Entity2_. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#integrating-with-the-llm) Integrating with the LLM Now that we have the graph context, we need to generate a prompt for a language model like GPT-4. This is where the core of the Retrieval-Augmented Generation (RAG) happens — we combine the graph data and the user query into a comprehensive prompt for the model. ```python def graphRAG_run(graph_context, user_query): nodes_str = ", ".join(graph_context["nodes"]) edges_str = "; ".join(graph_context["edges"]) prompt = f""" You are an intelligent assistant with access to the following knowledge graph: Nodes: {nodes_str} Edges: {edges_str} Using this graph, Answer the following question: User Query: "{user_query}" """ try: response = client.chat.completions.create( model="gpt-4", messages=[\ {"role": "system", "content": "Provide the answer for the following question:"},\ {"role": "user", "content": prompt}\ ] ) return response.choices[0].message except Exception as e: return f"Error querying LLM: {str(e)}" ``` * * * The function graphRAG\_run takes the graph context (nodes and edges) and the user query, combining them into a structured prompt for the LLM. The nodes and edges are formatted as readable strings to form part of the LLM input. 
The LLM is then queried with the generated prompt, asking it to refine the user query using the graph context and provide an answer. If the model successfully generates a response, it returns the answer. ### [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#end-to-end-pipeline) End-to-End Pipeline Finally, let’s integrate everything into an end-to-end pipeline where we ingest some sample data, run the retrieval process, and query the language model. ```python if __name__ == "__main__": print("Script started") print("Loading environment variables...") load_dotenv('.env.local') print("Environment variables loaded") print("Initializing clients...") neo4j_driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_username, neo4j_password)) qdrant_client = QdrantClient( url=qdrant_url, api_key=qdrant_key ) print("Clients initialized") print("Creating collection...") collection_name = "graphRAGstoreds" vector_dimension = 1536 create_collection(qdrant_client, collection_name, vector_dimension) print("Collection created/verified") print("Extracting graph components...") raw_data = """Alice is a data scientist at TechCorp's Seattle office. Bob and Carol collaborate on the Alpha project. Carol transferred to the New York office last year. Dave mentors both Alice and Bob. TechCorp's headquarters is in Seattle. Carol leads the East Coast team. Dave started his career in Seattle. The Alpha project is managed from New York. Alice previously worked with Carol at DataCo. Bob joined the team after Dave's recommendation. Eve runs the West Coast operations from Seattle. Frank works with Carol on client relations. The New York office expanded under Carol's leadership. Dave's team spans multiple locations. Alice visits Seattle monthly for team meetings. Bob's expertise is crucial for the Alpha project. Carol implemented new processes in New York. Eve and Dave collaborated on previous projects. Frank reports to the New York office. TechCorp's main AI research is in Seattle. The Alpha project revolutionized East Coast operations. Dave oversees projects in both offices. Bob's contributions are mainly remote. Carol's team grew significantly after moving to New York. Seattle remains the technology hub for TechCorp.""" nodes, relationships = extract_graph_components(raw_data) print("Nodes:", nodes) print("Relationships:", relationships) print("Ingesting to Neo4j...") node_id_mapping = ingest_to_neo4j(nodes, relationships) print("Neo4j ingestion complete") print("Ingesting to Qdrant...") ingest_to_qdrant(collection_name, raw_data, node_id_mapping) print("Qdrant ingestion complete") query = "How is Bob connected to New York?" print("Starting retriever search...") retriever_result = retriever_search(neo4j_driver, qdrant_client, collection_name, query) print("Retriever results:", retriever_result) print("Extracting entity IDs...") entity_ids = [item.content.split("'id': '")[1].split("'")[0] for item in retriever_result.items] print("Entity IDs:", entity_ids) print("Fetching related graph...") subgraph = fetch_related_graph(neo4j_driver, entity_ids) print("Subgraph:", subgraph) print("Formatting graph context...") graph_context = format_graph_context(subgraph) print("Graph context:", graph_context) print("Running GraphRAG...") answer = graphRAG_run(graph_context, query) print("Final Answer:", answer) ``` * * * Here’s what’s happening: - First, the user query is defined (“How is Bob connected to New York?”). 
- The QdrantNeo4jRetriever searches for related entities in the Qdrant vector database based on the user query’s embedding. It retrieves the top 5 results (top\_k=5). - The entity\_ids are extracted from the retriever result. - The fetch\_related\_graph function retrieves related entities and their relationships from the Neo4j database. - The format\_graph\_context function prepares the graph data in a format the LLM can understand. - Finally, the graphRAG\_run function is called to generate and query the language model, producing an answer based on the retrieved graph context. With this, we have successfully created GraphRAG, a system capable of capturing complex relationships and delivering improved performance compared to the baseline RAG approach. --- # [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#advantages-of-qdrant--neo4j-graphrag) Advantages of Qdrant + Neo4j GraphRAG Combining Qdrant with Neo4j in a GraphRAG architecture offers several compelling advantages, particularly regarding recall and precision combo, contextual understanding, adaptability to complex queries, and better cost and scalability. 1. **Improved Recall and Precision:** By leveraging Qdrant, a highly efficient vector search engine, alongside Neo4j’s robust graph database, the system benefits from both semantic search and relationship-based retrieval. Qdrant identifies relevant vectors and captures the similarity between queries and stored data. At the same time, Neo4j adds a layer of connectivity through its graph structure, ensuring that relevant and contextually linked information is retrieved. This combination improves recall (retrieving a broader set of relevant results) and precision (delivering more accurate and contextually relevant results), addressing a common challenge in traditional retrieval-based AI systems. 2. **Enhanced Contextual Understanding:** Neo4j enhances contextual understanding by representing information as a graph, where entities and their relationships are naturally modeled. When integrated with Qdrant, the system can retrieve similar items based on vector embeddings and those that fit within the desired relational context, leading to more nuanced and meaningful responses. 3. **Adaptability to Complex Queries:** Combining Qdrant and Neo4j makes the system highly adaptable to complex queries. While Qdrant handles the vector search for relevant data, Neo4j’s graph capabilities enable sophisticated querying through relationships. This allows for multi-hop reasoning and handling complex, structured queries that would be challenging for traditional search engines. 4. **Better Cost & Scalability:** GraphRAG, on its own, demands significant resources, as it relies on LLMs to construct and query knowledge graphs. It also employs clustering algorithms to create semantic clusters for local searches. These can hinder scalability and increase costs. Qdrant addresses the issue of local search through vector search, while Neo4j’s knowledge graph is queried for more precise answers, enhancing both efficiency and accuracy. Furthermore, instead of using an LLM, Named Entity Recognition (NER)-based techniques can reduce the cost further, but it depends mainly on the dataset. --- # [Anchor](https://qdrant.tech/documentation/examples/graphrag-qdrant-neo4j/\#conclusion) Conclusion GraphRAG with Neo4j and Qdrant marks an important step forward in retrieval-augmented generation. This hybrid approach delivers significant advantages by combining vector search and graph databases. 
Qdrant’s semantic search capabilities enhance recall accuracy, while Neo4j’s relationship modeling provides deeper context understanding. The implementation template we’ve explored offers a foundation for your projects. You can adapt and customize it based on your specific needs, whether for document analysis, knowledge management, or other information retrieval tasks. As AI systems evolve, this combination of technologies shows how we can build smarter, more efficient solutions. We encourage you to experiment with this approach and discover how it can enhance your applications. <|page-97-lllmstxt|> ## payload --- # [Anchor](https://qdrant.tech/documentation/concepts/payload/\#payload) Payload One of the significant features of Qdrant is the ability to store additional information along with vectors. This information is called `payload` in Qdrant terminology. Qdrant allows you to store any information that can be represented using JSON. Here is an example of a typical payload: ```json { "name": "jacket", "colors": ["red", "blue"], "count": 10, "price": 11.99, "locations": [\ {\ "lon": 52.5200,\ "lat": 13.4050\ }\ ], "reviews": [\ {\ "user": "alice",\ "score": 4\ },\ {\ "user": "bob",\ "score": 5\ }\ ] } ``` ## [Anchor](https://qdrant.tech/documentation/concepts/payload/\#payload-types) Payload types In addition to storing payloads, Qdrant also allows you to search based on certain kinds of values. This feature is implemented as additional filters during the search and will enable you to incorporate custom logic on top of semantic similarity. During the filtering, Qdrant will check the conditions over those values that match the type of the filtering condition. If the stored value type does not fit the filtering condition, it will be considered not satisfied. For example, you will get an empty output if you apply the [range condition](https://qdrant.tech/documentation/concepts/filtering/#range) on string data. However, arrays (multiple values of the same type) are treated a little bit differently. When we apply a filter to an array, it will succeed if at least one of the values inside the array meets the condition. The filtering process is discussed in detail in the section [Filtering](https://qdrant.tech/documentation/concepts/filtering/). Let’s look at the data types that Qdrant supports for searching: ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#integer) Integer `integer` \- 64-bit integer in the range from `-9223372036854775808` to `9223372036854775807`.
Example of single and multiple `integer` values: ```json { "count": 10, "sizes": [35, 36, 38] } ``` ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#float) Float `float` \- 64-bit floating point number. Example of single and multiple `float` values: ```json { "price": 11.99, "ratings": [9.1, 9.2, 9.4] } ``` ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#bool) Bool `bool` \- binary value, equal to `true` or `false`. Example of single and multiple `bool` values: ```json { "is_delivered": true, "responses": [false, false, true, false] } ``` ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#keyword) Keyword `keyword` \- string value. Example of single and multiple `keyword` values: ```json { "name": "Alice", "friends": [\ "bob",\ "eva",\ "jack"\ ] } ``` ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#geo) Geo `geo` is used to represent geographical coordinates. Example of single and multiple `geo` values: ```json { "location": { "lon": 52.5200, "lat": 13.4050 }, "cities": [\ {\ "lon": -0.1276,\ "lat": 51.5072\ },\ {\ "lon": -74.0060,\ "lat": 40.7128\ }\ ] } ``` Coordinates should be described as an object containing two fields: `lon` \- for longitude, and `lat` \- for latitude. ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#datetime) Datetime _Available as of v1.8.0_ `datetime` \- date and time in [RFC 3339](https://datatracker.ietf.org/doc/html/rfc3339#section-5.6) format. See the following examples of single and multiple `datetime` values: ```json { "created_at": "2023-02-08T10:49:00Z", "updated_at": [\ "2023-02-08T13:52:00Z",\ "2023-02-21T21:23:00Z"\ ] } ``` The following formats are supported: - `"2023-02-08T10:49:00Z"` ( [RFC 3339](https://datatracker.ietf.org/doc/html/rfc3339#section-5.6), UTC) - `"2023-02-08T11:49:00+01:00"` ( [RFC 3339](https://datatracker.ietf.org/doc/html/rfc3339#section-5.6), with timezone) - `"2023-02-08T10:49:00"` (without timezone, UTC is assumed) - `"2023-02-08T10:49"` (without timezone and seconds) - `"2023-02-08"` (only date, midnight is assumed) Notes about the format: - `T` can be replaced with a space. - The `T` and `Z` symbols are case-insensitive. - UTC is always assumed when the timezone is not specified. - Timezone can have the following formats: `±HH:MM`, `±HHMM`, `±HH`, or `Z`. - Seconds can have up to 6 decimals, so the finest granularity for `datetime` is microseconds. ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#uuid) UUID _Available as of v1.11.0_ In addition to the basic `keyword` type, Qdrant supports the `uuid` type for storing UUID values. Functionally, it works the same as `keyword`, but internally it stores parsed UUID values. ```json { "uuid": "550e8400-e29b-41d4-a716-446655440000", "uuids": [\ "550e8400-e29b-41d4-a716-446655440000",\ "550e8400-e29b-41d4-a716-446655440001"\ ] } ``` The string representation of a UUID (e.g. `550e8400-e29b-41d4-a716-446655440000`) occupies 36 bytes, but the numeric representation is only 128 bits (16 bytes). Using the `uuid` index type is recommended in payload-heavy collections to save RAM and improve search performance.
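Since the space savings only apply once the field is actually indexed as `uuid`, here is a minimal sketch of creating such an index with the Python client. The collection name `payload-heavy-collection` and the field name `uuid` are placeholders, and the index creation API itself is covered in the payload indexing section below:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Index the "uuid" payload field with the dedicated uuid schema, so values are
# stored in their parsed 16-byte form instead of 36-byte strings.
client.create_payload_index(
    collection_name="payload-heavy-collection",
    field_name="uuid",
    field_schema="uuid",
)
```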
## [Anchor](https://qdrant.tech/documentation/concepts/payload/\#create-point-with-payload) Create point with payload REST API ( [Schema](https://api.qdrant.tech/api-reference/points/upsert-points)) httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "vector": [0.05, 0.61, 0.76, 0.74],\ "payload": {"city": "Berlin", "price": 1.99}\ },\ {\ "id": 2,\ "vector": [0.19, 0.81, 0.75, 0.11],\ "payload": {"city": ["Berlin", "London"], "price": 1.99}\ },\ {\ "id": 3,\ "vector": [0.36, 0.55, 0.47, 0.94],\ "payload": {"city": ["Berlin", "Moscow"], "price": [1.99, 2.99]}\ }\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ vector=[0.05, 0.61, 0.76, 0.74],\ payload={\ "city": "Berlin",\ "price": 1.99,\ },\ ),\ models.PointStruct(\ id=2,\ vector=[0.19, 0.81, 0.75, 0.11],\ payload={\ "city": ["Berlin", "London"],\ "price": 1.99,\ },\ ),\ models.PointStruct(\ id=3,\ vector=[0.36, 0.55, 0.47, 0.94],\ payload={\ "city": ["Berlin", "Moscow"],\ "price": [1.99, 2.99],\ },\ ),\ ], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.upsert("{collection_name}", { points: [\ {\ id: 1,\ vector: [0.05, 0.61, 0.76, 0.74],\ payload: {\ city: "Berlin",\ price: 1.99,\ },\ },\ {\ id: 2,\ vector: [0.19, 0.81, 0.75, 0.11],\ payload: {\ city: ["Berlin", "London"],\ price: 1.99,\ },\ },\ {\ id: 3,\ vector: [0.36, 0.55, 0.47, 0.94],\ payload: {\ city: ["Berlin", "Moscow"],\ price: [1.99, 2.99],\ },\ },\ ], }); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; use qdrant_client::{Payload, Qdrant, QdrantError}; use serde_json::json; let client = Qdrant::from_url("http://localhost:6334").build()?; let points = vec![\ PointStruct::new(\ 1,\ vec![0.05, 0.61, 0.76, 0.74],\ Payload::try_from(json!({"city": "Berlin", "price": 1.99})).unwrap(),\ ),\ PointStruct::new(\ 2,\ vec![0.19, 0.81, 0.75, 0.11],\ Payload::try_from(json!({"city": ["Berlin", "London"]})).unwrap(),\ ),\ PointStruct::new(\ 3,\ vec![0.36, 0.55, 0.47, 0.94],\ Payload::try_from(json!({"city": ["Berlin", "Moscow"], "price": [1.99, 2.99]}))\ .unwrap(),\ ),\ ]; client .upsert_points(UpsertPointsBuilder::new("{collection_name}", points).wait(true)) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(0.05f, 0.61f, 0.76f, 0.74f)) .putAllPayload(Map.of("city", value("Berlin"), "price", value(1.99))) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors(vectors(0.19f, 0.81f, 0.75f, 0.11f)) .putAllPayload( Map.of("city", list(List.of(value("Berlin"), value("London"))))) .build(), PointStruct.newBuilder() .setId(id(3)) .setVectors(vectors(0.36f, 0.55f, 0.47f, 0.94f)) .putAllPayload( Map.of( "city", list(List.of(value("Berlin"), value("London"))), "price", list(List.of(value(1.99), value(2.99))))) .build())) .get(); ``` ```csharp using 
Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new PointStruct { Id = 1, Vectors = new[] { 0.05f, 0.61f, 0.76f, 0.74f }, Payload = { ["city"] = "Berlin", ["price"] = 1.99 } }, new PointStruct { Id = 2, Vectors = new[] { 0.19f, 0.81f, 0.75f, 0.11f }, Payload = { ["city"] = new[] { "Berlin", "London" } } }, new PointStruct { Id = 3, Vectors = new[] { 0.36f, 0.55f, 0.47f, 0.94f }, Payload = { ["city"] = new[] { "Berlin", "Moscow" }, ["price"] = new Value { ListValue = new ListValue { Values = { new Value[] { 1.99, 2.99 } } } } } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectors(0.05, 0.61, 0.76, 0.74), Payload: qdrant.NewValueMap(map[string]any{ "city": "Berlin", "price": 1.99}), }, { Id: qdrant.NewIDNum(2), Vectors: qdrant.NewVectors(0.19, 0.81, 0.75, 0.11), Payload: qdrant.NewValueMap(map[string]any{ "city": []any{"Berlin", "London"}}), }, { Id: qdrant.NewIDNum(3), Vectors: qdrant.NewVectors(0.36, 0.55, 0.47, 0.94), Payload: qdrant.NewValueMap(map[string]any{ "city": []any{"Berlin", "London"}, "price": []any{1.99, 2.99}}), }, }, }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/payload/\#update-payload) Update payload Updating payloads in Qdrant offers flexible methods to manage vector metadata. The **set payload** method updates specific fields while keeping others unchanged, while the **overwrite** method replaces the entire payload. Developers can also use **clear payload** to remove all metadata or delete fields to remove specific keys without affecting the rest. These options provide precise control for adapting to dynamic datasets. ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#set-payload) Set payload Set only the given payload values on a point. 
REST API ( [Schema](https://api.qdrant.tech/api-reference/points/set-payload)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/payload { "payload": { "property1": "string", "property2": "string" }, "points": [\ 0, 3, 100\ ] } ``` ```python client.set_payload( collection_name="{collection_name}", payload={ "property1": "string", "property2": "string", }, points=[0, 3, 10], ) ``` ```typescript client.setPayload("{collection_name}", { payload: { property1: "string", property2: "string", }, points: [0, 3, 10], }); ``` ```rust use qdrant_client::qdrant::{ PointsIdsList, SetPayloadPointsBuilder, }; use qdrant_client::Payload; use serde_json::json; client .set_payload( SetPayloadPointsBuilder::new( "{collection_name}", Payload::try_from(json!({ "property1": "string", "property2": "string", })) .unwrap(), ) .points_selector(PointsIdsList { ids: vec![0.into(), 3.into(), 10.into()], }) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; client .setPayloadAsync( "{collection_name}", Map.of("property1", value("string"), "property2", value("string")), List.of(id(0), id(3), id(10)), true, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.SetPayloadAsync( collectionName: "{collection_name}", payload: new Dictionary { { "property1", "string" }, { "property2", "string" } }, ids: new ulong[] { 0, 3, 10 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.SetPayload(context.Background(), &qdrant.SetPayloadPoints{ CollectionName: "{collection_name}", Payload: qdrant.NewValueMap( map[string]any{"property1": "string", "property2": "string"}), PointsSelector: qdrant.NewPointsSelector( qdrant.NewIDNum(0), qdrant.NewIDNum(3)), }) ``` You don’t need to know the ids of the points you want to modify. The alternative is to use filters.
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/payload { "payload": { "property1": "string", "property2": "string" }, "filter": { "must": [\ {\ "key": "color",\ "match": {\ "value": "red"\ }\ }\ ] } } ``` ```python client.set_payload( collection_name="{collection_name}", payload={ "property1": "string", "property2": "string", }, points=models.Filter( must=[\ models.FieldCondition(\ key="color",\ match=models.MatchValue(value="red"),\ ),\ ], ), ) ``` ```typescript client.setPayload("{collection_name}", { payload: { property1: "string", property2: "string", }, filter: { must: [\ {\ key: "color",\ match: {\ value: "red",\ },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, SetPayloadPointsBuilder}; use qdrant_client::Payload; use serde_json::json; client .set_payload( SetPayloadPointsBuilder::new( "{collection_name}", Payload::try_from(json!({ "property1": "string", "property2": "string", })) .unwrap(), ) .points_selector(Filter::must([Condition::matches(\ "color",\ "red".to_string(),\ )])) .wait(true), ) .await?; ``` ```java import java.util.Map; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.ValueFactory.value; client .setPayloadAsync( "{collection_name}", Map.of("property1", value("string"), "property2", value("string")), Filter.newBuilder().addMust(matchKeyword("color", "red")).build(), true, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.SetPayloadAsync( collectionName: "{collection_name}", payload: new Dictionary { { "property1", "string" }, { "property2", "string" } }, filter: MatchKeyword("color", "red") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.SetPayload(context.Background(), &qdrant.SetPayloadPoints{ CollectionName: "{collection_name}", Payload: qdrant.NewValueMap( map[string]any{"property1": "string", "property2": "string"}), PointsSelector: qdrant.NewPointsSelectorFilter(&qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("color", "red"), }, }), }) ``` _Available as of v1.8.0_ It is possible to modify only a specific key of the payload by using the `key` parameter. For instance, given the following payload JSON object on a point: ```json { "property1": { "nested_property": "foo", }, "property2": { "nested_property": "bar", } } ``` You can modify the `nested_property` of `property1` with the following request: ```http POST /collections/{collection_name}/points/payload { "payload": { "nested_property": "qux", }, "key": "property1", "points": [1] } ``` Resulting in the following payload: ```json { "property1": { "nested_property": "qux", }, "property2": { "nested_property": "bar", } } ``` ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#overwrite-payload) Overwrite payload Fully replace any existing payload with the given one. 
REST API ( [Schema](https://api.qdrant.tech/api-reference/points/overwrite-payload)): httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points/payload { "payload": { "property1": "string", "property2": "string" }, "points": [\ 0, 3, 100\ ] } ``` ```python client.overwrite_payload( collection_name="{collection_name}", payload={ "property1": "string", "property2": "string", }, points=[0, 3, 10], ) ``` ```typescript client.overwritePayload("{collection_name}", { payload: { property1: "string", property2: "string", }, points: [0, 3, 10], }); ``` ```rust use qdrant_client::qdrant::{PointsIdsList, SetPayloadPointsBuilder}; use qdrant_client::Payload; use serde_json::json; client .overwrite_payload( SetPayloadPointsBuilder::new( "{collection_name}", Payload::try_from(json!({ "property1": "string", "property2": "string", })) .unwrap(), ) .points_selector(PointsIdsList { ids: vec![0.into(), 3.into(), 10.into()], }) .wait(true), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; client .overwritePayloadAsync( "{collection_name}", Map.of("property1", value("string"), "property2", value("string")), List.of(id(0), id(3), id(10)), true, null, null) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.OverwritePayloadAsync( collectionName: "{collection_name}", payload: new Dictionary { { "property1", "string" }, { "property2", "string" } }, ids: new ulong[] { 0, 3, 10 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.OverwritePayload(context.Background(), &qdrant.SetPayloadPoints{ CollectionName: "{collection_name}", Payload: qdrant.NewValueMap( map[string]any{"property1": "string", "property2": "string"}), PointsSelector: qdrant.NewPointsSelector( qdrant.NewIDNum(0), qdrant.NewIDNum(3)), }) ``` Like [set payload](https://qdrant.tech/documentation/concepts/payload/#set-payload), you don’t need to know the ids of the points you want to modify. The alternative is to use filters. 
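For completeness, here is a minimal Python sketch of that filter-based variant, which is not shown in the snippets above. It reuses the `color` field from the earlier filter examples and assumes that `overwrite_payload` accepts a `models.Filter` as its points selector in the same way `set_payload` does:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Replace the entire payload of every point whose "color" is "red".
client.overwrite_payload(
    collection_name="{collection_name}",
    payload={
        "property1": "string",
        "property2": "string",
    },
    points=models.Filter(
        must=[
            models.FieldCondition(
                key="color",
                match=models.MatchValue(value="red"),
            ),
        ],
    ),
)
```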
### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#clear-payload) Clear payload This method removes all payload keys from specified points REST API ( [Schema](https://api.qdrant.tech/api-reference/points/clear-payload)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/payload/clear { "points": [0, 3, 100] } ``` ```python client.clear_payload( collection_name="{collection_name}", points_selector=[0, 3, 100], ) ``` ```typescript client.clearPayload("{collection_name}", { points: [0, 3, 100], }); ``` ```rust use qdrant_client::qdrant::{ClearPayloadPointsBuilder, PointsIdsList}; client .clear_payload( ClearPayloadPointsBuilder::new("{collection_name}") .points(PointsIdsList { ids: vec![0.into(), 3.into(), 10.into()], }) .wait(true), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; client .clearPayloadAsync("{collection_name}", List.of(id(0), id(3), id(100)), true, null, null) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.ClearPayloadAsync(collectionName: "{collection_name}", ids: new ulong[] { 0, 3, 100 }); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.ClearPayload(context.Background(), &qdrant.ClearPayloadPoints{ CollectionName: "{collection_name}", Points: qdrant.NewPointsSelector( qdrant.NewIDNum(0), qdrant.NewIDNum(3)), }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/payload/\#delete-payload-keys) Delete payload keys Delete specific payload keys from points. REST API ( [Schema](https://api.qdrant.tech/api-reference/points/delete-payload)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/payload/delete { "keys": ["color", "price"], "points": [0, 3, 100] } ``` ```python client.delete_payload( collection_name="{collection_name}", keys=["color", "price"], points=[0, 3, 100], ) ``` ```typescript client.deletePayload("{collection_name}", { keys: ["color", "price"], points: [0, 3, 100], }); ``` ```rust use qdrant_client::qdrant::{DeletePayloadPointsBuilder, PointsIdsList}; client .delete_payload( DeletePayloadPointsBuilder::new( "{collection_name}", vec!["color".to_string(), "price".to_string()], ) .points_selector(PointsIdsList { ids: vec![0.into(), 3.into(), 10.into()], }) .wait(true), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; client .deletePayloadAsync( "{collection_name}", List.of("color", "price"), List.of(id(0), id(3), id(100)), true, null, null) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.DeletePayloadAsync( collectionName: "{collection_name}", keys: ["color", "price"], ids: new ulong[] { 0, 3, 100 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.DeletePayload(context.Background(), &qdrant.DeletePayloadPoints{ CollectionName: "{collection_name}", Keys: []string{"color", "price"}, PointsSelector: qdrant.NewPointsSelector( qdrant.NewIDNum(0), qdrant.NewIDNum(3)), }) ``` Alternatively, you can use filters to delete payload keys from the points. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/payload/delete { "keys": ["color", "price"], "filter": { "must": [\ {\ "key": "color",\ "match": {\ "value": "red"\ }\ }\ ] } } ``` ```python client.delete_payload( collection_name="{collection_name}", keys=["color", "price"], points=models.Filter( must=[\ models.FieldCondition(\ key="color",\ match=models.MatchValue(value="red"),\ ),\ ], ), ) ``` ```typescript client.deletePayload("{collection_name}", { keys: ["color", "price"], filter: { must: [\ {\ key: "color",\ match: {\ value: "red",\ },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, DeletePayloadPointsBuilder, Filter}; client .delete_payload( DeletePayloadPointsBuilder::new( "{collection_name}", vec!["color".to_string(), "price".to_string()], ) .points_selector(Filter::must([Condition::matches(\ "color",\ "red".to_string(),\ )])) .wait(true), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.matchKeyword; client .deletePayloadAsync( "{collection_name}", List.of("color", "price"), Filter.newBuilder().addMust(matchKeyword("color", "red")).build(), true, null, null) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.DeletePayloadAsync( collectionName: "{collection_name}", keys: ["color", "price"], filter: MatchKeyword("color", "red") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.DeletePayload(context.Background(), &qdrant.DeletePayloadPoints{ CollectionName: "{collection_name}", Keys: []string{"color", "price"}, PointsSelector: qdrant.NewPointsSelectorFilter( &qdrant.Filter{ Must: []*qdrant.Condition{qdrant.NewMatch("color", "red")}, }, ), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/payload/\#payload-indexing) Payload indexing To search more efficiently with filters, Qdrant allows you to create indexes for payload fields by specifying the name and type of field it is intended to be. The indexed fields also affect the vector index. See [Indexing](https://qdrant.tech/documentation/concepts/indexing/) for details. In practice, we recommend creating an index on those fields that could potentially constrain the results the most. For example, using an index for the object ID will be much more efficient, being unique for each record, than an index by its color, which has only a few possible values. In compound queries involving multiple fields, Qdrant will attempt to use the most restrictive index first. 
To create index for the field, you can use the following: REST API ( [Schema](https://api.qdrant.tech/api-reference/indexes/create-field-index)) httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "name_of_the_field_to_index", "field_schema": "keyword" } ``` ```python client.create_payload_index( collection_name="{collection_name}", field_name="name_of_the_field_to_index", field_schema="keyword", ) ``` ```typescript client.createPayloadIndex("{collection_name}", { field_name: "name_of_the_field_to_index", field_schema: "keyword", }); ``` ```rust use qdrant_client::qdrant::{CreateFieldIndexCollectionBuilder, FieldType}; client .create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "name_of_the_field_to_index", FieldType::Keyword, ) .wait(true), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.PayloadSchemaType; client.createPayloadIndexAsync( "{collection_name}", "name_of_the_field_to_index", PayloadSchemaType.Keyword, null, true, null, null); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "name_of_the_field_to_index" ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "name_of_the_field_to_index", FieldType: qdrant.FieldType_FieldTypeKeyword.Enum(), }) ``` The index usage flag is displayed in the payload schema with the [collection info API](https://api.qdrant.tech/api-reference/collections/get-collection). Payload schema example: ```json { "payload_schema": { "property1": { "data_type": "keyword" }, "property2": { "data_type": "integer" } } } ``` ## [Anchor](https://qdrant.tech/documentation/concepts/payload/\#facet-counts) Facet counts _Available as of v1.12.0_ Faceting is a special counting technique that can be used for various purposes: - Know which unique values exist for a payload key. - Know the number of points that contain each unique value. - Know how restrictive a filter would become by matching a specific value. Specifically, it is a counting aggregation for the values in a field, akin to a `GROUP BY` with `COUNT(*)` commands in SQL. These results for a specific field is called a “facet”. For example, when you look at an e-commerce search results page, you might see a list of brands on the sidebar, showing the number of products for each brand. This would be a facet for a `"brand"` field. 
To get the facet counts for a field, you can use the following: REST API ( [Facet](https://api.qdrant.tech/v-1-13-x/api-reference/points/facet)) httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/facet { "key": "size", "filter": { "must": { "key": "color", "match": { "value": "red" } } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.facet( collection_name="{collection_name}", key="size", facet_filter=models.Filter(must=[models.FieldCondition(key="color", match=models.MatchValue(value="red"))]), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.facet("{collection_name}", { filter: { must: [\ {\ key: "color",\ match: {\ value: "red",\ },\ },\ ], }, key: "size", }); ``` ```rust use qdrant_client::qdrant::{Condition, FacetCountsBuilder, Filter}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .facet( FacetCountsBuilder::new("{collection_name}", "size") .limit(10) .filter(Filter::must(vec![Condition::matches(\ "color",\ "red".to_string(),\ )])), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points; import io.qdrant.client.grpc.Filter; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .facetAsync( Points.FacetCounts.newBuilder() .setCollectionName(collection_name) .setKey("size") .setFilter(Filter.newBuilder().addMust(matchKeyword("color", "red")).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.FacetAsync( "{collection_name}", key: "size", filter: MatchKeyword("color", "red") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) res, err := client.Facet(ctx, &qdrant.FacetCounts{ CollectionName: "{collection_name}", Key: "size", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("color", "red"), }, }, }) ``` The response will contain the counts for each unique value in the field: ```json { "response": { "hits": [\ {"value": "L", "count": 19},\ {"value": "S", "count": 10},\ {"value": "M", "count": 5},\ {"value": "XL", "count": 1},\ {"value": "XXL", "count": 1}\ ] }, "time": 0.0001 } ``` The results are sorted by the count in descending order, then by the value in ascending order. Only values with non-zero counts will be returned. By default, the way Qdrant counts values is approximate, in order to return results quickly. This should be accurate enough for most cases, but if you need to debug your storage, you can use the `exact` parameter to get exact counts.
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/facet { "key": "size", "exact": true } ``` ```python client.facet( collection_name="{collection_name}", key="size", exact=True, ) ``` ```typescript client.facet("{collection_name}", { key: "size", exact: true, }); ``` ```rust use qdrant_client::qdrant::FacetCountsBuilder; client .facet( FacetCountsBuilder::new("{collection_name}", "size") .limit(10) .exact(true), ) .await?; ``` ```java client .facetAsync( Points.FacetCounts.newBuilder() .setCollectionName(collection_name) .setKey("size") .setExact(true) .build()) .get(); ``` ```csharp using Qdrant.Client; await client.FacetAsync( "{collection_name}", key: "size", exact: true ); ``` ```go res, err := client.Facet(ctx, &qdrant.FacetCounts{ CollectionName: "{collection_name}", Key: "size", Exact: true, }) ``` <|page-98-lllmstxt|> ## private-cloud-setup --- # [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#qdrant-private-cloud-setup) Qdrant Private Cloud Setup ## [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#requirements) Requirements - **Kubernetes cluster:** To install Qdrant Private Cloud, you need a [standard compliant](https://www.cncf.io/training/certification/software-conformance/) Kubernetes cluster. You can run this cluster in any cloud, on-premise or edge environment, with distributions that range from AWS EKS to VMWare vSphere. See [Deployment Platforms](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/) for more information. - **Storage:** For storage, you need to set up the Kubernetes cluster with a Container Storage Interface (CSI) driver that provides block storage. For vertical scaling, the CSI driver needs to support volume expansion. For backups and restores, the driver needs to support CSI snapshots and restores. - **Permissions:** To install the Qdrant Kubernetes Operator you need to have `cluster-admin` access in your Kubernetes cluster. - **Locations:** By default, the Qdrant Operator Helm charts and container images are served from `registry.cloud.qdrant.io`. > **Note:** You can also mirror these images and charts into your own registry and pull them from there. ### [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#cli-tools) CLI tools During the onboarding, you will need to deploy the Qdrant Kubernetes Operator using Helm.
Make sure you have the following tools installed: - [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) - [helm](https://helm.sh/docs/intro/install/) You will need to have access to the Kubernetes cluster with `kubectl` and `helm` configured to connect to it. Please refer to the documentation of your Kubernetes distribution for more information. ### [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#required-artifacts) Required artifacts Container images: - `registry.cloud.qdrant.io/qdrant/qdrant` - `registry.cloud.qdrant.io/qdrant/operator` - `registry.cloud.qdrant.io/qdrant/cluster-manager` Open Containers Initiative (OCI) Helm charts: - `registry.cloud.qdrant.io/qdrant-charts/qdrant-private-cloud` - `registry.cloud.qdrant.io/library/qdrant-kubernetes-api` ### [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#mirroring-images-and-charts) Mirroring images and charts To mirror all necessary container images and Helm charts into your own registry, you can either use a replication feature that your registry provides, or you can manually sync the images with [Skopeo](https://github.com/containers/skopeo): First, log in to the source registry: ```shell skopeo login registry.cloud.qdrant.io ``` Then, log in to your own registry: ```shell skopeo login your-registry.example.com ``` To sync all container images: ```shell skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/qdrant your-registry.example.com/qdrant/qdrant skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/cluster-manager your-registry.example.com/qdrant/cluster-manager skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant/operator your-registry.example.com/qdrant/operator ``` To sync all Helm charts: ```shell skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/qdrant-private-cloud your-registry.example.com/qdrant-charts/qdrant-private-cloud skopeo sync --all --src docker --dest docker registry.cloud.qdrant.io/qdrant-charts/qdrant-kubernetes-api your-registry.example.com/qdrant-charts/qdrant-kubernetes-api ``` During the installation or upgrade, you will need to adapt the repository information in the Helm chart values. See [Private Cloud Configuration](https://qdrant.tech/documentation/private-cloud/configuration/) for details. ## [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#installation-and-upgrades) Installation and Upgrades Once you are onboarded to Qdrant Private Cloud, you will receive credentials to access the Qdrant Cloud Registry. You can use these credentials to install the Qdrant Private Cloud solution using the following commands. You can choose the Kubernetes namespace freely.
```bash kubectl create namespace qdrant-private-cloud kubectl create secret docker-registry qdrant-registry-creds --docker-server=registry.cloud.qdrant.io --docker-username='your-username' --docker-password='your-password' --namespace qdrant-private-cloud helm registry login 'registry.cloud.qdrant.io' --username 'your-username' --password 'your-password' helm upgrade --install qdrant-private-cloud-crds oci://registry.cloud.qdrant.io/qdrant-charts/qdrant-kubernetes-api --namespace qdrant-private-cloud --version v1.16.6 --wait helm upgrade --install qdrant-private-cloud oci://registry.cloud.qdrant.io/qdrant-charts/qdrant-private-cloud --namespace qdrant-private-cloud --version 1.7.1 ``` For a list of available versions, consult the [Private Cloud Changelog](https://qdrant.tech/documentation/private-cloud/changelog/). Current default versions are: - qdrant-kubernetes-api v1.16.6 - qdrant-private-cloud 1.7.1 In particular, ensure that the default values referencing `StorageClasses` and the corresponding `VolumeSnapshotClass` are set correctly for your environment. ### [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#scope-of-the-operator) Scope of the operator By default, the Qdrant Operator will only manage Qdrant clusters in the same Kubernetes namespace where it is already deployed. The RoleBindings are also limited to this specific namespace. This default is chosen to limit the operator to the least amount of permissions necessary within a Kubernetes cluster. If you want to manage Qdrant clusters in multiple namespaces with the same operator, you can either configure a list of namespaces that the operator should watch: ```yaml operator: watch: # If true, watches only the namespace where the Qdrant operator is deployed, otherwise watches the namespaces in watch.namespaces onlyReleaseNamespace: false # an empty list watches all namespaces. namespaces: - qdrant-private-cloud - some-other-namespace limitRBAC: true ``` Or you can configure the operator to watch all namespaces: ```yaml operator: watch: # If true, watches only the namespace where the Qdrant operator is deployed, otherwise watches the namespaces in watch.namespaces onlyReleaseNamespace: false # an empty list watches all namespaces. namespaces: [] limitRBAC: false ``` ## [Anchor](https://qdrant.tech/documentation/private-cloud/private-cloud-setup/\#uninstallation) Uninstallation To uninstall the Qdrant Private Cloud solution, you can use the following commands: ```bash helm uninstall qdrant-private-cloud --namespace qdrant-private-cloud helm uninstall qdrant-private-cloud-crds --namespace qdrant-private-cloud kubectl delete namespace qdrant-private-cloud ``` Note that uninstalling the `qdrant-private-cloud-crds` Helm chart removes all Custom Resource Definitions (CRDs), which will also remove all Qdrant clusters that were managed by the operator.
<|page-99-lllmstxt|> ## user-management --- # [Anchor](https://qdrant.tech/documentation/cloud-rbac/user-management/\#user-management) User Management > 💡 You can access this in **Access Management > User & Role Management** _if available, see [this page for details](https://qdrant.tech/documentation/cloud-rbac/)._ ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/user-management/\#inviting-users-to-an-account) Inviting Users to an Account Users can be invited via the **User Management** section, where they are assigned the **Base role** by default. Additionally, users have the option to select a specific role when inviting another user. The **Base role** is a predefined role with minimal permissions, granting users access to the platform while restricting them to viewing only their own profile. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/user-invitation.png) ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/user-management/\#inviting-users-from-a-role) Inviting Users from a Role Users can also be invited with a specific role already attached by inviting them through the **Role Details** page - just click on the Users tab and follow the prompts. Once accepted, they’ll be assigned that role’s permissions, along with the base role. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/invite-user.png) ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/user-management/\#revoking-an-invitation) Revoking an Invitation Before being accepted, an Admin/Owner can cancel a pending invite directly on either the **User Management** or **Role Details** page. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/revoke-invite.png) ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/user-management/\#updating-a-users-roles) Updating a User’s Roles Authorized users can assign or remove roles for users in **User Management**. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/update-user-role.png) ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/update-user-role-edit-dialog.png) ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/user-management/\#removing-a-user-from-an-account) Removing a User from an Account Users can be removed from an account by clicking on their name in **User Management** (via Actions). This option is only available after they’ve accepted the invitation to join, ensuring that only active users can be removed. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/remove-user.png)
<|page-100-lllmstxt|> ## reranking-semantic-search --- # [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#reranking-in-rag-with-qdrant-vector-database) Reranking in RAG with Qdrant Vector Database In Retrieval-Augmented Generation (RAG) systems, irrelevant or missing information can throw off your model’s ability to produce accurate, meaningful outputs. One of the best ways to ensure you’re feeding your language model the most relevant, context-rich documents is through reranking. It’s a game-changer. In this guide, we’ll dive into using reranking to boost the relevance of search results in Qdrant. We’ll start with an easy use case that leverages the Cohere Rerank model. Then, we’ll take it up a notch by exploring ColBERT for a more advanced approach. By the time you’re done, you’ll know how to implement [hybrid search](https://qdrant.tech/articles/hybrid-search/), fine-tune reranking models, and significantly improve your accuracy. Ready? Let’s jump in. --- # [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#understanding-reranking) Understanding Reranking This section is broken down into key parts to help you easily grasp the background, mechanics, and significance of reranking. ## [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#background) Background In search systems, two metrics—precision and recall—are the backbone of success. But what do they mean? Precision tells us how many of the retrieved results are actually relevant, while recall measures how well we’ve captured all the relevant results out there. Simply put: ![image5.png](https://qdrant.tech/documentation/examples/reranking-semantic-search/image5.png) Sparse vector searches usually give you high precision because they’re great at finding exact matches. But, here’s the catch—your recall can suffer when relevant documents don’t contain those exact keywords. On the flip side, dense vector searches are fantastic for recall since they grasp the broader, semantic meaning of your query. However, this can lead to lower precision, where you might see results that are only loosely related. This is exactly where reranking comes to the rescue. It takes a wide net of documents (giving you high recall) and then refines them by reordering the top candidates based on their relevance scores—boosting precision without losing that broad understanding. Typically, we retain only the top K candidates after reordering to focus on the most relevant results. ## [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#working) Working Picture this: You walk into a massive library and ask for a book on “climate change.” The librarian pulls out a dozen books for you—some are scientific papers, others are personal essays, and one’s even a novel. Sure, they’re all relevant, but the first one you get handed is the novel. Not exactly what you were hoping for, right? Now, imagine a smarter, more intuitive librarian who really gets what you’re after.
This one knows exactly which books are most impactful, the most current, and perfectly aligned with what you need. That’s what reranking does for your search results—it doesn’t just grab any relevant document; it smartly reorders them so the best ones land at the top of your list. It’s like having a librarian who knows exactly what you’re looking for before you do! ![image6.png](https://qdrant.tech/documentation/examples/reranking-semantic-search/image6.png) An illustration of the rerank model prioritizing better results To become that smart, intuitive librarian, your algorithm needs to learn how to understand both your queries and the documents it retrieves. It has to evaluate the relationship between them effectively, so it can give you exactly what you’re looking for. The way reranker models operate varies based on their type, which will be discussed later, but in general, they calculate a relevance score for each document-query pair.Unlike embedding models, which squash everything into a single vector upfront, rerankers keep all the important details intact by using the full transformer output to calculate a similarity score. The result? Precision. But, there’s a trade-off—reranking can be slow. Processing millions of documents can take hours, which is why rerankers focus on refining results, not searching through the entire document collection. Rerankers come in different types, each with its own strengths. Let’s break them down: 1. **Cross Encoder Models**: These boost reranking by using a classification system to evaluate pairs of data—like sentences or documents. They spit out a similarity score from 0 to 1, showing how closely the document matches your query. The catch? Cross-encoders need both query and document, so they can’t handle standalone documents or queries by themselves. 2. **Multi-Vector Rerankers (e.g., ColBERT)**: These models take a more efficient route. They encode your query and the documents separately and only compare them later, reducing the computational load. This means document representations can be precomputed, speeding up retrieval times 3. **Large Language Models (LLMs) as Rerankers**: This is a newer, smarter way to rerank. LLMs, like GPT, are getting better by the day. With the right instructions, they can prioritize the most relevant documents for you, leveraging their massive understanding of language to deliver even more accurate results. Each of these rerankers has its own special way of making sure you get the best search results, fast and relevant to what you need. ## [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#importance) Importance In the previous section, we explored the background and mechanics of reranking, but now let’s talk about the three big wins you get from using it: - **Enhancing Search Accuracy:** Reranking is all about making your search results sharper and more relevant. After the initial ranking, rerankers step in, reshuffling the results based on deeper analysis to ensure that the most crucial information is front and center. [Research shows that rerankers](https://cohere.com/blog/rerank) can pull off a serious boost—improving the top results for about 72% of search queries. That’s a huge leap in precision. - **Reducing Information Overload:** If you feel like you’re drowning in a sea of search results, rerankers can come to your rescue. They filter and fine-tune the flood of information so you get exactly what you need, without the overwhelm. 
It makes your search experience more focused and way less chaotic. - **Balancing Speed and Relevance:** First stage retrieval and second stage reranking strike the perfect balance between speed and accuracy. Sure, the second stage may add a bit of latency due to their processing power, but the trade-off is worth it. You get highly relevant results, and in the end, that’s what matters most. Now that you know why reranking is such a game-changer, let’s dive into the practical side of things. --- # [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#implementing-vector-search-with-reranking) Implementing Vector Search with Reranking In this section, you’re going to see how to implement vector search with reranking using Cohere. But first, let’s break it down. ## [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#overview) Overview A typical search system works in two main stages: Ingestion and Retrieval. Think of ingestion as the process where your data gets prepped and loaded into the system, and retrieval as the part where the magic happens—where your queries pull out the most relevant documents. Check out the architectural diagram below to visualize how these stages work together. ![image1.png](https://qdrant.tech/documentation/examples/reranking-semantic-search/image1.png) The two essential stages of a search system: Ingestion and Retrieval Process ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#ingestion-stage) Ingestion Stage - **Documents:** This is where it all starts. The system takes in raw data or documents that need to be prepped for search—this is your initial input. - **Embeddings:** Next, these documents are transformed into sparse or dense [embeddings](https://qdrant.tech/documentation/embeddings/), which are basically vector representations. These vectors capture the deep, underlying meaning of the text, allowing your system to perform smart, efficient searches and comparisons based on semantic meaning - **Vector Database:** Once your documents are converted into these embeddings, they get stored in a vector database—essentially the powerhouse behind fast, accurate similarity searches. Here, we’ll see the capabilities of the Qdrant vector database. ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#retrieval-stage) Retrieval Stage - **User’s Query:** Now we enter the retrieval phase. The user submits a query, and it’s time to match that query against the stored documents. - **Embeddings:** Just like with the documents, the user’s query is converted into a sparse or dense embedding. This enables the system to compare the query’s meaning with the meanings of the stored documents. - **Vector Search:** The system searches for the most relevant documents by comparing the query’s embedding to those in the vector database, and it pulls up the closest matches. - **Rerank:** Once the initial results are in, the reranking process kicks in to ensure you get the best results on top. We’ll be using **Cohere’s** rerank-english-v3.0 model, which excels at reordering English language documents to prioritize relevance. It can handle up to 4096 tokens, giving it plenty of context to work with. And if you’re dealing with multi-lingual data, don’t worry—Cohere’s got reranking models for other languages too. 
## [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#implementation) Implementation

Now it’s time to dive into the actual implementation.

### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#setup) Setup

To follow along with this tutorial, you’ll need a few key tools:

- Python Client for Qdrant
- Cohere

Let’s install everything you need in one go using the Python package manager:

```bash
pip install qdrant-client cohere
```

* * *

Now, let’s bring in all the necessary components in one tidy block:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import cohere
```

* * *

Qdrant is a powerful vector similarity search engine that gives you a production-ready service with an easy-to-use API for storing, searching, and managing data. You can interact with Qdrant through a local or cloud setup, but since we’re working in Colab, let’s go with the cloud setup.

### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#steps-to-set-up-qdrant-cloud) **Steps to Set Up Qdrant Cloud:**

1. **Sign Up**: Head to Qdrant’s website and sign up for a cloud account using your email, Google, or GitHub credentials.
2. **Create Your First Cluster**: Once you’re in, navigate to the Overview section and follow the onboarding steps under Create First Cluster.
3. **Get Your API Key**: After creating your cluster, an API key will be generated. This key will let you interact with the cluster using the Python client.
4. **Check Your Cluster**: Your new cluster will appear under the Clusters section. From here, you’re all set to start interacting with your data.

Finally, under the Overview section, you’ll see the following code snippet:

![image7.png](https://qdrant.tech/documentation/examples/reranking-semantic-search/image7.png)

Qdrant Overview Section

Add your API keys. This will let your Python client connect to Qdrant and Cohere.

```python
client = QdrantClient(
    url="",
    api_key="",
)

print(client.get_collections())
```

* * *

Next, we’ll set up Cohere for reranking. Log in to your Cohere account, generate an API key, and add it like this:

```python
co = cohere.Client("")
```

* * *

### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#ingestion) Ingestion

There are three key parts to ingestion: Creating a Collection, Converting Documents to Embeddings, and Upserting the Data. Let’s break it down.

### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#creating-a-collection) Creating a Collection

A collection is basically a named group of points (vectors with data) that you can search through. All the vectors in a collection need to have the same size and be compared using one distance metric. Here’s how to create one:

```python
client.create_collection(
    collection_name="basic-search-rerank",
    vectors_config=VectorParams(size=1024, distance=Distance.DOT),
)
```

* * *

Here, the vector size is set to 1024 to match our dense embeddings, and we’re using dot product as the distance metric—perfect for capturing the similarity between vectors, especially when they’re normalized.
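One small, optional addition that is not part of the original walkthrough: `create_collection` will raise an error if the collection already exists, so if you plan to re-run the notebook you may want to guard the call. This assumes a recent `qdrant-client` version that provides `collection_exists`:

```python
# Optional guard so the notebook can be re-run without errors.
if not client.collection_exists(collection_name="basic-search-rerank"):
    client.create_collection(
        collection_name="basic-search-rerank",
        vectors_config=VectorParams(size=1024, distance=Distance.DOT),
    )
```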
### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#documents-to-embeddings) Documents to Embeddings Let’s set up some example data. Here’s a query and a few documents for demonstration: ```jsx query = "What is the purpose of feature scaling in machine learning?" documents = [\ "In machine learning, feature scaling is the process of normalizing the range of independent variables or features. The goal is to ensure that all features contribute equally to the model, especially in algorithms like SVM or k-nearest neighbors where distance calculations matter.",\ \ "Feature scaling is commonly used in data preprocessing to ensure that features are on the same scale. This is particularly important for gradient descent-based algorithms where features with larger scales could disproportionately impact the cost function.",\ \ "In data science, feature extraction is the process of transforming raw data into a set of engineered features that can be used in predictive models. Feature scaling is related but focuses on adjusting the values of these features.",\ \ "Unsupervised learning algorithms, such as clustering methods, may benefit from feature scaling as it ensures that features with larger numerical ranges don't dominate the learning process.",\ \ "One common data preprocessing technique in data science is feature selection. Unlike feature scaling, feature selection aims to reduce the number of input variables used in a model to avoid overfitting.",\ \ "Principal component analysis (PCA) is a dimensionality reduction technique used in data science to reduce the number of variables. PCA works best when data is scaled, as it relies on variance which can be skewed by features on different scales.",\ \ "Min-max scaling is a common feature scaling technique that usually transforms features to a fixed range [0, 1]. This method is useful when the distribution of data is not Gaussian.",\ \ "Standardization, or z-score normalization, is another technique that transforms features into a mean of 0 and a standard deviation of 1. 
This method is effective for data that follows a normal distribution.",\ \ "Feature scaling is critical when using algorithms that rely on distances, such as k-means clustering, as unscaled features can lead to misleading results.",\ \ "Scaling can improve the convergence speed of gradient descent algorithms by preventing issues with different feature scales affecting the cost function's landscape.",\ \ "In deep learning, feature scaling helps in stabilizing the learning process, allowing for better performance and faster convergence during training.",\ \ "Robust scaling is another method that uses the median and the interquartile range to scale features, making it less sensitive to outliers.",\ \ "When working with time series data, feature scaling can help in standardizing the input data, improving model performance across different periods.",\ \ "Normalization is often used in image processing to scale pixel values to a range that enhances model performance in computer vision tasks.",\ \ "Feature scaling is significant when features have different units of measurement, such as height in centimeters and weight in kilograms.",\ \ "In recommendation systems, scaling features such as user ratings can improve the model's ability to find similar users or items.",\ \ "Dimensionality reduction techniques, like t-SNE and UMAP, often require feature scaling to visualize high-dimensional data in lower dimensions effectively.",\ \ "Outlier detection techniques can also benefit from feature scaling, as they can be influenced by unscaled features that have extreme values.",\ \ "Data preprocessing steps, including feature scaling, can significantly impact the performance of machine learning models, making it a crucial part of the modeling pipeline.",\ \ "In ensemble methods, like random forests, feature scaling is not strictly necessary, but it can still enhance interpretability and comparison of feature importance.",\ \ "Feature scaling should be applied consistently across training and test datasets to avoid data leakage and ensure reliable model evaluation.",\ \ "In natural language processing (NLP), scaling can be useful when working with numerical features derived from text data, such as word counts or term frequencies.",\ \ "Log transformation is a technique that can be applied to skewed data to stabilize variance and make the data more suitable for scaling.",\ \ "Data augmentation techniques in machine learning may also include scaling to ensure consistency across training datasets, especially in computer vision tasks."\ ] ``` * * * We’ll generate embeddings for these documents using Cohere’s embed-english-v3.0 model, which produces 1024-dimensional vectors: ```python model="embed-english-v3.0" doc_embeddings = co.embed(texts=documents, model=model, input_type="search_document", embedding_types=['float']) ``` * * * This code taps into the power of the Cohere API to generate embeddings for your list of documents. It uses the embed-english-v3.0 model, sets the input type to “search\_document,” and asks for the embeddings in float format. The result? A set of dense embeddings, each one representing the deep semantic meaning of your documents. These embeddings will be stored in doc\_embeddings, ready for action. ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#upsert-data) Upsert Data We need to transform those dense embeddings into a format Qdrant can work with, and that’s where Points come in. 
Points are the building blocks of Qdrant—they’re records made up of a vector (the embedding) and an optional payload (like your document text). Here’s how we convert those embeddings into Points: ```python points = [] for idx, (embedding, doc) in enumerate(zip(doc_embeddings.embeddings.float_, documents)): point = PointStruct( id=idx, vector=embedding, payload={"document": doc} ) points.append(point) ``` * * * What’s happening here? We’re building a list of Points from the embeddings: - First, we start with an empty list. - Then, we loop through both **doc\_embeddings** and **documents** at the same time using enumerate() to grab the index (idx) along the way. - For each pair (an embedding and its corresponding document), we create a PointStruct. Each point gets: - An id (from idx). - A vector (the embedding). - A payload (the actual document text). - Each Point is added to our list. Once that’s done, it’s time to send these Points into your Qdrant collection with the upsert() function: ```python operation_info = client.upsert( collection_name="basic-search-rerank", points=points ) ``` * * * ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#now-your-embeddings-are-all-set-in-qdrant-ready-to-power-your-search) Now your embeddings are all set in Qdrant, ready to power your search. ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#retrieval) Retrieval The first few steps here mirror what we did during ingestion—just like before, we need to convert the query into an embedding: ```python query_embeddings = co.embed(texts=[query], model=model, input_type="search_query", embedding_types=['float']) ``` * * * After that, we’ll move on to retrieve results using vector search and apply reranking on the results. This two-stage process is super efficient because we’re grabbing a small set of the most relevant documents first, which is much faster than reranking a huge dataset. ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#vector-search) Vector Search This snippet grabs the top 10 most relevant points from your Qdrant collection using the query embedding. ```python search_result = client.query_points( collection_name="basic-search-rerank", query=query_embeddings.embeddings.float_[0], limit=10 ).points ``` * * * Here’s how it works: we use the query\_points method to search within the “basic-search-rerank” collection. It compares the query embedding (the first embedding in query\_embeddings) against all the document embeddings, pulling up the 10 closest matches. The matching points get stored in search\_result. 
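If you would like to inspect the raw hits yourself before reranking, a quick loop over `search_result` (using the same field names as the code above) prints the id, score, and a snippet of each stored document:

```python
# Peek at the raw vector search results before reranking.
for point in search_result:
    snippet = point.payload["document"][:80]
    print(f"id={point.id}  score={point.score:.2f}  {snippet}...")
```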
And here’s a sneak peek at what you’ll get from the vector search: | **ID** | **Document** | **Score** | | --- | --- | --- | | 0 | In machine learning, feature scaling is the process of normalizing the range of independent… | 0.71 | | 10 | In deep learning, feature scaling helps stabilize the learning process, allowing for… | 0.69 | | 1 | Feature scaling is commonly used in data preprocessing to ensure that features are on the… | 0.68 | | 23 | Data augmentation techniques in machine learning may also include scaling to ensure… | 0.64 | | 3 | Unsupervised learning algorithms, such as clustering methods, may benefit from feature… | 0.64 | | 12 | When working with time series data, feature scaling can help standardize the input… | 0.62 | | 19 | In ensemble methods, like random forests, feature scaling is not strictly necessary… | 0.61 | | 21 | In natural language processing (NLP), scaling can be useful when working with numerical… | 0.61 | | 20 | Feature scaling should be applied consistently across training and test datasets… | 0.61 | | 18 | Data preprocessing steps, including feature scaling, can significantly impact the performance… | 0.61 | From the looks of it, the data pulled up is highly relevant to your query. Now, with this solid base of results, it’s time to refine them further with reranking. ### [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#rerank) Rerank This code takes the documents from the search results and reranks them based on your query, making sure you get the most relevant ones right at the top. First, we pull out the documents from the search results. Then we use Cohere’s rerank model to refine these results: ```python document_list = [point.payload['document'] for point in search_result] rerank_results = co.rerank( model="rerank-english-v3.0", query=query, documents=document_list, top_n=5, ) ``` * * * What’s happening here? In the first line, we’re building a list of documents by grabbing the ‘document’ field from each search result point. Then, we pass this list, along with the original query, to Cohere’s rerank method. Using the **rerank-english-v3.0** model, it reshuffles the documents and gives you back the top 5, ranked by their relevance to the query. Here’s the reranked result table, with the new order and their relevance scores: | **Index** | **Document** | **Relevance Score** | | --- | --- | --- | | 0 | In machine learning, feature scaling is the process of normalizing the range of independent variables or features. | 0.99995166 | | 1 | Feature scaling is commonly used in data preprocessing to ensure that features are on the same scale. | 0.99929035 | | 10 | In deep learning, feature scaling helps stabilize the learning process, allowing for better performance and faster convergence. | 0.998675 | | 23 | Data augmentation techniques in machine learning may also include scaling to ensure consistency across training datasets. | 0.998043 | | 3 | Unsupervised learning algorithms, such as clustering methods, may benefit from feature scaling. | 0.9979967 | As you can see, the reranking did its job. Positions for documents 10 and 1 got swapped, showing that the reranker has fine-tuned the results to give you the most relevant content at the top. ## [Anchor](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/\#conclusion) Conclusion Reranking is a powerful way to boost the relevance and precision of search results in RAG systems. 
By combining Qdrant’s vector search capabilities with tools like Cohere’s Rerank model or ColBERT, you can refine search outputs, ensuring the most relevant information rises to the top. This guide demonstrated how reranking enhances precision without sacrificing recall, delivering sharper, context-rich results. With these tools, you’re equipped to create search systems that provide meaningful and impactful user experiences. Start implementing reranking to take your applications to the next level!

<|page-101-lllmstxt|> ## search-beginners

- [Documentation](https://qdrant.tech/documentation/) - [Beginner tutorials](https://qdrant.tech/documentation/beginner-tutorials/) - Semantic Search 101

---

# [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#build-your-first-semantic-search-engine-in-5-minutes) Build Your First Semantic Search Engine in 5 Minutes

| Time: 5 - 15 min | Level: Beginner |
| --- | --- |

## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#overview) Overview

If you are new to vector databases, this tutorial is for you. In 5 minutes you will build a semantic search engine for science fiction books. After you set it up, you will ask the engine about an impending alien threat. Your creation will recommend books as preparation for a potential space attack.

Before you begin, you need to have a [recent version of Python](https://www.python.org/downloads/) installed. If you don’t know how to run this code in a virtual environment, follow Python documentation for [Creating Virtual Environments](https://docs.python.org/3/tutorial/venv.html#creating-virtual-environments) first.

This tutorial assumes you’re in the bash shell. Use the Python documentation to activate a virtual environment, with commands such as:

```bash
source tutorial-env/bin/activate
```

## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#1-installation) 1\. Installation

You need to process your data so that the search engine can work with it. The [Sentence Transformers](https://www.sbert.net/) framework gives you access to common Large Language Models that turn raw data into embeddings.

```bash
pip install -U sentence-transformers
```

Once encoded, this data needs to be kept somewhere. Qdrant lets you store data as embeddings. You can also use Qdrant to run search queries against this data. This means that you can ask the engine to give you relevant answers that go way beyond keyword matching.
```bash pip install -U qdrant-client ``` ### [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#import-the-models) Import the models Once the two main frameworks are defined, you need to specify the exact models this engine will use. ```python from qdrant_client import models, QdrantClient from sentence_transformers import SentenceTransformer ``` The [Sentence Transformers](https://www.sbert.net/) framework contains many embedding models. We’ll take [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as it has a good balance between speed and embedding quality for this tutorial. ```python encoder = SentenceTransformer("all-MiniLM-L6-v2") ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#2-add-the-dataset) 2\. Add the dataset [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) will encode the data you provide. Here you will list all the science fiction books in your library. Each book has metadata, a name, author, publication year and a short description. ```python documents = [\ {\ "name": "The Time Machine",\ "description": "A man travels through time and witnesses the evolution of humanity.",\ "author": "H.G. Wells",\ "year": 1895,\ },\ {\ "name": "Ender's Game",\ "description": "A young boy is trained to become a military leader in a war against an alien race.",\ "author": "Orson Scott Card",\ "year": 1985,\ },\ {\ "name": "Brave New World",\ "description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.",\ "author": "Aldous Huxley",\ "year": 1932,\ },\ {\ "name": "The Hitchhiker's Guide to the Galaxy",\ "description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.",\ "author": "Douglas Adams",\ "year": 1979,\ },\ {\ "name": "Dune",\ "description": "A desert planet is the site of political intrigue and power struggles.",\ "author": "Frank Herbert",\ "year": 1965,\ },\ {\ "name": "Foundation",\ "description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.",\ "author": "Isaac Asimov",\ "year": 1951,\ },\ {\ "name": "Snow Crash",\ "description": "A futuristic world where the internet has evolved into a virtual reality metaverse.",\ "author": "Neal Stephenson",\ "year": 1992,\ },\ {\ "name": "Neuromancer",\ "description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.",\ "author": "William Gibson",\ "year": 1984,\ },\ {\ "name": "The War of the Worlds",\ "description": "A Martian invasion of Earth throws humanity into chaos.",\ "author": "H.G. Wells",\ "year": 1898,\ },\ {\ "name": "The Hunger Games",\ "description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.",\ "author": "Suzanne Collins",\ "year": 2008,\ },\ {\ "name": "The Andromeda Strain",\ "description": "A deadly virus from outer space threatens to wipe out humanity.",\ "author": "Michael Crichton",\ "year": 1969,\ },\ {\ "name": "The Left Hand of Darkness",\ "description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.",\ "author": "Ursula K. 
Le Guin",\ "year": 1969,\ },\ {\ "name": "The Three-Body Problem",\ "description": "Humans encounter an alien civilization that lives in a dying system.",\ "author": "Liu Cixin",\ "year": 2008,\ },\ ] ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#3-define-storage-location) 3\. Define storage location You need to tell Qdrant where to store embeddings. This is a basic demo, so your local computer will use its memory as temporary storage. ```python client = QdrantClient(":memory:") ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#4-create-a-collection) 4\. Create a collection All data in Qdrant is organized by collections. In this case, you are storing books, so we are calling it `my_books`. ```python client.create_collection( collection_name="my_books", vectors_config=models.VectorParams( size=encoder.get_sentence_embedding_dimension(), # Vector size is defined by used model distance=models.Distance.COSINE, ), ) ``` - The `vector_size` parameter defines the size of the vectors for a specific collection. If their size is different, it is impossible to calculate the distance between them. 384 is the encoder output dimensionality. You can also use model.get\_sentence\_embedding\_dimension() to get the dimensionality of the model you are using. - The `distance` parameter lets you specify the function used to measure the distance between two points. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#5-upload-data-to-collection) 5\. Upload data to collection Tell the database to upload `documents` to the `my_books` collection. This will give each record an id and a payload. The payload is just the metadata from the dataset. ```python client.upload_points( collection_name="my_books", points=[\ models.PointStruct(\ id=idx, vector=encoder.encode(doc["description"]).tolist(), payload=doc\ )\ for idx, doc in enumerate(documents)\ ], ) ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#6--ask-the-engine-a-question) 6\. Ask the engine a question Now that the data is stored in Qdrant, you can ask it questions and receive semantically relevant results. ```python hits = client.query_points( collection_name="my_books", query=encoder.encode("alien invasion").tolist(), limit=3, ).points for hit in hits: print(hit.payload, "score:", hit.score) ``` **Response:** The search engine shows three of the most likely responses that have to do with the alien invasion. Each of the responses is assigned a score to show how close the response is to the original inquiry. ```text {'name': 'The War of the Worlds', 'description': 'A Martian invasion of Earth throws humanity into chaos.', 'author': 'H.G. Wells', 'year': 1898} score: 0.570093257022374 {'name': "The Hitchhiker's Guide to the Galaxy", 'description': 'A comedic science fiction series following the misadventures of an unwitting human and his alien friend.', 'author': 'Douglas Adams', 'year': 1979} score: 0.5040468703143637 {'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216 ``` ### [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#narrow-down-the-query) Narrow down the query How about the most recent book from the early 2000s? 
```python
hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=2000))]
    ),
    limit=1,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)
```

**Response:**

The query has been narrowed down to one result from 2008.

```text
{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216
```

## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/\#next-steps) Next Steps

Congratulations, you have just created your very first search engine! Trust us, the rest of Qdrant is not that complicated, either. For your next tutorial you should try building an actual [Neural Search Service with a complete API and a dataset](https://qdrant.tech/documentation/tutorials/neural-search/).

<|page-102-lllmstxt|> ## quickstart

- [Documentation](https://qdrant.tech/documentation/) - Local Quickstart

---

# [Anchor](https://qdrant.tech/documentation/quickstart/\#how-to-get-started-with-qdrant-locally) How to Get Started with Qdrant Locally

In this short example, you will use the Python Client to create a Collection, load data into it and run a basic search query.

## [Anchor](https://qdrant.tech/documentation/quickstart/\#download-and-run) Download and run

First, download the latest Qdrant image from Docker Hub:

```bash
docker pull qdrant/qdrant
```

Then, run the service:

```bash
docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant
```

Under the default configuration all data will be stored in the `./qdrant_storage` directory. This will also be the only directory that both the container and the host machine can see.
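If you want to double-check that the container is up before continuing, the REST root endpoint reports the running version. This is an optional sanity check, not part of the quickstart itself, and it assumes the `requests` package is installed:

```python
import requests

# Should print something like {'title': 'qdrant - vector search engine', 'version': '...'}
print(requests.get("http://localhost:6333").json())
```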
Qdrant is now accessible: - REST API: [localhost:6333](http://localhost:6333/) - Web UI: [localhost:6333/dashboard](http://localhost:6333/dashboard) - GRPC API: [localhost:6334](http://localhost:6334/) ## [Anchor](https://qdrant.tech/documentation/quickstart/\#initialize-the-client) Initialize the client pythontypescriptrustjavacsharpgo ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); ``` ```rust use qdrant_client::Qdrant; // The Rust client uses Qdrant's gRPC interface let client = Qdrant::from_url("http://localhost:6334").build()?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; // The Java client uses Qdrant's gRPC interface QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); ``` ```csharp using Qdrant.Client; // The C# client uses Qdrant's gRPC interface var client = new QdrantClient("localhost", 6334); ``` ```go import "github.com/qdrant/go-client/qdrant" // The Go client uses Qdrant's gRPC interface client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) ``` ## [Anchor](https://qdrant.tech/documentation/quickstart/\#create-a-collection) Create a collection You will be storing all of your vector data in a Qdrant collection. Let’s call it `test_collection`. This collection will be using a dot product distance metric to compare vectors. pythontypescriptrustjavacsharpgo ```python from qdrant_client.models import Distance, VectorParams client.create_collection( collection_name="test_collection", vectors_config=VectorParams(size=4, distance=Distance.DOT), ) ``` ```typescript await client.createCollection("test_collection", { vectors: { size: 4, distance: "Dot" }, }); ``` ```rust use qdrant_client::qdrant::{CreateCollectionBuilder, VectorParamsBuilder}; client .create_collection( CreateCollectionBuilder::new("test_collection") .vectors_config(VectorParamsBuilder::new(4, Distance::Dot)), ) .await?; ``` ```java import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; client.createCollectionAsync("test_collection", VectorParams.newBuilder().setDistance(Distance.Dot).setSize(4).build()).get(); ``` ```csharp using Qdrant.Client.Grpc; await client.CreateCollectionAsync(collectionName: "test_collection", vectorsConfig: new VectorParams { Size = 4, Distance = Distance.Dot }); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 4, Distance: qdrant.Distance_Cosine, }), }) ``` ## [Anchor](https://qdrant.tech/documentation/quickstart/\#add-vectors) Add vectors Let’s now add a few vectors with a payload. 
Payloads are other data you want to associate with the vector: pythontypescriptrustjavacsharpgo ```python from qdrant_client.models import PointStruct operation_info = client.upsert( collection_name="test_collection", wait=True, points=[\ PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"city": "Berlin"}),\ PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"city": "London"}),\ PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={"city": "Moscow"}),\ PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={"city": "New York"}),\ PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={"city": "Beijing"}),\ PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44], payload={"city": "Mumbai"}),\ ], ) print(operation_info) ``` ```typescript const operationInfo = await client.upsert("test_collection", { wait: true, points: [\ { id: 1, vector: [0.05, 0.61, 0.76, 0.74], payload: { city: "Berlin" } },\ { id: 2, vector: [0.19, 0.81, 0.75, 0.11], payload: { city: "London" } },\ { id: 3, vector: [0.36, 0.55, 0.47, 0.94], payload: { city: "Moscow" } },\ { id: 4, vector: [0.18, 0.01, 0.85, 0.80], payload: { city: "New York" } },\ { id: 5, vector: [0.24, 0.18, 0.22, 0.44], payload: { city: "Beijing" } },\ { id: 6, vector: [0.35, 0.08, 0.11, 0.44], payload: { city: "Mumbai" } },\ ], }); console.debug(operationInfo); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; let points = vec![\ PointStruct::new(1, vec![0.05, 0.61, 0.76, 0.74], [("city", "Berlin".into())]),\ PointStruct::new(2, vec![0.19, 0.81, 0.75, 0.11], [("city", "London".into())]),\ PointStruct::new(3, vec![0.36, 0.55, 0.47, 0.94], [("city", "Moscow".into())]),\ // ..truncated\ ]; let response = client .upsert_points(UpsertPointsBuilder::new("test_collection", points).wait(true)) .await?; dbg!(response); ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.ValueFactory.value; import static io.qdrant.client.VectorsFactory.vectors; import io.qdrant.client.grpc.Points.PointStruct; import io.qdrant.client.grpc.Points.UpdateResult; UpdateResult operationInfo = client .upsertAsync( "test_collection", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(0.05f, 0.61f, 0.76f, 0.74f)) .putAllPayload(Map.of("city", value("Berlin"))) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors(vectors(0.19f, 0.81f, 0.75f, 0.11f)) .putAllPayload(Map.of("city", value("London"))) .build(), PointStruct.newBuilder() .setId(id(3)) .setVectors(vectors(0.36f, 0.55f, 0.47f, 0.94f)) .putAllPayload(Map.of("city", value("Moscow"))) .build())) // Truncated .get(); System.out.println(operationInfo); ``` ```csharp using Qdrant.Client.Grpc; var operationInfo = await client.UpsertAsync(collectionName: "test_collection", points: new List { new() { Id = 1, Vectors = new float[] { 0.05f, 0.61f, 0.76f, 0.74f }, Payload = { ["city"] = "Berlin" } }, new() { Id = 2, Vectors = new float[] { 0.19f, 0.81f, 0.75f, 0.11f }, Payload = { ["city"] = "London" } }, new() { Id = 3, Vectors = new float[] { 0.36f, 0.55f, 0.47f, 0.94f }, Payload = { ["city"] = "Moscow" } }, // Truncated }); Console.WriteLine(operationInfo); ``` ```go import ( "context" "fmt" "github.com/qdrant/go-client/qdrant" ) operationInfo, err := client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "test_collection", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectors(0.05, 0.61, 0.76, 0.74), Payload: 
qdrant.NewValueMap(map[string]any{"city": "Berlin"}), }, { Id: qdrant.NewIDNum(2), Vectors: qdrant.NewVectors(0.19, 0.81, 0.75, 0.11), Payload: qdrant.NewValueMap(map[string]any{"city": "London"}), }, { Id: qdrant.NewIDNum(3), Vectors: qdrant.NewVectors(0.36, 0.55, 0.47, 0.94), Payload: qdrant.NewValueMap(map[string]any{"city": "Moscow"}), }, // Truncated }, }) if err != nil { panic(err) } fmt.Println(operationInfo) ``` **Response:** pythontypescriptrustjavacsharpgo ```python operation_id=0 status= ``` ```typescript { operation_id: 0, status: 'completed' } ``` ```rust PointsOperationResponse { result: Some( UpdateResult { operation_id: Some( 0, ), status: Completed, }, ), time: 0.00094027, } ``` ```java operation_id: 0 status: Completed ``` ```csharp { "operationId": "0", "status": "Completed" } ``` ```go operation_id:0 status:Acknowledged ``` ## [Anchor](https://qdrant.tech/documentation/quickstart/\#run-a-query) Run a query Let’s ask a basic question - Which of our stored vectors are most similar to the query vector `[0.2, 0.1, 0.9, 0.7]`? pythontypescriptrustjavacsharpgo ```python search_result = client.query_points( collection_name="test_collection", query=[0.2, 0.1, 0.9, 0.7], with_payload=False, limit=3 ).points print(search_result) ``` ```typescript let searchResult = await client.query( "test_collection", { query: [0.2, 0.1, 0.9, 0.7], limit: 3 }); console.debug(searchResult.points); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; let search_result = client .query( QueryPointsBuilder::new("test_collection") .query(vec![0.2, 0.1, 0.9, 0.7]) ) .await?; dbg!(search_result); ``` ```java import java.util.List; import io.qdrant.client.grpc.Points.ScoredPoint; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; List searchResult = client.queryAsync(QueryPoints.newBuilder() .setCollectionName("test_collection") .setLimit(3) .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .build()).get(); System.out.println(searchResult); ``` ```csharp var searchResult = await client.QueryAsync( collectionName: "test_collection", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, limit: 3, ); Console.WriteLine(searchResult); ``` ```go import ( "context" "fmt" "github.com/qdrant/go-client/qdrant" ) searchResult, err := client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "test_collection", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), }) if err != nil { panic(err) } fmt.Println(searchResult) ``` **Response:** ```json [\ {\ "id": 4,\ "version": 0,\ "score": 1.362,\ "payload": null,\ "vector": null\ },\ {\ "id": 1,\ "version": 0,\ "score": 1.273,\ "payload": null,\ "vector": null\ },\ {\ "id": 3,\ "version": 0,\ "score": 1.208,\ "payload": null,\ "vector": null\ }\ ] ``` The results are returned in decreasing similarity order. Note that payload and vector data is missing in these results by default. See [payload and vector in the result](https://qdrant.tech/documentation/concepts/search/#payload-and-vector-in-the-result) on how to enable it. ## [Anchor](https://qdrant.tech/documentation/quickstart/\#add-a-filter) Add a filter We can narrow down the results further by filtering by payload. Let’s find the closest results that include “London”. 
pythontypescriptrustjavacsharpgo ```python from qdrant_client.models import Filter, FieldCondition, MatchValue search_result = client.query_points( collection_name="test_collection", query=[0.2, 0.1, 0.9, 0.7], query_filter=Filter( must=[FieldCondition(key="city", match=MatchValue(value="London"))] ), with_payload=True, limit=3, ).points print(search_result) ``` ```typescript searchResult = await client.query("test_collection", { query: [0.2, 0.1, 0.9, 0.7], filter: { must: [{ key: "city", match: { value: "London" } }], }, with_payload: true, limit: 3, }); console.debug(searchResult); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, QueryPointsBuilder}; let search_result = client .query( QueryPointsBuilder::new("test_collection") .query(vec![0.2, 0.1, 0.9, 0.7]) .filter(Filter::must([Condition::matches(\ "city",\ "London".to_string(),\ )])) .with_payload(true), ) .await?; dbg!(search_result); ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; List searchResult = client.queryAsync(QueryPoints.newBuilder() .setCollectionName("test_collection") .setLimit(3) .setFilter(Filter.newBuilder().addMust(matchKeyword("city", "London"))) .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setWithPayload(enable(true)) .build()).get(); System.out.println(searchResult); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; var searchResult = await client.QueryAsync( collectionName: "test_collection", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, filter: MatchKeyword("city", "London"), limit: 3, payloadSelector: true ); Console.WriteLine(searchResult); ``` ```go import ( "context" "fmt" "github.com/qdrant/go-client/qdrant" ) searchResult, err := client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "test_collection", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), }, }, WithPayload: qdrant.NewWithPayload(true), }) if err != nil { panic(err) } fmt.Println(searchResult) ``` **Response:** ```json [\ {\ "id": 2,\ "version": 0,\ "score": 0.871,\ "payload": {\ "city": "London"\ },\ "vector": null\ }\ ] ``` You have just conducted vector search. You loaded vectors into a database and queried the database with a vector of your own. Qdrant found the closest results and presented you with a similarity score. ## [Anchor](https://qdrant.tech/documentation/quickstart/\#next-steps) Next steps Now you know how Qdrant works. Getting started with [Qdrant Cloud](https://qdrant.tech/documentation/cloud/quickstart-cloud/) is just as easy. [Create an account](https://qdrant.to/cloud) and use our SaaS completely free. We will take care of infrastructure maintenance and software updates. To move onto some more complex examples of vector search, read our [Tutorials](https://qdrant.tech/documentation/tutorials/) and create your own app with the help of our [Examples](https://qdrant.tech/documentation/examples/). **Note:** There is another way of running Qdrant locally. If you are a Python developer, we recommend that you try Local Mode in [Qdrant Client](https://github.com/qdrant/qdrant-client), as it only takes a few moments to get setup. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
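As a rough illustration of that local mode (a sketch assuming you have `qdrant-client` installed; it is not required for the Docker-based quickstart above), the Python client can run Qdrant entirely in-process:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# In-memory instance: useful for tests and small experiments.
client = QdrantClient(":memory:")

# Alternatively, persist the data to a local folder instead of a server:
# client = QdrantClient(path="./qdrant_local")

client.create_collection(
    collection_name="test_collection",
    vectors_config=VectorParams(size=4, distance=Distance.DOT),
)
print(client.get_collections())
```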
😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/quickstart.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/quickstart.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-103-lllmstxt|> ## metric-learning-tips - [Articles](https://qdrant.tech/articles/) - Metric Learning Tips & Tricks [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # Metric Learning Tips & Tricks Andrei Vasnetsov · May 15, 2021 ![Metric Learning Tips & Tricks](https://qdrant.tech/articles_data/metric-learning-tips/preview/title.jpg) ## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#how-to-train-object-matching-model-with-no-labeled-data-and-use-it-in-production) How to train object matching model with no labeled data and use it in production Currently, most machine-learning-related business cases are solved as a classification problems. Classification algorithms are so well studied in practice that even if the original problem is not directly a classification task, it is usually decomposed or approximately converted into one. However, despite its simplicity, the classification task has requirements that could complicate its production integration and scaling. E.g. it requires a fixed number of classes, where each class should have a sufficient number of training samples. In this article, I will describe how we overcome these limitations by switching to metric learning. By the example of matching job positions and candidates, I will show how to train metric learning model with no manually labeled data, how to estimate prediction confidence, and how to serve metric learning in production. ## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#what-is-metric-learning-and-why-using-it) What is metric learning and why using it? According to Wikipedia, metric learning is the task of learning a distance function over objects. In practice, it means that we can train a model that tells a number for any pair of given objects. And this number should represent a degree or score of similarity between those given objects. For example, objects with a score of 0.9 could be more similar than objects with a score of 0.5 Actual scores and their direction could vary among different implementations. In practice, there are two main approaches to metric learning and two corresponding types of NN architectures. The first is the interaction-based approach, which first builds local interactions (i.e., local matching signals) between two objects. Deep neural networks learn hierarchical interaction patterns for matching. Examples of neural network architectures include MV-LSTM, ARC-II, and MatchPyramid. ![MV-LSTM, example of interaction-based model](https://gist.githubusercontent.com/generall/4821e3c6b5eee603d56729e7a156e461/raw/b0eb4ea5d088fe1095e529eb12708ac69f304ce3/mv_lstm.png) > MV-LSTM, example of interaction-based model, [Shengxian Wan et al.](https://www.researchgate.net/figure/Illustration-of-MV-LSTM-S-X-and-S-Y-are-the-in_fig1_285271115) via Researchgate The second is the representation-based approach. 
In this case, the distance function is composed of two components: the Encoder, which transforms an object into an embedded representation - usually a large floating-point vector, and the Comparator, which takes embeddings of a pair of objects from the Encoder and calculates their similarity. The most well-known example of this embedding representation is Word2Vec. Examples of neural network architectures also include DSSM, C-DSSM, and ARC-I. The Comparator is usually a very simple function that can be calculated very quickly. It might be cosine similarity or even a dot product. This two-stage schema allows performing the complex calculations only once per object. Once objects are transformed, the Comparator can calculate their similarity independently of the Encoder, and much more quickly. For more convenience, embeddings can be placed into specialized storages or vector search engines. These search engines allow you to manage embeddings via an API, perform searches, and run other operations with vectors.

![C-DSSM, example of representation-based model](https://gist.githubusercontent.com/generall/4821e3c6b5eee603d56729e7a156e461/raw/b0eb4ea5d088fe1095e529eb12708ac69f304ce3/cdssm.png)

> C-DSSM, example of representation-based model, [Xue Li et al.](https://arxiv.org/abs/1901.10710v2) via arXiv

Pre-trained NNs can also be used. The output of the second-to-last layer could work as an embedded representation.

Further in this article, I will focus on the representation-based approach, as it proved to be more flexible and fast.

So what are the advantages of using metric learning compared to classification? The Object Encoder does not assume the number of classes. So if you can’t split your objects into classes, if the number of classes is too high, or you suspect that it could grow in the future - consider using metric learning.

In our case, the business goal was to find suitable vacancies for candidates who specify the title of the desired position. To solve this, we used to apply a classifier to determine the job category of the vacancy and the candidate. But this solution was limited to only a few hundred categories. Candidates were complaining that they couldn’t find the right category for them. Training the classifier for new categories would take too long and would require new training data for each new category. Switching to metric learning allowed us to overcome these limitations: the resulting solution could compare any pair of position descriptions, even if we don’t have a category reference for them yet.

![T-SNE with job samples](https://gist.githubusercontent.com/generall/4821e3c6b5eee603d56729e7a156e461/raw/b0eb4ea5d088fe1095e529eb12708ac69f304ce3/embeddings.png)

> T-SNE with job samples, Image by Author. Play with [Embedding Projector](https://projector.tensorflow.org/?config=https://gist.githubusercontent.com/generall/7e712425e3b340c2c4dbc1a29f515d91/raw/b45b2b6f6c1d5ab3d3363c50805f3834a85c8879/config.json) yourself.

With metric learning, we learn not a concrete job type but how to match job descriptions from a candidate’s CV and a vacancy. Secondly, with metric learning, it is easy to add more reference occupations without retraining the model. We can then add the reference to a vector search engine. Next time we match occupations, this new reference vector will be searchable.

## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#data-for-metric-learning) Data for metric learning

Unlike classifiers, training a metric learning model does not require specific class labels. All that is required are examples of similar and dissimilar objects.
We would call them positive and negative samples. At the same time, it could be a relative similarity between a pair of objects. For example, twins look more alike than a pair of random people. And random people are more similar to each other than a man and a cat. A model can use such relative examples for learning.

The good news is that the division into classes is only a special case of determining similarity. To use such datasets, it is enough to declare samples from one class as positive and samples from another class as negative. In this way, it is possible to combine several datasets with mismatched classes into one generalized dataset for metric learning.

But datasets with a division into classes are not the only ones suitable for extracting positive and negative examples. If, for example, there are additional features in the description of the object, the values of these features can also be used as a similarity factor. It may not be as explicit as class membership, but the relative similarity is also suitable for learning.

In the case of job descriptions, there are many ontologies of occupations, which we were able to combine into a single dataset thanks to this approach. We even went a step further and used identical job titles to find similar descriptions. As a result, we got a self-supervised universal dataset that did not require any manual labeling.

Unfortunately, this universality does not allow some techniques to be applied in training. Next, I will describe how to overcome this disadvantage.

## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#training-the-model) Training the model

There are several ways to train a metric learning model. Among the most popular is the use of Triplet or Contrastive loss functions, but I will not go deep into them in this article. However, I will tell you about one interesting trick that helped us work with unified training examples.

One of the most important practices for training a metric learning model efficiently is hard negative mining. This technique aims to include negative samples on which the model gave the worst predictions during the last training epoch. Most articles that describe this technique assume that the training data consists of many small classes (in most cases it is people’s faces). With data like this, it is easy to find bad samples - if two samples from different classes have a high similarity score, we can use it as a negative sample. But we had no such classes in our data; the only thing we had was occupation pairs assumed to be similar in some way. We cannot guarantee that there is no better match for each occupation than the one in its pair. That is why we can’t use hard negative mining for our model.

![Loss variations](https://gist.githubusercontent.com/generall/4821e3c6b5eee603d56729e7a156e461/raw/b0eb4ea5d088fe1095e529eb12708ac69f304ce3/losses.png)

> [Alfonso Medela et al.](https://arxiv.org/abs/1905.10675) via arXiv

To compensate for this limitation, we can try to increase the number of random (weak) negative samples. One way to achieve this is to train the model longer, so it will see more samples by the end of the training. But we found a better solution in adjusting our loss function. In a regular implementation of Triplet or Contrastive loss, each positive pair is compared with only one or a few negative samples. What we did was allow pair comparisons across the whole batch. That means that the loss function penalizes any pair of random objects whose score exceeds any of the positive scores in the batch.
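To make this idea concrete, here is a small sketch of an in-batch contrastive objective (my own illustration in PyTorch, not the authors’ original code): every non-matching item in the batch acts as a negative for every positive pair, so the number of comparisons grows with the square of the batch size.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchors: torch.Tensor,
                              positives: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """anchors, positives: (B, D) embeddings; row i of `positives`
    is the positive match for row i of `anchors`."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)

    # (B, B) similarity matrix: the diagonal holds the positive pairs,
    # all off-diagonal entries act as in-batch negatives.
    logits = anchors @ positives.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy pushes each positive score above every negative in the batch.
    return F.cross_entropy(logits, targets)

# Usage sketch: loss = in_batch_contrastive_loss(encoder(batch_a), encoder(batch_b))
```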
This extension gives `~ N * B^2` comparisons where `B` is a size of batch and `N` is a number of batches. Much bigger than `~ N * B` in regular triplet loss. This means that increasing the size of the batch significantly increases the number of negative comparisons, and therefore should improve the model performance. We were able to observe this dependence in our experiments. Similar idea we also found in the article [Supervised Contrastive Learning](https://arxiv.org/abs/2004.11362). ## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#model-confidence) Model confidence In real life it is often needed to know how confident the model was in the prediction. Whether manual adjustment or validation of the result is required. With conventional classification, it is easy to understand by scores how confident the model is in the result. If the probability values of different classes are close to each other, the model is not confident. If, on the contrary, the most probable class differs greatly, then the model is confident. At first glance, this cannot be applied to metric learning. Even if the predicted object similarity score is small it might only mean that the reference set has no proper objects to compare with. Conversely, the model can group garbage objects with a large score. Fortunately, we found a small modification to the embedding generator, which allows us to define confidence in the same way as it is done in conventional classifiers with a Softmax activation function. The modification consists in building an embedding as a combination of feature groups. Each feature group is presented as a one-hot encoded sub-vector in the embedding. If the model can confidently predict the feature value - the corresponding sub-vector will have a high absolute value in some of its elements. For a more intuitive understanding, I recommend thinking about embeddings not as points in space, but as a set of binary features. To implement this modification and form proper feature groups we would need to change a regular linear output layer to a concatenation of several Softmax layers. Each softmax component would represent an independent feature and force the neural network to learn them. Let’s take for example that we have 4 softmax components with 128 elements each. Every such component could be roughly imagined as a one-hot-encoded number in the range of 0 to 127. Thus, the resulting vector will represent one of `128^4` possible combinations. If the trained model is good enough, you can even try to interpret the values of singular features individually. ![Softmax feature embeddings](https://gist.githubusercontent.com/generall/4821e3c6b5eee603d56729e7a156e461/raw/b0eb4ea5d088fe1095e529eb12708ac69f304ce3/feature_embedding.png) > Softmax feature embeddings, Image by Author. ## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#neural-rules) Neural rules Machine learning models rarely train to 100% accuracy. In a conventional classifier, errors can only be eliminated by modifying and repeating the training process. Metric training, however, is more flexible in this matter and allows you to introduce additional steps that allow you to correct the errors of an already trained model. A common error of the metric learning model is erroneously declaring objects close although in reality they are not. To correct this kind of error, we introduce exclusion rules. Rules consist of 2 object anchors encoded into vector space. 
## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#neural-rules) Neural rules

Machine learning models rarely train to 100% accuracy. In a conventional classifier, errors can only be eliminated by modifying and repeating the training process. Metric learning, however, is more flexible in this matter and allows you to introduce additional steps to correct the errors of an already trained model. A common error of a metric learning model is erroneously declaring objects close although in reality they are not. To correct this kind of error, we introduce exclusion rules. Rules consist of two object anchors encoded into the vector space. If the target object falls into one of the anchors' effect areas, it triggers the rule, which excludes all objects in the second anchor's area from the prediction result.

![Exclusion rules](https://gist.githubusercontent.com/generall/4821e3c6b5eee603d56729e7a156e461/raw/b0eb4ea5d088fe1095e529eb12708ac69f304ce3/exclusion_rule.png)

> Neural exclusion rules, Image by Author.

The convenience of working with embeddings is that, regardless of the number of rules, you only need to perform the encoding once per object. Then, to find a suitable rule, it is enough to compare the target object's embedding with the pre-calculated embeddings of the rules' anchors. In practice, this translates into just one additional query to the vector search engine.

## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#vector-search-in-production) Vector search in production

When implementing a metric learning model in production, the question arises about the storage and management of vectors. It should be easy to add new vectors if new job descriptions appear in the service. In our case, we also needed to apply additional conditions to the search. We needed to filter, for example, by the location of candidates and the level of language proficiency. We did not find a ready-made tool for such vector management, so we created [Qdrant](https://github.com/qdrant/qdrant), an open-source vector search engine. It allows you to add and delete vectors with a simple API, independent of the programming language you are using. You can also assign a payload to vectors. This payload allows additional filtering during the search request. Qdrant has a pre-built Docker image, and getting started is as simple as running

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Documentation with examples can be found [here](https://api.qdrant.tech/api-reference).

## [Anchor](https://qdrant.tech/articles/metric-learning-tips/\#conclusion) Conclusion

In this article, I have shown how metric learning can be more scalable and flexible than classification models. I suggest trying similar approaches in your tasks: it might be matching similar texts, images, or audio data. With the existing variety of pre-trained neural networks and a vector search engine, it is easy to build your own metric learning-based application.

<|page-104-lllmstxt|>

## filtrable-hnsw

---

# Filtrable HNSW

Andrei Vasnetsov · November 24, 2019

![Filtrable HNSW](https://qdrant.tech/articles_data/filtrable-hnsw/preview/title.jpg)

If you need to find some similar objects in vector space, provided e.g. by embeddings or a matching NN, you can choose among a variety of libraries: Annoy, FAISS or NMSLib.
All of them will give you fast approximate neighbor search within almost any space. But what if you need to introduce some constraints in your search? For example, you want to search only for products in some category, or select the most similar customer of a particular brand. I did not find any simple solution for this. There are several discussions like [this one](https://github.com/spotify/annoy/issues/263), but they only suggest iterating over the top search results and applying the conditions afterwards. Let's see if we could somehow modify one of the ANN algorithms so that it can apply constraints during the search itself.

Annoy builds a tree index over random projections. A tree index implies that we will run into the same problem that appears in relational databases: if field indexes were built independently, then it is possible to use only one of them at a time. Since nobody has solved this problem before, it seems that there is no easy approach here.

There is another algorithm which shows top results on the [benchmark](https://github.com/erikbern/ann-benchmarks). It is called HNSW, which stands for Hierarchical Navigable Small World. The [original paper](https://arxiv.org/abs/1603.09320) is well written and very easy to read, so I will only give the main idea here. We need to build a navigation graph among all indexed points so that a greedy search on this graph will lead us to the nearest point. This graph is constructed by sequentially adding points that are connected by a fixed number of edges to previously added points. In the resulting graph, the number of edges at each point does not exceed a given threshold `m`, and each point is always connected to the nearest points considered so far.

![NSW](https://qdrant.tech/articles_data/filtrable-hnsw/NSW.png)

### [Anchor](https://qdrant.tech/articles/filtrable-hnsw/\#how-can-we-modify-it) How can we modify it?

What if we simply apply the filter criteria to the nodes of this graph and use only those that meet these criteria in the greedy search? It turns out that even this naive modification allows the algorithm to cover some use cases (a minimal sketch of this filtered greedy search is shown below). One such case is when your criteria do not correlate with vector semantics. For example, you use a vector search for clothing names and want to filter out some sizes. In this case, the nodes will be uniformly filtered out from the entire cluster structure. Therefore, the theoretical conclusions obtained in [Percolation theory](https://en.wikipedia.org/wiki/Percolation_theory) become applicable:

> Percolation is related to the robustness of the graph (also called a network). Given a random graph of n nodes and an average degree ⟨k⟩, we next remove randomly a fraction 1−p of nodes and leave only a fraction p. There exists a critical percolation threshold pc = 1/⟨k⟩ below which the network becomes fragmented, while above pc a giant connected component exists.

This statement is also confirmed by experiments:

![Dependency of connectivity on the number of edges](https://qdrant.tech/articles_data/filtrable-hnsw/exp_connectivity_glove_m0.png)

Dependency of connectivity on the number of edges

![Dependency of connectivity on the number of points (no dependency).](https://qdrant.tech/articles_data/filtrable-hnsw/exp_connectivity_glove_num_elements.png)

Dependency of connectivity on the number of points (no dependency).

There is a clear threshold at which the search begins to fail. This threshold is due to the decomposition of the graph into small connected components.
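To make the naive modification above concrete, here is a simplified sketch of a greedy search over a proximity graph that skips nodes failing a filter predicate. The function and variable names are illustrative, it assumes the entry point itself satisfies the filter, and it is not Qdrant's actual implementation.

```python
from typing import Callable, Dict, List

import numpy as np

def filtered_greedy_search(
    graph: Dict[int, List[int]],       # adjacency list of the navigation graph
    vectors: np.ndarray,               # (n, d) matrix of indexed points
    query: np.ndarray,                 # (d,) query vector
    entry_point: int,                  # assumed to satisfy the filter
    allowed: Callable[[int], bool],    # filter predicate over point ids
) -> int:
    """Greedy descent that only walks through nodes satisfying the filter."""
    def dist(i: int) -> float:
        return float(np.linalg.norm(vectors[i] - query))

    current = entry_point
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            # Filtered-out nodes are invisible to the search
            if not allowed(neighbor):
                continue
            if dist(neighbor) < dist(current):
                current = neighbor
                improved = True
    return current
```

If too many nodes are filtered out, the graph falls apart into small components and this descent gets stuck far from the true nearest neighbor, which is exactly the connectivity threshold discussed above.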
The graphs also show that this threshold can be shifted by increasing the `m` parameter of the algorithm, which is responsible for the degree of the nodes.

Let's consider some other filtering conditions we might want to apply in the search:

- Categorical filtering
  - Select only points in a specific category
  - Select points which belong to a specific subset of categories
  - Select points with a specific set of labels
- Numerical range
- Selection within some geographical region

In the first case, we can guarantee that the HNSW graph will be connected simply by creating additional edges inside each category separately, using the same graph construction algorithm, and then combining them into the original graph. In this case, the total number of edges will increase by no more than 2 times, regardless of the number of categories.

The second case is a little harder. A connection may be lost between two categories if they lie in different clusters.

![category clusters](https://qdrant.tech/articles_data/filtrable-hnsw/hnsw_graph_category.png)

The idea here is to build the same kind of navigation graph, but between categories rather than individual nodes. The distance between two categories might be defined as the distance between the category entry points (or, for better precision, as the average distance over a random sample). Now we can estimate the expected graph connectivity by the number of excluded categories, not nodes. This still does not guarantee that two random categories will be connected, but it allows us to switch to multiple searches, one per category, if the connectivity threshold is passed. In some cases, multiple searches can even be faster if you take advantage of parallel processing.

![Dependency of connectivity on the random categories included in the search](https://qdrant.tech/articles_data/filtrable-hnsw/exp_random_groups.png)

Dependency of connectivity on the random categories included in the search

The third case might be resolved the same way it is resolved in classical databases. Depending on the size ratio of the labeled subsets, we can go for one of the following scenarios:

- if at least one subset is small: perform the search over the label with the smallest subset and then filter the points afterwards.
- if large subsets give a large intersection: perform a regular search with constraints, expecting the intersection to be large enough to stay above the connectivity threshold.
- if large subsets give a small intersection: perform a linear search over the intersection, expecting it to be small enough to fit into the time budget.

The numerical range case can be reduced to the previous one if we split the numerical range into buckets containing an equal number of points. We then also connect neighboring buckets to achieve graph connectivity. We still need to filter out some results which are present in the border buckets but do not fulfill the actual constraints, but their number can be regulated by the size of the buckets.

The geographical case is a lot like the numerical one. Usual geographical search involves a [geohash](https://en.wikipedia.org/wiki/Geohash), which maps any geo-point to a fixed-length identifier.

![Geohash example](https://qdrant.tech/articles_data/filtrable-hnsw/geohash.png)

We can use these identifiers as categories and additionally make connections between neighboring geohashes. This ensures that any selected geographical region will also contain a connected HNSW graph.

## [Anchor](https://qdrant.tech/articles/filtrable-hnsw/\#conclusion) Conclusion

It is possible to enhance the HNSW algorithm so that it supports filtering points in the first search phase.
Filtering can be carried out on the basis of belonging to categories, which in turn generalizes to such popular cases as numerical ranges and geo. Experiments were carried out on a modified [Python implementation](https://github.com/generall/hnsw-python) of the algorithm, but real production systems require a much faster version, such as [NMSLib](https://github.com/nmslib/nmslib).

<|page-105-lllmstxt|>

## cluster-monitoring

---

# [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#monitoring-qdrant-cloud-clusters) Monitoring Qdrant Cloud Clusters

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#telemetry) Telemetry

![Cluster Metrics](https://qdrant.tech/documentation/cloud/cluster-metrics.png)

Qdrant Cloud provides you with a set of metrics to monitor the health of your database cluster. You can access these metrics in the Qdrant Cloud Console in the **Metrics** and **Request** sections of the cluster details page.

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#logs) Logs

![Cluster Logs](https://qdrant.tech/documentation/cloud/cluster-logs.png)

Logs of the database cluster are available in the Qdrant Cloud Console in the **Logs** section of the cluster details page.

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#alerts) Alerts

You will receive automatic alerts via email before your cluster reaches the currently configured memory or storage limits, including recommendations for scaling your cluster.

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#qdrant-database-metrics-and-telemetry) Qdrant Database Metrics and Telemetry

You can also directly access the metrics and telemetry that the Qdrant database nodes provide. To scrape metrics from a Qdrant cluster running in Qdrant Cloud, an [API key](https://qdrant.tech/documentation/cloud/authentication/) is required to access `/metrics` and `/sys_metrics`. Qdrant Cloud also supports supplying the API key as a [Bearer token](https://www.rfc-editor.org/rfc/rfc6750.html), which may be required by some providers.

### [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#qdrant-node-metrics) Qdrant Node Metrics

Metrics in a Prometheus-compatible format are available at the `/metrics` endpoint of each Qdrant database node. When scraping, you should use the [node-specific URLs](https://qdrant.tech/documentation/cloud/cluster-access/#node-specific-endpoints) to ensure that you are scraping metrics from all nodes in each cluster. For more information, see [Qdrant monitoring](https://qdrant.tech/documentation/guides/monitoring/).
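As a quick illustration, the following Python snippet fetches node metrics with the `requests` library; the cluster URL and API key are placeholders. In a real setup you would typically point a Prometheus-compatible scraper at the node-specific URLs instead of polling them manually.

```python
import requests

NODE_URL = "https://node-0-xyz-example.cloud.qdrant.io:6333"  # placeholder node-specific URL
API_KEY = "YOUR_API_KEY"                                      # placeholder database API key

# The key can be sent in the `api-key` header, or as a Bearer token
# for scrapers that only support the Authorization header.
response = requests.get(
    f"{NODE_URL}/metrics",
    headers={"api-key": API_KEY},  # or: {"Authorization": f"Bearer {API_KEY}"}
    timeout=10,
)
response.raise_for_status()
print("\n".join(response.text.splitlines()[:10]))  # first few Prometheus metric lines
```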
You can also access the `/telemetry` [endpoint](https://api.qdrant.tech/api-reference/service/telemetry) of your database. This endpoint is available on the cluster endpoint and provides information about the current state of the database, including the number of vectors, shards, and other useful information. For more information, see [Qdrant monitoring](https://qdrant.tech/documentation/guides/monitoring/).

### [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#cluster-system-metrics) Cluster System Metrics

Cluster system metrics come from a cloud-only endpoint that not only shares all the information about the database from `/metrics`, but also provides additional operational data from our infrastructure about your cluster, including information from our load balancers, ingresses, and the cluster workloads themselves. Metrics in a Prometheus-compatible format are available at the `/sys_metrics` cluster endpoint. Database API keys are used to authenticate access to cluster system metrics. `/sys_metrics` only needs to be queried once per cluster, on the main load-balanced cluster endpoint. You don't need to scrape each cluster node individually; the endpoint always provides metrics about all nodes.

## [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#grafana-dashboard) Grafana Dashboard

If you scrape your Qdrant cluster system metrics into your own monitoring system and you are using Grafana, you can use our [Grafana dashboard](https://github.com/qdrant/qdrant-cloud-grafana-dashboard) to visualize these metrics.

![Grafana dashboard](https://qdrant.tech/documentation/cloud/cloud-grafana-dashboard.png)

Video: [Qdrant's Full Observability with Monitoring](https://www.youtube.com/watch?v=pKPP-tL5_6w)

### [Anchor](https://qdrant.tech/documentation/cloud/cluster-monitoring/\#cluster-system-mtrics-sys_metrics) Cluster System Metrics `/sys_metrics`

In Qdrant Cloud, each Qdrant cluster will expose the following metrics. This endpoint is not available when running Qdrant open-source.
**List of metrics** | Name | Type | Meaning | | --- | --- | --- | | app\_info | gauge | Information about the Qdrant server | | app\_status\_recovery\_mode | gauge | If Qdrant is currently started in recovery mode | | cluster\_commit | | | | cluster\_enabled | | Indicates wether multi-node clustering is enabled | | cluster\_peers\_total | counter | Total number of cluster peers | | cluster\_pending\_operations\_total | counter | Total number of pending operations in the cluster | | cluster\_term | | | | cluster\_voter | | | | collection\_hardware\_metric\_cpu | | | | collection\_hardware\_metric\_io\_read | | | | collection\_hardware\_metric\_io\_write | | | | collections\_total | counter | Number of collections | | collections\_vector\_total | counter | Total number of vectors in all collections | | container\_cpu\_cfs\_periods\_total | | | | container\_cpu\_cfs\_throttled\_periods\_total | counter | Indicating that your CPU demand was higher than what your instance offers | | container\_cpu\_usage\_seconds\_total | counter | Total CPU usage in seconds | | container\_file\_descriptors | | | | container\_fs\_reads\_bytes\_total | counter | Total number of bytes read by the container file system (disk) | | container\_fs\_reads\_total | counter | Total number of read operations on the container file system (disk) | | container\_fs\_writes\_bytes\_total | counter | Total number of bytes written by the container file system (disk) | | container\_fs\_writes\_total | counter | Total number of write operations on the container file system (disk) | | container\_memory\_cache | gauge | Memory used for cache in the container | | container\_memory\_mapped\_file | gauge | Memory used for memory-mapped files in the container | | container\_memory\_rss | gauge | Resident Set Size (RSS) - Memory used by the container excluding swap space used for caching | | container\_memory\_working\_set\_bytes | gauge | Total memory used by the container, including both anonymous and file-backed memory | | container\_network\_receive\_bytes\_total | counter | Total bytes received over the container’s network interface | | container\_network\_receive\_errors\_total | | | | container\_network\_receive\_packets\_dropped\_total | | | | container\_network\_receive\_packets\_total | | | | container\_network\_transmit\_bytes\_total | counter | Total bytes transmitted over the container’s network interface | | container\_network\_transmit\_errors\_total | | | | container\_network\_transmit\_packets\_dropped\_total | | | | container\_network\_transmit\_packets\_total | | | | kube\_persistentvolumeclaim\_info | | | | kube\_pod\_container\_info | | | | kube\_pod\_container\_resource\_limits | gauge | Response contains limits for CPU and memory of DB. | | kube\_pod\_container\_resource\_requests | gauge | Response contains requests for CPU and memory of DB. 
| | kube\_pod\_container\_status\_last\_terminated\_exitcode | | | | kube\_pod\_container\_status\_last\_terminated\_reason | | | | kube\_pod\_container\_status\_last\_terminated\_timestamp | | | | kube\_pod\_container\_status\_ready | | | | kube\_pod\_container\_status\_restarts\_total | | | | kube\_pod\_container\_status\_running | | | | kube\_pod\_container\_status\_terminated | | | | kube\_pod\_container\_status\_terminated\_reason | | | | kube\_pod\_created | | | | kube\_pod\_info | | | | kube\_pod\_start\_time | | | | kube\_pod\_status\_container\_ready\_time | | | | kube\_pod\_status\_initialized\_time | | | | kube\_pod\_status\_phase | gauge | Pod status in terms of different phases (Failed/Running/Succeeded/Unknown) | | kube\_pod\_status\_ready | gauge | Pod readiness state (unknown/false/true) | | kube\_pod\_status\_ready\_time | | | | kube\_pod\_status\_reason | | | | kubelet\_volume\_stats\_capacity\_bytes | gauge | Amount of disk available | | kubelet\_volume\_stats\_inodes | gauge | Amount of inodes available | | kubelet\_volume\_stats\_inodes\_used | gauge | Amount of inodes used | | kubelet\_volume\_stats\_used\_bytes | gauge | Amount of disk used | | memory\_active\_bytes | | | | memory\_allocated\_bytes | | | | memory\_metadata\_bytes | | | | memory\_resident\_bytes | | | | memory\_retained\_bytes | | | | qdrant\_cluster\_state | | | | qdrant\_collection\_commit | | | | qdrant\_collection\_config\_hnsw\_full\_ef\_construct | | | | qdrant\_collection\_config\_hnsw\_full\_scan\_threshold | | | | qdrant\_collection\_config\_hnsw\_m | | | | qdrant\_collection\_config\_hnsw\_max\_indexing\_threads | | | | qdrant\_collection\_config\_hnsw\_on\_disk | | | | qdrant\_collection\_config\_hnsw\_payload\_m | | | | qdrant\_collection\_config\_optimizer\_default\_segment\_number | | | | qdrant\_collection\_config\_optimizer\_deleted\_threshold | | | | qdrant\_collection\_config\_optimizer\_flush\_interval\_sec | | | | qdrant\_collection\_config\_optimizer\_indexing\_threshold | | | | qdrant\_collection\_config\_optimizer\_max\_optimization\_threads | | | | qdrant\_collection\_config\_optimizer\_max\_segment\_size | | | | qdrant\_collection\_config\_optimizer\_memmap\_threshold | | | | qdrant\_collection\_config\_optimizer\_vacuum\_min\_vector\_number | | | | qdrant\_collection\_config\_params\_always\_ram | | | | qdrant\_collection\_config\_params\_on\_disk\_payload | | | | qdrant\_collection\_config\_params\_product\_compression | | | | qdrant\_collection\_config\_params\_read\_fanout\_factor | | | | qdrant\_collection\_config\_params\_replication\_factor | | | | qdrant\_collection\_config\_params\_scalar\_quantile | | | | qdrant\_collection\_config\_params\_scalar\_type | | | | qdrant\_collection\_config\_params\_shard\_number | | | | qdrant\_collection\_config\_params\_vector\_size | | | | qdrant\_collection\_config\_params\_write\_consistency\_factor | | | | qdrant\_collection\_config\_quantization\_always\_ram | | | | qdrant\_collection\_config\_quantization\_product\_compression | | | | qdrant\_collection\_config\_quantization\_scalar\_quantile | | | | qdrant\_collection\_config\_quantization\_scalar\_type | | | | qdrant\_collection\_config\_wal\_capacity\_mb | | | | qdrant\_collection\_config\_wal\_segments\_ahead | | | | qdrant\_collection\_consensus\_thread\_status | | | | qdrant\_collection\_is\_voter | | | | qdrant\_collection\_number\_of\_collections | counter | Total number of collections in Qdrant | | qdrant\_collection\_number\_of\_grpc\_requests | counter | Total number 
of gRPC requests on a collection | | qdrant\_collection\_number\_of\_rest\_requests | counter | Total number of REST requests on a collection | | qdrant\_collection\_pending\_operations | counter | Total number of pending operations on a collection | | qdrant\_collection\_role | | | | qdrant\_collection\_shard\_segment\_num\_indexed\_vectors | | | | qdrant\_collection\_shard\_segment\_num\_points | | | | qdrant\_collection\_shard\_segment\_num\_vectors | | | | qdrant\_collection\_shard\_segment\_type | | | | qdrant\_collection\_term | | | | qdrant\_collection\_transfer | | | | qdrant\_operator\_cluster\_info\_total | | | | qdrant\_operator\_cluster\_phase | gauge | Information about the status of Qdrant clusters | | qdrant\_operator\_cluster\_pod\_up\_to\_date | | | | qdrant\_operator\_cluster\_restore\_info\_total | | | | qdrant\_operator\_cluster\_restore\_phase | | | | qdrant\_operator\_cluster\_scheduled\_snapshot\_info\_total | | | | qdrant\_operator\_cluster\_scheduled\_snapshot\_phase | | | | qdrant\_operator\_cluster\_snapshot\_duration\_sconds | | | | qdrant\_operator\_cluster\_snapshot\_phase | gauge | Information about the status of Qdrant cluster backups | | qdrant\_operator\_cluster\_status\_nodes | | | | qdrant\_operator\_cluster\_status\_nodes\_ready | | | | qdrant\_node\_rssanon\_bytes | gauge | Allocated memory without memory-mapped files. This is the hard metric on memory which will lead to an OOM if it goes over the limit | | rest\_responses\_avg\_duration\_seconds | | | | rest\_responses\_duration\_seconds\_bucket | | | | rest\_responses\_duration\_seconds\_count | | | | rest\_responses\_duration\_seconds\_sum | | | | rest\_responses\_fail\_total | | | | rest\_responses\_max\_duration\_seconds | | | | rest\_responses\_min\_duration\_seconds | | | | rest\_responses\_total | | | | traefik\_service\_open\_connections | | | | traefik\_service\_request\_duration\_seconds\_bucket | | | | traefik\_service\_request\_duration\_seconds\_count | | | | traefik\_service\_request\_duration\_seconds\_sum | gauge | Response contains list of metrics for each Traefik service. | | traefik\_service\_requests\_bytes\_total | | | | traefik\_service\_requests\_total | counter | Response contains list of metrics for each Traefik service. | | traefik\_service\_responses\_bytes\_total | | | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/cluster-monitoring.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/cluster-monitoring.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-106-lllmstxt|> ## private-cloud - [Documentation](https://qdrant.tech/documentation/) - Private Cloud --- # [Anchor](https://qdrant.tech/documentation/private-cloud/\#qdrant-private-cloud) Qdrant Private Cloud Qdrant Private Cloud allows you to manage Qdrant database clusters in any Kubernetes cluster on any infrastructure. 
It uses the same Qdrant Operator that powers Qdrant Managed Cloud and Qdrant Hybrid Cloud, but without any connection to the Qdrant Cloud Management Console. On top of the open source Qdrant database, it allows - Easy deployment and management of Qdrant database clusters in your own Kubernetes infrastructure - Zero-downtime upgrades of the Qdrant database with replication - Vertical and horizontal up and downscaling of the Qdrant database with auto rebalancing and shard splitting - Full control over scheduling, including Multi-AZ deployments - Backup & Disaster Recovery - Extended telemetry - Qdrant Enterprise Support Services If you are interested in using Qdrant Private Cloud, please [contact us](https://qdrant.tech/contact-us/) for more information. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-107-lllmstxt|> ## tags --- # Qdrant Blog ## Features and News What are you Looking for? [![GraphRAG: How Lettria Unlocked 20% Accuracy Gains with Qdrant and Neo4j](https://qdrant.tech/blog/case-study-lettria-v2/preview/title.jpg)\\ **GraphRAG: How Lettria Unlocked 20% Accuracy Gains with Qdrant and Neo4j** \\ \\ Daniel Azoulai\\ \\ June 17, 2025](https://qdrant.tech/blog/case-study-lettria-v2/) [**How Lawme Scaled AI Legal Assistants and Significantly Cut Costs with Qdrant** \\ \\ Daniel Azoulai\\ \\ June 11, 2025](https://qdrant.tech/blog/case-study-lawme/)[**How ConvoSearch Boosted Revenue for D2C Brands with Qdrant** \\ \\ Daniel Azoulai\\ \\ June 10, 2025](https://qdrant.tech/blog/case-study-convosearch/)[**​​Introducing the Official Qdrant Node for n8n** \\ \\ Maddie Duhon & Evgeniya Sukhodolskaya\\ \\ June 09, 2025](https://qdrant.tech/blog/n8n-node/) [![Vector Data Migration Tool](https://qdrant.tech/blog/beta-database-migration-tool/preview/preview.jpg)\\ **Vector Data Migration Tool** \\ Migrate data across clusters, regions, from open source to cloud, and more with just one command.\\ \\ Qdrant\\ \\ June 16, 2025](https://qdrant.tech/blog/beta-database-migration-tool/)[![LegalTech Builder's Guide: Navigating Strategic Decisions with Vector Search](https://qdrant.tech/blog/legal-tech-builders-guide/preview/preview.jpg)\\ **LegalTech Builder's Guide: Navigating Strategic Decisions with Vector Search** \\ This guide explores critical architectural decisions for LegalTech builders using Qdrant, covering accuracy, hybrid search, reranking, score boosting, quantization, and enterprise scaling needs.\\ \\ Daniel Azoulai\\ \\ June 10, 2025](https://qdrant.tech/blog/legal-tech-builders-guide/)[![Qdrant Achieves SOC 2 Type II and HIPAA Certifications](https://qdrant.tech/blog/soc-2-type-II-hipaa/preview/preview.jpg)\\ **Qdrant Achieves SOC 2 Type II and HIPAA Certifications** \\ Qdrant achieves SOC 2 Type II and HIPAA certifications.\\ \\ Daniel Azoulai\\ \\ June 10, 
2025](https://qdrant.tech/blog/soc-2-type-ii-hipaa/)[![Qdrant + DataTalks.Club: Free 10-Week Course on LLM Applications](https://qdrant.tech/blog/datatalks-course/preview/preview.jpg)\\ **Qdrant + DataTalks.Club: Free 10-Week Course on LLM Applications** \\ Gain hands-on experience with RAG, vector search, evaluation, monitoring, and more.\\ \\ Qdrant\\ \\ June 05, 2025](https://qdrant.tech/blog/datatalks-course/)[![How Qovery Accelerated Developer Autonomy with Qdrant](https://qdrant.tech/blog/case-study-qovery/preview/preview.jpg)\\ **How Qovery Accelerated Developer Autonomy with Qdrant** \\ Discover how Qovery empowered developers and drastically reduced infrastructure management latency using Qdrant.\\ \\ Daniel Azoulai\\ \\ May 27, 2025](https://qdrant.tech/blog/case-study-qovery/)[![How Tripadvisor Drives 2 to 3x More Revenue with Qdrant-Powered AI](https://qdrant.tech/blog/case-study-tripadvisor/preview/preview.jpg)\\ **How Tripadvisor Drives 2 to 3x More Revenue with Qdrant-Powered AI** \\ Tripadvisor transformed trip planning and search by using Qdrant to index over a billion user-generated reviews and images. Learn how this powered AI features that boost revenue 2 to 3x for engaged users.\\ \\ Daniel Azoulai\\ \\ May 13, 2025](https://qdrant.tech/blog/case-study-tripadvisor/)[![Precision at Scale: How Aracor Accelerated Legal Due Diligence with Hybrid Vector Search](https://qdrant.tech/blog/case-study-aracor/preview/preview.jpg)\\ **Precision at Scale: How Aracor Accelerated Legal Due Diligence with Hybrid Vector Search** \\ Explore how Aracor transformed manual, error-prone legal document processing into an accurate, scalable, and rapid workflow, leveraging hybrid, filtered, and multitenant vector search technology.\\ \\ Daniel Azoulai\\ \\ May 13, 2025](https://qdrant.tech/blog/case-study-aracor/)[![How Garden Scaled Patent Intelligence with Qdrant](https://qdrant.tech/blog/case-study-garden-intel/preview/preview.jpg)\\ **How Garden Scaled Patent Intelligence with Qdrant** \\ Discover how Garden ingests 200 M+ patents and product documents, achieves sub-100 ms query latency, and launched a new infringement-analysis business line with Qdrant.\\ \\ Daniel Azoulai\\ \\ May 09, 2025](https://qdrant.tech/blog/case-study-garden-intel/)[![Exploring Qdrant Cloud Just Got Easier](https://qdrant.tech/blog/product-ui-changes/preview/preview.jpg)\\ **Exploring Qdrant Cloud Just Got Easier** \\ Read about recent improvements designed to simplify your journey from login, creating your first cluster, prototyping, and going to production.\\ \\ Qdrant\\ \\ May 06, 2025](https://qdrant.tech/blog/product-ui-changes/) - [1](https://qdrant.tech/blog/) - [2](https://qdrant.tech/blog/page/2/) - [3](https://qdrant.tech/blog/page/3/) - [4](https://qdrant.tech/blog/page/4/) - [5](https://qdrant.tech/blog/page/5/) - [6](https://qdrant.tech/blog/page/6/) - [7](https://qdrant.tech/blog/page/7/) - [8](https://qdrant.tech/blog/page/8/) - [9](https://qdrant.tech/blog/page/9/) - [10](https://qdrant.tech/blog/page/10/) - [11](https://qdrant.tech/blog/page/11/) - [12](https://qdrant.tech/blog/page/12/) - [13](https://qdrant.tech/blog/page/13/) - [Newest](https://qdrant.tech/blog/) ### Get Started with Qdrant Free [Get Started](https://cloud.qdrant.io/signup) ![](https://qdrant.tech/img/rocket.svg) ###### Sign up for Qdrant updates We'll occasionally send you best practices for using vector data and similarity search, as well as product news. 
<|page-108-lllmstxt|>

## embedding-recycler

---

# Layer Recycling and Fine-tuning Efficiency

Yusuf Sarıgöz · August 23, 2022

![Layer Recycling and Fine-tuning Efficiency](https://qdrant.tech/articles_data/embedding-recycling/preview/title.jpg)

A recent [paper](https://arxiv.org/abs/2207.04993) by Allen AI has attracted attention in the NLP community, as they cache the output of a certain intermediate layer in the training and inference phases to achieve a speedup of ~83% with a negligible loss in model performance. This technique is quite similar to [the caching mechanism in Quaterion](https://quaterion.qdrant.tech/tutorials/cache_tutorial.html), but the latter is intended for any data modality, while the former focuses only on language models, even though it presents important insights from its experiments. In this post, I will share our findings combined with theirs, hoping to provide the community with a wider perspective on layer recycling.

## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#how-layer-recycling-works) How layer recycling works

The main idea of layer recycling is to accelerate training (and inference) by avoiding repeated passes of the same data object through the frozen layers. Instead, it is possible to pass objects through those layers only once, cache the output, and use it as the input to the unfrozen layers in future epochs. In the paper, they usually cache 50% of the layers, e.g., the output of the 6th multi-head self-attention block in a 12-block encoder. However, they find that it does not work equally well for all tasks. For example, the question answering task suffers from a more significant degradation in performance with 50% of the layers recycled, and they choose to lower it to 25% for this task; accordingly, they suggest determining the level of caching based on the task at hand. They also note that caching provides a more considerable speedup for larger models and on lower-end machines.

In layer recycling, the cache is hit only for exactly the same object. This is easy to achieve with textual data, as it is easily hashable, but you may need more advanced tricks to generate cache keys when you want to generalize this technique to diverse data types. For instance, hashing PyTorch tensors [does not work as you may expect](https://github.com/joblib/joblib/issues/1282). Quaterion comes with an intelligent key extractor that may be applied to any data type, but it can also be customized with a callable passed as an argument. Thanks to this flexibility, we were able to run a variety of experiments in different setups, and I believe that these findings will be helpful for your future projects.

## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#experiments) Experiments

We conducted different experiments to test the performance with:

1. Different numbers of layers recycled in [the similar cars search example](https://quaterion.qdrant.tech/tutorials/cars-tutorial.html).
2. Different numbers of samples in the dataset for training and fine-tuning for similar cars search.
3. Different numbers of layers recycled in [the question answering example](https://quaterion.qdrant.tech/tutorials/nlp_tutorial.html).

## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#easy-layer-recycling-with-quaterion) Easy layer recycling with Quaterion

The easiest way of caching layers in Quaterion is to compose a [TrainableModel](https://quaterion.qdrant.tech/quaterion.train.trainable_model.html#quaterion.train.trainable_model.TrainableModel) with a frozen [Encoder](https://quaterion-models.qdrant.tech/quaterion_models.encoders.encoder.html#quaterion_models.encoders.encoder.Encoder) and an unfrozen [EncoderHead](https://quaterion-models.qdrant.tech/quaterion_models.heads.encoder_head.html#quaterion_models.heads.encoder_head.EncoderHead). Therefore, we modified the `TrainableModel` in the [example](https://github.com/qdrant/quaterion/blob/master/examples/cars/models.py) as follows:

```python
class Model(TrainableModel):
    # ...

    def configure_encoders(self) -> Union[Encoder, Dict[str, Encoder]]:
        pre_trained_encoder = torchvision.models.resnet34(pretrained=True)
        self.avgpool = copy.deepcopy(pre_trained_encoder.avgpool)
        self.finetuned_block = copy.deepcopy(pre_trained_encoder.layer4)
        modules = []
        for name, child in pre_trained_encoder.named_children():
            modules.append(child)
            if name == "layer3":
                break
        pre_trained_encoder = nn.Sequential(*modules)
        return CarsEncoder(pre_trained_encoder)

    def configure_head(self, input_embedding_size) -> EncoderHead:
        return SequentialHead(
            self.finetuned_block,
            self.avgpool,
            nn.Flatten(),
            SkipConnectionHead(512, dropout=0.3, skip_dropout=0.2),
            output_size=512,
        )

    # ...
```

This trick lets us fine-tune one more layer from the base model as a part of the `EncoderHead` while still benefiting from the speedup in the frozen `Encoder` provided by the cache.

## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#experiment-1-percentage-of-layers-recycled) Experiment 1: Percentage of layers recycled

The paper states that recycling 50% of the layers yields little to no loss in performance when compared to full fine-tuning. In this setup, we compared the performance of four methods:

1. Freeze the whole base model and train only `EncoderHead`.
2. Move one of the four residual blocks to `EncoderHead` and train it together with the head layer while freezing the rest (75% layer recycling).
3. Move two of the four residual blocks to `EncoderHead` while freezing the rest (50% layer recycling).
4. Train the whole base model together with `EncoderHead`.

**Note**: During these experiments, we used ResNet34 instead of ResNet152 as the pretrained model in order to be able to use a reasonable batch size in full training. The baseline score with ResNet34 is 0.106.

| Model | RRP |
| --- | --- |
| Full training | 0.32 |
| 50% recycling | 0.31 |
| 75% recycling | 0.28 |
| Head only | 0.22 |
| Baseline | 0.11 |

As seen in the table, the performance of 50% layer recycling is very close to that of full training. Additionally, we still get a considerable speedup with 50% layer recycling at only a small drop in performance. Although 75% layer recycling is better than training only `EncoderHead`, its performance drops quickly when compared to 50% layer recycling and full training.
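Stepping back to the caching mechanism itself, the following is a simplified, framework-agnostic sketch of the core idea: compute the frozen-layer output once per object and reuse it in later epochs. It is illustrative only and is not Quaterion's actual cache implementation; the layer sizes and the integer cache key are arbitrary choices.

```python
import torch
from torch import nn

# Frozen part of the network: its outputs never change, so they can be cached
frozen_layers = nn.Sequential(nn.Linear(128, 64), nn.ReLU()).eval()
trainable_head = nn.Linear(64, 16)

feature_cache: dict = {}

def forward_with_cache(sample_id: int, x: torch.Tensor) -> torch.Tensor:
    """Pass a sample through the frozen layers only once; reuse the cached output afterwards."""
    if sample_id not in feature_cache:
        with torch.no_grad():  # frozen layers receive no gradients
            feature_cache[sample_id] = frozen_layers(x)
    return trainable_head(feature_cache[sample_id])  # only the head is trained
```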
## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#experiment-2-amount-of-available-data) Experiment 2: Amount of available data

In the second experiment setup, we compared the performance of fine-tuning strategies with different dataset sizes. We sampled 50% of the training set randomly while still evaluating models on the whole validation set.

| Model | RRP |
| --- | --- |
| Full training | 0.27 |
| 50% recycling | 0.26 |
| 75% recycling | 0.25 |
| Head only | 0.21 |
| Baseline | 0.11 |

This experiment shows that the smaller the available dataset is, the bigger the drop in performance we observe in full training and in 50% and 75% layer recycling. On the other hand, the level of degradation when training only `EncoderHead` is quite small compared to the others. When we further reduce the dataset size, full training becomes untrainable at some point, while we can still improve over the baseline by training only `EncoderHead`.

## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#experiment-3-layer-recycling-in-question-answering) Experiment 3: Layer recycling in question answering

We also wanted to test layer recycling in a different domain, as one of the most important takeaways of the paper is that the performance of layer recycling is task-dependent. To this end, we set up an experiment with the code from the [Question Answering with Similarity Learning tutorial](https://quaterion.qdrant.tech/tutorials/nlp_tutorial.html).

| Model | RP@1 | RRK |
| --- | --- | --- |
| Full training | 0.76 | 0.65 |
| 50% recycling | 0.75 | 0.63 |
| 75% recycling | 0.69 | 0.59 |
| Head only | 0.67 | 0.58 |
| Baseline | 0.64 | 0.55 |

In this task, 50% layer recycling can still do a good job with only a small drop in performance when compared to full training. In fact, the level of degradation is smaller than in the similar cars search example. This can be attributed to several factors such as the pretrained model quality, dataset size, and task definition, and it could be the subject of a more elaborate and comprehensive research project. Another observation is that the performance of 75% layer recycling is closer to that of training only `EncoderHead` than to that of 50% layer recycling.

## [Anchor](https://qdrant.tech/articles/embedding-recycler/\#conclusion) Conclusion

We set up several experiments to test layer recycling under different constraints and confirmed that layer recycling yields varying performance with different tasks and domains. One of the most important observations is that the level of degradation in layer recycling is sublinear compared to full training, i.e., we lose a smaller percentage of performance than the percentage of layers we recycle. Additionally, training only `EncoderHead` is more resistant to small dataset sizes; there is even a critical size under which full training does not work at all. These performance differences show that there is still room for further research on layer recycling, and luckily Quaterion is flexible enough to run such experiments quickly. We will continue to report our findings on fine-tuning efficiency.

**Fun fact**: The preview image for this article was created with DALL·E with the following prompt: “Photo-realistic robot using a tuning fork to adjust a piano.” [Click here](https://qdrant.tech/articles_data/embedding-recycling/full.png) to see it in full size!

##### Was this page useful?
![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/embedding-recycler.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/embedding-recycler.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-109-lllmstxt|> ## neural-search-tutorial - [Articles](https://qdrant.tech/articles/) - Neural Search 101: A Complete Guide and Step-by-Step Tutorial [Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/) --- # Neural Search 101: A Complete Guide and Step-by-Step Tutorial Andrey Vasnetsov · June 10, 2021 ![Neural Search 101: A Complete Guide and Step-by-Step Tutorial](https://qdrant.tech/articles_data/neural-search-tutorial/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#neural-search-101-a-comprehensive-guide-and-step-by-step-tutorial) Neural Search 101: A Comprehensive Guide and Step-by-Step Tutorial Information retrieval technology is one of the main technologies that enabled the modern Internet to exist. These days, search technology is the heart of a variety of applications. From web-pages search to product recommendations. For many years, this technology didn’t get much change until neural networks came into play. In this guide we are going to find answers to these questions: - What is the difference between regular and neural search? - What neural networks could be used for search? - In what tasks is neural network search useful? - How to build and deploy own neural search service step-by-step? ## [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#what-is-neural-search) What is neural search? A regular full-text search, such as Google’s, consists of searching for keywords inside a document. For this reason, the algorithm can not take into account the real meaning of the query and documents. Many documents that might be of interest to the user are not found because they use different wording. Neural search tries to solve exactly this problem - it attempts to enable searches not by keywords but by meaning. To achieve this, the search works in 2 steps. In the first step, a specially trained neural network encoder converts the query and the searched objects into a vector representation called embeddings. The encoder must be trained so that similar objects, such as texts with the same meaning or alike pictures get a close vector representation. ![Encoders and embedding space](https://gist.githubusercontent.com/generall/c229cc94be8c15095286b0c55a3f19d7/raw/e52e3f1a320cd985ebc96f48955d7f355de8876c/encoders.png) Having this vector representation, it is easy to understand what the second step should be. To find documents similar to the query you now just need to find the nearest vectors. The most convenient way to determine the distance between two vectors is to calculate the cosine distance. The usual Euclidean distance can also be used, but it is not so efficient due to [the curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality). 
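As a tiny illustration of these two steps, the sketch below encodes two short texts with a pre-trained encoder and compares them with cosine similarity. The model name matches the one used later in this tutorial, while the example sentences are made up.

```python
from sentence_transformers import SentenceTransformer, util

# Step 1: encode texts into embeddings with a pre-trained encoder
model = SentenceTransformer('all-MiniLM-L6-v2')
query_embedding = model.encode("platform for booking guided city tours")
doc_embedding = model.encode("startup that sells guided walking tours of European cities")

# Step 2: compare the embeddings with cosine similarity (closer to 1.0 means closer meaning)
print(util.cos_sim(query_embedding, doc_embedding))
```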
## [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#which-model-could-be-used) Which model could be used? It is ideal to use a model specially trained to determine the closeness of meanings. For example, models trained on Semantic Textual Similarity (STS) datasets. Current state-of-the-art models can be found on this [leaderboard](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark?p=roberta-a-robustly-optimized-bert-pretraining). However, not only specially trained models can be used. If the model is trained on a large enough dataset, its internal features can work as embeddings too. So, for instance, you can take any pre-trained on ImageNet model and cut off the last layer from it. In the penultimate layer of the neural network, as a rule, the highest-level features are formed, which, however, do not correspond to specific classes. The output of this layer can be used as an embedding. ## [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#what-tasks-is-neural-search-good-for) What tasks is neural search good for? Neural search has the greatest advantage in areas where the query cannot be formulated precisely. Querying a table in an SQL database is not the best place for neural search. On the contrary, if the query itself is fuzzy, or it cannot be formulated as a set of conditions - neural search can help you. If the search query is a picture, sound file or long text, neural network search is almost the only option. If you want to build a recommendation system, the neural approach can also be useful. The user’s actions can be encoded in vector space in the same way as a picture or text. And having those vectors, it is possible to find semantically similar users and determine the next probable user actions. ## [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#step-by-step-neural-search-tutorial-using-qdrant) Step-by-step neural search tutorial using Qdrant With all that said, let’s make our neural network search. As an example, I decided to make a search for startups by their description. In this demo, we will see the cases when text search works better and the cases when neural network search works better. I will use data from [startups-list.com](https://www.startups-list.com/). Each record contains the name, a paragraph describing the company, the location and a picture. Raw parsed data can be found at [this link](https://storage.googleapis.com/generall-shared-data/startups_demo.json). ### [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#step-1-prepare-data-for-neural-search) Step 1: Prepare data for neural search To be able to search for our descriptions in vector space, we must get vectors first. We need to encode the descriptions into a vector representation. As the descriptions are textual data, we can use a pre-trained language model. As mentioned above, for the task of text search there is a whole set of pre-trained models specifically tuned for semantic similarity. One of the easiest libraries to work with pre-trained language models, in my opinion, is the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) by UKPLab. It provides a way to conveniently download and use many pre-trained models, mostly based on transformer architecture. Transformers is not the only architecture suitable for neural search, but for our task, it is quite enough. We will use a model called `all-MiniLM-L6-v2`. This model is an all-round model tuned for many use-cases. 
It was trained on a large and diverse dataset of over 1 billion training pairs and is optimized for low memory consumption and fast inference. The complete code for data preparation with detailed comments can be found and run in this [Colab Notebook](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kPktoudAP8Tu8n8l-iVMOQhVmHkWV_L9?usp=sharing)

### [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#step-2-incorporate-a-vector-search-engine) Step 2: Incorporate a Vector search engine

Now that we have a vector representation for all our records, we need to store them somewhere. In addition to storing them, we may also need to add or delete vectors and save additional information alongside them. And most importantly, we need a way to search for the nearest vectors.

The vector search engine can take care of all these tasks. It provides a convenient API for searching and managing vectors. In our tutorial, we will use the [Qdrant](https://github.com/qdrant/qdrant) vector search engine. It not only supports all necessary operations with vectors but also allows you to store an additional payload along with the vectors and use it to filter the search results. Qdrant has a client for Python and also defines the API schema if you need to use it from other languages.

The easiest way to use Qdrant is to run a pre-built Docker image, so make sure you have Docker installed on your system. To start Qdrant, use the instructions on its [homepage](https://github.com/qdrant/qdrant).

Download the image from [DockerHub](https://hub.docker.com/r/qdrant/qdrant):

```bash
docker pull qdrant/qdrant
```

And run the service inside Docker:

```bash
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
```

You should see output like this:

```text
...
[2021-02-05T00:08:51Z INFO actix_server::builder] Starting 12 workers
[2021-02-05T00:08:51Z INFO actix_server::builder] Starting "actix-web-service-0.0.0.0:6333" service on 0.0.0.0:6333
```

This means that the service is successfully launched and is listening on port 6333. To make sure, you can open [http://localhost:6333/](http://localhost:6333/) in your browser and check the Qdrant version info.

All data uploaded to Qdrant is saved into the `./qdrant_storage` directory and will be persisted even if you recreate the container.

### [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#step-3-upload-data-to-qdrant) Step 3: Upload data to Qdrant

Now that we have the vectors prepared and the search engine running, we can start uploading the data. To interact with Qdrant from Python, I recommend using the out-of-the-box client library. To install it, use the following command:

```bash
pip install qdrant-client
```

At this point, we should have startup records in the file `startups.json`, encoded vectors in the file `startup_vectors.npy`, and Qdrant running on a local machine. Let's write a script to upload all the startup data and vectors into the search engine.

First, let's create a client object for Qdrant:

```python
# Import client library
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

qdrant_client = QdrantClient(host='localhost', port=6333)
```

Qdrant allows you to combine vectors of the same purpose into collections. Many independent vector collections can exist on one service at the same time. Let's create a new collection for our startup vectors.
```python
if not qdrant_client.collection_exists('startups'):
    qdrant_client.create_collection(
        collection_name='startups',
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
```

The vector size parameter (`size` in `VectorParams`) is very important. It tells the service the size of the vectors in that collection. All vectors in a collection must have the same size; otherwise, it is impossible to calculate the distance between them. `384` is the output dimensionality of the encoder we are using. The `distance` parameter allows specifying the function used to measure the distance between two points.

The Qdrant client library defines a special function that allows you to load datasets into the service. However, since there may be too much data to fit into a single computer's memory, the function takes an iterator over the data as input. Let's create an iterator over the startup data and vectors:

```python
import numpy as np
import json

fd = open('./startups.json')

# payload is now an iterator over the startup data
payload = map(json.loads, fd)

# Here we load all vectors into memory; a numpy array works as an iterable by itself.
# Another option would be to use mmap if we don't want to load all data into RAM.
vectors = np.load('./startup_vectors.npy')
```

And the final step is data uploading:

```python
qdrant_client.upload_collection(
    collection_name='startups',
    vectors=vectors,
    payload=payload,
    ids=None,  # Vector ids will be assigned automatically
    batch_size=256  # How many vectors will be uploaded in a single request
)
```

Now we have vectors uploaded to the vector search engine. In the next step, we will learn how to actually search for the closest vectors. The full code for this step can be found [here](https://github.com/qdrant/qdrant_demo/blob/master/qdrant_demo/init_collection_startups.py).

### [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#step-4-make-a-search-api) Step 4: Make a search API

Now that all the preparations are complete, let's start building a neural search class. First, install all the requirements:

```bash
pip install sentence-transformers numpy
```

In order to process incoming requests, the neural searcher will need two things: a model to convert the query into a vector, and a Qdrant client to perform search queries.

```python
# File: neural_searcher.py

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

class NeuralSearcher:

    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient(host='localhost', port=6333)
```

The search function looks as simple as possible:

```python
def search(self, text: str):
    # Convert the text query into a vector
    vector = self.model.encode(text).tolist()

    # Use `vector` to search for the closest vectors in the collection
    search_result = self.qdrant_client.search(
        collection_name=self.collection_name,
        query_vector=vector,
        query_filter=None,  # We don't want any filters for now
        top=5  # 5 closest results is enough
    )
    # `search_result` contains found vector ids with similarity scores
    # along with the stored payload.
    # In this function we are interested in the payload only.
    payloads = [hit.payload for hit in search_result]
    return payloads
```

With Qdrant it is also feasible to add some conditions to the search.
For example, if we wanted to search for startups in a certain city, the search query could look like this:

```python
from qdrant_client.models import Filter

...

city_of_interest = "Berlin"

# Define a filter for cities
city_filter = Filter(**{
    "must": [{
        "key": "city",  # We store city information in a field of the same name
        "match": {  # This condition checks if the payload field has the requested value
            "value": city_of_interest
        }
    }]
})

search_result = self.qdrant_client.search(
    collection_name=self.collection_name,
    query_vector=vector,
    query_filter=city_filter,
    limit=5
)
...
```

We now have a class for making neural search queries. Let’s wrap it up into a service.

### [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#step-5-deploy-as-a-service) Step 5: Deploy as a service

To build the service we will use the FastAPI framework. It is super easy to use and requires minimal code. To install it, use the command:

```bash
pip install fastapi uvicorn
```

Our service will have only one API endpoint and will look like this:

```python
# File: service.py

from fastapi import FastAPI

# The file where NeuralSearcher is stored
from neural_searcher import NeuralSearcher

app = FastAPI()

# Create an instance of the neural searcher
neural_searcher = NeuralSearcher(collection_name='startups')

@app.get("/api/search")
def search_startup(q: str):
    return {
        "result": neural_searcher.search(text=q)
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Now, if you run the service with

```bash
python service.py
```

and open your browser at [http://localhost:8000/docs](http://localhost:8000/docs), you should be able to see a debug interface for your service.

![FastAPI Swagger interface](https://gist.githubusercontent.com/generall/c229cc94be8c15095286b0c55a3f19d7/raw/d866e37a60036ebe65508bd736faff817a5d27e9/fastapi_neural_search.png)

Feel free to play around with it, make queries, and check out the results. This concludes the tutorial.

### [Anchor](https://qdrant.tech/articles/neural-search-tutorial/\#experience-neural-search-with-qdrants-free-demo) Experience Neural Search With Qdrant’s Free Demo

Excited to see neural search in action? Take the next step and book a [free demo](https://qdrant.to/semantic-search-demo) with Qdrant! Experience firsthand how this cutting-edge technology can transform your search capabilities. Our demo will help you build intuition for the cases where neural search is useful. The demo contains a switch that selects between neural and full-text search. You can turn neural search on and off to compare the results with a regular full-text search. Try to use a startup description to find similar ones.

Join our [Discord community](https://qdrant.to/discord), where we talk about vector search and similarity learning, and publish other examples of neural networks and neural search applications.

##### Was this page useful?

![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/neural-search-tutorial.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue.
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/neural-search-tutorial.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-110-lllmstxt|> ## multitenancy - [Articles](https://qdrant.tech/articles/) - How to Implement Multitenancy and Custom Sharding in Qdrant [Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/) --- # How to Implement Multitenancy and Custom Sharding in Qdrant David Myriel · February 06, 2024 ![How to Implement Multitenancy and Custom Sharding in Qdrant](https://qdrant.tech/articles_data/multitenancy/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/multitenancy/\#scaling-your-machine-learning-setup-the-power-of-multitenancy-and-custom-sharding-in-qdrant) Scaling Your Machine Learning Setup: The Power of Multitenancy and Custom Sharding in Qdrant We are seeing the topics of [multitenancy](https://qdrant.tech/documentation/guides/multiple-partitions/) and [distributed deployment](https://qdrant.tech/documentation/guides/distributed_deployment/#sharding) pop-up daily on our [Discord support channel](https://qdrant.to/discord). This tells us that many of you are looking to scale Qdrant along with the rest of your machine learning setup. Whether you are building a bank fraud-detection system, [RAG](https://qdrant.tech/articles/what-is-rag-in-ai/) for e-commerce, or services for the federal government - you will need to leverage a multitenant architecture to scale your product. In the world of SaaS and enterprise apps, this setup is the norm. It will considerably increase your application’s performance and lower your hosting costs. ## [Anchor](https://qdrant.tech/articles/multitenancy/\#multitenancy--custom-sharding-with-qdrant) Multitenancy & custom sharding with Qdrant We have developed two major features just for this. **You can now scale a single Qdrant cluster and support all of your customers worldwide.** Under [multitenancy](https://qdrant.tech/documentation/guides/multiple-partitions/), each customer’s data is completely isolated and only accessible by them. At times, if this data is location-sensitive, Qdrant also gives you the option to divide your cluster by region or other criteria that further secure your customer’s access. This is called [custom sharding](https://qdrant.tech/documentation/guides/distributed_deployment/#user-defined-sharding). Combining these two will result in an efficiently-partitioned architecture that further leverages the convenience of a single Qdrant cluster. This article will briefly explain the benefits and show how you can get started using both features. ## [Anchor](https://qdrant.tech/articles/multitenancy/\#one-collection-many-tenants) One collection, many tenants When working with Qdrant, you can upsert all your data to a single collection, and then partition each vector via its payload. This means that all your users are leveraging the power of a single Qdrant cluster, but their data is still isolated within the collection. Let’s take a look at a two-tenant collection: **Figure 1:** Each individual vector is assigned a specific payload that denotes which tenant it belongs to. This is how a large number of different tenants can share a single Qdrant collection. ![Qdrant Multitenancy](https://qdrant.tech/articles_data/multitenancy/multitenancy-single.png) Qdrant is built to excel in a single collection with a vast number of tenants. 
You should only create multiple collections when your data is not homogenous or if users’ vectors are created by different embedding models. Creating too many collections may result in resource overhead and cause dependencies. This can increase costs and affect overall performance. ## [Anchor](https://qdrant.tech/articles/multitenancy/\#sharding-your-database) Sharding your database With Qdrant, you can also specify a shard for each vector individually. This feature is useful if you want to [control where your data is kept in the cluster](https://qdrant.tech/documentation/guides/distributed_deployment/#sharding). For example, one set of vectors can be assigned to one shard on its own node, while another set can be on a completely different node. During vector search, your operations will be able to hit only the subset of shards they actually need. In massive-scale deployments, **this can significantly improve the performance of operations that do not require the whole collection to be scanned**. This works in the other direction as well. Whenever you search for something, you can specify a shard or several shards and Qdrant will know where to find them. It will avoid asking all machines in your cluster for results. This will minimize overhead and maximize performance. ### [Anchor](https://qdrant.tech/articles/multitenancy/\#common-use-cases) Common use cases A clear use-case for this feature is managing a multitenant collection, where each tenant (let it be a user or organization) is assumed to be segregated, so they can have their data stored in separate shards. Sharding solves the problem of region-based data placement, whereby certain data needs to be kept within specific locations. To do this, however, you will need to [move your shards between nodes](https://qdrant.tech/documentation/guides/distributed_deployment/#moving-shards). **Figure 2:** Users can both upsert and query shards that are relevant to them, all within the same collection. Regional sharding can help avoid cross-continental traffic. ![Qdrant Multitenancy](https://qdrant.tech/articles_data/multitenancy/shards.png) Custom sharding also gives you precise control over other use cases. A time-based data placement means that data streams can index shards that represent latest updates. If you organize your shards by date, you can have great control over the recency of retrieved data. This is relevant for social media platforms, which greatly rely on time-sensitive data. ## [Anchor](https://qdrant.tech/articles/multitenancy/\#before-i-go-any-furtherhow-secure-is-my-user-data) Before I go any further…..how secure is my user data? By design, Qdrant offers three levels of isolation. We initially introduced collection-based isolation, but your scaled setup has to move beyond this level. In this scenario, you will leverage payload-based isolation (from multitenancy) and resource-based isolation (from sharding). The ultimate goal is to have a single collection, where you can manipulate and customize placement of shards inside your cluster more precisely and avoid any kind of overhead. The diagram below shows the arrangement of your data within a two-tier isolation arrangement. **Figure 3:** Users can query the collection based on two filters: the `group_id` and the individual `shard_key_selector`. This gives your data two additional levels of isolation. 
![Qdrant Multitenancy](https://qdrant.tech/articles_data/multitenancy/multitenancy.png) ## [Anchor](https://qdrant.tech/articles/multitenancy/\#create-custom-shards-for-a-single-collection) Create custom shards for a single collection When creating a collection, you will need to configure user-defined sharding. This lets you control the shard placement of your data, so that operations can hit only the subset of shards they actually need. In big clusters, this can significantly improve the performance of operations, since you won’t need to go through the entire collection to retrieve data. ```python client.create_collection( collection_name="{tenant_data}", shard_number=2, sharding_method=models.ShardingMethod.CUSTOM, # ... other collection parameters ) client.create_shard_key("{tenant_data}", "canada") client.create_shard_key("{tenant_data}", "germany") ``` In this example, your cluster is divided between Germany and Canada. Canadian and German law differ when it comes to international data transfer. Let’s say you are creating a RAG application that supports the healthcare industry. Your Canadian customer data will have to be clearly separated for compliance purposes from your German customer. Even though it is part of the same collection, data from each shard is isolated from other shards and can be retrieved as such. For additional examples on shards and retrieval, consult [Distributed Deployments](https://qdrant.tech/documentation/guides/distributed_deployment/) documentation and [Qdrant Client specification](https://python-client.qdrant.tech/). ## [Anchor](https://qdrant.tech/articles/multitenancy/\#configure-a-multitenant-setup-for-users) Configure a multitenant setup for users Let’s continue and start adding data. As you upsert your vectors to your new collection, you can add a `group_id` field to each vector. If you do this, Qdrant will assign each vector to its respective group. Additionally, each vector can now be allocated to a shard. You can specify the `shard_key_selector` for each individual vector. In this example, you are upserting data belonging to `tenant_1` to the Canadian region. ```python client.upsert( collection_name="{tenant_data}", points=[\ models.PointStruct(\ id=1,\ payload={"group_id": "tenant_1"},\ vector=[0.9, 0.1, 0.1],\ ),\ models.PointStruct(\ id=2,\ payload={"group_id": "tenant_1"},\ vector=[0.1, 0.9, 0.1],\ ),\ ], shard_key_selector="canada", ) ``` Keep in mind that the data for each `group_id` is isolated. In the example below, `tenant_1` vectors are kept separate from `tenant_2`. The first tenant will be able to access their data in the Canadian portion of the cluster. However, as shown below `tenant_2 ` might only be able to retrieve information hosted in Germany. ```python client.upsert( collection_name="{tenant_data}", points=[\ models.PointStruct(\ id=3,\ payload={"group_id": "tenant_2"},\ vector=[0.1, 0.1, 0.9],\ ),\ ], shard_key_selector="germany", ) ``` ## [Anchor](https://qdrant.tech/articles/multitenancy/\#retrieve-data-via-filters) Retrieve data via filters The access control setup is completed as you specify the criteria for data retrieval. When searching for vectors, you need to use a `query_filter` along with `group_id` to filter vectors for each user. 
```python client.search( collection_name="{tenant_data}", query_filter=models.Filter( must=[\ models.FieldCondition(\ key="group_id",\ match=models.MatchValue(\ value="tenant_1",\ ),\ ),\ ] ), query_vector=[0.1, 0.1, 0.9], limit=10, ) ``` ## [Anchor](https://qdrant.tech/articles/multitenancy/\#performance-considerations) Performance considerations The speed of indexation may become a bottleneck if you are adding large amounts of data in this way, as each user’s vector will be indexed into the same collection. To avoid this bottleneck, consider _bypassing the construction of a global vector index_ for the entire collection and building it only for individual groups instead. By adopting this strategy, Qdrant will index vectors for each user independently, significantly accelerating the process. To implement this approach, you should: 1. Set `payload_m` in the HNSW configuration to a non-zero value, such as 16. 2. Set `m` in hnsw config to 0. This will disable building global index for the whole collection. ```python from qdrant_client import QdrantClient, models client = QdrantClient("localhost", port=6333) client.create_collection( collection_name="{tenant_data}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), hnsw_config=models.HnswConfigDiff( payload_m=16, m=0, ), ) ``` 3. Create keyword payload index for `group_id` field. ```python client.create_payload_index( collection_name="{tenant_data}", field_name="group_id", field_schema=models.PayloadSchemaType.KEYWORD, ) ``` > Note: Keep in mind that global requests (without the `group_id` filter) will be slower since they will necessitate scanning all groups to identify the nearest neighbors. ## [Anchor](https://qdrant.tech/articles/multitenancy/\#explore-multitenancy-and-custom-sharding-in-qdrant-for-scalable-solutions) Explore multitenancy and custom sharding in Qdrant for scalable solutions Qdrant is ready to support a massive-scale architecture for your machine learning project. If you want to see whether our [vector database](https://qdrant.tech/) is right for you, try the [quickstart tutorial](https://qdrant.tech/documentation/quick-start/) or read our [docs and tutorials](https://qdrant.tech/documentation/). To spin up a free instance of Qdrant, sign up for [Qdrant Cloud](https://qdrant.to/cloud) \- no strings attached. Get support or share ideas in our [Discord](https://qdrant.to/discord) community. This is where we talk about vector search theory, publish examples and demos and discuss vector database setups. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/multitenancy.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/multitenancy.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-111-lllmstxt|>

## fastembed

- [Articles](https://qdrant.tech/articles/) - FastEmbed: Qdrant's Efficient Python Library for Embedding Generation [Back to Ecosystem](https://qdrant.tech/articles/ecosystem/)

---

# FastEmbed: Qdrant’s Efficient Python Library for Embedding Generation

Nirant Kasliwal · October 18, 2023

![FastEmbed: Qdrant's Efficient Python Library for Embedding Generation](https://qdrant.tech/articles_data/fastembed/preview/title.jpg)

Data Science and Machine Learning practitioners often find themselves navigating through a labyrinth of models, libraries, and frameworks. Which model to choose, what embedding size to use, and how to approach tokenizing are just some of the questions you are faced with when starting your work. We understood how many data scientists wanted an easier and more intuitive means to do their embedding work. This is why we built FastEmbed, a Python library engineered for speed, efficiency, and usability. We have created easy-to-use default workflows that handle 80% of the use cases in NLP embedding.

## [Anchor](https://qdrant.tech/articles/fastembed/\#current-state-of-affairs-for-generating-embeddings) Current State of Affairs for Generating Embeddings

Usually, you generate embeddings by using PyTorch or TensorFlow models under the hood. However, using these libraries comes at a cost in terms of ease of use and computational speed. This is at least in part because they are built for both model inference and improvement, e.g. via fine-tuning.

To tackle these problems, we built a small library focused on the task of quickly and efficiently creating text embeddings. We also decided to start with only a small sample of best-in-class transformer models. By keeping it small and focused on a particular use case, we could keep our library lean, without all the extraneous dependencies. We ship with a limited set of models, quantize the model weights, and seamlessly integrate them with the ONNX Runtime. FastEmbed strikes a balance between inference time, resource utilization, and performance (recall/accuracy).

## [Anchor](https://qdrant.tech/articles/fastembed/\#quick-embedding-text-document-example) Quick Embedding Text Document Example

Here is an example of how simple we have made embedding text documents:

```python
documents: List[str] = [
    "Hello, World!",
    "fastembed is supported by and maintained by Qdrant."
]
embedding_model = DefaultEmbedding()
embeddings: List[np.ndarray] = list(embedding_model.embed(documents))
```

These three lines of code do a lot of heavy lifting for you: they download the quantized model, load it using ONNX Runtime, and then run a batched embedding creation of your documents.

### [Anchor](https://qdrant.tech/articles/fastembed/\#code-walkthrough) Code Walkthrough

Let’s delve into a more advanced example code snippet line-by-line:

```python
from fastembed.embedding import DefaultEmbedding
```

Here, we import the DefaultEmbedding class from FastEmbed. This is the core class responsible for generating embeddings based on your chosen text model.
This is also the class which you can import directly as DefaultEmbedding which is [BAAI/bge-small-en-v1.5](https://huggingface.co/baai/bge-small-en-v1.5) ```python documents: List[str] = [\ "passage: Hello, World!",\ "query: How is the World?",\ "passage: This is an example passage.",\ "fastembed is supported by and maintained by Qdrant."\ ] ``` In this list called documents, we define four text strings that we want to convert into embeddings. Note the use of prefixes “passage” and “query” to differentiate the types of embeddings to be generated. This is inherited from the cross-encoder implementation of the BAAI/bge series of models themselves. This is particularly useful for retrieval and we strongly recommend using this as well. The use of text prefixes like “query” and “passage” isn’t merely syntactic sugar; it informs the algorithm on how to treat the text for embedding generation. A “query” prefix often triggers the model to generate embeddings that are optimized for similarity comparisons, while “passage” embeddings are fine-tuned for contextual understanding. If you omit the prefix, the default behavior is applied, although specifying it is recommended for more nuanced results. Next, we initialize the Embedding model with the default model: [BAAI/bge-small-en-v1.5](https://huggingface.co/baai/bge-small-en-v1.5). ```python embedding_model = DefaultEmbedding() ``` The default model and several other models have a context window of a maximum of 512 tokens. This maximum limit comes from the embedding model training and design itself. If you’d like to embed sequences larger than that, we’d recommend using some pooling strategy to get a single vector out of the sequence. For example, you can use the mean of the embeddings of different chunks of a document. This is also what the [SBERT Paper recommends](https://lilianweng.github.io/posts/2021-05-31-contrastive/#sentence-bert) This model strikes a balance between speed and accuracy, ideal for real-world applications. ```python embeddings: List[np.ndarray] = list(embedding_model.embed(documents)) ``` Finally, we call the `embed()` method on our embedding\_model object, passing in the documents list. The method returns a Python generator, so we convert it to a list to get all the embeddings. These embeddings are NumPy arrays, optimized for fast mathematical operations. The `embed()` method returns a list of NumPy arrays, each corresponding to the embedding of a document in your original documents list. The dimensions of these arrays are determined by the model you chose e.g. for “BAAI/bge-small-en-v1.5” it’s a 384-dimensional vector. You can easily parse these NumPy arrays for any downstream application—be it clustering, similarity comparison, or feeding them into a machine learning model for further analysis. ## [Anchor](https://qdrant.tech/articles/fastembed/\#3-key-features-of-fastembed) 3 Key Features of FastEmbed FastEmbed is built for inference speed, without sacrificing (too much) performance: 1. 50% faster than PyTorch Transformers 2. Better performance than Sentence Transformers and OpenAI Ada-002 3. 
Cosine similarity of quantized and original model vectors is 0.92 We use `BAAI/bge-small-en-v1.5` as our DefaultEmbedding, hence we’ve chosen that for comparison: ![](https://qdrant.tech/articles_data/fastembed/throughput.png) ## [Anchor](https://qdrant.tech/articles/fastembed/\#under-the-hood-of-fastembed) Under the Hood of FastEmbed **Quantized Models**: We quantize the models for CPU (and Mac Metal) – giving you the best buck for your compute model. Our default model is so small, you can run this in AWS Lambda if you’d like! Shout out to Huggingface’s [Optimum](https://github.com/huggingface/optimum) – which made it easier to quantize models. **Reduced Installation Time**: FastEmbed sets itself apart by maintaining a low minimum RAM/Disk usage. It’s designed to be agile and fast, useful for businesses looking to integrate text embedding for production usage. For FastEmbed, the list of dependencies is refreshingly brief: > - onnx: Version ^1.11 – We’ll try to drop this also in the future if we can! > - onnxruntime: Version ^1.15 > - tqdm: Version ^4.65 – used only at Download > - requests: Version ^2.31 – used only at Download > - tokenizers: Version ^0.13 This minimized list serves two purposes. First, it significantly reduces the installation time, allowing for quicker deployments. Second, it limits the amount of disk space required, making it a viable option even for environments with storage limitations. Notably absent from the dependency list are bulky libraries like PyTorch, and there’s no requirement for CUDA drivers. This is intentional. FastEmbed is engineered to deliver optimal performance right on your CPU, eliminating the need for specialized hardware or complex setups. **ONNXRuntime**: The ONNXRuntime gives us the ability to support multiple providers. The quantization we do is limited for CPU (Intel), but we intend to support GPU versions of the same in the future as well.  This allows for greater customization and optimization, further aligning with your specific performance and computational requirements. ## [Anchor](https://qdrant.tech/articles/fastembed/\#current-models) Current Models We’ve started with a small set of supported models: All the models we support are [quantized](https://pytorch.org/docs/stable/quantization.html) to enable even faster computation! If you’re using FastEmbed and you’ve got ideas or need certain features, feel free to let us know. Just drop an issue on our GitHub page. That’s where we look first when we’re deciding what to work on next. Here’s where you can do it: [FastEmbed GitHub Issues](https://github.com/qdrant/fastembed/issues). When it comes to FastEmbed’s DefaultEmbedding model, we’re committed to supporting the best Open Source models. If anything changes, you’ll see a new version number pop up, like going from 0.0.6 to 0.1. So, it’s a good idea to lock in the FastEmbed version you’re using to avoid surprises. ## [Anchor](https://qdrant.tech/articles/fastembed/\#using-fastembed-with-qdrant) Using FastEmbed with Qdrant Qdrant is a Vector Store, offering comprehensive, efficient, and scalable [enterprise solutions](https://qdrant.tech/enterprise-solutions/) for modern machine learning and AI applications. Whether you are dealing with billions of data points, require a low latency performant [vector database solution](https://qdrant.tech/qdrant-vector-database/), or specialized quantization methods – [Qdrant is engineered](https://qdrant.tech/documentation/overview/) to meet those demands head-on. 
The fusion of FastEmbed with Qdrant’s vector store capabilities enables a transparent workflow for seamless embedding generation, storage, and retrieval. This simplifies the API design — while still giving you the flexibility to make significant changes e.g. you can use FastEmbed to make your own embedding other than the DefaultEmbedding and use that with Qdrant. Below is a detailed guide on how to get started with FastEmbed in conjunction with Qdrant. ### [Anchor](https://qdrant.tech/articles/fastembed/\#step-1-installation) Step 1: Installation Before diving into the code, the initial step involves installing the Qdrant Client along with the FastEmbed library. This can be done using pip: ``` pip install qdrant-client[fastembed] ``` For those using zsh as their shell, you might encounter syntax issues. In such cases, wrap the package name in quotes: ``` pip install 'qdrant-client[fastembed]' ``` ### [Anchor](https://qdrant.tech/articles/fastembed/\#step-2-initializing-the-qdrant-client) Step 2: Initializing the Qdrant Client After successful installation, the next step involves initializing the Qdrant Client. This can be done either in-memory or by specifying a database path: ```python from qdrant_client import QdrantClient --- # Initialize the client client = QdrantClient(":memory:")  # or QdrantClient(path="path/to/db") ``` ### [Anchor](https://qdrant.tech/articles/fastembed/\#step-3-preparing-documents-metadata-and-ids) Step 3: Preparing Documents, Metadata, and IDs Once the client is initialized, prepare the text documents you wish to embed, along with any associated metadata and unique IDs: ```python docs = [\ "Qdrant has Langchain integrations",\ "Qdrant also has Llama Index integrations"\ ] metadata = [\ {"source": "Langchain-docs"},\ {"source": "LlamaIndex-docs"},\ ] ids = [42, 2] ``` Note that the add method we’ll use is overloaded: If you skip the ids, we’ll generate those for you. metadata is obviously optional. So, you can simply use this too: ```python docs = [\ "Qdrant has Langchain integrations",\ "Qdrant also has Llama Index integrations"\ ] ``` ### [Anchor](https://qdrant.tech/articles/fastembed/\#step-4-adding-documents-to-a-collection) Step 4: Adding Documents to a Collection With your documents, metadata, and IDs ready, you can proceed to add these to a specified collection within Qdrant using the add method: ```python client.add( collection_name="demo_collection", documents=docs, metadata=metadata, ids=ids ) ``` Inside this function, Qdrant Client uses FastEmbed to make the text embedding, generate ids if they’re missing, and then add them to the index with metadata. This uses the DefaultEmbedding model: [BAAI/bge-small-en-v1.5](https://huggingface.co/baai/bge-small-en-v1.5) ![INDEX TIME: Sequence Diagram for Qdrant and FastEmbed](https://qdrant.tech/articles_data/fastembed/generate-embeddings-from-docs.png) ### [Anchor](https://qdrant.tech/articles/fastembed/\#step-5-performing-queries) Step 5: Performing Queries Finally, you can perform queries on your stored documents. Qdrant offers a robust querying capability, and the query results can be easily retrieved as follows: ```python search_result = client.query( collection_name="demo_collection", query_text="This is a query document" ) print(search_result) ``` Behind the scenes, we first convert the query\_text to the embedding and use that to query the vector index. 
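If you want to inspect what comes back, each hit in `search_result` should expose the stored document, its metadata, and a similarity score. The sketch below assumes a recent `qdrant-client` version in which `query()` returns a list of `QueryResponse` objects with `document`, `metadata`, and `score` fields:

```python
# Inspect the hits returned by client.query()
for hit in search_result:
    # Each hit carries the original text, its payload metadata, and a similarity score
    print(f"score={hit.score:.3f} | {hit.document} | {hit.metadata}")
```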
![QUERY TIME: Sequence Diagram for Qdrant and FastEmbed integration](https://qdrant.tech/articles_data/fastembed/generate-embeddings-query.png) By following these steps, you effectively utilize the combined capabilities of FastEmbed and Qdrant, thereby streamlining your embedding generation and retrieval tasks. Qdrant is designed to handle large-scale datasets with billions of data points. Its architecture employs techniques like [binary quantization](https://qdrant.tech/articles/binary-quantization/) and [scalar quantization](https://qdrant.tech/articles/scalar-quantization/) for efficient storage and retrieval. When you inject FastEmbed’s CPU-first design and lightweight nature into this equation, you end up with a system that can scale seamlessly while maintaining low latency. ## [Anchor](https://qdrant.tech/articles/fastembed/\#summary) Summary If you’re curious about how FastEmbed and Qdrant can make your search tasks a breeze, why not take it for a spin? You get a real feel for what it can do. Here are two easy ways to get started: 1. **Cloud**: Get started with a free plan on the [Qdrant Cloud](https://qdrant.to/cloud?utm_source=qdrant&utm_medium=website&utm_campaign=fastembed&utm_content=article). 2. **Docker Container**: If you’re the DIY type, you can set everything up on your own machine. Here’s a quick guide to help you out: [Quick Start with Docker](https://qdrant.tech/documentation/quick-start/?utm_source=qdrant&utm_medium=website&utm_campaign=fastembed&utm_content=article). So, go ahead, take it for a test drive. We’re excited to hear what you think! Lastly, If you find FastEmbed useful and want to keep up with what we’re doing, giving our GitHub repo a star would mean a lot to us. Here’s the link to [star the repository](https://github.com/qdrant/fastembed). If you ever have questions about FastEmbed, please ask them on the Qdrant Discord: [https://discord.gg/Qy6HCJK9Dc](https://discord.gg/Qy6HCJK9Dc) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/fastembed.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/fastembed.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-112-lllmstxt|> ## usage-statistics - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Usage Statistics --- # [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#usage-statistics) Usage statistics The Qdrant open-source container image collects anonymized usage statistics from users in order to improve the engine by default. You can [deactivate](https://qdrant.tech/documentation/guides/usage-statistics/#deactivate-telemetry) at any time, and any data that has already been collected can be [deleted on request](https://qdrant.tech/documentation/guides/usage-statistics/#request-information-deletion). Deactivating this will not affect your ability to monitor the Qdrant database yourself by accessing the `/metrics` or `/telemetry` endpoints of your database. 
It will just stop sending anonymized usage statistics to the Qdrant team.

## [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#why-do-we-collect-usage-statistics) Why do we collect usage statistics?

We want to make Qdrant fast and reliable. To do this, we need to understand how it performs in real-world scenarios. We do a lot of benchmarking internally, but it is impossible to cover all possible use cases, hardware, and configurations. In order to identify bottlenecks and improve Qdrant, we need to collect information about how it is used.

Additionally, Qdrant uses a number of internal heuristics to optimize performance. To better set up the parameters for these heuristics, we need to collect timings and counters of various pieces of code. With this information, we can make Qdrant faster for everyone.

## [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#what-information-is-collected) What information is collected?

There are 3 types of information that we collect:

- System information - general information about the system, such as CPU, RAM, and disk type, as well as the configuration of the Qdrant instance.
- Performance - information about timings and counters of various pieces of code.
- Critical error reports - information about critical errors, such as backtraces, that occurred in Qdrant. This information allows us to identify problems that nobody has reported to us yet.

### [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#we-never-collect-the-following-information) We **never** collect the following information:

- User’s IP address
- Any data that can be used to identify the user or the user’s organization
- Any data stored in the collections
- Any names of the collections
- Any URLs

## [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#how-do-we-anonymize-data) How do we anonymize data?

We understand that some users may be concerned about the privacy of their data. That is why we make an extra effort to ensure your privacy. There are several different techniques that we use to anonymize the data:

- We use a random UUID to identify instances. This UUID is generated on each startup and is not stored anywhere. There are no other ways to distinguish between different instances.
- We round all big numbers, so that the last digits are always 0. For example, if the number is 123456789, we will store 123456000.
- We replace all names with irreversibly hashed values. So no collection or field names will leak into the telemetry.
- All URLs are hashed as well.

You can see the exact version of the anonymized collected data by accessing the [telemetry API](https://api.qdrant.tech/master/api-reference/service/telemetry) with the `anonymize=true` parameter. For example, [http://localhost:6333/telemetry?details\_level=6&anonymize=true](http://localhost:6333/telemetry?details_level=6&anonymize=true)

## [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#deactivate-usage-statistics) Deactivate usage statistics

You can deactivate usage statistics by:

- setting the `QDRANT__TELEMETRY_DISABLED` environment variable to `true`
- setting the config option `telemetry_disabled` to `true` in the `config/production.yaml` or `config/config.yaml` files
- using the CLI option `--disable-telemetry`

Any of these options will prevent Qdrant from sending any usage statistics data.
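For example, if you run the official Docker image, one minimal way to start a container with usage statistics disabled is to pass the environment variable at startup; the image name and port mapping below follow the standard quick-start instructions:

```bash
# Start Qdrant with anonymized usage statistics reporting disabled
docker run -p 6333:6333 \
    -e QDRANT__TELEMETRY_DISABLED=true \
    qdrant/qdrant
```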
If you decide to deactivate usage statistics, we kindly ask you to share your feedback with us in the [Discord community](https://qdrant.to/discord) or GitHub [discussions](https://github.com/qdrant/qdrant/discussions) ## [Anchor](https://qdrant.tech/documentation/guides/usage-statistics/\#request-information-deletion) Request information deletion We provide an email address so that users can request the complete removal of their data from all of our tools. To do so, send an email to [privacy@qdrant.com](mailto:privacy@qdrant.com) containing the unique identifier generated for your Qdrant installation. You can find this identifier in the telemetry API response ( `"id"` field), or in the logs of your Qdrant instance. Any questions regarding the management of the data we collect can also be sent to this email address. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/usage-statistics.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/usage-statistics.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-113-lllmstxt|> ## data-privacy - [Articles](https://qdrant.tech/articles/) - Data Privacy with Qdrant: Implementing Role-Based Access Control (RBAC) [Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/) --- # Data Privacy with Qdrant: Implementing Role-Based Access Control (RBAC) Qdrant Team · June 18, 2024 ![ Data Privacy with Qdrant: Implementing Role-Based Access Control (RBAC)](https://qdrant.tech/articles_data/data-privacy/preview/title.jpg) Data stored in vector databases is often proprietary to the enterprise and may include sensitive information like customer records, legal contracts, electronic health records (EHR), financial data, and intellectual property. Moreover, strong security measures become critical to safeguarding this data. If the data stored in a vector database is not secured, it may open a vulnerability known as “ [embedding inversion attack](https://arxiv.org/abs/2004.00053),” where malicious actors could potentially [reconstruct the original data from the embeddings](https://arxiv.org/pdf/2305.03010) themselves. Strict compliance regulations govern data stored in vector databases across various industries. For instance, healthcare must comply with HIPAA, which dictates how protected health information (PHI) is stored, transmitted, and secured. Similarly, the financial services industry follows PCI DSS to safeguard sensitive financial data. These regulations require developers to ensure data storage and transmission comply with industry-specific legal frameworks across different regions. **As a result, features that enable data privacy, security and sovereignty are deciding factors when choosing the right vector database.** This article explores various strategies to ensure the security of your critical data while leveraging the benefits of vector search. 
Implementing some of these security approaches can help you build privacy-enhanced similarity search algorithms and integrate them into your AI applications. Additionally, you will learn how to build a fully data-sovereign architecture, allowing you to retain control over your data and comply with relevant data laws and regulations. > To skip right to the code implementation, [click here](https://qdrant.tech/articles/data-privacy/#jwt-on-qdrant). ## [Anchor](https://qdrant.tech/articles/data-privacy/\#vector-database-security-an-overview) Vector Database Security: An Overview Vector databases are often unsecured by default to facilitate rapid prototyping and experimentation. This approach allows developers to quickly ingest data, build vector representations, and test similarity search algorithms without initial security concerns. However, in production environments, unsecured databases pose significant data breach risks. For production use, robust security systems are essential. Authentication, particularly using static API keys, is a common approach to control access and prevent unauthorized modifications. Yet, simple API authentication is insufficient for enterprise data, which requires granular control. The primary challenge with static API keys is their all-or-nothing access, inadequate for role-based data segregation in enterprise applications. Additionally, a compromised key could grant attackers full access to manipulate or steal data. To strengthen the security of the vector database, developers typically need the following: 1. **Encryption**: This ensures that sensitive data is scrambled as it travels between the application and the vector database. This safeguards against Man-in-the-Middle ( [MitM](https://en.wikipedia.org/wiki/Man-in-the-middle_attack)) attacks, where malicious actors can attempt to intercept and steal data during transmission. 2. **Role-Based Access Control**: As mentioned before, traditional static API keys grant all-or-nothing access, which is a significant security risk in enterprise environments. RBAC offers a more granular approach by defining user roles and assigning specific data access permissions based on those roles. For example, an analyst might have read-only access to specific datasets, while an administrator might have full CRUD (Create, Read, Update, Delete) permissions across the database. 3. **Deployment Flexibility**: Data residency regulations like GDPR (General Data Protection Regulation) and industry-specific compliance requirements dictate where data can be stored, processed, and accessed. Developers would need to choose a database solution which offers deployment options that comply with these regulations. This might include on-premise deployments within a company’s private cloud or geographically distributed cloud deployments that adhere to data residency laws. ## [Anchor](https://qdrant.tech/articles/data-privacy/\#how-qdrant-handles-data-privacy-and-security) How Qdrant Handles Data Privacy and Security One of the cornerstones of our design choices at Qdrant has been the focus on security features. We have built in a range of features keeping the enterprise user in mind, which allow building of granular access control on a fully data sovereign architecture. A Qdrant instance is unsecured by default. However, when you are ready to deploy in production, Qdrant offers a range of security features that allow you to control access to your data, protect it from breaches, and adhere to regulatory requirements. 
Using Qdrant, you can build granular access control, segregate roles and privileges, and create a fully data sovereign architecture. ### [Anchor](https://qdrant.tech/articles/data-privacy/\#api-keys-and-tls-encryption) API Keys and TLS Encryption For simpler use cases, Qdrant offers API key-based authentication. This includes both regular API keys and read-only API keys. Regular API keys grant full access to read, write, and delete operations, while read-only keys restrict access to data retrieval operations only, preventing write actions. On Qdrant Cloud, you can create API keys using the [Cloud Dashboard](https://qdrant.to/cloud). This allows you to generate API keys that give you access to a single node or cluster, or multiple clusters. You can read the steps to do so [here](https://qdrant.tech/documentation/cloud/authentication/). ![web-ui](https://qdrant.tech/articles_data/data-privacy/web-ui.png) For on-premise or local deployments, you’ll need to configure API key authentication. This involves specifying a key in either the Qdrant configuration file or as an environment variable. This ensures that all requests to the server must include a valid API key sent in the header. When using the simple API key-based authentication, you should also turn on TLS encryption. Otherwise, you are exposing the connection to sniffing and MitM attacks. To secure your connection using TLS, you would need to create a certificate and private key, and then [enable TLS](https://qdrant.tech/documentation/guides/security/#tls) in the configuration. API authentication, coupled with TLS encryption, offers a first layer of security for your Qdrant instance. However, to enable more granular access control, the recommended approach is to leverage JSON Web Tokens (JWTs). ### [Anchor](https://qdrant.tech/articles/data-privacy/\#jwt-on-qdrant) JWT on Qdrant JSON Web Tokens (JWTs) are a compact, URL-safe, and stateless means of representing _claims_ to be transferred between two parties. These claims are encoded as a JSON object and are cryptographically signed. JWT is composed of three parts: a header, a payload, and a signature, which are concatenated with dots (.) to form a single string. The header contains the type of token and algorithm being used. The payload contains the claims (explained in detail later). The signature is a cryptographic hash and ensures the token’s integrity. In Qdrant, JWT forms the foundation through which powerful access controls can be built. Let’s understand how. JWT is enabled on the Qdrant instance by specifying the API key and turning on the **jwt\_rbac** feature in the configuration (alternatively, they can be set as environment variables). For any subsequent request, the API key is used to encode or decode the token. The way JWT works is that just the API key is enough to generate the token, and doesn’t require any communication with the Qdrant instance or server. There are several libraries that help generate tokens by encoding a payload, such as [PyJWT](https://pyjwt.readthedocs.io/en/stable/) (for Python), [jsonwebtoken](https://www.npmjs.com/package/jsonwebtoken) (for JavaScript), and [jsonwebtoken](https://crates.io/crates/jsonwebtoken) (for Rust). Qdrant uses the HS256 algorithm to encode or decode the tokens. We will look at the payload structure shortly, but here’s how you can generate a token using PyJWT. ```python import jwt import datetime --- # Define your API key and other payload data api_key = "your_api_key" payload = { ... 
} token = jwt.encode(payload, api_key, algorithm="HS256") print(token) ``` Once you have generated the token, you should include it in the subsequent requests. You can do so by providing it as a bearer token in the Authorization header, or in the API Key header of your requests. Below is an example of how to do so using QdrantClient in Python: ```python from qdrant_client import QdrantClient qdrant_client = QdrantClient( "http://localhost:6333", api_key="", # the token goes here ) --- # Example search vector search_vector = [0.1, 0.2, 0.3, 0.4] --- # Example similarity search request response = qdrant_client.search( collection_name="demo_collection", query_vector=search_vector, limit=5 # Number of results to retrieve ) ``` For convenience, we have added a JWT generation tool in the Qdrant Web UI, which is present under the 🔑 tab. For your local deployments, you will find it at [http://localhost:6333/dashboard#/jwt](http://localhost:6333/dashboard#/jwt). ### [Anchor](https://qdrant.tech/articles/data-privacy/\#payload-configuration) Payload Configuration There are several different options (claims) you can use in the JWT payload that help control access and functionality. Let’s look at them one by one. **exp**: This claim is the expiration time of the token, and is a unix timestamp in seconds. After the expiration time, the token will be invalid. **value\_exists**: This claim validates the token against a specific key-value stored in a collection. By using this claim, you can revoke access by simply changing a value without having to invalidate the API key. **access**: This claim defines the access level of the token. The access level can be global read (r) or manage (m). It can also be specific to a collection, or even a subset of a collection, using read (r) and read-write (rw). Let’s look at a few example JWT payload configurations. **Scenario 1: 1-hour expiry time, and read-only access to a collection** ```json { "exp": 1690995200, // Set to 1 hour from the current time (Unix timestamp) "access": [\ {\ "collection": "demo_collection",\ "access": "r" // Read-only access\ }\ ] } ``` **Scenario 2: 1-hour expiry time, and access to user with a specific role** Suppose you have a ‘users’ collection and have defined specific roles for each user, such as ‘developer’, ‘manager’, ‘admin’, ‘analyst’, and ‘revoked’. In such a scenario, you can use a combination of **exp** and **value\_exists**. ```json { "exp": 1690995200, "value_exists": { "collection": "users", "matches": [\ { "key": "username", "value": "john" },\ { "key": "role", "value": "developer" }\ ], }, } ``` Now, if you ever want to revoke access for a user, simply change the value of their role. All future requests will be invalid using a token payload of the above type. **Scenario 3: 1-hour expiry time, and read-write access to a subset of a collection** You can even specify access levels specific to subsets of a collection. This can be especially useful when you are leveraging [multitenancy](https://qdrant.tech/documentation/guides/multiple-partitions/), and want to segregate access. ```json { "exp": 1690995200, "access": [\ {\ "collection": "demo_collection",\ "access": "r",\ "payload": {\ "user_id": "user_123456"\ }\ }\ ] } ``` By combining the claims, you can fully customize the access level that a user or a role has within the vector store. 
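To tie these pieces together, here is a minimal sketch that encodes the read-only, one-hour payload from Scenario 1 with PyJWT and passes the resulting token to the Python client, following the snippets shown earlier. The collection name `demo_collection` and the API key value are illustrative assumptions:

```python
import datetime

import jwt
from qdrant_client import QdrantClient

api_key = "your_api_key"  # the API key configured on the Qdrant instance (illustrative)

# Scenario 1: read-only access to a single collection, valid for one hour
payload = {
    "exp": int(
        (datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1)).timestamp()
    ),
    "access": [
        {
            "collection": "demo_collection",  # illustrative collection name
            "access": "r",  # read-only
        }
    ],
}

# Qdrant expects tokens signed with HS256, using the instance API key as the secret
token = jwt.encode(payload, api_key, algorithm="HS256")

# The generated token is passed in place of the plain API key
qdrant_client = QdrantClient("http://localhost:6333", api_key=token)
```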
### [Anchor](https://qdrant.tech/articles/data-privacy/\#creating-role-based-access-control-rbac-using-jwt) Creating Role-Based Access Control (RBAC) Using JWT As we saw above, JWT claims create powerful levers through which you can create granular access control on Qdrant. Let’s bring it all together and understand how it helps you create Role-Based Access Control (RBAC). In a typical enterprise application, you will have a segregation of users based on their roles and permissions. These could be: 1. **Admin or Owner:** with full access, and can generate API keys. 2. **Editor:** with read-write access levels to specific collections. 3. **Viewer:** with read-only access to specific collections. 4. **Data Scientist or Analyst:** with read-only access to specific collections. 5. **Developer:** with read-write access to development- or testing-specific collections, but limited access to production data. 6. **Guest:** with limited read-only access to publicly available collections. In addition, you can create access levels within sections of a collection. In a multi-tenant application, where you have used payload-based partitioning, you can create read-only access for specific user roles for a subset of the collection that belongs to that user. Your application requirements will eventually help you decide the roles and access levels you should create. For example, in an application managing customer data, you could create additional roles such as: **Customer Support Representative**: read-write access to customer service-related data but no access to billing information. **Billing Department**: read-only access to billing data and read-write access to payment records. **Marketing Analyst**: read-only access to anonymized customer data for analytics. Each role can be assigned a JWT with claims that specify expiration times, read/write permissions for collections, and validating conditions. In such an application, an example JWT payload for a customer support representative role could be: ```json { "exp": 1690995200, "access": [\ {\ "collection": "customer_data",\ "access": "rw",\ "payload": {\ "department": "support"\ }\ }\ ], "value_exists": { "collection": "departments", "matches": [\ { "key": "department", "value": "support" }\ ] } } ``` As you can see, by implementing RBAC, you can ensure proper segregation of roles and their privileges, and avoid privacy loopholes in your application. ## [Anchor](https://qdrant.tech/articles/data-privacy/\#qdrant-hybrid-cloud-and-data-sovereignty) Qdrant Hybrid Cloud and Data Sovereignty Data governance varies by country, especially for global organizations dealing with different regulations on data privacy, security, and access. This often necessitates deploying infrastructure within specific geographical boundaries. To address these needs, the vector database you choose should support deployment and scaling within your controlled infrastructure. [Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/) offers this flexibility, along with features like sharding, replicas, JWT authentication, and monitoring. Qdrant Hybrid Cloud integrates Kubernetes clusters from various environments—cloud, on-premises, or edge—into a unified managed service. This allows organizations to manage Qdrant databases through the Qdrant Cloud UI while keeping the databases within their infrastructure. With JWT and RBAC, Qdrant Hybrid Cloud provides a secure, private, and sovereign vector store. 
Enterprises can scale their AI applications geographically, comply with local laws, and maintain strict data control. ## [Anchor](https://qdrant.tech/articles/data-privacy/\#conclusion) Conclusion Vector similarity is increasingly becoming the backbone of AI applications that leverage unstructured data. By transforming data into vectors – their numerical representations – organizations can build powerful applications that harness semantic search, ranging from better recommendation systems to algorithms that help with personalization, or powerful customer support chatbots. However, to fully leverage the power of AI in production, organizations need to choose a vector database that offers strong privacy and security features, while also helping them adhere to local laws and regulations. Qdrant provides exceptional efficiency and performance, along with the capability to implement granular access control to data, Role-Based Access Control (RBAC), and the ability to build a fully data-sovereign architecture. Interested in mastering vector search security and deployment strategies? [Join our Discord community](https://discord.gg/qdrant) to explore more advanced search strategies, connect with other developers and researchers in the industry, and stay updated on the latest innovations! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/data-privacy.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/data-privacy.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-114-lllmstxt|> ## changelog - [Documentation](https://qdrant.tech/documentation/) - [Private cloud](https://qdrant.tech/documentation/private-cloud/) - Changelog --- # [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#changelog) Changelog ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#171-2025-06-03) 1.7.1 (2025-06-03) | | | | --- | --- | | qdrant-kubernetes-api version | v1.16.6 | | operator version | 2.6.0 | | qdrant-cluster-manager version | v0.3.6 | - Performance and stability improvements ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#170-2025-05-14) 1.7.0 (2025-05-14) | | | | --- | --- | | qdrant-kubernetes-api version | v1.16.3 | | operator version | 2.4.2 | | qdrant-cluster-manager version | v0.3.5 | - Add optional automatic shard balancing - Set strict mode by default for new clusters to only allow queries with payload filters on fields that are indexed ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#164-2025-04-17) 1.6.4 (2025-04-17) | | | | --- | --- | | qdrant-kubernetes-api version | v1.15.5 | | operator version | 2.3.4 | | qdrant-cluster-manager version | v0.3.4 | - Fix bug in operator Helm chart that caused role binding generation to fail when using `watch.namespaces` ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#163-2025-03-28) 1.6.3 (2025-03-28) | | | | --- | --- | | qdrant-kubernetes-api version | v1.15.0 | | operator version | 2.3.3 | | qdrant-cluster-manager version | 
v0.3.4 | - Performance and stability improvements for collection re-sharding ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#162-2025-03-21) 1.6.2 (2025-03-21) | | | | --- | --- | | qdrant-kubernetes-api version | v1.15.0 | | operator version | 2.3.2 | | qdrant-cluster-manager version | v0.3.3 | - Allow disabling NetworkPolicy management in Qdrant Cluster operator ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#161-2025-03-14) 1.6.1 (2025-03-14) | | | | --- | --- | | qdrant-kubernetes-api version | v1.14.2 | | operator version | 2.3.2 | | qdrant-cluster-manager version | v0.3.3 | - Add support for GPU instances - Experimental support for automatic shard balancing ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#151-2025-03-04) 1.5.1 (2025-03-04) | | | | --- | --- | | qdrant-kubernetes-api version | v1.12.0 | | operator version | 2.1.26 | | qdrant-cluster-manager version | v0.3.2 | - Fix scaling down clusters that have TLS with self-signed certificates configured - Various performance improvements and stability fixes ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#150-2025-02-21) 1.5.0 (2025-02-21) | | | | --- | --- | | qdrant-kubernetes-api version | v1.12.0 | | operator version | 2.1.26 | | qdrant-cluster-manager version | v0.3.0 | - Added support for P2P TLS configuration - Faster node removal on scale down - Various performance improvements and stability fixes ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#140-2025-01-23) 1.4.0 (2025-01-23) | | | | --- | --- | | qdrant-kubernetes-api version | v1.8.0 | | operator version | 2.1.26 | | qdrant-cluster-manager version | v0.3.0 | - Support deleting peers on horizontal scale down, even if they are already offline - Support removing partially deleted peers ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#130-2025-01-17) 1.3.0 (2025-01-17) | | | | --- | --- | | qdrant-kubernetes-api version | v1.8.0 | | operator version | 2.1.21 | | qdrant-cluster-manager version | v0.2.10 | - Support for re-sharding with Qdrant >= 1.13.0 ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#120-2025-01-16) 1.2.0 (2025-01-16) | | | | --- | --- | | qdrant-kubernetes-api version | v1.8.0 | | operator version | 2.1.20 | | qdrant-cluster-manager version | v0.2.9 | - Performance and stability improvements ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#110-2024-12-03) 1.1.0 (2024-12-03) \| qdrant-kubernetes-api version \| v1.6.4 \| \| operator version \| 2.1.10 \| \| qdrant-cluster-manager version \| v0.2.6 \| - Activate cluster-manager for automatic shard replication ## [Anchor](https://qdrant.tech/documentation/private-cloud/changelog/\#100-2024-11-11) 1.0.0 (2024-11-11) | | | | --- | --- | | qdrant-kubernetes-api version | v1.2.7 | | operator version | 0.1.3 | | qdrant-cluster-manager version | v0.2.4 | - Initial release ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/changelog.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/changelog.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-115-lllmstxt|> ## data-ingestion-beginners - [Documentation](https://qdrant.tech/documentation/) - Data Ingestion for Beginners ![data-ingestion-beginners-7](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-7.png) --- # [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#send-s3-data-to-qdrant-vector-store-with-langchain) Send S3 Data to Qdrant Vector Store with LangChain | Time: 30 min | Level: Beginner | | | | --- | --- | --- | --- | **Data ingestion into a vector store** is essential for building effective search and retrieval algorithms, especially since nearly 80% of data is unstructured, lacking any predefined format. In this tutorial, we’ll create a streamlined data ingestion pipeline, pulling data directly from **AWS S3** and feeding it into Qdrant. We’ll dive into vector embeddings, transforming unstructured data into a format that allows you to search documents semantically. Prepare to discover new ways to uncover insights hidden within unstructured data! ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#ingestion-workflow-architecture) Ingestion Workflow Architecture We’ll set up a powerful document ingestion and analysis pipeline in this workflow using cloud storage, natural language processing (NLP) tools, and embedding technologies. Starting with raw data in an S3 bucket, we’ll preprocess it with LangChain, apply embedding APIs for both text and images and store the results in Qdrant – a vector database optimized for similarity search. **Figure 1: Data Ingestion Workflow Architecture** ![data-ingestion-beginners-5](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-5.png) Let’s break down each component of this workflow: - **S3 Bucket:** This is our starting point—a centralized, scalable storage solution for various file types like PDFs, images, and text. - **LangChain:** Acting as the pipeline’s orchestrator, LangChain handles extraction, preprocessing, and manages data flow for embedding generation. It simplifies processing PDFs, so you won’t need to worry about applying OCR (Optical Character Recognition) here. - **Qdrant:** As your vector database, Qdrant stores embeddings and their [payloads](https://qdrant.tech/documentation/concepts/payload/), enabling efficient similarity search and retrieval across all content types. ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#prerequisites) Prerequisites ![data-ingestion-beginners-11](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-11.png) In this section, you’ll get a step-by-step guide on ingesting data from an S3 bucket. But before we dive in, let’s make sure you’re set up with all the prerequisites: | | | | --- | --- | | Sample Data | We’ll use a sample dataset, where each folder includes product reviews in text format along with corresponding images. | | AWS Account | An active [AWS account](https://aws.amazon.com/free/) with access to S3 services. | | Qdrant Cloud | A [Qdrant Cloud account](https://cloud.qdrant.io/) with access to the WebUI for managing collections and running queries. | | LangChain | You will use this [popular framework](https://www.langchain.com/) to tie everything together. 
| #### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#supported-document-types) Supported Document Types The documents used for ingestion can be of various types, such as PDFs, text files, or images. We will organize a structured S3 bucket with folders for the supported document types, for testing and experimentation. #### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#python-environment) Python Environment Ensure you have a Python environment (Python 3.9 or higher) with these libraries installed: ```python boto3 langchain-community langchain python-dotenv unstructured unstructured[pdf] qdrant_client fastembed ``` * * * **Access Keys:** Store your AWS access key, S3 secret key, and Qdrant API key in a `.env` file for easy access. Here's a sample `.env` file: ```text ACCESS_KEY = "" SECRET_ACCESS_KEY = "" QDRANT_KEY = "" ``` * * * ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#step-1-ingesting-data-from-s3) Step 1: Ingesting Data from S3 ![data-ingestion-beginners-9.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-9.png) The LangChain framework makes it easy to ingest data from storage services like AWS S3, with built-in support for loading documents in formats such as PDFs, images, and text files. To connect LangChain with S3, you'll use the `S3DirectoryLoader`, which lets you load files directly from an S3 bucket into LangChain's pipeline. ### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#example-configuring-langchain-to-load-files-from-s3) Example: Configuring LangChain to Load Files from S3 Here's how to set up LangChain to ingest data from an S3 bucket: ```python from langchain_community.document_loaders import S3DirectoryLoader # Initialize the S3 document loader loader = S3DirectoryLoader( "product-dataset", # S3 bucket name "p_1", # S3 folder name containing the data for the first product aws_access_key_id=aws_access_key_id, # AWS Access Key aws_secret_access_key=aws_secret_access_key # AWS Secret Access Key ) # Load documents from the specified S3 bucket docs = loader.load() ``` * * * ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#step-2-turning-documents-into-embeddings) Step 2: Turning Documents into Embeddings [Embeddings](https://qdrant.tech/articles/what-are-embeddings/) are the secret sauce here—they're numerical representations of data (like text, images, or audio) that capture the "meaning" in a form that's easy to compare. By converting text and images into embeddings, you'll be able to perform similarity searches quickly and efficiently. Think of embeddings as the bridge to storing and retrieving meaningful insights from your data in Qdrant. ### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#models-well-use-for-generating-embeddings) Models We'll Use for Generating Embeddings To get things rolling, we'll use two powerful models: 1. **`sentence-transformers/all-MiniLM-L6-v2` Embeddings** for transforming text data. 2. **`CLIP` (Contrastive Language-Image Pretraining)** for image data. * * * ### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#document-processing-function) Document Processing Function ![data-ingestion-beginners-8.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-8.png) Next, we'll define two functions, `process_text` and `process_image`, to handle different file types in our document pipeline.
The `process_text` function extracts and returns the raw content from a text-based document, while `process_image` retrieves an image from an S3 source and loads it into memory. ```python from PIL import Image def process_text(doc): source = doc.metadata['source'] # Extract document source (e.g., S3 URL) text = doc.page_content # Extract the content from the text file print(f"Processing text from {source}") return source, text def process_image(doc): source = doc.metadata['source'] # Extract document source (e.g., S3 URL) print(f"Processing image from {source}") bucket_name, object_key = parse_s3_url(source) # Parse the S3 URL response = s3.get_object(Bucket=bucket_name, Key=object_key) # Fetch image from S3 img_bytes = response['Body'].read() img = Image.open(io.BytesIO(img_bytes)) return source, img ``` ### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#helper-functions-for-document-processing) Helper Functions for Document Processing To retrieve images from S3, a helper function `parse_s3_url` breaks down the S3 URL into its bucket and critical components. This is essential for fetching the image from S3 storage. ```python def parse_s3_url(s3_url): parts = s3_url.replace("s3://", "").split("/", 1) bucket_name = parts[0] object_key = parts[1] return bucket_name, object_key ``` * * * ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#step-3-loading-embeddings-into-qdrant) Step 3: Loading Embeddings into Qdrant ![data-ingestion-beginners-10](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-10.png) Now that your documents have been processed and converted into embeddings, the next step is to load these embeddings into Qdrant. ### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#creating-a-collection-in-qdrant) Creating a Collection in Qdrant In Qdrant, data is organized in collections, each representing a set of embeddings (or points) and their associated metadata (payload). To store the embeddings generated earlier, you’ll first need to create a collection. Here’s how to create a collection in Qdrant to store both text and image embeddings: ```python def create_collection(collection_name): qdrant_client.create_collection( collection_name, vectors_config={ "text_embedding": models.VectorParams( size=384, # Dimension of text embeddings distance=models.Distance.COSINE, # Cosine similarity is used for comparison ), "image_embedding": models.VectorParams( size=512, # Dimension of image embeddings distance=models.Distance.COSINE, # Cosine similarity is used for comparison ), }, ) create_collection("products-data") ``` * * * This function creates a collection for storing text (384 dimensions) and image (512 dimensions) embeddings, using cosine similarity to compare embeddings within the collection. Once the collection is set up, you can load the embeddings into Qdrant. This involves inserting (or updating) the embeddings and their associated metadata (payload) into the specified collection. Here’s the code for loading embeddings into Qdrant: ```python def ingest_data(points): operation_info = qdrant_client.upsert( collection_name="products-data", # Collection where data is being inserted points=points ) return operation_info ``` * * * **Explanation of Ingestion** 1. **Upserting the Data Point:** The upsert method on the `qdrant_client` inserts each PointStruct into the specified collection. If a point with the same ID already exists, it will be updated with the new values. 2. 
**Operation Info:** The function returns `operation_info`, which contains details about the upsert operation, such as success status or any potential errors. **Running the Ingestion Code** Here’s how to call the function and ingest data: ```python from qdrant_client import models if __name__ == "__main__": collection_name = "products-data" create_collection(collection_name) for i in range(1,6): # Five documents folder = f"p_{i}" loader = S3DirectoryLoader( "product-dataset", folder, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key ) docs = loader.load() points, text_review, product_image = [], "", "" for idx, doc in enumerate(docs): source = doc.metadata['source'] if source.endswith(".txt") or source.endswith(".pdf"): _text_review_source, text_review = process_text(doc) elif source.endswith(".png"): product_image_source, product_image = process_image(doc) if text_review: point = models.PointStruct( id=idx, # Unique identifier for each point vector={ "text_embedding": models.Document( text=text_review, model="sentence-transformers/all-MiniLM-L6-v2" ), "image_embedding": models.Image( image=product_image, model="Qdrant/clip-ViT-B-32-vision" ), }, payload={"review": text_review, "product_image": product_image_source}, ) points.append(point) operation_info = ingest_data(points) print(operation_info) ``` The `PointStruct` is instantiated with these key parameters: - **id:** A unique identifier for each embedding, typically an incremental index. - **vector:** A dictionary holding the text and image inputs to be embedded. `qdrant-client` uses [FastEmbed](https://github.com/qdrant/fastembed) under the hood to automatically generate vector representations from these inputs locally. - **payload:** A dictionary storing additional metadata, like product reviews and image references, which is invaluable for retrieval and context during searches. The code dynamically loads folders from an S3 bucket, processes text and image files separately, and stores their embeddings and associated data in dedicated lists. It then creates a `PointStruct` for each data entry and calls the ingestion function to load it into Qdrant. ### [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#exploring-the-qdrant-webui-dashboard) Exploring the Qdrant WebUI Dashboard Once the embeddings are loaded into Qdrant, you can use the WebUI dashboard to visualize and manage your collections. The dashboard provides a clear, structured interface for viewing collections and their data. Let’s take a closer look in the next section. ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#step-4-visualizing-data-in-qdrant-webui) Step 4: Visualizing Data in Qdrant WebUI To start visualizing your data in the Qdrant WebUI, head to the **Overview** section and select **Access the database**. **Figure 2: Accessing the Database from the Qdrant UI**![data-ingestion-beginners-2.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-2.png) When prompted, enter your API key. Once inside, you’ll be able to view your collections and the corresponding data points. 
You should see your collection displayed like this: **Figure 3: The product-data Collection in Qdrant**![data-ingestion-beginners-4.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-4.png) Here’s a look at the most recent point ingested into Qdrant: **Figure 4: The Latest Point Added to the product-data Collection**![data-ingestion-beginners-6.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-6.png) The Qdrant WebUI’s search functionality allows you to perform vector searches across your collections. With options to apply filters and parameters, retrieving relevant embeddings and exploring relationships within your data becomes easy. To start, head over to the **Console** in the left panel, where you can create queries: **Figure 5: Overview of Console in Qdrant**![data-ingestion-beginners-1.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-1.png) The first query retrieves all collections, the second fetches points from the product-data collection, and the third performs a sample query. This demonstrates how straightforward it is to interact with your data in the Qdrant UI. Now, let’s retrieve some documents from the database using a query!. **Figure 6: Querying the Qdrant Client to Retrieve Relevant Documents**![data-ingestion-beginners-3.png](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-3.png) In this example, we queried **Phones with improved design**. Then, we converted the text to vectors using OpenAI and retrieved a relevant phone review highlighting design improvements. ## [Anchor](https://qdrant.tech/documentation/data-ingestion-beginners/\#conclusion) Conclusion In this guide, we set up an S3 bucket, ingested various data types, and stored embeddings in Qdrant. Using LangChain, we dynamically processed text and image files, making it easy to work with each file type. Now, it’s your turn. Try experimenting with different data types, such as videos, and explore Qdrant’s advanced features to enhance your applications. To get started, [sign up](https://cloud.qdrant.io/signup) for Qdrant today. ![data-ingestion-beginners-12](https://qdrant.tech/documentation/examples/data-ingestion-beginners/data-ingestion-12.png) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/data-ingestion-beginners.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/data-ingestion-beginners.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-116-lllmstxt|> ## vectors - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Vectors --- # [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#vectors) Vectors Vectors (or embeddings) are the core concept of the Qdrant Vector Search engine. Vectors define the similarity between objects in the vector space. 
If a pair of vectors is similar in vector space, it means that the objects they represent are similar in some way. For example, if you have a collection of images, you can represent each image as a vector. If two images are similar, their vectors will be close to each other in the vector space. In order to obtain a vector representation of an object, you need to apply a vectorization algorithm to the object. Usually, this algorithm is a neural network that converts the object into a fixed-size vector. The neural network is usually [trained](https://qdrant.tech/articles/metric-learning-tips/) on pairs or [triplets](https://qdrant.tech/articles/triplet-loss/) of similar and dissimilar objects, so it learns to recognize a specific type of similarity. By using this property of vectors, you can explore your data in a number of ways; e.g. by searching for similar objects, clustering objects, and more. ## [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#vector-types) Vector Types Modern neural networks can output vectors in different shapes and sizes, and Qdrant supports most of them. Let's take a look at the most common types of vectors supported by Qdrant. ### [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#dense-vectors) Dense Vectors This is the most common type of vector: a simple list of numbers with a fixed length, where each element of the list is a floating-point number. It looks like this: ```json // A piece of a real-world dense vector [\ -0.013052909,\ 0.020387933,\ -0.007869,\ -0.11111383,\ -0.030188112,\ -0.0053388323,\ 0.0010654867,\ 0.072027855,\ -0.04167721,\ 0.014839341,\ -0.032948174,\ -0.062975034,\ -0.024837125,\ ....\ ] ``` The majority of neural networks create dense vectors, so you can use them with Qdrant without any additional processing. Although Qdrant is compatible with most embedding models out there, it has been tested with the following [verified embedding providers](https://qdrant.tech/documentation/embeddings/). ### [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#sparse-vectors) Sparse Vectors Sparse vectors are a special type of vector. Mathematically, they are the same as dense vectors, but they contain many zeros, so they are stored in a special format. Sparse vectors in Qdrant don't have a fixed length; the length is allocated dynamically during vector insertion. The number of non-zero values in a sparse vector is currently limited to the u32 datatype range (4294967295). In order to define a sparse vector, you need to provide a list of non-zero elements and their indexes. ```json // A sparse vector with 4 non-zero elements { "indexes": [1, 3, 5, 7], "values": [0.1, 0.2, 0.3, 0.4] } ``` Sparse vectors in Qdrant are kept in special storage and indexed in a separate index, so their configuration is different from dense vectors.
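To make the format above concrete, here is a minimal, illustrative Python sketch (not part of the original documentation) that turns token counts into the indices/values layout used when working with sparse vectors. The toy vocabulary and the raw-count weighting are assumptions for illustration; real sparse models such as BM25 or SPLADE assign weights differently.

```python
from collections import Counter

def to_sparse_vector(tokens: list[str], vocabulary: dict[str, int]) -> dict:
    """Toy example: map token counts to the sparse-vector layout (non-zero indices plus values).
    Only illustrates the data shape, not a production weighting scheme."""
    counts = Counter(token for token in tokens if token in vocabulary)
    # Sort by index so the representation is deterministic
    items = sorted((vocabulary[token], float(count)) for token, count in counts.items())
    return {
        "indices": [index for index, _ in items],
        "values": [value for _, value in items],
    }

vocabulary = {"vector": 0, "search": 1, "qdrant": 2, "sparse": 3}  # hypothetical vocabulary
print(to_sparse_vector(["qdrant", "sparse", "sparse", "search"], vocabulary))
# {'indices': [1, 2, 3], 'values': [1.0, 1.0, 2.0]}
```

The resulting indices and values are exactly what you pass when upserting a point with a sparse vector, as shown in the next section.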
To create a collection with sparse vectors: httpbashpythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "sparse_vectors": { "text": { } } } ``` ```bash curl -X PUT http://localhost:6333/collections/{collection_name} \ -H 'Content-Type: application/json' \ --data-raw '{ "sparse_vectors": { "text": { } } }' ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config={}, sparse_vectors_config={ "text": models.SparseVectorParams(), }, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { sparse_vectors: { text: { }, }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{ CreateCollectionBuilder, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, }; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut sparse_vector_config = SparseVectorsConfigBuilder::default(); sparse_vector_config.add_named_vector_params("text", SparseVectorParamsBuilder::default()); client .create_collection( CreateCollectionBuilder::new("{collection_name}") .sparse_vectors_config(sparse_vector_config), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.SparseVectorConfig; import io.qdrant.client.grpc.Collections.SparseVectorParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setSparseVectorsConfig( SparseVectorConfig.newBuilder() .putMap("text", SparseVectorParams.getDefaultInstance())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", sparseVectorsConfig: ("text", new SparseVectorParams()) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "text": {}, }), }) ``` Insert a point with a sparse vector into the created collection: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "vector": {\ "text": {\ "indices": [1, 3, 5, 7],\ "values": [0.1, 0.2, 0.3, 0.4]\ }\ }\ }\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ payload={}, # Add any additional payload if necessary\ vector={\ "text": models.SparseVector(\ indices=[1, 3, 5, 7],\ values=[0.1, 0.2, 0.3, 0.4]\ )\ },\ )\ ], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.upsert("{collection_name}", { points: [\ {\ id: 1,\ vector: {\ text: {\ indices: [1, 3, 5, 7],\ values: [0.1, 0.2, 0.3, 0.4]\ },\ },\ }\ ] }); ``` ```rust use qdrant_client::qdrant::{NamedVectors, PointStruct, 
UpsertPointsBuilder, Vector}; use qdrant_client::{Payload, Qdrant}; let client = Qdrant::from_url("http://localhost:6334").build()?; let points = vec![PointStruct::new(\ 1,\ NamedVectors::default().add_vector(\ "text",\ Vector::new_sparse(vec![1, 3, 5, 7], vec![0.1, 0.2, 0.3, 0.4]),\ ),\ Payload::new(),\ )]; client .upsert_points(UpsertPointsBuilder::new("{collection_name}", points)) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.VectorFactory.vector; import static io.qdrant.client.VectorsFactory.namedVectors; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors( namedVectors(Map.of( "text", vector(List.of(1.0f, 2.0f), List.of(6, 7)))) ) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List < PointStruct > { new() { Id = 1, Vectors = new Dictionary { ["text"] = ([0.1f, 0.2f, 0.3f, 0.4f], [1, 3, 5, 7]) } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectorsMap( map[string]*qdrant.Vector{ "text": qdrant.NewVectorSparse( []uint32{1, 3, 5, 7}, []float32{0.1, 0.2, 0.3, 0.4}), }), }, }, }) ``` Now you can run a search with sparse vectors: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": { "indices": [1, 3, 5, 7], "values": [0.1, 0.2, 0.3, 0.4] }, "using": "text" } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") result = client.query_points( collection_name="{collection_name}", query=models.SparseVector(indices=[1, 3, 5, 7], values=[0.1, 0.2, 0.3, 0.4]), using="text", ).points ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: { indices: [1, 3, 5, 7], values: [0.1, 0.2, 0.3, 0.4] }, using: "text", limit: 3, }); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![(1, 0.2), (3, 0.1), (5, 0.9), (7, 0.7)]) .limit(10) .using("text"), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setUsing("text") .setQuery(nearest(List.of(0.1f, 0.2f, 0.3f, 0.4f), List.of(1, 3, 5, 7))) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( 
collectionName: "{collection_name}", query: new (float, uint)[] {(0.1f, 1), (0.2f, 3), (0.3f, 5), (0.4f, 7)}, usingVector: "text", limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuerySparse( []uint32{1, 3, 5, 7}, []float32{0.1, 0.2, 0.3, 0.4}), Using: qdrant.PtrOf("text"), }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#multivectors) Multivectors **Available as of v1.10.0** Qdrant supports storing a variable number of same-shaped dense vectors in a single point. This means that instead of a single dense vector, you can upload a matrix of dense vectors. The length of each vector in the matrix is fixed, but the number of vectors in the matrix can be different for each point. Multivectors look like this: ```json // A multivector of size 4 "vector": [\ [-0.013, 0.020, -0.007, -0.111],\ [-0.030, -0.055, 0.001, 0.072],\ [-0.041, 0.014, -0.032, -0.062],\ ....\ ] ``` There are two scenarios where multivectors are useful: - **Multiple representations of the same object** \- For example, you can store multiple embeddings for pictures of the same object, taken from different angles. This approach assumes that the payload is the same for all vectors. - **Late interaction embeddings** \- Some text embedding models can output multiple vectors for a single text. For example, models of the ColBERT family output a relatively small vector for each token in the text. In order to use multivectors, we need to specify a function that will be used to compare matrices of vectors. Currently, Qdrant supports the `max_sim` function, defined as the sum, over the vectors of one matrix, of the maximum similarity to any vector of the other matrix: $$\text{score} = \sum_{i=1}^{N} \max_{j=1}^{M} \text{Sim}(\text{vectorA}_i, \text{vectorB}_j)$$ where N is the number of vectors in the first matrix, M is the number of vectors in the second matrix, and Sim is a similarity function, for example, cosine similarity.
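For intuition, here is a small NumPy sketch (an illustration, not Qdrant's internal implementation) that computes the `max_sim` score between two multivector matrices, using cosine similarity as `Sim`:

```python
import numpy as np

def max_sim(query_matrix: np.ndarray, doc_matrix: np.ndarray) -> float:
    """Sum, over the query vectors, of the best cosine similarity against any document vector."""
    q = query_matrix / np.linalg.norm(query_matrix, axis=1, keepdims=True)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    similarities = q @ d.T                          # (N, M) matrix of cosine similarities
    return float(similarities.max(axis=1).sum())    # max over doc vectors, summed over query vectors

query = np.random.rand(4, 128)       # N = 4 query token vectors
document = np.random.rand(16, 128)   # M = 16 document token vectors
print(max_sim(query, document))
```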
To use multivectors, create a collection with the following configuration: httppythontypescriptrustjavacsharpgo ```http PUT collections/{collection_name} { "vectors": { "size": 128, "distance": "Cosine", "multivector_config": { "comparator": "max_sim" } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( size=128, distance=models.Distance.COSINE, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 128, distance: "Cosine", multivector_config: { comparator: "max_sim" } }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, VectorParamsBuilder, MultiVectorComparator, MultiVectorConfigBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config( VectorParamsBuilder::new(100, Distance::Cosine) .multivector_config( MultiVectorConfigBuilder::new(MultiVectorComparator::MaxSim) ), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.MultiVectorComparator; import io.qdrant.client.grpc.Collections.MultiVectorConfig; import io.qdrant.client.grpc.Collections.VectorParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.createCollectionAsync("{collection_name}", VectorParams.newBuilder().setSize(128) .setDistance(Distance.Cosine) .setMultivectorConfig(MultiVectorConfig.newBuilder() .setComparator(MultiVectorComparator.MaxSim) .build()) .build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 128, Distance = Distance.Cosine, MultivectorConfig = new() { Comparator = MultiVectorComparator.MaxSim } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 128, Distance: qdrant.Distance_Cosine, MultivectorConfig: &qdrant.MultiVectorConfig{ Comparator: qdrant.MultiVectorComparator_MaxSim, }, }), }) ``` To insert a point with multivector: httppythontypescriptrustjavacsharpgo ```http PUT collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "vector": [\ [-0.013, 0.020, -0.007, -0.111, ...],\ [-0.030, -0.055, 0.001, 0.072, ...],\ [-0.041, 0.014, -0.032, -0.062, ...]\ ]\ }\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ vector=[\ [-0.013, 0.020, -0.007, -0.111],\ [-0.030, -0.055, 0.001, 0.072],\ [-0.041, 0.014, -0.032, -0.062]\ ],\ )\ ], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client 
= new QdrantClient({ host: "localhost", port: 6333 }); client.upsert("{collection_name}", { points: [\ {\ id: 1,\ vector: [\ [-0.013, 0.020, -0.007, -0.111, ...],\ [-0.030, -0.055, 0.001, 0.072, ...],\ [-0.041, 0.014, -0.032, -0.062, ...]\ ],\ }\ ] }); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder, Vector}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let points = vec![\ PointStruct::new(\ 1,\ Vector::new_multi(vec![\ vec![-0.013, 0.020, -0.007, -0.111],\ vec![-0.030, -0.055, 0.001, 0.072],\ vec![-0.041, 0.014, -0.032, -0.062],\ ]),\ Payload::new()\ )\ ]; client .upsert_points( UpsertPointsBuilder::new("{collection_name}", points) ).await?; ``` ```java import java.util.List; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.VectorsFactory.vectors; import static io.qdrant.client.VectorFactory.multiVector; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(multiVector(new float[][] { {-0.013f, 0.020f, -0.007f, -0.111f}, {-0.030f, -0.055f, 0.001f, 0.072f}, {-0.041f, 0.014f, -0.032f, -0.062f} }))) .build() )) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new float[][] { [-0.013f, 0.020f, -0.007f, -0.111f], [-0.030f, -0.05f, 0.001f, 0.072f], [-0.041f, 0.014f, -0.032f, -0.062f ], }, }, } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectorsMulti( [][]float32{ {-0.013, 0.020, -0.007, -0.111}, {-0.030, -0.055, 0.001, 0.072}, {-0.041, 0.014, -0.032, -0.062}}), }, }, }) ``` To search with multivector (available in `query` API): httppythontypescriptrustjavacsharpgo ```http POST collections/{collection_name}/points/query { "query": [\ [-0.013, 0.020, -0.007, -0.111, ...],\ [-0.030, -0.055, 0.001, 0.072, ...],\ [-0.041, 0.014, -0.032, -0.062, ...]\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[\ [-0.013, 0.020, -0.007, -0.111],\ [-0.030, -0.055, 0.001, 0.072],\ [-0.041, 0.014, -0.032, -0.062]\ ], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { "query": [\ [-0.013, 0.020, -0.007, -0.111],\ [-0.030, -0.055, 0.001, 0.072],\ [-0.041, 0.014, -0.032, -0.062]\ ] }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{ QueryPointsBuilder, VectorInput }; let client = Qdrant::from_url("http://localhost:6334").build()?; let res = client.query( QueryPointsBuilder::new("{collection_name}") .query(VectorInput::new_multi( vec![\ vec![-0.013, 0.020, -0.007, -0.111],\ vec![-0.030, -0.055, 0.001, 0.072],\ vec![-0.041, 0.014, -0.032, -0.062],\ ] )) ).await?; ``` ```java import static 
io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(new float[][] { {-0.013f, 0.020f, -0.007f, -0.111f}, {-0.030f, -0.055f, 0.001f, 0.072f}, {-0.041f, 0.014f, -0.032f, -0.062f} })) .build()).get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[][] { [-0.013f, 0.020f, -0.007f, -0.111f], [-0.030f, -0.055f, 0.001 , 0.072f], [-0.041f, 0.014f, -0.032f, -0.062f], } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryMulti( [][]float32{ {-0.013, 0.020, -0.007, -0.111}, {-0.030, -0.055, 0.001, 0.072}, {-0.041, 0.014, -0.032, -0.062}, }), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#named-vectors) Named Vectors In Qdrant, you can store multiple vectors of different sizes and [types](https://qdrant.tech/documentation/concepts/vectors/#vector-types) in the same data [point](https://qdrant.tech/documentation/concepts/points/). This is useful when you need to define your data with multiple embeddings to represent different features or modalities (e.g., image, text or video). To store different vectors for each point, you need to create separate named vector spaces in the [collection](https://qdrant.tech/documentation/concepts/collections/). You can define these vector spaces during collection creation and manage them independently. 
To create a collection with named vectors, you need to specify a configuration for each vector: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "image": { "size": 4, "distance": "Dot" }, "text": { "size": 5, "distance": "Cosine" } }, "sparse_vectors": { "text-sparse": {} } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config={ "image": models.VectorParams(size=4, distance=models.Distance.DOT), "text": models.VectorParams(size=5, distance=models.Distance.COSINE), }, sparse_vectors_config={"text-sparse": models.SparseVectorParams()}, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { image: { size: 4, distance: "Dot" }, text: { size: 5, distance: "Cosine" }, }, sparse_vectors: { text_sparse: {} } }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, VectorParamsBuilder, VectorsConfigBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut vector_config = VectorsConfigBuilder::default(); vector_config.add_named_vector_params("text", VectorParamsBuilder::new(5, Distance::Dot)); vector_config.add_named_vector_params("image", VectorParamsBuilder::new(4, Distance::Cosine)); let mut sparse_vectors_config = SparseVectorsConfigBuilder::default(); sparse_vectors_config .add_named_vector_params("text-sparse", SparseVectorParamsBuilder::default()); client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(vector_config) .sparse_vectors_config(sparse_vectors_config), ) .await?; ``` ```java import java.util.Map; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.SparseVectorConfig; import io.qdrant.client.grpc.Collections.SparseVectorParams; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorParamsMap; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig(VectorsConfig.newBuilder().setParamsMap( VectorParamsMap.newBuilder().putAllMap(Map.of("image", VectorParams.newBuilder() .setSize(4) .setDistance(Distance.Dot) .build(), "text", VectorParams.newBuilder() .setSize(5) .setDistance(Distance.Cosine) .build())))) .setSparseVectorsConfig(SparseVectorConfig.newBuilder().putMap( "text-sparse", SparseVectorParams.getDefaultInstance())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParamsMap { Map = { ["image"] = new VectorParams { Size = 4, Distance = Distance.Dot }, ["text"] = new VectorParams { Size = 5, Distance = Distance.Cosine }, } }, sparseVectorsConfig: new SparseVectorConfig { Map = { ["text-sparse"] = new() } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) 
client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfigMap( map[string]*qdrant.VectorParams{ "image": { Size: 4, Distance: qdrant.Distance_Dot, }, "text": { Size: 5, Distance: qdrant.Distance_Cosine, }, }), SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "text-sparse": {}, }, ), }) ``` To insert a point with named vectors: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points?wait=true { "points": [\ {\ "id": 1,\ "vector": {\ "image": [0.9, 0.1, 0.1, 0.2],\ "text": [0.4, 0.7, 0.1, 0.8, 0.1],\ "text-sparse": {\ "indices": [1, 3, 5, 7],\ "values": [0.1, 0.2, 0.3, 0.4]\ }\ }\ }\ ] } ``` ```python client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ vector={\ "image": [0.9, 0.1, 0.1, 0.2],\ "text": [0.4, 0.7, 0.1, 0.8, 0.1],\ "text-sparse": {\ "indices": [1, 3, 5, 7],\ "values": [0.1, 0.2, 0.3, 0.4],\ },\ },\ ),\ ], ) ``` ```typescript client.upsert("{collection_name}", { points: [\ {\ id: 1,\ vector: {\ image: [0.9, 0.1, 0.1, 0.2],\ text: [0.4, 0.7, 0.1, 0.8, 0.1],\ text_sparse: {\ indices: [1, 3, 5, 7],\ values: [0.1, 0.2, 0.3, 0.4]\ }\ },\ },\ ], }); ``` ```rust use qdrant_client::qdrant::{ NamedVectors, PointStruct, UpsertPointsBuilder, Vector, }; use qdrant_client::Payload; client .upsert_points( UpsertPointsBuilder::new( "{collection_name}", vec![PointStruct::new(\ 1,\ NamedVectors::default()\ .add_vector("text", Vector::new_dense(vec![0.4, 0.7, 0.1, 0.8, 0.1]))\ .add_vector("image", Vector::new_dense(vec![0.9, 0.1, 0.1, 0.2]))\ .add_vector(\ "text-sparse",\ Vector::new_sparse(vec![1, 3, 5, 7], vec![0.1, 0.2, 0.3, 0.4]),\ ),\ Payload::default(),\ )], ) .wait(true), ) .await?; ``` ```java import java.util.List; import java.util.Map; import static io.qdrant.client.PointIdFactory.id; import static io.qdrant.client.VectorFactory.vector; import static io.qdrant.client.VectorsFactory.namedVectors; import io.qdrant.client.grpc.Points.PointStruct; client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors( namedVectors( Map.of( "image", vector(List.of(0.9f, 0.1f, 0.1f, 0.2f)), "text", vector(List.of(0.4f, 0.7f, 0.1f, 0.8f, 0.1f)), "text-sparse", vector(List.of(0.1f, 0.2f, 0.3f, 0.4f), List.of(1, 3, 5, 7))))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new Dictionary { ["image"] = new() { Data = {0.9f, 0.1f, 0.1f, 0.2f} }, ["text"] = new() { Data = {0.4f, 0.7f, 0.1f, 0.8f, 0.1f} }, ["text-sparse"] = ([0.1f, 0.2f, 0.3f, 0.4f], [1, 3, 5, 7]), } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectorsMap(map[string]*qdrant.Vector{ "image": qdrant.NewVector(0.9, 0.1, 0.1, 0.2), "text": qdrant.NewVector(0.4, 0.7, 0.1, 0.8, 0.1), "text-sparse": qdrant.NewVectorSparse( []uint32{1, 3, 5, 7}, []float32{0.1, 0.2, 0.3, 0.4}), }), }, }, }) ``` To search with named vectors (available in `query` API): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "using": "image", "limit": 
3 } ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], using="image", limit=3, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], using: "image", limit: 3, }); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .using("image"), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setUsing("image") .setLimit(3) .build()).get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, usingVector: "image", limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Using: qdrant.PtrOf("image"), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#datatypes) Datatypes Newest versions of embeddings models generate vectors with very large dimentionalities. With OpenAI’s `text-embedding-3-large` embedding model, the dimensionality can go up to 3072. The amount of memory required to store such vectors grows linearly with the dimensionality, so it is important to choose the right datatype for the vectors. The choice between datatypes is a trade-off between memory consumption and precision of vectors. Qdrant supports a number of datatypes for both dense and sparse vectors: **Float32** This is the default datatype for vectors in Qdrant. It is a 32-bit (4 bytes) floating-point number. The standard OpenAI embedding of 1536 dimensionality will require 6KB of memory to store in Float32. You don’t need to specify the datatype for vectors in Qdrant, as it is set to Float32 by default. **Float16** This is a 16-bit (2 bytes) floating-point number. It is also known as half-precision float. Intuitively, it looks like this: ```text float32 -> float16 delta (float32 - float16).abs 0.79701585 -> 0.796875 delta 0.00014084578 0.7850789 -> 0.78515625 delta 0.00007736683 0.7775044 -> 0.77734375 delta 0.00016063452 0.85776305 -> 0.85791016 delta 0.00014710426 0.6616839 -> 0.6616211 delta 0.000062823296 ``` The main advantage of Float16 is that it requires half the memory of Float32, while having virtually no impact on the quality of vector search. 
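If you want to verify this yourself, the following NumPy snippet (illustrative, not taken from the original docs) reproduces the kind of float32 -> float16 deltas shown above, along with the memory footprint for a 1536-dimensional vector:

```python
import numpy as np

values32 = np.random.rand(5).astype(np.float32)
values16 = values32.astype(np.float16)  # half-precision copy

for v32, v16 in zip(values32, values16):
    delta = abs(float(v32) - float(v16))
    print(f"{v32:.8f} -> {v16:.8f} delta {delta:.8f}")

# Memory footprint of a single 1536-dimensional vector
print(values32.itemsize * 1536, "bytes in float32")  # 6144 bytes (~6 KB)
print(values16.itemsize * 1536, "bytes in float16")  # 3072 bytes (~3 KB)
```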
To use Float16, you need to specify the datatype for vectors in the collection configuration: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 128, "distance": "Cosine", "datatype": "float16" // <-- For dense vectors }, "sparse_vectors": { "text": { "index": { "datatype": "float16" // <-- And for sparse vectors } } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( size=128, distance=models.Distance.COSINE, datatype=models.Datatype.FLOAT16 ), sparse_vectors_config={ "text": models.SparseVectorParams( index=models.SparseIndexParams(datatype=models.Datatype.FLOAT16) ), }, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 128, distance: "Cosine", datatype: "float16" }, sparse_vectors: { text: { index: { datatype: "float16" } } } }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Datatype, Distance, SparseIndexConfigBuilder, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, VectorParamsBuilder }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut sparse_vector_config = SparseVectorsConfigBuilder::default(); sparse_vector_config.add_named_vector_params( "text", SparseVectorParamsBuilder::default() .index(SparseIndexConfigBuilder::default().datatype(Datatype::Float32)), ); let create_collection = CreateCollectionBuilder::new("{collection_name}") .sparse_vectors_config(sparse_vector_config) .vectors_config( VectorParamsBuilder::new(128, Distance::Cosine).datatype(Datatype::Float16), ); client.create_collection(create_collection).await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Datatype; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.SparseIndexConfig; import io.qdrant.client.grpc.Collections.SparseVectorConfig; import io.qdrant.client.grpc.Collections.SparseVectorParams; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig(VectorsConfig.newBuilder() .setParams(VectorParams.newBuilder() .setSize(128) .setDistance(Distance.Cosine) .setDatatype(Datatype.Float16) .build()) .build()) .setSparseVectorsConfig( SparseVectorConfig.newBuilder() .putMap("text", SparseVectorParams.newBuilder() .setIndex(SparseIndexConfig.newBuilder() .setDatatype(Datatype.Float16) .build()) .build())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 128, Distance = Distance.Cosine, Datatype = Datatype.Float16 }, sparseVectorsConfig: ( "text", new SparseVectorParams { Index = new SparseIndexConfig { Datatype = Datatype.Float16 } } ) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := 
qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 128, Distance: qdrant.Distance_Cosine, Datatype: qdrant.Datatype_Float16.Enum(), }), SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "text": { Index: &qdrant.SparseIndexConfig{ Datatype: qdrant.Datatype_Float16.Enum(), }, }, }), }) ``` **Uint8** Another step towards memory optimization is to use the Uint8 datatype for vectors. Unlike Float16, Uint8 is not a floating-point number, but an integer number in the range from 0 to 255. Not all embeddings models generate vectors in the range from 0 to 255, so you need to be careful when using Uint8 datatype. In order to convert a number from float range to Uint8 range, you need to apply a process called quantization. Some embedding providers may provide embeddings in a pre-quantized format. One of the most notable examples is the [Cohere int8 & binary embeddings](https://cohere.com/blog/int8-binary-embeddings). For other embeddings, you will need to apply quantization yourself. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 128, "distance": "Cosine", "datatype": "uint8" // <-- For dense vectors }, "sparse_vectors": { "text": { "index": { "datatype": "uint8" // <-- For sparse vectors } } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( size=128, distance=models.Distance.COSINE, datatype=models.Datatype.UINT8 ), sparse_vectors_config={ "text": models.SparseVectorParams( index=models.SparseIndexParams(datatype=models.Datatype.UINT8) ), }, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 128, distance: "Cosine", datatype: "uint8" }, sparse_vectors: { text: { index: { datatype: "uint8" } } } }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Datatype, Distance, SparseIndexConfigBuilder, SparseVectorParamsBuilder, SparseVectorsConfigBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let mut sparse_vector_config = SparseVectorsConfigBuilder::default(); sparse_vector_config.add_named_vector_params( "text", SparseVectorParamsBuilder::default() .index(SparseIndexConfigBuilder::default().datatype(Datatype::Uint8)), ); let create_collection = CreateCollectionBuilder::new("{collection_name}") .sparse_vectors_config(sparse_vector_config) .vectors_config( VectorParamsBuilder::new(128, Distance::Cosine) .datatype(Datatype::Uint8) ); client.create_collection(create_collection).await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Datatype; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.SparseIndexConfig; import io.qdrant.client.grpc.Collections.SparseVectorConfig; import io.qdrant.client.grpc.Collections.SparseVectorParams; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; 
QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig(VectorsConfig.newBuilder() .setParams(VectorParams.newBuilder() .setSize(128) .setDistance(Distance.Cosine) .setDatatype(Datatype.Uint8) .build()) .build()) .setSparseVectorsConfig( SparseVectorConfig.newBuilder() .putMap("text", SparseVectorParams.newBuilder() .setIndex(SparseIndexConfig.newBuilder() .setDatatype(Datatype.Uint8) .build()) .build())) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 128, Distance = Distance.Cosine, Datatype = Datatype.Uint8 }, sparseVectorsConfig: ( "text", new SparseVectorParams { Index = new SparseIndexConfig { Datatype = Datatype.Uint8 } } ) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 128, Distance: qdrant.Distance_Cosine, Datatype: qdrant.Datatype_Uint8.Enum(), }), SparseVectorsConfig: qdrant.NewSparseVectorsConfig( map[string]*qdrant.SparseVectorParams{ "text": { Index: &qdrant.SparseIndexConfig{ Datatype: qdrant.Datatype_Uint8.Enum(), }, }, }), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#quantization) Quantization Apart from changing the datatype of the original vectors, Qdrant can create quantized representations of vectors alongside the original ones. This quantized representation can be used to quickly select candidates for rescoring with the original vectors or even used directly for search. Quantization is applied in the background, during the optimization process. More information about the quantization process can be found in the [Quantization](https://qdrant.tech/documentation/guides/quantization/) section. ## [Anchor](https://qdrant.tech/documentation/concepts/vectors/\#vector-storage) Vector Storage Depending on the requirements of the application, Qdrant can use one of the data storage options. Keep in mind that you will have to tradeoff between search speed and the size of RAM used. More information about the storage options can be found in the [Storage](https://qdrant.tech/documentation/concepts/storage/#vector-storage) section. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/vectors.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
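The `uint8` examples above assume that your vectors already contain integers in the 0-255 range. As a rough, hypothetical illustration of the quantization step mentioned above (not an official recipe), a simple min-max scaling of float embeddings with NumPy could look like this:

```python
import numpy as np

def quantize_to_uint8(vectors: np.ndarray) -> np.ndarray:
    """Min-max scale float vectors into the 0-255 integer range expected by `uint8` storage.

    A simple global min-max scheme, for illustration only; real pipelines may rely on
    pre-quantized embeddings (e.g. Cohere int8) or a calibrated quantization step instead.
    """
    v_min, v_max = float(vectors.min()), float(vectors.max())
    scale = max(v_max - v_min, 1e-12)          # avoid division by zero for constant input
    scaled = (vectors - v_min) / scale         # map to [0, 1]
    return np.round(scaled * 255).astype(np.uint8)

# Hypothetical usage: quantize embeddings before upserting into a uint8 collection.
float_embeddings = np.random.rand(2, 128).astype(np.float32)
uint8_embeddings = quantize_to_uint8(float_embeddings)
print(uint8_embeddings.dtype, uint8_embeddings.min(), uint8_embeddings.max())
```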
<|page-117-lllmstxt|> ## io_uring

- [Articles](https://qdrant.tech/articles/)
- Qdrant under the hood: io\_uring

[Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/)

---

# Qdrant under the hood: io\_uring

Andre Bogus · June 21, 2023

![Qdrant under the hood: io_uring](https://qdrant.tech/articles_data/io_uring/preview/title.jpg)

With Qdrant [version 1.3.0](https://github.com/qdrant/qdrant/releases/tag/v1.3.0) we introduce the alternative io\_uring based _async uring_ storage backend on Linux-based systems. Since its introduction, io\_uring has been known to improve async throughput wherever the OS syscall overhead gets too high, which tends to occur in situations where software becomes _IO bound_ (that is, mostly waiting on disk).

## [Anchor](https://qdrant.tech/articles/io_uring/\#inputoutput) Input+Output

Around the mid-90s, the internet took off. The first servers used a process-per-request setup, which was good for serving hundreds if not thousands of concurrent requests. The POSIX Input + Output (IO) was modeled in a strictly synchronous way. The overhead of starting a new process for each request made this model unsustainable. So servers started forgoing process separation, opting for the thread-per-request model. But even that ran into limitations. I distinctly remember when someone asked the question whether a server could serve 10k concurrent connections, which at the time exhausted the memory of most systems (because every thread had to have its own stack and some other metadata, which quickly filled up available memory). As a result, synchronous IO was replaced by asynchronous IO during the 2.5 kernel update, either via `select` or `epoll` (the latter being Linux-only, but a bit more efficient, so most servers of the time used it).

However, even this crude form of asynchronous IO carries the overhead of at least one system call per operation. Each system call incurs a context switch, and while this operation is itself not that slow, the switch disturbs the caches. Today’s CPUs are much faster than memory, but if their caches start to miss data, the required memory accesses lead to longer and longer wait times for the CPU.

### [Anchor](https://qdrant.tech/articles/io_uring/\#memory-mapped-io) Memory-mapped IO

Another way of dealing with file IO (which unlike network IO doesn’t have a hard time requirement) is to map parts of files into memory - the system fakes having that chunk of the file in memory, so when you read from a location there, the kernel interrupts your process to load the needed data from disk, and resumes your process once done, whereas writing to the memory will also notify the kernel. The kernel can also prefetch data while the program is running, thus reducing the likelihood of interrupts.

Thus there is still some overhead, but (especially in asynchronous applications) it’s far less than with `epoll`. The reason this API is rarely used in web servers is that these usually have a large variety of files to access, unlike a database, which can map its own backing store into memory once.
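To make the memory-mapped IO idea concrete, here is a tiny sketch using Python's standard-library `mmap`; the file name is a placeholder, and the comments only restate the behavior described above:

```python
import mmap

# Hypothetical file; any existing, non-empty local file works.
with open("vectors.bin", "rb") as f:
    # Map the whole file read-only; the kernel pages data in on demand
    # and may prefetch while the program keeps running.
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        first_bytes = mm[:16]  # touching this range may trigger a page fault and a disk read
        print(first_bytes)
```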
### [Anchor](https://qdrant.tech/articles/io_uring/\#combating-the-poll-ution) Combating the Poll-ution

There were multiple experiments to improve matters, some even going so far as moving an HTTP server into the kernel, which of course brought its own share of problems. Others like Intel added their own APIs that ignored the kernel and worked directly on the hardware.

Finally, Jens Axboe took matters into his own hands and proposed a ring buffer based interface called _io\_uring_. The buffers are not directly for data, but for operations. User processes can set up a Submission Queue (SQ) and a Completion Queue (CQ), both of which are shared between the process and the kernel, so there’s no copying overhead.

![io_uring diagram](https://qdrant.tech/articles_data/io_uring/io-uring.png)

Apart from avoiding copying overhead, the queue-based architecture lends itself to multithreading as item insertion/extraction can be made lockless, and once the queues are set up, there is no further syscall that would stop any user thread. Servers that use this can easily get to over 100k concurrent requests. Today Linux allows asynchronous IO via io\_uring for network, disk and accessing other ports, e.g. for printing or recording video.

## [Anchor](https://qdrant.tech/articles/io_uring/\#and-what-about-qdrant) And what about Qdrant?

Qdrant can store everything in memory, but not all data sets may fit, which can require storing on disk. Before io\_uring, Qdrant used mmap to do its IO. This led to some modest overhead in case of disk latency: the kernel may stop a user thread trying to access a mapped region, which incurs some context switching overhead plus the wait time until the disk IO is finished. This does not fit well with the asynchronous nature of Qdrant’s core.

One of the great optimizations Qdrant offers is quantization (either [scalar](https://qdrant.tech/articles/scalar-quantization/) or [product](https://qdrant.tech/articles/product-quantization/)-based). However, unless the collection resides fully in memory, this optimization method generates significant disk IO, so it is a prime candidate for possible improvements.

If you run Qdrant on Linux, you can enable io\_uring with the following in your configuration:

```yaml
---
# within the storage config
storage:
  # enable the async scorer which uses io_uring
  async_scorer: true
```

You can return to the mmap based backend by either deleting the `async_scorer` entry or setting the value to `false`.

## [Anchor](https://qdrant.tech/articles/io_uring/\#benchmarks) Benchmarks

To run the benchmark, use a test instance of Qdrant. If necessary, spin up a Docker container and load a snapshot of the collection you want to benchmark with. You can copy and edit our [benchmark script](https://qdrant.tech/articles_data/io_uring/rescore-benchmark.sh) to run the benchmark. Run the script once with `storage.async_scorer` enabled and once without. You can measure IO usage with `iostat` from another console.

For our benchmark, we chose the laion dataset, picking 5 million 768d entries. We enabled scalar quantization + HNSW with m=16 and ef\_construct=512. We do the quantization in RAM and HNSW in RAM, but keep the original vectors on disk (which was a network drive rented from Hetzner for the benchmark).
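For reference, a collection configured roughly along the lines of the benchmark setup above (768d original vectors on disk, HNSW with m=16 and `ef_construct`=512, scalar quantization kept in RAM) might look like this with the Python client; the collection name, URL, and distance metric are assumptions, not part of the published benchmark script:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="laion-benchmark",  # hypothetical name
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,  # assumed metric for the CLIP-style embeddings
        on_disk=True,                     # keep the original vectors on disk
    ),
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=512),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            always_ram=True,              # quantized vectors stay in RAM
        )
    ),
)
```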
If you want to reproduce the benchmarks, you can get snapshots containing the datasets: - [mmap only](https://storage.googleapis.com/common-datasets-snapshots/laion-768-6m-mmap.snapshot) - [with scalar quantization](https://storage.googleapis.com/common-datasets-snapshots/laion-768-6m-sq-m16-mmap.shapshot) Running the benchmark, we get the following IOPS, CPU loads and wall clock times: | | oversampling | parallel | ~max IOPS | CPU% (of 4 cores) | time (s) (avg of 3) | | --- | --- | --- | --- | --- | --- | | io\_uring | 1 | 4 | 4000 | 200 | 12 | | mmap | 1 | 4 | 2000 | 93 | 43 | | io\_uring | 1 | 8 | 4000 | 200 | 12 | | mmap | 1 | 8 | 2000 | 90 | 43 | | io\_uring | 4 | 8 | 7000 | 100 | 30 | | mmap | 4 | 8 | 2300 | 50 | 145 | Note that in this case, the IO operations have relatively high latency due to using a network disk. Thus, the kernel takes more time to fulfil the mmap requests, and application threads need to wait, which is reflected in the CPU percentage. On the other hand, with the io\_uring backend, the application threads can better use available cores for the rescore operation without any IO-induced delays. Oversampling is a new feature to improve accuracy at the cost of some performance. It allows setting a factor, which is multiplied with the `limit` while doing the search. The results are then re-scored using the original vector and only then the top results up to the limit are selected. ## [Anchor](https://qdrant.tech/articles/io_uring/\#discussion) Discussion Looking back, disk IO used to be very serialized; re-positioning read-write heads on moving platter was a slow and messy business. So the system overhead didn’t matter as much, but nowadays with SSDs that can often even parallelize operations while offering near-perfect random access, the overhead starts to become quite visible. While memory-mapped IO gives us a fair deal in terms of ease of use and performance, we can improve on the latter in exchange for some modest complexity increase. io\_uring is still quite young, having only been introduced in 2019 with kernel 5.1, so some administrators will be wary of introducing it. Of course, as with performance, the right answer is usually “it depends”, so please review your personal risk profile and act accordingly. ## [Anchor](https://qdrant.tech/articles/io_uring/\#best-practices) Best Practices If your on-disk collection’s query performance is of sufficiently high priority to you, enable the io\_uring-based async\_scorer to greatly reduce operating system overhead from disk IO. On the other hand, if your collections are in memory only, activating it will be ineffective. Also note that many queries are not IO bound, so the overhead may or may not become measurable in your workload. Finally, on-device disks typically carry lower latency than network drives, which may also affect mmap overhead. Therefore before you roll out io\_uring, perform the above or a similar benchmark with both mmap and io\_uring and measure both wall time and IOps). Benchmarks are always highly use-case dependent, so your mileage may vary. Still, doing that benchmark once is a small price for the possible performance wins. Also please [tell us](https://discord.com/channels/907569970500743200/907569971079569410) about your benchmark results! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/io_uring.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/io_uring.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-118-lllmstxt|> ## dataset-quality - [Articles](https://qdrant.tech/articles/) - Finding errors in datasets with Similarity Search [Back to Data Exploration](https://qdrant.tech/articles/data-exploration/) --- # Finding errors in datasets with Similarity Search George Panchuk · July 18, 2022 ![Finding errors in datasets with Similarity Search](https://qdrant.tech/articles_data/dataset-quality/preview/title.jpg) Nowadays, people create a huge number of applications of various types and solve problems in different areas. Despite such diversity, they have something in common - they need to process data. Real-world data is a living structure, it grows day by day, changes a lot and becomes harder to work with. In some cases, you need to categorize or label your data, which can be a tough problem given its scale. The process of splitting or labelling is error-prone and these errors can be very costly. Imagine that you failed to achieve the desired quality of the model due to inaccurate labels. Worse, your users are faced with a lot of irrelevant items, unable to find what they need and getting annoyed by it. Thus, you get poor retention, and it directly impacts company revenue. It is really important to avoid such errors in your data. ## [Anchor](https://qdrant.tech/articles/dataset-quality/\#furniture-web-marketplace) Furniture web-marketplace Let’s say you work on an online furniture marketplace. ![Furniture marketplace](https://storage.googleapis.com/demo-dataset-quality-public/article/furniture_marketplace.png) Furniture marketplace In this case, to ensure a good user experience, you need to split items into different categories: tables, chairs, beds, etc. One can arrange all the items manually and spend a lot of money and time on this. There is also another way: train a classification or similarity model and rely on it. With both approaches it is difficult to avoid mistakes. Manual labelling is a tedious task, but it requires concentration. Once you got distracted or your eyes became blurred mistakes won’t keep you waiting. The model also can be wrong. You can analyse the most uncertain predictions and fix them, but the other errors will still leak to the site. There is no silver bullet. You should validate your dataset thoroughly, and you need tools for this. When you are sure that there are not many objects placed in the wrong category, they can be considered outliers or anomalies. Thus, you can train a model or a bunch of models capable of looking for anomalies, e.g. autoencoder and a classifier on it. However, this is again a resource-intensive task, both in terms of time and manual labour, since labels have to be provided for classification. On the contrary, if the proportion of out-of-place elements is high enough, outlier search methods are likely to be useless. ### [Anchor](https://qdrant.tech/articles/dataset-quality/\#similarity-search) Similarity search The idea behind similarity search is to measure semantic similarity between related parts of the data. E.g. between category title and item images. 
The hypothesis is that unsuitable items will be less similar. We can’t directly compare text and image data. For this, we need an intermediate representation - embeddings. Embeddings are just numeric vectors containing semantic information. We can apply a pre-trained model to our data to produce these vectors. After embeddings are created, we can measure the distances between them.

Assume we want to search for something other than a single bed in the «Single beds» category.

![Similarity search](https://storage.googleapis.com/demo-dataset-quality-public/article/similarity_search.png)

Similarity search

One of the possible pipelines would look like this:

- Take the name of the category as an anchor and calculate the anchor embedding.
- Calculate embeddings for images of each object placed into this category.
- Compare the obtained anchor and object embeddings.
- Find the furthest ones.

For instance, we can do it with the [CLIP](https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1) model.

![Category vs. Image](https://storage.googleapis.com/demo-dataset-quality-public/article/category_vs_image_transparent.png)

Category vs. Image

We can also calculate embeddings for titles instead of images, or even for both of them to find more errors.

![Category vs. Title and Image](https://storage.googleapis.com/demo-dataset-quality-public/article/category_vs_name_and_image_transparent.png)

Category vs. Title and Image

As you can see, different approaches can find new errors or the same ones. Stacking several techniques or even the same techniques with different models may provide better coverage. Hint: Caching embeddings for the same models and reusing them among different methods can significantly speed up your lookup.

### [Anchor](https://qdrant.tech/articles/dataset-quality/\#diversity-search) Diversity search

Since pre-trained models have only general knowledge about the data, they can still leave some misplaced items undetected. You might find yourself in a situation where the model focuses on unimportant features, selects a lot of irrelevant elements, and fails to find genuine errors. To mitigate this issue, you can perform a diversity search.

Diversity search is a method for finding the most distinctive examples in the data. Like similarity search, it operates on embeddings and measures the distances between them. The difference lies in deciding which point should be extracted next.

Let’s imagine how to get 3 points with similarity search and then with diversity search.

Similarity:

1. Calculate the distance matrix
2. Choose your anchor
3. Get the vector of distances from the selected anchor out of the distance matrix
4. Sort the fetched vector
5. Get the top-3 embeddings

Diversity:

1. Calculate the distance matrix
2. Initialize the starting point (randomly or according to certain conditions)
3. Get the distance vector for the selected starting point from the distance matrix
4. Find the furthest point
5. Get the distance vector for the new point
6. Find the furthest point from all already fetched points

![Diversity search](https://storage.googleapis.com/demo-dataset-quality-public/article/diversity_transparent.png)

Diversity search

Diversity search utilizes the very same embeddings, and you can reuse them. If your data is huge and does not fit into memory, vector search engines like [Qdrant](https://github.com/qdrant/qdrant) might be helpful. The described methods can be used independently, but they are also simple to combine to improve detection coverage; a minimal sketch of both selection procedures follows below.
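Here is a minimal NumPy sketch of the two selection procedures listed above: ranking items by distance from an anchor (similarity) and greedy farthest-point selection (diversity). The embeddings are random placeholders rather than real CLIP outputs:

```python
import numpy as np

def cosine_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances between rows of a and rows of b."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_n @ b_n.T

rng = np.random.default_rng(42)
anchor = rng.normal(size=(1, 512))    # e.g. the category name embedding
items = rng.normal(size=(100, 512))   # e.g. item image embeddings

# Similarity-based check: the items furthest from the anchor are the misplacement suspects.
dist_to_anchor = cosine_distances(anchor, items)[0]
suspects = np.argsort(dist_to_anchor)[::-1][:3]    # top-3 furthest items

# Diversity search: start somewhere and repeatedly take the item
# furthest from everything already selected (farthest-point selection).
pairwise = cosine_distances(items, items)
selected = [0]                                      # arbitrary starting point
while len(selected) < 3:
    min_dist_to_selected = pairwise[:, selected].min(axis=1)
    min_dist_to_selected[selected] = -np.inf        # never re-pick an already selected item
    selected.append(int(min_dist_to_selected.argmax()))

print("furthest from anchor:", suspects)
print("most diverse items:", selected)
```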
If the quality remains insufficient, you can fine-tune the models using a similarity learning approach (e.g. with [Quaterion](https://quaterion.qdrant.tech/)), both to provide a better representation of your data and to pull apart dissimilar objects in space.

## [Anchor](https://qdrant.tech/articles/dataset-quality/\#conclusion) Conclusion

In this article, we covered distance-based methods for finding errors in categorized datasets and showed how to find incorrectly placed items in a furniture web store. I hope these methods will help you catch sneaky samples leaked into the wrong categories in your data, and make your users’ experience more enjoyable.

Poke the [demo](https://dataset-quality.qdrant.tech/).

Stay tuned :)

<|page-119-lllmstxt|> ## qdrant-fundamentals

- [Documentation](https://qdrant.tech/documentation/)
- [Faq](https://qdrant.tech/documentation/faq/)
- Qdrant Fundamentals

---

# [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#frequently-asked-questions-general-topics) Frequently Asked Questions: General Topics

| | | | | |
| --- | --- | --- | --- | --- |
| [Vectors](https://qdrant.tech/documentation/faq/qdrant-fundamentals/#vectors) | [Search](https://qdrant.tech/documentation/faq/qdrant-fundamentals/#search) | [Collections](https://qdrant.tech/documentation/faq/qdrant-fundamentals/#collections) | [Compatibility](https://qdrant.tech/documentation/faq/qdrant-fundamentals/#compatibility) | [Cloud](https://qdrant.tech/documentation/faq/qdrant-fundamentals/#cloud) |

## [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#vectors) Vectors

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#what-is-the-maximum-vector-dimension-supported-by-qdrant) What is the maximum vector dimension supported by Qdrant?

Qdrant supports up to 65,535 dimensions by default, but this can be configured to support higher dimensions.

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#what-is-the-maximum-size-of-vector-metadata-that-can-be-stored) What is the maximum size of vector metadata that can be stored?

There is no inherent limitation on metadata size, but it should be [optimized for performance and resource usage](https://qdrant.tech/documentation/guides/optimize/). Users can set upper limits in the configuration.

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#can-the-same-similarity-search-query-yield-different-results-on-different-machines) Can the same similarity search query yield different results on different machines?

Yes, due to differences in hardware configurations and parallel processing, results may vary slightly.
### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-do-i-choose-the-right-vector-embeddings-for-my-use-case) How do I choose the right vector embeddings for my use case?

This depends on the nature of your data and the specific application. Consider factors like dimensionality, domain-specific models, and the performance characteristics of different embeddings.

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-does-qdrant-handle-different-vector-embeddings-from-various-providers-in-the-same-collection) How does Qdrant handle different vector embeddings from various providers in the same collection?

Qdrant natively [supports multiple vectors per data point](https://qdrant.tech/documentation/concepts/vectors/#multivectors), allowing different embeddings from various providers to coexist within the same collection.

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#can-i-migrate-my-embeddings-from-another-vector-store-to-qdrant) Can I migrate my embeddings from another vector store to Qdrant?

Yes, Qdrant supports migration of embeddings from other vector stores, facilitating easy transitions and adoption of Qdrant’s features.

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#why-the-amount-of-indexed-vectors-doesnt-match-the-amount-of-vectors-in-the-collection) Why doesn’t the number of indexed vectors match the number of vectors in the collection?

Qdrant doesn’t always need to index all vectors in the collection. It stores data in segments, and if a segment is small enough, it is more efficient to perform a full-scan search on it. Make sure to check that the collection status is `green` and that the number of unindexed vectors is smaller than the indexing threshold (see the example further down).

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#why-collection-info-shows-inaccurate-number-of-points) Why does collection info show an inaccurate number of points?

The collection info API in Qdrant returns an approximate number of points in the collection. If you need an exact number, you can use the [count](https://qdrant.tech/documentation/concepts/points/#counting-points) API.

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#vectors-in-the-collection-dont-match-what-i-uploaded) Vectors in the collection don’t match what I uploaded.

There are two possible reasons for this:

- You used the `Cosine` distance metric in the [collection settings](https://qdrant.tech/concepts/collections/#collections). In this case, Qdrant pre-normalizes your vectors for faster distance computation. If you strictly need the original vectors to be preserved, consider using the `Dot` distance metric instead.
- You used the `uint8` [datatype](https://qdrant.tech/documentation/concepts/vectors/#datatypes) to store vectors. `uint8` requires a special format for input values, which might not be compatible with the typical output of embedding models.

## [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#search) Search

### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-does-qdrant-handle-real-time-data-updates-and-search) How does Qdrant handle real-time data updates and search?

Qdrant supports live updates for vector data, with newly inserted, updated, and deleted vectors available for immediate search. The system uses full-scan search on unindexed segments during background index updates.
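Related to the two questions above about indexed vector counts and approximate point counts, one quick way to inspect these numbers with the Python client might be (collection name is a placeholder):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

info = client.get_collection("my_collection")   # hypothetical collection name
print(info.status)                              # expect "green" once indexing has settled
print(info.points_count)                        # approximate number of points
print(info.indexed_vectors_count)               # vectors covered by the index

# Exact number of points, as recommended above:
exact = client.count("my_collection", exact=True)
print(exact.count)
```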
### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#my-search-results-contain-vectors-with-null-values-why) My search results contain vectors with null values. Why? By default, Qdrant tries to minimize network traffic and doesn’t return vectors in search results. But you can force Qdrant to do so by setting the `with_vector` parameter of the Search/Scroll to `true`. If you’re still seeing `"vector": null` in your results, it might be that the vector you’re passing is not in the correct format, or there’s an issue with how you’re calling the upsert method. ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-can-i-search-without-a-vector) How can I search without a vector? You are likely looking for the [scroll](https://qdrant.tech/documentation/concepts/points/#scroll-points) method. It allows you to retrieve the records based on filters or even iterate over all the records in the collection. ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#does-qdrant-support-a-full-text-search-or-a-hybrid-search) Does Qdrant support a full-text search or a hybrid search? Qdrant is a vector search engine in the first place, and we only implement full-text support as long as it doesn’t compromise the vector search use case. That includes both the interface and the performance. What Qdrant can do: - Search with full-text filters - Apply full-text filters to the vector search (i.e., perform vector search among the records with specific words or phrases) - Do prefix search and semantic [search-as-you-type](https://qdrant.tech/articles/search-as-you-type/) - Sparse vectors, as used in [SPLADE](https://github.com/naver/splade) or similar models - [Multi-vectors](https://qdrant.tech/documentation/concepts/vectors/#multivectors), for example ColBERT and other late-interaction models - Combination of the [multiple searches](https://qdrant.tech/documentation/concepts/hybrid-queries/) What Qdrant doesn’t plan to support: - Non-vector-based retrieval or ranking functions - Built-in ontologies or knowledge graphs - Query analyzers and other NLP tools Of course, you can always combine Qdrant with any specialized tool you need, including full-text search engines. Read more about [our approach](https://qdrant.tech/articles/hybrid-search/) to hybrid search. ## [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#collections) Collections ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-many-collections-can-i-create) How many collections can I create? As many as you want, but be aware that each collection requires additional resources. It is _highly_ recommended not to create many small collections, as it will lead to significant resource consumption overhead. We consider creating a collection for each user/dialog/document as an antipattern. Please read more about collections, isolation, and multiple users in our [Multitenancy](https://qdrant.tech/documentation/tutorials/multiple-partitions/) tutorial. ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-do-i-upload-a-large-number-of-vectors-into-a-qdrant-collection) How do I upload a large number of vectors into a Qdrant collection? Read about our recommendations in the [bulk upload](https://qdrant.tech/documentation/tutorials/bulk-upload/) tutorial. 
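For the "How can I search without a vector?" question above, a minimal `scroll` call with a payload filter could look like this in Python; the collection and field names are hypothetical:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

records, next_offset = client.scroll(
    collection_name="my_collection",  # placeholder
    scroll_filter=models.Filter(
        must=[models.FieldCondition(key="city", match=models.MatchValue(value="Berlin"))]
    ),
    limit=10,
    with_payload=True,
    with_vectors=False,
)
print(records)
```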
### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#can-i-only-store-quantized-vectors-and-discard-full-precision-vectors) Can I only store quantized vectors and discard full precision vectors? No, Qdrant requires full precision vectors for operations like reindexing, rescoring, etc. ## [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#compatibility) Compatibility ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#is-qdrant-compatible-with-cpus-or-gpus-for-vector-computation) Is Qdrant compatible with CPUs or GPUs for vector computation? Qdrant primarily relies on CPU acceleration for scalability and efficiency. However, we also support GPU-accelerated indexing on all major vendors. ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#do-you-guarantee-compatibility-across-versions) Do you guarantee compatibility across versions? In case your version is older, we only guarantee compatibility between two consecutive minor versions. This also applies to client versions. Ensure your client version is never more than one minor version away from your cluster version. While we will assist with break/fix troubleshooting of issues and errors specific to our products, Qdrant is not accountable for reviewing, writing (or rewriting), or debugging custom code. ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#do-you-support-downgrades) Do you support downgrades? We do not support downgrading a cluster on any of our products. If you deploy a newer version of Qdrant, your data is automatically migrated to the newer storage format. This migration is not reversible. ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#how-do-i-avoid-issues-when-updating-to-the-latest-version) How do I avoid issues when updating to the latest version? We only guarantee compatibility if you update between consecutive versions. You would need to upgrade versions one at a time: `1.1 -> 1.2`, then `1.2 -> 1.3`, then `1.3 -> 1.4`. ## [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#cloud) Cloud ### [Anchor](https://qdrant.tech/documentation/faq/qdrant-fundamentals/\#is-it-possible-to-scale-down-a-qdrant-cloud-cluster) Is it possible to scale down a Qdrant Cloud cluster? Yes, it is possible to both vertically and horizontally scale down a Qdrant Cloud cluster. Note, that during the vertical scaling down, the disk size cannot be reduced. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/faq/qdrant-fundamentals.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/faq/qdrant-fundamentals.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-120-lllmstxt|> ## embeddings - [Documentation](https://qdrant.tech/documentation/) - Embeddings --- # [Anchor](https://qdrant.tech/documentation/embeddings/\#supported-embedding-providers--models) Supported Embedding Providers & Models Qdrant supports all available text and multimodal dense vector embedding models as well as vector embedding services without any limitations. ## [Anchor](https://qdrant.tech/documentation/embeddings/\#some-of-the-embeddings-you-can-use-with-qdrant) Some of the Embeddings you can use with Qdrant SentenceTransformers, BERT, SBERT, Clip, OpenClip, Open AI, Vertex AI, Azure AI, AWS Bedrock, Jina AI, Upstage AI, Mistral AI, Cohere AI, Voyage AI, Aleph Alpha, Baidu Qianfan, BGE, Instruct, Watsonx Embeddings, Snowflake Embeddings, NVIDIA NeMo, Nomic, OCI Embeddings, Ollama Embeddings, MixedBread, Together AI, Clarifai, Databricks Embeddings, GPT4All Embeddings, John Snow Labs Embeddings. Additionally, [any open-source embeddings from HuggingFace](https://huggingface.co/spaces/mteb/leaderboard) can be used with Qdrant. ## [Anchor](https://qdrant.tech/documentation/embeddings/\#code-samples) Code samples | Embeddings Providers | Description | | --- | --- | | [Aleph Alpha](https://qdrant.tech/documentation/embeddings/aleph-alpha/) | Multilingual embeddings focused on European languages. | | [Bedrock](https://qdrant.tech/documentation/embeddings/bedrock/) | AWS managed service for foundation models and embeddings. | | [Cohere](https://qdrant.tech/documentation/embeddings/cohere/) | Language model embeddings for NLP tasks. | | [Gemini](https://qdrant.tech/documentation/embeddings/gemini/) | Google’s multimodal embeddings for text and vision. | | [Jina AI](https://qdrant.tech/documentation/embeddings/jina-embeddings/) | Customizable embeddings for neural search. | | [Mistral](https://qdrant.tech/documentation/embeddings/mistral/) | Open-source, efficient language model embeddings. | | [MixedBread](https://qdrant.tech/documentation/embeddings/mixedbread/) | Lightweight embeddings for constrained environments. | | [Mixpeek](https://qdrant.tech/documentation/embeddings/mixpeek/) | Managed SDK for video chunking, embedding, and post-processing. ​ | | [Nomic](https://qdrant.tech/documentation/embeddings/nomic/) | Embeddings for data visualization. | | [Nvidia](https://qdrant.tech/documentation/embeddings/nvidia/) | GPU-optimized embeddings from Nvidia. | | [Ollama](https://qdrant.tech/documentation/embeddings/ollama/) | Embeddings for conversational AI. | | [OpenAI](https://qdrant.tech/documentation/embeddings/openai/) | Industry-leading embeddings for NLP. | | [Prem AI](https://qdrant.tech/documentation/embeddings/premai/) | Precise language embeddings. | | [Twelve Labs](https://qdrant.tech/documentation/embeddings/twelvelabs/) | Multimodal embeddings from Twelve labs. | | [Snowflake](https://qdrant.tech/documentation/embeddings/snowflake/) | Scalable embeddings for big data. | | [Upstage](https://qdrant.tech/documentation/embeddings/upstage/) | Embeddings for speech and language tasks. | | [Voyage AI](https://qdrant.tech/documentation/embeddings/voyage/) | Navigation and spatial understanding embeddings. | ##### Was this page useful? 
![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/embeddings/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/embeddings/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-121-lllmstxt|> ## minicoil - [Articles](https://qdrant.tech/articles/) - miniCOIL: on the Road to Usable Sparse Neural Retrieval [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # miniCOIL: on the Road to Usable Sparse Neural Retrieval Evgeniya Sukhodolskaya · May 13, 2025 ![miniCOIL: on the Road to Usable Sparse Neural Retrieval](https://qdrant.tech/articles_data/minicoil/preview/title.jpg) Have you ever heard of sparse neural retrieval? If so, have you used it in production? It’s a field with excellent potential – who wouldn’t want to use an approach that combines the strengths of dense and term-based text retrieval? Yet it’s not so popular. Is it due to the common curse of _“What looks good on paper is not going to work in practice”?_? This article describes our path towards sparse neural retrieval _as it should be_ – lightweight term-based retrievers capable of distinguishing word meanings. Learning from the mistakes of previous attempts, we created **miniCOIL**, a new sparse neural candidate to take BM25’s place in hybrid searches. We’re happy to share it with you and are awaiting your feedback. ## [Anchor](https://qdrant.tech/articles/minicoil/\#the-good-the-bad-and-the-ugly) The Good, the Bad and the Ugly Sparse neural retrieval is not so well known, as opposed to methods it’s based on – term-based and dense retrieval. Their weaknesses motivated this field’s development, guiding its evolution. Let’s follow its path. ![Retrievers evolution](https://qdrant.tech/articles_data/minicoil/models_evolution.png) Retrievers evolution ### [Anchor](https://qdrant.tech/articles/minicoil/\#term-based-retrieval) Term-based Retrieval Term-based retrieval usually treats text as a bag of words. These words play roles of different importance, contributing to the overall relevance score between a document and a query. Famous **BM25** estimates words’ contribution based on their: 1. Importance in a particular text – Term Frequency (TF) based. 2. Significance within the whole corpus – Inverse Document Frequency (IDF) based. It also has several parameters reflecting typical text length in the corpus, the exact meaning of which you can check in [our detailed breakdown of the BM25 formula](https://qdrant.tech/articles/bm42/#why-has-bm25-stayed-relevant-for-so-long). Precisely defining word importance within a text is nontrivial. BM25 is built on the idea that term importance can be defined statistically. This isn’t far from the truth in long texts, where frequent repetition of a certain word signals that the text is related to this concept. In very short texts – say, chunks for Retrieval Augmented Generation (RAG) – it’s less applicable, with TF of 0 or 1. 
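To make the roles of TF and IDF concrete, here is a toy sketch of a single term's BM25 contribution with the usual k1 and b parameters; the corpus statistics are made up, and this is a simplified rendering rather than Qdrant's implementation:

```python
import math

def bm25_term_contribution(tf: int, doc_len: int, avg_doc_len: float,
                           n_docs: int, docs_with_term: int,
                           k1: float = 1.2, b: float = 0.75) -> float:
    """Contribution of one query term to the BM25 score of one document."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf_part = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

# Toy statistics: in short RAG-style chunks tf is usually 0 or 1,
# so the contribution is driven almost entirely by IDF.
print(bm25_term_contribution(tf=1, doc_len=40, avg_doc_len=50, n_docs=10_000, docs_with_term=120))
print(bm25_term_contribution(tf=3, doc_len=400, avg_doc_len=350, n_docs=10_000, docs_with_term=120))
```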
We approached fixing it in our [BM42 modification of BM25 algorithm.](https://qdrant.tech/articles/bm42/) Yet there is one component of a word’s importance for retrieval, which is not considered in BM25 at all – word meaning. The same words have different meanings in different contexts, and it affects the text’s relevance. Think of _“fruit **bat**”_ and _“baseball **bat**"_—the same importance in the text, different meanings. ### [Anchor](https://qdrant.tech/articles/minicoil/\#dense-retrieval) Dense Retrieval How to capture the meaning? Bag-of-words models like BM25 assume that words are placed in a text independently, while linguists say: > “You shall know a word by the company it keeps” - John Rupert Firth This idea, together with the motivation to numerically express word relationships, powered the development of the second branch of retrieval – dense vectors. Transformer models with attention mechanisms solved the challenge of distinguishing word meanings within text context, making it a part of relevance matching in retrieval. Yet dense retrieval didn’t (and can’t) become a complete replacement for term-based retrieval. Dense retrievers are capable of broad semantic similarity searches, yet they lack precision when we need results including a specific keyword. It’s a fool’s errand – trying to make dense retrievers do exact matching, as they’re built in a paradigm where every word matches every other word semantically to some extent, and this semantic similarity depends on the training data of a particular model. ### [Anchor](https://qdrant.tech/articles/minicoil/\#sparse-neural-retrieval) Sparse Neural Retrieval So, on one side, we have weak control over matching, sometimes leading to too broad retrieval results, and on the other—lightweight, explainable and fast term-based retrievers like BM25, incapable of capturing semantics. Of course, we want the best of both worlds, fused in one model, no drawbacks included. Sparse neural retrieval evolution was pushed by this desire. - Why **sparse**? Term-based retrieval can operate on sparse vectors, where each word in the text is assigned a non-zero value (its importance in this text). - Why **neural**? Instead of deriving an importance score for a word based on its statistics, let’s use machine learning models capable of encoding words’ meaning. **So why is it not widely used?** ![Problems of modern sparse neural retrievers](https://qdrant.tech/articles_data/minicoil/models_problems.png) Problems of modern sparse neural retrievers The detailed history of sparse neural retrieval makes for [a whole other article](https://qdrant.tech/articles/modern-sparse-neural-retrieval/). Summing a big part of it up, there were many attempts to map a word representation produced by a dense encoder to a single-valued importance score, and most of them never saw the real world outside of research papers ( **DeepImpact**, **TILDEv2**, **uniCOIL**). Trained end-to-end on a relevance objective, most of the **sparse encoders** estimated word importance well only for a particular domain. Their out-of-domain accuracy, on datasets they hadn’t “seen” during training, [was worse than BM25.](https://arxiv.org/pdf/2307.10488) The SOTA of sparse neural retrieval is **SPLADE** – (Sparse Lexical and Expansion Model). This model has made its way into retrieval systems - you can [use SPLADE++ in Qdrant with FastEmbed](https://qdrant.tech/documentation/fastembed/fastembed-splade/). Yet there’s a catch. 
The “expansion” part of SPLADE’s name refers to a technique that combats against another weakness of term-based retrieval – **vocabulary mismatch**. While dense encoders can successfully connect related terms like “fruit bat” and “flying fox”, term-based retrieval fails at this task. SPLADE solves this problem by **expanding documents and queries with additional fitting terms**. However, it leads to SPLADE inference becoming heavy. Additionally, produced representations become not-so-sparse (so, consequently, not lightweight) and far less explainable as expansion choices are made by machine learning models. > “Big man in a suit of armor. Take that off, what are you?” Experiments showed that SPLADE without its term expansion tells the same old story of sparse encoders — [it performs worse than BM25.](https://arxiv.org/pdf/2307.10488) ## [Anchor](https://qdrant.tech/articles/minicoil/\#eyes-on-the-prize-usable-sparse-neural-retrieval) Eyes on the Prize: Usable Sparse Neural Retrieval Striving for perfection on specific benchmarks, the sparse neural retrieval field either produced models performing worse than BM25 out-of-domain(ironically, [trained with BM25-based hard negatives](https://arxiv.org/pdf/2307.10488)) or models based on heavy document expansion, lowering sparsity. To be usable in production, the minimal criteria a sparse neural retriever should meet are: - **Producing lightweight sparse representations (it’s in the name!).** Inheriting the perks of term-based retrieval, it should be lightweight and simple. For broader semantic search, there are dense retrievers. - **Being better than BM25 at ranking in different domains.** The goal is a term-based retriever capable of distinguishing word meanings — what BM25 can’t do — preserving BM25’s out-of-domain, time-proven performance. ![The idea behind miniCOIL](https://qdrant.tech/articles_data/minicoil/minicoil.png) The idea behind miniCOIL ### [Anchor](https://qdrant.tech/articles/minicoil/\#inspired-by-coil) Inspired by COIL One of the attempts in the field of Sparse Neural Retrieval — [Contextualized Inverted Lists (COIL)](https://qdrant.tech/articles/modern-sparse-neural-retrieval/#sparse-neural-retriever-which-understood-homonyms) — stands out with its approach to term weights encoding. Instead of squishing high-dimensional token representations (usually 768-dimensional BERT embeddings) into a single number, COIL authors project them to smaller vectors of 32 dimensions. They propose storing these vectors in **inverted lists** of an **inverted index** (used in term-based retrieval) as is and comparing vector representations through dot product. This approach captures deeper semantics, a single number simply cannot convey all the nuanced meanings a word can have. Despite this advantage, COIL failed to gain widespread adoption for several key reasons: - Inverted indexes are usually not designed to store vectors and perform vector operations. - Trained end-to-end with a relevance objective on [MS MARCO dataset](https://microsoft.github.io/msmarco/), COIL’s performance is heavily domain-bound. - Additionally, COIL operates on tokens, reusing BERT’s tokenizer. However, working at a word level is far better for term-based retrieval. Imagine we want to search for a _“retriever”_ in our documentation. COIL will break it down into `re`, `#trie`, and `#ver` 32-dimensional vectors and match all three parts separately – not so convenient. However, COIL representations allow distinguishing homographs, a skill BM25 lacks. 
The best ideas don’t start from zero. We propose an approach **built on top of COIL, keeping in mind what needs fixing**:

1. We should **abandon end-to-end training on a relevance objective** to get a model performant on out-of-domain data. There is not enough data to train a model able to generalize.
2. We should **keep representations sparse and reusable in a classic inverted index**.
3. We should **fix tokenization**. This problem is the easiest one to solve, as it was already done in several sparse neural retrievers, and [we also learned to do it in our BM42](https://qdrant.tech/articles/bm42/#wordpiece-retokenization).

### [Anchor](https://qdrant.tech/articles/minicoil/\#standing-on-the-shoulders-of-bm25) Standing on the Shoulders of BM25

BM25 has been a decent baseline across various domains for many years – and for a good reason. So why discard a time-proven formula? Instead of training our sparse neural retriever to assign words’ importance scores, let’s add a semantic COIL-inspired component to the BM25 formula:

$$\mathrm{score}(D, Q) = \sum_{i=1}^{N} \mathrm{IDF}(q_i) \cdot \mathrm{Importance}_{D}^{q_i} \cdot \mathrm{Meaning}^{q_i \times d_j}, \quad \text{where term } d_j \in D \text{ equals } q_i$$

Then, if we manage to capture a word’s meaning, our solution alone could work like BM25 combined with a semantically aware reranker – or, in other words:

- It could see the difference between homographs;
- When used with word stems, it could distinguish parts of speech.

![Meaning component](https://qdrant.tech/articles_data/minicoil/examples.png)

Meaning component

And if our model stumbles upon a word it hasn’t “seen” during training, we can just fall back to the original BM25 formula!

### [Anchor](https://qdrant.tech/articles/minicoil/\#bag-of-words-in-4d) Bag-of-words in 4D

COIL uses 32 values to describe one term. Do we need this many? How many words with 32 separate meanings could we name without additional research? Yet, even if we use fewer values in COIL representations, the initial problem of dense vectors not fitting into a classical inverted index persists.

Unless… We perform a simple trick!

![miniCOIL vectors to sparse representation](https://qdrant.tech/articles_data/minicoil/bow_4D.png)

miniCOIL vectors to sparse representation

Imagine a bag-of-words sparse vector. Every word from the vocabulary takes up one cell. If the word is present in the encoded text — we assign some weight; if it isn’t — it equals zero. If we have a miniCOIL vector describing a word’s meaning, for example, in 4D semantic space, we could just dedicate 4 consecutive cells to that word in the sparse vector, one cell per “meaning” dimension. If we don’t, we can fall back to a classic one-cell description with a pure BM25 score. **Such representations can be used in any standard inverted index.**

## [Anchor](https://qdrant.tech/articles/minicoil/\#training-minicoil) Training miniCOIL

Now, we’re coming to the part where we need to somehow get this low-dimensional encapsulation of a word’s meaning – **a miniCOIL vector**. We want to work smarter, not harder, and rely as much as possible on time-proven solutions.

Dense encoders are good at encoding a word’s meaning in its context, so it would be convenient to reuse their output. Moreover, we could kill two birds with one stone if we wanted to add miniCOIL to hybrid search – where dense encoder inference is done regardless.

### [Anchor](https://qdrant.tech/articles/minicoil/\#reducing-dimensions) Reducing Dimensions

Dense encoder outputs are high-dimensional, so we need to perform **dimensionality reduction, which should preserve the word’s meaning in context**.
The goal is to: - Avoid relevance objective and dependence on labelled datasets; - Find a target capturing spatial relations between word’s meanings; - Use the simplest architecture possible. ### [Anchor](https://qdrant.tech/articles/minicoil/\#training-data) Training Data We want miniCOIL vectors to be comparable according to a word’s meaning — _fruit **bat**_ and _vampire **bat**_ should be closer to each other in low-dimensional vector space than to _baseball **bat**_. So, we need something to calibrate on when reducing the dimensionality of words’ contextualized representations. It’s said that a word’s meaning is hidden in the surrounding context or, simply put, in any texts that include this word. In bigger texts, we risk the word’s meaning blending out. So, let’s work at the sentence level and assume that sentences sharing one word should cluster in a way that each cluster contains sentences where this word is used in one specific meaning. If that’s true, we could encode various sentences with a sophisticated dense encoder and form a reusable spatial relations target for input dense encoders. It’s not a big problem to find lots of textual data containing frequently used words when we have datasets like the [OpenWebText dataset](https://paperswithcode.com/dataset/openwebtext), spanning the whole web. With this amount of data available, we could afford generalization and domain independence, which is hard to achieve with the relevance objective. #### [Anchor](https://qdrant.tech/articles/minicoil/\#its-going-to-work-i-bat) It’s Going to Work, I Bat Let’s test our assumption and take a look at the word _“bat”_. We took several thousand sentences with this word, which we sampled from [OpenWebText dataset](https://paperswithcode.com/dataset/openwebtext) and vectorized with a [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) encoder. The goal was to check if we could distinguish any clusters containing sentences where _“bat”_ shares the same meaning. ![Sentences with "bat" in 2D](https://qdrant.tech/articles_data/minicoil/bat.png) Sentences with “bat” in 2D. A very important observation: _Looks like a bat_:) The result had two big clusters related to _“bat”_ as an animal and _“bat”_ as a sports equipment, and two smaller ones related to fluttering motion and the verb used in sports. Seems like it could work! ### [Anchor](https://qdrant.tech/articles/minicoil/\#architecture-and-training-objective) Architecture and Training Objective Let’s continue dealing with _“bats”_. We have a training pool of sentences containing the word _“bat”_ in different meanings. Using a dense encoder of choice, we get a contextualized embedding of _“bat”_ from each sentence and learn to compress it into a low-dimensional miniCOIL _“bat”_ space, guided by [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) sentence embeddings. We’re dealing with only one word, so it should be enough to use just one linear layer for dimensionality reduction, with a [`Tanh activation`](https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html) on top, mapping values of compressed vectors to (-1, 1) range. The activation function choice is made to align miniCOIL representations with dense encoder ones, which are mainly compared through `cosine similarity`. 
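A minimal PyTorch sketch of the per-word setup described above: a single linear layer with Tanh compressing a contextualized token embedding into a 4D miniCOIL vector, trained with a cosine-based triplet objective, and then scaled by a BM25-style weight to fill the word's four sparse-vector cells. All dimensions, tensors, and the exact loss wiring are illustrative assumptions, not the open-sourced training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, MINICOIL_DIM = 512, 4  # e.g. a 512d contextualized token embedding compressed to 4D

# One tiny model per vocabulary word: a single linear layer + Tanh.
word_head = nn.Sequential(nn.Linear(EMB_DIM, MINICOIL_DIM), nn.Tanh())

# Triplet objective with cosine distance; in the article, triplets are mined from
# "teacher" sentence embeddings. Here random tensors stand in for real data.
triplet_loss = nn.TripletMarginWithDistanceLoss(
    distance_function=lambda x, y: 1.0 - F.cosine_similarity(x, y),
    margin=0.1,
)
anchor, positive, negative = (torch.randn(8, EMB_DIM) for _ in range(3))
loss = triplet_loss(word_head(anchor), word_head(positive), word_head(negative))
loss.backward()

# At indexing time, the word's 4D meaning vector is combined with a BM25-style
# importance weight and written into 4 consecutive cells of the sparse vector
# (a toy illustration of the formula above).
with torch.no_grad():
    meaning = word_head(torch.randn(1, EMB_DIM))[0]  # 4 values in (-1, 1)
bm25_weight = 1.7                                    # hypothetical importance score
sparse_cells = (bm25_weight * meaning).tolist()
print(sparse_cells)
```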
![miniCOIL architecture on a word level](https://qdrant.tech/articles_data/minicoil/miniCOIL_one_word.png) miniCOIL architecture on a word level As a training objective, we can select the minimization of [triplet loss](https://qdrant.tech/articles/triplet-loss/), where triplets are picked and aligned based on distances between [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) sentence embeddings. We rely on the confidence (size of the margin) of [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) to guide our _“bat”_ miniCOIL compression. ![miniCOIL training](https://qdrant.tech/articles_data/minicoil/training_objective.png) miniCOIL training #### [Anchor](https://qdrant.tech/articles/minicoil/\#eating-elephant-one-bite-at-a-time) Eating Elephant One Bite at a Time Now, we have the full idea of how to train miniCOIL for one word. How do we scale to a whole vocabulary? What if we keep it simple and continue training a model per word? It has certain benefits: 1. Extremely simple architecture: even one layer per word can suffice. 2. Super fast and easy training process. 3. Cheap and fast inference due to the simple architecture. 4. Flexibility to discover and tune underperforming words. 5. Flexibility to extend and shrink the vocabulary depending on the domain and use case. Then we could train all the words we’re interested in and simply combine (stack) all models into one big miniCOIL. ![miniCOIL model](https://qdrant.tech/articles_data/minicoil/miniCOIL_full.png) miniCOIL model ### [Anchor](https://qdrant.tech/articles/minicoil/\#implementation-details) Implementation Details The code of the training approach sketched above is open-sourced [in this repository](https://github.com/qdrant/miniCOIL). Here are the specific characteristics of the miniCOIL model we trained based on this approach: | Component | Description | | --- | --- | | **Input Dense Encoder** | [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en) (512 dimensions) | | **miniCOIL Vectors Size** | 4 dimensions | | **miniCOIL Vocabulary** | List of 30,000 of the most common English words, cleaned of stop words and words shorter than 3 letters, [taken from here](https://github.com/arstgit/high-frequency-vocabulary/tree/master). Words are stemmed to align miniCOIL with our BM25 implementation. | | **Training Data** | 40 million sentences — a random subset of the [OpenWebText dataset](https://paperswithcode.com/dataset/openwebtext). To make triplet sampling convenient, we uploaded sentences and their [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) embeddings to Qdrant and built a [full-text payload index](https://qdrant.tech/documentation/concepts/indexing/#full-text-index) on sentences with a tokenizer of type `word`. | | **Training Data per Word** | We sample 8000 sentences per word and form triplets with a margin of at least **0.1**.
Additionally, we apply **augmentation** — take a sentence and cut out the target word plus its 1–3 neighbours. We reuse the same similarity score between original and augmented sentences for simplicity. | | **Training Parameters** | **Epochs**: 60
**Optimizer**: Adam with a learning rate of 1e-4
**Validation set**: 20% |

Each word was **trained on just one CPU**, and it took approximately fifty seconds per word to train. We included this `minicoil-v1` version in the [v0.7.0 release of our FastEmbed library](https://github.com/qdrant/fastembed). You can check an example of `minicoil-v1` usage with FastEmbed in the [HuggingFace card](https://huggingface.co/Qdrant/minicoil-v1).

## [Anchor](https://qdrant.tech/articles/minicoil/\#results) Results

### [Anchor](https://qdrant.tech/articles/minicoil/\#validation-loss) Validation Loss

The input transformer [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en) approximates the context relations of the “role model” transformer [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) with a quality of 83% (measured through triplets). That means that in 17% of cases, [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en) will take a sentence triplet from [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) and embed it in a way that the negative example, from the perspective of `mxbai`, ends up closer to the anchor than the positive one.

The validation loss we obtained, depending on the miniCOIL vector size (4, 8, or 16), shows miniCOIL correctly distinguishing between 76% (60 failed triplets on average per batch of size 256) and 85% (38 failed triplets on average per batch of size 256) of triplets, respectively.

![Validation loss](https://qdrant.tech/articles_data/minicoil/validation_loss.png)

Validation loss

### [Anchor](https://qdrant.tech/articles/minicoil/\#benchmarking) Benchmarking

The benchmarking code is open-sourced in [this repository](https://github.com/qdrant/mini-coil-demo/tree/master/minicoil_demo). To check the performance of our 4D miniCOIL version in different domains, we, ironically, chose a subset of the same [BEIR datasets](https://github.com/beir-cellar/beir) whose high benchmark values became an end in themselves for many sparse neural retrievers. The difference is that **miniCOIL wasn’t trained on BEIR datasets and shouldn’t be biased towards them**.

We’re testing our 4D miniCOIL model against [our BM25 implementation](https://huggingface.co/Qdrant/bm25). BEIR datasets are indexed in Qdrant using the following parameters for both methods:

- `k = 1.2`, `b = 0.75`, the default values recommended for BM25 scoring;
- `avg_len` estimated on 50,000 documents from the respective dataset.

We compare models based on the `NDCG@10` metric, as we’re interested in the ranking performance of miniCOIL compared to BM25. Both retrieve the same subset of indexed documents based on exact matches, but miniCOIL should ideally rank this subset better based on its semantic understanding. The results on the domains we tested are the following:

| Dataset | BM25 (NDCG@10) | MiniCOIL (NDCG@10) |
| --- | --- | --- |
| MS MARCO | 0.237 | **0.244** |
| NQ | 0.304 | **0.319** |
| Quora | 0.784 | **0.802** |
| FiQA-2018 | 0.252 | **0.257** |
| HotpotQA | **0.634** | 0.633 |

We can see miniCOIL performing slightly better than BM25 in four out of five tested domains. It shows that **we’re moving in the right direction**.

## [Anchor](https://qdrant.tech/articles/minicoil/\#key-takeaways) Key Takeaways

This article describes our attempt to make a lightweight sparse neural retriever that is able to generalize to out-of-domain data. Sparse neural retrieval has a lot of potential, and we hope to see it gain more traction.
### [Anchor](https://qdrant.tech/articles/minicoil/\#why-is-this-approach-useful) Why is this Approach Useful?

This approach to training sparse neural retrievers:

1. Doesn’t rely on a relevance objective because it is trained in a self-supervised way, so it doesn’t need labeled datasets to scale.
2. Builds on the proven BM25 formula, simply adding a semantic component to it.
3. Creates lightweight sparse representations that fit into a standard inverted index.
4. Fully reuses the outputs of dense encoders, making it adaptable to different models. This also makes miniCOIL a cheap upgrade for hybrid search solutions.
5. Uses an extremely simple model architecture, with one trainable layer per word in miniCOIL’s vocabulary. This results in very fast training and inference. Also, this word-level training makes it easy to expand miniCOIL’s vocabulary for a specific use case.

### [Anchor](https://qdrant.tech/articles/minicoil/\#the-right-tool-for-the-right-job) The Right Tool for the Right Job

When are miniCOIL retrievers applicable? If you need precise term matching but BM25-based retrieval doesn’t meet your needs, because it ranks documents containing words of the right form but the wrong semantic meaning too high.

Say you’re implementing search in your documentation. In this use case, keyword-based search prevails, but BM25 won’t account for different context-based meanings of these keywords. For example, if you’re searching for a _“data **point**”_ in our documentation, you’d prefer to see _“a **point** is a record in Qdrant”_ ranked higher than _floating **point** precision_, and here miniCOIL-based retrieval is an alternative to consider.

Additionally, miniCOIL fits nicely as a part of a hybrid search, as it enhances sparse retrieval without any noticeable increase in resource consumption, directly reusing contextual word representations produced by a dense encoder.

To sum up, miniCOIL should work as if BM25 understood the meaning of words and ranked documents based on this semantic knowledge. It operates only on exact matches, so if you aim for documents semantically similar to the query but expressed in different words, dense encoders are the way to go.

### [Anchor](https://qdrant.tech/articles/minicoil/\#whats-next) What’s Next?

We will continue working on improving our approach: in depth, searching for ways to improve the model’s quality, and in breadth, extending it to various dense encoders and languages beyond English. And we would love to share this road to usable sparse neural retrieval with you!
<|page-122-lllmstxt|> ## vector-search-resource-optimization

---

# Vector Search Resource Optimization Guide

David Myriel · February 09, 2025

![Vector Search Resource Optimization Guide](https://qdrant.tech/articles_data/vector-search-resource-optimization/preview/title.jpg)

## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#whats-in-this-guide) What’s in This Guide?

[**Resource Management Strategies:**](https://qdrant.tech/articles/vector-search-resource-optimization/#storage-disk-vs-ram) If you are trying to scale your app on a budget, this is the guide for you. We will show you how to avoid wasting compute resources and get the maximum return on your investment.

[**Performance Improvement Tricks:**](https://qdrant.tech/articles/vector-search-resource-optimization/#configure-indexing-for-faster-searches) We’ll dive into advanced techniques like indexing, compression, and partitioning. Our tips will help you get better results at scale, while reducing total resource expenditure.

[**Query Optimization Methods:**](https://qdrant.tech/articles/vector-search-resource-optimization/#query-optimization) Improving your vector database setup isn’t just about saving costs. We’ll show you how to build search systems that deliver consistently high precision while staying adaptable.

* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#remember-optimization-is-a-balancing-act) Remember: Optimization is a Balancing Act

In this guide, we will show you how to use Qdrant’s features to meet your performance needs. However, there are resource tradeoffs, and you can’t have it all. It is up to you to choose the optimization strategy that best fits your goals.

![optimization](https://qdrant.tech/articles_data/vector-search-resource-optimization/optimization.png)

Let’s take a look at some common goals and optimization strategies:

| Intended Result | Optimization Strategy |
| --- | --- |
| [**High Search Precision + Low Memory Expenditure**](https://qdrant.tech/documentation/guides/optimize/#1-high-speed-search-with-low-memory-usage) | [**On-Disk Indexing**](https://qdrant.tech/documentation/guides/optimize/#1-high-speed-search-with-low-memory-usage) |
| [**Low Memory Expenditure + Fast Search Speed**](https://qdrant.tech/documentation/guides/quantization/) | [**Quantization**](https://qdrant.tech/documentation/guides/quantization/) |
| [**High Search Precision + Fast Search Speed**](https://qdrant.tech/documentation/guides/optimize/#3-high-precision-with-high-speed-search) | [**RAM Storage + Quantization**](https://qdrant.tech/documentation/guides/optimize/#3-high-precision-with-high-speed-search) |
| [**Balance Latency vs Throughput**](https://qdrant.tech/documentation/guides/optimize/#balancing-latency-and-throughput) | [**Segment Configuration**](https://qdrant.tech/documentation/guides/optimize/#balancing-latency-and-throughput) |

After this article, check out the code samples in our docs on [**Qdrant’s Optimization Methods**](https://qdrant.tech/documentation/guides/optimize/).
* * *

## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#configure-indexing-for-faster-searches) Configure Indexing for Faster Searches

![indexing](https://qdrant.tech/articles_data/vector-search-resource-optimization/index.png)

A vector index is the data structure Qdrant uses to calculate vector similarity efficiently. It is the backbone of your search process, retrieving relevant results from vast amounts of data. Qdrant uses the [**HNSW (Hierarchical Navigable Small World Graph) algorithm**](https://qdrant.tech/documentation/concepts/indexing/#vector-index) as its dense vector index, which is both powerful and scalable.

**Figure 2:** A sample HNSW vector index with three layers. Follow the blue arrow on the top layer to see how a query travels throughout the database index. The closest result is on the bottom level, nearest to the gray query point.

![hnsw](https://qdrant.tech/articles_data/vector-search-resource-optimization/hnsw.png)

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#vector-index-optimization-parameters) Vector Index Optimization Parameters

Working with massive datasets that contain billions of vectors demands significant resources, and those resources come with a price. While Qdrant provides reasonable defaults, tailoring them to your specific use case can unlock optimal performance. Here’s what you need to know.

The following parameters give you the flexibility to fine-tune Qdrant’s performance for your specific workload. You can modify them directly in Qdrant’s [**configuration**](https://qdrant.tech/documentation/guides/configuration/) files or at the collection and named vector levels for more granular control.

**Figure 3:** A description of three key HNSW parameters.

![hnsw-parameters](https://qdrant.tech/articles_data/vector-search-resource-optimization/hnsw-parameters.png)

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#1-the-m-parameter-determines-edges-per-node) 1\. The `m` parameter determines edges per node

This controls the number of edges in the graph. A higher value enhances search accuracy but demands more memory and build time. Fine-tune this to balance memory usage and precision.

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#2-the-ef_construct-parameter-controls-the-index-build-range) 2\. The `ef_construct` parameter controls the index build range

This parameter sets how many neighbors are considered during index construction. A larger value improves the accuracy of the index but increases the build time. Use this to customize your indexing speed versus quality.

You can set both the `m` and `ef_construct` parameters when you create the collection, or adjust them later with an update:

```python
client.update_collection(
    collection_name="{collection_name}",
    vectors_config={
        "my_vector": models.VectorParamsDiff(
            hnsw_config=models.HnswConfigDiff(
                m=32,
                ef_construct=123,
            ),
        ),
    },
)
```

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#3-the-ef-parameter-updates-vector-search-range) 3\. The `ef` parameter updates vector search range

This determines how many neighbors are evaluated during a search query. You can adjust this to balance query speed and accuracy. The `ef` parameter is configured during the search process:

```python
client.query_points(
    collection_name="{collection_name}",
    query=[...],
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
)
```

* * *

These are just the basics of HNSW.
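If you prefer to set these parameters once, at collection creation time, here is a hedged sketch of the same configuration (the collection name, vector size, and parameter values are illustrative):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # adjust to your deployment

client.create_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(
        size=768,  # illustrative dimensionality
        distance=models.Distance.COSINE,
    ),
    hnsw_config=models.HnswConfigDiff(
        m=32,              # edges per node
        ef_construct=123,  # neighbors considered while building the index
    ),
)
```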
Learn More about [**Indexing**](https://qdrant.tech/documentation/concepts/indexing/). * * * ## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#data-compression-techniques) Data Compression Techniques ![compression](https://qdrant.tech/articles_data/vector-search-resource-optimization/compress.png) Efficient data compression is a cornerstone of resource optimization in vector databases. By reducing memory usage, you can achieve faster query performance without sacrificing too much accuracy. One powerful technique is [**quantization**](https://qdrant.tech/documentation/guides/quantization/), which transforms high-dimensional vectors into compact representations while preserving relative similarity. Let’s explore the quantization options available in Qdrant. #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#scalar-quantization) Scalar Quantization Scalar quantization strikes an excellent balance between compression and performance, making it the go-to choice for most use cases. This method minimizes the number of bits used to represent each vector component. For instance, Qdrant compresses 32-bit floating-point values ( **float32**) into 8-bit unsigned integers ( **uint8**), slashing memory usage by an impressive 75%. **Figure 4:** The top example shows a float32 vector with a size of 40 bytes. Converting it to int8 format reduces its size by a factor of four, while maintaining approximate similarity relationships between vectors. The loss in precision compared to the original representation is typically negligible for most practical applications. ![scalar-quantization](https://qdrant.tech/articles_data/vector-search-resource-optimization/scalar-quantization.png) #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#benefits-of-scalar-quantization) Benefits of Scalar Quantization: | Benefit | Description | | --- | --- | | **Memory usage will drop** | Compression cuts memory usage by a factor of 4. Qdrant compresses 32-bit floating-point values (float32) into 8-bit unsigned integers (uint8). | | **Accuracy loss is minimal** | Converting from float32 to uint8 introduces a small loss in precision. Typical error rates remain below 1%, making this method highly efficient. | | **Best for specific use cases** | To be used with high-dimensional vectors where minor accuracy losses are acceptable. | #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#set-it-up-as-you-create-the-collection) Set it up as you create the collection: ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, quantile=0.99, always_ram=True, ), ), ) ``` When working with Qdrant, you can fine-tune the quantization configuration to optimize precision, memory usage, and performance. Here’s what the key configuration options include: | Configuration Option | Description | | --- | --- | | `type` | Specifies the quantized vector type (currently supports only int8). | | `quantile` | Sets bounds for quantization, excluding outliers. For example, 0.99 excludes the top 1% of extreme values to maintain better accuracy. | | `always_ram` | Keeps quantized vectors in RAM to speed up searches. | Adjust these settings to strike the right balance between precision and efficiency for your specific workload. 
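One way to sanity-check that balance is to run the same query twice, once against the quantized vectors and once ignoring them, and compare the returned points. A minimal sketch, assuming quantization is already enabled on the collection (the collection name and query vector are placeholders):

```python
# Query using the quantized vectors (the default once quantization is enabled)
quantized_hits = client.query_points(
    collection_name="{collection_name}",
    query=[0.2, 0.1, 0.9, 0.7],  # placeholder query vector
    limit=10,
)

# The same query, ignoring quantized data and using the original vectors instead
original_hits = client.query_points(
    collection_name="{collection_name}",
    query=[0.2, 0.1, 0.9, 0.7],
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(ignore=True),
    ),
    limit=10,
)
```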
* * * Learn More about [**Scalar Quantization**](https://qdrant.tech/documentation/guides/quantization/) * * * #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#binary-quantization) Binary Quantization **Binary quantization** takes scalar quantization to the next level by compressing each vector component into just **a single bit**. This method achieves unparalleled memory efficiency and query speed, reducing memory usage by a factor of 32 and enabling searches up to 40x faster. #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#benefits-of-binary-quantization)**Benefits of Binary Quantization:** Binary quantization is ideal for large-scale datasets and compatible embedding models, where compression and speed are paramount. **Figure 5:** This method causes maximum compression. It reduces memory usage by 32x and speeds up searches by up to 40x. ![binary-quantization](https://qdrant.tech/articles_data/vector-search-resource-optimization/binary-quantization.png) | Benefit | Description | | --- | --- | | **Efficient similarity calculations** | Emulates Hamming distance through dot product comparisons, making it fast and effective. | | **Perfect for high-dimensional vectors** | Works well with embedding models like OpenAI’s text-embedding-ada-002 or Cohere’s embed-english-v3.0. | | **Precision management** | Consider rescoring or oversampling to offset precision loss. | Here’s how you can enable binary quantization in Qdrant: ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE), quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, ), ), ) ``` > By default, quantized vectors load like original vectors unless you set `always_ram` to `True` for instant access and faster queries. * * * Learn more about [**Binary Quantization**](https://qdrant.tech/documentation/guides/quantization/) * * * ## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#scaling-the-database) Scaling the Database ![sharding](https://qdrant.tech/articles_data/vector-search-resource-optimization/shards.png) Efficiently managing large datasets in distributed systems like Qdrant requires smart strategies for data isolation. **Multitenancy** and **Sharding** are essential tools to help you handle high volumes of user-specific data while maintaining performance and scalability. #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#multitenancy) Multitenancy **Multitenancy** is a software architecture where multiple independent users (or tenants) share the same resources or environment. In Qdrant, a single collection with logical partitioning is often the most efficient setup for multitenant use cases. **Figure 5:** Each individual vector is assigned a specific payload that denotes which tenant it belongs to. This is how a large number of different tenants can share a single Qdrant collection. ![multitenancy](https://qdrant.tech/articles_data/vector-search-resource-optimization/multitenancy.png) **Why Choose Multitenancy?** - **Logical Isolation**: Ensures each tenant’s data remains separate while residing in the same collection. - **Minimized Overhead**: Reduces resource consumption compared to maintaining separate collections for each user. - **Scalability**: Handles high user volumes without compromising performance. 
Here’s how you can implement multitenancy efficiently in Qdrant:

```python
client.create_payload_index(
    collection_name="{collection_name}",
    field_name="group_id",
    field_schema=models.KeywordIndexParams(
        type="keyword",
        is_tenant=True,
    ),
)
```

Creating a keyword payload index, with the `is_tenant` parameter set to `True`, modifies the way the vectors will be logically stored. The storage structure will be organized to co-locate vectors of the same tenant together.

Now, each point stored in Qdrant should have the `group_id` payload attribute set:

```python
client.upsert(
    collection_name="{collection_name}",
    points=[
        models.PointStruct(
            id=1,
            payload={"group_id": "user_1"},
            vector=[0.9, 0.1, 0.1],
        ),
        models.PointStruct(
            id=2,
            payload={"group_id": "user_2"},
            vector=[0.5, 0.9, 0.4],
        ),
    ],
)
```

* * *

To ensure proper data isolation in a multitenant environment, you can assign a unique identifier, such as a **group\_id**, to each vector. This approach ensures that each user’s data remains segregated, allowing users to access only their own data. You can further enhance this setup by applying filters during queries to restrict access to the relevant data.

* * *

Learn More about [**Multitenancy**](https://qdrant.tech/documentation/guides/multiple-partitions/)

* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#sharding) Sharding

Sharding is a critical strategy in Qdrant for splitting collections into smaller units, called **shards**, to efficiently distribute data across multiple nodes. It’s a powerful tool for improving scalability and maintaining performance in large-scale systems.

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#user-defined-sharding) User-Defined Sharding

**User-Defined Sharding** allows you to take control of data placement by specifying a shard key. This feature is particularly useful in multi-tenant setups, as it enables the isolation of each tenant’s data within separate shards, ensuring better organization and enhanced data security.

**Figure 6:** Users can both upsert and query shards that are relevant to them, all within the same collection. Regional sharding can help avoid cross-continental traffic.

![user-defined-sharding](https://qdrant.tech/articles_data/vector-search-resource-optimization/user-defined-sharding.png)

**Example:**

```python
client.create_collection(
    collection_name="my_custom_sharded_collection",
    shard_number=1,
    sharding_method=models.ShardingMethod.CUSTOM,
)

client.create_shard_key("my_custom_sharded_collection", "tenant_id")
```

* * *

When implementing user-defined sharding in Qdrant, two key parameters are critical to achieving efficient data distribution:

1. **Shard Key**: The shard key determines how data points are distributed across shards. For example, using a key like `tenant_id` allows you to control how Qdrant partitions the data. Each data point added to the collection will be assigned to a shard based on the value of this key, ensuring logical isolation of data.
2. **Shard Number**: This defines the total number of physical shards for each shard key, influencing resource allocation and query performance.
Here’s how you can add a data point to a collection with user-defined sharding:

```python
client.upsert(
    collection_name="my_custom_sharded_collection",
    points=[
        models.PointStruct(
            id=1111,
            vector=[0.1, 0.2, 0.3],
        ),
    ],
    shard_key_selector="tenant_1",
)
```

* * *

This code assigns the point to a specific shard based on the `tenant_1` shard key, ensuring proper data placement.

Here’s how to choose the shard\_number:

| Recommendation | Description |
| --- | --- |
| **Match Shards to Nodes** | The number of shards should align with the number of nodes in your cluster to balance resource utilization and query performance. |
| **Plan for Scalability** | Start with at least **2 shards per node** to allow room for future growth. |
| **Future-Proofing** | Starting with around **12 shards** is a good rule of thumb. This setup allows your system to scale seamlessly from 1 to 12 nodes without requiring re-sharding. |

Learn more about [**Sharding in Distributed Deployment**](https://qdrant.tech/documentation/guides/distributed_deployment/)

* * *

## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#query-optimization) Query Optimization

![qdrant](https://qdrant.tech/articles_data/vector-search-resource-optimization/query.png)

Improving vector database performance is critical when dealing with large datasets and complex queries. By leveraging techniques like **filtering**, **batch processing**, **reranking**, **rescoring**, and **oversampling**, you can ensure fast response times and maintain efficiency even at scale.

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#filtering) Filtering

Filtering restricts query results to points whose payload matches the given conditions. By limiting the candidate set, you can significantly reduce response time and improve performance.

Qdrant’s filterable vector index solves the pre- and post-filtering problems by adding specialized links to the search graph. It aims to maintain the speed advantages of vector search while allowing for precise filtering, addressing the inefficiencies that can occur when applying filters after the vector search.

**Example:**

```python
results = client.search(
    collection_name="my_collection",
    query_vector=[0.1, 0.2, 0.3],
    query_filter=models.Filter(must=[
        models.FieldCondition(
            key="category",
            match=models.MatchValue(value="my-category-name"),
        )
    ]),
    limit=10,
)
```

**Figure 7:** The filterable vector index adds specialized links to the search graph to speed up traversal.

![filterable-vector-index](https://qdrant.tech/articles_data/vector-search-resource-optimization/filterable-vector-index.png)

[**Filterable vector index**](https://qdrant.tech/documentation/concepts/indexing/): This technique builds additional links **(orange)** between leftover data points. The filtered points which stay behind are now traversable once again. Qdrant uses special category-based methods to connect these data points.

* * *

Read more in the [**Filtering Docs**](https://qdrant.tech/documentation/concepts/filtering/) and check out the [**Complete Filtering Guide**](https://qdrant.tech/articles/vector-search-filtering/).

* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#batch-processing) Batch Processing

Batch processing consolidates multiple operations into a single execution cycle, reducing request overhead and enhancing throughput. It’s an effective strategy for both data insertion and query execution.
![batch-processing](https://qdrant.tech/articles_data/vector-search-resource-optimization/batch-processing.png)

**Batch Insertions**: Instead of inserting vectors individually, group them into medium-sized batches to minimize the number of database requests and the overhead of frequent writes.

**Example:**

```python
vectors = [
    [.1, .0, .0, .0],
    [.0, .1, .0, .0],
    [.0, .0, .1, .0],
    [.0, .0, .0, .1],
    # …
]

client.upload_collection(
    collection_name="test_collection",
    vectors=vectors,
)
```

This reduces write operations and ensures faster data ingestion.

**Batch Queries**: Similarly, you can batch multiple queries together rather than executing them one by one. This reduces the number of round trips to the database, optimizing performance and reducing latency.

**Example:**

```python
results = client.search_batch(
    collection_name="test_collection",
    requests=[
        SearchRequest(
            vector=[0., 0., 2., 0.],
            limit=1,
        ),
        SearchRequest(
            vector=[0., 0., 0., 0.01],
            with_vector=True,
            limit=2,
        ),
    ],
)
```

Batch queries are particularly useful when processing a large number of similar queries or when handling multiple user requests simultaneously.

* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#hybrid-search) Hybrid Search

Hybrid search combines **keyword filtering** with **vector similarity search**, enabling faster and more precise results. Keywords help narrow down the dataset quickly, while vector similarity ensures semantic accuracy. This search method combines [**dense and sparse vectors**](https://qdrant.tech/documentation/concepts/vectors/).

Hybrid search in Qdrant uses both fusion and reranking. The former is about combining the results from different search methods, based solely on the scores returned by each method. That usually involves some normalization, as the scores returned by different methods might be in different ranges.

**Figure 8**: Hybrid Search Architecture

![hybrid-search](https://qdrant.tech/articles_data/vector-search-resource-optimization/hybrid-search.png)

After that, there is a formula that takes the relevancy measures and calculates the final score that we use later on to reorder the documents. Qdrant has built-in support for the Reciprocal Rank Fusion method, which is the de facto standard in the field.

* * *

Learn more about [**Hybrid Search**](https://qdrant.tech/articles/hybrid-search/) and read our [**Hybrid Queries docs**](https://qdrant.tech/documentation/concepts/hybrid-queries/).

* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#oversampling) Oversampling

Oversampling is a technique that helps compensate for any precision lost due to quantization. Since quantization simplifies vectors, some relevant matches could be missed in the initial search. To avoid this, you can **retrieve more candidates**, increasing the chances that the most relevant vectors make it into the final results.

You can control the number of extra candidates by setting an `oversampling` parameter. For example, if your desired number of results ( `limit`) is 4 and you set an `oversampling` factor of 2, Qdrant will retrieve 8 candidates (4 × 2). You can adjust the oversampling factor to control how many extra vectors Qdrant includes in the initial pool. More candidates mean a better chance of obtaining high-quality top-K results, especially after rescoring with the original vectors.

* * *

Learn more about [**Oversampling**](https://qdrant.tech/articles/what-is-vector-quantization/#2-oversampling).
* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#rescoring) Rescoring

After oversampling to gather more potential matches, each candidate is re-evaluated to ensure higher accuracy and relevance to the query. The rescoring process maps the quantized candidates back to their corresponding original vectors and re-scores them with full precision, recovering accuracy that was lost in the initial quantized search.

**Example of Rescoring and Oversampling:**

```python
client.query_points(
    collection_name="my_collection",
    query=[0.22, -0.01, -0.98, 0.37],
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(
            rescore=True,    # Enables rescoring with original vectors
            oversampling=2,  # Retrieves extra candidates for rescoring
        )
    ),
    limit=4,  # Desired number of final results
)
```

* * *

Learn more about [**Rescoring**](https://qdrant.tech/articles/what-is-vector-quantization/#3-rescoring-with-original-vectors).

* * *

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#reranking) Reranking

Reranking adjusts the order of search results based on additional criteria, ensuring the most relevant results are prioritized. This method is about taking the results from different search methods and reordering them based on some additional processing using the content of the documents, not just the scores. This processing may rely on an additional neural model, such as a cross-encoder, which would be too inefficient to run on the whole dataset.

![reranking](https://qdrant.tech/articles_data/vector-search-resource-optimization/reranking.png)

These methods are practically applicable only when used on a smaller subset of candidates returned by the faster search methods. Late interaction models, such as ColBERT, are way more efficient in this case, as they can be used to rerank the candidates without the need to access all the documents in the collection.

**Example:**

```python
client.query_points(
    "collection-name",
    prefetch=prefetch,    # Previous results
    query=late_vectors,   # ColBERT-converted query
    using="colbertv2.0",
    with_payload=True,
    limit=10,
)
```

* * *

Learn more about [**Reranking**](https://qdrant.tech/documentation/search-precision/reranking-hybrid-search/#rerank).

* * *

## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#storage-disk-vs-ram) Storage: Disk vs RAM

![disk](https://qdrant.tech/articles_data/vector-search-resource-optimization/disk.png)

| Storage | Description |
| --- | --- |
| **RAM** | Crucial for fast access to frequently used data, such as indexed vectors. The amount of RAM required can be estimated based on your dataset size and dimensionality (a common rule of thumb is `number_of_vectors × dimensions × 4 bytes × 1.5`). For example, storing **1 million vectors with 1024 dimensions** would require approximately **5.72 GB of RAM**. |
| **Disk** | Suitable for less frequently accessed data, such as payloads and non-critical information. Disk-backed storage reduces memory demands but can introduce slight latency. |

#### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#which-disk-type) Which Disk Type?

**Local SSDs** are recommended for optimal performance, as they provide the fastest query response times with minimal latency.
While network-attached storage is also viable, it typically introduces additional latency that can affect performance, so local SSDs are preferred when possible, particularly for workloads requiring high-speed random access. #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#memory-management-for-vectors-and-payload) Memory Management for Vectors and Payload As your data scales, effective resource management becomes crucial to keeping costs low while ensuring your application remains reliable and performant. One of the key areas to focus on is **memory management**. Understanding how Qdrant handles memory can help you make informed decisions about scaling your vector database. Qdrant supports two main methods for storing vectors: #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#1-in-memory-storage) 1\. In-Memory Storage - **How it works**: All data is stored in RAM, providing the fastest access times for queries and operations. - **When to use it**: This setup is ideal for applications where performance is critical, and your RAM capacity can accommodate all data. - **Advantages**: Maximum speed for queries and updates. - **Limitations**: RAM usage can become a bottleneck as your dataset grows. #### [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#2-memmap-storage) 2\. Memmap Storage - **How it works**: Instead of loading all data into memory, memmap storage maps data files directly to a virtual address space on disk. The system’s page cache handles data access, making it highly efficient. - **When to use it**: Perfect for storing large collections that exceed your available RAM while still maintaining near in-memory performance when enough RAM is available. - **Advantages**: Balances performance and memory usage, allowing you to work with datasets larger than your physical RAM. - **Limitations**: Slightly slower than pure in-memory storage but significantly more scalable. To enable memmap vector storage in Qdrant, you can set the **on\_disk** parameter to `true` when creating or updating a collection. ```python client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( … on_disk=True ) ) ``` To do the same for payloads: ```python client.create_collection( collection_name="{collection_name}", on_disk_payload= True ) ``` The general guideline for selecting a storage method in Qdrant is to use **InMemory storage** when high performance is a priority, and sufficient RAM is available to accommodate the dataset. This approach ensures the fastest access speeds by keeping data readily accessible in memory. However, for larger datasets or scenarios where memory is limited, **Memmap** and **OnDisk storage** are more suitable. These methods significantly reduce memory usage by storing data on disk while leveraging advanced techniques like page caching and indexing to maintain efficient and relatively fast data access. ## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#monitoring-the-database) Monitoring the Database ![monitoring](https://qdrant.tech/articles_data/vector-search-resource-optimization/monitor.png) Continuous monitoring is essential for maintaining system health and identifying potential issues before they escalate. Tools like **Prometheus** and **Grafana** are widely used to achieve this. - **Prometheus**: An open-source monitoring and alerting toolkit, Prometheus collects and stores metrics in a time-series database. 
It scrapes metrics from predefined endpoints and supports powerful querying and visualization capabilities.

- **Grafana**: Often paired with Prometheus, Grafana provides an intuitive interface for visualizing metrics and creating interactive dashboards.

Qdrant exposes metrics in the **Prometheus/OpenMetrics** format through the `/metrics` endpoint. Prometheus can scrape this endpoint to monitor various aspects of the Qdrant system. For a local Qdrant instance, the metrics endpoint is typically available at:

```text
http://localhost:6333/metrics
```

* * *

Here are some important metrics to monitor:

| **Metric Name** | **Meaning** |
| --- | --- |
| collections\_total | Total number of collections |
| collections\_vector\_total | Total number of vectors in all collections |
| rest\_responses\_avg\_duration\_seconds | Average response duration in REST API |
| grpc\_responses\_avg\_duration\_seconds | Average response duration in gRPC API |
| rest\_responses\_fail\_total | Total number of failed responses (REST) |

Read more about [**Qdrant Open Source Monitoring**](https://qdrant.tech/documentation/guides/monitoring/) and [**Qdrant Cloud Monitoring**](https://qdrant.tech/documentation/cloud/cluster-monitoring/) for managed clusters.

* * *

## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#recap-when-should-you-optimize) Recap: When Should You Optimize?

![solutions](https://qdrant.tech/articles_data/vector-search-resource-optimization/solutions.png)

| Scenario | Description |
| --- | --- |
| **When You Scale Up** | As data and request volumes grow, optimizing resource usage ensures your systems stay responsive and cost-efficient, even under heavy loads. |
| **If Facing Budget Constraints** | Strike the perfect balance between performance and cost, cutting unnecessary expenses while maintaining essential capabilities. |
| **You Need Better Performance** | If you’re noticing slow query speeds, latency issues, or frequent timeouts, it’s time to fine-tune your resource allocation. |
| **When System Stability is Paramount** | To manage high-traffic environments, you will need to prevent crashes or failures caused by resource exhaustion. |

## [Anchor](https://qdrant.tech/articles/vector-search-resource-optimization/\#get-the-cheatsheet) Get the Cheatsheet

Want to download a printer-friendly version of this guide? [**Download it now**](https://try.qdrant.tech/resource-optimization-guide).

[![downloadable vector search resource optimization guide](https://qdrant.tech/articles_data/vector-search-resource-optimization/downloadable-guide.jpg)](https://try.qdrant.tech/resource-optimization-guide)
<|page-123-lllmstxt|> ## immutable-data-structures

---

# Qdrant Internals: Immutable Data Structures

Andrey Vasnetsov · August 20, 2024

![Qdrant Internals: Immutable Data Structures](https://qdrant.tech/articles_data/immutable-data-structures/preview/title.jpg)

## [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#data-structures-101) Data Structures 101

Those who took programming courses might remember that there is no such thing as a universal data structure. Some structures are good at accessing elements by index (like arrays), while others shine in terms of insertion efficiency (like linked lists).

![Hardware-optimized data structure](https://qdrant.tech/articles_data/immutable-data-structures/hardware-optimized.png)

Hardware-optimized data structure

However, when we move from theoretical data structures to real-world systems, and particularly in performance-critical areas such as [vector search](https://qdrant.tech/use-cases/), things become more complex. [Big-O notation](https://en.wikipedia.org/wiki/Big_O_notation) provides a good abstraction, but it doesn’t account for the realities of modern hardware: cache misses, memory layout, disk I/O, and other low-level considerations that influence actual performance.

> From the perspective of hardware efficiency, the ideal data structure is a contiguous array of bytes that can be read sequentially in a single thread. This scenario allows hardware optimizations like prefetching, caching, and branch prediction to operate at their best.

However, real-world use cases require more complex structures to perform various operations like insertion, deletion, and search. These requirements increase complexity and introduce performance trade-offs.

### [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#mutability) Mutability

One of the most significant challenges when working with data structures is ensuring **mutability — the ability to change the data structure after it’s created**, particularly with fast update operations.

Let’s consider a simple example: we want to iterate over items in sorted order. Without a mutability requirement, we can use a simple array and sort it once. This is very close to our ideal scenario. We can even put the structure on disk - which is trivial for an array.

However, if we need to insert an item into this array, **things get more complicated**. Inserting into a sorted array requires shifting all elements after the insertion point, which leads to linear time complexity for each insertion, which is not acceptable for many applications.

To handle such cases, more complex structures like [B-trees](https://en.wikipedia.org/wiki/B-tree) come into play. B-trees are specifically designed to optimize both insertion and read operations for large data sets. However, they sacrifice the raw speed of array reads for better insertion performance.
Here’s a benchmark that illustrates the difference between iterating over a plain array and a BTreeSet in Rust:

```rust
use std::collections::BTreeSet;

use rand::Rng;

fn main() {
    // Benchmark plain vector VS btree in a task of iteration over all elements
    let mut rand = rand::thread_rng();
    let vector: Vec<u32> = (0..1_000_000).map(|_| rand.gen::<u32>()).collect();
    let btree: BTreeSet<u32> = vector.iter().copied().collect();

    {
        // Sum into a u64 so the accumulation cannot overflow
        let mut sum: u64 = 0;
        for el in vector {
            sum += el as u64;
        }
    } // Elapsed: 850.924µs

    {
        let mut sum: u64 = 0;
        for el in btree {
            sum += el as u64;
        }
    } // Elapsed: 5.213025ms, ~6x slower
}
```

[Vector databases](https://qdrant.tech/), like Qdrant, have to deal with a large variety of data structures. If we could make them immutable, it would significantly improve performance and optimize memory usage.

## [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#how-does-immutability-help) How Does Immutability Help?

A large part of the immutable advantage comes from the fact that we know the exact data we need to put into the structure even before we start building it. The simplest example is a sorted array: we would know exactly how many elements we have to put into the array, so we can allocate the exact amount of memory once.

More complex data structures might require additional statistics to be collected before the structure is built. A Qdrant-related example of this is [Scalar Quantization](https://qdrant.tech/articles/scalar-quantization/#conversion-to-integers): in order to select proper quantization levels, we have to know the distribution of the data.

![Scalar Quantization Quantile](https://qdrant.tech/articles_data/immutable-data-structures/quantization-quantile.png)

Scalar Quantization Quantile

Computing this distribution requires knowing all the data in advance, but once we have it, applying scalar quantization is a simple operation.

Let’s take a look at a non-exhaustive list of data structures and potential improvements we can get from making them immutable:

| Function | Mutable Data Structure | Immutable Alternative | Potential improvements |
| --- | --- | --- | --- |
| Read by index | Array | Fixed chunk of memory | Allocate exact amount of memory |
| Vector Storage | Array of Arrays | Memory-mapped file | Offload data to disk |
| Read sorted ranges | B-Tree | Sorted Array | Store all data close, avoid cache misses |
| Read by key | Hash Map | Hash Map with Perfect Hashing | Avoid hash collisions |
| Get documents by keyword | Inverted Index | Inverted Index with Sorted and BitPacked Postings | Less memory usage, faster search |
| Vector Search | HNSW graph | HNSW graph with payload-aware connections | Better precision with filters |
| Tenant Isolation | Vector Storage | Defragmented Vector Storage | Faster access to on-disk data |

For more info on payload-aware connections in HNSW, read our [previous article](https://qdrant.tech/articles/filtrable-hnsw/).

This time around, we will focus on the latest additions to Qdrant:

- **the immutable hash map with perfect hashing**
- **defragmented vector storage**.

### [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#perfect-hashing) Perfect Hashing

A hash table is one of the most commonly used data structures implemented in almost every programming language, including Rust. It provides fast access to elements by key, with an average time complexity of O(1) for read and write operations.

There is, however, an assumption that must be satisfied for the hash table to work efficiently: _hash collisions should not cause too much overhead_. In a hash table, each key is mapped to a “bucket,” a slot where the value is stored. When different keys map to the same bucket, a collision occurs. In regular mutable hash tables, collisions are minimized by:

- making the number of buckets bigger, so the probability of a collision is lower;
- using a linked list or a tree to store multiple elements with the same hash.

However, these strategies have overheads, which become more significant if we consider using high-latency storage like disk. Indeed, every read operation from disk is several orders of magnitude slower than reading from RAM, so we want to know the correct location of the data from the first attempt.

In order to achieve this, we can use a so-called minimal perfect hash function (MPHF). This special type of hash function is constructed specifically for a given set of keys, and it guarantees no collisions while using a minimal number of buckets.

In Qdrant, we decided to use a _fingerprint-based minimal perfect hash function_ implemented in the [ph crate 🦀](https://crates.io/crates/ph) by [Piotr Beling](https://dl.acm.org/doi/10.1145/3596453). According to our benchmarks, using the perfect hash function does introduce some overhead in terms of hashing time, but it significantly reduces the time for the whole operation:

| Volume | `ph::Function` | `std::hash::Hash` | `HashMap::get` |
| --- | --- | --- | --- |
| 1000 | 60ns | ~20ns | 34ns |
| 100k | 90ns | ~20ns | 220ns |
| 10M | 238ns | ~20ns | 500ns |

Even though the absolute time for hashing is higher, the time for the whole operation is lower, because PHF guarantees no collisions. The difference is even more significant when we consider disk read time, which might be up to several milliseconds (10^6 ns).

PHF RAM size scales linearly for `ph::Function`: 3.46 kB for 10k elements, 119MB for 350M elements. The construction time required to build the hash function is surprisingly low, and we only need to do it once:

| Volume | `ph::Function` (construct) | PHF size | Size of int64 keys (for reference) |
| --- | --- | --- | --- |
| 1M | 52ms | 0.34Mb | 7.62Mb |
| 100M | 7.4s | 33.7Mb | 762.9Mb |

The usage of PHF in Qdrant lets us minimize the latency of cold reads, which is especially important for large-scale multi-tenant systems. With PHF, it is enough to read a single page from a disk to get the exact location of the data.

### [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#defragmentation) Defragmentation

When you read data from a disk, you almost never read a single byte. Instead, you read a page, which is a fixed-size chunk of data.
On many systems, the page size is 4KB, which means that every read operation will read 4KB of data, even if you only need a single byte. Vector search, on the other hand, requires reading a lot of small vectors, which might create a large overhead. It is especially noticeable if we use binary quantization, where the size of even large OpenAI 1536d vectors is compressed down to **192 bytes**. ![Overhead when reading a single vector](https://qdrant.tech/articles_data/immutable-data-structures/page-vector.png) Overhead when reading single vector That means if the vectors we access during the search are randomly scattered across the disk, we will have to read 4KB for each vector, which is 20 times more than the actual data size. There is, however, a simple way to avoid this overhead: **defragmentation**. If we knew some additional information about the data, we could combine all relevant vectors into a single page. ![Defragmentation](https://qdrant.tech/articles_data/immutable-data-structures/defragmentation.png) Defragmentation This additional information is available to Qdrant via the [payload index](https://qdrant.tech/documentation/concepts/indexing/#payload-index). By specifying the payload index, which is going to be used for filtering most of the time, we can put all vectors with the same payload together. This way, reading a single page will also read nearby vectors, which will be used in the search. This approach is especially efficient for [multi-tenant systems](https://qdrant.tech/documentation/guides/multiple-partitions/), where only a small subset of vectors is actively used for search. The capacity of such a deployment is typically defined by the size of the hot subset, which is much smaller than the total number of vectors. > Grouping relevant vectors together allows us to optimize the size of the hot subset by avoiding caching of irrelevant data. > The following benchmark data compares RPS for defragmented and non-defragmented storage: | % of hot subset | Tenant Size (vectors) | RPS, Non-defragmented | RPS, Defragmented | | --- | --- | --- | --- | | 2.5% | 50k | 1.5 | 304 | | 12.5% | 50k | 0.47 | 279 | | 25% | 50k | 0.4 | 63 | | 50% | 50k | 0.3 | 8 | | 2.5% | 5k | 56 | 490 | | 12.5% | 5k | 5.8 | 488 | | 25% | 5k | 3.3 | 490 | | 50% | 5k | 3.1 | 480 | | 75% | 5k | 2.9 | 130 | | 100% | 5k | 2.7 | 95 | **Dataset size:** 2M 768d vectors (~6Gb Raw data), binary quantization, 650Mb of RAM limit. All benchmarks are made with minimal RAM allocation to demonstrate disk cache efficiency. As you can see, the biggest impact is on the small tenant size, where defragmentation allows us to achieve **100x more RPS**. Of course, the real-world impact of defragmentation depends on the specific workload and the size of the hot subset, but enabling this feature can significantly improve the performance of Qdrant. Please find more details on how to enable defragmentation in the [indexing documentation](https://qdrant.tech/documentation/concepts/indexing/#tenant-index). ## [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#updating-immutable-data-structures) Updating Immutable Data Structures One may wonder how Qdrant allows updating collection data if everything is immutable. Indeed, [Qdrant API](https://api.qdrant.tech/) allows the change of any vector or payload at any time, so from the user’s perspective, the whole collection is mutable at any time. As it usually happens with every decent magic trick, the secret is disappointingly simple: not all data in Qdrant is immutable. 
In Qdrant, storage is divided into segments, which can be either mutable or immutable. New data is always written to the mutable segment, which is later converted to the immutable one by the optimization process.

![Optimization process](https://qdrant.tech/articles_data/immutable-data-structures/optimization.png)

Optimization process

If we need to update the data in the immutable or currently optimized segment, instead of changing the data in place, we perform a copy-on-write operation, move the data to the mutable segment, and update it there. Data in the original segment is marked as deleted, and later vacuumed by the optimization process.

## [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#downsides-and-how-to-compensate) Downsides and How to Compensate

While immutable data structures are great for read-heavy operations, they come with trade-offs:

- **Higher update costs:** Immutable structures are less efficient for updates. The amortized time complexity might be the same as mutable structures, but the constant factor is higher.
- **Rebuilding overhead:** In some cases, we may need to rebuild indices or structures for the same data more than once.
- **Read-heavy workloads:** Immutability assumes a search-heavy workload, which is typical for search engines but not for all applications.

In Qdrant, we mitigate these downsides by allowing the user to adapt the system to their specific workload. For example, changing the default size of the segment might help to reduce the overhead of rebuilding indices. In extreme cases, multi-segment storage can act as a single segment, falling back to the mutable data structure when needed.

## [Anchor](https://qdrant.tech/articles/immutable-data-structures/\#conclusion) Conclusion

Immutable data structures, while tricky to implement correctly, offer significant performance gains, especially for read-heavy systems like search engines. They allow us to take full advantage of hardware optimizations, reduce memory overhead, and improve cache performance.

In Qdrant, the combination of techniques like perfect hashing and defragmentation brings further benefits, making our vector search operations faster and more efficient. While there are trade-offs, the flexibility of Qdrant’s architecture — including segment-based storage — allows us to balance the best of both worlds.
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/immutable-data-structures.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-124-lllmstxt|> ## natural-language-search-oracle-cloud-infrastructure-cohere-langchain - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - RAG System for Employee Onboarding --- # [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#rag-system-for-employee-onboarding) RAG System for Employee Onboarding Public websites are a great way to share information with a wide audience. However, finding the right information can be challenging, if you are not familiar with the website’s structure or the terminology used. That’s what the search bar is for, but it is not always easy to formulate a query that will return the desired results, if you are not yet familiar with the content. This is even more important in a corporate environment, and for the new employees, who are just starting to learn the ropes, and don’t even know how to ask the right questions yet. You may have even the best intranet pages, but onboarding is more than just reading the documentation, it is about understanding the processes. Semantic search can help with finding right resources easier, but wouldn’t it be easier to just chat with the website, like you would with a colleague? Technological advancements have made it possible to interact with websites using natural language. This tutorial will guide you through the process of integrating [Cohere](https://cohere.com/)’s language models with Qdrant to enable natural language search on your documentation. We are going to use [LangChain](https://langchain.com/) as an orchestrator. Everything will be hosted on [Oracle Cloud Infrastructure (OCI)](https://www.oracle.com/cloud/), so you can scale your application as needed, and do not send your data to third parties. That is especially important when you are working with confidential or sensitive data. ## [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#building-up-the-application) Building up the application Our application will consist of two main processes: indexing and searching. Langchain will glue everything together, as we will use a few components, including Cohere and Qdrant, as well as some OCI services. Here is a high-level overview of the architecture: ![Architecture diagram of the target system](https://qdrant.tech/documentation/examples/faq-oci-cohere-langchain/architecture-diagram.png) ### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#prerequisites) Prerequisites Before we dive into the implementation, make sure to set up all the necessary accounts and tools. #### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#libraries) Libraries We are going to use a few Python libraries. Of course, Langchain will be our main framework, but the Cohere models on OCI are accessible via the [OCI SDK](https://docs.oracle.com/en-us/iaas/tools/python/2.125.1/). 
Let’s install all the necessary libraries: ```shell pip install langchain oci qdrant-client langchainhub ``` #### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#oracle-cloud) Oracle Cloud Our application will be fully running on Oracle Cloud Infrastructure (OCI). It’s up to you to choose how you want to deploy your application. Qdrant Hybrid Cloud will be running in your [Kubernetes cluster running on Oracle Cloud\\ (OKE)](https://www.oracle.com/cloud/cloud-native/container-engine-kubernetes/), so all the processes might be also deployed there. You can get started with signing up for an account on [Oracle Cloud](https://signup.cloud.oracle.com/). Cohere models are available on OCI as a part of the [Generative AI\\ Service](https://www.oracle.com/artificial-intelligence/generative-ai/generative-ai-service/). We need both the [Generation models](https://docs.oracle.com/en-us/iaas/Content/generative-ai/use-playground-generate.htm) and the [Embedding models](https://docs.oracle.com/en-us/iaas/Content/generative-ai/use-playground-embed.htm). Please follow the linked tutorials to grasp the basics of using Cohere models there. Accessing the models programmatically requires knowing the compartment OCID. Please refer to the [documentation that\\ describes how to find it](https://docs.oracle.com/en-us/iaas/Content/GSG/Tasks/contactingsupport_topic-Locating_Oracle_Cloud_Infrastructure_IDs.htm#Finding_the_OCID_of_a_Compartment). For the further reference, we will assume that the compartment OCID is stored in the environment variable: shellpython ```shell export COMPARTMENT_OCID="" ``` ```python import os os.environ["COMPARTMENT_OCID"] = "" ``` #### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#qdrant-hybrid-cloud) Qdrant Hybrid Cloud Qdrant Hybrid Cloud running on Oracle Cloud helps you build a solution without sending your data to external services. Our documentation provides a step-by-step guide on how to [deploy Qdrant Hybrid Cloud on Oracle\\ Cloud](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#oracle-cloud-infrastructure). Qdrant will be running on a specific URL and access will be restricted by the API key. Make sure to store them both as environment variables as well: ```shell export QDRANT_URL="https://qdrant.example.com" export QDRANT_API_KEY="your-api-key" ``` _Optional:_ Whenever you use LangChain, you can also [configure LangSmith](https://docs.smith.langchain.com/), which will help us trace, monitor and debug LangChain applications. You can sign up for LangSmith [here](https://smith.langchain.com/). ```shell export LANGCHAIN_TRACING_V2=true export LANGCHAIN_API_KEY="your-api-key" export LANGCHAIN_PROJECT="your-project" # if not specified, defaults to "default" ``` Now you can get started: ```python import os os.environ["QDRANT_URL"] = "https://qdrant.example.com" os.environ["QDRANT_API_KEY"] = "your-api-key" ``` Let’s create the collection that will store the indexed documents. We will use the `qdrant-client` library, and our collection will be named `oracle-cloud-website`. Our embedding model, `cohere.embed-english-v3.0`, produces embeddings of size 1024, and we have to specify that when creating the collection. 
```python from qdrant_client import QdrantClient, models client = QdrantClient( location=os.environ.get("QDRANT_URL"), api_key=os.environ.get("QDRANT_API_KEY"), ) client.create_collection( collection_name="oracle-cloud-website", vectors_config=models.VectorParams( size=1024, distance=models.Distance.COSINE, ), ) ``` ### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#indexing-process) Indexing process We have all the necessary tools set up, so let’s start with the indexing process. We will use the Cohere Embedding models to convert the text into vectors, and then store them in Qdrant. Langchain is integrated with OCI Generative AI Service, so we can easily access the models. Our dataset will be fairly simple, as it will consist of the questions and answers from the [Oracle Cloud Free Tier\\ FAQ page](https://www.oracle.com/cloud/free/faq/). ![Some examples of the Oracle Cloud FAQ](https://qdrant.tech/documentation/examples/faq-oci-cohere-langchain/oracle-faq.png) Questions and answers are presented in an HTML format, but we don’t want to manually extract the text and adapt it for each subpage. Instead, we will use the `WebBaseLoader` that just loads the HTML content from given URL and converts it to text. ```python from langchain_community.document_loaders.web_base import WebBaseLoader loader = WebBaseLoader("https://www.oracle.com/cloud/free/faq/") documents = loader.load() ``` Our `documents` is a list with just a single element, which is the text of the whole page. We need to split it into meaningful parts, so we will use the `RecursiveCharacterTextSplitter` component. It will try to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text. The chunk size and overlap are both parameters that can be adjusted to fit the specific use case. ```python from langchain_text_splitters import RecursiveCharacterTextSplitter splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=100) split_documents = splitter.split_documents(documents) ``` Our documents might be now indexed, but we need to convert them into vectors. Let’s configure the embeddings so the `cohere.embed-english-v3.0` is used. Not all the regions support the Generative AI Service, so we need to specify the region where the models are stored. We will use the `us-chicago-1`, but please check the [documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions) for the most up-to-date list of supported regions. ```python from langchain_community.embeddings.oci_generative_ai import OCIGenAIEmbeddings embeddings = OCIGenAIEmbeddings( model_id="cohere.embed-english-v3.0", service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com", compartment_id=os.environ.get("COMPARTMENT_OCID"), ) ``` Now we can embed the documents and store them in Qdrant. We will create an instance of `Qdrant` and add the split documents to the collection. ```python from langchain.vectorstores.qdrant import Qdrant qdrant = Qdrant( client=client, collection_name="oracle-cloud-website", embeddings=embeddings, ) qdrant.add_documents(split_documents, batch_size=20) ``` Our documents should be now indexed and ready for searching. Let’s move to the next step. 
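As a quick, optional sanity check, you can run a similarity search directly against the vector store before wiring up the full chain. This is only a minimal sketch: `similarity_search` comes from LangChain's vector store interface, and the query string below is just an example.

```python
# Ask the vector store for the chunks closest to an example question
results = qdrant.similarity_search("Are the Always Free services really free?", k=2)
for document in results:
    print(document.page_content[:100], "...")
```

If the printed snippets mention the Free Tier, the indexing step worked as expected.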
### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#speaking-to-the-website) Speaking to the website The intended method of interaction with the website is through the chatbot. Large Language Model, in our case [Cohere\\ Command](https://cohere.com/command), will be answering user’s questions based on the relevant documents that Qdrant will return using the question as a query. Our LLM is also hosted on OCI, so we can access it similarly to the embedding model: ```python from langchain_community.llms.oci_generative_ai import OCIGenAI llm = OCIGenAI( model_id="cohere.command", service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com", compartment_id=os.environ.get("COMPARTMENT_OCID"), ) ``` Connection to Qdrant might be established in the same way as we did during the indexing process. We can use it to create a retrieval chain, which implements the question-answering process. The retrieval chain also requires an additional chain that will combine retrieved documents before sending them to an LLM. ```python from langchain.chains.combine_documents import create_stuff_documents_chain from langchain.chains.retrieval import create_retrieval_chain from langchain import hub retriever = qdrant.as_retriever() combine_docs_chain = create_stuff_documents_chain( llm=llm, # Default prompt is loaded from the hub, but we can also modify it prompt=hub.pull("langchain-ai/retrieval-qa-chat"), ) retrieval_qa_chain = create_retrieval_chain( retriever=retriever, combine_docs_chain=combine_docs_chain, ) response = retrieval_qa_chain.invoke({"input": "What is the Oracle Cloud Free Tier?"}) ``` The output of the `.invoke` method is a dictionary-like structure with the query and answer, but we can also access the source documents used to generate the response. This might be useful for debugging or for further processing. ```python { 'input': 'What is the Oracle Cloud Free Tier?', 'context': [\ Document(\ page_content='* Free Tier is generally available in regions where commercial Oracle Cloud Infrastructure service is available. See the data regions page for detailed service availability (the exact regions available for Free Tier may differ during the sign-up process). The US$300 cloud credit is available in',\ metadata={\ 'language': 'en-US',\ 'source': 'https://www.oracle.com/cloud/free/faq/',\ 'title': "FAQ on Oracle's Cloud Free Tier",\ '_id': 'c8cf98e0-4b88-4750-be42-4157495fed2c',\ '_collection_name': 'oracle-cloud-website'\ }\ ),\ Document(\ page_content='Oracle Cloud Free Tier allows you to sign up for an Oracle Cloud account which provides a number of Always Free services and a Free Trial with US$300 of free credit to use on all eligible Oracle Cloud Infrastructure services for up to 30 days. The Always Free services are available for an unlimited',\ metadata={\ 'language': 'en-US',\ 'source': 'https://www.oracle.com/cloud/free/faq/',\ 'title': "FAQ on Oracle's Cloud Free Tier",\ '_id': 'dc291430-ff7b-4181-944a-39f6e7a0de69',\ '_collection_name': 'oracle-cloud-website'\ }\ ),\ Document(\ page_content='Oracle Cloud Free Tier does not include SLAs. Community support through our forums is available to all customers. Customers using only Always Free resources are not eligible for Oracle Support. Limited support is available for Oracle Cloud Free Tier with Free Trial credits. 
After you use all of',\ metadata={\ 'language': 'en-US',\ 'source': 'https://www.oracle.com/cloud/free/faq/',\ 'title': "FAQ on Oracle's Cloud Free Tier",\ '_id': '9e831039-7ccc-47f7-9301-20dbddd2fc07',\ '_collection_name': 'oracle-cloud-website'\ }\ ),\ Document(\ page_content='looking to test things before moving to cloud, a student wanting to learn, or an academic developing curriculum in the cloud, Oracle Cloud Free Tier enables you to learn, explore, build and test for free.',\ metadata={\ 'language': 'en-US',\ 'source': 'https://www.oracle.com/cloud/free/faq/',\ 'title': "FAQ on Oracle's Cloud Free Tier",\ '_id': 'e2dc43e1-50ee-4678-8284-6df60a835cf5',\ '_collection_name': 'oracle-cloud-website'\ }\ )\ ], 'answer': ' Oracle Cloud Free Tier is a subscription that gives you access to Always Free services and a Free Trial with $300 of credit that can be used on all eligible Oracle Cloud Infrastructure services for up to 30 days. \n\nThrough this Free Tier, you can learn, explore, build, and test for free. It is aimed at those who want to experiment with cloud services before making a commitment, as wellTheir use cases range from testing prior to cloud migration to learning and academic curriculum development. ' } ``` #### [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#other-experiments) Other experiments Asking the basic questions is just the beginning. What you want to avoid is a hallucination, where the model generates an answer that is not based on the actual content. The default prompt of Langchain should already prevent this, but you might still want to check it. Let’s ask a question that is not directly answered on the FAQ page: ```python response = retrieval_qa.invoke({ "input": "Is Oracle Generative AI Service included in the free tier?" }) ``` Output: > Oracle Generative AI Services are not specifically mentioned as being available in the free tier. As per the text, the > $300 free credit can be used on all eligible services for up to 30 days. To confirm if Oracle Generative AI Services > are included in the free credit offer, it is best to check the official Oracle Cloud website or contact their support. It seems that Cohere Command model could not find the exact answer in the provided documents, but it tried to interpret the context and provide a reasonable answer, without making up the information. This is a good sign that the model is not hallucinating in that case. ## [Anchor](https://qdrant.tech/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain/\#wrapping-up) Wrapping up This tutorial has shown how to integrate Cohere’s language models with Qdrant to enable natural language search on your website. We have used Langchain as an orchestrator, and everything was hosted on Oracle Cloud Infrastructure (OCI). Real world would require integrating this mechanism into your organization’s systems, but we built a solid foundation that can be further developed. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/natural-language-search-oracle-cloud-infrastructure-cohere-langchain.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-125-lllmstxt|> ## what-are-embeddings - [Articles](https://qdrant.tech/articles/) - What are Vector Embeddings? - Revolutionize Your Search Experience [Back to Vector Search Manuals](https://qdrant.tech/articles/vector-search-manuals/) --- # What are Vector Embeddings? - Revolutionize Your Search Experience Sabrina Aquino · February 06, 2024 ![What are Vector Embeddings? - Revolutionize Your Search Experience](https://qdrant.tech/articles_data/what-are-embeddings/preview/title.jpg) > **Embeddings** are numerical machine learning representations of the semantic of the input data. They capture the meaning of complex, high-dimensional data, like text, images, or audio, into vectors. Enabling algorithms to process and analyze the data more efficiently. You know when you’re scrolling through your social media feeds and the content just feels incredibly tailored to you? There’s the news you care about, followed by a perfect tutorial with your favorite tech stack, and then a meme that makes you laugh so hard you snort. Or what about how YouTube recommends videos you ended up loving. It’s by creators you’ve never even heard of and you didn’t even send YouTube a note about your ideal content lineup. This is the magic of embeddings. These are the result of **deep learning models** analyzing the data of your interactions online. From your likes, shares, comments, searches, the kind of content you linger on, and even the content you decide to skip. It also allows the algorithm to predict future content that you are likely to appreciate. The same embeddings can be repurposed for search, ads, and other features, creating a highly personalized user experience. ![How embeddings are applied to perform recommendantions and other use cases](https://qdrant.tech/articles_data/what-are-embeddings/Embeddings-Use-Case.jpg) They make [high-dimensional](https://www.sciencedirect.com/topics/computer-science/high-dimensional-data) data more manageable. This reduces storage requirements, improves computational efficiency, and makes sense of a ton of **unstructured** data. ## [Anchor](https://qdrant.tech/articles/what-are-embeddings/\#why-use-vector-embeddings) Why use vector embeddings? The **nuances** of natural language or the hidden **meaning** in large datasets of images, sounds, or user interactions are hard to fit into a table. Traditional relational databases can’t efficiently query most types of data being currently used and produced, making the **retrieval** of this information very limited. In the embeddings space, synonyms tend to appear in similar contexts and end up having similar embeddings. The space is a system smart enough to understand that “pretty” and “attractive” are playing for the same team. Without being explicitly told so. That’s the magic. At their core, vector embeddings are about semantics. 
They take the idea that “a word is known by the company it keeps” and apply it on a grand scale. ![Example of how synonyms are placed closer together in the embeddings space](https://qdrant.tech/articles_data/what-are-embeddings/Similar-Embeddings.jpg) This capability is crucial for creating search systems, recommendation engines, retrieval augmented generation (RAG) and any application that benefits from a deep understanding of content. ## [Anchor](https://qdrant.tech/articles/what-are-embeddings/\#how-do-embeddings-work) How do embeddings work? Embeddings are created through neural networks. They capture complex relationships and semantics into [dense vectors](https://www1.se.cuhk.edu.hk/~seem5680/lecture/semantics-with-dense-vectors-2018.pdf) which are more suitable for machine learning and data processing applications. They can then project these vectors into a proper **high-dimensional** space, specifically, a [Vector Database](https://qdrant.tech/articles/what-is-a-vector-database/). ![The process for turning raw data into embeddings and placing them into the vector space](https://qdrant.tech/articles_data/what-are-embeddings/How-Embeddings-Work.jpg) The meaning of a data point is implicitly defined by its **position** on the vector space. After the vectors are stored, we can use their spatial properties to perform [nearest neighbor searches](https://en.wikipedia.org/wiki/Nearest_neighbor_search#:~:text=Nearest%20neighbor%20search%20%28NNS%29%2C,the%20larger%20the%20function%20values.). These searches retrieve semantically similar items based on how close they are in this space. > The quality of the vector representations drives the performance. The embedding model that works best for you depends on your use case. ### [Anchor](https://qdrant.tech/articles/what-are-embeddings/\#creating-vector-embeddings) Creating vector embeddings Embeddings translate the complexities of human language to a format that computers can understand. It uses neural networks to assign **numerical values** to the input data, in a way that similar data has similar values. ![The process of using Neural Networks to create vector embeddings](https://qdrant.tech/articles_data/what-are-embeddings/How-Do-Embeddings-Work_.jpg) For example, if I want to make my computer understand the word ‘right’, I can assign a number like 1.3. So when my computer sees 1.3, it sees the word ‘right’. Now I want to make my computer understand the context of the word ‘right’. I can use a two-dimensional vector, such as \[1.3, 0.8\], to represent ‘right’. The first number 1.3 still identifies the word ‘right’, but the second number 0.8 specifies the context. We can introduce more dimensions to capture more nuances. For example, a third dimension could represent formality of the word, a fourth could indicate its emotional connotation (positive, neutral, negative), and so on. The evolution of this concept led to the development of embedding models like [Word2Vec](https://en.wikipedia.org/wiki/Word2vec) and [GloVe](https://en.wikipedia.org/wiki/GloVe). They learn to understand the context in which words appear to generate high-dimensional vectors for each word, capturing far more complex properties. ![How Word2Vec model creates the embeddings for a word](https://qdrant.tech/articles_data/what-are-embeddings/Word2Vec-model.jpg) However, these models still have limitations. They generate a single vector per word, based on its usage across texts. This means all the nuances of the word “right” are blended into one vector representation. 
That is not enough information for computers to fully understand the context. So, how do we help computers grasp the nuances of language in different contexts? In other words, how do we differentiate between: - “your answer is right” - “turn right at the corner” - “everyone has the right to freedom of speech” Each of these sentences use the word ‘right’, with different meanings. More advanced models like [BERT](https://en.wikipedia.org/wiki/BERT_%28language_model%29) and [GPT](https://en.wikipedia.org/wiki/Generative_pre-trained_transformer) use deep learning models based on the [transformer architecture](https://arxiv.org/abs/1706.03762), which helps computers consider the full context of a word. These models pay attention to the entire context. The model understands the specific use of a word in its **surroundings**, and then creates different embeddings for each. ![How the BERT model creates the embeddings for a word](https://qdrant.tech/articles_data/what-are-embeddings/BERT-model.jpg) But how does this process of understanding and interpreting work in practice? Think of the term: “biophilic design”, for example. To generate its embedding, the transformer architecture can use the following contexts: - “Biophilic design incorporates natural elements into architectural planning.” - “Offices with biophilic design elements report higher employee well-being.” - “…plant life, natural light, and water features are key aspects of biophilic design.” And then it compares contexts to known architectural and design principles: - “Sustainable designs prioritize environmental harmony.” - “Ergonomic spaces enhance user comfort and health.” The model creates a vector embedding for “biophilic design” that encapsulates the concept of integrating natural elements into man-made environments. Augmented with attributes that highlight the correlation between this integration and its positive impact on health, well-being, and environmental sustainability. ### [Anchor](https://qdrant.tech/articles/what-are-embeddings/\#integration-with-embedding-apis) Integration with embedding APIs Selecting the right embedding model for your use case is crucial to your application performance. Qdrant makes it easier by offering seamless integration with the best selection of embedding APIs, including [Cohere](https://qdrant.tech/documentation/embeddings/cohere/), [Gemini](https://qdrant.tech/documentation/embeddings/gemini/), [Jina Embeddings](https://qdrant.tech/documentation/embeddings/jina-embeddings/), [OpenAI](https://qdrant.tech/documentation/embeddings/openai/), [Aleph Alpha](https://qdrant.tech/documentation/embeddings/aleph-alpha/), [Fastembed](https://github.com/qdrant/fastembed), and [AWS Bedrock](https://qdrant.tech/documentation/embeddings/bedrock/). If you’re looking for NLP and rapid prototyping, including language translation, question-answering, and text generation, OpenAI is a great choice. Gemini is ideal for image search, duplicate detection, and clustering tasks. Fastembed, which we’ll use on the example below, is designed for efficiency and speed, great for applications needing low-latency responses, such as autocomplete and instant content recommendations. We plan to go deeper into selecting the best model based on performance, cost, integration ease, and scalability in a future post. 
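To make this tangible, here is a minimal sketch of generating embeddings locally with FastEmbed. The model name and the sample sentences are only illustrative, and the exact API may differ slightly between FastEmbed versions.

```python
from fastembed import TextEmbedding

# Downloads and runs a small ONNX embedding model locally
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")

sentences = [
    "The weather is lovely today.",
    "It is so sunny outside!",
    "He drove to the stadium.",
]

# embed() yields one vector (a numpy array) per input sentence
embeddings = list(model.embed(sentences))
print(len(embeddings), len(embeddings[0]))  # e.g. 3 vectors of 384 dimensions
```

The two weather-related sentences will end up closer to each other in the vector space than either of them is to the third one, which is exactly the property that nearest neighbor search exploits.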
## [Anchor](https://qdrant.tech/articles/what-are-embeddings/\#create-a-neural-search-service-with-fastmbed) Create a neural search service with FastEmbed

Now that you're familiar with the core concepts around vector embeddings, how about starting to build your own [Neural Search Service](https://qdrant.tech/documentation/tutorials/neural-search/)?

The tutorial guides you through a practical application of how to use Qdrant for document management based on descriptions of companies from [startups-list.com](https://www.startups-list.com/). It covers embedding the data, integrating it with Qdrant's vector database, constructing a search API, and finally deploying your solution with FastAPI.

Check out what the final version of this project looks like on the [live online demo](https://qdrant.to/semantic-search-demo).

Let us know what you're building with embeddings! Join our [Discord](https://discord.gg/qdrant-907569970500743200) community and share your projects!

<|page-126-lllmstxt|>

## serverless

- [Articles](https://qdrant.tech/articles/)
- Serverless Semantic Search

[Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/)

---

# Serverless Semantic Search

Andre Bogus · July 12, 2023

![Serverless Semantic Search](https://qdrant.tech/articles_data/serverless/preview/title.jpg)

Do you want to insert a semantic search function into your website or online app? Now you can do so - without spending any money! In this example, you will learn how to create a free prototype search engine for your own non-commercial purposes.

## [Anchor](https://qdrant.tech/articles/serverless/\#ingredients) Ingredients

- A [Rust](https://rust-lang.org/) toolchain
- [cargo lambda](https://cargo-lambda.info/) (install via package manager, [download](https://github.com/cargo-lambda/cargo-lambda/releases) binary or `cargo install cargo-lambda`)
- The [AWS CLI](https://aws.amazon.com/cli)
- A Qdrant instance ([free tier](https://cloud.qdrant.io/) available)
- An embedding provider service of your choice (see our [Embeddings docs](https://qdrant.tech/documentation/embeddings/). You may be able to get credits from [AI Grant](https://aigrant.org/), also Cohere has a [rate-limited non-commercial free tier](https://cohere.com/pricing))
- An AWS Lambda account (12-month free tier available)

## [Anchor](https://qdrant.tech/articles/serverless/\#what-youre-going-to-build) What you're going to build

You'll combine the embedding provider and the Qdrant instance into a neat semantic search, calling both services from a small Lambda function.

![lambda integration diagram](https://qdrant.tech/articles_data/serverless/lambda_integration.png)

Now let's look at how to work with each ingredient before connecting them.
## [Anchor](https://qdrant.tech/articles/serverless/\#rust-and-cargo-lambda) Rust and cargo-lambda

You want your function to be quick, lean and safe, so using Rust is a no-brainer. To compile Rust code for use within Lambda functions, the `cargo-lambda` subcommand has been built. `cargo-lambda` can put your Rust code in a zip file that AWS Lambda can then deploy on a no-frills `provided.al2` runtime.

To interface with AWS Lambda, you will need a Rust project with the following dependencies in your `Cargo.toml`:

```toml
[dependencies]
tokio = { version = "1", features = ["macros"] }
lambda_http = { version = "0.8", default-features = false, features = ["apigw_http"] }
lambda_runtime = "0.8"
```

This gives you an interface consisting of an entry point to start the Lambda runtime and a way to register your handler for HTTP calls. Put the following snippet into `src/helloworld.rs`:

```rust
use lambda_http::{run, service_fn, Body, Error, Request, RequestExt, Response};

/// This is your callback function for responding to requests at your URL
async fn function_handler(_req: Request) -> Result<Response<Body>, Error> {
    Ok(Response::new(Body::from("Hello, Lambda!")))
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(function_handler)).await
}
```

You can also use a closure to bind other arguments to your function handler (the `service_fn` call then becomes `service_fn(|req| function_handler(req, ...))`). Also, if you want to extract parameters from the request, you can do so using the [Request](https://docs.rs/lambda_http/latest/lambda_http/type.Request.html) methods (e.g. `query_string_parameters` or `query_string_parameters_ref`).

Add the following to your `Cargo.toml` to define the binary:

```toml
[[bin]]
name = "helloworld"
path = "src/helloworld.rs"
```

On the AWS side, you need to set up a Lambda and an IAM role to use with your function.

![create lambda web page](https://qdrant.tech/articles_data/serverless/create_lambda.png)

Choose your function name, select “Provide your own bootstrap on Amazon Linux 2”. As architecture, use `arm64`. You will also activate a function URL. Here it is up to you if you want to protect it via IAM or leave it open, but be aware that open endpoints can be accessed by anyone, potentially costing money if there is too much traffic.

By default, this will also create a basic role. To look up the role, you can go into the Function overview:

![function overview](https://qdrant.tech/articles_data/serverless/lambda_overview.png)

Click on the “Info” link near the “▸ Function overview” heading, and select the “Permissions” tab on the left. You will find the “Role name” directly under _Execution role_. Note it down for later.

![function overview](https://qdrant.tech/articles_data/serverless/lambda_role.png)

To test that your “Hello, Lambda” service works, you can compile and upload the function:

```bash
$ export LAMBDA_FUNCTION_NAME=hello
$ export LAMBDA_ROLE=
$ export LAMBDA_REGION=us-east-1
$ cargo lambda build --release --arm --bin helloworld --output-format zip
Downloaded libc v0.2.137
[..] output omitted for brevity
    Finished release [optimized] target(s) in 1m 27s
$ # Delete the old empty definition
$ aws lambda delete-function-url-config --region $LAMBDA_REGION --function-name $LAMBDA_FUNCTION_NAME
$ aws lambda delete-function --region $LAMBDA_REGION --function-name $LAMBDA_FUNCTION_NAME
$ # Upload the function
$ aws lambda create-function --function-name $LAMBDA_FUNCTION_NAME \
    --handler bootstrap \
    --architectures arm64 \
    --zip-file fileb://./target/lambda/helloworld/bootstrap.zip \
    --runtime provided.al2 \
    --region $LAMBDA_REGION \
    --role $LAMBDA_ROLE \
    --tracing-config Mode=Active
$ # Add the function URL
$ aws lambda add-permission \
    --function-name $LAMBDA_FUNCTION_NAME \
    --action lambda:InvokeFunctionUrl \
    --principal "*" \
    --function-url-auth-type "NONE" \
    --region $LAMBDA_REGION \
    --statement-id url
$ # Here for simplicity unauthenticated URL access. Beware!
$ aws lambda create-function-url-config \
    --function-name $LAMBDA_FUNCTION_NAME \
    --region $LAMBDA_REGION \
    --cors "AllowOrigins=*,AllowMethods=*,AllowHeaders=*" \
    --auth-type NONE
```

Now you can go to your _Function Overview_ and click on the Function URL. You should see something like this:

```text
Hello, Lambda!
```

You have set up a Lambda function in Rust. On to the next ingredient:

## [Anchor](https://qdrant.tech/articles/serverless/\#embedding) Embedding

Most providers supply a simple HTTPS GET or POST interface you can use with an API key, which you have to supply in an authentication header. If you are using this for non-commercial purposes, the rate-limited trial key from Cohere is just a few clicks away. Go to [their welcome page](https://dashboard.cohere.ai/welcome/register), register, and you'll be able to get to the dashboard, which has an “API keys” menu entry which will bring you to the following page:

![cohere dashboard](https://qdrant.tech/articles_data/serverless/cohere-dashboard.png)

From there you can click on the ⎘ symbol next to your API key to copy it to the clipboard. _Don't put your API key in the code!_ Instead, read it from an env variable you can set in the Lambda environment. This avoids accidentally putting your key into a public repo.

Now all you need to get embeddings is a bit of code. First you need to extend your dependencies with `reqwest`, and also add `anyhow` for easier error handling and `serde` with the `derive` feature for deserializing the response:

```toml
anyhow = "1.0"
reqwest = { version = "0.11.18", default-features = false, features = ["json", "rustls-tls"] }
serde = { version = "1.0", features = ["derive"] }
```

Now, given the API key from above, you can make a call to get the embedding vectors:

```rust
use anyhow::Result;
use serde::Deserialize;
use reqwest::Client;

#[derive(Deserialize)]
struct CohereResponse { outputs: Vec<Vec<f32>> }

pub async fn embed(client: &Client, text: &str, api_key: &str) -> Result<Vec<Vec<f32>>> {
    let CohereResponse { outputs } = client
        .post("https://api.cohere.ai/embed")
        .header("Authorization", &format!("Bearer {api_key}"))
        .header("Content-Type", "application/json")
        .header("Cohere-Version", "2021-11-08")
        .body(format!("{{\"text\":[\"{text}\"],\"model\":\"small\"}}"))
        .send()
        .await?
        .json()
        .await?;
    Ok(outputs)
}
```

Note that this may return multiple vectors if the text overflows the input dimensions. Cohere's `small` model has 1024 output dimensions.

Other providers have similar interfaces. Consult our [Embeddings docs](https://qdrant.tech/documentation/embeddings/) for further information. See how little code it took to get the embedding?
While you're at it, it's a good idea to write a small test to check if embedding works and the vectors are of the expected size:

```rust
#[tokio::test]
async fn check_embedding() {
    // ignore this test if API_KEY isn't set
    let Ok(api_key) = std::env::var("API_KEY") else {
        return;
    };
    let client = reqwest::Client::new();
    let embeddings = crate::embed(&client, "What is semantic search?", &api_key)
        .await
        .unwrap();
    // Cohere's `small` model has 1024 output dimensions.
    assert_eq!(1024, embeddings[0].len());
}
```

Run this while setting the `API_KEY` environment variable to check if the embedding works.

## [Anchor](https://qdrant.tech/articles/serverless/\#qdrant-search) Qdrant search

Now that you have embeddings, it's time to put them into your Qdrant. You could of course use `curl` or `python` to set up your collection and upload the points, but as you already have Rust including some code to obtain the embeddings, you can stay in Rust, adding `qdrant-client` to the mix.

```rust
use anyhow::Result;
use qdrant_client::prelude::*;
use qdrant_client::qdrant::{Value, VectorsConfig, VectorParams};
use qdrant_client::qdrant::vectors_config::Config;
use std::collections::HashMap;

async fn setup<'i>(
    embed_client: &reqwest::Client,
    embed_api_key: &str,
    qdrant_url: &str,
    api_key: Option<&str>,
    collection_name: &str,
    data: impl Iterator<Item = (&'i str, HashMap<String, Value>)>,
) -> Result<()> {
    let mut config = QdrantClientConfig::from_url(qdrant_url);
    config.api_key = api_key.map(String::from);
    let client = QdrantClient::new(Some(config))?;

    // create the collection if it does not exist yet
    if !client.has_collection(collection_name).await? {
        client
            .create_collection(&CreateCollection {
                collection_name: collection_name.into(),
                vectors_config: Some(VectorsConfig {
                    config: Some(Config::Params(VectorParams {
                        size: 1024, // output dimensions from above
                        distance: Distance::Cosine as i32,
                        ..Default::default()
                    })),
                }),
                ..Default::default()
            })
            .await?;
    }

    let mut id_counter = 0_u64;
    let mut points = Vec::new();
    for (text, payload) in data {
        let id = id_counter;
        id_counter += 1;
        // take the first (usually only) embedding vector for this text
        let vector = embed(embed_client, text, embed_api_key).await?.swap_remove(0);
        points.push(PointStruct {
            id: Some(id.into()),
            vectors: Some(vector.into()),
            payload,
        });
    }
    client.upsert_points(collection_name, points, None).await?;
    Ok(())
}
```

Depending on whether you want to efficiently filter the data, you can also add some indexes. I'm leaving this out for brevity. Also, this does not implement chunking (splitting the data to upsert in multiple requests, which avoids timeout errors).

Add a suitable `main` method and you can run this code to insert the points (or just use the binary from the example). Be sure to include the port in the `qdrant_url`.

Now that you have the points inserted, you can search them by embedding:

```rust
use anyhow::Result;
use qdrant_client::prelude::*;
use qdrant_client::qdrant::{ScoredPoint, SearchPoints};
use reqwest::Client;

pub async fn search(
    text: &str,
    collection_name: String,
    client: &Client,
    api_key: &str,
    qdrant: &QdrantClient,
) -> Result<Vec<ScoredPoint>> {
    // embed the query text and search with the resulting vector
    let vector = embed(client, text, api_key).await?.swap_remove(0);
    Ok(qdrant.search_points(&SearchPoints {
        collection_name,
        limit: 5, // use what fits your use case here
        with_payload: Some(true.into()),
        vector,
        ..Default::default()
    }).await?.result)
}
```

You can also filter by adding a `filter: ...` field to the `SearchPoints`, and you will likely want to process the result further, but the example code already does that, so feel free to start from there in case you need this functionality.

## [Anchor](https://qdrant.tech/articles/serverless/\#putting-it-all-together) Putting it all together

Now that you have all the parts, it's time to join them up. Copying and wiring up the snippets above is left as an exercise to the reader.
You'll want to extend the `main` method a bit to connect with the client once at the start, and to get the API keys from the environment so you don't need to compile them into the code. To do that, you can read them with `std::env::var(_)` from the Rust code and set the environment from the AWS console.

```bash
$ export QDRANT_URI=
$ export QDRANT_API_KEY=
$ export COHERE_API_KEY=
$ export COLLECTION_NAME=site-cohere
$ aws lambda update-function-configuration \
    --function-name $LAMBDA_FUNCTION_NAME \
    --environment "Variables={QDRANT_URI=$QDRANT_URI,\
QDRANT_API_KEY=$QDRANT_API_KEY,COHERE_API_KEY=${COHERE_API_KEY},\
COLLECTION_NAME=${COLLECTION_NAME}}"
```

In any event, you will arrive at one command line program to insert your data and one Lambda function. The former can just be `cargo run` to set up the collection. For the latter, you can again call `cargo lambda` and the AWS console:

```bash
$ export LAMBDA_FUNCTION_NAME=search
$ export LAMBDA_REGION=us-east-1
$ cargo lambda build --release --arm --output-format zip
Downloaded libc v0.2.137
[..] output omitted for brevity
    Finished release [optimized] target(s) in 1m 27s
$ # Update the function
$ aws lambda update-function-code --function-name $LAMBDA_FUNCTION_NAME \
    --zip-file fileb://./target/lambda/page-search/bootstrap.zip \
    --region $LAMBDA_REGION
```

## [Anchor](https://qdrant.tech/articles/serverless/\#discussion) Discussion

Lambda works by spinning up your function once the URL is called, so AWS doesn't need to keep the compute on hand unless it is actually used. This means that the first call will be burdened by some 1-2 seconds of latency for loading the function; later calls will resolve faster. Of course, there is also the latency for calling the embeddings provider and Qdrant. On the other hand, the free tier doesn't cost a thing, so you certainly get what you pay for. And for many use cases, a result within one or two seconds is acceptable.

Rust minimizes the overhead for the function, both in terms of file size and runtime. Using an embedding service means you don't need to care about the details. Knowing the URL, API key and embedding size is sufficient.

Finally, with free tiers for both Lambda and Qdrant as well as free credits for the embedding provider, the only cost is your time to set everything up. Who could argue with free?
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/serverless.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-127-lllmstxt|> ## rag-chatbot-vultr-dspy-ollama - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Private RAG Information Extraction Engine --- # [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#private-rag-information-extraction-engine) Private RAG Information Extraction Engine | Time: 90 min | Level: Advanced | | | | --- | --- | --- | --- | Handling private documents is a common task in many industries. Various businesses possess a large amount of unstructured data stored as huge files that must be processed and analyzed. Industry reports, financial analysis, legal documents, and many other documents are stored in PDF, Word, and other formats. Conversational chatbots built on top of RAG pipelines are one of the viable solutions for finding the relevant answers in such documents. However, if we want to extract structured information from these documents, and pass them to downstream systems, we need to use a different approach. Information extraction is a process of structuring unstructured data into a format that can be easily processed by machines. In this tutorial, we will show you how to use [DSPy](https://dspy-docs.vercel.app/) to perform that process on a set of documents. Assuming we cannot send our data to an external service, we will use [Ollama](https://ollama.com/) to run our own LLM model on our premises, using [Vultr](https://www.vultr.com/) as a cloud provider. Qdrant, acting in this setup as a knowledge base providing the relevant pieces of documents for a given query, will also be hosted in the Hybrid Cloud mode on Vultr. The last missing piece, the DSPy application will be also running in the same environment. If you work in a regulated industry, or just need to keep your data private, this tutorial is for you. ![Architecture diagram](https://qdrant.tech/documentation/examples/information-extraction-ollama-vultr/architecture-diagram.png) ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#deploying-qdrant-hybrid-cloud-on-vultr) Deploying Qdrant Hybrid Cloud on Vultr All the services we are going to use in this tutorial will be running on [Vultr Kubernetes\\ Engine](https://www.vultr.com/kubernetes/). That gives us a lot of flexibility in terms of scaling and managing the resources. Vultr manages the control plane and worker nodes and provides integration with other managed services such as Load Balancers, Block Storage, and DNS. 1. To start using managed Kubernetes on Vultr, follow the [platform-specific documentation](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#vultr). 2. Once your Kubernetes clusters are up, [you can begin deploying Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/). ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#installing-the-necessary-packages) Installing the necessary packages We are going to need a couple of Python packages to run our application. 
They can be installed with the `dspy-ai` package, together with the `dspy-qdrant` integration:

```shell
pip install dspy-ai dspy-qdrant
```

### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#qdrant-hybrid-cloud) Qdrant Hybrid Cloud

Our [documentation](https://qdrant.tech/documentation/hybrid-cloud/) contains a comprehensive guide on how to set up Qdrant in the Hybrid Cloud mode on Vultr. Please follow it carefully to get your Qdrant instance up and running. Once it's done, we need to store the Qdrant URL and the API key in environment variables. You can do that by running the following commands:

```shell
export QDRANT_URL="https://qdrant.example.com"
export QDRANT_API_KEY="your-api-key"
```

```python
import os

os.environ["QDRANT_URL"] = "https://qdrant.example.com"
os.environ["QDRANT_API_KEY"] = "your-api-key"
```

DSPy is the framework we are going to use. It's integrated with Qdrant already, but it assumes you use [FastEmbed](https://qdrant.github.io/fastembed/) to create the embeddings. DSPy does not provide a way to index the data, but leaves this task to the user. We are going to create a collection on our own, and fill it with the embeddings of our document chunks.

#### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#data-indexing) Data indexing

FastEmbed uses `BAAI/bge-small-en` as the default embedding model. We are going to use it as well. Our collection will be created automatically if we call the `.add` method on an existing `QdrantClient` instance.

In this tutorial we are not going to focus much on document parsing, as there are plenty of tools that can help with that. The [`unstructured`](https://github.com/Unstructured-IO/unstructured) library is one of the options you can launch on your own infrastructure. In our simplified example, we are going to use a list of strings as our documents. These are descriptions of made-up technical events. Each of them should contain the name of the event along with the location and the start and end dates.
```python documents = [\ "Taking place in San Francisco, USA, from the 10th to the 12th of June, 2024, the Global Developers Conference is the annual gathering spot for developers worldwide, offering insights into software engineering, web development, and mobile applications.",\ "The AI Innovations Summit, scheduled for 15-17 September 2024 in London, UK, aims at professionals and researchers advancing artificial intelligence and machine learning.",\ "Berlin, Germany will host the CyberSecurity World Conference between November 5th and 7th, 2024, serving as a key forum for cybersecurity professionals to exchange strategies and research on threat detection and mitigation.",\ "Data Science Connect in New York City, USA, occurring from August 22nd to 24th, 2024, connects data scientists, analysts, and engineers to discuss data science's innovative methodologies, tools, and applications.",\ "Set for July 14-16, 2024, in Tokyo, Japan, the Frontend Developers Fest invites developers to delve into the future of UI/UX design, web performance, and modern JavaScript frameworks.",\ "The Blockchain Expo Global, happening May 20-22, 2024, in Dubai, UAE, focuses on blockchain technology's applications, opportunities, and challenges for entrepreneurs, developers, and investors.",\ "Singapore's Cloud Computing Summit, scheduled for October 3-5, 2024, is where IT professionals and cloud experts will convene to discuss strategies, architectures, and cloud solutions.",\ "The IoT World Forum, taking place in Barcelona, Spain from December 1st to 3rd, 2024, is the premier conference for those focused on the Internet of Things, from smart cities to IoT security.",\ "Los Angeles, USA, will become the hub for game developers, designers, and enthusiasts at the Game Developers Arcade, running from April 18th to 20th, 2024, to showcase new games and discuss development tools.",\ "The TechWomen Summit in Sydney, Australia, from March 8-10, 2024, aims to empower women in tech with workshops, keynotes, and networking opportunities.",\ "Seoul, South Korea's Mobile Tech Conference, happening from September 29th to October 1st, 2024, will explore the future of mobile technology, including 5G networks and app development trends.",\ "The Open Source Summit, to be held in Helsinki, Finland from August 11th to 13th, 2024, celebrates open source technologies and communities, offering insights into the latest software and collaboration techniques.",\ "Vancouver, Canada will play host to the VR/AR Innovation Conference from June 20th to 22nd, 2024, focusing on the latest in virtual and augmented reality technologies.",\ "Scheduled for May 5-7, 2024, in London, UK, the Fintech Leaders Forum brings together experts to discuss the future of finance, including innovations in blockchain, digital currencies, and payment technologies.",\ "The Digital Marketing Summit, set for April 25-27, 2024, in New York City, USA, is designed for marketing professionals and strategists to discuss digital marketing and social media trends.",\ "EcoTech Symposium in Paris, France, unfolds over 2024-10-09 to 2024-10-11, spotlighting sustainable technologies and green innovations for environmental scientists, tech entrepreneurs, and policy makers.",\ "Set in Tokyo, Japan, from 16th to 18th May '24, the Robotic Innovations Conference showcases automation, robotics, and AI-driven solutions, appealing to enthusiasts and engineers.",\ "The Software Architecture World Forum in Dublin, Ireland, occurring 22-24 Sept 2024, gathers software architects and IT 
managers to discuss modern architecture patterns.",\ "Quantum Computing Summit, convening in Silicon Valley, USA from 2024/11/12 to 2024/11/14, is a rendezvous for exploring quantum computing advancements with physicists and technologists.",\ "From March 3 to 5, 2024, the Global EdTech Conference in London, UK, discusses the intersection of education and technology, featuring e-learning and digital classrooms.",\ "Bangalore, India's NextGen DevOps Days, from 28 to 30 August 2024, is a hotspot for IT professionals keen on the latest DevOps tools and innovations.",\ "The UX/UI Design Conference, slated for April 21-23, 2024, in New York City, USA, invites discussions on the latest in user experience and interface design among designers and developers.",\ "Big Data Analytics Summit, taking place 2024 July 10-12 in Amsterdam, Netherlands, brings together data professionals to delve into big data analysis and insights.",\ "Toronto, Canada, will see the HealthTech Innovation Forum from June 8 to 10, '24, focusing on technology's impact on healthcare with professionals and innovators.",\ "Blockchain for Business Summit, happening in Singapore from 2024-05-02 to 2024-05-04, focuses on blockchain's business applications, from finance to supply chain.",\ "Las Vegas, USA hosts the Global Gaming Expo from October 18th to 20th, 2024, a premiere event for game developers, publishers, and enthusiasts.",\ "The Renewable Energy Tech Conference in Copenhagen, Denmark, from 2024/09/05 to 2024/09/07, discusses renewable energy innovations and policies.",\ "Set for 2024 Apr 9-11 in Boston, USA, the Artificial Intelligence in Healthcare Summit gathers healthcare professionals to discuss AI's healthcare applications.",\ "Nordic Software Engineers Conference, happening in Stockholm, Sweden from June 15 to 17, 2024, focuses on software development in the Nordic region.",\ "The International Space Exploration Symposium, scheduled in Houston, USA from 2024-08-05 to 2024-08-07, invites discussions on space exploration technologies and missions."\ ] ``` We’ll be able to ask general questions, for example, about topics we are interested in or events happening in a specific location, but expect the results to be returned in a structured format. ![An example of extracted information](https://qdrant.tech/documentation/examples/information-extraction-ollama-vultr/extracted-information.png) Indexing in Qdrant is a single call if we have the documents defined: ```python client.add( collection_name="document-parts", documents=documents, metadata=[{"document": document} for document in documents], ) ``` Our collection is ready to be queried. We can now move to the next step, which is setting up the Ollama model. ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#ollama-on-vultr) Ollama on Vultr Ollama is a great tool for running the LLM models on your own infrastructure. It’s designed to be lightweight and easy to use, and [an official Docker image](https://hub.docker.com/r/ollama/ollama) is available. We can use it to run Ollama on our Vultr Kubernetes cluster. In case of LLMs we may have some special requirements, like a GPU, and Vultr provides the [Vultr Kubernetes Engine for Cloud GPU](https://www.vultr.com/products/cloud-gpu/) so the model can be run on a specialized machine. Please refer to the official documentation to get Ollama up and running within your environment. 
Once it’s done, we need to store the Ollama URL in the environment variable: shellpython ```shell export OLLAMA_URL="https://ollama.example.com" ``` ```python os.environ["OLLAMA_URL"] = "https://ollama.example.com" ``` We will refer to this URL later on when configuring the Ollama model in our application. #### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#setting-up-the-large-language-model) Setting up the Large Language Model We are going to use one of the lightweight LLMs available in Ollama, a `gemma:2b` model. It was developed by Google DeepMind team and has 3B parameters. The [Ollama version](https://ollama.com/library/gemma:2b) uses 4-bit quantization. Installing the model is as simple as running the following command on the machine where Ollama is running: ```shell ollama run gemma:2b ``` Ollama models are also integrated with DSPy, so we can use them directly in our application. ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#implementing-the-information-extraction-pipeline) Implementing the information extraction pipeline DSPy is a bit different from the other LLM frameworks. It’s designed to optimize the prompts and weights of LMs in a pipeline. It’s a bit like a compiler for LMs: you write a pipeline in a high-level language, and DSPy generates the prompts and weights for you. This means you can build complex systems without having to worry about the details of how to prompt your LMs, as DSPy will do that for you. It is somehow similar to PyTorch but for LLMs. First of all, we will define the Language Model we are going to use: ```python import dspy gemma_model = dspy.OllamaLocal( model="gemma:2b", base_url=os.environ.get("OLLAMA_URL"), max_tokens=500, ) ``` Similarly, we have to define connection to our Qdrant Hybrid Cloud cluster: ```python from dspy_qdrant import QdrantRM from qdrant_client import QdrantClient, models client = QdrantClient( os.environ.get("QDRANT_URL"), api_key=os.environ.get("QDRANT_API_KEY"), ) qdrant_retriever = QdrantRM( qdrant_collection_name="document-parts", qdrant_client=client, ) ``` Finally, both components have to be configured in DSPy with a simple call to one of the functions: ```python dspy.configure(lm=gemma_model, rm=qdrant_retriever) ``` ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#application-logic) Application logic There is a concept of signatures which defines input and output formats of the pipeline. We are going to define a simple signature for the event: ```python class Event(dspy.Signature): description = dspy.InputField( desc="Textual description of the event, including name, location and dates" ) event_name = dspy.OutputField(desc="Name of the event") location = dspy.OutputField(desc="Location of the event") start_date = dspy.OutputField(desc="Start date of the event, YYYY-MM-DD") end_date = dspy.OutputField(desc="End date of the event, YYYY-MM-DD") ``` It is designed to derive the structured information from the textual description of the event. Now, we can build our module that will use it, along with Qdrant and Ollama model. 
Let’s call it `EventExtractor`: ```python class EventExtractor(dspy.Module): def __init__(self): super().__init__() # Retrieve module to get relevant documents self.retriever = dspy.Retrieve(k=3) # Predict module for the created signature self.predict = dspy.Predict(Event) def forward(self, query: str): # Retrieve the most relevant documents results = self.retriever.forward(query) # Try to extract events from the retrieved documents events = [] for document in results.passages: event = self.predict(description=document) events.append(event) return events ``` The logic is simple: we retrieve the most relevant documents from Qdrant, and then try to extract the structured information from them using the `Event` signature. We can simply call it and see the results: ```python extractor = EventExtractor() extractor.forward("Blockchain events close to Europe") ``` Output: ```python [\ Prediction(\ event_name='Event Name: Blockchain Expo Global',\ location='Dubai, UAE',\ start_date='2024-05-20',\ end_date='2024-05-22'\ ),\ Prediction(\ event_name='Event Name: Blockchain for Business Summit',\ location='Singapore',\ start_date='2024-05-02',\ end_date='2024-05-04'\ ),\ Prediction(\ event_name='Event Name: Open Source Summit',\ location='Helsinki, Finland',\ start_date='2024-08-11',\ end_date='2024-08-13'\ )\ ] ``` The task was solved successfully, even without any optimization. However, each of the events has the “Event Name: " prefix that we might want to remove. DSPy allows optimizing the module, so we can improve the results. Optimization might be done in different ways, and it’s [well covered in the DSPy\\ documentation](https://dspy.ai/learn/optimization/optimizers/). We are not going to go through the optimization process in this tutorial. However, we encourage you to experiment with it, as it might significantly improve the performance of your pipeline. Created module might be easily stored on a specific path, and loaded later on: ```python extractor.save("event_extractor") ``` To load, just create an instance of the module and call the `load` method: ```python second_extractor = EventExtractor() second_extractor.load("event_extractor") ``` This is especially useful when you optimize the module, as the optimized version might be stored and loaded later on without redoing the optimization process each time you run the application. ### [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#deploying-the-extraction-pipeline) Deploying the extraction pipeline Vultr gives us a lot of flexibility in terms of deploying the applications. Perfectly, we would use the Kubernetes cluster we set up earlier to run it. The deployment is as simple as running any other Python application. This time we don’t need a GPU, as Ollama is already running on a separate machine, and DSPy just interacts with it. ## [Anchor](https://qdrant.tech/documentation/examples/rag-chatbot-vultr-dspy-ollama/\#wrapping-up) Wrapping up In this tutorial, we showed you how to set up a private environment for information extraction using DSPy, Ollama, and Qdrant. All the components might be securely hosted on the Vultr cloud, giving you full control over your data. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
<|page-128-lllmstxt|> ## benchmarks-intro --- # How vector search should be benchmarked? --- # [Anchor](https://qdrant.tech/benchmarks/benchmarks-intro/\#benchmarking-vector-databases) Benchmarking Vector Databases At Qdrant, performance is the top priority. We always make sure that we use system resources efficiently so you get the **fastest and most accurate results at the cheapest cloud costs**. All of our decisions, from [choosing Rust](https://qdrant.tech/articles/why-rust/), [io optimisations](https://qdrant.tech/articles/io_uring/), [serverless support](https://qdrant.tech/articles/serverless/), and [binary quantization](https://qdrant.tech/articles/binary-quantization/), to our [fastembed library](https://qdrant.tech/articles/fastembed/), are based on this principle. In this article, we will compare how Qdrant performs against other vector search engines. Here are the principles we followed while designing these benchmarks: - We do comparative benchmarks, which means we focus on **relative numbers** rather than absolute numbers. - We use affordable hardware, so that you can reproduce the results easily. - We run benchmarks on the same exact machines to avoid any possible hardware bias. - All the benchmarks are [open-sourced](https://github.com/qdrant/vector-db-benchmark), so you can contribute and improve them. Scenarios we tested: 1. Upload & Search benchmark on a single node - [Benchmark](https://qdrant.tech/benchmarks/single-node-speed-benchmark/) 2. Filtered search benchmark - [Benchmark](https://qdrant.tech/benchmarks/#filtered-search-benchmark) 3. Memory consumption benchmark - Coming soon 4. Cluster mode benchmark - Coming soon Some of our experiment design decisions are described in the [F.A.Q Section](https://qdrant.tech/benchmarks/#benchmarks-faq). Reach out to us on our [Discord channel](https://qdrant.to/discord) if you want to discuss anything related to Qdrant or these benchmarks.
Learn how to build agents with Qdrant and which framework to choose.\\ \\ Kacper Łukawski\\ \\ November 22, 2024](https://qdrant.tech/articles/agentic-rag/)[![Preview](https://qdrant.tech/articles_data/rapid-rag-optimization-with-qdrant-and-quotient/preview/preview.jpg)\\ **Optimizing RAG Through an Evaluation-Based Methodology** \\ Learn how Qdrant-powered RAG applications can be tested and iteratively improved using LLM evaluation tools like Quotient.\\ \\ Atita Arora\\ \\ June 12, 2024](https://qdrant.tech/articles/rapid-rag-optimization-with-qdrant-and-quotient/)[![Preview](https://qdrant.tech/articles_data/semantic-cache-ai-data-retrieval/preview/preview.jpg)\\ **Semantic Cache: Accelerating AI with Lightning-Fast Data Retrieval** \\ Semantic cache is reshaping AI applications by enabling rapid data retrieval. Discover how its implementation benefits your RAG setup.\\ \\ Daniel Romero, David Myriel\\ \\ May 07, 2024](https://qdrant.tech/articles/semantic-cache-ai-data-retrieval/)[![Preview](https://qdrant.tech/articles_data/what-is-rag-in-ai/preview/preview.jpg)\\ **What is RAG: Understanding Retrieval-Augmented Generation** \\ Explore how RAG enables LLMs to retrieve and utilize relevant external data when generating responses, rather than being limited to their original training data alone.\\ \\ Sabrina Aquino\\ \\ March 19, 2024](https://qdrant.tech/articles/what-is-rag-in-ai/)[![Preview](https://qdrant.tech/articles_data/rag-is-dead/preview/preview.jpg)\\ **Is RAG Dead? The Role of Vector Databases in Vector Search \| Qdrant** \\ Uncover the necessity of vector databases for RAG and learn how Qdrant's vector database empowers enterprise AI with unmatched accuracy and cost-effectiveness.\\ \\ David Myriel\\ \\ February 27, 2024](https://qdrant.tech/articles/rag-is-dead/) × [Powered by](https://qdrant.tech/) <|page-130-lllmstxt|> ## security - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Security --- # [Anchor](https://qdrant.tech/documentation/guides/security/\#security) Security Please read this page carefully. Although there are various ways to secure your Qdrant instances, **they are unsecured by default**. You need to enable security measures before production use. Otherwise, they are completely open to anyone ## [Anchor](https://qdrant.tech/documentation/guides/security/\#authentication) Authentication _Available as of v1.2.0_ Qdrant supports a simple form of client authentication using a static API key. This can be used to secure your instance. To enable API key based authentication in your own Qdrant instance you must specify a key in the configuration: ```yaml service: # Set an api-key. # If set, all requests must include a header with the api-key. # example header: `api-key: ` # # If you enable this you should also enable TLS. # (Either above or via an external service like nginx.) # Sending an api-key over an unencrypted channel is insecure. api_key: your_secret_api_key_here ``` Or alternatively, you can use the environment variable: ```bash docker run -p 6333:6333 \ -e QDRANT__SERVICE__API_KEY=your_secret_api_key_here \ qdrant/qdrant ``` For using API key based authentication in Qdrant Cloud see the cloud [Authentication](https://qdrant.tech/documentation/cloud/authentication/) section. The API key then needs to be present in all REST or gRPC requests to your instance. All official Qdrant clients for Python, Go, Rust, .NET and Java support the API key parameter. 
bashpythontypescriptrustjavacsharpgo ```bash curl \ -X GET https://localhost:6333 \ --header 'api-key: your_secret_api_key_here' ``` ```python from qdrant_client import QdrantClient client = QdrantClient( url="https://localhost:6333", api_key="your_secret_api_key_here", ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ url: "http://localhost", port: 6333, apiKey: "your_secret_api_key_here", }); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("https://xyz-example.eu-central.aws.cloud.qdrant.io:6334") .api_key("") .build()?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder( "xyz-example.eu-central.aws.cloud.qdrant.io", 6334, true) .withApiKey("") .build()); ``` ```csharp using Qdrant.Client; var client = new QdrantClient( host: "xyz-example.eu-central.aws.cloud.qdrant.io", https: true, apiKey: "" ); ``` ```go import "github.com/qdrant/go-client/qdrant" client, err := qdrant.NewClient(&qdrant.Config{ Host: "xyz-example.eu-central.aws.cloud.qdrant.io", Port: 6334, APIKey: "", UseTLS: true, }) ``` ### [Anchor](https://qdrant.tech/documentation/guides/security/\#read-only-api-key) Read-only API key _Available as of v1.7.0_ In addition to the regular API key, Qdrant also supports a read-only API key. This key can be used to access read-only operations on the instance. ```yaml service: read_only_api_key: your_secret_read_only_api_key_here ``` Or with the environment variable: ```bash export QDRANT__SERVICE__READ_ONLY_API_KEY=your_secret_read_only_api_key_here ``` Both API keys can be used simultaneously. ### [Anchor](https://qdrant.tech/documentation/guides/security/\#granular-access-control-with-jwt) Granular access control with JWT _Available as of v1.9.0_ For more complex cases, Qdrant supports granular access control with [JSON Web Tokens (JWT)](https://jwt.io/). This allows you to create tokens which restrict access to data stored in your cluster, and build [Role-based access control (RBAC)](https://en.wikipedia.org/wiki/Role-based_access_control) on top of that. In this way, you can define permissions for users and restrict access to sensitive endpoints. To enable JWT-based authentication in your own Qdrant instance you need to specify the `api-key` and enable the `jwt_rbac` feature in the configuration: ```yaml service: api_key: you_secret_api_key_here jwt_rbac: true ``` Or with the environment variables: ```bash export QDRANT__SERVICE__API_KEY=your_secret_api_key_here export QDRANT__SERVICE__JWT_RBAC=true ``` The `api_key` you set in the configuration will be used to encode and decode the JWTs, so –needless to say– keep it secure. If your `api_key` changes, all existing tokens will be invalid. To use JWT-based authentication, you need to provide it as a bearer token in the `Authorization` header, or as an key in the `Api-Key` header of your requests. 
httppythontypescriptrustjavacsharpgo ```http Authorization: Bearer // or Api-Key: ``` ```python from qdrant_client import QdrantClient qdrant_client = QdrantClient( "xyz-example.eu-central.aws.cloud.qdrant.io", api_key="", ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "xyz-example.eu-central.aws.cloud.qdrant.io", apiKey: "", }); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("https://xyz-example.eu-central.aws.cloud.qdrant.io:6334") .api_key("") .build()?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; QdrantClient client = new QdrantClient( QdrantGrpcClient.newBuilder( "xyz-example.eu-central.aws.cloud.qdrant.io", 6334, true) .withApiKey("") .build()); ``` ```csharp using Qdrant.Client; var client = new QdrantClient( host: "xyz-example.eu-central.aws.cloud.qdrant.io", https: true, apiKey: "" ); ``` ```go import "github.com/qdrant/go-client/qdrant" client, err := qdrant.NewClient(&qdrant.Config{ Host: "xyz-example.eu-central.aws.cloud.qdrant.io", Port: 6334, APIKey: "", UseTLS: true, }) ``` #### [Anchor](https://qdrant.tech/documentation/guides/security/\#generating-json-web-tokens) Generating JSON Web Tokens Due to the nature of JWT, anyone who knows the `api_key` can generate tokens by using any of the existing libraries and tools, it is not necessary for them to have access to the Qdrant instance to generate them. For convenience, we have added a JWT generation tool the Qdrant Web UI under the 🔑 tab, if you’re using the default url, it will be at `http://localhost:6333/dashboard#/jwt`. - **JWT Header** \- Qdrant uses the `HS256` algorithm to decode the tokens. ```json { "alg": "HS256", "typ": "JWT" } ``` - **JWT Payload** \- You can include any combination of the [parameters available](https://qdrant.tech/documentation/guides/security/#jwt-configuration) in the payload. Keep reading for more info on each one. ```json { "exp": 1640995200, // Expiration time "value_exists": ..., // Validate this token by looking for a point with a payload value "access": "r", // Define the access level. } ``` **Signing the token** \- To confirm that the generated token is valid, it needs to be signed with the `api_key` you have set in the configuration. That would mean, that someone who knows the `api_key` gives the authorization for the new token to be used in the Qdrant instance. Qdrant can validate the signature, because it knows the `api_key` and can decode the token. The process of token generation can be done on the client side offline, and doesn’t require any communication with the Qdrant instance. Here is an example of libraries that can be used to generate JWT tokens: - Python: [PyJWT](https://pyjwt.readthedocs.io/en/stable/) - JavaScript: [jsonwebtoken](https://www.npmjs.com/package/jsonwebtoken) - Rust: [jsonwebtoken](https://crates.io/crates/jsonwebtoken) #### [Anchor](https://qdrant.tech/documentation/guides/security/\#jwt-configuration) JWT Configuration These are the available options, or **claims** in the JWT lingo. You can use them in the JWT payload to define its functionality. - **`exp`** \- The expiration time of the token. This is a Unix timestamp in seconds. The token will be invalid after this time. The check for this claim includes a 30-second leeway to account for clock skew. ```json { "exp": 1640995200, // Expiration time } ``` - **`value_exists`** \- This is a claim that can be used to validate the token against the data stored in a collection. 
Structure of this claim is as follows: ```json { "value_exists": { "collection": "my_validation_collection", "matches": [\ { "key": "my_key", "value": "value_that_must_exist" }\ ], }, } ``` If this claim is present, Qdrant will check if there is a point in the collection with the specified key-values. If it does, the token is valid. This claim is especially useful if you want to have an ability to revoke tokens without changing the `api_key`. Consider a case where you have a collection of users, and you want to revoke access to a specific user. ```json { "value_exists": { "collection": "users", "matches": [\ { "key": "user_id", "value": "andrey" },\ { "key": "role", "value": "manager" }\ ], }, } ``` You can create a token with this claim, and when you want to revoke access, you can change the `role` of the user to something else, and the token will be invalid. - **`access`** \- This claim defines the [access level](https://qdrant.tech/documentation/guides/security/#table-of-access) of the token. If this claim is present, Qdrant will check if the token has the required access level to perform the operation. If this claim is **not** present, **manage** access is assumed. It can provide global access with `r` for read-only, or `m` for manage. For example: ```json { "access": "r" } ``` It can also be specific to one or more collections. The `access` level for each collection is `r` for read-only, or `rw` for read-write, like this: ```json { "access": [\ {\ "collection": "my_collection",\ "access": "rw"\ }\ ] } ``` You can also specify which subset of the collection the user is able to access by specifying a `payload` restriction that the points must have. ```json { "access": [\ {\ "collection": "my_collection",\ "access": "r",\ "payload": {\ "user_id": "user_123456"\ }\ }\ ] } ``` This `payload` claim will be used to implicitly filter the points in the collection. It will be equivalent to appending this filter to each request: ```json { "filter": { "must": [{ "key": "user_id", "match": { "value": "user_123456" } }] } } ``` ### [Anchor](https://qdrant.tech/documentation/guides/security/\#table-of-access) Table of access Check out this table to see which actions are allowed or denied based on the access level. This is also applicable to using api keys instead of tokens. In that case, `api_key` maps to **manage**, while `read_only_api_key` maps to **read-only**. 
**Symbols:** ✅ Allowed \| ❌ Denied \| 🟡 Allowed, but filtered | Action | manage | read-only | collection read-write | collection read-only | collection with payload claim (r / rw) | | --- | --- | --- | --- | --- | --- | | list collections | ✅ | ✅ | 🟡 | 🟡 | 🟡 | | get collection info | ✅ | ✅ | ✅ | ✅ | ❌ | | create collection | ✅ | ❌ | ❌ | ❌ | ❌ | | delete collection | ✅ | ❌ | ❌ | ❌ | ❌ | | update collection params | ✅ | ❌ | ❌ | ❌ | ❌ | | get collection cluster info | ✅ | ✅ | ✅ | ✅ | ❌ | | collection exists | ✅ | ✅ | ✅ | ✅ | ✅ | | update collection cluster setup | ✅ | ❌ | ❌ | ❌ | ❌ | | update aliases | ✅ | ❌ | ❌ | ❌ | ❌ | | list collection aliases | ✅ | ✅ | 🟡 | 🟡 | 🟡 | | list aliases | ✅ | ✅ | 🟡 | 🟡 | 🟡 | | create shard key | ✅ | ❌ | ❌ | ❌ | ❌ | | delete shard key | ✅ | ❌ | ❌ | ❌ | ❌ | | create payload index | ✅ | ❌ | ✅ | ❌ | ❌ | | delete payload index | ✅ | ❌ | ✅ | ❌ | ❌ | | list collection snapshots | ✅ | ✅ | ✅ | ✅ | ❌ | | create collection snapshot | ✅ | ❌ | ✅ | ❌ | ❌ | | delete collection snapshot | ✅ | ❌ | ✅ | ❌ | ❌ | | download collection snapshot | ✅ | ✅ | ✅ | ✅ | ❌ | | upload collection snapshot | ✅ | ❌ | ❌ | ❌ | ❌ | | recover collection snapshot | ✅ | ❌ | ❌ | ❌ | ❌ | | list shard snapshots | ✅ | ✅ | ✅ | ✅ | ❌ | | create shard snapshot | ✅ | ❌ | ✅ | ❌ | ❌ | | delete shard snapshot | ✅ | ❌ | ✅ | ❌ | ❌ | | download shard snapshot | ✅ | ✅ | ✅ | ✅ | ❌ | | upload shard snapshot | ✅ | ❌ | ❌ | ❌ | ❌ | | recover shard snapshot | ✅ | ❌ | ❌ | ❌ | ❌ | | list full snapshots | ✅ | ✅ | ❌ | ❌ | ❌ | | create full snapshot | ✅ | ❌ | ❌ | ❌ | ❌ | | delete full snapshot | ✅ | ❌ | ❌ | ❌ | ❌ | | download full snapshot | ✅ | ✅ | ❌ | ❌ | ❌ | | get cluster info | ✅ | ✅ | ❌ | ❌ | ❌ | | recover raft state | ✅ | ❌ | ❌ | ❌ | ❌ | | delete peer | ✅ | ❌ | ❌ | ❌ | ❌ | | get point | ✅ | ✅ | ✅ | ✅ | ❌ | | get points | ✅ | ✅ | ✅ | ✅ | ❌ | | upsert points | ✅ | ❌ | ✅ | ❌ | ❌ | | update points batch | ✅ | ❌ | ✅ | ❌ | ❌ | | delete points | ✅ | ❌ | ✅ | ❌ | ❌ / 🟡 | | update vectors | ✅ | ❌ | ✅ | ❌ | ❌ | | delete vectors | ✅ | ❌ | ✅ | ❌ | ❌ / 🟡 | | set payload | ✅ | ❌ | ✅ | ❌ | ❌ | | overwrite payload | ✅ | ❌ | ✅ | ❌ | ❌ | | delete payload | ✅ | ❌ | ✅ | ❌ | ❌ | | clear payload | ✅ | ❌ | ✅ | ❌ | ❌ | | scroll points | ✅ | ✅ | ✅ | ✅ | 🟡 | | query points | ✅ | ✅ | ✅ | ✅ | 🟡 | | search points | ✅ | ✅ | ✅ | ✅ | 🟡 | | search groups | ✅ | ✅ | ✅ | ✅ | 🟡 | | recommend points | ✅ | ✅ | ✅ | ✅ | ❌ | | recommend groups | ✅ | ✅ | ✅ | ✅ | ❌ | | discover points | ✅ | ✅ | ✅ | ✅ | ❌ | | count points | ✅ | ✅ | ✅ | ✅ | 🟡 | | version | ✅ | ✅ | ✅ | ✅ | ✅ | | readyz, healthz, livez | ✅ | ✅ | ✅ | ✅ | ✅ | | telemetry | ✅ | ✅ | ❌ | ❌ | ❌ | | metrics | ✅ | ✅ | ❌ | ❌ | ❌ | | update locks | ✅ | ❌ | ❌ | ❌ | ❌ | | get locks | ✅ | ✅ | ❌ | ❌ | ❌ | ## [Anchor](https://qdrant.tech/documentation/guides/security/\#tls) TLS _Available as of v1.2.0_ TLS for encrypted connections can be enabled on your Qdrant instance to secure connections. First make sure you have a certificate and private key for TLS, usually in `.pem` format. On your local machine you may use [mkcert](https://github.com/FiloSottile/mkcert#readme) to generate a self signed certificate. To enable TLS, set the following properties in the Qdrant configuration with the correct paths and restart: ```yaml service: # Enable HTTPS for the REST and gRPC API enable_tls: true --- # Required if either service.enable_tls or cluster.p2p.enable_tls is true. 
tls: # Server certificate chain file cert: ./tls/cert.pem # Server private key file key: ./tls/key.pem ``` For internal communication when running cluster mode, TLS can be enabled with: ```yaml cluster: # Configuration of the inter-cluster communication p2p: # Use TLS for communication between peers enable_tls: true ``` With TLS enabled, you must start using HTTPS connections. For example: bashpythontypescriptrust ```bash curl -X GET https://localhost:6333 ``` ```python from qdrant_client import QdrantClient client = QdrantClient( url="https://localhost:6333", ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ url: "https://localhost", port: 6333 }); ``` ```rust use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; ``` Certificate rotation is enabled with a default refresh time of one hour. This reloads certificate files every hour while Qdrant is running. This way changed certificates are picked up when they get updated externally. The refresh time can be tuned by changing the `tls.cert_ttl` setting. You can leave this on, even if you don’t plan to update your certificates. Currently this is only supported for the REST API. Optionally, you can enable client certificate validation on the server against a local certificate authority. Set the following properties and restart: ```yaml service: # Check user HTTPS client certificate against CA file specified in tls config verify_https_client_certificate: false --- # Required if either service.enable_tls or cluster.p2p.enable_tls is true. tls: # Certificate authority certificate file. # This certificate will be used to validate the certificates # presented by other nodes during inter-cluster communication. # # If verify_https_client_certificate is true, it will verify # HTTPS client certificate # # Required if cluster.p2p.enable_tls is true. ca_cert: ./tls/cacert.pem ``` ## [Anchor](https://qdrant.tech/documentation/guides/security/\#hardening) Hardening We recommend reducing the amount of permissions granted to Qdrant containers so that you can reduce the risk of exploitation. Here are some ways to reduce the permissions of a Qdrant container: - Run Qdrant as a non-root user. This can help mitigate the risk of future container breakout vulnerabilities. Qdrant does not need the privileges of the root user for any purpose. - You can use the image `qdrant/qdrant:-unprivileged` instead of the default Qdrant image. - You can use the flag `--user=1000:2000` when running [`docker run`](https://docs.docker.com/reference/cli/docker/container/run/). - You can set [`user: 1000`](https://docs.docker.com/compose/compose-file/05-services/#user) when using Docker Compose. - You can set [`runAsUser: 1000`](https://kubernetes.io/docs/tasks/configure-pod-container/security-context) when running in Kubernetes (our [Helm chart](https://github.com/qdrant/qdrant-helm) does this by default). - Run Qdrant with a read-only root filesystem. This can help mitigate vulnerabilities that require the ability to modify system files, which is a permission Qdrant does not need. As long as the container uses mounted volumes for storage ( `/qdrant/storage` and `/qdrant/snapshots` by default), Qdrant can continue to operate while being prevented from writing data outside of those volumes. - You can use the flag `--read-only` when running [`docker run`](https://docs.docker.com/reference/cli/docker/container/run/). 
- You can set [`read_only: true`](https://docs.docker.com/compose/compose-file/05-services/#read_only) when using Docker Compose. - You can set [`readOnlyRootFilesystem: true`](https://kubernetes.io/docs/tasks/configure-pod-container/security-context) when running in Kubernetes (our [Helm chart](https://github.com/qdrant/qdrant-helm) does this by default). - Block Qdrant’s external network access. This can help mitigate [server side request forgery attacks](https://owasp.org/www-community/attacks/Server_Side_Request_Forgery), like via the [snapshot recovery API](https://api.qdrant.tech/api-reference/snapshots/recover-from-snapshot). Single-node Qdrant clusters do not require any outbound network access. Multi-node Qdrant clusters only need the ability to connect to other Qdrant nodes via TCP ports 6333, 6334, and 6335. - You can use [`docker network create --internal `](https://docs.docker.com/reference/cli/docker/network/create/#internal) and use that network when running [`docker run --network `](https://docs.docker.com/reference/cli/docker/container/run/#network). - You can create an [internal network](https://docs.docker.com/compose/compose-file/06-networks/#internal) when using Docker Compose. - You can create a [NetworkPolicy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) when using Kubernetes. Note that multi-node Qdrant clusters [will also need access to cluster DNS in Kubernetes](https://github.com/ahmetb/kubernetes-network-policy-recipes/blob/master/11-deny-egress-traffic-from-an-application.md#allowing-dns-traffic). There are other techniques for reducing the permissions such as dropping [Linux capabilities](https://www.man7.org/linux/man-pages/man7/capabilities.7.html) depending on your deployment method, but the methods mentioned above are the most important. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/security.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/security.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-131-lllmstxt|> ## async-api - [Documentation](https://qdrant.tech/documentation/) - [Database tutorials](https://qdrant.tech/documentation/database-tutorials/) - Build With Async API --- # [Anchor](https://qdrant.tech/documentation/database-tutorials/async-api/\#using-qdrants-async-api-for-efficient-python-applications) Using Qdrant’s Async API for Efficient Python Applications Asynchronous programming is being broadly adopted in the Python ecosystem. Tools such as FastAPI [have embraced this new\\ paradigm](https://fastapi.tiangolo.com/async/), but it is also becoming a standard for ML models served as SaaS. For example, the Cohere SDK [provides an async client](https://github.com/cohere-ai/cohere-python/blob/856a4c3bd29e7a75fa66154b8ac9fcdf1e0745e0/src/cohere/client.py#L189) next to its synchronous counterpart. Databases are often launched as separate services and are accessed via a network. 
All the interactions with them are IO-bound and can be performed asynchronously, so as not to waste time actively waiting for a server response. In Python, this is achieved by using the [`async/await`](https://docs.python.org/3/library/asyncio-task.html) syntax. That lets the interpreter switch to another task while waiting for a response from the server. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/async-api/\#when-to-use-async-api) When to use async API There is no need to use the async API if the application you are writing will never support multiple users at once (e.g. it is a script that runs once per day). However, if you are writing a web service that multiple users will use simultaneously, you shouldn’t be blocking the threads of the web server, as that limits the number of concurrent requests it can handle. In this case, you should use the async API. Modern web frameworks like [FastAPI](https://fastapi.tiangolo.com/) and [Quart](https://quart.palletsprojects.com/en/latest/) support async API out of the box. Mixing asynchronous code with an existing synchronous codebase might be a challenge. The `async/await` syntax cannot be used in synchronous functions. On the other hand, calling an IO-bound operation synchronously in async code is considered an antipattern. Therefore, if you build an async web service, exposed through an [ASGI](https://asgi.readthedocs.io/en/latest/) server, you should use the async API for all the interactions with Qdrant. ### [Anchor](https://qdrant.tech/documentation/database-tutorials/async-api/\#using-qdrant-asynchronously) Using Qdrant asynchronously The simplest way of running asynchronous code is to define an `async` function and run it with `asyncio.run`: ```python from qdrant_client import models import qdrant_client import asyncio async def main(): client = qdrant_client.AsyncQdrantClient("localhost") # Create a collection await client.create_collection( collection_name="my_collection", vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE), ) # Insert a vector await client.upsert( collection_name="my_collection", points=[ models.PointStruct( id="5c56c793-69f3-4fbf-87e6-c4bf54c28c26", payload={ "color": "red", }, vector=[0.9, 0.1, 0.1, 0.5], ), ], ) # Search for nearest neighbors points = (await client.query_points( collection_name="my_collection", query=[0.9, 0.1, 0.1, 0.5], limit=2, )).points # Your async code using AsyncQdrantClient might be put here # ... asyncio.run(main()) ``` The `AsyncQdrantClient` provides the same methods as its synchronous counterpart `QdrantClient`. If you already have a synchronous codebase, switching to the async API is as simple as replacing `QdrantClient` with `AsyncQdrantClient` and adding `await` before each method call.
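For instance, a minimal sketch of the same call in both styles (assuming a local instance):

```python
import asyncio
from qdrant_client import AsyncQdrantClient, QdrantClient

# Synchronous version
sync_client = QdrantClient("localhost")
print(sync_client.get_collections())

# Asynchronous version: identical method name, just awaited inside an async function
async def list_collections():
    async_client = AsyncQdrantClient("localhost")
    print(await async_client.get_collections())

asyncio.run(list_collections())
```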
<|page-132-lllmstxt|> ## qdrant-dspy-medicalbot - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Building a Chain-of-Thought Medical Chatbot with Qdrant and DSPy --- # [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#building-a-chain-of-thought-medical-chatbot-with-qdrant-and-dspy) Building a Chain-of-Thought Medical Chatbot with Qdrant and DSPy Accessing medical information from LLMs can lead to hallucinations or outdated information. Relying on this type of information can result in serious medical consequences. Building a trustworthy and context-aware medical chatbot can solve this. In this article, we will look at how to tackle these challenges using: - **Retrieval-Augmented Generation (RAG)**: Instead of answering the questions from scratch, the bot retrieves the information from medical literature before answering questions. - **Filtering**: Users can filter the results by specialty and publication year, ensuring the information is accurate and up-to-date. Let’s discover the technologies needed to build the medical bot. ## [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#tech-stack-overview) Tech Stack Overview To build a robust and trustworthy medical chatbot, we will combine the following technologies: - [**Qdrant Cloud**](https://qdrant.tech/cloud/): Qdrant is a high-performance vector search engine for storing and retrieving large collections of embeddings. In this project, we will use it to enable fast and accurate search across millions of medical documents, supporting dense and multi-vector (ColBERT) retrieval for context-aware answers. - [**Stanford DSPy**](https://qdrant.tech/documentation/frameworks/dspy/): DSPy is the AI framework we will use to obtain the final answer. It allows the medical bot to retrieve the relevant information and reason step-by-step to produce accurate and explainable answers. ![medicalbot flow chart](https://qdrant.tech/articles_data/Qdrant-DSPy-medicalbot/medicalbot.png) ## [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#dataset-preparation-and-indexing) Dataset Preparation and Indexing A medical chatbot is only as good as the knowledge it has access to. For this project, we will leverage the [MIRIAD medical dataset](https://huggingface.co/datasets/miriad/miriad-5.8M), a large-scale collection of medical passages enriched with metadata such as publication year and specialty. ### [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#indexing-with-dense-and-colbert-multivectors) Indexing with Dense and ColBERT Multivectors To enable high-quality retrieval, we will embed each medical passage with two models: - **Dense Embeddings**: These are generated using the `BAAI/bge-small-en` model and capture the passages’ general semantic meaning. - **ColBERT Multivectors**: These provide more fine-grained representations, enabling precise ranking of results.
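The indexing snippets below reference a Qdrant `client` and a dataset `ds` that are assumed to exist already. A minimal setup sketch (the environment variable names and the small dataset slice are assumptions for illustration, not part of the original notebook):

```python
import os
from datasets import load_dataset
from qdrant_client import QdrantClient, models  # `models` is used by the snippets below

# Connect to a Qdrant Cloud cluster using credentials from the environment
client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
)

# Load a small slice of the MIRIAD dataset for experimentation
ds = load_dataset("miriad/miriad-5.8M", split="train[:100]")
```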
```python dense_documents = [\ models.Document(text=doc, model="BAAI/bge-small-en") for doc in ds["passage_text"]\ ] colbert_documents = [\ models.Document(text=doc, model="colbert-ir/colbertv2.0")\ for doc in ds["passage_text"]\ ] collection_name = "miriad" --- # Create collection if not client.collection_exists(collection_name): client.create_collection( collection_name=collection_name, vectors_config={ "dense": models.VectorParams(size=384, distance=models.Distance.COSINE), "colbert": models.VectorParams( size=128, distance=models.Distance.COSINE, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM ), hnsw_config=models.HnswConfigDiff(m=0), # reranker: no indexing ), }, ) ``` We disable indexing for the ColBERT multivector since it will only be used for reranking. To learn more about this, check out the [How to Effectively Use Multivector Representations in Qdrant for Reranking](https://qdrant.tech/documentation/advanced-tutorials/using-multivector-representations/) article. ### [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#batch-uploading-to-qdrant) Batch Uploading to Qdrant To avoid hitting API limits, we upload the data in batches, each batch containing: - The passage text - ColBERT and dense embeddings. - `year` and `specialty` metadata fields. ```python BATCH_SIZE = 3 points_batch = [] for i in range(len(ds["passage_text"])): point = models.PointStruct( id=i, vector={"dense": dense_documents[i], "colbert": colbert_documents[i]}, payload={ "passage_text": ds["passage_text"][i], "year": ds["year"][i], "specialty": ds["specialty"][i], }, ) points_batch.append(point) if len(points_batch) == BATCH_SIZE: client.upsert(collection_name=collection_name, points=points_batch) print(f"Uploaded batch ending at index {i}") points_batch = [] --- # Final flush if points_batch: client.upsert(collection_name=collection_name, points=points_batch) print("Uploaded final batch.") ``` ## [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#retrieval-augmented-generation-rag-pipeline) Retrieval-Augmented Generation (RAG) Pipeline Our chatbot will use a Retrieval-Augmented Generation (RAG) pipeline to ensure its answers are grounded in medical literature. ### [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#integration-of-dspy-and-qdrant) Integration of DSPy and Qdrant At the heart of the application is the Qdrant vector database that provides the information sent to DSPy to generate the final answer. This is what happens when a user submits a query: - DSPy searches against the Qdrant vector database to retrieve the top documents and answers the query. The results are also filtered with a particular year range for a specific specialty. - The retrieved passages are then reranked using ColBERT multivector embeddings, leading to the most relevant and contextually appropriate answers. - DSPy uses these passages to guide the language model through a chain-of-thought reasoning to generate the most accurate answer. 
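The retrieval and reranking steps from this list are implemented in the helper shown right after this sketch. For the final chain-of-thought step, a minimal DSPy module might look like the following; the `MedicalAnswer` signature and its field names are illustrative assumptions, not taken from the project notebook:

```python
import dspy

class MedicalAnswer(dspy.Signature):
    """Answer a medical question using only the provided context passages."""

    context = dspy.InputField(desc="Passages retrieved from the medical literature")
    question = dspy.InputField(desc="The user's medical question")
    final_answer = dspy.OutputField(desc="Answer grounded in the context")

class MedicalRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(MedicalAnswer)

    def forward(self, question: str, min_year: int, max_year: int, specialty: str):
        # Steps 1-2: dense retrieval with filters, then ColBERT reranking
        # (rerank_with_colbert is defined in the next snippet)
        passages = rerank_with_colbert(question, min_year, max_year, specialty)
        # Step 3: chain-of-thought answer generation grounded in the retrieved passages
        return self.generate(context="\n\n".join(passages), question=question)
```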
```python def rerank_with_colbert(query_text, min_year, max_year, specialty): from fastembed import TextEmbedding, LateInteractionTextEmbedding # Encode query once with both models dense_model = TextEmbedding("BAAI/bge-small-en") colbert_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0") dense_query = list(dense_model.embed(query_text))[0] colbert_query = list(colbert_model.embed(query_text))[0] # Combined query: retrieve with dense, # rerank with ColBERT results = client.query_points( collection_name=collection_name, prefetch=models.Prefetch(query=dense_query, using="dense"), query=colbert_query, using="colbert", limit=5, with_payload=True, query_filter=Filter( must=[\ FieldCondition(key="specialty", match=MatchValue(value=specialty)),\ FieldCondition(\ key="year",\ range=models.Range(gt=None, gte=min_year, lt=None, lte=max_year),\ ),\ ] ), ) points = results.points docs = [] for point in points: docs.append(point.payload["passage_text"]) return docs ``` The pipeline ensures that each response is grounded in real and recent medical literature and is aligned with the user’s needs. ## [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#guardrails-and-medical-question-detection) Guardrails and Medical Question Detection Since this is a medical chatbot, we can introduce a simple guardrail to ensure it doesn’t respond to unrelated questions like the weather. This can be implemented using a DSPy module. The chatbot checks if every question is medical-related before attempting to answer it. This is achieved by a DSPy module that classifies each incoming query as medical or not. If the question is not medical-related, the chatbot declines to answer, reducing the risk of misinformation or inappropriate responses. ```python class MedicalGuardrail(dspy.Module): def forward(self, question): prompt = ( """ Is the following question a medical question? Answer with 'Yes' or 'No'.n" f"Question: {question}n" "Answer: """ ) response = dspy.settings.lm(prompt) answer = response[0].strip().lower() return answer.startswith("yes") if not self.guardrail.forward(question): class DummyResult: final_answer = """ Sorry, I can only answer medical questions. Please ask a question related to medicine or healthcare """ return DummyResult() ``` By combining this guardrail with specialty and year filtering, we ensure that the chatbot: - Only answers medical questions. - Answers questions from recent medical literature. - Doesn’t make up answers by grounding its answers in the provided literature. ![medicalbot demo](https://qdrant.tech/articles_data/Qdrant-DSPy-medicalbot/medicaldemo.png) ## [Anchor](https://qdrant.tech/documentation/examples/qdrant-dspy-medicalbot/\#conclusion) Conclusion By leveraging Qdrant and DSPy, you can build a medical chatbot that generates accurate and up-to-date medical responses. Qdrant provides the technology and enables fast and scalable retrieval, while DSPy synthesizes this information to provide correct answers grounded in the medical literature. As a result, you can achieve a medical system that is truthful, safe, and provides relevant responses. Check out the entire project from this [notebook](https://github.com/qdrant/examples/blob/master/DSPy-medical-bot/medical_bot_DSPy_Qdrant.ipynb). You’ll need a free [Qdrant Cloud](https://qdrant.tech/cloud/) account to run the notebook. ##### Was this page useful? 
<|page-133-lllmstxt|> ## beginner-tutorials - [Documentation](https://qdrant.tech/documentation/) - Vector Search Basics --- # [Anchor](https://qdrant.tech/documentation/beginner-tutorials/\#beginner-tutorials) Beginner Tutorials | | | --- | | [Build Your First Semantic Search Engine in 5 Minutes](https://qdrant.tech/documentation/beginner-tutorials/search-beginners/) | | [Build a Neural Search Service with Sentence Transformers and Qdrant](https://qdrant.tech/documentation/beginner-tutorials/neural-search/) | | [Build a Hybrid Search Service with FastEmbed and Qdrant](https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/) | | [Measure and Improve Retrieval Quality in Semantic Search](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/) | <|page-134-lllmstxt|> ## binary-quantization - [Articles](https://qdrant.tech/articles/) - Binary Quantization - Vector Search, 40x Faster [Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/) --- # Binary Quantization - Vector Search, 40x Faster Nirant Kasliwal · September 18, 2023 ![Binary Quantization - Vector Search, 40x Faster](https://qdrant.tech/articles_data/binary-quantization/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/binary-quantization/\#optimizing-high-dimensional-vectors-with-binary-quantization) Optimizing High-Dimensional Vectors with Binary Quantization Qdrant is built to handle typical scaling challenges: high throughput, low latency and efficient indexing. **Binary quantization (BQ)** is our latest attempt to give our customers the edge they need to scale efficiently. This feature is particularly excellent for collections with large vector lengths and a large number of points. Our results are dramatic: Using BQ will reduce your memory consumption and improve retrieval speeds by up to 40x. As is the case with other quantization methods, these benefits come at the cost of recall degradation.
However, our implementation lets you balance the tradeoff between speed and recall accuracy at time of search, rather than time of index creation. The rest of this article will cover: 1. The importance of binary quantization 2. Basic implementation using our Python client 3. Benchmark analysis and usage recommendations ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#what-is-binary-quantization) What is Binary Quantization? Binary quantization (BQ) converts any vector embedding of floating point numbers into a vector of binary or boolean values. This feature is an extension of our past work on [scalar quantization](https://qdrant.tech/articles/scalar-quantization/) where we convert `float32` to `uint8` and then leverage a specific SIMD CPU instruction to perform fast vector comparison. ![What is binary quantization](https://qdrant.tech/articles_data/binary-quantization/bq-2.png) **This binarization function is how we convert a range to binary values. All numbers greater than zero are marked as 1. If it’s zero or less, they become 0.** The benefit of reducing the vector embeddings to binary values is that boolean operations are very fast and need significantly less CPU instructions. In exchange for reducing our 32 bit embeddings to 1 bit embeddings we can see up to a 40x retrieval speed up gain! One of the reasons vector search still works with such a high compression rate is that these large vectors are over-parameterized for retrieval. This is because they are designed for ranking, clustering, and similar use cases, which typically need more information encoded in the vector. For example, The 1536 dimension OpenAI embedding is worse than Open Source counterparts of 384 dimension at retrieval and ranking. Specifically, it scores 49.25 on the same [Embedding Retrieval Benchmark](https://huggingface.co/spaces/mteb/leaderboard) where the Open Source `bge-small` scores 51.82. This 2.57 points difference adds up quite soon. Our implementation of quantization achieves a good balance between full, large vectors at ranking time and binary vectors at search and retrieval time. It also has the ability for you to adjust this balance depending on your use case. ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#faster-search-and-retrieval) Faster search and retrieval Unlike product quantization, binary quantization does not rely on reducing the search space for each probe. Instead, we build a binary index that helps us achieve large increases in search speed. ![Speed by quantization method](https://qdrant.tech/articles_data/binary-quantization/bq-3.png) HNSW is the approximate nearest neighbor search. This means our accuracy improves up to a point of diminishing returns, as we check the index for more similar candidates. In the context of binary quantization, this is referred to as the **oversampling rate**. For example, if `oversampling=2.0` and the `limit=100`, then 200 vectors will first be selected using a quantized index. For those 200 vectors, the full 32 bit vector will be used with their HNSW index to a much more accurate 100 item result set. As opposed to doing a full HNSW search, we oversample a preliminary search and then only do the full search on this much smaller set of vectors. ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#improved-storage-efficiency) Improved storage efficiency The following diagram shows the binarization function, whereby we reduce 32 bits storage to 1 bit information. 
Text embeddings can be over 1024 elements of floating point 32 bit numbers. For example, remember that OpenAI embeddings are 1536 element vectors. This means each vector is 6kB for just storing the vector. ![Improved storage efficiency](https://qdrant.tech/articles_data/binary-quantization/bq-4.png) In addition to storing the vector, we also need to maintain an index for faster search and retrieval. Qdrant’s formula to estimate overall memory consumption is: `memory_size = 1.5 * number_of_vectors * vector_dimension * 4 bytes` For 100K OpenAI Embedding ( `ada-002`) vectors we would need 900 Megabytes of RAM and disk space. This consumption can start to add up rapidly as you create multiple collections or add more items to the database. **With binary quantization, those same 100K OpenAI vectors only require 128 MB of RAM.** We benchmarked this result using methods similar to those covered in our [Scalar Quantization memory estimation](https://qdrant.tech/articles/scalar-quantization/#benchmarks). This reduction in RAM usage is achieved through the compression that happens in the binary conversion. HNSW and quantized vectors will live in RAM for quick access, while original vectors can be offloaded to disk only. For searching, quantized HNSW will provide oversampled candidates, then they will be re-evaluated using their disk-stored original vectors to refine the final results. All of this happens under the hood without any additional intervention on your part. ### [Anchor](https://qdrant.tech/articles/binary-quantization/\#when-should-you-not-use-bq) When should you not use BQ? Since this method exploits the over-parameterization of embedding, you can expect poorer results for small embeddings i.e. less than 1024 dimensions. With the smaller number of elements, there is not enough information maintained in the binary vector to achieve good results. You will still get faster boolean operations and reduced RAM usage, but the accuracy degradation might be too high. ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#sample-implementation) Sample implementation Now that we have introduced you to binary quantization, let’s try our a basic implementation. In this example, we will be using OpenAI and Cohere with Qdrant. #### [Anchor](https://qdrant.tech/articles/binary-quantization/\#create-a-collection-with-binary-quantization-enabled) Create a collection with Binary Quantization enabled Here is what you should do at indexing time when you create the collection: 1. We store all the “full” vectors on disk. 2. Then we set the binary embeddings to be in RAM. By default, both the full vectors and BQ get stored in RAM. We move the full vectors to disk because this saves us memory and allows us to store more vectors in RAM. By doing this, we explicitly move the binary vectors to memory by setting `always_ram=True`. 
```python from qdrant_client import QdrantClient, models # Connect to our Qdrant server client = QdrantClient( url="http://localhost:6333", prefer_grpc=True, ) # Create the collection to hold our embeddings # on_disk=True and the quantization_config are the areas to focus on collection_name = "binary-quantization" if not client.collection_exists(collection_name): client.create_collection( collection_name=collection_name, vectors_config=models.VectorParams( size=1536, distance=models.Distance.DOT, on_disk=True, ), optimizers_config=models.OptimizersConfigDiff( default_segment_number=5, ), hnsw_config=models.HnswConfigDiff( m=0, ), quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig(always_ram=True), ), ) ``` #### [Anchor](https://qdrant.tech/articles/binary-quantization/\#what-is-happening-in-the-hnswconfig) What is happening in the HnswConfig? We’re setting `m` to 0, i.e. disabling the HNSW graph construction. This allows faster uploads of vectors and payloads. We will turn it back on below, once all the data is loaded. #### [Anchor](https://qdrant.tech/articles/binary-quantization/\#next-we-upload-our-vectors-to-this-and-then-enable-the-graph-construction) Next, we upload our vectors to this collection and then enable the graph construction: ```python batch_size = 10000 client.upload_collection( collection_name=collection_name, ids=range(len(dataset)), vectors=dataset["openai"], payload=[ {"text": x} for x in dataset["text"] ], parallel=10, # based on the machine ) ``` Enable HNSW graph construction again: ```python client.update_collection( collection_name=collection_name, hnsw_config=models.HnswConfigDiff( m=16, ), ) ``` #### [Anchor](https://qdrant.tech/articles/binary-quantization/\#configure-the-search-parameters) Configure the search parameters: When setting search parameters, we specify that we want to use `oversampling` and `rescore`. Here is an example snippet: ```python client.search( collection_name=collection_name, query_vector=[0.2, 0.1, 0.9, 0.7, ...], search_params=models.SearchParams( quantization=models.QuantizationSearchParams( ignore=False, rescore=True, oversampling=2.0, ) ), ) ``` After Qdrant pulls the oversampled set of vectors, the full vectors, which will be, say, 1536 dimensions for OpenAI, are then pulled up from disk. Qdrant computes the nearest neighbors with the query vector and returns the accurate, rescored order. This method produces much more accurate results. We enabled this by setting `rescore=True`. These two parameters are how you balance speed versus accuracy. The larger your oversample, the more items you need to read from disk and the more elements you have to search with the relatively slower full vector index. On the other hand, doing this will produce more accurate results. If you have lower accuracy requirements, you can even try a small oversample without rescoring. Or, depending on your data set and your accuracy versus speed requirements, you can search only the binary index with no rescoring, i.e. leaving those two parameters out of the search query. ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#benchmark-results) Benchmark results We retrieved some early results on the relationship between limit and oversampling using the DBPedia OpenAI 1M vector dataset. We ran all these experiments on a Qdrant instance where 100K vectors were indexed and used 100 random queries.
We varied the 3 parameters that will affect query time and accuracy: limit, rescore and oversampling. We offer these as an initial exploration of this new feature. You are highly encouraged to reproduce these experiments with your data sets. > Aside: Since this is a new innovation in vector databases, we are keen to hear feedback and results. [Join our Discord server](https://discord.gg/Qy6HCJK9Dc) for further discussion! **Oversampling:** In the figure below, we illustrate the relationship between recall and number of candidates: ![Correct vs candidates](https://qdrant.tech/articles_data/binary-quantization/bq-5.png) We see that “correct” results i.e. recall increases as the number of potential “candidates” increase (limit x oversampling). To highlight the impact of changing the `limit`, different limit values are broken apart into different curves. For example, we see that the lowest recall for limit 50 is around 94 correct, with 100 candidates. This also implies we used an oversampling of 2.0 As oversampling increases, we see a general improvement in results – but that does not hold in every case. **Rescore:** As expected, rescoring increases the time it takes to return a query. We also repeated the experiment with oversampling except this time we looked at how rescore impacted result accuracy. ![Relationship between limit and rescore on correct](https://qdrant.tech/articles_data/binary-quantization/bq-7.png) **Limit:** We experiment with limits from Top 1 to Top 50 and we are able to get to 100% recall at limit 50, with rescore=True, in an index with 100K vectors. ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#recommendations) Recommendations Quantization gives you the option to make tradeoffs against other parameters: Dimension count/embedding size Throughput and Latency requirements Recall requirements If you’re working with OpenAI or Cohere embeddings, we recommend the following oversampling settings: | Method | Dimensionality | Test Dataset | Recall | Oversampling | | --- | --- | --- | --- | --- | | OpenAI text-embedding-3-large | 3072 | [DBpedia 1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M) | 0.9966 | 3x | | OpenAI text-embedding-3-small | 1536 | [DBpedia 100K](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K) | 0.9847 | 3x | | OpenAI text-embedding-3-large | 1536 | [DBpedia 1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M) | 0.9826 | 3x | | OpenAI text-embedding-ada-002 | 1536 | [DbPedia 1M](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) | 0.98 | 4x | | Gemini | 768 | No Open Data | 0.9563 | 3x | | Mistral Embed | 768 | No Open Data | 0.9445 | 3x | If you determine that binary quantization is appropriate for your datasets and queries then we suggest the following: - Binary Quantization with always\_ram=True - Vectors stored on disk - Oversampling=2.0 (or more) - Rescore=True ## [Anchor](https://qdrant.tech/articles/binary-quantization/\#whats-next) What’s next? Binary quantization is exceptional if you need to work with large volumes of data under high recall expectations. You can try this feature either by spinning up a [Qdrant container image](https://hub.docker.com/r/qdrant/qdrant) locally or, having us create one for you through a [free account](https://cloud.qdrant.io/signup) in our cloud hosted service. The article gives examples of data sets and configuration you can use to get going. 
Our documentation covers [adding large datasets to Qdrant](https://qdrant.tech/documentation/tutorials/bulk-upload/) to your Qdrant instance as well as [more quantization methods](https://qdrant.tech/documentation/guides/quantization/). If you have any feedback, drop us a note on Twitter or LinkedIn to tell us about your results. [Join our lively Discord Server](https://discord.gg/Qy6HCJK9Dc) if you want to discuss BQ with like-minded people! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/binary-quantization.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/binary-quantization.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-135-lllmstxt|> ## monitoring - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Monitoring & Telemetry --- # [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#monitoring--telemetry) Monitoring & Telemetry Qdrant exposes its metrics in [Prometheus](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format)/ [OpenMetrics](https://github.com/OpenObservability/OpenMetrics) format, so you can integrate them easily with the compatible tools and monitor Qdrant with your own monitoring system. You can use the `/metrics` endpoint and configure it as a scrape target. Metrics endpoint: [http://localhost:6333/metrics](http://localhost:6333/metrics) The integration with Qdrant is easy to [configure](https://prometheus.io/docs/prometheus/latest/getting_started/#configure-prometheus-to-monitor-the-sample-targets) with Prometheus and Grafana. ## [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#monitoring-multi-node-clusters) Monitoring multi-node clusters When scraping metrics from multi-node Qdrant clusters, it is important to scrape from each node individually instead of using a load-balanced URL. Otherwise, your metrics will appear inconsistent after each scrape. ## [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#monitoring-in-qdrant-cloud) Monitoring in Qdrant Cloud Qdrant Cloud offers additional metrics and telemetry that are not available in the open-source version. For more information, see [Qdrant Cloud Monitoring](https://qdrant.tech/documentation/cloud/cluster-monitoring/). ## [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#exposed-metrics) Exposed metrics There are two endpoints avaliable: - `/metrics` is the direct endpoint of the underlying Qdrant database node. - `/sys_metrics` is a Qdrant cloud-only endpoint that provides additional operational and infrastructure metrics about your cluster, like CPU, memory and disk utilisation, collection metrics and load balancer telemetry. For more information, see [Qdrant Cloud Monitoring](https://qdrant.tech/documentation/cloud/cluster-monitoring/). ### [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#node-metrics-metrics) Node metrics `/metrics` Each Qdrant server will expose the following metrics. 
| Name | Type | Meaning |
| --- | --- | --- |
| app\_info | gauge | Information about the Qdrant server |
| app\_status\_recovery\_mode | gauge | Whether Qdrant is currently started in recovery mode |
| collections\_total | gauge | Number of collections |
| collections\_vector\_total | gauge | Total number of vectors in all collections |
| collections\_full\_total | gauge | Number of full collections |
| collections\_aggregated\_total | gauge | Number of aggregated collections |
| rest\_responses\_total | counter | Total number of responses through the REST API |
| rest\_responses\_fail\_total | counter | Total number of failed responses through the REST API |
| rest\_responses\_avg\_duration\_seconds | gauge | Average response duration in the REST API |
| rest\_responses\_min\_duration\_seconds | gauge | Minimum response duration in the REST API |
| rest\_responses\_max\_duration\_seconds | gauge | Maximum response duration in the REST API |
| grpc\_responses\_total | counter | Total number of responses through the gRPC API |
| grpc\_responses\_fail\_total | counter | Total number of failed responses through the gRPC API |
| grpc\_responses\_avg\_duration\_seconds | gauge | Average response duration in the gRPC API |
| grpc\_responses\_min\_duration\_seconds | gauge | Minimum response duration in the gRPC API |
| grpc\_responses\_max\_duration\_seconds | gauge | Maximum response duration in the gRPC API |
| cluster\_enabled | gauge | Whether cluster support is enabled. 1 - YES |
| memory\_active\_bytes | gauge | Total number of bytes in active pages allocated by the application. [Reference](https://jemalloc.net/jemalloc.3.html#stats.active) |
| memory\_allocated\_bytes | gauge | Total number of bytes allocated by the application. [Reference](https://jemalloc.net/jemalloc.3.html#stats.allocated) |
| memory\_metadata\_bytes | gauge | Total number of bytes dedicated to allocator metadata. [Reference](https://jemalloc.net/jemalloc.3.html#stats.metadata) |
| memory\_resident\_bytes | gauge | Maximum number of bytes in physically resident data pages mapped. [Reference](https://jemalloc.net/jemalloc.3.html#stats.resident) |
| memory\_retained\_bytes | gauge | Total number of bytes in virtual memory mappings. [Reference](https://jemalloc.net/jemalloc.3.html#stats.retained) |
| collection\_hardware\_metric\_cpu | gauge | CPU measurements of a collection |

**Cluster-related metrics**

There are also some metrics which are exposed in distributed mode only.

| Name | Type | Meaning |
| --- | --- | --- |
| cluster\_peers\_total | gauge | Total number of cluster peers |
| cluster\_term | counter | Current cluster term |
| cluster\_commit | counter | Index of the last committed (finalized) operation the cluster peer is aware of |
| cluster\_pending\_operations\_total | gauge | Total number of pending operations for the cluster peer |
| cluster\_voter | gauge | Whether the cluster peer is a voter or learner. 1 - VOTER |

## [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#telemetry-endpoint) Telemetry endpoint

Qdrant also provides a `/telemetry` endpoint, which provides information about the current state of the database, including the number of vectors, shards, and other useful information. You can find the full documentation of this endpoint in the [API reference](https://api.qdrant.tech/api-reference/service/telemetry).
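For illustration, here is a small Python sketch that polls the `/metrics` endpoint of each node individually, as recommended above for multi-node clusters, and then reads the `/telemetry` endpoint. The node URLs and the API key header are assumptions; adjust them to your deployment.

```python
import requests

# Assumed node addresses; list each node's URL explicitly instead of a
# load-balanced endpoint, so the scraped values stay consistent per node.
NODE_URLS = ["http://node-0:6333", "http://node-1:6333", "http://node-2:6333"]
HEADERS = {"api-key": "<your-api-key>"}  # omit if authentication is disabled

for url in NODE_URLS:
    # Prometheus/OpenMetrics text format, one scrape per node
    metrics = requests.get(f"{url}/metrics", headers=HEADERS, timeout=5).text
    print(url, metrics.splitlines()[:3])  # show a few sample lines

# Telemetry is a JSON report about the current state of a single node
telemetry = requests.get(f"{NODE_URLS[0]}/telemetry", headers=HEADERS, timeout=5).json()
print(telemetry.get("result", {}).keys())
```

In a production setup you would typically configure these same per-node URLs as separate scrape targets in your monitoring system rather than polling them manually.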
## [Anchor](https://qdrant.tech/documentation/guides/monitoring/\#kubernetes-health-endpoints) Kubernetes health endpoints _Available as of v1.5.0_ Qdrant exposes three endpoints, namely [`/healthz`](http://localhost:6333/healthz), [`/livez`](http://localhost:6333/livez) and [`/readyz`](http://localhost:6333/readyz), to indicate the current status of the Qdrant server. These currently provide the most basic status response, returning HTTP 200 if Qdrant is started and ready to be used. Regardless of whether an [API key](https://qdrant.tech/documentation/guides/security/#authentication) is configured, the endpoints are always accessible. You can read more about Kubernetes health endpoints [here](https://kubernetes.io/docs/reference/using-api/health-checks/). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/monitoring.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/monitoring.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-136-lllmstxt|> ## cloud-pricing-payments - [Documentation](https://qdrant.tech/documentation/) - Billing & Payments --- # [Anchor](https://qdrant.tech/documentation/cloud-pricing-payments/\#qdrant-cloud-billing--payments) Qdrant Cloud Billing & Payments Qdrant database clusters in Qdrant Cloud are priced based on CPU, memory, and disk storage usage. To get a clearer idea for the pricing structure, based on the amounts of vectors you want to store, please use our [Pricing Calculator](https://cloud.qdrant.io/calculator). ## [Anchor](https://qdrant.tech/documentation/cloud-pricing-payments/\#billing) Billing You can pay for your Qdrant Cloud database clusters either with a credit card or through an AWS, GCP, or Azure Marketplace subscription. Your payment method is charged at the beginning of each month for the previous month’s usage. There is no difference in pricing between the different payment methods. If you choose to pay through a marketplace, the Qdrant Cloud usage costs are added as usage units to your existing billing for your cloud provider services. A detailed breakdown of your usage is available in the Qdrant Cloud Console. Note: Even if you pay using a marketplace subscription, your database clusters will still be deployed into Qdrant-owned infrastructure. The setup and management of Qdrant database clusters will also still be done via the Qdrant Cloud Console UI. If you wish to deploy Qdrant database clusters into your own environment from Qdrant Cloud then we recommend our [Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/) solution. ![Payment Options](https://qdrant.tech/documentation/cloud/payment-options.png) ### [Anchor](https://qdrant.tech/documentation/cloud-pricing-payments/\#credit-card) Credit Card Credit card payments are processed through Stripe. To set up a credit card, go to the Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/), select **Stripe** as the payment method, and enter your credit card details. 
### [Anchor](https://qdrant.tech/documentation/cloud-pricing-payments/\#aws-marketplace) AWS Marketplace Our [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-rtphb42tydtzg) listing streamlines access to Qdrant for users who rely on Amazon Web Services for hosting and application development. To subscribe: 1. Go to Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/) 2. Select **AWS Marketplace** as the payment method. You will be redirected to the AWS Marketplace listing for Qdrant. 3. Click the bright orange button - **View purchase options**. 4. On the next screen, under Purchase, click **Subscribe**. 5. Up top, on the green banner, click **Set up your account**. You will be redirected to the Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/). From there you can start to create Qdrant database clusters. ### [Anchor](https://qdrant.tech/documentation/cloud-pricing-payments/\#gcp-marketplace) GCP Marketplace Our [GCP Marketplace](https://console.cloud.google.com/marketplace/product/qdrant-public/qdrant) listing streamlines access to Qdrant for users who rely on the Google Cloud Platform for hosting and application development. To subscribe: 1. Go to Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/) 2. Select **GCP Marketplace** as the payment method. You will be redirected to the GCP Marketplace listing for Qdrant. 3. Select **Subscribe**. (If you have already subscribed, select **Manage on Provider**.) 4. On the next screen, choose options as required, and select **Subscribe**. 5. On the pop-up window that appers, select **Sign up with Qdrant**. You will be redirected to the Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/). From there you can start to create Qdrant database clusters. ### [Anchor](https://qdrant.tech/documentation/cloud-pricing-payments/\#azure-marketplace) Azure Marketplace Our [Azure Marketplace](https://portal.azure.com/#view/Microsoft_Azure_Marketplace/GalleryItemDetailsBladeNopdl/id/qdrantsolutionsgmbh1698769709989.qdrant-db/selectionMode~/false/resourceGroupId//resourceGroupLocation//dontDiscardJourney~/false/selectedMenuId/home/launchingContext~/%7B%22galleryItemId%22%3A%22qdrantsolutionsgmbh1698769709989.qdrant-dbqdrant_cloud_unit%22%2C%22source%22%3A%5B%22GalleryFeaturedMenuItemPart%22%2C%22VirtualizedTileDetails%22%5D%2C%22menuItemId%22%3A%22home%22%2C%22subMenuItemId%22%3A%22Search%20results%22%2C%22telemetryId%22%3A%221df5537b-8b29-4200-80ce-0cd38c7e0e56%22%7D/searchTelemetryId/6b44fb90-7b9c-4286-aad8-59f88f3cc2ff) listing streamlines access to Qdrant for users who rely on Microsoft Azure for hosting and application development. To subscribe: 1. Go to Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/) 2. Select **Azure Marketplace** as the payment method. You will be redirected to the Azure Marketplace listing for Qdrant. 3. Select **Subscribe**. 4. On the next screen, choose options as required, and select **Review + Subscribe**. 5. After reviewing all settings, select **Subscribe**. 6. Once the SaaS subscription is created, select **Configure account now**. You will be redirected to the Billing Details screen in the [Qdrant Cloud Console](https://cloud.qdrant.io/). From there you can start to create Qdrant database clusters. ##### Was this page useful? 
![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-pricing-payments.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-pricing-payments.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-137-lllmstxt|> ## frameworks - [Documentation](https://qdrant.tech/documentation/) - Frameworks ## [Anchor](https://qdrant.tech/documentation/frameworks/\#framework-integrations) Framework Integrations | Framework | Description | | --- | --- | | [AutoGen](https://qdrant.tech/documentation/frameworks/autogen/) | Framework from Microsoft building LLM applications using multiple conversational agents. | | [Camel](https://qdrant.tech/documentation/frameworks/camel/) | Framework to build and use LLM-based agents for real-world task solving | | [Canopy](https://qdrant.tech/documentation/frameworks/canopy/) | Framework from Pinecone for building RAG applications using LLMs and knowledge bases. | | [Cheshire Cat](https://qdrant.tech/documentation/frameworks/cheshire-cat/) | Framework to create personalized AI assistants using custom data. | | [CrewAI](https://qdrant.tech/documentation/frameworks/crewai/) | CrewAI is a framework to build automated workflows using multiple AI agents that perform complex tasks. | | [Dagster](https://qdrant.tech/documentation/frameworks/dagster/) | Python framework for data orchestration with integrated lineage, observability. | | [DeepEval](https://qdrant.tech/documentation/frameworks/deepeval/) | Python framework for testing large language model systems. | | [DocArray](https://qdrant.tech/documentation/frameworks/docarray/) | Python library for managing data in multi-modal AI applications. | | [DSPy](https://qdrant.tech/documentation/frameworks/dspy/) | Framework for algorithmically optimizing LM prompts and weights. | | [dsRAG](https://qdrant.tech/documentation/frameworks/dsrag/) | High-performance Python retrieval engine for unstructured data. | | [Dynamiq](https://qdrant.tech/documentation/frameworks/dynamiq/) | Dynamiq is all-in-one Gen AI framework, designed to streamline the development of AI-powered applications. | | [Feast](https://qdrant.tech/documentation/frameworks/feast/) | Open-source feature store to operate production ML systems at scale as a set of features. | | [Fifty-One](https://qdrant.tech/documentation/frameworks/fifty-one/) | Toolkit for building high-quality datasets and computer vision models. | | [Genkit](https://qdrant.tech/documentation/frameworks/genkit/) | Framework to build, deploy, and monitor production-ready AI-powered apps. | | [Haystack](https://qdrant.tech/documentation/frameworks/haystack/) | LLM orchestration framework to build customizable, production-ready LLM applications. | | [HoneyHive](https://qdrant.tech/documentation/frameworks/honeyhive/) | AI observability and evaluation platform that provides tracing and monitoring tools for GenAI pipelines. | | [Lakechain](https://qdrant.tech/documentation/frameworks/lakechain/) | Python framework for deploying document processing pipelines on AWS using infrastructure-as-code. 
| | [Langchain](https://qdrant.tech/documentation/frameworks/langchain/) | Python framework for building context-aware, reasoning applications using LLMs. | | [Langchain-Go](https://qdrant.tech/documentation/frameworks/langchain-go/) | Go framework for building context-aware, reasoning applications using LLMs. | | [Langchain4j](https://qdrant.tech/documentation/frameworks/langchain4j/) | Java framework for building context-aware, reasoning applications using LLMs. | | [LangGraph](https://qdrant.tech/documentation/frameworks/langgraph/) | Python, Javascript libraries for building stateful, multi-actor applications. | | [LlamaIndex](https://qdrant.tech/documentation/frameworks/llama-index/) | A data framework for building LLM applications with modular integrations. | | [Mastra](https://qdrant.tech/documentation/frameworks/mastra/) | Typescript framework to build AI applications and features quickly. | | [Mirror Security](https://qdrant.tech/documentation/frameworks/mirror-security/) | Python framework for vector encryption and access control. | | [Mem0](https://qdrant.tech/documentation/frameworks/mem0/) | Self-improving memory layer for LLM applications, enabling personalized AI experiences. | | [Neo4j GraphRAG](https://qdrant.tech/documentation/frameworks/neo4j-graphrag/) | Package to build graph retrieval augmented generation (GraphRAG) applications using Neo4j and Python. | | [NLWeb](https://qdrant.tech/documentation/frameworks/nlweb/) | A framework to turn websites into chat-ready data using schema.org and associated data formats. | | [OpenAI Agents](https://qdrant.tech/documentation/frameworks/openai-agents/) | Python framework for managing multiple AI agents that can work together. | | [Pandas-AI](https://qdrant.tech/documentation/frameworks/pandas-ai/) | Python library to query/visualize your data (CSV, XLSX, PostgreSQL, etc.) in natural language | | [Ragbits](https://qdrant.tech/documentation/frameworks/ragbits/) | Python package that offers essential “bits” for building powerful Retrieval-Augmented Generation (RAG) applications. | | [Rig-rs](https://qdrant.tech/documentation/frameworks/rig-rs/) | Rust library for building scalable, modular, and ergonomic LLM-powered applications. | | [Semantic Router](https://qdrant.tech/documentation/frameworks/semantic-router/) | Python library to build a decision-making layer for AI applications using vector search. | | [SmolAgents](https://qdrant.tech/documentation/frameworks/smolagents/) | Barebones library for agents. Agents write python code to call tools and orchestrate other agent. | | [Solon](https://qdrant.tech/documentation/frameworks/solon/) | A lightweight, high-performance Java enterprise framework | | [Spring AI](https://qdrant.tech/documentation/frameworks/spring-ai/) | Java AI framework for building with Spring design principles such as portability and modular design. | | [Superduper](https://qdrant.tech/documentation/frameworks/superduper/) | Framework for building flexible, compositional AI apps which may be applied directly to databases. | | [Sycamore](https://qdrant.tech/documentation/frameworks/sycamore/) | Document processing engine for ETL, RAG, LLM-based applications, and analytics on unstructured data. | | [Testcontainers](https://qdrant.tech/documentation/frameworks/testcontainers/) | Framework for providing throwaway, lightweight instances of systems for testing | | [txtai](https://qdrant.tech/documentation/frameworks/txtai/) | Python library for semantic search, LLM orchestration and language model workflows. 
| | [Vanna AI](https://qdrant.tech/documentation/frameworks/vanna-ai/) | Python RAG framework for SQL generation and querying. | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/frameworks/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/frameworks/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-138-lllmstxt|> ## rag-is-dead - [Articles](https://qdrant.tech/articles/) - Is RAG Dead? The Role of Vector Databases in Vector Search \| Qdrant [Back to RAG & GenAI](https://qdrant.tech/articles/rag-and-genai/) --- # Is RAG Dead? The Role of Vector Databases in Vector Search \| Qdrant David Myriel · February 27, 2024 ![Is RAG Dead? The Role of Vector Databases in Vector Search | Qdrant](https://qdrant.tech/articles_data/rag-is-dead/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/rag-is-dead/\#is-rag-dead-the-role-of-vector-databases-in-ai-efficiency-and-vector-search) Is RAG Dead? The Role of Vector Databases in AI Efficiency and Vector Search When Anthropic came out with a context window of 100K tokens, they said: “ _[Vector search](https://qdrant.tech/solutions/) is dead. LLMs are getting more accurate and won’t need RAG anymore._” Google’s Gemini 1.5 now offers a context window of 10 million tokens. [Their supporting paper](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf) claims victory over accuracy issues, even when applying Greg Kamradt’s [NIAH methodology](https://twitter.com/GregKamradt/status/1722386725635580292). _It’s over. [RAG](https://qdrant.tech/articles/what-is-rag-in-ai/) (Retrieval Augmented Generation) must be completely obsolete now. Right?_ No. Larger context windows are never the solution. Let me repeat. Never. They require more computational resources and lead to slower processing times. The community is already stress testing Gemini 1.5: ![RAG and Gemini 1.5](https://qdrant.tech/articles_data/rag-is-dead/rag-is-dead-1.png) This is not surprising. LLMs require massive amounts of compute and memory to run. To cite Grant, running such a model by itself “would deplete a small coal mine to generate each completion”. Also, who is waiting 30 seconds for a response? ## [Anchor](https://qdrant.tech/articles/rag-is-dead/\#context-stuffing-is-not-the-solution) Context stuffing is not the solution > Relying on context is expensive, and it doesn’t improve response quality in real-world applications. Retrieval based on [vector search](https://qdrant.tech/solutions/) offers much higher precision. If you solely rely on an [LLM](https://qdrant.tech/articles/what-is-rag-in-ai/) to perfect retrieval and precision, you are doing it wrong. A large context window makes it harder to focus on relevant information. This increases the risk of errors or hallucinations in its responses. Google found Gemini 1.5 significantly more accurate than GPT-4 at shorter context lengths and “a very small decrease in recall towards 1M tokens”. The recall is still below 0.8. 
![Gemini 1.5 Data](https://qdrant.tech/articles_data/rag-is-dead/rag-is-dead-2.png) We don’t think 60-80% is good enough. The LLM might retrieve enough relevant facts in its context window, but it still loses up to 40% of the available information. > The whole point of vector search is to circumvent this process by efficiently picking the information your app needs to generate the best response. A [vector database](https://qdrant.tech/) keeps the compute load low and the query response fast. You don’t need to wait for the LLM at all. Qdrant’s benchmark results are strongly in favor of accuracy and efficiency. We recommend that you consider them before deciding that an LLM is enough. Take a look at our [open-source benchmark reports](https://qdrant.tech/benchmarks/) and [try out the tests](https://github.com/qdrant/vector-db-benchmark) yourself. ## [Anchor](https://qdrant.tech/articles/rag-is-dead/\#vector-search-in-compound-systems) Vector search in compound systems The future of AI lies in careful system engineering. As per [Zaharia et al.](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/), results from Databricks find that “60% of LLM applications use some form of RAG, while 30% use multi-step chains.” Even Gemini 1.5 demonstrates the need for a complex strategy. When looking at [Google’s MMLU Benchmark](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf), the model was called 32 times to reach a score of 90.0% accuracy. This shows us that even a basic compound arrangement is superior to monolithic models. As a retrieval system, a [vector database](https://qdrant.tech/) perfectly fits the need for compound systems. Introducing them into your design opens the possibilities for superior applications of LLMs. It is superior because it’s faster, more accurate, and much cheaper to run. > The key advantage of RAG is that it allows an LLM to pull in real-time information from up-to-date internal and external knowledge sources, making it more dynamic and adaptable to new information. - Oliver Molander, CEO of IMAGINAI ## [Anchor](https://qdrant.tech/articles/rag-is-dead/\#qdrant-scales-to-enterprise-rag-scenarios) Qdrant scales to enterprise RAG scenarios People still don’t understand the economic benefit of vector databases. Why would a large corporate AI system need a standalone vector database like [Qdrant](https://qdrant.tech/)? In our minds, this is the most important question. Let’s pretend that LLMs cease struggling with context thresholds altogether. **How much would all of this cost?** If you are running a RAG solution in an enterprise environment with petabytes of private data, your compute bill will be unimaginable. Let’s assume 1 cent per 1K input tokens (which is the current GPT-4 Turbo pricing). Whatever you are doing, every time you go 100 thousand tokens deep, it will cost you $1. That’s a buck a question. > According to our estimations, vector search queries are **at least** 100 million times cheaper than queries made by LLMs. Conversely, the only up-front investment with vector databases is the indexing (which requires more compute). After this step, everything else is a breeze. Once setup, Qdrant easily scales via [features like Multitenancy and Sharding](https://qdrant.tech/articles/multitenancy/). This lets you scale up your reliance on the vector retrieval process and minimize your use of the compute-heavy LLMs. As an optimization measure, Qdrant is irreplaceable. 
Julien Simon from HuggingFace says it best: > RAG is not a workaround for limited context size. For mission-critical enterprise use cases, RAG is a way to leverage high-value, proprietary company knowledge that will never be found in public datasets used for LLM training. At the moment, the best place to index and query this knowledge is some sort of vector index. In addition, RAG downgrades the LLM to a writing assistant. Since built-in knowledge becomes much less important, a nice small 7B open-source model usually does the trick at a fraction of the cost of a huge generic model. ## [Anchor](https://qdrant.tech/articles/rag-is-dead/\#get-superior-accuracy-with-qdrants-vector-database) Get superior accuracy with Qdrant’s vector database As LLMs continue to require enormous computing power, users will need to leverage vector search and [RAG](https://qdrant.tech/rag/rag-evaluation-guide/). Our customers remind us of this fact every day. As a product, [our vector database](https://qdrant.tech/) is highly scalable and business-friendly. We develop our features strategically to follow our company’s Unix philosophy. We want to keep Qdrant compact, efficient and with a focused purpose. This purpose is to empower our customers to use it however they see fit. When large enterprises release their generative AI into production, they need to keep costs under control, while retaining the best possible quality of responses. Qdrant has the [vector search solutions](https://qdrant.tech/solutions/) to do just that. Revolutionize your vector search capabilities and get started with [a Qdrant demo](https://qdrant.tech/contact-us/). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/rag-is-dead.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/rag-is-dead.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-139-lllmstxt|> ## bm42 - [Articles](https://qdrant.tech/articles/) - BM42: New Baseline for Hybrid Search [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # BM42: New Baseline for Hybrid Search Andrey Vasnetsov · July 01, 2024 ![BM42: New Baseline for Hybrid Search](https://qdrant.tech/articles_data/bm42/preview/title.jpg) For the last 40 years, BM25 has served as the standard for search engines. It is a simple yet powerful algorithm that has been used by many search engines, including Google, Bing, and Yahoo. Though it seemed that the advent of vector search would diminish its influence, it did so only partially. The current state-of-the-art approach to retrieval nowadays tries to incorporate BM25 along with embeddings into a hybrid search system. However, the use case of text retrieval has significantly shifted since the introduction of RAG. Many assumptions upon which BM25 was built are no longer valid. For example, the typical length of documents and queries vary significantly between traditional web search and modern RAG systems. 
In this article, we will recap what made BM25 relevant for so long and why alternatives have struggled to replace it. Finally, we will discuss BM42, the next step in the evolution of lexical search.

## [Anchor](https://qdrant.tech/articles/bm42/\#why-has-bm25-stayed-relevant-for-so-long) Why has BM25 stayed relevant for so long?

To understand why, we need to analyze its components. The famous BM25 formula is defined as:

$$
\text{score}(D,Q) = \sum_{i=1}^{N} \text{IDF}(q_i) \times \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

Let’s simplify this to gain a better understanding.

- The $\text{score}(D,Q)$ means that we compute the score for each pair of document $D$ and query $Q$.
- The $\sum_{i=1}^{N}$ means that each of the $N$ terms in the query contributes to the final score as part of the sum.
- The $\text{IDF}(q_i)$ is the inverse document frequency. The rarer the term $q_i$ is, the more it contributes to the score. A simplified formula for this is:

$$
\text{IDF}(q_i) = \frac{\text{Number of documents}}{\text{Number of documents with } q_i}
$$

It is fair to say that the `IDF` is the most important part of the BM25 formula. `IDF` selects the most important terms in the query relative to the specific document collection. So intuitively, we can interpret the `IDF` as **term importance within the corpora**.

That explains why BM25 is so good at handling queries which dense embeddings consider out-of-domain.

The last component of the formula can be intuitively interpreted as **term importance within the document**. This might look a bit complicated, so let’s break it down.

$$
\text{Term importance in document}(q_i) = \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

- The $f(q_i, D)$ is the frequency of the term $q_i$ in the document $D$, or in other words, the number of times the term $q_i$ appears in the document $D$.
- The $k_1$ and $b$ are the hyperparameters of the BM25 formula. In most implementations, they are constants set to $k_1 = 1.5$ and $b = 0.75$. Those constants define the relative weight of the term frequency and the document length in the formula.
- The $\frac{|D|}{\text{avgdl}}$ is the relative length of the document $D$ compared to the average document length in the corpora. The intuition behind this part is the following: if the token is found in a smaller document, it is more likely that this token is important for this document.

#### [Anchor](https://qdrant.tech/articles/bm42/\#will-bm25-term-importance-in-the-document-work-for-rag) Will BM25 term importance in the document work for RAG?

As we can see, the _term importance in the document_ heavily depends on the statistics within the document. Moreover, these statistics only work well if the document is long enough. Therefore, it is suitable for searching webpages, books, articles, etc. However, would it work as well for modern search applications, such as RAG? Let’s see.

The typical length of a document in RAG is much shorter than that of web search. In fact, even if we are working with webpages and articles, we would prefer to split them into chunks so that a) dense models can handle them and b) we can pinpoint the exact part of the document which is relevant to the query.

As a result, the document size in RAG is small and fixed. That effectively renders the _term importance in the document_ part of the BM25 formula useless. The term frequency in the document is always 0 or 1, and the relative length of the document is always 1.

So, the only part of the BM25 formula that is still relevant for RAG is `IDF`. Let’s see how we can leverage it.
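To make the two components of the formula concrete, here is a small, self-contained Python sketch of the simplified scoring described above. The toy corpus and whitespace tokenizer are assumptions for illustration only, and this uses the article’s simplified IDF; classical BM25 implementations use a log-smoothed variant.

```python
from collections import Counter

# Toy corpus; in practice these would be your documents.
corpus = [
    "vector search with qdrant",
    "qdrant is a vector database",
    "bm25 is a lexical search baseline",
]
docs = [doc.split() for doc in corpus]
avgdl = sum(len(d) for d in docs) / len(docs)
N = len(docs)

def idf(term: str) -> float:
    # Simplified IDF: total documents / documents containing the term
    n_t = sum(1 for d in docs if term in d)
    return N / n_t if n_t else 0.0

def bm25(query: str, doc: list[str], k1: float = 1.5, b: float = 0.75) -> float:
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        f = tf[term]
        # Term importance within the document (second part of the formula)
        importance = (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf(term) * importance
    return score

for text, doc in zip(corpus, docs):
    print(f"{bm25('vector search', doc):.3f}  {text}")
```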
## [Anchor](https://qdrant.tech/articles/bm42/\#why-splade-is-not-always-the-answer) Why SPLADE is not always the answer Before discussing our new approach, let’s examine the current state-of-the-art alternative to BM25 - SPLADE. The idea behind SPLADE is interesting—what if we let a smart, end-to-end trained model generate a bag-of-words representation of the text for us? It will assign all the weights to the tokens, so we won’t need to bother with statistics and hyperparameters. The documents are then represented as a sparse embedding, where each token is represented as an element of the sparse vector. And it works in academic benchmarks. Many papers report that SPLADE outperforms BM25 in terms of retrieval quality. This performance, however, comes at a cost. - **Inappropriate Tokenizer**: To incorporate transformers for this task, SPLADE models require using a standard transformer tokenizer. These tokenizers are not designed for retrieval tasks. For example, if the word is not in the (quite limited) vocabulary, it will be either split into subwords or replaced with a `[UNK]` token. This behavior works well for language modeling but is completely destructive for retrieval tasks. - **Expensive Token Expansion**: In order to compensate the tokenization issues, SPLADE uses _token expansion_ technique. This means that we generate a set of similar tokens for each token in the query. There are a few problems with this approach: - It is computationally and memory expensive. We need to generate more values for each token in the document, which increases both the storage size and retrieval time. - It is not always clear where to stop with the token expansion. The more tokens we generate, the more likely we are to get the relevant one. But simultaneously, the more tokens we generate, the more likely we are to get irrelevant results. - Token expansion dilutes the interpretability of the search. We can’t say which tokens were used in the document and which were generated by the token expansion. - **Domain and Language Dependency**: SPLADE models are trained on specific corpora. This means that they are not always generalizable to new or rare domains. As they don’t use any statistics from the corpora, they cannot adapt to the new domain without fine-tuning. - **Inference Time**: Additionally, currently available SPLADE models are quite big and slow. They usually require a GPU to make the inference in a reasonable time. At Qdrant, we acknowledge the aforementioned problems and are looking for a solution. Our idea was to combine the best of both worlds - the simplicity and interpretability of BM25 and the intelligence of transformers while avoiding the pitfalls of SPLADE. And here is what we came up with. ## [Anchor](https://qdrant.tech/articles/bm42/\#the-best-of-both-worlds) The best of both worlds As previously mentioned, `IDF` is the most important part of the BM25 formula. In fact it is so important, that we decided to build its calculation into the Qdrant engine itself. Check out our latest [release notes](https://github.com/qdrant/qdrant/releases/tag/v1.10.0). This type of separation allows streaming updates of the sparse embeddings while keeping the `IDF` calculation up-to-date. As for the second part of the formula, _the term importance within the document_ needs to be rethought. Since we can’t rely on the statistics within the document, we can try to use the semantics of the document instead. And semantics is what transformers are good at. 
Therefore, we only need to solve two problems:

- How does one extract the importance information from the transformer?
- How can tokenization issues be avoided?

### [Anchor](https://qdrant.tech/articles/bm42/\#attention-is-all-you-need) Attention is all you need

Transformer models, even those used to generate embeddings, produce a bunch of different outputs. Some of those outputs are used to generate embeddings. Others are used to solve other kinds of tasks, such as classification, text generation, etc.

The one particularly interesting output for us is the attention matrix.

![Attention matrix](https://qdrant.tech/articles_data/bm42/attention-matrix.png)

Attention matrix

The attention matrix is a square matrix, where each row and column corresponds to a token in the input sequence. It represents the importance of each token in the input sequence for every other token.

The classical transformer models are trained to predict masked tokens in the context, so the attention weights define which context tokens influence the masked token most.

Apart from regular text tokens, the transformer model also has a special token called `[CLS]`. This token represents the whole sequence in classification tasks, which is exactly what we need. By looking at the attention row for the `[CLS]` token, we can get the importance of each token in the document for the whole document.

```python
sentences = "Hello, World - is the starting point in most programming languages"

features = transformer.tokenize(sentences)

# ... (model loading omitted: `transformer` is a loaded sentence-transformers model,
#      `tokens` are the decoded tokens of the input sequence)

attentions = transformer.auto_model(**features, output_attentions=True).attentions

weights = torch.mean(attentions[-1][0, :, 0], axis=0)
#             ▲           ▲       ▲      ▲
#             │           │       │      └─── [CLS] token is the first one
#             │           │       └────────── First item of the batch
#             │           └────────────────── Last transformer layer
#             └────────────────────────────── Average all 6 attention heads

for weight, token in zip(weights, tokens):
    print(f"{token}: {weight}")

# [CLS]       : 0.434  // Filter out the [CLS] token
# world       : 0.107  // <-- The most important token
# programming : 0.060  // <-- The third most important token
# languages   : 0.062  // <-- The second most important token
# [SEP]       : 0.047  // Filter out the [SEP] token
```

The resulting formula for the BM42 score would look like this:

$$
\text{score}(D,Q) = \sum_{i=1}^{N} \text{IDF}(q_i) \times \text{Attention}(\text{CLS}, q_i)
$$

Note that classical transformers have multiple attention heads, so we can get multiple importance vectors for the same document. The simplest way to combine them is to simply average them.

These averaged attention vectors make up the importance information we were looking for. The best part is, one can get them from any transformer model, without any additional training. Therefore, BM42 can support any natural language as long as there is a transformer model for it.

In our implementation, we use the `sentence-transformers/all-MiniLM-L6-v2` model, which gives a huge boost in inference speed compared to the SPLADE models. In practice, any transformer model can be used. It doesn’t require any additional training, and can be easily adapted to work as a BM42 backend.

### [Anchor](https://qdrant.tech/articles/bm42/\#wordpiece-retokenization) WordPiece retokenization

The final piece of the puzzle we need to solve is the tokenization issue. In order to get attention vectors, we need to use native transformer tokenization. But this tokenization is not suitable for retrieval tasks. What can we do about it?

Actually, the solution we came up with is quite simple.
We reverse the tokenization process after we get the attention vectors.

Transformers use [WordPiece](https://huggingface.co/learn/nlp-course/en/chapter6/6) tokenization. In case it sees a word which is not in the vocabulary, it splits it into subwords. Here is how that looks:

```text
"unbelievable" -> ["un", "##believ", "##able"]
```

We can then merge the subwords back into words. Luckily, the subwords are marked with the `##` prefix, so we can easily detect them. Since the attention weights are normalized, we can simply sum the attention weights of the subwords to get the attention weight of the whole word.

After that, we can apply the same traditional NLP techniques, such as:

- Stop-word removal
- Punctuation removal
- Lemmatization

In this way, we can significantly reduce the number of tokens, and therefore minimize the memory footprint of the sparse embeddings, without compromising the ability to match (almost) exact tokens.

## [Anchor](https://qdrant.tech/articles/bm42/\#practical-examples) Practical examples

| Trait | BM25 | SPLADE | BM42 |
| --- | --- | --- | --- |
| Interpretability | High ✅ | Ok 🆗 | High ✅ |
| Document Inference speed | Very high ✅ | Slow 🐌 | High ✅ |
| Query Inference speed | Very high ✅ | Slow 🐌 | Very high ✅ |
| Memory footprint | Low ✅ | High ❌ | Low ✅ |
| In-domain accuracy | Ok 🆗 | High ✅ | High ✅ |
| Out-of-domain accuracy | Ok 🆗 | Low ❌ | Ok 🆗 |
| Small documents accuracy | Low ❌ | High ✅ | High ✅ |
| Large documents accuracy | High ✅ | Low ❌ | Ok 🆗 |
| Unknown tokens handling | Yes ✅ | Bad ❌ | Yes ✅ |
| Multi-lingual support | Yes ✅ | No ❌ | Yes ✅ |
| Best Match | Yes ✅ | No ❌ | Yes ✅ |

Starting from Qdrant v1.10.0, BM42 can be used in Qdrant via FastEmbed inference.

Let’s see how you can set up a collection for hybrid search with BM42 and [jina.ai](https://jina.ai/embeddings/) dense embeddings.

```http
PUT collections/my-hybrid-collection
{
  "vectors": {
    "jina": {
      "size": 768,
      "distance": "Cosine"
    }
  },
  "sparse_vectors": {
    "bm42": {
      "modifier": "idf" // <--- This parameter enables the IDF calculation
    }
  }
}
```

```python
from qdrant_client import QdrantClient, models

client = QdrantClient()

client.create_collection(
    collection_name="my-hybrid-collection",
    vectors_config={
        "jina": models.VectorParams(
            size=768,
            distance=models.Distance.COSINE,
        )
    },
    sparse_vectors_config={
        "bm42": models.SparseVectorParams(
            modifier=models.Modifier.IDF,  # <--- This parameter enables the IDF calculation
        )
    }
)
```

The search query will retrieve the documents with both dense and sparse embeddings and combine the scores using the Reciprocal Rank Fusion (RRF) algorithm.

```python
from fastembed import SparseTextEmbedding, TextEmbedding

query_text = "best programming language for beginners?"

model_bm42 = SparseTextEmbedding(model_name="Qdrant/bm42-all-minilm-l6-v2-attentions")
model_jina = TextEmbedding(model_name="jinaai/jina-embeddings-v2-base-en")

sparse_embedding = list(model_bm42.query_embed(query_text))[0]
dense_embedding = list(model_jina.query_embed(query_text))[0]

client.query_points(
    collection_name="my-hybrid-collection",
    prefetch=[
        models.Prefetch(query=sparse_embedding.as_object(), using="bm42", limit=10),
        models.Prefetch(query=dense_embedding.tolist(), using="jina", limit=10),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # <--- Combine the scores
    limit=10,
)
```

### [Anchor](https://qdrant.tech/articles/bm42/\#benchmarks) Benchmarks

To prove the point further, we have conducted some benchmarks to highlight the cases where BM42 outperforms BM25.
Please note, that we didn’t intend to make an exhaustive evaluation, as we are presenting a new approach, not a new model. For out experiments we choose [quora](https://huggingface.co/datasets/BeIR/quora) dataset, which represents a question-deduplication task ~~the Question-Answering task~~. The typical example of the dataset is the following: ```text {"_id": "109", "text": "How GST affects the CAs and tax officers?"} {"_id": "110", "text": "Why can't I do my homework?"} {"_id": "111", "text": "How difficult is it get into RSI?"} ``` As you can see, it has pretty short texts, there are not much of the statistics to rely on. After encoding with BM42, the average vector size is only **5.6 elements per document**. With `datatype: uint8` available in Qdrant, the total size of the sparse vector index is about **13MB** for ~530k documents. As a reference point, we use: - BM25 with tantivy - the [sparse vector BM25 implementation](https://github.com/qdrant/bm42_eval/blob/master/index_bm25_qdrant.py) with the same preprocessing pipeline like for BM42: tokenization, stop-words removal, and lemmatization | | BM25 (tantivy) | BM25 (Sparse) | BM42 | | --- | --- | --- | --- | | ~~Precision @ 10~~ \* | ~~0.45~~ | ~~0.45~~ | ~~0.49~~ | | Recall @ 10 | ~~0.71~~ **0.89** | 0.83 | 0.85 | \\* \- values were corrected after the publication due to a mistake in the evaluation script. To make our benchmarks transparent, we have published scripts we used for the evaluation: see [github repo](https://github.com/qdrant/bm42_eval). Please note, that both BM25 and BM42 won’t work well on their own in a production environment. Best results are achieved with a combination of sparse and dense embeddings in a hybrid approach. In this scenario, the two models are complementary to each other. The sparse model is responsible for exact token matching, while the dense model is responsible for semantic matching. Some more advanced models might outperform default `sentence-transformers/all-MiniLM-L6-v2` model we were using. We encourage developers involved in training embedding models to include a way to extract attention weights and contribute to the BM42 backend. ## [Anchor](https://qdrant.tech/articles/bm42/\#fostering-curiosity-and-experimentation) Fostering curiosity and experimentation Despite all of its advantages, BM42 is not always a silver bullet. For large documents without chunks, BM25 might still be a better choice. There might be a smarter way to extract the importance information from the transformer. There could be a better method to weigh IDF against attention scores. Qdrant does not specialize in model training. Our core project is the search engine itself. However, we understand that we are not operating in a vacuum. By introducing BM42, we are stepping up to empower our community with novel tools for experimentation. We truly believe that the sparse vectors method is at exact level of abstraction to yield both powerful and flexible results. Many of you are sharing your recent Qdrant projects in our [Discord channel](https://discord.com/invite/qdrant). Feel free to try out BM42 and let us know what you come up with. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 
😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/bm42.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/bm42.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-140-lllmstxt|> ## memory-consumption - [Articles](https://qdrant.tech/articles/) - Minimal RAM you need to serve a million vectors [Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/) --- # Minimal RAM you need to serve a million vectors Andrei Vasnetsov · December 07, 2022 ![Minimal RAM you need to serve a million vectors](https://qdrant.tech/articles_data/memory-consumption/preview/title.jpg) When it comes to measuring the memory consumption of our processes, we often rely on tools such as `htop` to give us an indication of how much RAM is being used. However, this method can be misleading and doesn’t always accurately reflect the true memory usage of a process. There are many different ways in which `htop` may not be a reliable indicator of memory usage. For instance, a process may allocate memory in advance but not use it, or it may not free deallocated memory, leading to overstated memory consumption. A process may be forked, which means that it will have a separate memory space, but it will share the same code and data with the parent process. This means that the memory consumption of the child process will be counted twice. Additionally, a process may utilize disk cache, which is also accounted as resident memory in the `htop` measurements. As a result, even if `htop` shows that a process is using 10GB of memory, it doesn’t necessarily mean that the process actually requires 10GB of RAM to operate efficiently. In this article, we will explore how to properly measure RAM usage and optimize [Qdrant](https://qdrant.tech/) for optimal memory consumption. ## [Anchor](https://qdrant.tech/articles/memory-consumption/\#how-to-measure-actual-ram-requirements) How to measure actual RAM requirements We need to know memory consumption in order to estimate how much RAM is required to run the program. So in order to determine that, we can conduct a simple experiment. Let’s limit the allowed memory of the process and observe at which point it stops functioning. In this way we can determine the minimum amount of RAM the program needs to operate. One way to do this is by conducting a grid search, but a more efficient method is to use binary search to quickly find the minimum required amount of RAM. We can use docker to limit the memory usage of the process. Before running each benchmark, it is important to clear the page cache with the following command: ```bash sudo bash -c 'sync; echo 1 > /proc/sys/vm/drop_caches' ``` This ensures that the process doesn’t utilize any data from previous runs, providing more accurate and consistent results. We can use the following command to run Qdrant with a memory limit of 1GB: ```bash docker run -it --rm \ --memory 1024mb \ --network=host \ -v "$(pwd)/data/storage:/qdrant/storage" \ qdrant/qdrant:latest ``` ## [Anchor](https://qdrant.tech/articles/memory-consumption/\#lets-run-some-benchmarks) Let’s run some benchmarks Let’s run some benchmarks to see how much RAM Qdrant needs to serve 1 million vectors. 
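Before looking at the measurements, here is a rough Python sketch of the binary-search procedure described above. The `survives_benchmark` function is a placeholder you would implement yourself, for example by starting the container with the given `--memory` limit, clearing the page cache, and running the query workload; it is not part of Qdrant or the benchmark tooling.

```python
def survives_benchmark(limit_mb: int) -> bool:
    """Placeholder: start Qdrant with the given memory limit, clear the page
    cache, run the query workload, and return False if it is OOM-killed."""
    raise NotImplementedError

def minimal_ram_mb(low: int = 128, high: int = 4096) -> int:
    # Binary search for the smallest memory limit at which the benchmark still passes.
    while low < high:
        mid = (low + high) // 2
        if survives_benchmark(mid):
            high = mid       # it worked, try a tighter limit
        else:
            low = mid + 1    # out of memory, relax the limit
    return low
```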
We can use the `glove-100-angular` dataset and scripts from the [vector-db-benchmark](https://github.com/qdrant/vector-db-benchmark) project to upload and query the vectors. With the first run we will use the default configuration of Qdrant, with all data stored in RAM.

```bash
# Upload vectors
python run.py --engines qdrant-all-in-ram --datasets glove-100-angular
```

After uploading the vectors, we will repeat the same experiment with different RAM limits to see how they affect the memory consumption and search speed.

```bash
# Search vectors
python run.py --engines qdrant-all-in-ram --datasets glove-100-angular --skip-upload
```

### [Anchor](https://qdrant.tech/articles/memory-consumption/\#all-in-memory) All in Memory

In the first experiment, we tested how well our system performs when all vectors are stored in memory. We tried using different amounts of memory, ranging from 1512mb down to 1024mb, and measured the number of requests per second (rps) that our system was able to handle.

| Memory | Requests/s |
| --- | --- |
| 1512mb | 774.38 |
| 1256mb | 760.63 |
| 1200mb | 794.72 |
| 1152mb | out of memory |
| 1024mb | out of memory |

We found that a 1152mb memory limit resulted in our system running out of memory, while 1512mb, 1256mb, and 1200mb of memory allowed our system to handle around 780 rps. This suggests that about 1.2GB of memory is needed to serve around 1 million vectors, and there is no speed degradation when limiting memory usage above 1.2GB.

### [Anchor](https://qdrant.tech/articles/memory-consumption/\#vectors-stored-using-mmap) Vectors stored using MMAP

Let’s go a bit further! In the second experiment, we tested how well our system performs when **vectors are stored using memory-mapped files** (mmap).

Create the collection with:

```http
PUT /collections/benchmark
{
  "vectors": {
    ...
    "on_disk": true
  }
}
```

This configuration tells Qdrant to use mmap for vectors if the segment size is greater than 20000Kb (which is approximately 40K 128d-vectors). Now the out-of-memory error only happens when we restrict the process to **600mb** of RAM.

Experiment details:

| Memory | Requests/s |
| --- | --- |
| 1200mb | 759.94 |
| 1100mb | 687.00 |
| 1000mb | 10 |

With a somewhat faster disk:

| Memory | Requests/s |
| --- | --- |
| 1000mb | 25 rps |
| 750mb | 5 rps |
| 625mb | 2.5 rps |
| 600mb | out of memory |

At this point we have to switch from network-mounted storage to a faster disk, as the network-based storage is too slow to handle the amount of sequential reads that our system needs to serve the queries. But let’s first see how much RAM we need to serve 1 million vectors, and then we will discuss the speed optimization as well.

### [Anchor](https://qdrant.tech/articles/memory-consumption/\#vectors-and-hnsw-graph-stored-using-mmap) Vectors and HNSW graph stored using MMAP

In the third experiment, we tested how well our system performs when the vectors and the [HNSW](https://qdrant.tech/articles/filtrable-hnsw/) graph are stored using memory-mapped files.

Create the collection with:

```http
PUT /collections/benchmark
{
  "vectors": {
    ...
    "on_disk": true
  },
  "hnsw_config": {
    "on_disk": true
  },
  ...
}
```

With this configuration we are able to serve 1 million vectors with **only 135mb of RAM**!

Experiment details:

| Memory | Requests/s |
| --- | --- |
| 600mb | 5 rps |
| 300mb | 0.9 rps / 1.1 sec per query |
| 150mb | 0.4 rps / 2.5 sec per query |
| 135mb | 0.33 rps / 3 sec per query |
| 125mb | out of memory |

At this point the importance of the disk speed becomes critical.
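For reference, the combined on-disk configuration used in this third experiment can also be expressed with the Python client. This is a sketch with assumed names, not the exact benchmark setup:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="benchmark",  # placeholder name
    vectors_config=models.VectorParams(
        size=100,                        # glove-100-angular vectors have 100 dimensions
        distance=models.Distance.COSINE,
        on_disk=True,                    # store the original vectors using mmap
    ),
    hnsw_config=models.HnswConfigDiff(on_disk=True),  # store the HNSW graph on disk as well
)
```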
We can serve the search requests with 135mb of RAM, but the speed of the requests makes it impossible to use the system in production. Let’s see how we can improve the speed.

## [Anchor](https://qdrant.tech/articles/memory-consumption/\#how-to-speed-up-the-search) How to speed up the search

To measure the impact of disk parameters on search speed, we used the `fio` tool to test the speed of different types of disks.

```bash
# Run fio to check the random reads speed
fio --randrepeat=1 \
    --ioengine=libaio \
    --direct=1 \
    --gtod_reduce=1 \
    --name=fiotest \
    --filename=testfio \
    --bs=4k \
    --iodepth=64 \
    --size=8G \
    --readwrite=randread
```

Initially, we tested on a network-mounted disk, but its performance was too slow, with a read IOPS of 6366 and a bandwidth of 24.9 MiB/s:

```text
read: IOPS=6366, BW=24.9MiB/s (26.1MB/s)(8192MiB/329424msec)
```

To improve performance, we switched to a local disk, which showed much faster results, with a read IOPS of 63.2k and a bandwidth of 247 MiB/s:

```text
read: IOPS=63.2k, BW=247MiB/s (259MB/s)(8192MiB/33207msec)
```

That gave us a significant speed boost, but we wanted to see if we could improve performance even further. To do that, we switched to a machine with a local SSD, which showed even better results, with a read IOPS of 183k and a bandwidth of 716 MiB/s:

```text
read: IOPS=183k, BW=716MiB/s (751MB/s)(8192MiB/11438msec)
```

Let’s see how these results translate into search speed:

| Memory | RPS with IOPS=63.2k | RPS with IOPS=183k |
| --- | --- | --- |
| 600mb | 5 | 50 |
| 300mb | 0.9 | 13 |
| 200mb | 0.5 | 8 |
| 150mb | 0.4 | 7 |

As you can see, the speed of the disk has a significant impact on the search speed. With a local SSD, we were able to increase the search speed by 10x! With a production-grade disk, the search speed could be even higher. Some configurations of SSDs can reach 1M IOPS and more, which might be an interesting option to serve large datasets with low search latency in Qdrant.

## [Anchor](https://qdrant.tech/articles/memory-consumption/\#conclusion) Conclusion

In this article, we showed that Qdrant has flexibility in terms of RAM usage and can be used to serve large datasets. It provides configurable trade-offs between RAM usage and search speed. If you’re interested to learn more about Qdrant, [book a demo today](https://qdrant.tech/contact-us/)!

We are eager to learn more about how you use Qdrant in your projects, what challenges you face, and how we can help you solve them. Please feel free to join our [Discord](https://qdrant.to/discord) and share your experience with us!
<|page-141-lllmstxt|>

## distance-based-exploration

- [Articles](https://qdrant.tech/articles/) - Distance-based data exploration

[Back to Data Exploration](https://qdrant.tech/articles/data-exploration/)

---

# Distance-based data exploration

Andrey Vasnetsov · March 11, 2025

![Distance-based data exploration](https://qdrant.tech/articles_data/distance-based-exploration/preview/title.jpg)

## [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#hidden-structure) Hidden Structure

When working with large collections of documents, images, or other arrays of unstructured data, it often becomes useful to understand the big picture. Examining data points individually is not always the best way to grasp the structure of the data.

![Data visualization](https://qdrant.tech/articles_data/distance-based-exploration/no-context-data.png)

Data points without context, pretty much useless

Just as numbers in a table obtain meaning when plotted on a graph, visualizing distances (similar/dissimilar) between unstructured data items can reveal hidden structures and patterns.

![Data visualization](https://qdrant.tech/articles_data/distance-based-exploration/data-on-chart.png)

Visualized chart, very intuitive

There are many tools to investigate data similarity, and Qdrant’s [1.12 release](https://qdrant.tech/blog/qdrant-1.12.x/) made it much easier to start this investigation. With the new [Distance Matrix API](https://qdrant.tech/documentation/concepts/explore/#distance-matrix), Qdrant handles the most computationally expensive part of the process: calculating the distances between data points. In many implementations, the distance matrix calculation was part of the clustering or visualization process, requiring either brute-force computation or building a temporary index. With Qdrant, however, the data is already indexed, and the distance matrix can be computed relatively cheaply.

In this article, we will explore several methods for data exploration using the Distance Matrix API.

## [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#dimensionality-reduction) Dimensionality Reduction

Initially, we might want to visualize an entire dataset, or at least a large portion of it, at a glance. However, high-dimensional data cannot be directly visualized. We must apply dimensionality reduction techniques to convert data into a lower-dimensional representation while preserving important data properties.

In this article, we will use [UMAP](https://github.com/lmcinnes/umap) as our dimensionality reduction algorithm. Here is a **very** simplified but intuitive explanation of UMAP:

1. _Randomly generate points in 2D space_: Assign a random 2D point to each high-dimensional point.
2. _Compute distance matrix for high-dimensional points_: Calculate distances between all pairs of points.
3. _Compute distance matrix for 2D points_: Perform similarly to step 2.
4. _Match both distance matrices_: Adjust 2D points to minimize differences.

![UMAP](https://qdrant.tech/articles_data/distance-based-exploration/umap.png)

Canonical example of UMAP results, [source](https://github.com/lmcinnes/umap?tab=readme-ov-file#performance-and-examples)

UMAP preserves the relative distances between high-dimensional points; the actual coordinates are not essential.
If we already have the distance matrix, step 2 can be skipped entirely. Let’s use Qdrant to calculate the distance matrix and apply UMAP. We will use one of the default datasets, perfect for experimenting in Qdrant: the [Midjourney Styles dataset](https://midlibrary.io/). Use this command to download and import the dataset into Qdrant:

```http
PUT /collections/midlib/snapshots/recover
{
  "location": "http://snapshots.qdrant.io/midlib.snapshot"
}
```

We also need to prepare our Python environment:

```bash
pip install umap-learn seaborn matplotlib qdrant-client
```

Import the necessary libraries:

```python
# Used to talk to Qdrant
from qdrant_client import QdrantClient

# Package with the original UMAP implementation
from umap import UMAP

# Python implementation for sparse matrices
from scipy.sparse import csr_matrix

# For visualization
import seaborn as sns
```

Establish the connection to Qdrant:

```python
client = QdrantClient("http://localhost:6333")
```

After this is done, we can compute the distance matrix:

```python
# The `_offsets` suffix defines the format of the output matrix.
result = client.search_matrix_offsets(
    collection_name="midlib",
    sample=1000,  # Select a subset of the data, as the whole dataset might be too large
    limit=20,  # For performance reasons, limit the number of closest neighbors to consider
)

# Convert the distance matrix to a Python-native sparse format
matrix = csr_matrix(
    (result.scores, (result.offsets_row, result.offsets_col))
)

# The distance matrix is always symmetric, but Qdrant only computes half of it.
matrix = matrix + matrix.T
```

Now we can apply UMAP to the distance matrix:

```python
umap = UMAP(
    metric="precomputed",  # We provide a ready-made distance matrix
    n_components=2,  # Output dimension
    n_neighbors=20,  # Same as the limit in search_matrix_offsets
)

vectors_2d = umap.fit_transform(matrix)
```

That’s all that is needed to get the 2D representation of the data.

![UMAP on Midlib](https://qdrant.tech/articles_data/distance-based-exploration/umap-midlib.png)

UMAP applied to the Midlib dataset

UMAP isn’t the only algorithm compatible with our distance matrix API. For example, `scikit-learn` also offers:

- [Isomap](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html) \- non-linear dimensionality reduction through Isometric Mapping.
- [SpectralEmbedding](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.SpectralEmbedding.html) \- forms an affinity matrix given by the specified function and applies spectral decomposition to the corresponding graph Laplacian.
- [TSNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) \- a well-known algorithm for dimensionality reduction.

## [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#clustering) Clustering

Another approach to understanding the data structure is clustering: grouping similar items together. _Note that there’s no universally best clustering criterion or algorithm._

![Clustering](https://qdrant.tech/articles_data/distance-based-exploration/clustering.png)

Clustering example, [source](https://scikit-learn.org/)

Many clustering algorithms accept a precomputed distance matrix as input, so we can use the same distance matrix we calculated before. Let’s consider a simple example of clustering the Midlib dataset with the **KMeans algorithm**.
From [scikit-learn.cluster documentation](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) we know that `fit()` method of KMeans algorithm prefers as an input: > `X : {array-like, sparse matrix} of shape (n_samples, n_features)`: > > Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it’s not in CSR format. So we can re-use `matrix` from the previous example: ```python from sklearn.cluster import KMeans --- # Initialize KMeans with 10 clusters kmeans = KMeans(n_clusters=10) --- # Generate index of the cluster each sample belongs to cluster_labels = kmeans.fit_predict(matrix) ``` With this simple code, we have clustered the data into 10 clusters, while the main CPU-intensive part of the process was done by Qdrant. ![Clustering on Midlib](https://qdrant.tech/articles_data/distance-based-exploration/clustering-midlib.png) Clustering applied to Midlib dataset How to plot this chart ```python sns.scatterplot( # Coordinates obtained from UMAP x=vectors_2d[:, 0], y=vectors_2d[:, 1], # Color datapoints by cluster hue=cluster_labels, palette=sns.color_palette("pastel", 10), legend="full", ) ``` ## [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#graphs) Graphs Clustering and dimensionality reduction both aim to provide a more transparent overview of the data. However, they share a common characteristic - they require a training step before the results can be visualized. This also implies that introducing new data points necessitates re-running the training step, which may be computationally expensive. Graphs offer an alternative approach to data exploration, enabling direct, interactive visualization of relationships between data points. In a graph representation, each data point is a node, and similarities between data points are represented as edges connecting the nodes. Such a graph can be rendered in real-time using [force-directed layout](https://en.wikipedia.org/wiki/Force-directed_graph_drawing) algorithms, which aim to minimize the system’s energy by repositioning nodes dynamically–the more similar the data points are, the stronger the edges between them. Adding new data points to the graph is as straightforward as inserting new nodes and edges without the need to re-run any training steps. In practice, rendering a graph for an entire dataset at once may be computationally expensive and overwhelming for the user. Therefore, let’s explore a few strategies to address this issue. ### [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#expanding-from-a-single-node) Expanding from a single node This is the simplest approach, where we start with a single node and expand the graph by adding the most similar nodes to the graph. ![Graph](https://qdrant.tech/articles_data/distance-based-exploration/graph.gif) Graph representation of the data ### [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#sampling-from-a-collection) Sampling from a collection Expanding a single node works well if you want to explore neighbors of a single point, but what if you want to explore the whole dataset? If your dataset is small enough, you can render relations for all the data points at once. But it is a rare case in practice. Instead, we can sample a subset of the data and render the graph for this subset. 
This way, we can get a good overview of the data without overwhelming the user with too much information. Let’s try to do so in [Qdrant’s Graph Exploration Tool](https://qdrant.tech/blog/qdrant-1.11.x/#web-ui-graph-exploration-tool): ```json { "limit": 5, # node neighbors to consider "sample": 100 # nodes } ``` ![Graph](https://qdrant.tech/articles_data/distance-based-exploration/graph-sampled.png) Graph representation of the data ( [Qdrant’s Graph Exploration Tool](https://qdrant.tech/blog/qdrant-1.11.x/#web-ui-graph-exploration-tool)) This graph captures some high-level structure of the data, but as you might have noticed, it is quite noisy. This is because the differences in similarities are relatively small, and they might be overwhelmed by the stretches and compressions of the force-directed layout algorithm. To make the graph more readable, let’s concentrate on the most important similarities and build a so called [Minimum/Maximum Spanning Tree](https://en.wikipedia.org/wiki/Minimum_spanning_tree). ```json { "limit": 5, "sample": 100, "tree": true } ``` ![Graph](https://qdrant.tech/articles_data/distance-based-exploration/spanning-tree.png) Spanning tree of the graph ( [Qdrant’s Graph Exploration Tool](https://qdrant.tech/blog/qdrant-1.11.x/#web-ui-graph-exploration-tool)) This algorithm will only keep the most important edges and remove the rest while keeping the graph connected. By doing so, we can reveal clusters of the data and the most important relations between them. In some sense, this is similar to hierarchical clustering, but with the ability to interactively explore the data. Another analogy might be a dynamically constructed mind map. ## [Anchor](https://qdrant.tech/articles/distance-based-exploration/\#conclusion) Conclusion Vector similarity goes beyond looking up the nearest neighbors–it provides a powerful tool for data exploration. Many algorithms can construct human-readable data representations, and Qdrant makes using them easy. Several data exploration instruments are available in the Qdrant Web UI ( [Visualization and Graph Exploration Tools](https://qdrant.tech/articles/web-ui-gsoc/)), and for more advanced use cases, you could directly utilise our distance matrix API. Try it with your data and see what hidden structures you can reveal! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/distance-based-exploration.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/distance-based-exploration.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-142-lllmstxt|> ## role-management - [Documentation](https://qdrant.tech/documentation/) - [Cloud rbac](https://qdrant.tech/documentation/cloud-rbac/) - Role Management --- # [Anchor](https://qdrant.tech/documentation/cloud-rbac/role-management/\#role-management) Role Management > 💡 You can access this in **Access Management > User & Role Management** _if available see [this page for details](https://qdrant.tech/documentation/cloud-rbac/)._ A **Role** contains a set of **permissions** that define the ability to perform or control specific actions in Qdrant Cloud. Permissions are accessible through the Permissions tab in the Role Details page and offer fine-grained access control, logically grouped for easy identification. ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/role-management/\#built-in-roles) Built-In Roles Qdrant Cloud includes some built-in roles for common use-cases. The permissions for these built-in roles cannot be changed. There are three types: - The **Base Role** is assigned to all users, and provides the minimum privileges required to access Qdrant Cloud. - The **Admin Role**  has all available permissions, except for account write permissions. - The **Owner Role** has all available permissions assigned, including account write permissions. There can only be one Owner per account currently. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/built-in-roles.png) ## [Anchor](https://qdrant.tech/documentation/cloud-rbac/role-management/\#custom-roles) Custom Roles An authorized user can create their own custom roles with specific sets of permissions, giving them more control over who has what access to which resource. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/custom-roles.png) ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/role-management/\#creating-a-custom-role) Creating a Custom Role To create a new custom role, click on the **Add** button at the top-right corner of the **Custom Roles** list. - **Role Name**: Must be unique across roles. - **Role Description**: Brief description of the role’s purpose. Once created, the new role will appear under the **Custom Roles** section in the navigation. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/create-custom-role.png) ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/role-management/\#editing-a-custom-role) Editing a Custom Role To update a specific role’s permissions, select it from the list and click on the **Permissions** tab. Here, you’ll find logically grouped options that are easy to identify and edit as needed. Once you’ve made your changes, save them to apply the updated permissions to the role. ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/update-permission.png) ### [Anchor](https://qdrant.tech/documentation/cloud-rbac/role-management/\#renaming-deleting-and-duplicating-a-custom-role) Renaming, Deleting and Duplicating a Custom Role Each custom role can be renamed, duplicated or deleted via the action buttons located to the right of the role title bar. - **Rename**: Opens a dialog allowing users to update both the role name and description. 
- **Delete**: Triggers a confirmation prompt to confirm the deletion. Once confirmed, this action is irreversible. Any users assigned to the deleted role will automatically be unassigned from it. - **Duplicate:** Opens a dialog asking for a confirmation and also allowing users to view the list of permissions that will be assigned to the duplicated role ![image.png](https://qdrant.tech/documentation/cloud/role-based-access-control/role-actions.png) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-rbac/role-management.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-rbac/role-management.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-143-lllmstxt|> ## databricks - [Documentation](https://qdrant.tech/documentation/) - [Send data](https://qdrant.tech/documentation/send-data/) - Qdrant on Databricks --- # [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#qdrant-on-databricks) Qdrant on Databricks | Time: 30 min | Level: Intermediate | [Complete Notebook](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4750876096379825/93425612168199/6949977306828869/latest.html) | | --- | --- | --- | [Databricks](https://www.databricks.com/) is a unified analytics platform for working with big data and AI. It’s built around Apache Spark, a powerful open-source distributed computing system well-suited for processing large-scale datasets and performing complex analytics tasks. Apache Spark is designed to scale horizontally, meaning it can handle expensive operations like generating vector embeddings by distributing computation across a cluster of machines. This scalability is crucial when dealing with large datasets. In this example, we will demonstrate how to vectorize a dataset with dense and sparse embeddings using Qdrant’s [FastEmbed](https://qdrant.github.io/fastembed/) library. We will then load this vectorized data into a Qdrant cluster using the [Qdrant Spark connector](https://qdrant.tech/documentation/frameworks/spark/) on Databricks. ### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#setting-up-a-databricks-project) Setting up a Databricks project - Set up a **[Databricks cluster](https://docs.databricks.com/en/compute/configure.html)** following the official documentation guidelines. - Install the **[Qdrant Spark connector](https://qdrant.tech/documentation/frameworks/spark/)** as a library: - Navigate to the `Libraries` section in your cluster dashboard. - Click on `Install New` at the top-right to open the library installation modal. - Search for `io.qdrant:spark:VERSION` in the Maven packages and click on `Install`. ![Install the library](https://qdrant.tech/documentation/examples/databricks/library-install.png) - Create a new **[Databricks notebook](https://docs.databricks.com/en/notebooks/index.html)** on your cluster to begin working with your data and libraries. 
### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#download-a-dataset) Download a dataset - **Install the required dependencies:** ```python %pip install fastembed datasets ``` - **Download the dataset:** ```python from datasets import load_dataset dataset_name = "tasksource/med" dataset = load_dataset(dataset_name, split="train") --- # We'll use the first 100 entries from this dataset and exclude some unused columns. dataset = dataset.select(range(100)).remove_columns(["gold_label", "genre"]) ``` - **Convert the dataset into a Spark dataframe:** ```python dataset.to_parquet("/dbfs/pq.pq") dataset_df = spark.read.parquet("file:/dbfs/pq.pq") ``` ### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#vectorizing-the-data) Vectorizing the data In this section, we’ll be generating both dense and sparse vectors for our rows using [FastEmbed](https://qdrant.github.io/fastembed/). We’ll create a user-defined function (UDF) to handle this step. #### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#creating-the-vectorization-function) Creating the vectorization function ```python from fastembed import TextEmbedding, SparseTextEmbedding def vectorize(partition_data): # Initialize dense and sparse models dense_model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5") sparse_model = SparseTextEmbedding(model_name="Qdrant/bm25") for row in partition_data: # Generate dense and sparse vectors dense_vector = next(dense_model.embed(row.sentence1)) sparse_vector = next(sparse_model.embed(row.sentence2)) yield [\ row.sentence1, # 1st column: original text\ row.sentence2, # 2nd column: original text\ dense_vector.tolist(), # 3rd column: dense vector\ sparse_vector.indices.tolist(), # 4th column: sparse vector indices\ sparse_vector.values.tolist(), # 5th column: sparse vector values\ ] ``` We’re using the [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) model for dense embeddings and [BM25](https://huggingface.co/Qdrant/bm25) for sparse embeddings. #### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#applying-the-udf-on-our-dataframe) Applying the UDF on our dataframe Next, let’s apply our `vectorize` UDF on our Spark dataframe to generate embeddings. ```python embeddings = dataset_df.rdd.mapPartitions(vectorize) ``` The `mapPartitions()` method returns a [Resilient Distributed Dataset (RDD)](https://www.databricks.com/glossary/what-is-rdd) which should then be converted back to a Spark dataframe. #### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#building-the-new-spark-dataframe-with-the-vectorized-data) Building the new Spark dataframe with the vectorized data We’ll now create a new Spark dataframe ( `embeddings_df`) with the vectorized data using the specified schema. 
```python
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, FloatType, IntegerType

# Define the schema for the new dataframe
schema = StructType([
    StructField("sentence1", StringType()),
    StructField("sentence2", StringType()),
    StructField("dense_vector", ArrayType(FloatType())),
    StructField("sparse_vector_indices", ArrayType(IntegerType())),
    StructField("sparse_vector_values", ArrayType(FloatType()))
])

# Create the new dataframe with the vectorized data
embeddings_df = spark.createDataFrame(data=embeddings, schema=schema)
```

### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#uploading-the-data-to-qdrant) Uploading the data to Qdrant

- **Create a Qdrant collection:**
  - [Follow the documentation](https://qdrant.tech/documentation/concepts/collections/#create-a-collection) to create a collection with the appropriate configurations. Here’s an example request to support both dense and sparse vectors:

```json
PUT /collections/{collection_name}
{
  "vectors": {
    "dense": {
      "size": 384,
      "distance": "Cosine"
    }
  },
  "sparse_vectors": {
    "sparse": {}
  }
}
```

- **Upload the dataframe to Qdrant:**

```python
options = {
    "qdrant_url": "",
    "api_key": "",
    "collection_name": "",
    "vector_fields": "dense_vector",
    "vector_names": "dense",
    "sparse_vector_value_fields": "sparse_vector_values",
    "sparse_vector_index_fields": "sparse_vector_indices",
    "sparse_vector_names": "sparse",
    "schema": embeddings_df.schema.json(),
}

embeddings_df.write.format("io.qdrant.spark.Qdrant").options(**options).mode(
    "append"
).save()
```

Make sure to replace the empty placeholder values of the `qdrant_url`, `api_key`, and `collection_name` options with your actual values. If the `id_field` option is not specified, the Qdrant Spark connector generates random UUIDs for each point.

The command output you should see is similar to:

```console
Command took 40.37 seconds -- by xxxxx90@xxxxxx.com at 4/17/2024, 12:13:28 PM on fastembed
```

### [Anchor](https://qdrant.tech/documentation/send-data/databricks/\#conclusion) Conclusion

That wraps up our tutorial! Feel free to explore more functionality and experiment with different models, parameters, and features available in Databricks, Spark, and Qdrant. Happy data engineering!

<|page-144-lllmstxt|>

## agentic-rag

- [Articles](https://qdrant.tech/articles/) - What is Agentic RAG? Building Agents with Qdrant

[Back to RAG & GenAI](https://qdrant.tech/articles/rag-and-genai/)

---

# What is Agentic RAG? Building Agents with Qdrant

Kacper Łukawski · November 22, 2024

![What is Agentic RAG? Building Agents with Qdrant](https://qdrant.tech/articles_data/agentic-rag/preview/title.jpg)

Standard [Retrieval Augmented Generation](https://qdrant.tech/articles/what-is-rag-in-ai/) follows a predictable, linear path: receive a query, retrieve relevant documents, and generate a response.
In many cases that might be enough to solve a particular problem. In the worst case scenario, your LLM will just decide to not answer the question, because the context does not provide enough information. ![Standard, linear RAG pipeline](https://qdrant.tech/articles_data/agentic-rag/linear-rag.png) On the other hand, we have agents. These systems are given more freedom to act, and can take multiple non-linear steps to achieve a certain goal. There isn’t a single definition of what an agent is, but in general, it is an application that uses LLM and usually some tools to communicate with the outside world. LLMs are used as decision-makers which decide what action to take next. Actions can be anything, but they are usually well-defined and limited to a certain set of possibilities. One of these actions might be to query a vector database, like Qdrant, to retrieve relevant documents, if the context is not enough to make a decision. However, RAG is just a single tool in the agent’s arsenal. ![AI Agent](https://qdrant.tech/articles_data/agentic-rag/ai-agent.png) ## [Anchor](https://qdrant.tech/articles/agentic-rag/\#agentic-rag-combining-rag-with-agents) Agentic RAG: Combining RAG with Agents Since the agent definition is vague, the concept of **Agentic RAG** is also not well-defined. In general, it refers to the combination of RAG with agents. This allows the agent to use external knowledge sources to make decisions, and primarily to decide when the external knowledge is needed. We can describe a system as Agentic RAG if it breaks the linear flow of a standard RAG system, and gives the agent the ability to take multiple steps to achieve a goal. A simple router that chooses a path to follow is often described as the simplest form of an agent. Such a system has multiple paths with conditions describing when to take a certain path. In the context of Agentic RAG, the agent can decide to query a vector database if the context is not enough to answer, or skip the query if it’s enough, or when the question refers to common knowledge. Alternatively, there might be multiple collections storing different kinds of information, and the agent can decide which collection to query based on the context. The key factor is that the decision of choosing a path is made by the LLM, which is the core of the agent. A routing agent never comes back to the previous step, so it’s ultimately just a conditional decision-making system. ![Routing Agent](https://qdrant.tech/articles_data/agentic-rag/routing-agent.png) However, routing is just the beginning. Agents can be much more complex, and extreme forms of agents can have complete freedom to act. In such cases, the agent is given a set of tools and can autonomously decide which ones to use, how to use them, and in which order. LLMs are asked to plan and execute actions, and the agent can take multiple steps to achieve a goal, including taking steps back if needed. Such a system does not have to follow a DAG structure (Directed Acyclic Graph), and can have loops that help to self-correct the decisions made in the past. An agentic RAG system built in that manner can have tools not only to query a vector database, but also to play with the query, summarize the results, or even generate new data to answer the question. Options are endless, but there are some common patterns that can be observed in the wild. 
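The routing behaviour described above (query the vector database only when the existing context is not sufficient) can be sketched in a few lines of framework-agnostic Python. This is an illustration only, not a snippet from any particular framework: `call_llm` and `embed` are hypothetical helpers standing in for your LLM client and embedding model, and the collection name is an assumption.

```python
from qdrant_client import QdrantClient

client = QdrantClient("http://localhost:6333")

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its text reply."""
    ...

def embed(text: str) -> list[float]:
    """Hypothetical helper: turn text into a query vector with your embedding model."""
    ...

def answer(question: str) -> str:
    # Step 1: let the LLM act as the router and decide whether external knowledge is needed.
    decision = call_llm(
        "Answer with exactly one word, SEARCH or ANSWER. "
        f"Do you need external documents to answer: '{question}'?"
    )

    context = ""
    if decision.strip().upper() == "SEARCH":
        # Step 2: query Qdrant only when the router asked for it.
        hits = client.query_points(
            collection_name="knowledge-base",  # assumed collection name
            query=embed(question),
            limit=5,
        ).points
        context = "\n".join(str(hit.payload) for hit in hits)

    # Step 3: generate the final response, with or without retrieved context.
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```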
![Autonomous Agent](https://qdrant.tech/articles_data/agentic-rag/autonomous-agent.png)

### [Anchor](https://qdrant.tech/articles/agentic-rag/\#solving-information-retrieval-problems-with-llms) Solving Information Retrieval Problems with LLMs

Generally speaking, the tools exposed in an agentic RAG system are used to solve information retrieval problems that are not new to the search community. LLMs have changed how we approach these problems, but the core of the problem remains the same. What kinds of tools can you consider using in an agentic RAG system? Here are some examples:

- **Querying a vector database** \- the most common tool used in agentic RAG systems. It allows the agent to retrieve relevant documents based on the query.
- **Query expansion** \- a tool that can be used to improve the query. It can be used to add synonyms, correct typos, or even generate new queries based on the original one.
  ![Query expansion example](https://qdrant.tech/articles_data/agentic-rag/query-expansion.png)
- **Extracting filters** \- vector search alone is sometimes not enough. In many cases, you might want to narrow down the results based on specific parameters. This extraction process can automatically identify relevant conditions from the query. Otherwise, your users would have to manually define these search constraints. (A minimal sketch of this idea appears at the end of this section, just before the framework overview.)
  ![Extracting filters](https://qdrant.tech/articles_data/agentic-rag/extracting-filters.png)
- **Quality judgement** \- knowing the quality of the results for a given query can be used to decide whether they are good enough to answer, or whether the agent should take another step to improve them. Alternatively, it can also admit failure to provide a good response.
  ![Quality judgement](https://qdrant.tech/articles_data/agentic-rag/quality-judgement.png)

These are just some examples, and the list is not exhaustive. For example, your LLM could adjust Qdrant search parameters or choose different methods to query it. An example? If your users search with specific keywords, you may prefer sparse vectors over dense vectors, as they are more efficient in such cases. In that case, you have to arm your agent with tools to decide when to use sparse vectors and when to use dense ones. An agent aware of the collection structure can make such decisions easily.

Each of these tools might be a separate agent on its own, and multi-agent systems are not uncommon. In such cases, agents can communicate with each other, and one agent can decide to use another agent to solve a particular problem. A particularly useful component of an agentic RAG system is also a human in the loop, who can correct the agent’s decisions or steer it in the right direction.

## [Anchor](https://qdrant.tech/articles/agentic-rag/\#where-are-agents-used) Where are Agents Used?

Agents are an interesting concept, but since they heavily rely on LLMs, they are not applicable to all problems. Using Large Language Models is expensive, and they tend to be slow, so in many cases the cost is not worth it. Standard RAG involves just a single call to the LLM, and the response is generated in a predictable way. Agents, on the other hand, can take multiple steps, and the latency experienced by the user adds up. In many cases, that is not acceptable. Agentic RAG is probably not that widely applicable in ecommerce search, where the user expects a quick response, but might be fine for customer support, where the user is willing to wait a bit longer for a better answer.
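Returning to the tool list above, here is a minimal sketch of the filter-extraction idea: an LLM turns free-text constraints into a structured Qdrant payload filter. The prompt, the `call_llm` helper, and the payload field names (`category`, `price`) are assumptions for illustration, not part of the original article.

```python
import json

from qdrant_client import models

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its text reply."""
    ...

def extract_filter(query: str) -> models.Filter:
    # Ask the LLM to pull structured constraints out of the free-text query.
    raw = call_llm(
        "Extract filters from the query as JSON with optional keys "
        f"'category' (string) and 'max_price' (number). Query: '{query}'"
    )
    parsed = json.loads(raw)

    conditions = []
    if "category" in parsed:
        conditions.append(
            models.FieldCondition(key="category", match=models.MatchValue(value=parsed["category"]))
        )
    if "max_price" in parsed:
        conditions.append(
            models.FieldCondition(key="price", range=models.Range(lte=parsed["max_price"]))
        )
    return models.Filter(must=conditions)
```

The resulting filter object can then be passed as the `query_filter` argument of a Qdrant search call, so the vector search only considers points matching the extracted conditions.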
## [Anchor](https://qdrant.tech/articles/agentic-rag/\#which-framework-is-best) Which Framework is Best?

There are lots of frameworks available to build agents, and choosing the best one is not easy. It depends on your existing stack or the tools you are familiar with. Some of the most popular LLM libraries have already drifted towards the agent paradigm, and they offer tools to build them. There are, however, some tools built primarily for agent development, so let’s focus on them.

### [Anchor](https://qdrant.tech/articles/agentic-rag/\#langgraph) LangGraph

Developed by the LangChain team, LangGraph seems like a natural extension for those who already use LangChain for building their RAG systems and would like to start with agentic RAG. Surprisingly, LangGraph has nothing to do with Large Language Models on its own. It’s a framework for building graph-based applications in which each **node** is a step of the workflow. Each node takes an application **state** as an input and produces a modified state as an output. The state is then passed to the next node, and so on. **Edges** between the nodes might be conditional, which makes branching possible. Contrary to some DAG-based tools (e.g., Apache Airflow), LangGraph allows for loops in the graph, which makes it possible to implement cyclic workflows, so an agent can achieve self-reflection and self-correction. Theoretically, LangGraph can be used to build any kind of application in a graph-based manner, not only LLM agents.

Some of the strengths of LangGraph include:

- **Persistence** \- the state of the workflow graph is stored as a checkpoint. That happens at each so-called super-step (which is a single sequential node of a graph). It enables replaying certain steps of the workflow, fault tolerance, and human-in-the-loop interactions. This mechanism also acts as a **short-term memory**, accessible in the context of a particular workflow execution.
- **Long-term memory** \- LangGraph also has a concept of memories that are shared between different workflow runs. However, this mechanism has to be explicitly handled by our nodes. **Qdrant, with its semantic search capabilities, is often used as a long-term memory layer**.
- **Multi-agent support** \- while there is no separate concept of multi-agent systems in LangGraph, it’s possible to create such an architecture by building a graph that includes multiple agents and some kind of supervisor that decides which agent to use in a given situation. If a node might be anything, then it might be another agent as well.

Some other interesting features of LangGraph include the ability to visualize the graph, automate retries of failed steps, and include human-in-the-loop interactions.

A minimal example of an agentic RAG could improve the user query, e.g. by fixing typos, expanding it with synonyms, or even generating a new query based on the original one. The agent could then retrieve documents from a vector database based on the improved query, and generate a response. The LangGraph app implementing this approach could look like this:

```python
from typing import Sequence

from typing_extensions import TypedDict, Annotated
from langchain_core.messages import BaseMessage
from langgraph.constants import START, END
from langgraph.graph import add_messages, StateGraph

class AgentState(TypedDict):
    # The state of the agent includes at least the messages exchanged between the agent(s)
    # and the user. It is, however, possible to include other information in the state, as
    # it depends on the specific agent.
    messages: Annotated[Sequence[BaseMessage], add_messages]

def improve_query(state: AgentState):
    ...

def retrieve_documents(state: AgentState):
    ...

def generate_response(state: AgentState):
    ...

# Building a graph requires defining nodes and building the flow between them with edges.
builder = StateGraph(AgentState)
builder.add_node("improve_query", improve_query)
builder.add_node("retrieve_documents", retrieve_documents)
builder.add_node("generate_response", generate_response)

builder.add_edge(START, "improve_query")
builder.add_edge("improve_query", "retrieve_documents")
builder.add_edge("retrieve_documents", "generate_response")
builder.add_edge("generate_response", END)

# Compiling the graph performs some checks and prepares the graph for execution.
compiled_graph = builder.compile()

# The compiled graph might be invoked with the initial state to start.
compiled_graph.invoke({
    "messages": [
        ("user", "Why Qdrant is the best vector database out there?"),
    ]
})
```

Each node of the process is just a Python function that performs a certain operation. You can call an LLM of your choice inside of them, if you want to, but there is no assumption about the messages being created by any AI. **LangGraph rather acts as a runtime that launches these functions in a specific order and passes the state between them**.

While [LangGraph](https://www.langchain.com/langgraph) integrates well with the LangChain ecosystem, it can be used independently. For teams looking for additional support and features, there’s also a commercial offering called LangGraph Platform. The framework is available for both Python and JavaScript environments, making it usable in different tech stacks.

### [Anchor](https://qdrant.tech/articles/agentic-rag/\#crewai) CrewAI

CrewAI is another popular choice for building agents, including agentic RAG. It’s a high-level framework that assumes there are some LLM-based agents working together to achieve a common goal. That’s where the “crew” in CrewAI comes from. CrewAI is designed with multi-agent systems in mind. Contrary to LangGraph, the developer does not create a graph of processing, but defines agents and their roles within the crew.

Some of the key concepts of CrewAI include:

- **Agent** \- a unit that has a specific role and goal, controlled by an LLM. It can optionally use some external tools to communicate with the outside world, but it is generally steered by the prompt we provide to the LLM.
- **Process** \- currently either sequential or hierarchical. It defines how the task will be executed by the agents. In a sequential process, agents are executed one after another, while in a hierarchical process an agent is selected by a manager agent, which is responsible for deciding which agent to use in a given situation.
- **Roles and goals** \- each agent has a certain role within the crew, and the goal it should aim to achieve. These are set when we define an agent and are used to make decisions about which agent to use in a given situation.
- **Memory** \- an extensive memory system consisting of short-term memory, long-term memory, entity memory, and contextual memory that combines the other three. There is also user memory for preferences and personalization. **This is where Qdrant comes into play, as it might be used as a long-term memory layer.**

CrewAI provides a rich set of tools integrated into the framework.
That may be a huge advantage for those who want to combine RAG with, for example, code execution or image generation. The ecosystem is rich; however, bringing your own tools is not a big deal, as CrewAI is designed to be extensible.

A simple agentic RAG application implemented in CrewAI could look like this:

```python
from crewai import Crew, Agent, Task
from crewai.memory.entity.entity_memory import EntityMemory
from crewai.memory.short_term.short_term_memory import ShortTermMemory
from crewai.memory.storage.rag_storage import RAGStorage

class QdrantStorage(RAGStorage):
    ...

response_generator_agent = Agent(
    role="Generate response based on the conversation",
    goal="Provide the best response, or admit when the response is not available.",
    backstory=(
        "I am a response generator agent. I generate "
        "responses based on the conversation."
    ),
    verbose=True,
)

query_reformulation_agent = Agent(
    role="Reformulate the query",
    goal="Rewrite the query to get better results. Fix typos, grammar, word choice, etc.",
    backstory=(
        "I am a query reformulation agent. I reformulate the "
        "query to get better results."
    ),
    verbose=True,
)

task = Task(
    description="Let me know why Qdrant is the best vector database out there.",
    expected_output="3 bullet points",
    agent=response_generator_agent,
)

crew = Crew(
    agents=[response_generator_agent, query_reformulation_agent],
    tasks=[task],
    memory=True,
    entity_memory=EntityMemory(storage=QdrantStorage("entity")),
    short_term_memory=ShortTermMemory(storage=QdrantStorage("short-term")),
)
crew.kickoff()
```

_Disclaimer: QdrantStorage is not a part of the CrewAI framework, but it’s taken from the Qdrant documentation on [how to integrate Qdrant with CrewAI](https://qdrant.tech/documentation/frameworks/crewai/)._

Although it’s not a technical advantage, CrewAI has [great documentation](https://docs.crewai.com/introduction). The framework is available for Python, and it’s easy to get started with. CrewAI also has a commercial offering, CrewAI Enterprise, which provides a platform for building and deploying agents at scale.

### [Anchor](https://qdrant.tech/articles/agentic-rag/\#autogen) AutoGen

AutoGen emphasizes multi-agent architectures as a fundamental design principle. The framework requires at least two agents in any system to really call an application agentic - typically an assistant and a user proxy exchange messages to achieve a common goal. Sequential chat with more than two agents is also supported, as well as group chat and nested chat for internal dialogue. However, AutoGen does not assume there is a structured state passed between the agents; the chat conversation is the only way to communicate between them.

There are many interesting concepts in the framework, some of them quite unique:

- **Tools/functions** \- external components that can be used by agents to communicate with the outside world. They are defined as Python callables and can be used for any external interaction we want to allow the agent to do. Type annotations are used to define the input and output of the tools, and Pydantic models are supported for more complex type schemas. AutoGen supports only the OpenAI-compatible tool call API for the time being.
- **Code executors** \- built-in code executors include local command, Docker command, and Jupyter. An agent can write and launch code, so theoretically the agents can do anything that can be done in Python. None of the other frameworks made code generation and execution that prominent.
Code execution being a first-class citizen in AutoGen is an interesting concept. Each AutoGen agent uses at least one of the components: human-in-the-loop, code executor, tool executor, or LLM.

A simple agentic RAG system, based on a conversation between two agents that can retrieve documents from a vector database or improve the query, could look like this:

```python
from os import environ

from autogen import ConversableAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
from qdrant_client import QdrantClient

client = QdrantClient(...)

response_generator_agent = ConversableAgent(
    name="response_generator_agent",
    system_message=(
        "You answer user questions based solely on the provided context. You ask to retrieve relevant documents for "
        "your query, or reformulate the query, if it is incorrect in some way."
    ),
    description="A response generator agent that can answer your queries.",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",
)

user_proxy = RetrieveUserProxyAgent(
    name="retrieval_user",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": environ.get("OPENAI_API_KEY")}]},
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "chunk_token_size": 2000,
        "vector_db": "qdrant",
        "db_config": {"client": client},
        "get_or_create": True,
        "overwrite": True,
    },
)

result = user_proxy.initiate_chat(
    response_generator_agent,
    message=user_proxy.message_generator,
    problem="Why Qdrant is the best vector database out there?",
    max_turns=10,
)
```

For those new to agent development, AutoGen offers AutoGen Studio, a low-code interface for prototyping agents. While not intended for production use, it significantly lowers the barrier to entry for experimenting with agent architectures.

![AutoGen Studio](https://qdrant.tech/articles_data/agentic-rag/autogen-studio.png)

It’s worth noting that AutoGen is currently undergoing significant updates, with version 0.4.x in development introducing substantial API changes compared to the stable 0.2.x release. While the framework currently has limited built-in persistence and state management capabilities, these features may evolve in future releases.

### [Anchor](https://qdrant.tech/articles/agentic-rag/\#openai-swarm) OpenAI Swarm

Unlike the other frameworks described in this article, OpenAI Swarm is an educational project, and it’s not ready for production use. It’s worth mentioning, though, as it’s pretty lightweight and easy to get started with. OpenAI Swarm is an experimental framework for orchestrating multi-agent workflows that focuses on agent coordination through direct handoffs rather than complex orchestration patterns. With that setup, **agents** just exchange messages in a chat, optionally calling some Python functions to communicate with external services, or handing off the conversation to another agent if the other one seems more suitable to answer the question.

Each agent has a certain role, defined by the instructions we provide. We have to decide which LLM a particular agent will use, and the set of functions it can call. For example, **a retrieval agent could use a vector database to retrieve documents** and return the results to the next agent. That means there should be a function that performs the semantic search on its behalf, but the model will decide what the query should look like.
Here is how a similar agentic RAG application, implemented in OpenAI Swarm, could look like: ```python from swarm import Swarm, Agent client = Swarm() def retrieve_documents(query: str) -> list[str]: """ Retrieve documents based on the query. """ ... def transfer_to_query_improve_agent(): return query_improve_agent query_improve_agent = Agent( name="Query Improve Agent", instructions=( "You are a search expert that takes user queries and improves them to get better results. You fix typos and " "extend queries with synonyms, if needed. You never ask the user for more information." ), ) response_generation_agent = Agent( name="Response Generation Agent", instructions=( "You take the whole conversation and generate a final response based on the chat history. " "If you don't have enough information, you can retrieve the documents from the knowledge base or " "reformulate the query by transferring to other agent. You never ask the user for more information. " "You have to always be the last participant of each conversation." ), functions=[retrieve_documents, transfer_to_query_improve_agent], ) response = client.run( agent=response_generation_agent, messages=[\ {\ "role": "user",\ "content": "Why Qdrant is the best vector database out there?"\ }\ ], ) ``` Even though we don’t explicitly define the graph of processing, the agents can still decide to hand off the processing to a different agent. There is no concept of a state, so everything relies on the messages exchanged between different components. OpenAI Swarm does not focus on integration with external tools, and **if you would like to integrate semantic search** **with Qdrant, you would have to implement it fully yourself**. Obviously, the library is tightly coupled with OpenAI models, and while using some other ones is possible, it requires some additional work like setting up proxy that will adjust the interface to OpenAI API. ### [Anchor](https://qdrant.tech/articles/agentic-rag/\#the-winner) The winner? Choosing the best framework for your agentic RAG system depends on your existing stack, team expertise, and the specific requirements of your project. All the described tools are strong contenders, and they are developed at rapid pace. It’s worth keeping an eye on all of them, as they are likely to evolve and improve over time. Eventually, you should be able to build the same processes with any of them, but some of them may be more suitable in a specific ecosystem of the tools you want your agent to interact with. There are, however, some important factors to consider when choosing a framework for your agentic RAG system: - **Human-in-the-loop** \- even though we aim to build autonomous agents, it’s often important to include the feedback from the human, so our agents cannot perform malicious actions. - **Observability** \- how easy it is to debug the system, and how easy it is to understand what’s happening inside. Especially important, since we are dealing with lots of LLM prompts. Still, choosing the right toolkit depends on the state of your project, and the specific requirements you have. If you want to integrate your agent with number of external tools, CrewAI might be the best choice, as the set of out-of-the-box integrations is the biggest. However, LangGraph integrates well with LangChain, so if you are familiar with that ecosystem, it may suit you better. All the frameworks have different approaches to building agents, so it’s worth experimenting with all of them to see which one fits your needs the best. 
LangGraph and CrewAI are more mature and have more features, while AutoGen and OpenAI Swarm are more lightweight and more experimental. However, **none of the existing frameworks solves all the mentioned Information Retrieval problems**, so you still have to build your own tools to fill the gaps.

## [Anchor](https://qdrant.tech/articles/agentic-rag/\#building-agentic-rag-with-qdrant) Building Agentic RAG with Qdrant

No matter which framework you choose, Qdrant is a great tool for building agentic RAG systems. Please check out [our integrations](https://qdrant.tech/documentation/frameworks/) to choose the best one for your use case and preferences. The easiest way to start using Qdrant is our managed service, [Qdrant Cloud](https://cloud.qdrant.io/). A free 1GB cluster is available, so you can start building your agentic RAG system in minutes.

### [Anchor](https://qdrant.tech/articles/agentic-rag/\#further-reading) Further Reading

See how Qdrant integrates with:

- [Autogen](https://qdrant.tech/documentation/frameworks/autogen/)
- [CrewAI](https://qdrant.tech/documentation/frameworks/crewai/)
- [LangGraph](https://qdrant.tech/documentation/frameworks/langgraph/)
- [Swarm](https://qdrant.tech/documentation/frameworks/swarm/)
<|page-145-lllmstxt|>

## database-tutorials

- [Documentation](https://qdrant.tech/documentation/) - Using the Database

---

# [Anchor](https://qdrant.tech/documentation/database-tutorials/\#database-tutorials) Database Tutorials

| |
| --- |
| [Bulk Upload Vectors to a Qdrant Collection](https://qdrant.tech/documentation/database-tutorials/bulk-upload/) |
| [Large Scale Search](https://qdrant.tech/documentation/database-tutorials/large-scale-search/) |
| [Backup and Restore Qdrant Collections Using Snapshots](https://qdrant.tech/documentation/database-tutorials/create-snapshot/) |
| [Load and Search Hugging Face Datasets with Qdrant](https://qdrant.tech/documentation/database-tutorials/huggingface-datasets/) |
| [Using Qdrant’s Async API for Efficient Python Applications](https://qdrant.tech/documentation/database-tutorials/async-api/) |
| [Qdrant Migration Guide](https://qdrant.tech/documentation/database-tutorials/migration/) |
<|page-146-lllmstxt|> ## single-node-speed-benchmark --- # Single node benchmarks August 23, 2022 The interactive chart on the original page lets you pick the dataset ( `dbpedia-openai-1M-1536-angular`, `deep-image-96-angular`, `gist-960-euclidean`, `glove-100-angular`), the number of search threads (100 or 1), and the plotted value (RPS, latency, p95 latency, or index time). The table below shows a sample of the results for `dbpedia-openai-1M-1536-angular`. | Engine | Setup | Dataset | Upload Time(m) | Upload + Index Time(m) | Latency(ms) | P95(ms) | P99(ms) | RPS | Precision | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | qdrant | qdrant-sq-rps-m-64-ef-512 | dbpedia-openai-1M-1536-angular | 3.51 | 24.43 | 3.54 | 4.95 | 8.62 | 1238.0016 | 0.99 | | weaviate | latest-weaviate-m32 | dbpedia-openai-1M-1536-angular | 13.94 | 13.94 | 4.99 | 7.16 | 11.33 | 1142.13 | 0.97 | | elasticsearch | elasticsearch-m-32-ef-128 | dbpedia-openai-1M-1536-angular | 19.18 | 83.72 | 22.10 | 72.53 | 135.68 | 716.80 | 0.98 | | redis | redis-m-32-ef-256 | dbpedia-openai-1M-1536-angular | 92.49 | 92.49 | 140.65 | 160.85 | 167.35 | 625.27 | 0.97 | | milvus | milvus-m-16-ef-128 | dbpedia-openai-1M-1536-angular | 0.27 | 1.16 | 393.31 | 441.32 | 576.65 | 219.11 | 0.99 | _Download raw data: [here](https://qdrant.tech/benchmarks/results-1-100-thread-2024-06-15.json)_ ## [Anchor](https://qdrant.tech/benchmarks/single-node-speed-benchmark/\#observations) Observations Most of the engines have improved since [our last run](https://qdrant.tech/benchmarks/single-node-speed-benchmark-2022/). Both life and software have trade-offs, but some clearly do better: - **`Qdrant` achieves the highest RPS and lowest latencies in almost all the scenarios, no matter the precision threshold and the metric we choose.** It has also shown 4x RPS gains on one of the datasets. - `Elasticsearch` has become considerably fast for many cases, but it’s very slow in terms of indexing time. It can be 10x slower when storing 10M+ vectors of 96 dimensions! (32 mins vs 5.5 hrs) - `Milvus` is the fastest when it comes to indexing time and maintains good precision. However, it’s not on par with the others in RPS or latency when you have higher-dimensional embeddings or a larger number of vectors. - `Redis` is able to achieve good RPS, but mostly at lower precision. It also achieved low latency with a single thread; however, its latency goes up quickly with more parallel requests. Part of this speed gain comes from its custom protocol. - `Weaviate` has improved the least since our last run. ## [Anchor](https://qdrant.tech/benchmarks/single-node-speed-benchmark/\#how-to-read-the-results) How to read the results - Choose the dataset and the metric you want to check. - Select a precision threshold that would be satisfactory for your use case. This is important because ANN search is all about trading precision for speed. This means that in any vector search benchmark, **two results must be compared only when you have similar precision**. However, most benchmarks miss this critical aspect. - The table is sorted by the value of the selected metric (RPS / Latency / p95 latency / Index time), and the first entry is always the winner of the category 🏆 ### [Anchor](https://qdrant.tech/benchmarks/single-node-speed-benchmark/\#latency-vs-rps) Latency vs RPS In our benchmark we test two main search usage scenarios that arise in practice.
- **Requests-per-Second (RPS)**: Serve more requests per second in exchange for individual requests taking longer (i.e. higher latency). This is a typical scenario for a web application, where multiple users are searching at the same time. To simulate this scenario, we run client requests in parallel with multiple threads and measure how many requests the engine can handle per second. - **Latency**: React quickly to individual requests rather than serving more requests in parallel. This is a typical scenario for applications where server response time is critical. Self-driving cars, manufacturing robots, and other real-time systems are good examples of such applications. To simulate this scenario, we run the client in a single thread and measure how long each request takes. ### [Anchor](https://qdrant.tech/benchmarks/single-node-speed-benchmark/\#tested-datasets) Tested datasets Our [benchmark tool](https://github.com/qdrant/vector-db-benchmark) is inspired by [github.com/erikbern/ann-benchmarks](https://github.com/erikbern/ann-benchmarks/). We used the following datasets to test the performance of the engines on ANN Search tasks: | Datasets | \# Vectors | Dimensions | Distance | | --- | --- | --- | --- | | [dbpedia-openai-1M-angular](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) | 1M | 1536 | cosine | | [deep-image-96-angular](http://sites.skoltech.ru/compvision/noimi/) | 10M | 96 | cosine | | [gist-960-euclidean](http://corpus-texmex.irisa.fr/) | 1M | 960 | euclidean | | [glove-100-angular](https://nlp.stanford.edu/projects/glove/) | 1.2M | 100 | cosine | ### [Anchor](https://qdrant.tech/benchmarks/single-node-speed-benchmark/\#setup) Setup ![Benchmarks configuration](https://qdrant.tech/benchmarks/client-server.png) Benchmarks configuration - This was our setup for this experiment: - Client: 8 vcpus, 16 GiB memory, 64 GiB storage ( `Standard D8ls v5` on Azure Cloud) - Server: 8 vcpus, 32 GiB memory, 64 GiB storage ( `Standard D8s v3` on Azure Cloud) - The Python client uploads data to the server, waits for all required indexes to be constructed, and then performs searches with the configured number of threads. We repeat this process with different configurations for each engine and then select the best one for a given precision. - We ran all the engines in Docker and limited their memory to 25 GB. This was done to ensure fairness by avoiding the case of some engine configs being too greedy with RAM usage. This 25 GB limit is completely fair because even to serve the largest `dbpedia-openai-1M-1536-angular` dataset, one needs only about `1M * 1536 * 4 bytes * 1.5 ≈ 8.6 GiB` of RAM (including vectors + index). Hence, we decided to provide all the engines with ~3x the requirement. Please note that some configs of some engines crashed on some datasets because of the 25 GB memory limit. That’s why you might see fewer points for some engines when choosing higher precision thresholds. <|page-147-lllmstxt|> ## overview - [Documentation](https://qdrant.tech/documentation/) - What is Qdrant?
--- # [Anchor](https://qdrant.tech/documentation/overview/\#introduction) Introduction Vector databases are a relatively new way for interacting with abstract data representations derived from opaque machine learning models such as deep learning architectures. These representations are often called vectors or embeddings and they are a compressed version of the data used to train a machine learning model to accomplish a task like sentiment analysis, speech recognition, object detection, and many others. These new databases shine in many applications like [semantic search](https://en.wikipedia.org/wiki/Semantic_search) and [recommendation systems](https://en.wikipedia.org/wiki/Recommender_system), and here, we’ll learn about one of the most popular and fastest growing vector databases in the market, [Qdrant](https://github.com/qdrant/qdrant). ## [Anchor](https://qdrant.tech/documentation/overview/\#what-is-qdrant) What is Qdrant? [Qdrant](https://github.com/qdrant/qdrant) “is a vector similarity search engine that provides a production-ready service with a convenient API to store, search, and manage points (i.e. vectors) with an additional payload.” You can think of the payloads as additional pieces of information that can help you hone in on your search and also receive useful information that you can give to your users. You can get started using Qdrant with the Python `qdrant-client`, by pulling the latest docker image of `qdrant` and connecting to it locally, or by trying out [Qdrant’s Cloud](https://cloud.qdrant.io/) free tier option until you are ready to make the full switch. With that out of the way, let’s talk about what are vector databases. ## [Anchor](https://qdrant.tech/documentation/overview/\#what-are-vector-databases) What Are Vector Databases? ![dbs](https://raw.githubusercontent.com/ramonpzg/mlops-sydney-2023/main/images/databases.png) Vector databases are a type of database designed to store and query high-dimensional vectors efficiently. In traditional [OLTP](https://www.ibm.com/topics/oltp) and [OLAP](https://www.ibm.com/topics/olap) databases (as seen in the image above), data is organized in rows and columns (and these are called **Tables**), and queries are performed based on the values in those columns. However, in certain applications including image recognition, natural language processing, and recommendation systems, data is often represented as vectors in a high-dimensional space, and these vectors, plus an id and a payload, are the elements we store in something called a **Collection** within a vector database like Qdrant. A vector in this context is a mathematical representation of an object or data point, where elements of the vector implicitly or explicitly correspond to specific features or attributes of the object. For example, in an image recognition system, a vector could represent an image, with each element of the vector representing a pixel value or a descriptor/characteristic of that pixel. In a music recommendation system, each vector could represent a song, and elements of the vector would capture song characteristics such as tempo, genre, lyrics, and so on. Vector databases are optimized for **storing** and **querying** these high-dimensional vectors efficiently, and they often use specialized data structures and indexing techniques such as Hierarchical Navigable Small World (HNSW) – which is used to implement Approximate Nearest Neighbors – and Product Quantization, among others. 
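As a rough illustration of how an HNSW-based index is exposed in Qdrant, the sketch below creates a collection with explicit HNSW parameters via the Python client. The collection name, vector size, and parameter values are arbitrary examples for illustration, not recommendations.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # local instance started with `docker run qdrant/qdrant`

client.create_collection(
    collection_name="images",  # hypothetical collection
    vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(
        m=16,              # number of graph links per node
        ef_construct=100,  # breadth of the search used while building the graph
    ),
)
```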
These databases enable fast similarity and semantic search while allowing users to find vectors that are the closest to a given query vector based on some distance metric. The most commonly used distance metrics are Euclidean Distance, Cosine Similarity, and Dot Product, and these three are fully supported in Qdrant. Here’s a quick overview of the three: - [**Cosine Similarity**](https://en.wikipedia.org/wiki/Cosine_similarity) \- Cosine similarity is a way to measure how similar two vectors are. To simplify, it reflects whether the vectors have the same direction (similar) or are poles apart. Cosine similarity is often used with text representations to compare how similar two documents or sentences are to each other. The output of cosine similarity ranges from -1 to 1, where -1 means the two vectors are completely dissimilar, and 1 indicates maximum similarity. - [**Dot Product**](https://en.wikipedia.org/wiki/Dot_product) \- The dot product similarity metric is another way of measuring how similar two vectors are. Unlike cosine similarity, it also considers the length of the vectors. This might be important when, for example, vector representations of your documents are built based on the term (word) frequencies. The dot product similarity is calculated by multiplying the respective values in the two vectors and then summing those products. The higher the sum, the more similar the two vectors are. If you normalize the vectors (so each has a length of 1), the dot product similarity becomes the cosine similarity. - [**Euclidean Distance**](https://en.wikipedia.org/wiki/Euclidean_distance) \- Euclidean distance is a way to measure the distance between two points in space, similar to how we measure the distance between two places on a map. It’s calculated by finding the square root of the sum of the squared differences between the two points’ coordinates. This distance metric is also commonly used in machine learning to measure how similar or dissimilar two vectors are. Now that we know what vector databases are and how they are structurally different from other databases, let’s go over why they are important. ## [Anchor](https://qdrant.tech/documentation/overview/\#why-do-we-need-vector-databases) Why do we need Vector Databases? Vector databases play a crucial role in various applications that require similarity search, such as recommendation systems, content-based image retrieval, and personalized search. By taking advantage of their efficient indexing and searching techniques, vector databases enable faster and more accurate retrieval of unstructured data already represented as vectors, which helps surface the most relevant results for users’ queries. In addition, other benefits of using vector databases include: 1. Efficient storage and indexing of high-dimensional data. 2. Ability to handle large-scale datasets with billions of data points. 3. Support for real-time analytics and queries. 4. Ability to handle vectors derived from complex data types such as images, videos, and natural language text. 5. Improved performance and reduced latency in machine learning and AI applications. 6. Reduced development and deployment time and cost compared to building a custom solution. Keep in mind that the specific benefits of using a vector database may vary depending on the use case of your organization and the features of the database you ultimately choose. Let’s now evaluate, at a high level, the way Qdrant is architected.
## [Anchor](https://qdrant.tech/documentation/overview/\#high-level-overview-of-qdrants-architecture) High-Level Overview of Qdrant’s Architecture ![qdrant](https://raw.githubusercontent.com/ramonpzg/mlops-sydney-2023/main/images/qdrant_overview_high_level.png) The diagram above represents a high-level overview of some of the main components of Qdrant. Here is the terminology you should get familiar with. - [Collections](https://qdrant.tech/documentation/concepts/collections/): A collection is a named set of points (vectors with a payload) among which you can search. The vector of each point within the same collection must have the same dimensionality and be compared by a single metric. [Named vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors) can be used to have multiple vectors in a single point, each of which can have their own dimensionality and metric requirements. - [Distance Metrics](https://en.wikipedia.org/wiki/Metric_space): These are used to measure similarities among vectors and they must be selected at the same time you are creating a collection. The choice of metric depends on the way the vectors were obtained and, in particular, on the neural network that will be used to encode new queries. - [Points](https://qdrant.tech/documentation/concepts/points/): The points are the central entity that Qdrant operates with, and they consist of an id, a vector, and an optional payload. - id: a unique identifier for your vectors. - Vector: a high-dimensional representation of data, for example, an image, a sound, a document, a video, etc. - [Payload](https://qdrant.tech/documentation/concepts/payload/): A payload is a JSON object with additional data you can add to a vector. - [Storage](https://qdrant.tech/documentation/concepts/storage/): Qdrant can use one of two options for storage: **In-memory** storage (stores all vectors in RAM and has the highest speed, since disk access is required only for persistence) or **Memmap** storage (creates a virtual address space associated with the file on disk). - Clients: the programming languages you can use to connect to Qdrant. ## [Anchor](https://qdrant.tech/documentation/overview/\#next-steps) Next Steps Now that you know more about vector databases and Qdrant, you are ready to get started with one of our tutorials. If you’ve never used a vector database, go ahead and jump straight into the **Getting Started** section. Conversely, if you are already experienced with these technologies, jump to the section most relevant to your use case. As you go through the tutorials, please let us know if any questions come up in our [Discord channel here](https://qdrant.to/discord). 😎
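Before moving on, here is a minimal sketch that ties the terminology above together — a collection, a point (id, vector, payload), and a search — using the Python client. The collection name, toy vectors, and payload values are purely illustrative.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# A collection: a named set of points whose vectors share a dimensionality and a metric.
client.create_collection(
    collection_name="songs",  # hypothetical collection
    vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
)

# A point: id + vector + optional payload.
client.upsert(
    collection_name="songs",
    points=[
        models.PointStruct(
            id=1,
            vector=[0.1, 0.9, 0.2, 0.4],  # toy embedding
            payload={"genre": "jazz", "tempo": 120},
        )
    ],
)

# Search: return the points closest to a query vector under the collection's metric.
result = client.query_points(collection_name="songs", query=[0.2, 0.8, 0.3, 0.5], limit=3)
print(result.points)
```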
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/overview/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-148-lllmstxt|> ## practicle-examples - [Articles](https://qdrant.tech/articles/) - Practical Examples #### Practical Examples Building blocks and reference implementations to help you get started with Qdrant. Learn how to use Qdrant to solve real-world problems and build the next generation of AI applications. [![Preview](https://qdrant.tech/articles_data/binary-quantization-openai/preview/preview.jpg)\\ **Optimizing OpenAI Embeddings: Enhance Efficiency with Qdrant's Binary Quantization** \\ Explore how Qdrant's Binary Quantization can significantly improve the efficiency and performance of OpenAI's Ada-003 embeddings. Learn best practices for real-time search applications.\\ \\ Nirant Kasliwal\\ \\ February 21, 2024](https://qdrant.tech/articles/binary-quantization-openai/)[![Preview](https://qdrant.tech/articles_data/food-discovery-demo/preview/preview.jpg)\\ **Food Discovery Demo** \\ Feeling hungry? Find the perfect meal with Qdrant's multimodal semantic search.\\ \\ Kacper Łukawski\\ \\ September 05, 2023](https://qdrant.tech/articles/food-discovery-demo/)[![Preview](https://qdrant.tech/articles_data/search-as-you-type/preview/preview.jpg)\\ **Semantic Search As You Type** \\ To show off Qdrant's performance, we show how to do a quick search-as-you-type that will come back within a few milliseconds.\\ \\ Andre Bogus\\ \\ August 14, 2023](https://qdrant.tech/articles/search-as-you-type/)[![Preview](https://qdrant.tech/articles_data/serverless/preview/preview.jpg)\\ **Serverless Semantic Search** \\ Create a serverless semantic search engine using nothing but Qdrant and free cloud services.\\ \\ Andre Bogus\\ \\ July 12, 2023](https://qdrant.tech/articles/serverless/)[![Preview](https://qdrant.tech/articles_data/chatgpt-plugin/preview/preview.jpg)\\ **Extending ChatGPT with a Qdrant-based knowledge base** \\ ChatGPT factuality might be improved with semantic search. Here is how.\\ \\ Kacper Łukawski\\ \\ March 23, 2023](https://qdrant.tech/articles/chatgpt-plugin/)[![Preview](https://qdrant.tech/articles_data/langchain-integration/preview/preview.jpg)\\ **Using LangChain for Question Answering with Qdrant** \\ We combined LangChain, a pre-trained LLM from OpenAI, SentenceTransformers & Qdrant to create a question answering system with just a few lines of code. 
Learn more!\\ \\ Kacper Łukawski\\ \\ January 31, 2023](https://qdrant.tech/articles/langchain-integration/)[![Preview](https://qdrant.tech/articles_data/qa-with-cohere-and-qdrant/preview/preview.jpg)\\ **Question Answering as a Service with Cohere and Qdrant** \\ End-to-end Question Answering system for the biomedical data with SaaS tools: Cohere co.embed API and Qdrant\\ \\ Kacper Łukawski\\ \\ November 29, 2022](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/)[![Preview](https://qdrant.tech/articles_data/faq-question-answering/preview/preview.jpg)\\ **Q&A with Similarity Learning** \\ A complete guide to building a Q&A system using Quaterion and SentenceTransformers.\\ \\ George Panchuk\\ \\ June 28, 2022](https://qdrant.tech/articles/faq-question-answering/) <|page-149-lllmstxt|> ## filtered-search-benchmark February 13, 2023 The interactive chart on the original page lets you choose among the filtered-search datasets (keyword, range, integer, and geo-radius filters at 100 and 2048 dimensions, a small-vocabulary keyword dataset, `h-and-m-2048`, and `arxiv-titles-384`) and plots regular search against filtered search. _Download raw data: [here](https://qdrant.tech/benchmarks/filter-result-2023-02-03.json)_ ## [Anchor](https://qdrant.tech/benchmarks/filtered-search-benchmark/\#filtered-results) Filtered Results As you can see from the charts, there are three main patterns: - **Speed boost** \- for some engines/queries, the filtered search is faster than the unfiltered one. It might happen if the filter is restrictive enough to completely avoid the usage of the vector index. - **Speed downturn** \- some engines struggle to keep high RPS; this might be related to the need to build a filtering mask over the dataset, as described above. - **Accuracy collapse** \- some engines lose accuracy dramatically under some filters. This is related to the fact that the HNSW graph becomes disconnected, and the search becomes unreliable. Qdrant avoids all these problems and also benefits from the speed boost, as it implements an advanced [query planning strategy](https://qdrant.tech/documentation/search/#query-planning). <|page-150-lllmstxt|> ## database-optimization - [Documentation](https://qdrant.tech/documentation/) - [Faq](https://qdrant.tech/documentation/faq/) - Database Optimization --- # [Anchor](https://qdrant.tech/documentation/faq/database-optimization/\#frequently-asked-questions-database-optimization) Frequently Asked Questions: Database Optimization ### [Anchor](https://qdrant.tech/documentation/faq/database-optimization/\#how-do-i-reduce-memory-usage) How do I reduce memory usage? The primary source of memory usage is vector data. There are several ways to address that: - Configure [Quantization](https://qdrant.tech/documentation/guides/quantization/) to reduce the memory usage of vectors. - Configure on-disk vector storage. The choice of approach depends on your requirements. Read more about [configuring the optimal](https://qdrant.tech/documentation/tutorials/optimize/) use of Qdrant. ### [Anchor](https://qdrant.tech/documentation/faq/database-optimization/\#how-do-you-choose-the-machine-configuration) How do you choose the machine configuration?
There are two main scenarios of Qdrant usage in terms of resource consumption: - **Performance-optimized** – when you need to serve vector search as fast, and for as many requests, as possible. In this case, you need to keep as much vector data in RAM as possible. Use our [calculator](https://cloud.qdrant.io/calculator) to estimate the required RAM. - **Storage-optimized** – when you need to store many vectors and minimize costs by compromising some search speed. In this case, pay attention to the disk speed instead. More about it in the article about [Memory Consumption](https://qdrant.tech/articles/memory-consumption/). ### [Anchor](https://qdrant.tech/documentation/faq/database-optimization/\#i-configured-on-disk-vector-storage-but-memory-usage-is-still-high-why) I configured on-disk vector storage, but memory usage is still high. Why? Firstly, memory usage metrics as reported by `top` or `htop` may be misleading. They do not show the minimal amount of memory required to run the service. If the RSS memory usage is 10 GB, it doesn’t mean that it won’t work on a machine with 8 GB of RAM. Qdrant uses many techniques to reduce search latency, including caching disk data in RAM and preloading data from disk to RAM. As a result, the Qdrant process might use more memory than the minimum required to run the service. > Unused RAM is wasted RAM If you want to limit the memory usage of the service, we recommend using [limits in Docker](https://docs.docker.com/config/containers/resource_constraints/#memory) or Kubernetes. ### [Anchor](https://qdrant.tech/documentation/faq/database-optimization/\#my-requests-are-very-slow-or-time-out-what-should-i-do) My requests are very slow or time out. What should I do? There are several possible reasons for that: - **Using filters without payload index** – If you’re performing a search with a filter but you don’t have a payload index, Qdrant will have to load the whole payload data from disk to check the filtering condition. Ensure you have adequately configured [payload indexes](https://qdrant.tech/documentation/concepts/indexing/#payload-index). - **Usage of on-disk vector storage with slow disks** – If you’re using on-disk vector storage, ensure you have fast enough disks. We recommend using local SSDs with at least 50k IOPS. Read more about the influence of the disk speed on the search latency in the article about [Memory Consumption](https://qdrant.tech/articles/memory-consumption/). - **Large limit or non-optimal query parameters** – A large limit or offset might lead to significant performance degradation. Please pay close attention to the query/collection parameters that significantly diverge from the defaults. They might be the reason for the performance issues.
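As a hedged sketch of the payload-index advice above (the collection and field names are made up), this is roughly how you create a keyword payload index and then run a filtered query with the Python client:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Index the "city" payload field so filtered searches don't have to scan payloads from disk.
client.create_payload_index(
    collection_name="my-collection",  # hypothetical collection
    field_name="city",
    field_schema=models.PayloadSchemaType.KEYWORD,
)

# A filtered query: only points whose payload matches the condition are considered.
hits = client.query_points(
    collection_name="my-collection",
    query=[0.1, 0.2, 0.3, 0.4],  # toy query embedding
    query_filter=models.Filter(
        must=[models.FieldCondition(key="city", match=models.MatchValue(value="Berlin"))]
    ),
    limit=10,
)
```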
<|page-151-lllmstxt|> ## cloud-getting-started - [Documentation](https://qdrant.tech/documentation/) - Getting Started --- # [Anchor](https://qdrant.tech/documentation/cloud-getting-started/\#getting-started-with-qdrant-managed-cloud) Getting Started with Qdrant Managed Cloud Welcome to Qdrant Managed Cloud! This document contains all the information you need to get started. ## [Anchor](https://qdrant.tech/documentation/cloud-getting-started/\#prerequisites) Prerequisites Before creating a cluster, make sure you have a Qdrant Cloud account. Detailed instructions for signing up can be found in the [Qdrant Cloud Setup](https://qdrant.tech/documentation/cloud/qdrant-cloud-setup/) guide. You also need to provide [payment details](https://qdrant.tech/documentation/cloud/pricing-payments/). If you have a custom payment agreement, first create your account, then [contact our Support Team](https://support.qdrant.io/) to finalize the setup. Premium Plan subscribers can enable single sign-on (SSO) for their organizations. To activate SSO, please reach out to the Support Team at [https://support.qdrant.io/](https://support.qdrant.io/) for guidance. ## [Anchor](https://qdrant.tech/documentation/cloud-getting-started/\#cluster-sizing) Cluster Sizing Before deploying any cluster, consider the resources needed for your specific workload. Our [Capacity Planning guide](https://qdrant.tech/documentation/guides/capacity-planning/) describes how to assess the required CPU, memory, and storage. Additionally, the [Pricing Calculator](https://cloud.qdrant.io/calculator) helps you estimate associated costs based on your projected usage. ## [Anchor](https://qdrant.tech/documentation/cloud-getting-started/\#creating-and-managing-clusters) Creating and Managing Clusters After setting up your account, you can create a Qdrant Cluster by following the steps in [Create a Cluster](https://qdrant.tech/documentation/cloud/create-cluster/). ## [Anchor](https://qdrant.tech/documentation/cloud-getting-started/\#preparing-for-production) Preparing for Production For a production-ready environment, consider deploying a multi-node Qdrant cluster (at least three nodes) with replication enabled. Instructions for configuring distributed clusters are available in the [Distributed Deployment](https://qdrant.tech/documentation/guides/distributed_deployment/) guide. If you are looking to optimize costs, you can reduce memory usage through [Quantization](https://qdrant.tech/documentation/guides/quantization/) or by [offloading vectors to disk](https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage).
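For illustration, the sketch below shows roughly how those two cost-saving options — on-disk vectors and quantization — are expressed with the Python client. The cluster URL, API key, collection name, and vector size are placeholders, and the specific quantization type shown (scalar INT8) is just one example of the available options.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="https://YOUR-CLUSTER-URL:6333", api_key="YOUR_API_KEY")

client.create_collection(
    collection_name="cost-optimized",  # hypothetical collection
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,  # keep the original vectors on disk
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            always_ram=True,  # keep the compact quantized vectors in RAM for fast search
        )
    ),
)
```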
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud-getting-started.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-152-lllmstxt|> ## dedicated-vector-search - [Articles](https://qdrant.tech/articles/) - Built for Vector Search [Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/) --- # Built for Vector Search Evgeniya Sukhodolskaya & Andrey Vasnetsov · February 17, 2025 ![Built for Vector Search](https://qdrant.tech/articles_data/dedicated-vector-search/preview/title.jpg) Any problem with even a bit of complexity requires a specialized solution. You can use a Swiss Army knife to open a bottle or poke a hole in a cardboard box, but you will need an axe to chop wood — the same goes for software. In this article, we will describe the unique challenges vector search poses and why a dedicated solution is the best way to tackle them. ## [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vectors) Vectors ![vectors](https://qdrant.tech/articles_data/dedicated-vector-search/image1.jpg) Let’s look at the central concept of vector databases — [**vectors**](https://qdrant.tech/documentation/concepts/vectors/). Vectors (also known as embeddings) are high-dimensional representations of various data points — texts, images, videos, etc. Many state-of-the-art (SOTA) embedding models generate representations of over 1,500 dimensions. When it comes to state-of-the-art PDF retrieval, the representations can reach [**over 100,000 dimensions per page**](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/). This brings us to the first challenge of vector search — vectors are heavy. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vectors-are-heavy) Vectors are Heavy To put this in perspective, consider one million records stored in a relational database. It’s a relatively small amount of data for modern databases, which a free tier of many cloud providers could easily handle. Now, generate a 1536-dimensional embedding with OpenAI’s `text-embedding-ada-002` model from each record, and you are looking at around **6GB of storage**. As a result, vector search workloads, especially if not optimized, will quickly dominate the main use cases of a non-vector database. Having vectors as a part of a main database is a potential issue for another reason — vectors are always a transformation of other data. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vectors-are-a-transformation) Vectors are a Transformation Vectors are obtained from some other source-of-truth data. They can be restored if lost with the same embedding model previously used. At the same time, even small changes in that model can shift the geometry of the vector space, so if you update or change the embedding model, you need to update and reindex all the data to maintain accurate vector comparisons. If coupled with the main database, this update process can lead to significant complications and even unavailability of the whole system. However, vectors have positive properties as well. One of the most important is that vectors are fixed-size. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vectors-are-fixed-size) Vectors are Fixed-Size Embedding models are designed to produce vectors of a fixed size. We have to use it to our advantage. 
For fast search, vectors need to be instantly accessible. Whether in [**RAM or disk**](https://qdrant.tech/documentation/concepts/storage/), vectors should be stored in a format that allows quick access and comparison. This is essential, as vector comparison is a very hot operation in vector search workloads. It is often performed thousands of times per search query, so even a small overhead can lead to a significant slowdown. For dedicated storage, vectors’ fixed size comes as a blessing. Knowing how much space one data point needs, we don’t have to deal with the usual overhead of locating data — the location of elements in storage is straightforward to calculate. Everything becomes far less intuitive if vectors are stored together with other data types, for example, texts or JSONs. The size of a single data point is not fixed anymore, so accessing it becomes non-trivial, especially if data is added, updated, and deleted over time. ![Fixed size columns VS Variable length table](https://qdrant.tech/articles_data/dedicated-vector-search/dedicated_storage.png) Fixed size columns VS Variable length table **Storing vectors together with other types of data, we lose all the benefits of their characteristics**; however, we fully “enjoy” their drawbacks, polluting the storage with an extremely heavy transformation of data already existing in that storage. ## [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vector-search) Vector Search ![vector-search](https://qdrant.tech/articles_data/dedicated-vector-search/image2.jpg) Unlike traditional databases that serve as data stores, **vector databases are more like search engines**. They are designed to be **scalable**, always **available**, and capable of delivering high-speed search results even under heavy loads. Just as Google or Bing can handle billions of queries at once, vector databases are designed for scenarios where rapid, high-throughput, low-latency retrieval is a must. ![Database Compass](https://qdrant.tech/articles_data/dedicated-vector-search/compass.png) Database Compass ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#pick-any-two) Pick Any Two Distributed systems are perfect for scalability — horizontal scaling in these systems allows you to add more machines as needed. In the world of distributed systems, one well-known principle — the **CAP theorem** — illustrates that you cannot have it all. The theorem states that a distributed system can guarantee only two out of three properties: **Consistency**, **Availability**, and **Partition Tolerance**. As network partitions are inevitable in any real-world distributed system, all modern distributed databases are designed with partition tolerance in mind, forcing a trade-off between **consistency** (providing the most up-to-date data) and **availability** (remaining responsive). There are two main design philosophies for databases in this context: ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#acid-prioritizing-consistency) ACID: Prioritizing Consistency The ACID model ensures that every transaction (a group of operations treated as a single unit, such as transferring money between accounts) is executed fully or not at all (reverted), leaving the database in a valid state. When a system is distributed, achieving ACID properties requires complex coordination between nodes. 
Each node must communicate and agree on the state of a transaction, which can **limit system availability** — if a node is uncertain about the state of another, it may refuse to process a transaction until consistency is assured. This coordination also makes **scaling more challenging**. Financial institutions use ACID-compliant databases when dealing with money transfers, where even a momentary discrepancy in an account balance is unacceptable. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#base-prioritizing-availability) BASE: Prioritizing Availability On the other hand, the BASE model favors high availability and partition tolerance. BASE systems distribute data and workload across multiple nodes, enabling them to respond to read and write requests immediately. They operate under the principle of **eventual consistency** — although data may be temporarily out-of-date, the system will converge on a consistent state given time. Social media platforms, streaming services, and search engines all benefit from the BASE approach. For these applications, having immediate responsiveness is more critical than strict consistency. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#based-vector-search) BASEd Vector Search Considering the specifics of vector search — its nature demanding availability & scalability — it should be served on BASE-oriented architecture. This choice is made due to the need for horizontal scaling, high availability, low latency, and high throughput. For example, having BASE-focused architecture allows us to [**easily manage resharding**](https://qdrant.tech/documentation/cloud/cluster-scaling/#resharding). A strictly consistent transactional approach also loses its attractiveness when we remember that vectors are heavy transformations of data at our disposal — what’s the point in limiting data protection mechanisms if we can always restore vectorized data through a transformation? ## [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vector-index) Vector Index ![vector-index](https://qdrant.tech/articles_data/dedicated-vector-search/image3.jpg) [**Vector search**](https://qdrant.tech/documentation/concepts/search/) relies on high-dimensional vector mathematics, making it computationally heavy at scale. A brute-force similarity search would require comparing a query against every vector in the database. In a database with 100 million 1536-dimensional vectors, performing 100 million comparisons per one query is unfeasible for production scenarios. Instead of a brute-force approach, vector databases have specialized approximate nearest neighbour (ANN) indexes that balance search precision and speed. These indexes require carefully designed architectures to make their maintenance in production feasible. ![HNSW Index](https://qdrant.tech/articles_data/dedicated-vector-search/hnsw.png) HNSW Index One of the most popular vector indexes is **HNSW (Hierarchical Navigable Small World)**, which we picked for its capability to provide simultaneously high search speed and accuracy. High performance came with a cost — implementing it in production is untrivial due to several challenges, so to make it shine all the system’s architecture has to be structured around it, serving the capricious index. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#index-complexity) Index Complexity [**HNSW**](https://qdrant.tech/documentation/concepts/indexing/) is structured as a multi-layered graph. 
With a new data point inserted, the algorithm must compare it to existing nodes across several layers to index it. As the number of vectors grows, these comparisons will noticeably slow down the construction process, making updates increasingly time-consuming. The indexing operation can quickly become the bottleneck in the system, slowing down search requests. Building an HNSW monolith means limiting the scalability of your solution — its size has to be capped, as its construction time scales **non-linearly** with the number of elements. To keep the construction process feasible and ensure it doesn’t affect the search time, we came up with a layered architecture that breaks down all data management into small units called **segments**. ![Storage structure](https://qdrant.tech/articles_data/dedicated-vector-search/segments.png) Storage structure Each segment isolates a subset of vectorized corpora and supports all collection-level operations on it, from searching to indexing, for example segments build their own index on the subset of data available to them. For users working on a collection level, the specifics of segmentation are unnoticeable. The search results they get span the whole collection, as sub-results are gathered from segments and then merged & deduplicated. By balancing between size and number of segments, we can ensure the right balance between search speed and indexing time, making the system flexible for different workloads. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#immutability) Immutability With index maintenance divided between segments, Qdrant can ensure high performance even during heavy load, and additional optimizations secure that further. These optimizations come from an idea that working with immutable structures introduces plenty of benefits: the possibility of using internally fixed sized lists (so no dynamic updates), ordering stored data accordingly to access patterns (so no unpredictable random accesses). With this in mind, to optimize search speed and memory management further, we use a strategy that combines and manages [**mutable and immutable segments**](https://qdrant.tech/articles/immutable-data-structures/). | | | | --- | --- | | **Mutable Segments** | These are used for quickly ingesting new data and handling changes (updates) to existing data. | | **Immutable Segments** | Once a mutable segment reaches a certain size, an optimization process converts it into an immutable segment, constructing an HNSW index – you could [**read about these optimizers here**](https://qdrant.tech/documentation/concepts/optimizer/#optimizer) in detail. This immutability trick allowed us, for example, to ensure effective [**tenant isolation**](https://qdrant.tech/documentation/concepts/indexing/#tenant-index). | Immutable segments are an implementation detail transparent for users — they can delete vectors at any time, while additions and updates are applied to a mutable segment instead. This combination of mutability and immutability allows search and indexing to smoothly run simultaneously, even under heavy loads. This approach minimizes the performance impact of indexing time and allows on-the-fly configuration changes on a collection level (such as enabling or disabling data quantization) without downtimes. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#filterable-index) Filterable Index Vector search wasn’t historically designed for filtering — imposing strict constraints on results. 
It’s inherently fuzzy; every document is, to some extent, both similar and dissimilar to any query — there’s no binary “ _fits/doesn’t fit_” segregation. As a result, vector search algorithms weren’t originally built with filtering in mind. At the same time, filtering is unavoidable in many vector search applications, such as [**e-commerce search/recommendations**](https://qdrant.tech/recommendations/). Searching for a Christmas present, you might want to filter out everything over 100 euros while still benefiting from the vector search’s semantic nature. In many vector search solutions, filtering is approached in two ways: **pre-filtering** (computes a binary mask for all vectors fitting the condition before running HNSW search) or **post-filtering** (running HNSW as usual and then filtering the results). | | | | | --- | --- | --- | | ❌ | **Pre-filtering** | Has the linear complexity of computing the vector mask and becomes a bottleneck for large datasets. | | ❌ | **Post-filtering** | The problem with **post-filtering** is tied to vector search “ _everything fits and doesn’t at the same time_” nature: imagine a low-cardinality filter that leaves only a few matching elements in the database. If none of them are similar enough to the query to appear in the top-X retrieved results, they’ll all be filtered out. | Qdrant [**took filtering in vector search further**](https://qdrant.tech/articles/vector-search-filtering/), recognizing the limitations of pre-filtering & post-filtering strategies. We developed an adaptation of HNSW — [**filterable HNSW**](https://qdrant.tech/articles/filtrable-hnsw/) — that also enables **in-place filtering** during graph traversal. To make this possible, we condition HNSW index construction on possible filtering conditions reflected by [**payload indexes**](https://qdrant.tech/documentation/concepts/indexing/#payload-index) (inverted indexes built on vectors’ [**metadata**](https://qdrant.tech/documentation/concepts/payload/)). **Qdrant was designed with a vector index being a central component of the system.** That made it possible to organize optimizers, payload indexes and other components around the vector index, unlocking the possibility of building a filterable HNSW. ![Filterable Vector Index](https://qdrant.tech/articles_data/dedicated-vector-search/filterable-vector-index.png) Filterable Vector Index In general, optimizing vector search requires a custom, finely tuned approach to data and index management that secures high performance even as data grows and changes dynamically. This specialized architecture is the key reason why **dedicated vector databases will always outperform general-purpose databases in production settings**. ## [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#vector-search-beyond-rag) Vector Search Beyond RAG ![Vector Search is not Text Search Extension](https://qdrant.tech/articles_data/dedicated-vector-search/venn-diagram.png) Vector Search is not Text Search Extension Many discussions about the purpose of vector databases focus on Retrieval-Augmented Generation (RAG) — or its more advanced variant, agentic RAG — where vector databases are used as a knowledge source to retrieve context for large language models (LLMs). This is a legitimate use case, however, the hype wave of RAG solutions has overshadowed the broader potential of vector search, which goes [**beyond augmenting generative AI**](https://qdrant.tech/articles/vector-similarity-beyond-search/). 
### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#discovery) Discovery The strength of vector search lies in its ability to facilitate [**discovery**](https://qdrant.tech/articles/discovery-search/). Vector search allows you to refine your choices as you search rather than starting with a fixed query. Say, [**you’re ordering food not knowing exactly what you want**](https://qdrant.tech/articles/food-discovery-demo/) — just that it should contain meat & not a burger, or that it should be meat with cheese & not tacos. Instead of searching for a specific dish, vector search helps you navigate options based on similarity and dissimilarity, guiding you toward something that matches your taste without requiring you to define it upfront. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#recommendations) Recommendations Vector search is perfect for [**recommendations**](https://qdrant.tech/documentation/concepts/explore/#recommendation-api). Imagine browsing for a new book or movie. Instead of searching for an exact match, you might look for stories that capture a certain mood or theme but differ in key aspects from what you already know. For example, you may [**want a film featuring wizards without the familiar feel of the “Harry Potter” series**](https://www.youtube.com/watch?v=O5mT8M7rqQQ). This flexibility is possible because vector search is not tied to the binary “match/not match” concept but operates on distances in a vector space. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#big-unstructured-data-analysis) Big Unstructured Data Analysis Vector search nature makes it also ideal for [**big unstructured data analysis**](https://www.youtube.com/watch?v=_BQTnXpuH-E), for instance, anomaly detection. In large, unstructured, and often unlabelled datasets, vector search can help identify clusters and outliers by analyzing distance relationships between data points. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#fundamentally-different) Fundamentally Different **Vector search beyond RAG isn’t just another feature — it’s a fundamental shift in how we interact with data**. Dedicated solutions integrate these capabilities natively and are designed from the ground up to handle high-dimensional math and (dis-)similarity-based retrieval. In contrast, databases with vector extensions are built around a different data paradigm, making it impossible to efficiently support advanced vector search capabilities. Even if you want to retrofit these capabilities, it’s not just a matter of adding a new feature — it’s a structural problem. Supporting advanced vector search requires **dedicated interfaces** that enable flexible usage of vector search from multi-stage filtering to dynamic exploration of high-dimensional spaces. When the underlying architecture wasn’t initially designed for this kind of interaction, integrating interfaces is a **software engineering team nightmare**. You end up breaking existing assumptions, forcing inefficient workarounds, and often introducing backwards-compatibility problems. It’s why attempts to patch vector search onto traditional databases won’t match the efficiency of purpose-built systems. 
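As a small sketch of the recommendation scenario described above, Qdrant's Recommend API accepts positive and negative example point IDs instead of a single query vector; newer client versions expose the same capability through the Query API. The collection name and IDs below are hypothetical.

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# "More like these, less like those": steer results with positive and negative examples.
hits = client.recommend(
    collection_name="movies",  # hypothetical collection
    positive=[101, 205],       # IDs of items the user liked
    negative=[333],            # ID of an item to steer away from
    limit=10,
)
```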
## [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#making-vector-search-state-of-the-art) Making Vector Search State-of-the-Art ![vector-search-state-of-the-art](https://qdrant.tech/articles_data/dedicated-vector-search/image4.jpg) Now, let’s shift focus to another key advantage of dedicated solutions — their ability to keep up with the state of the art in the field. [**Vector databases**](https://qdrant.tech/qdrant-vector-database/) are purpose-built for vector retrieval, and as a result, they offer cutting-edge features that are often critical for AI businesses relying on vector search. Vector database engineers invest significant time and effort into researching and implementing the most optimal ways to perform vector search. Many of these innovations come naturally to vector-native architectures, while general-purpose databases with added vector capabilities may struggle to adapt and replicate these benefits efficiently. Consider some of the advanced features implemented in Qdrant: - [**GPU-Accelerated Indexing**](https://qdrant.tech/blog/qdrant-1.13.x/#gpu-accelerated-indexing) By offloading index construction tasks to the GPU, Qdrant can significantly speed up the process of data indexing while keeping costs low. This becomes especially valuable when working with large datasets in hot data scenarios. GPU acceleration in Qdrant is a custom solution developed by an enthusiast from our core team. It’s vendor-free and natively supports all Qdrant’s unique architectural features, from Filterable HNSW to multivectors. - [**Multivectors**](https://qdrant.tech/documentation/concepts/vectors/?q=multivectors#multivectors) Some modern embedding models produce an entire matrix (a list of vectors) as output rather than a single vector. Qdrant supports multivectors natively. This feature is critical when using state-of-the-art retrieval models such as [**ColBERT**](https://qdrant.tech/documentation/fastembed/fastembed-colbert/), ColPali, or ColQwen. For instance, ColPali and ColQwen produce multivector outputs, and supporting them natively is crucial for [**state-of-the-art (SOTA) PDF-retrieval**](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/). In addition to that, we continuously look for improvements in: | | | | --- | --- | | **Memory Efficiency & Compression** | Techniques such as [**quantization**](https://qdrant.tech/documentation/guides/quantization/) and [**HNSW compression**](https://qdrant.tech/blog/qdrant-1.13.x/#hnsw-graph-compression) to reduce storage requirements | | **Retrieval Algorithms** | Support for the latest retrieval algorithms, including [**sparse neural retrieval**](https://qdrant.tech/articles/modern-sparse-neural-retrieval/), [**hybrid search**](https://qdrant.tech/documentation/concepts/hybrid-queries/) methods, and [**re-rankers**](https://qdrant.tech/documentation/fastembed/fastembed-rerankers/). | | **Vector Data Analysis & Visualization** | Tools like the [**distance matrix API**](https://qdrant.tech/blog/qdrant-1.12.x/#distance-matrix-api-for-data-insights) provide insights into vectorized data, and a [**Web UI**](https://qdrant.tech/blog/qdrant-1.11.x/#web-ui-search-quality-tool) allows for intuitive exploration of data. | | **Search Speed & Scalability** | Includes optimizations for [**multi-tenant environments**](https://qdrant.tech/articles/multitenancy/) to ensure efficient and scalable search.
| **These advancements are not just incremental improvements — they define the difference between a system optimized for vector search and one that accommodates it.** Staying at the cutting edge of vector search is not just about performance — it’s also about keeping pace with an evolving AI landscape. ## [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#summing-up) Summing up ![conclusion-vector-search](https://qdrant.tech/articles_data/dedicated-vector-search/image5.jpg) When it comes to vector search, there’s a clear distinction between using a dedicated vector search solution and extending a database to support vector operations. **For small-scale applications or prototypes handling up to a million data points, a non-optimized architecture might suffice.** However, as the volume of vectors grows, an unoptimized solution will quickly become a bottleneck — slowing down search operations and limiting scalability. Dedicated vector search solutions are engineered from the ground up to handle massive amounts of high-dimensional data efficiently. State-of-the-art (SOTA) vector search evolves rapidly. If you plan to build on the latest advances, using a vector extension will eventually hold you back. Dedicated vector search solutions integrate these features natively, ensuring that you benefit from continuous innovations without compromising performance. The power of vector search extends into areas such as big data analysis, recommendation systems, and discovery-based applications, and to support these vector search capabilities, a dedicated solution is needed. ### [Anchor](https://qdrant.tech/articles/dedicated-vector-search/\#when-to-choose-a-dedicated-database-over-an-extension) When to Choose a Dedicated Database over an Extension: - **High-Volume, Real-Time Search**: Ideal for applications with many simultaneous users who require fast, continuous access to search results—think search engines, e-commerce recommendations, social media, or media streaming services. - **Dynamic, Unstructured Data**: Perfect for scenarios where data is continuously evolving and where the goal is to discover insights from data patterns. - **Innovative Applications**: If you’re looking to implement advanced use cases such as recommendation engines, hybrid search solutions, or exploratory data analysis where traditional exact or token-based searches hold short. Investing in a dedicated vector search engine will deliver the performance and flexibility necessary for success if your application relies on vector search at scale, keeps up with trends, or requires more than just a simple small-scale similarity search. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/dedicated-vector-search.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
<|page-153-lllmstxt|> ## binary-quantization-openai - [Articles](https://qdrant.tech/articles/) - Optimizing OpenAI Embeddings: Enhance Efficiency with Qdrant's Binary Quantization [Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/) --- # Optimizing OpenAI Embeddings: Enhance Efficiency with Qdrant's Binary Quantization Nirant Kasliwal · February 21, 2024 ![Optimizing OpenAI Embeddings: Enhance Efficiency with Qdrant's Binary Quantization](https://qdrant.tech/articles_data/binary-quantization-openai/preview/title.jpg) OpenAI Ada-003 embeddings are a powerful tool for natural language processing (NLP). However, the size of the embeddings is a challenge, especially with real-time search and retrieval. In this article, we explore how you can use Qdrant’s Binary Quantization to enhance the performance and efficiency of OpenAI embeddings. In this post, we discuss: - The significance of OpenAI embeddings and real-world challenges. - Qdrant’s Binary Quantization, and how it can improve the performance of OpenAI embeddings - Results of an experiment that highlights improvements in search efficiency and accuracy - Implications of these findings for real-world applications - Best practices for leveraging Binary Quantization to enhance OpenAI embeddings If you’re new to Binary Quantization, consider reading our article which walks you through the concept and [how to use it with Qdrant](https://qdrant.tech/articles/binary-quantization/). You can also try out these techniques as described in [Binary Quantization OpenAI](https://github.com/qdrant/examples/blob/openai-3/binary-quantization-openai/README.md), which includes Jupyter notebooks. ## [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#new-openai-embeddings-performance-and-changes) New OpenAI embeddings: performance and changes As the technology of embedding models has advanced, demand has grown. Users are increasingly looking for powerful and efficient text-embedding models. OpenAI’s Ada-003 embeddings offer state-of-the-art performance on a wide range of NLP tasks, including those noted in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) and [MIRACL](https://openai.com/blog/new-embedding-models-and-api-updates). These models include multilingual support in over 100 languages. The transition from text-embedding-ada-002 to text-embedding-3-large has led to a significant jump in performance scores (from 31.4% to 54.9% on MIRACL). #### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#matryoshka-representation-learning) Matryoshka representation learning The new OpenAI models have been trained with a novel approach called “ [Matryoshka Representation Learning](https://aniketrege.github.io/blog/2024/mrl/)”. Developers can request embeddings of different sizes (numbers of dimensions). In this post, we use the small and large variants. This lets developers select an embedding size that balances accuracy and storage cost. Here, we show how the accuracy of binary quantization is quite good across different dimensions – for both models.
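For reference, requesting a reduced-dimension embedding from the text-embedding-3 models looks roughly like the sketch below; treat it as an illustrative snippet rather than the exact setup used in this experiment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# text-embedding-3 models accept a `dimensions` parameter thanks to
# Matryoshka Representation Learning: truncated embeddings remain usable.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Binary quantization makes vector search cheaper.",
    dimensions=1024,  # request a 1024-dimensional embedding instead of the full size
)
vector = response.data[0].embedding
print(len(vector))  # -> 1024
```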
## [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#enhanced-performance-and-efficiency-with-binary-quantization) Enhanced performance and efficiency with binary quantization By reducing storage needs, you can scale applications with lower costs. This addresses a critical challenge posed by the original embedding sizes. Binary Quantization also speeds up the search process. It simplifies the complex distance calculations between vectors into more manageable bitwise operations, which supports potentially real-time searches across vast datasets. The accompanying graph illustrates the promising accuracy levels achievable with binary quantization across different model sizes, showcasing its practicality without severely compromising on performance. This dual advantage of storage reduction and accelerated search capabilities underscores the transformative potential of Binary Quantization in deploying OpenAI embeddings more effectively across various real-world applications. ![](https://qdrant.tech/blog/openai/Accuracy_Models.png) The efficiency gains from Binary Quantization are as follows: - Reduced storage footprint: it helps with large-scale datasets, saves memory, and lets you scale up to 30x at the same cost. - Enhanced speed of data retrieval: smaller data sizes generally lead to faster searches. - Accelerated search process: it simplifies distance calculations between vectors into bitwise operations. This enables real-time querying even in extensive databases. ### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#experiment-setup-openai-embeddings-in-focus) Experiment setup: OpenAI embeddings in focus To identify Binary Quantization’s impact on search efficiency and accuracy, we designed our experiment on OpenAI text-embedding models. These models, which capture nuanced linguistic features and semantic relationships, are the backbone of our analysis. We then delve deep into the potential enhancements offered by Qdrant’s Binary Quantization feature. This approach not only leverages the high-caliber OpenAI embeddings but also provides a broad basis for evaluating the search mechanism under scrutiny. #### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#dataset) Dataset The research employs 100K random samples from the [OpenAI 1M](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) dataset, focusing on 100 randomly selected records. These records serve as queries in the experiment, aiming to assess how Binary Quantization influences search efficiency and precision within the dataset. We then use the embeddings of the queries to search for the nearest neighbors in the dataset. #### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#parameters-oversampling-rescoring-and-search-limits) Parameters: oversampling, rescoring, and search limits For each record, we run a parameter sweep over oversampling factors, rescoring, and search limits. We can then understand the impact of these parameters on search accuracy and efficiency. Our experiment was designed to assess the impact of Binary Quantization under various conditions, based on the following parameters: - **Oversampling**: By oversampling, we can limit the loss of information inherent in quantization. This also helps to preserve the semantic richness of your OpenAI embeddings. We experimented with different oversampling factors and identified their impact on the accuracy and efficiency of search.
Spoiler: higher oversampling factors tend to improve the accuracy of searches. However, they usually require more computational resources. - **Rescoring**: Rescoring refines the first results of an initial binary search. This process leverages the original high-dimensional vectors to refine the search results, **always** improving accuracy. We toggled rescoring on and off to measure its effectiveness when combined with Binary Quantization. We also measured the impact on search performance. - **Search Limits**: We specify the number of results from the search process. We experimented with various search limits to measure their impact on accuracy and efficiency. We explored the trade-offs between search depth and performance. The results provide insight for applications with different precision and speed requirements. Through this detailed setup, our experiment sought to shed light on the nuanced interplay between Binary Quantization and the high-quality embeddings produced by OpenAI’s models. By meticulously adjusting and observing the outcomes under different conditions, we aimed to uncover actionable insights that could empower users to harness the full potential of Qdrant in combination with OpenAI’s embeddings, regardless of their specific application needs. ### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#results-binary-quantizations-impact-on-openai-embeddings) Results: binary quantization’s impact on OpenAI embeddings To analyze the impact of rescoring ( `True` or `False`), we compared results across different model configurations and search limits. Rescoring sets up a more precise search, based on results from an initial query. #### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#rescoring) Rescoring ![Graph that measures the impact of rescoring](https://qdrant.tech/blog/openai/Rescoring_Impact.png) Here are some key observations, which analyze the impact of rescoring ( `True` or `False`): 1. **Significantly Improved Accuracy**: - Across all models and dimension configurations, enabling rescoring ( `True`) consistently results in higher accuracy scores compared to when rescoring is disabled ( `False`). - The improvement in accuracy holds across various search limits (10, 20, 50, 100). 2. **Model and Dimension Specific Observations**: - For the `text-embedding-3-large` model with 3072 dimensions, rescoring boosts the accuracy from an average of about 76-77% without rescoring to 97-99% with rescoring, depending on the search limit and oversampling rate. - The accuracy improvement with increased oversampling is more pronounced when rescoring is enabled, indicating a better utilization of the additional binary codes in refining search results. - With the `text-embedding-3-small` model at 512 dimensions, accuracy increases from around 53-55% without rescoring to 71-91% with rescoring, highlighting the significant impact of rescoring, especially at lower dimensions. In contrast, for lower dimension models (such as text-embedding-3-small with 512 dimensions), the incremental accuracy gains from increased oversampling levels are less significant, even with rescoring enabled. This suggests a diminishing return on accuracy improvement with higher oversampling in lower dimension spaces. 3. **Influence of Search Limit**: - The performance gain from rescoring seems to be relatively stable across different search limits, suggesting that rescoring consistently enhances accuracy regardless of the number of top results considered. The short sketch below shows how these rescoring and oversampling settings are expressed in a Qdrant query.
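This is a minimal sketch with the Qdrant Python client; the collection name and query vector are placeholders, and the parameter values simply mirror the settings explored in this experiment:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # illustrative target

query_vector = [0.0] * 3072  # placeholder for a text-embedding-3-large embedding

# Assumes the collection was created with binary quantization enabled.
results = client.query_points(
    collection_name="openai-embeddings",  # placeholder collection name
    query=query_vector,
    limit=100,  # the "search limit" from the experiment
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(
            rescore=True,      # re-score candidates with the original float32 vectors
            oversampling=3.0,  # fetch 3x more candidates from the binary index first
        )
    ),
)
```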
In summary, enabling rescoring dramatically improves search accuracy across all tested configurations. It is crucial feature for applications where precision is paramount. The consistent performance boost provided by rescoring underscores its value in refining search results, particularly when working with complex, high-dimensional data like OpenAI embeddings. This enhancement is critical for applications that demand high accuracy, such as semantic search, content discovery, and recommendation systems, where the quality of search results directly impacts user experience and satisfaction. ### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#dataset-combinations) Dataset combinations For those exploring the integration of text embedding models with Qdrant, it’s crucial to consider various model configurations for optimal performance. The dataset combinations defined above illustrate different configurations to test against Qdrant. These combinations vary by two primary attributes: 1. **Model Name**: Signifying the specific text embedding model variant, such as “text-embedding-3-large” or “text-embedding-3-small”. This distinction correlates with the model’s capacity, with “large” models offering more detailed embeddings at the cost of increased computational resources. 2. **Dimensions**: This refers to the size of the vector embeddings produced by the model. Options range from 512 to 3072 dimensions. Higher dimensions could lead to more precise embeddings but might also increase the search time and memory usage in Qdrant. Optimizing these parameters is a balancing act between search accuracy and resource efficiency. Testing across these combinations allows users to identify the configuration that best meets their specific needs, considering the trade-offs between computational resources and the quality of search results. ```python dataset_combinations = [\ {\ "model_name": "text-embedding-3-large",\ "dimensions": 3072,\ },\ {\ "model_name": "text-embedding-3-large",\ "dimensions": 1024,\ },\ {\ "model_name": "text-embedding-3-large",\ "dimensions": 1536,\ },\ {\ "model_name": "text-embedding-3-small",\ "dimensions": 512,\ },\ {\ "model_name": "text-embedding-3-small",\ "dimensions": 1024,\ },\ {\ "model_name": "text-embedding-3-small",\ "dimensions": 1536,\ },\ ] ``` #### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#exploring-dataset-combinations-and-their-impacts-on-model-performance) Exploring dataset combinations and their impacts on model performance The code snippet iterates through predefined dataset and model combinations. For each combination, characterized by the model name and its dimensions, the corresponding experiment’s results are loaded. These results, which are stored in JSON format, include performance metrics like accuracy under different configurations: with and without oversampling, and with and without a rescore step. Following the extraction of these metrics, the code computes the average accuracy across different settings, excluding extreme cases of very low limits (specifically, limits of 1 and 5). This computation groups the results by oversampling, rescore presence, and limit, before calculating the mean accuracy for each subgroup. After gathering and processing this data, the average accuracies are organized into a pivot table. This table is indexed by the limit (the number of top results considered), and columns are formed based on combinations of oversampling and rescoring. 
```python
import pandas as pd

for combination in dataset_combinations:
    model_name = combination["model_name"]
    dimensions = combination["dimensions"]
    print(f"Model: {model_name}, dimensions: {dimensions}")
    results = pd.read_json(f"../results/results-{model_name}-{dimensions}.json", lines=True)
    average_accuracy = results[results["limit"] != 1]
    average_accuracy = average_accuracy[average_accuracy["limit"] != 5]
    average_accuracy = average_accuracy.groupby(["oversampling", "rescore", "limit"])[
        "accuracy"
    ].mean()
    average_accuracy = average_accuracy.reset_index()
    acc = average_accuracy.pivot(
        index="limit", columns=["oversampling", "rescore"], values="accuracy"
    )
    print(acc)
```

Here is a selected slice of these results, with `rescore=True`:

| Method | Dimensionality | Test Dataset | Recall | Oversampling |
| --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-large (highest MTEB score from the table) | 3072 | [DBpedia 1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M) | 0.9966 | 3x |
| OpenAI text-embedding-3-small | 1536 | [DBpedia 100K](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K) | 0.9847 | 3x |
| OpenAI text-embedding-3-large | 1536 | [DBpedia 1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M) | 0.9826 | 3x |

#### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#impact-of-oversampling) Impact of oversampling In this context, oversampling does not refer to the class-balancing technique from classical machine learning. It controls how many candidates are retrieved with the compact binary index before rescoring: with an oversampling factor of N, Qdrant fetches N times more candidates than the requested limit and then re-evaluates them against the original vectors. This compensates for the information lost during quantization. The graph below shows how accuracy changes with different oversampling factors: higher factors recover more of the original accuracy at the cost of scoring more candidates per query, and the gains are largest when rescoring is enabled and the embedding dimension is high. ![Measuring the impact of oversampling](https://qdrant.tech/blog/openai/Oversampling_Impact.png) ### [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#leveraging-binary-quantization-best-practices) Leveraging binary quantization: best practices We recommend the following best practices for leveraging Binary Quantization to enhance OpenAI embeddings: 1. Embedding Model: Use text-embedding-3-large. It has the highest MTEB score and is the most accurate among those tested. 2. Dimensions: Use the highest dimension available for the model to maximize accuracy. This holds for English and other languages. 3. Oversampling: Use an oversampling factor of 3 for the best balance between accuracy and efficiency. This factor is suitable for a wide range of applications. 4. Rescoring: Enable rescoring to improve the accuracy of search results. 5. RAM: Store the full vectors and payload on disk.
Limit what you load from memory to the binary quantization index. This helps reduce the memory footprint and improve the overall efficiency of the system. The incremental latency from the disk read is negligible compared to the latency savings from the binary scoring in Qdrant, which uses SIMD instructions where possible. ## [Anchor](https://qdrant.tech/articles/binary-quantization-openai/\#whats-next) What’s next? Binary quantization is exceptional if you need to work with large volumes of data under high recall expectations. You can try this feature either by spinning up a [Qdrant container image](https://hub.docker.com/r/qdrant/qdrant) locally or, having us create one for you through a [free account](https://cloud.qdrant.io/login) in our cloud hosted service. The article gives examples of data sets and configuration you can use to get going. Our documentation covers [adding large datasets to Qdrant](https://qdrant.tech/documentation/tutorials/bulk-upload/) to your Qdrant instance as well as [more quantization methods](https://qdrant.tech/documentation/guides/quantization/). Want to discuss these findings and learn more about Binary Quantization? [Join our Discord community.](https://discord.gg/qdrant) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/binary-quantization-openai.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/binary-quantization-openai.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-154-lllmstxt|> ## create-cluster - [Documentation](https://qdrant.tech/documentation/) - [Cloud](https://qdrant.tech/documentation/cloud/) - Create a Cluster --- # [Anchor](https://qdrant.tech/documentation/cloud/create-cluster/\#creating-a-qdrant-cloud-cluster) Creating a Qdrant Cloud Cluster Qdrant Cloud offers two types of clusters: **Free** and **Standard**. ## [Anchor](https://qdrant.tech/documentation/cloud/create-cluster/\#free-clusters) Free Clusters Free tier clusters are perfect for prototyping and testing. You don’t need a credit card to join. A free tier cluster only includes 1 single node with the following resources: | Resource | Value | | --- | --- | | RAM | 1 GB | | vCPU | 0.5 | | Disk space | 4 GB | | Nodes | 1 | This configuration supports serving about 1 M vectors of 768 dimensions. To calculate your needs, refer to our documentation on [Capacity Planning](https://qdrant.tech/documentation/guides/capacity-planning/). The choice of cloud providers and regions is limited. It includes: - Standard Support - Basic monitoring - Basic log access - Basic alerting - Version upgrades with downtime - Only manual snapshots and restores via API - No dedicated resources If unused, free tier clusters are automatically suspended after 1 week, and deleted after 4 weeks of inactivity if not reactivated. You can always upgrade to a standard cluster with more resources and features. 
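For a rough sense of how such sizing estimates work, here is a small sketch of the usual back-of-the-envelope arithmetic: raw float32 vector size, plus roughly 1.5x overhead when everything is served from RAM. These numbers are illustrative assumptions; actual needs depend on payload size, quantization, and on-disk settings, so follow the Capacity Planning guide for real sizing:

```python
def vector_storage_gib(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw size of the stored vectors (float32 by default), in GiB."""
    return num_vectors * dim * bytes_per_value / 1024**3

def ram_estimate_gib(num_vectors: int, dim: int) -> float:
    """Rule-of-thumb RAM estimate (~1.5x raw size) when vectors are served fully from memory."""
    return vector_storage_gib(num_vectors, dim) * 1.5

# The free-tier example from above: ~1M vectors of 768 dimensions.
print(f"raw vectors: ~{vector_storage_gib(1_000_000, 768):.1f} GiB")    # ~2.9 GiB, fits on the 4 GB disk
print(f"in-RAM estimate: ~{ram_estimate_gib(1_000_000, 768):.1f} GiB")  # ~4.3 GiB, hence on-disk storage on a 1 GB RAM node
```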
## [Anchor](https://qdrant.tech/documentation/cloud/create-cluster/\#standard-clusters) Standard Clusters On top of the Free cluster features, Standard clusters offer: - Response time and uptime SLAs - Dedicated resources - Backup and disaster recovery - Multi-node clusters for high availability - Horizontal and vertical scaling - Monitoring and log management - Zero-downtime upgrades for multi-node clusters with replication You have a broad choice of regions on AWS, Azure and Google Cloud. For payment information see [**Pricing and Payments**](https://qdrant.tech/documentation/cloud/pricing-payments/). ## [Anchor](https://qdrant.tech/documentation/cloud/create-cluster/\#create-a-cluster) Create a Cluster ![Create Cluster Page](https://qdrant.tech/documentation/cloud/create-cluster.png) This page shows you how to use the Qdrant Cloud Console to create a custom Qdrant Cloud cluster. > **Prerequisite:** Please make sure you have provided billing information before creating a custom cluster. 01. Start in the **Clusters** section of the [Cloud Dashboard](https://cloud.qdrant.io/). 02. Select **Clusters** and then click **\+ Create**. 03. In the **Create a cluster** screen select **Free** or **Standard** Most of the remaining configuration options are only available for standard clusters. 04. Select a provider. Currently, you can deploy to: - Amazon Web Services (AWS) - Google Cloud Platform (GCP) - Microsoft Azure - Your own [Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/) Infrastructure 05. Choose your data center region or Hybrid Cloud environment. 06. Configure RAM for each node. > For more information, see our [Capacity Planning](https://qdrant.tech/documentation/guides/capacity-planning/) guidance. 07. Choose the number of vCPUs per node. If you add more RAM, the menu provides different options for vCPUs. 08. Select the number of nodes you want the cluster to be deployed on. > Each node is automatically attached with a disk, that has enough space to store data with Qdrant’s default collection configuration. 09. Select additional disk space for your deployment. > Depending on your collection configuration, you may need more disk space per RAM. For example, if you configure `on_disk: true` and only use RAM for caching. 10. Review your cluster configuration and pricing. 11. When you’re ready, select **Create**. It takes some time to provision your cluster. Once provisioned, you can access your cluster on ports 443 and 6333 (REST) and 6334 (gRPC). ![Cluster configured in the UI](https://qdrant.tech/documentation/cloud/cluster-detail.png) You should now see the new cluster in the **Clusters** menu. ## [Anchor](https://qdrant.tech/documentation/cloud/create-cluster/\#deleting-a-cluster) Deleting a Cluster You can delete a Qdrant database cluster from the cluster’s detail page. ![Delete Cluster](https://qdrant.tech/documentation/cloud/delete-cluster.png) ## [Anchor](https://qdrant.tech/documentation/cloud/create-cluster/\#next-steps) Next Steps You will need to connect to your new Qdrant Cloud cluster. Follow [**Authentication**](https://qdrant.tech/documentation/cloud/authentication/) to create one or more API keys. You can also scale your cluster both horizontally and vertically. Read more in [**Cluster Scaling**](https://qdrant.tech/documentation/cloud/cluster-scaling/). If a new Qdrant version becomes available, you can upgrade your cluster. See [**Cluster Upgrades**](https://qdrant.tech/documentation/cloud/cluster-upgrades/). 
For more information on creating and restoring backups of a cluster, see [**Backups**](https://qdrant.tech/documentation/cloud/backups/). ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/create-cluster.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/create-cluster.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-155-lllmstxt|> ## running-with-gpu - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Running with GPU --- # [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#running-qdrant-with-gpu-support) Running Qdrant with GPU Support Starting from version v1.13.0, Qdrant offers support for GPU acceleration. However, GPU support is not included in the default Qdrant binary due to additional dependencies and libraries. Instead, you will need to use dedicated Docker images with GPU support ( [NVIDIA](https://qdrant.tech/documentation/guides/running-with-gpu/#nvidia-gpus), [AMD](https://qdrant.tech/documentation/guides/running-with-gpu/#amd-gpus)). ## [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#configuration) Configuration Qdrant includes a number of configuration options to control GPU usage. The following options are available: ```yaml gpu: # Enable GPU indexing. indexing: false # Force half precision for `f32` values while indexing. # `f16` conversion will take place # only inside GPU memory and won't affect storage type. force_half_precision: false # Used vulkan "groups" of GPU. # In other words, how many parallel points can be indexed by GPU. # Optimal value might depend on the GPU model. # Proportional, but doesn't necessary equal # to the physical number of warps. # Do not change this value unless you know what you are doing. # Default: 512 groups_count: 512 # Filter for GPU devices by hardware name. Case insensitive. # Comma-separated list of substrings to match # against the gpu device name. # Example: "nvidia" # Default: "" - all devices are accepted. device_filter: "" # List of explicit GPU devices to use. # If host has multiple GPUs, this option allows to select specific devices # by their index in the list of found devices. # If `device_filter` is set, indexes are applied after filtering. # By default, all devices are accepted. devices: null # How many parallel indexing processes are allowed to run. # Default: 1 parallel_indexes: 1 # Allow to use integrated GPUs. # Default: false allow_integrated: false # Allow to use emulated GPUs like LLVMpipe. Useful for CI. # Default: false allow_emulated: false ``` It is not recommended to change these options unless you are familiar with the Qdrant internals and the Vulkan API. ## [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#standalone-gpu-support) Standalone GPU Support For standalone usage, you can build Qdrant with GPU support by running the following command: ```bash cargo build --release --features gpu ``` Ensure your device supports Vulkan API v1.3. 
This includes compatibility with Apple Silicon, Intel GPUs, and CPU emulators. Note that `gpu.indexing: true` must be set in your configuration to use GPUs at runtime. ## [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#nvidia-gpus) NVIDIA GPUs ### [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#prerequisites) Prerequisites To use Docker with NVIDIA GPU support, ensure the following are installed on your host: - Latest NVIDIA drivers - [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) Most AI or CUDA images on Amazon/GCP come pre-configured with the NVIDIA container toolkit. ### [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#docker-images-with-nvidia-gpu-support) Docker images with NVIDIA GPU support Docker images with NVIDIA GPU support use the tag suffix `gpu-nvidia`, e.g., `qdrant/qdrant:v1.13.0-gpu-nvidia`. These images include all necessary dependencies. To enable GPU support, use the `--gpus=all` flag with Docker. Example:

```bash
# The `--gpus=all` flag tells Docker that we want to use GPUs.
# The `-e QDRANT__GPU__INDEXING=1` flag tells Qdrant that we want to use GPUs for indexing.
docker run \
    --rm \
    --gpus=all \
    -p 6333:6333 \
    -p 6334:6334 \
    -e QDRANT__GPU__INDEXING=1 \
    qdrant/qdrant:gpu-nvidia-latest
```

To ensure that the GPU was initialized correctly, you can check the logs. Qdrant first prints all found GPU devices without filtering and then prints a list of all created devices:

```text
2025-01-13T11:58:29.124087Z INFO gpu::instance: Found GPU device: NVIDIA GeForce RTX 3090
2025-01-13T11:58:29.124118Z INFO gpu::instance: Found GPU device: llvmpipe (LLVM 15.0.7, 256 bits)
2025-01-13T11:58:29.124138Z INFO gpu::device: Create GPU device NVIDIA GeForce RTX 3090
```

Here you can see that two devices were found: the RTX 3090 and llvmpipe (a CPU-emulated GPU which is included in the Docker image). Later, you will see that only the RTX was initialized. This concludes the setup. Now, you can start using this Qdrant instance. ### [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#troubleshooting-nvidia-gpus) Troubleshooting NVIDIA GPUs If your GPU is not detected in Docker, make sure your driver and `nvidia-container-toolkit` are up-to-date. If needed, you can install the latest version of `nvidia-container-toolkit` from its GitHub Releases [page](https://github.com/NVIDIA/nvidia-container-toolkit/releases). Verify Vulkan API visibility in the Docker container using:

```bash
docker run --rm --gpus=all qdrant/qdrant:gpu-nvidia-latest vulkaninfo --summary
```

The system may show you an error message explaining why the NVIDIA device is not visible. Note that if your NVIDIA GPU is not visible in Docker, it is usually because the Docker container cannot use `libGLX_nvidia.so.0` from your host. Here is what an error message could look like:

```text
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get `vkCreateInstance` via `vk_icdGetInstanceProcAddr` for ICD libGLX_nvidia.so.0
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 0. Skipping ICD.
``` To resolve errors, update your NVIDIA container runtime configuration: ```bash sudo nano /etc/nvidia-container-runtime/config.toml ``` Set `no-cgroups=false`, save the configuration, and restart Docker: ```bash sudo systemctl restart docker ``` ## [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#amd-gpus) AMD GPUs ### [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#prerequisites-1) Prerequisites Running Qdrant with AMD GPUs requires [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/detailed-install.html) to be installed on your host. ### [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#docker-images-with-amd-gpu-support) Docker images with AMD GPU support Docker images for AMD GPUs use the tag suffix `gpu-amd`, e.g., `qdrant/qdrant:v1.13.0-gpu-amd`. These images include all required dependencies. To enable GPU for Docker, you need additional `--device /dev/kfd --device /dev/dri` flags. To enable GPU for Qdrant you need to set the enable flag. Here is an example: ```bash --- # `--device /dev/kfd --device /dev/dri` flags say to Docker that we want to use GPUs. --- # `-e QDRANT__GPU__INDEXING=1` flag says to Qdrant that we want to use GPUs for indexing. docker run \ --rm \ --device /dev/kfd --device /dev/dri \ -p 6333:6333 \ -p 6334:6334 \ -e QDRANT__LOG_LEVEL=debug \ -e QDRANT__GPU__INDEXING=1 \ qdrant/qdrant:gpu-amd-latest ``` Check logs to confirm GPU initialization. Example log output: ```text 2025-01-10T11:56:55.926466Z INFO gpu::instance: Found GPU device: AMD Radeon Graphics (RADV GFX1103_R1) 2025-01-10T11:56:55.926485Z INFO gpu::instance: Found GPU device: llvmpipe (LLVM 17.0.6, 256 bits) 2025-01-10T11:56:55.926504Z INFO gpu::device: Create GPU device AMD Radeon Graphics (RADV GFX1103_R1) ``` This concludes the setup. In a basic scenario, you won’t need to configure anything else. ## [Anchor](https://qdrant.tech/documentation/guides/running-with-gpu/\#known-limitations) Known limitations - **Platform Support:** Docker images are only available for Linux x86\_64. Windows, macOS, ARM, and other platforms are not supported. - **Memory Limits:** Each GPU can process up to 16GB of vector data per indexing iteration. Due to this limitation, you should not create segments where either original vectors OR quantized vectors are larger than 16GB. For example, a collection with 1536d vectors and scalar quantization can have at most: ```text 16Gb / 1536 ~= 11 million vectors per segment ``` And without quantization: ```text 16Gb / 1536 * 4 ~= 2.7 million vectors per segment ``` The maximum size of each segment can be configured in the collection settings. Use the following operation to [change](https://qdrant.tech/documentation/concepts/collections/#update-collection-parameters) on your existing collection: ```http PATCH collections/{collection_name} { "optimizers_config": { "max_segment_size": 1000000 } } ``` Note that `max_segment_size` is specified in KiloBytes. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/running-with-GPU.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/running-with-GPU.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-156-lllmstxt|> ## optimizer - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Optimizer --- # [Anchor](https://qdrant.tech/documentation/concepts/optimizer/\#optimizer) Optimizer It is much more efficient to apply changes in batches than to perform each change individually, as many other databases do. Qdrant here is no exception. Since Qdrant operates with data structures that are not always easy to change, it is sometimes necessary to rebuild those structures completely. Storage optimization in Qdrant occurs at the segment level (see [storage](https://qdrant.tech/documentation/concepts/storage/)). In this case, the segment to be optimized remains readable for the time of the rebuild. ![Segment optimization](https://qdrant.tech/docs/optimization.svg) The availability is achieved by wrapping the segment into a proxy that transparently handles data changes. Changed data is placed in the copy-on-write segment, which has priority for retrieval and subsequent updates. ## [Anchor](https://qdrant.tech/documentation/concepts/optimizer/\#vacuum-optimizer) Vacuum Optimizer The simplest example of a case where you need to rebuild a segment repository is to remove points. Like many other databases, Qdrant does not delete entries immediately after a query. Instead, it marks records as deleted and ignores them for future queries. This strategy allows us to minimize disk access - one of the slowest operations. However, a side effect of this strategy is that, over time, deleted records accumulate, occupy memory and slow down the system. To avoid these adverse effects, the Vacuum Optimizer is used. It runs if a segment has accumulated too many deleted records. The criteria for starting the optimizer are defined in the configuration file. Here is an example of parameter values:

```yaml
storage:
  optimizers:
    # The minimal fraction of deleted vectors in a segment, required to perform segment optimization
    deleted_threshold: 0.2
    # The minimal number of vectors in a segment, required to perform segment optimization
    vacuum_min_vector_number: 1000
```

## [Anchor](https://qdrant.tech/documentation/concepts/optimizer/\#merge-optimizer) Merge Optimizer The service may require the creation of temporary segments. Such segments, for example, are created as copy-on-write segments during optimization itself. It is also essential to have at least one small segment that Qdrant will use to store frequently updated data. On the other hand, too many small segments lead to suboptimal search performance. The merge optimizer constantly tries to reduce the number of segments if there currently are too many. The desired number of segments is specified with `default_segment_number` and defaults to the number of CPUs. The optimizer takes at least the three smallest segments and merges them into one. Segments will not be merged if the result would exceed the maximum segment size configured with `max_segment_size_kb`. This prevents creating segments that are too large to index efficiently. Increasing this number may help to reduce the number of segments if you have a lot of data, and can potentially improve search performance. The criteria for starting the optimizer are defined in the configuration file.
Here is an example of parameter values:

```yaml
storage:
  optimizers:
    # Target amount of segments the optimizer will try to keep.
    # The real amount of segments may vary depending on multiple parameters:
    #  - Amount of stored points
    #  - Current write RPS
    #
    # It is recommended to select the default number of segments as a factor of the number of search threads,
    # so that each segment would be handled evenly by one of the threads.
    # If `default_segment_number = 0`, it will be selected automatically based on the number of available CPUs.
    default_segment_number: 0
    # Do not create segments larger than this size (in kilobytes).
    # Large segments might require disproportionately long indexation times,
    # therefore it makes sense to limit the size of segments.
    #
    # If indexing speed has higher priority for you - make this parameter lower.
    # If search speed is more important - make this parameter higher.
    # Note: 1Kb = 1 vector of size 256
    # If not set, it will be selected automatically considering the number of available CPUs.
    max_segment_size_kb: null
```

## [Anchor](https://qdrant.tech/documentation/concepts/optimizer/\#indexing-optimizer) Indexing Optimizer Qdrant allows you to choose the type of indexes and data storage methods used depending on the number of records. So, for example, if the number of points is less than 10000, using any index would be less efficient than a brute force scan. The Indexing Optimizer enables indexes and memmap storage once the minimal number of records is reached. The criteria for starting the optimizer are defined in the configuration file. Here is an example of parameter values:

```yaml
storage:
  optimizers:
    # Maximum size (in kilobytes) of vectors to store in-memory per segment.
    # Segments larger than this threshold will be stored as a read-only memmapped file.
    # Memmap storage is disabled by default; to enable it, set this threshold to a reasonable value.
    # To disable memmap storage, set this to `0`.
    # Note: 1Kb = 1 vector of size 256
    memmap_threshold: 200000
    # Maximum size (in kilobytes) of vectors allowed for plain index, exceeding this threshold will enable vector indexing.
    # Default value is 20,000.
    # To disable vector indexing, set to `0`.
    # Note: 1kB = 1 vector of size 256.
    indexing_threshold_kb: 20000
```

In addition to the configuration file, you can also set optimizer parameters separately for each [collection](https://qdrant.tech/documentation/concepts/collections/). Dynamic parameter updates may be useful, for example, for more efficient initial loading of points. You can disable indexing during the upload process with these settings and enable it immediately after it is finished. As a result, you will not waste extra computation resources on rebuilding the index. A short sketch of this pattern with the Python client follows below.
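The following is a minimal sketch of that bulk-upload pattern with the Python client; the collection name is a placeholder, and the exact keyword argument (`optimizer_config` here) may vary slightly between qdrant-client versions:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # illustrative target

# Before a bulk upload: disable indexing so segments are not re-indexed while points stream in.
client.update_collection(
    collection_name="my-collection",  # placeholder
    optimizer_config=models.OptimizersConfigDiff(indexing_threshold=0),
)

# ... upload points here ...

# After the upload: restore the threshold so the optimizer builds the index once.
client.update_collection(
    collection_name="my-collection",
    optimizer_config=models.OptimizersConfigDiff(indexing_threshold=20000),
)
```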
<|page-157-lllmstxt|> ## cluster-upgrades - [Documentation](https://qdrant.tech/documentation/) - [Cloud](https://qdrant.tech/documentation/cloud/) - Update Clusters --- # [Anchor](https://qdrant.tech/documentation/cloud/cluster-upgrades/\#updating-qdrant-cloud-clusters) Updating Qdrant Cloud Clusters As soon as a new Qdrant version is available, Qdrant Cloud will show you an update notification in the Cluster list and on the Cluster details page. To update to a new version, go to the Cluster details page, choose the new version from the version dropdown and click **Update**. ![Cluster Updates](https://qdrant.tech/documentation/cloud/cluster-upgrades.png) If you have a multi-node cluster and your collections have a replication factor of at least **2**, the update process will be zero-downtime and done in a rolling fashion. You will be able to use your database cluster normally. If you have a single-node cluster or a collection with a replication factor of **1**, the update process will require a short downtime period to restart your cluster with the new version. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/cluster-upgrades.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/cluster-upgrades.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-158-lllmstxt|> ## data-exploration - [Articles](https://qdrant.tech/articles/) - Data Exploration #### Data Exploration Learn how you can leverage vector similarity beyond just search. Reveal hidden patterns and insights in your data, provide recommendations, and navigate data space. [![Preview](https://qdrant.tech/articles_data/distance-based-exploration/preview/preview.jpg)\\ **Distance-based data exploration** \\ Explore your data under a new angle with Qdrant's tools for dimensionality reduction, clusterization, and visualization.\\ \\ Andrey Vasnetsov\\ \\ March 11, 2025](https://qdrant.tech/articles/distance-based-exploration/)[![Preview](https://qdrant.tech/articles_data/discovery-search/preview/preview.jpg)\\ **Discovery needs context** \\ Discovery Search, an innovative way to constrain the vector space in which a search is performed, relying only on vectors.\\ \\ Luis Cossío\\ \\ January 31, 2024](https://qdrant.tech/articles/discovery-search/)[![Preview](https://qdrant.tech/articles_data/vector-similarity-beyond-search/preview/preview.jpg)\\ **Vector Similarity: Going Beyond Full-Text Search \| Qdrant** \\ Discover how vector similarity expands data exploration beyond full-text search.
Explore diversity sampling and more for enhanced data discovery!\\ \\ Luis Cossío\\ \\ August 08, 2023](https://qdrant.tech/articles/vector-similarity-beyond-search/)[![Preview](https://qdrant.tech/articles_data/dataset-quality/preview/preview.jpg)\\ **Finding errors in datasets with Similarity Search** \\ Improving quality of text-and-images datasets on the online furniture marketplace example.\\ \\ George Panchuk\\ \\ July 18, 2022](https://qdrant.tech/articles/dataset-quality/) × [Powered by](https://qdrant.tech/) <|page-159-lllmstxt|> ## pdf-retrieval-at-scale - [Documentation](https://qdrant.tech/documentation/) - [Advanced tutorials](https://qdrant.tech/documentation/advanced-tutorials/) - Scaling PDF Retrieval with Qdrant --- # [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#scaling-pdf-retrieval-with-qdrant) Scaling PDF Retrieval with Qdrant ![scaling-pdf-retrieval-qdrant](https://qdrant.tech/documentation/tutorials/pdf-retrieval-at-scale/image1.png) | Time: 30 min | Level: Intermediate | Output: [GitHub](https://github.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb) | | --- | --- | --- | --- | Efficient PDF document retrieval is a common requirement in tasks like **(agentic) retrieval-augmented generation (RAG)** and many other search-based applications. At the same time, setting up PDF document retrieval rarely comes without additional challenges. Many traditional PDF retrieval solutions rely on **optical character recognition (OCR)** together with use case-specific heuristics to handle visually complex elements like tables, images and charts. With their task-customized parsing and chunking strategies, these algorithms are often non-transferable – even within the same domain – as well as labor-intensive, prone to errors, and difficult to scale. Recent advancements in **Vision Large Language Models (VLLMs)**, such as [**ColPali**](https://huggingface.co/blog/manu/colpali) and its successor [**ColQwen**](https://huggingface.co/vidore/colqwen2-v0.1), have started to transform PDF retrieval. These multimodal models work directly with PDF pages as inputs, no pre-processing required. Anything that can be converted into an **image** (think of PDFs as screenshots of document pages) can be effectively processed by these models. Being far simpler to use, VLLMs achieve state-of-the-art performance in PDF retrieval benchmarks like the [Visual Document Retrieval (ViDoRe) Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard). ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#how-vllms-work-for-pdf-retrieval) How VLLMs Work for PDF Retrieval VLLMs like **ColPali** and **ColQwen** generate **multivector representations** for each PDF page; the representations are stored and indexed in a vector database. During the retrieval process, the models dynamically create multivector representations for (textual) user queries, and precise retrieval – matching between PDF pages and queries – is achieved through a [late-interaction mechanism](https://qdrant.tech/blog/qdrant-colpali/#how-colpali-works-under-the-hood).
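To build intuition for that late-interaction step, here is a schematic NumPy sketch of MaxSim-style scoring between a query's token vectors and a page's patch vectors; the shapes are illustrative, and real models typically operate on normalized embeddings:

```python
import numpy as np

def max_sim(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query token vector, take its best
    match among the page's patch vectors, then sum those maxima."""
    sims = query_vecs @ page_vecs.T       # (num_query_tokens, num_page_patches)
    return float(sims.max(axis=1).sum())  # best patch per query token, summed

# Illustrative shapes: ~20 query-token vectors vs ~700 page-patch vectors, 128 dims each.
rng = np.random.default_rng(0)
query_vecs = rng.normal(size=(22, 128)).astype(np.float32)
page_vecs = rng.normal(size=(700, 128)).astype(np.float32)
print(max_sim(query_vecs, page_vecs))
```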
## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#challenges-of-scaling-vllms) Challenges of Scaling VLLMs The heavy multivector representations produced by VLLMs make PDF retrieval at scale computationally intensive. These models are inefficient for large-scale PDF retrieval tasks if used without optimization. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#math-behind-the-scaling) Math Behind the Scaling **ColPali** generates over **1,000 vectors per PDF page**, while its successor, **ColQwen**, generates slightly fewer — up to **768 vectors**, dynamically adjusted based on the image size. Typically, ColQwen produces **~700 vectors per page**. To understand the impact, consider the construction of an [**HNSW index**](https://qdrant.tech/articles/what-is-a-vector-database/#1-indexing-hnsw-index-and-sending-data-to-qdrant), a common indexing algorithm for vector databases. Let’s roughly estimate the number of comparisons needed to insert a new PDF page into the index. - **Vectors per page:** ~700 (ColQwen) or ~1,000 (ColPali) - **[ef\_construct](https://qdrant.tech/documentation/concepts/indexing/#vector-index):** 100 (default) A lower-bound estimate for the number of vector comparisons would be: 700 × 700 × 100 = 49 million. Now imagine how long it will take to build an index of **20,000 pages**! For ColPali, this number doubles. The result is **extremely slow index construction time**. ### [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#our-solution) Our Solution We recommend reducing the number of vectors in a PDF page representation for the **first-stage retrieval**. After the first-stage retrieval with a reduced number of vectors, we propose to **rerank** the retrieved subset with the original, uncompressed representation. The reduction of vectors can be achieved by applying a **mean pooling operation** to the multivector VLLM-generated outputs. Mean pooling averages the values across all vectors within a selected subgroup, condensing multiple vectors into a single representative vector. If done right, it preserves the important information of the original page while significantly reducing the number of vectors. VLLMs generate vectors corresponding to patches that represent different portions of a PDF page. These patches can be grouped into the rows and columns of a PDF page. For example: - ColPali divides a PDF page into **1,024 patches**. - Applying mean pooling by rows (or columns) of this patch matrix reduces the page representation to just **32 vectors**. ![ColPali patching of a PDF page](https://qdrant.tech/documentation/tutorials/pdf-retrieval-at-scale/pooling-by-rows.png) We tested this approach with the ColPali model, mean pooling its multivectors by PDF page rows. The results showed: - **Indexing time faster by an order of magnitude** - **Retrieval quality comparable to the original model** For details of this experiment, refer to our [GitHub repository](https://github.com/qdrant/demo-colpali-optimized), the [ColPali optimization blog post](https://qdrant.tech/blog/colpali-qdrant-optimization/) or the [webinar “PDF Retrieval at Scale”](https://www.youtube.com/watch?v=_h6SN1WwnLs). ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#goal-of-this-tutorial) Goal of This Tutorial In this tutorial, we will demonstrate a scalable approach to PDF retrieval using **Qdrant** and the **ColPali** & **ColQwen2** VLLMs.
The presented approach is **highly recommended** to avoid the common pitfalls of long indexing times and slow retrieval speeds. In the following sections, we will demonstrate an optimized retrieval algorithm born out of our successful experimentation: **First-Stage Retrieval with Mean-Pooled Vectors:** - Construct an HNSW index using **only mean-pooled vectors**. - Use them for the first-stage retrieval. **Reranking with Original Model Multivectors:** - Use the original multivectors from ColPali or ColQwen2 **to rerank** the results retrieved in the first stage. ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#setup) Setup Install & import required libraries ```python --- # pip install colpali_engine>=0.3.1 from colpali_engine.models import ColPali, ColPaliProcessor --- # pip install qdrant-client>=1.12.0 from qdrant_client import QdrantClient, models ``` To run these experiments, we’re using a **Qdrant cluster**. If you’re just getting started, you can set up a **free-tier cluster** for testing and exploration. Follow the instructions in the documentation [“How to Create a Free-Tier Qdrant Cluster”](https://qdrant.tech/documentation/cloud/create-cluster/#free-clusters) ```python client = QdrantClient( url=, api_key= ) ``` Download **ColPali** model along with its input processors. Make sure to select the backend that suits your setup. ```python colpali_model = ColPali.from_pretrained( "vidore/colpali-v1.3", torch_dtype=torch.bfloat16, device_map="mps", # Use "cuda:0" for GPU, "cpu" for CPU, or "mps" for Apple Silicon ).eval() colpali_processor = ColPaliProcessor.from_pretrained("vidore/colpali-v1.3") ``` For **ColQwen** model ```python from colpali_engine.models import ColQwen2, ColQwen2Processor colqwen_model = ColQwen2.from_pretrained( "vidore/colqwen2-v0.1", torch_dtype=torch.bfloat16, device_map="mps", # Use "cuda:0" for GPU, "cpu" for CPU, or "mps" for Apple Silicon ).eval() colqwen_processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v0.1") ``` ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#create-qdrant-collections) Create Qdrant Collections We can now create a collection in Qdrant to store the multivector representations of PDF pages generated by **ColPali** or **ColQwen**. Collection will include **mean pooled** by rows and columns representations of a PDF page, as well as the **original** multivector representation. ```python client.create_collection( collection_name=collection_name, vectors_config={ "original": models.VectorParams( #switch off HNSW size=128, distance=models.Distance.COSINE, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM ), hnsw_config=models.HnswConfigDiff( m=0 #switching off HNSW ) ), "mean_pooling_columns": models.VectorParams( size=128, distance=models.Distance.COSINE, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM ) ), "mean_pooling_rows": models.VectorParams( size=128, distance=models.Distance.COSINE, multivector_config=models.MultiVectorConfig( comparator=models.MultiVectorComparator.MAX_SIM ) ) } ) ``` ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#choose-a-dataset) Choose a dataset We’ll use the **UFO Dataset** by Daniel van Strien for this tutorial. It’s available on Hugging Face; you can download it directly from there. 
```python from datasets import load_dataset ufo_dataset = "davanstrien/ufo-ColPali" dataset = load_dataset(ufo_dataset, split="train") ``` ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#embedding-and-mean-pooling) Embedding and Mean Pooling We’ll use a function that generates multivector representations and their mean pooled versions of each PDF page (aka image) in batches. For complete understanding, it’s important to consider the following specifics of **ColPali** and **ColQwen**: **ColPali:** In theory, ColPali is designed to generate 1,024 vectors per PDF page, but in practice, it produces 1,030 vectors. This discrepancy is due to ColPali’s pre-processor, which appends the text `Describe the image.` to each input. This additional text generates an extra 6 multivectors. **ColQwen:** ColQwen dynamically determines the number of patches in “rows and columns” of a PDF page based on its size. Consequently, the number of multivectors can vary between inputs. ColQwen pre-processor prepends `<|im_start|>user<|vision_start|>` and appends `<|vision_end|>Describe the image.<|im_end|><|endoftext|>`. For example, that’s how ColQwen multivector output is formed. ![that’s how ColQwen multivector output is formed](https://qdrant.tech/documentation/tutorials/pdf-retrieval-at-scale/ColQwen-preprocessing.png) The `get_patches` function is to get the number of `x_patches` (rows) and `y_patches` (columns) ColPali/ColQwen2 models will divide a PDF page into. For ColPali, the numbers will always be 32 by 32; ColQwen will define them dynamically based on the PDF page size. ```python x_patches, y_patches = model_processor.get_n_patches( image_size, patch_size=model.patch_size ) ``` For **ColQwen** model ```python model_processor.get_n_patches( image_size, patch_size=model.patch_size, spatial_merge_size=model.spatial_merge_size ) ``` We choose to **preserve prefix and postfix multivectors**. Our **pooling** operation compresses the multivectors representing **the image tokens** based on the number of rows and columns determined by the model (static 32x32 for ColPali, dynamic XxY for ColQwen). Function retains and integrates the additional multivectors produced by the model back to pooled representations. 
Simplified version of pooling for **ColPali** model: (see the full version – also applicable for **ColQwen** – in the [tutorial notebook](https://githubtocolab.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb)) ```python processed_images = model_processor.process_images(image_batch) --- # Image embeddings of shape (batch_size, 1030, 128) image_embeddings = model(**processed_images) --- # (1030, 128) image_embedding = image_embeddings[0] # take the first element of the batch --- # Now we need to identify vectors that correspond to the image tokens --- # It can be done by selecting tokens corresponding to special `image_token_id` --- # (1030, ) - boolean mask (for the first element in the batch), True for image tokens mask = processed_images.input_ids[0] == model_processor.image_token_id --- # For convenience, we now select only image tokens --- # and reshape them to (x_patches, y_patches, dim) --- # (x_patches, y_patches, 128) image_patch_embeddings = image_embedding[mask].view(x_patches, y_patches, model.dim) --- # Now we can apply mean pooling by rows and columns --- # (x_patches, 128) pooled_by_rows = image_patch_embeddings.mean(dim=0) --- # (y_patches, 128) pooled_by_columns = image_patch_embeddings.mean(dim=1) --- # [Optionally] we can also concatenate special tokens to the pooled representations, --- # (x_patches + 6, 128) pooled_by_rows = torch.cat([pooled_by_rows, image_embedding[~mask]]) --- # (y_patches + 6, 128) pooled_by_columns = torch.cat([pooled_by_columns, image_embedding[~mask]]) ``` ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#upload-to-qdrant) Upload to Qdrant The upload process is trivial; the only thing to pay attention to is the compute cost for ColPali and ColQwen2 models. In low-resource environments, it’s recommended to use a smaller batch size for embedding and mean pooling. Full version of the upload code is available in the [tutorial notebook](https://githubtocolab.com/qdrant/examples/blob/master/pdf-retrieval-at-scale/ColPali_ColQwen2_Tutorial.ipynb) ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#querying-pdfs) Querying PDFs After indexing PDF documents, we can move on to querying them using our two-stage retrieval approach. ```python query = "Lee Harvey Oswald's involvement in the JFK assassination" processed_queries = model_processor.process_queries([query]).to(model.device) --- # Resulting query embedding is a tensor of shape (22, 128) query_embedding = model(**processed_queries)[0] ``` Now let’s design a function for the two-stage retrieval with multivectors produced by VLLMs: - **Step 1:** Prefetch results using a compressed multivector representation & HNSW index. - **Step 2:** Re-rank the prefetched results using the original multivector representation. Let’s query our collections using combined mean pooled representations for the first stage of retrieval. 
```python --- # Final amount of results to return search_limit = 10 --- # Amount of results to prefetch for reranking prefetch_limit = 100 response = client.query_points( collection_name=collection_name, query=query_embedding, prefetch=[\ models.Prefetch(\ query=query_embedding,\ limit=prefetch_limit,\ using="mean_pooling_columns"\ ),\ models.Prefetch(\ query=query_embedding,\ limit=prefetch_limit,\ using="mean_pooling_rows"\ ),\ ], limit=search_limit, with_payload=True, with_vector=False, using="original" ) ``` And check the top retrieved result to our query _“Lee Harvey Oswald’s involvement in the JFK assassination”_. ```python dataset[response.points[0].payload['index']]['image'] ``` ![Results, ColPali](https://qdrant.tech/documentation/tutorials/pdf-retrieval-at-scale/result-VLLMs.png) ## [Anchor](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/\#conclusion) Conclusion In this tutorial, we demonstrated an optimized approach using **Qdrant for PDF retrieval at scale** with VLLMs producing **heavy multivector representations** like **ColPali** and **ColQwen2**. Without such optimization, the performance of retrieval systems can degrade severely, both in terms of indexing time and query latency, especially as the dataset size grows. We **strongly recommend** implementing this approach in your workflows to ensure efficient and scalable PDF retrieval. Neglecting to optimize the retrieval process could result in unacceptably slow performance, hindering the usability of your system. Start scaling your PDF retrieval today! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/advanced-tutorials/pdf-retrieval-at-scale.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/advanced-tutorials/pdf-retrieval-at-scale.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-160-lllmstxt|> ## fastembed-semantic-search - [Documentation](https://qdrant.tech/documentation/) - [Fastembed](https://qdrant.tech/documentation/fastembed/) - FastEmbed & Qdrant --- # [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#using-fastembed-with-qdrant-for-vector-search) Using FastEmbed with Qdrant for Vector Search ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#install-qdrant-client-and-fastembed) Install Qdrant Client and FastEmbed ```python pip install "qdrant-client[fastembed]>=1.14.2" ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#initialize-the-client) Initialize the client Qdrant Client has a simple in-memory mode that lets you try semantic search locally. ```python from qdrant_client import QdrantClient, models client = QdrantClient(":memory:") # Qdrant is running from RAM. ``` ## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#add-data) Add data Now you can add two sample documents, their associated metadata, and a point `id` for each. 
```python
docs = [
    "Qdrant has a LangChain integration for chatbots.",
    "Qdrant has a LlamaIndex integration for agents.",
]
metadata = [
    {"source": "langchain-docs"},
    {"source": "llamaindex-docs"},
]
ids = [42, 2]
```

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#create-a-collection) Create a collection

Qdrant stores vectors and associated metadata in collections. A collection requires its vector parameters to be set at creation time. In this tutorial, we'll be using `BAAI/bge-small-en` to compute embeddings.

```python
model_name = "BAAI/bge-small-en"

client.create_collection(
    collection_name="test_collection",
    vectors_config=models.VectorParams(
        size=client.get_embedding_size(model_name),
        distance=models.Distance.COSINE,
    ),  # size and distance are model dependent
)
```

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#upsert-documents-to-the-collection) Upsert documents to the collection

The Qdrant client can run inference implicitly within its methods via the FastEmbed integration. This requires wrapping your data in model objects such as `models.Document` (or `models.Image` if you're working with images).

```python
metadata_with_docs = [
    {"document": doc, "source": meta["source"]} for doc, meta in zip(docs, metadata)
]

client.upload_collection(
    collection_name="test_collection",
    vectors=[models.Document(text=doc, model=model_name) for doc in docs],
    payload=metadata_with_docs,
    ids=ids,
)
```

## [Anchor](https://qdrant.tech/documentation/fastembed/fastembed-semantic-search/\#run-vector-search) Run vector search

Here, you will ask a dummy question that will allow you to retrieve a semantically relevant result.

```python
search_result = client.query_points(
    collection_name="test_collection",
    query=models.Document(
        text="Which integration is best for agents?", model=model_name
    )
).points

print(search_result)
```

The semantic search engine will retrieve the most similar results in order of relevance. In this case, the second statement, about LlamaIndex, is more relevant.

```python
[
    ScoredPoint(
        id=2,
        score=0.87491801319731,
        payload={
            "document": "Qdrant has a LlamaIndex integration for agents.",
            "source": "llamaindex-docs",
        },
        ...
    ),
    ScoredPoint(
        id=42,
        score=0.8351846627714035,
        payload={
            "document": "Qdrant has a LangChain integration for chatbots.",
            "source": "langchain-docs",
        },
        ...
    ),
]
```
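Because each point also stores its `source` in the payload, you can combine semantic search with a payload filter. Below is a minimal sketch (not part of the original quick start) that reuses the collection and model defined above and restricts the query to points whose `source` is `llamaindex-docs`:

```python
# Semantic search constrained by a payload filter on the `source` field
filtered_result = client.query_points(
    collection_name="test_collection",
    query=models.Document(
        text="Which integration is best for agents?", model=model_name
    ),
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="source",
                match=models.MatchValue(value="llamaindex-docs"),
            )
        ]
    ),
    limit=1,
).points

print(filtered_result)
```

The same pattern applies to any other payload field you upload alongside your documents.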
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/fastembed/fastembed-semantic-search.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-161-lllmstxt|> ## multiple-partitions - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Multitenancy --- # [Anchor](https://qdrant.tech/documentation/guides/multiple-partitions/\#configure-multitenancy) Configure Multitenancy **How many collections should you create?** In most cases, you should only use a single collection with payload-based partitioning. This approach is called multitenancy. It is efficient for most of users, but it requires additional configuration. This document will show you how to set it up. **When should you create multiple collections?** When you have a limited number of users and you need isolation. This approach is flexible, but it may be more costly, since creating numerous collections may result in resource overhead. Also, you need to ensure that they do not affect each other in any way, including performance-wise. ## [Anchor](https://qdrant.tech/documentation/guides/multiple-partitions/\#partition-by-payload) Partition by payload When an instance is shared between multiple users, you may need to partition vectors by user. This is done so that each user can only access their own vectors and can’t see the vectors of other users. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/points { "points": [\ {\ "id": 1,\ "payload": {"group_id": "user_1"},\ "vector": [0.9, 0.1, 0.1]\ },\ {\ "id": 2,\ "payload": {"group_id": "user_1"},\ "vector": [0.1, 0.9, 0.1]\ },\ {\ "id": 3,\ "payload": {"group_id": "user_2"},\ "vector": [0.1, 0.1, 0.9]\ },\ ] } ``` ```python client.upsert( collection_name="{collection_name}", points=[\ models.PointStruct(\ id=1,\ payload={"group_id": "user_1"},\ vector=[0.9, 0.1, 0.1],\ ),\ models.PointStruct(\ id=2,\ payload={"group_id": "user_1"},\ vector=[0.1, 0.9, 0.1],\ ),\ models.PointStruct(\ id=3,\ payload={"group_id": "user_2"},\ vector=[0.1, 0.1, 0.9],\ ),\ ], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.upsert("{collection_name}", { points: [\ {\ id: 1,\ payload: { group_id: "user_1" },\ vector: [0.9, 0.1, 0.1],\ },\ {\ id: 2,\ payload: { group_id: "user_1" },\ vector: [0.1, 0.9, 0.1],\ },\ {\ id: 3,\ payload: { group_id: "user_2" },\ vector: [0.1, 0.1, 0.9],\ },\ ], }); ``` ```rust use qdrant_client::qdrant::{PointStruct, UpsertPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .upsert_points(UpsertPointsBuilder::new( "{collection_name}", vec![\ PointStruct::new(1, vec![0.9, 0.1, 0.1], [("group_id", "user_1".into())]),\ PointStruct::new(2, vec![0.1, 0.9, 0.1], [("group_id", "user_1".into())]),\ PointStruct::new(3, vec![0.1, 0.1, 0.9], [("group_id", "user_2".into())]),\ ], )) .await?; ``` ```java import java.util.List; import java.util.Map; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.PointStruct; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .upsertAsync( "{collection_name}", List.of( PointStruct.newBuilder() .setId(id(1)) .setVectors(vectors(0.9f, 0.1f, 0.1f)) 
.putAllPayload(Map.of("group_id", value("user_1"))) .build(), PointStruct.newBuilder() .setId(id(2)) .setVectors(vectors(0.1f, 0.9f, 0.1f)) .putAllPayload(Map.of("group_id", value("user_1"))) .build(), PointStruct.newBuilder() .setId(id(3)) .setVectors(vectors(0.1f, 0.1f, 0.9f)) .putAllPayload(Map.of("group_id", value("user_2"))) .build())) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.UpsertAsync( collectionName: "{collection_name}", points: new List { new() { Id = 1, Vectors = new[] { 0.9f, 0.1f, 0.1f }, Payload = { ["group_id"] = "user_1" } }, new() { Id = 2, Vectors = new[] { 0.1f, 0.9f, 0.1f }, Payload = { ["group_id"] = "user_1" } }, new() { Id = 3, Vectors = new[] { 0.1f, 0.1f, 0.9f }, Payload = { ["group_id"] = "user_2" } } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Upsert(context.Background(), &qdrant.UpsertPoints{ CollectionName: "{collection_name}", Points: []*qdrant.PointStruct{ { Id: qdrant.NewIDNum(1), Vectors: qdrant.NewVectors(0.9, 0.1, 0.1), Payload: qdrant.NewValueMap(map[string]any{"group_id": "user_1"}), }, { Id: qdrant.NewIDNum(2), Vectors: qdrant.NewVectors(0.1, 0.9, 0.1), Payload: qdrant.NewValueMap(map[string]any{"group_id": "user_1"}), }, { Id: qdrant.NewIDNum(3), Vectors: qdrant.NewVectors(0.1, 0.1, 0.9), Payload: qdrant.NewValueMap(map[string]any{"group_id": "user_2"}), }, }, }) ``` 2. Use a filter along with `group_id` to filter vectors for each user. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.1, 0.1, 0.9], "filter": { "must": [\ {\ "key": "group_id",\ "match": {\ "value": "user_1"\ }\ }\ ] }, "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.1, 0.1, 0.9], query_filter=models.Filter( must=[\ models.FieldCondition(\ key="group_id",\ match=models.MatchValue(\ value="user_1",\ ),\ )\ ] ), limit=10, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.1, 0.1, 0.9], filter: { must: [{ key: "group_id", match: { value: "user_1" } }], }, limit: 10, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, QueryPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.1, 0.1, 0.9]) .limit(10) .filter(Filter::must([Condition::matches(\ "group_id",\ "user_1".to_string(),\ )])), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.ConditionFactory.matchKeyword; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder().addMust(matchKeyword("group_id", "user_1")).build()) .setQuery(nearest(0.1f, 0.1f, 0.9f)) .setLimit(10) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using 
static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.1f, 0.1f, 0.9f }, filter: MatchKeyword("group_id", "user_1"), limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.1, 0.1, 0.9), Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("group_id", "user_1"), }, }, }) ``` ## [Anchor](https://qdrant.tech/documentation/guides/multiple-partitions/\#calibrate-performance) Calibrate performance The speed of indexation may become a bottleneck in this case, as each user’s vector will be indexed into the same collection. To avoid this bottleneck, consider _bypassing the construction of a global vector index_ for the entire collection and building it only for individual groups instead. By adopting this strategy, Qdrant will index vectors for each user independently, significantly accelerating the process. To implement this approach, you should: 1. Set `payload_m` in the HNSW configuration to a non-zero value, such as 16. 2. Set `m` in hnsw config to 0. This will disable building global index for the whole collection. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "hnsw_config": { "payload_m": 16, "m": 0 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), hnsw_config=models.HnswConfigDiff( payload_m=16, m=0, ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, hnsw_config: { payload_m: 16, m: 0, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, HnswConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .hnsw_config(HnswConfigDiffBuilder::default().payload_m(16).m(0)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.HnswConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setHnswConfig(HnswConfigDiff.newBuilder().setPayloadM(16).setM(0).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( 
collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, hnswConfig: new HnswConfigDiff { PayloadM = 16, M = 0 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), HnswConfig: &qdrant.HnswConfigDiff{ PayloadM: qdrant.PtrOf(uint64(16)), M: qdrant.PtrOf(uint64(0)), }, }) ``` 3. Create keyword payload index for `group_id` field. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name}/index { "field_name": "group_id", "field_schema": { "type": "keyword", "is_tenant": true } } ``` ```python client.create_payload_index( collection_name="{collection_name}", field_name="group_id", field_schema=models.KeywordIndexParams( type="keyword", is_tenant=True, ), ) ``` ```typescript client.createPayloadIndex("{collection_name}", { field_name: "group_id", field_schema: { type: "keyword", is_tenant: true, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateFieldIndexCollectionBuilder, KeywordIndexParamsBuilder, FieldType }; use qdrant_client::{Qdrant, QdrantError}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.create_field_index( CreateFieldIndexCollectionBuilder::new( "{collection_name}", "group_id", FieldType::Keyword, ).field_index_params( KeywordIndexParamsBuilder::default() .is_tenant(true) ) ).await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.PayloadIndexParams; import io.qdrant.client.grpc.Collections.PayloadSchemaType; import io.qdrant.client.grpc.Collections.KeywordIndexParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createPayloadIndexAsync( "{collection_name}", "group_id", PayloadSchemaType.Keyword, PayloadIndexParams.newBuilder() .setKeywordIndexParams( KeywordIndexParams.newBuilder() .setIsTenant(true) .build()) .build(), null, null, null) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.CreatePayloadIndexAsync( collectionName: "{collection_name}", fieldName: "group_id", schemaType: PayloadSchemaType.Keyword, indexParams: new PayloadIndexParams { KeywordIndexParams = new KeywordIndexParams { IsTenant = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateFieldIndex(context.Background(), &qdrant.CreateFieldIndexCollection{ CollectionName: "{collection_name}", FieldName: "group_id", FieldType: qdrant.FieldType_FieldTypeKeyword.Enum(), FieldIndexParams: qdrant.NewPayloadIndexParams( &qdrant.KeywordIndexParams{ IsTenant: qdrant.PtrOf(true), }), }) ``` `is_tenant=true` parameter is optional, but specifying it provides storage with additional information about the usage patterns the collection is going to use. When specified, storage structure will be organized in a way to co-locate vectors of the same tenant together, which can significantly improve performance in some cases. 
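To see how the pieces fit together, here is a condensed Python sketch of the steps above: it creates a tenant-optimized collection (no global HNSW graph, per-tenant graphs via `payload_m`), adds the tenant-aware keyword index, upserts a point tagged with its `group_id`, and runs a filtered query. The collection name and the 768-dimensional dummy vectors are placeholders.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# 1. Collection without a global HNSW graph (m=0), with per-tenant graphs (payload_m=16)
client.create_collection(
    collection_name="shared_collection",  # placeholder name
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(payload_m=16, m=0),
)

# 2. Tenant-aware keyword index on the partitioning key
client.create_payload_index(
    collection_name="shared_collection",
    field_name="group_id",
    field_schema=models.KeywordIndexParams(type="keyword", is_tenant=True),
)

# 3. Every point is tagged with its tenant in the payload
client.upsert(
    collection_name="shared_collection",
    points=[
        models.PointStruct(
            id=1,
            vector=[0.1] * 768,  # dummy vector for illustration
            payload={"group_id": "user_1"},
        )
    ],
)

# 4. Queries should always include the tenant filter,
#    since there is no global index to fall back on
client.query_points(
    collection_name="shared_collection",
    query=[0.1] * 768,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="group_id",
                match=models.MatchValue(value="user_1"),
            )
        ]
    ),
    limit=10,
)
```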
## [Anchor](https://qdrant.tech/documentation/guides/multiple-partitions/\#limitations) Limitations One downside to this approach is that global requests (without the `group_id` filter) will be slower since they will necessitate scanning all groups to identify the nearest neighbors. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/multiple-partitions.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/multiple-partitions.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-162-lllmstxt|> ## api-reference - [Documentation](https://qdrant.tech/documentation/) - [Private cloud](https://qdrant.tech/documentation/private-cloud/) - API Reference --- # [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#api-reference) API Reference ## [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#packages) Packages - [qdrant.io/v1](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantiov1) ## [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantiov1) qdrant.io/v1 Package v1 contains API Schema definitions for the qdrant.io v1 API group ### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#resource-types) Resource Types - [QdrantCloudRegion](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregion) - [QdrantCloudRegionList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionlist) - [QdrantCluster](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcluster) - [QdrantClusterList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterlist) - [QdrantClusterRestore](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestore) - [QdrantClusterRestoreList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestorelist) - [QdrantClusterScheduledSnapshot](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshot) - [QdrantClusterScheduledSnapshotList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshotlist) - [QdrantClusterSnapshot](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshot) - [QdrantClusterSnapshotList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshotlist) - [QdrantEntity](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentity) - [QdrantEntityList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentitylist) - [QdrantRelease](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantrelease) - [QdrantReleaseList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantreleaselist) #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#clusterphase) ClusterPhase _Underlying type:_ _string_ _Appears in:_ - 
[QdrantClusterStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterstatus) | Field | Description | | --- | --- | | `Creating` | | | `FailedToCreate` | | | `Updating` | | | `FailedToUpdate` | | | `Scaling` | | | `Upgrading` | | | `Suspending` | | | `Suspended` | | | `FailedToSuspend` | | | `Resuming` | | | `FailedToResume` | | | `Healthy` | | | `NotReady` | | | `RecoveryMode` | | | `ManualMaintenance` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#componentphase) ComponentPhase _Underlying type:_ _string_ _Appears in:_ - [ComponentStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#componentstatus) | Field | Description | | --- | --- | | `Ready` | | | `NotReady` | | | `Unknown` | | | `NotFound` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#componentreference) ComponentReference _Appears in:_ - [QdrantCloudRegionSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | APIVersion is the group and version of the component being referenced. | | | | `kind` _string_ | Kind is the type of component being referenced | | | | `name` _string_ | Name is the name of component being referenced | | | | `namespace` _string_ | Namespace is the namespace of component being referenced. | | | | `markedForDeletion` _boolean_ | MarkedForDeletion specifies whether the component is marked for deletion | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#componentstatus) ComponentStatus _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `name` _string_ | Name specifies the name of the component | | | | `namespace` _string_ | Namespace specifies the namespace of the component | | | | `version` _string_ | Version specifies the version of the component | | | | `phase` _[ComponentPhase](https://qdrant.tech/documentation/private-cloud/api-reference/#componentphase)_ | Phase specifies the current phase of the component | | | | `message` _string_ | Message specifies the info explaining the current phase of the component | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#entityphase) EntityPhase _Underlying type:_ _string_ _Appears in:_ - [QdrantEntityStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentitystatus) | Field | Description | | --- | --- | | `Creating` | | | `Ready` | | | `Updating` | | | `Failing` | | | `Deleting` | | | `Deleted` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#entityresult) EntityResult _Underlying type:_ _string_ EntityResult is the last result from the invocation to a manager _Appears in:_ - [QdrantEntityStatusResult](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentitystatusresult) | Field | Description | | --- | --- | | `Ok` | | | `Pending` | | | `Error` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#gpu) GPU _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `gpuType` _[GPUType](https://qdrant.tech/documentation/private-cloud/api-reference/#gputype)_ | GPUType 
specifies the type of the GPU to use. If set, GPU indexing is enabled. | | Enum: \[nvidia amd\] | | `forceHalfPrecision` _boolean_ | ForceHalfPrecision for `f32` values while indexing.
`f16` conversion will take place
only inside GPU memory and won’t affect storage type. | false | | | `deviceFilter` _string array_ | DeviceFilter for GPU devices by hardware name. Case-insensitive.
List of substrings to match against the gpu device name.
Example: \[- “nvidia”\]
If not specified, all devices are accepted. | | MinItems: 1 | | `devices` _string array_ | Devices is a List of explicit GPU devices to use.
If the host has multiple GPUs, this option allows selecting specific devices
by their index in the list of found devices.
If `deviceFilter` is set, indexes are applied after filtering.
If not specified, all devices are accepted. | | MinItems: 1 | | `parallelIndexes` _integer_ | ParallelIndexes is the number of parallel indexes to run on the GPU. | 1 | Minimum: 1 | | `groupsCount` _integer_ | GroupsCount is the amount of used vulkan “groups” of GPU.
In other words, how many parallel points can be indexed by GPU.
Optimal value might depend on the GPU model.
Proportional to, but not necessarily equal to, the physical number of warps.
Do not change this value unless you know what you are doing. | | Minimum: 1 | | `allowIntegrated` _boolean_ | AllowIntegrated specifies whether to allow integrated GPUs to be used. | false | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#gputype) GPUType _Underlying type:_ _string_ GPUType specifies the type of GPU to use. _Validation:_ - Enum: \[nvidia amd\] _Appears in:_ - [GPU](https://qdrant.tech/documentation/private-cloud/api-reference/#gpu) | Field | Description | | --- | --- | | `nvidia` | | | `amd` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#helmrelease) HelmRelease _Appears in:_ - [QdrantCloudRegionSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `markedForDeletionAt` _string_ | MarkedForDeletionAt specifies the time when the helm release was marked for deletion | | | | `object` _[HelmRelease](https://qdrant.tech/documentation/private-cloud/api-reference/#helmrelease)_ | Object specifies the helm release object | | EmbeddedResource: {} | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#helmrepository) HelmRepository _Appears in:_ - [QdrantCloudRegionSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `markedForDeletionAt` _string_ | MarkedForDeletionAt specifies the time when the helm repository was marked for deletion | | | | `object` _[HelmRepository](https://qdrant.tech/documentation/private-cloud/api-reference/#helmrepository)_ | Object specifies the helm repository object | | EmbeddedResource: {} | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#inferenceconfig) InferenceConfig _Appears in:_ - [QdrantConfiguration](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfiguration) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `enabled` _boolean_ | Enabled specifies whether to enable inference for the cluster or not. | false | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#ingress) Ingress _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `enabled` _boolean_ | Enabled specifies whether to enable ingress for the cluster or not. | | | | `annotations` _object (keys:string, values:string)_ | Annotations specifies annotations for the ingress. | | | | `ingressClassName` _string_ | IngressClassName specifies the name of the ingress class | | | | `host` _string_ | Host specifies the host for the ingress. | | | | `tls` _boolean_ | TLS specifies whether to enable tls for the ingress.
The default depends on the ingress provider:
\- KubernetesIngress: False
\- NginxIngress: False
\- QdrantCloudTraefik: Depending on the config.tls setting of the operator. | | | | `tlsSecretName` _string_ | TLSSecretName specifies the name of the secret containing the tls certificate. | | | | `nginx` _[NGINXConfig](https://qdrant.tech/documentation/private-cloud/api-reference/#nginxconfig)_ | NGINX specifies the nginx ingress specific configurations. | | | | `traefik` _[TraefikConfig](https://qdrant.tech/documentation/private-cloud/api-reference/#traefikconfig)_ | Traefik specifies the traefik ingress specific configurations. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#kubernetesdistribution) KubernetesDistribution _Underlying type:_ _string_ _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | | --- | --- | | `unknown` | | | `aws` | | | `gcp` | | | `azure` | | | `do` | | | `scaleway` | | | `openshift` | | | `linode` | | | `civo` | | | `oci` | | | `ovhcloud` | | | `stackit` | | | `vultr` | | | `k3s` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#kubernetespod) KubernetesPod _Appears in:_ - [KubernetesStatefulSet](https://qdrant.tech/documentation/private-cloud/api-reference/#kubernetesstatefulset) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `annotations` _object (keys:string, values:string)_ | Annotations specifies the annotations for the Pods. | | | | `labels` _object (keys:string, values:string)_ | Labels specifies the labels for the Pods. | | | | `extraEnv` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | ExtraEnv specifies the extra environment variables for the Pods. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#kubernetesservice) KubernetesService _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `type` _[ServiceType](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#servicetype-v1-core)_ | Type specifies the type of the Service: “ClusterIP”, “NodePort”, “LoadBalancer”. | ClusterIP | | | `annotations` _object (keys:string, values:string)_ | Annotations specifies the annotations for the Service. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#kubernetesstatefulset) KubernetesStatefulSet _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `annotations` _object (keys:string, values:string)_ | Annotations specifies the annotations for the StatefulSet. | | | | `pods` _[KubernetesPod](https://qdrant.tech/documentation/private-cloud/api-reference/#kubernetespod)_ | Pods specifies the configuration of the Pods of the Qdrant StatefulSet. 
| | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#metricsource) MetricSource _Underlying type:_ _string_ _Appears in:_ - [Monitoring](https://qdrant.tech/documentation/private-cloud/api-reference/#monitoring) | Field | Description | | --- | --- | | `kubelet` | | | `api` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#monitoring) Monitoring _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cAdvisorMetricSource` _[MetricSource](https://qdrant.tech/documentation/private-cloud/api-reference/#metricsource)_ | CAdvisorMetricSource specifies the cAdvisor metric source | | | | `nodeMetricSource` _[MetricSource](https://qdrant.tech/documentation/private-cloud/api-reference/#metricsource)_ | NodeMetricSource specifies the node metric source | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#nginxconfig) NGINXConfig _Appears in:_ - [Ingress](https://qdrant.tech/documentation/private-cloud/api-reference/#ingress) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `allowedSourceRanges` _string array_ | AllowedSourceRanges specifies the allowed CIDR source ranges for the ingress. | | | | `grpcHost` _string_ | GRPCHost specifies the host name for the GRPC ingress. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#nodeinfo) NodeInfo _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `name` _string_ | Name specifies the name of the node | | | | `region` _string_ | Region specifies the region of the node | | | | `zone` _string_ | Zone specifies the zone of the node | | | | `instanceType` _string_ | InstanceType specifies the instance type of the node | | | | `arch` _string_ | Arch specifies the CPU architecture of the node | | | | `capacity` _[NodeResourceInfo](https://qdrant.tech/documentation/private-cloud/api-reference/#noderesourceinfo)_ | Capacity specifies the capacity of the node | | | | `allocatable` _[NodeResourceInfo](https://qdrant.tech/documentation/private-cloud/api-reference/#noderesourceinfo)_ | Allocatable specifies the allocatable resources of the node | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#noderesourceinfo) NodeResourceInfo _Appears in:_ - [NodeInfo](https://qdrant.tech/documentation/private-cloud/api-reference/#nodeinfo) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cpu` _string_ | CPU specifies the CPU resources of the node | | | | `memory` _string_ | Memory specifies the memory resources of the node | | | | `pods` _string_ | Pods specifies the pods resources of the node | | | | `ephemeralStorage` _string_ | EphemeralStorage specifies the ephemeral storage resources of the node | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#nodestatus) NodeStatus _Appears in:_ - [QdrantClusterStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `name` _string_ | Name specifies the name of the node | | | | `started_at` _string_ | StartedAt specifies the time when the node started (in RFC3339 format) | | | | `state` _object 
(keys: [PodConditionType](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#podconditiontype-v1-core), values: [ConditionStatus](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#conditionstatus-v1-core))_ | States specifies the condition states of the node | | | | `version` _string_ | Version specifies the version of Qdrant running on the node | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#pause) Pause _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `owner` _string_ | Owner specifies the owner of the pause request. | | | | `reason` _string_ | Reason specifies the reason for the pause request. | | | | `creationTimestamp` _string_ | CreationTimestamp specifies the time when the pause request was created. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantcloudregion) QdrantCloudRegion QdrantCloudRegion is the Schema for the qdrantcloudregions API _Appears in:_ - [QdrantCloudRegionList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionlist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantCloudRegion` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantCloudRegionSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionspec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantcloudregionlist) QdrantCloudRegionList QdrantCloudRegionList contains a list of QdrantCloudRegion | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantCloudRegionList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `items` _[QdrantCloudRegion](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregion) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantcloudregionspec) QdrantCloudRegionSpec QdrantCloudRegionSpec defines the desired state of QdrantCloudRegion _Appears in:_ - [QdrantCloudRegion](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregion) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `id` _string_ | Id specifies the unique identifier of the region | | | | `components` _[ComponentReference](https://qdrant.tech/documentation/private-cloud/api-reference/#componentreference) array_ | Components specifies the list of components to be installed in the region | | | | `helmRepositories` _[HelmRepository](https://qdrant.tech/documentation/private-cloud/api-reference/#helmrepository) array_ | HelmRepositories specifies the list of helm repositories to be created to the region
Deprecated: Use “Components” instead | | | | `helmReleases` _[HelmRelease](https://qdrant.tech/documentation/private-cloud/api-reference/#helmrelease) array_ | HelmReleases specifies the list of helm releases to be created to the region
Deprecated: Use “Components” instead | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantcluster) QdrantCluster QdrantCluster is the Schema for the qdrantclusters API _Appears in:_ - [QdrantClusterList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterlist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantCluster` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterlist) QdrantClusterList QdrantClusterList contains a list of QdrantCluster | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `items` _[QdrantCluster](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcluster) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterrestore) QdrantClusterRestore QdrantClusterRestore is the Schema for the qdrantclusterrestores API _Appears in:_ - [QdrantClusterRestoreList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestorelist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterRestore` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantClusterRestoreSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestorespec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterrestorelist) QdrantClusterRestoreList QdrantClusterRestoreList contains a list of QdrantClusterRestore objects | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterRestoreList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. 
| | | | `items` _[QdrantClusterRestore](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestore) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterrestorespec) QdrantClusterRestoreSpec QdrantClusterRestoreSpec defines the desired state of QdrantClusterRestore _Appears in:_ - [QdrantClusterRestore](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestore) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `source` _[RestoreSource](https://qdrant.tech/documentation/private-cloud/api-reference/#restoresource)_ | Source defines the source snapshot from which the restore will be done | | | | `destination` _[RestoreDestination](https://qdrant.tech/documentation/private-cloud/api-reference/#restoredestination)_ | Destination defines the destination cluster where the source data will end up | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterscheduledsnapshot) QdrantClusterScheduledSnapshot QdrantClusterScheduledSnapshot is the Schema for the qdrantclusterscheduledsnapshots API _Appears in:_ - [QdrantClusterScheduledSnapshotList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshotlist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterScheduledSnapshot` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantClusterScheduledSnapshotSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshotspec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterscheduledsnapshotlist) QdrantClusterScheduledSnapshotList QdrantClusterScheduledSnapshotList contains a list of QdrantCluster | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterScheduledSnapshotList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `items` _[QdrantClusterScheduledSnapshot](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshot) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterscheduledsnapshotspec) QdrantClusterScheduledSnapshotSpec QdrantClusterScheduledSnapshotSpec defines the desired state of QdrantCluster _Appears in:_ - [QdrantClusterScheduledSnapshot](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshot) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cluster-id` _string_ | Id specifies the unique identifier of the cluster | | | | `scheduleShortId` _string_ | Specifies short Id which identifies a schedule | | MaxLength: 8 | | `schedule` _string_ | Cron expression for frequency of creating snapshots, see [https://en.wikipedia.org/wiki/Cron](https://en.wikipedia.org/wiki/Cron).
The schedule is specified in UTC. | | Pattern: `^(@(annually|yearly|monthly|weekly|daily|hourly|reboot))|(@every (\d+(ns|us|Âľs|ms|s|m|h))+)|((((\d+,)+\d+|([\d\*]+(\/|-)\d+)|\d+|\*) ?)\{5,7\})$` | | `retention` _string_ | Retention of schedule in hours | | Pattern: `^[0-9]+h$` | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclustersnapshot) QdrantClusterSnapshot QdrantClusterSnapshot is the Schema for the qdrantclustersnapshots API _Appears in:_ - [QdrantClusterSnapshotList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshotlist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterSnapshot` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantClusterSnapshotSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshotspec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclustersnapshotlist) QdrantClusterSnapshotList QdrantClusterSnapshotList contains a list of QdrantClusterSnapshot | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantClusterSnapshotList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `items` _[QdrantClusterSnapshot](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshot) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclustersnapshotphase) QdrantClusterSnapshotPhase _Underlying type:_ _string_ _Appears in:_ - [QdrantClusterSnapshotStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshotstatus) | Field | Description | | --- | --- | | `Running` | | | `Skipped` | | | `Failed` | | | `Succeeded` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclustersnapshotspec) QdrantClusterSnapshotSpec _Appears in:_ - [QdrantClusterSnapshot](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshot) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cluster-id` _string_ | The cluster ID for which a Snapshot need to be taken
The cluster should be located in the same namespace as this QdrantClusterSnapshot | | | | `creation-timestamp` _integer_ | The CreationTimestamp of the backup (expressed in Unix epoch format) | | | | `scheduleShortId` _string_ | Specifies the short Id which identifies a schedule, if any.
This field should not be set if the backup is made manually. | | MaxLength: 8 | | `retention` _string_ | The retention period of this snapshot in hours, if any.
If not set, the backup doesn’t have a retention period, meaning it will not be removed. | | Pattern: `^[0-9]+h$` | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantclusterspec) QdrantClusterSpec QdrantClusterSpec defines the desired state of QdrantCluster _Appears in:_ - [QdrantCluster](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcluster) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `id` _string_ | Id specifies the unique identifier of the cluster | | | | `version` _string_ | Version specifies the version of Qdrant to deploy | | | | `size` _integer_ | Size specifies the desired number of Qdrant nodes in the cluster | | Maximum: 30
Minimum: 1 | | `servicePerNode` _boolean_ | ServicePerNode specifies whether the cluster should start a dedicated service for each node. | true | | | `clusterManager` _boolean_ | ClusterManager specifies whether to use the cluster manager for this cluster.
The Python-operator will deploy a dedicated cluster manager instance.
The Go-operator will use a shared instance.
If not set, the default will be taken from the operator config. | | | | `suspend` _boolean_ | Suspend specifies whether to suspend the cluster.
If enabled, the cluster will be suspended and all related resources will be removed except the PVCs. | false | | | `pauses` _[Pause](https://qdrant.tech/documentation/private-cloud/api-reference/#pause) array_ | Pauses specifies a list of pause requests by developers for manual maintenance.
Operator will skip handling any changes in the CR if any pause request is present. | | | | `image` _[QdrantImage](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantimage)_ | Image specifies the image to use for each Qdrant node. | | | | `resources` _[Resources](https://qdrant.tech/documentation/private-cloud/api-reference/#resources)_ | Resources specifies the resources to allocate for each Qdrant node. | | | | `security` _[QdrantSecurityContext](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantsecuritycontext)_ | Security specifies the security context for each Qdrant node. | | | | `tolerations` _[Toleration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#toleration-v1-core) array_ | Tolerations specifies the tolerations for each Qdrant node. | | | | `nodeSelector` _object (keys:string, values:string)_ | NodeSelector specifies the node selector for each Qdrant node. | | | | `config` _[QdrantConfiguration](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfiguration)_ | Config specifies the Qdrant configuration setttings for the clusters. | | | | `ingress` _[Ingress](https://qdrant.tech/documentation/private-cloud/api-reference/#ingress)_ | Ingress specifies the ingress for the cluster. | | | | `service` _[KubernetesService](https://qdrant.tech/documentation/private-cloud/api-reference/#kubernetesservice)_ | Service specifies the configuration of the Qdrant Kubernetes Service. | | | | `gpu` _[GPU](https://qdrant.tech/documentation/private-cloud/api-reference/#gpu)_ | GPU specifies GPU configuration for the cluster. If this field is not set, no GPU will be used. | | | | `statefulSet` _[KubernetesStatefulSet](https://qdrant.tech/documentation/private-cloud/api-reference/#kubernetesstatefulset)_ | StatefulSet specifies the configuration of the Qdrant Kubernetes StatefulSet. | | | | `storageClassNames` _[StorageClassNames](https://qdrant.tech/documentation/private-cloud/api-reference/#storageclassnames)_ | StorageClassNames specifies the storage class names for db and snapshots. | | | | `topologySpreadConstraints` _[TopologySpreadConstraint](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#topologyspreadconstraint-v1-core)_ | TopologySpreadConstraints specifies the topology spread constraints for the cluster. | | | | `podDisruptionBudget` _[PodDisruptionBudgetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#poddisruptionbudgetspec-v1-policy)_ | PodDisruptionBudget specifies the pod disruption budget for the cluster. | | | | `restartAllPodsConcurrently` _boolean_ | RestartAllPodsConcurrently specifies whether to restart all pods concurrently (also called one-shot-restart).
If enabled, all the pods in the cluster will be restarted concurrently in situations where multiple pods
need to be restarted, like when RestartedAtAnnotationKey is added/updated or the Qdrant version needs to be upgraded.
This helps sharded but non-replicated clusters reduce downtime to a minimum during restart.
If unset, the operator will restart nodes concurrently if none of the collections is replicated. | | | | `startupDelaySeconds` _integer_ | If StartupDelaySeconds is set (> 0), an additional ‘sleep ’ will be emitted to the pod startup.
The sleep will be added when a pod is restarted; it will not force any pod to restart.
This feature can be used for debugging the core, e.g. if a pod is in a crash loop, it provides a way
to inspect the attached storage. | | | | `rebalanceStrategy` _[RebalanceStrategy](https://qdrant.tech/documentation/private-cloud/api-reference/#rebalancestrategy)_ | RebalanceStrategy specifies the strategy to use for automatically rebalancing shards in the cluster.
Cluster-manager needs to be enabled for this feature to work. | | Enum: \[by\_count by\_size by\_count\_and\_size\] | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantconfiguration) QdrantConfiguration _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `collection` _[QdrantConfigurationCollection](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationcollection)_ | Collection specifies the default collection configuration for Qdrant. | | | | `log_level` _string_ | LogLevel specifies the log level for Qdrant. | | | | `service` _[QdrantConfigurationService](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationservice)_ | Service specifies the service level configuration for Qdrant. | | | | `tls` _[QdrantConfigurationTLS](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationtls)_ | TLS specifies the TLS configuration for Qdrant. | | | | `storage` _[StorageConfig](https://qdrant.tech/documentation/private-cloud/api-reference/#storageconfig)_ | Storage specifies the storage configuration for Qdrant. | | | | `inference` _[InferenceConfig](https://qdrant.tech/documentation/private-cloud/api-reference/#inferenceconfig)_ | Inference configuration. This is used in Qdrant Managed Cloud only. If not set Inference is not available to this cluster. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantconfigurationcollection) QdrantConfigurationCollection _Appears in:_ - [QdrantConfiguration](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfiguration) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `replication_factor` _integer_ | ReplicationFactor specifies the default number of replicas of each shard | | | | `write_consistency_factor` _integer_ | WriteConsistencyFactor specifies how many replicas should apply the operation to consider it successful | | | | `vectors` _[QdrantConfigurationCollectionVectors](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationcollectionvectors)_ | Vectors specifies the default parameters for vectors | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantconfigurationcollectionvectors) QdrantConfigurationCollectionVectors _Appears in:_ - [QdrantConfigurationCollection](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationcollection) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `on_disk` _boolean_ | OnDisk specifies whether vectors should be stored in memory or on disk. 
| | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantconfigurationservice) QdrantConfigurationService _Appears in:_ - [QdrantConfiguration](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfiguration) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `api_key` _[QdrantSecretKeyRef](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantsecretkeyref)_ | ApiKey for the qdrant instance | | | | `read_only_api_key` _[QdrantSecretKeyRef](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantsecretkeyref)_ | ReadOnlyApiKey for the qdrant instance | | | | `jwt_rbac` _boolean_ | JwtRbac specifies whether to enable jwt rbac for the qdrant instance
Default is false | | | | `hide_jwt_dashboard` _boolean_ | HideJwtDashboard specifies whether to hide the JWT dashboard of the embedded UI
Default is false | | | | `enable_tls` _boolean_ | EnableTLS specifies whether to enable tls for the qdrant instance
Default is false | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantconfigurationtls) QdrantConfigurationTLS _Appears in:_ - [QdrantConfiguration](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfiguration) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cert` _[QdrantSecretKeyRef](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantsecretkeyref)_ | Reference to the secret containing the server certificate chain file | | | | `key` _[QdrantSecretKeyRef](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantsecretkeyref)_ | Reference to the secret containing the server private key file | | | | `caCert` _[QdrantSecretKeyRef](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantsecretkeyref)_ | Reference to the secret containing the CA certificate file | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantentity) QdrantEntity QdrantEntity is the Schema for the qdrantentities API _Appears in:_ - [QdrantEntityList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentitylist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantEntity` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantEntitySpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentityspec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantentitylist) QdrantEntityList QdrantEntityList contains a list of QdrantEntity objects | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantEntityList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `items` _[QdrantEntity](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentity) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantentityspec) QdrantEntitySpec QdrantEntitySpec defines the desired state of QdrantEntity _Appears in:_ - [QdrantEntity](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentity) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `id` _string_ | The unique identifier of the entity (in UUID format). | | | | `entityType` _string_ | The type of the entity. | | | | `clusterId` _string_ | The optional cluster identifier | | | | `createdAt` _[MicroTime](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#microtime-v1-meta)_ | Timestamp when the entity was created. | | | | `lastUpdatedAt` _[MicroTime](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#microtime-v1-meta)_ | Timestamp when the entity was last updated. | | | | `deletedAt` _[MicroTime](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#microtime-v1-meta)_ | Timestamp when the entity was deleted (or is started to be deleting).
If not set the entity is not deleted | | | | `payload` _[JSON](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#json-v1-apiextensions-k8s-io)_ | Generic payload for this entity | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantentitystatusresult) QdrantEntityStatusResult QdrantEntityStatusResult is the last result from the invocation to a manager _Appears in:_ - [QdrantEntityStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantentitystatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `result` _[EntityResult](https://qdrant.tech/documentation/private-cloud/api-reference/#entityresult)_ | The result of last reconcile of the entity | | Enum: \[Ok Pending Error\] | | `reason` _string_ | The reason of the result (e.g. in case of an error) | | | | `payload` _[JSON](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#json-v1-apiextensions-k8s-io)_ | The optional payload of the status. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantimage) QdrantImage _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `repository` _string_ | Repository specifies the repository of the Qdrant image.
If not specified, defaults to the config of the operator (or qdrant/qdrant if not specified in the operator). | | | | `pullPolicy` _[PullPolicy](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#pullpolicy-v1-core)_ | PullPolicy specifies the image pull policy for the Qdrant image.
If not specified defaults the config of the operator (or IfNotPresent if not specified in operator). | | | | `pullSecretName` _string_ | PullSecretName specifies the pull secret for the Qdrant image. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantrelease) QdrantRelease QdrantRelease describes an available Qdrant release _Appears in:_ - [QdrantReleaseList](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantreleaselist) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantRelease` | | | | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `spec` _[QdrantReleaseSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantreleasespec)_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantreleaselist) QdrantReleaseList QdrantReleaseList contains a list of QdrantRelease | Field | Description | Default | Validation | | --- | --- | --- | --- | | `apiVersion` _string_ | `qdrant.io/v1` | | | | `kind` _string_ | `QdrantReleaseList` | | | | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | | | `items` _[QdrantRelease](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantrelease) array_ | | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantreleasespec) QdrantReleaseSpec QdrantReleaseSpec defines the desired state of QdrantRelease _Appears in:_ - [QdrantRelease](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantrelease) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `version` _string_ | Version number (should be semver compliant).
E.g. “v1.10.1” | | | | `default` _boolean_ | If set, this version is default for new clusters on Cloud.
There should be only 1 Qdrant version in the platform set as default. | false | | | `image` _string_ | Full docker image to use for this version.
If empty, a default image will be derived from Version (and qdrant/qdrant is assumed). | | | | `unavailable` _boolean_ | If set, this version cannot be used for new clusters. | false | | | `endOfLife` _boolean_ | If set, this version is no longer actively supported. | false | | | `accountIds` _string array_ | If set, this version can only be used by accounts with given IDs. | | | | `accountPrivileges` _string array_ | If set, this version can only be used by accounts that have been given the listed privileges. | | | | `remarks` _string_ | General remarks for human reading | | | | `releaseNotesURL` _string_ | Release Notes URL for the specified version | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantsecretkeyref) QdrantSecretKeyRef _Appears in:_ - [QdrantConfigurationService](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationservice) - [QdrantConfigurationTLS](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfigurationtls) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `secretKeyRef` _[SecretKeySelector](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#secretkeyselector-v1-core)_ | SecretKeyRef to the secret containing data to configure the qdrant instance | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#qdrantsecuritycontext) QdrantSecurityContext _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `user` _integer_ | User specifies the user to run the Qdrant process as. | | | | `group` _integer_ | Group specifies the group to run the Qdrant process as. | | | | `fsGroup` _integer_ | FsGroup specifies file system group to run the Qdrant process as. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#rebalancestrategy) RebalanceStrategy _Underlying type:_ _string_ RebalanceStrategy specifies the strategy to use for automaticially rebalancing shards the cluster. 
_Validation:_ - Enum: \[by\_count by\_size by\_count\_and\_size\] _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | | --- | --- | | `by_count` | | | `by_size` | | | `by_count_and_size` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#regioncapabilities) RegionCapabilities _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `volumeSnapshot` _boolean_ | VolumeSnapshot specifies whether the Kubernetes cluster supports volume snapshot | | | | `volumeExpansion` _boolean_ | VolumeExpansion specifies whether the Kubernetes cluster supports volume expansion | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#regionphase) RegionPhase _Underlying type:_ _string_ _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | | --- | --- | | `Ready` | | | `NotReady` | | | `FailedToSync` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#resourcerequests) ResourceRequests _Appears in:_ - [Resources](https://qdrant.tech/documentation/private-cloud/api-reference/#resources) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cpu` _string_ | CPU specifies the CPU request for each Qdrant node. | | | | `memory` _string_ | Memory specifies the memory request for each Qdrant node. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#resources) Resources _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `cpu` _string_ | CPU specifies the CPU limit for each Qdrant node. | | | | `memory` _string_ | Memory specifies the memory limit for each Qdrant node. | | | | `storage` _string_ | Storage specifies the storage amount for each Qdrant node. | | | | `requests` _[ResourceRequests](https://qdrant.tech/documentation/private-cloud/api-reference/#resourcerequests)_ | Requests specifies the resource requests for each Qdrant node. 
| | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#restoredestination) RestoreDestination _Appears in:_ - [QdrantClusterRestoreSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestorespec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `name` _string_ | Name of the destination cluster | | | | `namespace` _string_ | Namespace of the destination cluster | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#restorephase) RestorePhase _Underlying type:_ _string_ _Appears in:_ - [QdrantClusterRestoreStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestorestatus) | Field | Description | | --- | --- | | `Running` | | | `Skipped` | | | `Failed` | | | `Succeeded` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#restoresource) RestoreSource _Appears in:_ - [QdrantClusterRestoreSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterrestorespec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `snapshotName` _string_ | SnapshotName is the name of the snapshot from which we wish to restore | | | | `namespace` _string_ | Namespace of the snapshot | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#scheduledsnapshotphase) ScheduledSnapshotPhase _Underlying type:_ _string_ _Appears in:_ - [QdrantClusterScheduledSnapshotStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterscheduledsnapshotstatus) | Field | Description | | --- | --- | | `Active` | | | `Disabled` | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#storageclass) StorageClass _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `name` _string_ | Name specifies the name of the storage class | | | | `default` _boolean_ | Default specifies whether the storage class is the default storage class | | | | `provisioner` _string_ | Provisioner specifies the provisioner of the storage class | | | | `allowVolumeExpansion` _boolean_ | AllowVolumeExpansion specifies whether the storage class allows volume expansion | | | | `reclaimPolicy` _string_ | ReclaimPolicy specifies the reclaim policy of the storage class | | | | `parameters` _object (keys:string, values:string)_ | Parameters specifies the parameters of the storage class | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#storageclassnames) StorageClassNames _Appears in:_ - [QdrantClusterSpec](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclusterspec) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `db` _string_ | DB specifies the storage class name for db volume. | | | | `snapshots` _string_ | Snapshots specifies the storage class name for snapshots volume. 
| | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#storageconfig) StorageConfig _Appears in:_ - [QdrantConfiguration](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantconfiguration) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `performance` _[StoragePerformanceConfig](https://qdrant.tech/documentation/private-cloud/api-reference/#storageperformanceconfig)_ | Performance configuration | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#storageperformanceconfig) StoragePerformanceConfig _Appears in:_ - [StorageConfig](https://qdrant.tech/documentation/private-cloud/api-reference/#storageconfig) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `optimizer_cpu_budget` _integer_ | OptimizerCPUBudget defines the number of CPU allocation.
If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size.
If negative - subtract this number of CPUs from the available CPUs.
If positive - use this exact number of CPUs. | | | | `async_scorer` _boolean_ | AsyncScorer enables io\_uring when rescoring | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#traefikconfig) TraefikConfig _Appears in:_ - [Ingress](https://qdrant.tech/documentation/private-cloud/api-reference/#ingress) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `allowedSourceRanges` _string array_ | AllowedSourceRanges specifies the allowed CIDR source ranges for the ingress. | | | | `entryPoints` _string array_ | EntryPoints is the list of traefik entry points to use for the ingress route.
If nothing is set, it will take the entryPoints configured in the operator config. | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#volumesnapshotclass) VolumeSnapshotClass _Appears in:_ - [QdrantCloudRegionStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantcloudregionstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `name` _string_ | Name specifies the name of the volume snapshot class | | | | `driver` _string_ | Driver specifies the driver of the volume snapshot class | | | #### [Anchor](https://qdrant.tech/documentation/private-cloud/api-reference/\#volumesnapshotinfo) VolumeSnapshotInfo _Appears in:_ - [QdrantClusterSnapshotStatus](https://qdrant.tech/documentation/private-cloud/api-reference/#qdrantclustersnapshotstatus) | Field | Description | Default | Validation | | --- | --- | --- | --- | | `volumeSnapshotName` _string_ | VolumeSnapshotName is the name of the volume snapshot | | | | `volumeName` _string_ | VolumeName is the name of the volume that was backed up | | | | `readyToUse` _boolean_ | ReadyToUse indicates if the volume snapshot is ready to use | | | | `snapshotHandle` _string_ | SnapshotHandle is the identifier of the volume snapshot in the respective cloud provider | | | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/api-reference.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/private-cloud/api-reference.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-163-lllmstxt|> ## seed-round - [Articles](https://qdrant.tech/articles/) - On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round. [Back to Qdrant Articles](https://qdrant.tech/articles/) --- # On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round. Andre Zayarni · April 19, 2023 ![On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round.](https://qdrant.tech/articles_data/seed-round/preview/title.jpg) > Vector databases are here to stay. The New Age of AI is powered by vector embeddings, and vector databases are a foundational part of the stack. At Qdrant, we are working on cutting-edge open-source vector similarity search solutions to power fantastic AI applications with the best possible performance and excellent developer experience. > > Our 7.5M seed funding – led by [Unusual Ventures](https://www.unusual.vc/), awesome angels, and existing investors – will help us bring these innovations to engineers and empower them to make the most of their unstructured data and the awesome power of LLMs at any scale. We are thrilled to announce that we just raised our seed round from the best possible investor we could imagine for this stage. Let’s talk about fundraising later – it is a story itself that I could probably write a bestselling book about. First, let’s dive into a bit of background about our project, our progress, and future plans. 
## [Anchor](https://qdrant.tech/articles/seed-round/\#a-need-for-vector-databases) A need for vector databases. Unstructured data is growing exponentially, and we are all part of a huge unstructured data workforce. This blog post is unstructured data; your visit here produces unstructured and semi-structured data with every web interaction, as does every photo you take or email you send. The global datasphere will grow to [165 zettabytes by 2025](https://github.com/qdrant/qdrant/pull/1639), and about 80% of that will be unstructured. At the same time, the rising demand for AI is vastly outpacing existing infrastructure. Around 90% of machine learning research results fail to reach production because of a lack of tools. ![Vector Databases Demand](https://qdrant.tech/articles_data/seed-round/demand.png) Demand for AI tools Thankfully there’s a new generation of tools that let developers work with unstructured data in the form of vector embeddings, which are deep representations of objects obtained from a neural network model. A vector database, also known as a vector similarity search engine or approximate nearest neighbour (ANN) search database, is a database designed to store, manage, and search high-dimensional data with an additional payload. Vector Databases turn research prototypes into commercial AI products. Vector search solutions are industry agnostic and bring solutions for a number of use cases, including classic ones like semantic search, matching engines, and recommender systems to more novel applications like anomaly detection, working with time series, or biomedical data. The biggest limitation is to have a neural network encoder in place for the data type you are working with. ![Vector Search Use Cases](https://qdrant.tech/articles_data/seed-round/use-cases.png) Vector Search Use Cases With the rise of large language models (LLMs), Vector Databases have become the fundamental building block of the new AI Stack. They let developers build even more advanced applications by extending the “knowledge base” of LLMs-based applications like ChatGPT with real-time and real-world data. A new AI product category, “Co-Pilot for X,” was born and is already affecting how we work. Starting from producing content to developing software. And this is just the beginning, there are even more types of novel applications being developed on top of this stack. ![New AI Stack](https://qdrant.tech/articles_data/seed-round/ai-stack.png) New AI Stack ## [Anchor](https://qdrant.tech/articles/seed-round/\#enter-qdrant) Enter Qdrant. At the same time, adoption has only begun. Vector Search Databases are replacing VSS libraries like FAISS, etc., which, despite their disadvantages, are still used by ~90% of projects out there They’re hard-coupled to the application code, lack of production-ready features like basic CRUD operations or advanced filtering, are a nightmare to maintain and scale and have many other difficulties that make life hard for developers. The current Qdrant ecosystem consists of excellent products to work with vector embeddings. We launched our managed vector database solution, Qdrant Cloud, early this year, and it is already serving more than 1,000 Qdrant clusters. We are extending our offering now with managed on-premise solutions for enterprise customers. 
![Qdrant Vector Database Ecosystem](https://qdrant.tech/articles_data/seed-round/ecosystem.png) Qdrant Ecosystem Our plan for the current [open-source roadmap](https://github.com/qdrant/qdrant/blob/master/docs/roadmap/README.md) is to make billion-scale vector search affordable. Our recent release of the [Scalar Quantization](https://qdrant.tech/articles/scalar-quantization/) improves both memory usage (x4) as well as speed (x2). Upcoming [Product Quantization](https://www.irisa.fr/texmex/people/jegou/papers/jegou_searching_with_quantization.pdf) will introduce even another option with more memory saving. Stay tuned. Qdrant started more than two years ago with the mission of building a vector database powered by a well-thought-out tech stack. Using Rust as the system programming language and technical architecture decision during the development of the engine made Qdrant the leading and one of the most popular vector database solutions. Our unique custom modification of the [HNSW algorithm](https://qdrant.tech/articles/filtrable-hnsw/) for Approximate Nearest Neighbor Search (ANN) allows querying the result with a state-of-the-art speed and applying filters without compromising on results. Cloud-native support for distributed deployment and replications makes the engine suitable for high-throughput applications with real-time latency requirements. Rust brings stability, efficiency, and the possibility to make optimization on a very low level. In general, we always aim for the best possible results in [performance](https://qdrant.tech/benchmarks/), code quality, and feature set. Most importantly, we want to say a big thank you to our [open-source community](https://qdrant.to/discord), our adopters, our contributors, and our customers. Your active participation in the development of our products has helped make Qdrant the best vector database on the market. I cannot imagine how we could do what we’re doing without the community or without being open-source and having the TRUST of the engineers. Thanks to all of you! I also want to thank our team. Thank you for your patience and trust. Together we are strong. Let’s continue doing great things together. ## [Anchor](https://qdrant.tech/articles/seed-round/\#fundraising) Fundraising The whole process took only a couple of days, we got several offers, and most probably, we would get more with different conditions. We decided to go with Unusual Ventures because they truly understand how things work in the open-source space. They just did it right. Here is a big piece of advice for all investors interested in open-source: Dive into the community, and see and feel the traction and product feedback instead of looking at glossy pitch decks. With Unusual on our side, we have an active operational partner instead of one who simply writes a check. That help is much more important than overpriced valuations and big shiny names. Ultimately, the community and adopters will decide what products win and lose, not VCs. Companies don’t need crazy valuations to create products that customers love. You do not need Ph.D. to innovate. You do not need to over-engineer to build a scalable solution. You do not need ex-FANG people to have a great team. You need clear focus, a passion for what you’re building, and the know-how to do it well. We know how. PS: This text is written by me in an old-school way without any ChatGPT help. Sometimes you just need inspiration instead of AI ;-) ##### Was this page useful? 
![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/seed-round.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/seed-round.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-164-lllmstxt|> ## search-feedback-loop - [Articles](https://qdrant.tech/articles/) - Relevance Feedback in Informational Retrieval [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # Relevance Feedback in Informational Retrieval Evgeniya Sukhodolskaya · March 27, 2025 ![Relevance Feedback in Informational Retrieval](https://qdrant.tech/articles_data/search-feedback-loop/preview/title.jpg) > A problem well stated is a problem half solved. This quote applies as much to life as it does to information retrieval. With a well-formulated query, retrieving the relevant document becomes trivial. In reality, however, most users struggle to precisely define what they are searching for. While users may struggle to formulate a perfect request — especially in unfamiliar topics — they can easily judge whether a retrieved answer is relevant or not. **Relevance is a powerful feedback mechanism for a retrieval system** to iteratively refine results in the direction of user interest. In 2025, with social media flooded with daily AI breakthroughs, it almost seems like information retrieval is solved, agents can iteratively adjust their search queries while assessing the relevance. Of course, there’s a catch: these models still rely on retrieval systems ( _RAG isn’t dead yet, despite daily predictions of its demise_). They receive only a handful of top-ranked results provided by a far simpler and cheaper retriever. As a result, the success of guided retrieval still mainly depends on the retrieval system itself. So, we should find a way of effectively and efficiently incorporating relevance feedback directly into a retrieval system. In this article, we’ll explore the approaches proposed in the research literature and try to answer the following question: _If relevance feedback in search is so widely studied and praised as effective, why is it practically not used in dedicated vector search solutions?_ ## [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#dismantling-the-relevance-feedback) Dismantling the Relevance Feedback Both industry and academia tend to reinvent the wheel here and there. So, we first took some time to study and categorize different methods — just in case there was something we could plug directly into Qdrant. The resulting taxonomy isn’t set in stone, but we aim to make it useful. ![Types of Relevance Feedback](https://qdrant.tech/articles_data/search-feedback-loop/relevance-feedback.png) Types of Relevance Feedback ### [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#pseudo-relevance-feedback-prf) Pseudo-Relevance Feedback (PRF) Pseudo-Relevance feedback takes the top-ranked documents from the initial retrieval results and treats them as relevant. 
This approach might seem naive, but it provides a noticeable performance boost in lexical retrieval while being relatively cheap to compute. ### [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#binary-relevance-feedback) Binary Relevance Feedback The most straightforward way to gather feedback is to ask users directly if document is relevant. There are two main limitations to this approach: First, users are notoriously reluctant to provide feedback. Did you know that [Google once had](https://en.wikipedia.org/wiki/Google_SearchWiki#:~:text=SearchWiki%20was%20a%20Google%20Search,for%20a%20given%20search%20query) an upvote/downvote mechanism on search results but removed it because almost no one used it? Second, even if users are willing to provide feedback, no relevant documents might be present in the initial retrieval results. In this case, the user can’t provide a meaningful signal. Instead of asking users, we can ask a smart model to provide binary relevance judgements, but this would limit its potential to generate granular judgements. ### [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#re-scored-relevance-feedback) Re-scored Relevance Feedback We can also apply more sophisticated methods to extract relevance feedback from the top-ranked documents - machine learning models can provide a relevance score for each document. The obvious concern here is twofold: 1. How accurately can the automated judge determine relevance (or irrelevance)? 2. How cost-efficient is it? After all, you can’t expect GPT-4o to re-rank thousands of documents for every user query — unless you’re filthy rich. Nevertheless, automated re-scored feedback could be a scalable way to improve search when explicit binary feedback is not accessible. ## [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#has-the-problem-already-been-solved) Has the Problem Already Been Solved? Digging through research materials, we expected anything else but to discover that the first relevance feedback study dates back [_sixty years_](https://sigir.org/files/museum/pub-08/XXIII-1.pdf). In the midst of the neural search bubble, it’s easy to forget that lexical (term-based) retrieval has been around for decades. Naturally, research in that field has had enough time to develop. **Neural search** — aka [vector search](https://qdrant.tech/articles/neural-search-tutorial/) — gained traction in the industry around 5 years ago. Hence, vector-specific relevance feedback techniques might still be in their early stages, awaiting production-grade validation and industry adoption. As a [dedicated vector search engine](https://qdrant.tech/articles/dedicated-vector-search/), we would like to be these adopters. Our focus is neural search, but approaches in both lexical and neural retrieval seem worth exploring, as cross-field studies are always insightful, with the potential to reuse well-established methods of one field in another. We found some interesting methods applicable to neural search solutions and additionally revealed a **gap in the neural search-based relevance feedback approaches**. Stick around, and we’ll share our findings! ## [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#two-ways-to-approach-the-problem) Two Ways to Approach the Problem Retrieval as a recipe can be broken down into three main ingredients: 1. Query 2. Documents 3. Similarity scoring between them. 
![Research Field Taxonomy Overview](https://qdrant.tech/articles_data/search-feedback-loop/taxonomy-overview.png) Research Field Taxonomy Overview Query formulation is a subjective process – it can be done in infinite configurations, making the relevance of a document unpredictable until the query is formulated and submitted to the system. So, adapting documents (or the search index) to relevance feedback would require per-request dynamic changes, which is impractical, considering that modern retrieval systems store billions of documents. Thus, approaches for incorporating relevance feedback in search fall into two categories: **refining a query** and **refining the similarity scoring function** between the query and documents. ## [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#query-refinement) Query Refinement There are several ways to refine a query based on relevance feedback. Globally, we prefer to distinguish between two approaches: modifying the query as text and modifying the vector representation of the query. ![Incorporating Relevance Feedback in Query](https://qdrant.tech/articles_data/search-feedback-loop/query.png) Incorporating Relevance Feedback in Query ### [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#query-as-text) Query As Text In **term-based retrieval**, an intuitive way to improve a query would be to **expand it with relevant terms**. It resembled the “ _aha, so that’s what it’s called_” stage in the discovery search. Before the deep learning era of this century, expansion terms were mainly selected using statistical or probabilistic models. The idea was to: 1. Either extract the **most frequent** terms from (pseudo-)relevant documents; 2. Or the **most specific** ones (for example, according to IDF); 3. Or the **most probable** ones (most likely to be in query according to a relevance set). Well-known methods of those times come from the family of [Relevance Models](https://sigir.org/wp-content/uploads/2017/06/p260.pdf), where terms for expansion are chosen based on their probability in pseudo-relevant documents (how often terms appear) and query terms likelihood given those pseudo-relevant documents - how strongly these pseudo-relevant documents match the query. The most famous one, `RM3` – interpolation of expansion terms probability with their probability in a query – is still appearing in papers of the last few years as a (noticeably decent) baseline in term-based retrieval, usually as part of [anserini](https://github.com/castorini/anserini). ![Simplified Query Expansion](https://qdrant.tech/articles_data/search-feedback-loop/relevance-models.png) Simplified Query Expansion With the time approaching the modern machine learning era, [multiple](https://aclanthology.org/2020.findings-emnlp.424.pdf) [studies](https://dl.acm.org/doi/10.1145/1390334.1390377) began claiming that these traditional ways of query expansion are not as effective as they could be. Started with simple classifiers based on hand-crafted features, this trend naturally led to use the famous [BERT (Bidirectional encoder representations from transformers)](https://huggingface.co/docs/transformers/model_doc/bert). For example, `BERT-QE` (Query Expansion) authors came up with this schema: 1. Get pseudo-relevance feedback from the finetuned BERT reranker (~10 documents); 2. Chunk these pseudo-relevant documents (~100 words) and score query-chunk relevance with the same reranker; 3. Expand the query with the most relevant chunks; 4. 
Rerank 1000 documents with the reranker using the expanded query. This approach significantly outperformed BM25 + RM3 baseline in experiments (+11% NDCG@20). However, it required **11.01x** more computation than just using BERT for reranking, and reranking 1000 documents with BERT would take around 9 seconds alone. Query term expansion can _hypothetically_ work for neural retrieval as well. New terms might shift the query vector closer to that of the desired document. However, [this approach isn’t guaranteed to succeed](https://dl.acm.org/doi/10.1145/3570724). Neural search depends entirely on embeddings, and how those embeddings are generated — consequently, how similar query and document vectors are — depends heavily on the model’s training. It definitely works if **query refining is done by a model operating in the same vector space**, which typically requires offline training of a retriever. The goal is to extend the query encoder input to also include feedback documents, producing an adjusted query embedding. Examples include [`ANCE-PRF`](https://arxiv.org/pdf/2108.13454) and [`ColBERT-PRF`](https://dl.acm.org/doi/10.1145/3572405) – ANCE and ColBERT fine-tuned extensions. ![Generating a new relevance-aware query vector](https://qdrant.tech/articles_data/search-feedback-loop/updated-encoder.png) Generating a new relevance-aware query vector The reason why you’re most probably not familiar with these models – their absence in the industry – is that their **training** itself is a **high upfront cost**, and even though it was “paid”, these models [struggle with generalization](https://arxiv.org/abs/2108.13454), performing poorly on out-of-domain tasks (datasets they haven’t seen during training). Additionally, feeding an attention-based model a lengthy input (query + documents) is not a good practice in production settings (attention is quadratic in the input length), where time and money are crucial decision factors. Alternatively, one could skip a step — and work directly with vectors. ### [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#query-as-vector) Query As Vector Instead of modifying the initial query, a more scalable approach is to directly adjust the query vector. It is easily applicable across modalities and suitable for both lexical and neural retrieval. Although vector search has become a trend in recent years, its core principles have existed in the field for decades. For example, the SMART retrieval system used by [Rocchio](https://sigir.org/files/museum/pub-08/XXIII-1.pdf) in 1965 for his relevance feedback experiments operated on bag-of-words vector representations of text. ![Roccio’s Relevance Feedback Method](https://qdrant.tech/articles_data/search-feedback-loop/Roccio.png) Roccio’s Relevance Feedback Method **Rocchio’s idea** — to update the query vector by adding a difference between the centroids of relevant and non-relevant documents — seems to translate well to modern dual encoders-based dense retrieval systems. Researchers seem to agree: a study from 2022 demonstrated that the [parametrized version of Rocchio’s method](https://arxiv.org/pdf/2108.11044) in dense retrieval consistently improves Recall@1000 by 1–5%, while keeping query processing time suitable for production — around 170 ms. However, parameters (centroids and query weights) in the dense retrieval version of Roccio’s method must be tuned for each dataset and, ideally, also for each request. 
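To make the arithmetic concrete, here is a minimal sketch (our illustration, not code from the cited papers) of the parametrized Rocchio update applied to dense embeddings. The function name and the `alpha`, `beta`, `gamma` defaults are placeholders for exactly the parameters that need per-dataset tuning:

```python
import numpy as np

def rocchio_update(query_vec, relevant_vecs, non_relevant_vecs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Parametrized Rocchio update for a dense query vector.

    alpha, beta and gamma weight the original query, the centroid of
    (pseudo-)relevant documents, and the centroid of non-relevant documents.
    The default values are illustrative only; in practice they must be tuned.
    """
    query_vec = np.asarray(query_vec, dtype=np.float32)
    rel_centroid = np.mean(relevant_vecs, axis=0)
    non_rel_centroid = np.mean(non_relevant_vecs, axis=0) if len(non_relevant_vecs) else 0.0
    updated = alpha * query_vec + beta * rel_centroid - gamma * non_rel_centroid
    # Re-normalize so the refined vector stays comparable under cosine similarity
    return updated / np.linalg.norm(updated)
```

The refined vector is then simply submitted to the retriever as a new search query, which is what keeps this family of methods cheap compared to re-encoding the query together with the feedback documents.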
#### [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#gradient-descent-based-methods) Gradient Descent-Based Methods The efficient way of doing so on-the-fly remained an open question until the introduction of a **gradient-descent-based Roccio’s method generalization**: [`Test-Time Optimization of Query Representations (TOUR)`](https://arxiv.org/pdf/2205.12680). TOUR adapts a query vector over multiple iterations of retrieval and reranking ( _retrieve → rerank → gradient descent step_), guided by a reranker’s relevance judgments. ![An overview of TOUR iteratively optimizing initial query representation based on pseudo relevance feedback. Figure adapted from Sung et al., 2023, Optimizing Test-Time Query Representations for Dense Retrieval](https://qdrant.tech/articles_data/search-feedback-loop/TOUR.png) An overview of TOUR iteratively optimizing initial query representation based on pseudo relevance feedback. Figure adapted from Sung et al., 2023, [Optimizing Test-Time Query Representations for Dense Retrieval](https://arxiv.org/pdf/2205.12680) The next iteration of gradient-based methods of query refinement – [`ReFit`](https://arxiv.org/abs/2305.11744) – proposed in 2024 a lighter, production-friendly alternative to TOUR, limiting _retrieve → rerank → gradient descent_ sequence to only one iteration. The retriever’s query vector is updated through matching (via [Kullback–Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)) retriever and cross-encoder’s similarity scores distribution over feedback documents. ReFit is model- and language-independent and stably improves Recall@100 metric on 2–3%. ![An overview of ReFit, a gradient-based method for query refinement](https://qdrant.tech/articles_data/search-feedback-loop/refit.png) An overview of ReFit, a gradient-based method for query refinement Gradient descent-based methods seem like a production-viable option, an alternative to finetuning the retriever (distilling it from a reranker). Indeed, it doesn’t require in-advance training and is compatible with any re-ranking models. However, a few limitations baked into these methods prevented a broader adoption in the industry. The gradient descent-based methods modify elements of the query vector as if it were model parameters; therefore, they require a substantial amount of feedback documents to converge to a stable solution. On top of that, the gradient descent-based methods are sensitive to the choice of hyperparameters, leading to **query drift**, where the query may drift entirely away from the user’s intent. ## [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#similarity-scoring) Similarity Scoring ![Incorporating Relevance Feedback in Similarity Scoring](https://qdrant.tech/articles_data/search-feedback-loop/similairty-scoring.png) Incorporating Relevance Feedback in Similarity Scoring Another family of approaches is built around the idea of incorporating relevance feedback directly into the similarity scoring function. It might be desirable in cases where we want to preserve the original query intent, but still adjust the similarity score based on relevance feedback. In **lexical retrieval**, this can be as simple as boosting documents that share more terms with those judged as relevant. 
Its **neural search counterpart** is a [`k-nearest neighbors-based method`](https://aclanthology.org/2022.emnlp-main.614.pdf) that adjusts the query-document similarity score by adding the sum of similarities between the candidate document and all known relevant examples. This technique yields a significant improvement, around 5.6 percentage points in NDCG@20, but it requires explicitly labelled (by users) feedback documents to be effective. In experiments, the knn-based method is treated as a reranker. In all other papers, we also found that adjusting similarity scores based on relevance feedback is centred around [reranking](https://qdrant.tech/documentation/search-precision/reranking-semantic-search/) – **training or finetuning rerankers to become relevance feedback-aware**. Typically, experiments include cross-encoders, though [simple classifiers are also an option](https://arxiv.org/pdf/1904.08861). These methods generally involve rescoring a broader set of documents retrieved during an initial search, guided by feedback from a smaller top-ranked subset. It is not a similarity matching function adjustment per se but rather a similarity scoring model adjustment. Methods typically fall into two categories: 1. **Training rerankers offline** to ingest relevance feedback as an additional input at inference time, [as here](https://aclanthology.org/D18-1478.pdf) — again, attention-based models and lengthy inputs: a production-deadly combination. 2. **Finetuning rerankers** on relevance feedback from the first retrieval stage, [as Baumgärtner et al. did](https://aclanthology.org/2022.emnlp-main.614.pdf), finetuning bias parameters of a small cross-encoder per query on 2k, k={2, 4, 8} feedback documents. The biggest limitation here is that these reranker-based methods cannot retrieve relevant documents beyond those returned in the initial search, and using rerankers on thousands of documents in production is a no-go – it’s too expensive. Ideally, to avoid that, a similarity scoring function updated with relevance feedback should be used directly in the second retrieval iteration. However, in every research paper we’ve come across, retrieval systems are **treated as black boxes** — ingesting queries, returning results, and offering no built-in mechanism to modify scoring. ## [Anchor](https://qdrant.tech/articles/search-feedback-loop/\#so-what-are-the-takeaways) So, what are the takeaways? Pseudo Relevance Feedback (PRF) is known to improve the effectiveness of lexical retrievers. Several PRF-based approaches – mainly query terms expansion-based – are successfully integrated into traditional retrieval systems. At the same time, there are **no known industry-adopted analogues in neural (vector) search dedicated solutions**; neural search-compatible methods remain stuck in research papers. The gap we noticed while studying the field is that researchers have **no direct access to retrieval systems**, forcing them to design wrappers around the black-box-like retrieval oracles. This is sufficient for query-adjusting methods but not for similarity scoring function adjustment. Perhaps relevance feedback methods haven’t made it into the neural search systems for trivial reasons — like no one having the time to find the right balance between cost and efficiency. Getting it to work in a production setting means experimenting, building interfaces, and adapting architectures. Simply put, it needs to look worth it. And unlike 2D vector math, high-dimensional vector spaces are anything but intuitive. 
The curse of dimensionality is real. So is query drift. Even methods that make perfect sense on paper might not work in practice. A real-world solution should be simple. Maybe just a little bit smarter than a rule-based approach, but still practical. It shouldn’t require fine-tuning thousands of parameters or feeding paragraphs of text into transformers. **And for it to be effective, it needs to be integrated directly into the retrieval system itself.** ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/search-feedback-loop.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/search-feedback-loop.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-165-lllmstxt|> ## advanced-tutorials - [Documentation](https://qdrant.tech/documentation/) - Advanced Retrieval --- # [Anchor](https://qdrant.tech/documentation/advanced-tutorials/\#advanced-tutorials) Advanced Tutorials | | | --- | | [Use Collaborative Filtering to Build a Movie Recommendation System with Qdrant](https://qdrant.tech/documentation/advanced-tutorials/collaborative-filtering/) | | [Build a Text/Image Multimodal Search System with Qdrant and FastEmbed](https://qdrant.tech/documentation/advanced-tutorials/multimodal-search-fastembed/) | | [Navigate Your Codebase with Semantic Search and Qdrant](https://qdrant.tech/documentation/advanced-tutorials/code-search/) | | [Ensure optimal large-scale PDF Retrieval with Qdrant and ColPali/ColQwen](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/) | ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/advanced-tutorials/_index.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/advanced-tutorials/_index.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-166-lllmstxt|> ## agentic-rag-camelai-discord - [Documentation](https://qdrant.tech/documentation/) - Agentic RAG Discord Bot with CAMEL-AI ![agentic-rag-camelai-astronaut](https://qdrant.tech/documentation/examples/agentic-rag-camelai-discord/astronaut-main.png) --- # [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#agentic-rag-discord-chatbot-with-qdrant-camel-ai--openai) Agentic RAG Discord ChatBot with Qdrant, CAMEL-AI, & OpenAI | Time: 45 min | Level: Intermediate | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ymqzm6ySoyVOekY7fteQBCFCXYiYyHxw#scrollTo=QQZXwzqmNfaS) | | | --- | --- | --- | --- | Unlike traditional RAG techniques, which passively retrieve context and generate responses, **agentic RAG** involves active decision-making and multi-step reasoning by the chatbot. Instead of just fetching data, the chatbot makes decisions, dynamically interacts with various data sources, and adapts based on context, giving it a much more dynamic and intelligent approach. In this tutorial, we’ll develop a fully functional chatbot using Qdrant, [CAMEL-AI](https://www.camel-ai.org/), and [OpenAI](https://openai.com/). Let’s get started! * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#workflow-overview) Workflow Overview Below is a high-level look at our Agentic RAG workflow: | Step | Description | | --- | --- | | **1\. Environment Setup** | Install required libraries ( `camel-ai`, `qdrant-client`, `discord.py`) and set up the Python environment. | | **2\. Set Up the OpenAI Embedding Instance** | Create an OpenAI account, generate an API key, and configure the embedding model. | | **3\. Configure the Qdrant Client** | Sign up for Qdrant Cloud, create a cluster, configure `QdrantStorage`, and set up the API connection. | | **4\. Scrape and Process Data** | Use `VectorRetriever` to scrape Qdrant documentation, chunk text, and store embeddings in Qdrant. | | **5\. Set Up the CAMEL-AI ChatAgent** | Instantiate a CAMEL-AI `ChatAgent` with OpenAI models for multi-step reasoning and context-aware responses. | | **6\. Create and Configure the Discord Bot** | Register a new bot in the Discord Developer Portal, invite it to a server, and enable permissions. | | **7\. Build the Discord Bot** | Integrate Discord.py with CAMEL-AI and Qdrant to retrieve context and generate intelligent responses. | | **8\. Test the Bot** | Run the bot in a live Discord server and verify that it provides relevant, context-rich answers. | ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#architecture-diagram) Architecture Diagram Below is the architecture diagram representing the workflow and interactions of the chatbot: ![Architecture Diagram](https://qdrant.tech/documentation/examples/agentic-rag-camelai-discord/diagram_discord_bot.png) The workflow starts by **scraping, chunking, and upserting** content from URLs using the `vector_retriever.process()` method, which generates embeddings with the **OpenAI embedding instance**. These embeddings, along with their metadata, are then indexed and stored in **Qdrant** via the `QdrantStorage` class. 
When a user sends a query through the **Discord bot**, it is processed by `vector_retriever.query()`, which first embeds the query using **OpenAI Embeddings** and then retrieves the most relevant matches from Qdrant via `QdrantStorage`. The retrieved context (e.g., relevant documentation snippets) is then passed to an **OpenAI-powered Qdrant Agent** under **CAMEL-AI**, which generates a final, context-aware response. The Qdrant Agent processes the retrieved vectors using the `GPT_4O_MINI` language model, producing a response that is contextually relevant to the user’s query. This response is then sent back to the user through the **Discord bot**, completing the flow. * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-1-environment-setup)**Step 1: Environment Setup** Before diving into the implementation, here’s a high-level overview of the stack we’ll use: | **Component** | **Purpose** | | --- | --- | | **Qdrant** | Vector database for storing and querying document embeddings. | | **OpenAI** | Embedding and language model for generating vector representations and chatbot responses. | | **CAMEL-AI** | Framework for managing dialogue flow, retrieval, and AI agent interactions. | | **Discord API** | Platform for deploying and interacting with the chatbot. | ### [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#install-dependencies) Install Dependencies We’ll install CAMEL-AI, which includes all necessary dependencies: ```python !pip install camel-ai[all]==0.2.17 ``` * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-2-set-up-the-openai-embedding-instance)**Step 2: Set Up the OpenAI Embedding Instance** 1. **Create an OpenAI Account**: Go to [OpenAI](https://platform.openai.com/signup) and sign up for an account if you don’t already have one. 2. **Generate an API Key**: - After logging in, click on your profile icon in the top-right corner and select **API keys**. - Click **Create new secret key**. - Copy the generated API key and store it securely. You won’t be able to see it again. Here’s how to set up the OpenAI client in your code: Create a `.env` file in your project directory and add your API key: ```bash OPENAI_API_KEY= ``` Make sure to replace `` with your actual API key. Now, start the OpenAI Client ```python import openai import os from dotenv import load_dotenv load_dotenv() openai_client = openai.Client( api_key=os.getenv("OPENAI_API_KEY") ) ``` To set up the embedding instance, we will use text embedding 3 large: ```python from camel.embeddings import OpenAIEmbedding from camel.types import EmbeddingModelType embedding_instance = OpenAIEmbedding(model_type=EmbeddingModelType.TEXT_EMBEDDING_3_LARGE) ``` ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-3-configure-the-qdrant-client)**Step 3: Configure the Qdrant Client** For this tutorial, we will be using the **Qdrant Cloud Free Tier**. Here’s how to set it up: 1. **Create an Account**: Sign up for a Qdrant Cloud account at [Qdrant Cloud](https://cloud.qdrant.io/). 2. **Create a Cluster**: - Navigate to the **Overview** section. - Follow the onboarding instructions under **Create First Cluster** to set up your cluster. - When you create the cluster, you will receive an **API Key**. Copy and securely store it, as you will need it later. 3. **Wait for the Cluster to Provision**: - Your new cluster will appear under the **Clusters** section. 
After obtaining your Qdrant Cloud details, add to your `.env` file:

```bash
QDRANT_CLOUD_URL=
QDRANT_CLOUD_API_KEY=
```

### [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#configure-the-qdrantstorage) Configure the QdrantStorage

The `QdrantStorage` class handles the connection to the Qdrant client for all operations on your collection.

```python
import os

from camel.storages import QdrantStorage

# Read the Qdrant Cloud credentials from the .env file configured above
qdrant_cloud_url = os.getenv("QDRANT_CLOUD_URL")
qdrant_api_key = os.getenv("QDRANT_CLOUD_API_KEY")

# Define collection name
collection_name = "qdrant-agent"

storage_instance = QdrantStorage(
    vector_dim=embedding_instance.get_output_dim(),
    url_and_api_key=(
        qdrant_cloud_url,
        qdrant_api_key,
    ),
    collection_name=collection_name,
)
```

Make sure the URL and API key match the values of your Qdrant Cloud cluster.

* * *

## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-4-scrape-and-process-data)**Step 4: Scrape and Process Data**

We’ll use the CAMEL-AI `VectorRetriever` to help us here. It processes content from a file or URL, divides it into chunks, and stores the embeddings in the specified Qdrant collection.

```python
from camel.retrievers import VectorRetriever

vector_retriever = VectorRetriever(embedding_model=embedding_instance, storage=storage_instance)

qdrant_urls = [
    "https://qdrant.tech/documentation/overview",
    "https://qdrant.tech/documentation/guides/installation",
    "https://qdrant.tech/documentation/concepts/filtering",
    "https://qdrant.tech/documentation/concepts/indexing",
    "https://qdrant.tech/documentation/guides/distributed_deployment",
    "https://qdrant.tech/documentation/guides/quantization",
    # Add more URLs as needed
]

for qdrant_url in qdrant_urls:
    vector_retriever.process(
        content=qdrant_url,
    )
```

* * *

## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-5-setup-the-camel-ai-chatagent-instance)**Step 5: Set Up the CAMEL-AI ChatAgent Instance**

Define the OpenAI model and create a CAMEL-AI ChatAgent instance.

```python
from camel.configs import ChatGPTConfig
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.agents import ChatAgent

# Create a ChatGPT configuration
config = ChatGPTConfig(temperature=0.2).as_dict()

# Create an OpenAI model using the configuration
openai_model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
    model_config_dict=config,
)

assistant_sys_msg = """You are a helpful assistant to answer question,
I will give you the Original Query and Retrieved Context,
answer the Original Query based on the Retrieved Context,
if you can't answer the question just say I don't know."""

qdrant_agent = ChatAgent(system_message=assistant_sys_msg, model=openai_model)
```

* * *

## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-6-create-and-configure-the-discord-bot)**Step 6: Create and Configure the Discord Bot**

Now let’s bring the bot to life! It will serve as the interface through which users can interact with the agentic RAG system you’ve built.

### [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#create-a-new-discord-bot) Create a New Discord Bot

1. Go to the [Discord Developer Portal](https://discord.com/developers/applications) and log in with your Discord account.
2. Click on the **New Application** button.
3. Give your application a name and click **Create**.
4. Navigate to the **Bot** tab on the left sidebar and click **Add Bot**.
5. Once the bot is created, click **Reset Token** under the **Token** section to generate a new bot token.
Copy this token securely as you will need it later. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#invite-the-bot-to-your-server) Invite the Bot to Your Server 1. Go to the **OAuth2** tab and then to the **URL Generator** section. 2. Under **Scopes**, select **bot**. 3. Under **Bot Permissions**, select the necessary permissions: - Send Messages - Read Message History 4. Copy the generated URL and paste it into your browser. 5. Select the server where you want to invite the bot and click **Authorize**. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#grant-the-bot-permissions) Grant the Bot Permissions 1. Go back to the **Bot** tab. 2. Enable the following under **Privileged Gateway Intents**: - Server Members Intent - Message Content Intent Now, the bot is ready to be integrated with your code. ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-7-build-the-discord-bot)**Step 7: Build the Discord Bot** Add to your `.env` file: ```bash DISCORD_BOT_TOKEN= ``` We’ll use `discord.py` to create a simple Discord bot that interacts with users and retrieves context from Qdrant before responding. ```python from camel.bots import DiscordApp import nest_asyncio import discord nest_asyncio.apply() discord_q_bot = DiscordApp(token=os.getenv("DISCORD_BOT_TOKEN")) @discord_q_bot.client.event # triggers when a message is sent in the channel async def on_message(message: discord.Message): if message.author == discord_q_bot.client.user: return if message.type != discord.MessageType.default: return if message.author.bot: return user_input = message.content retrieved_info = vector_retriever.query( query=user_input, top_k=10, similarity_threshold=0.6 ) user_msg = str(retrieved_info) assistant_response = qdrant_agent.step(user_msg) response_content = assistant_response.msgs[0].content if len(response_content) > 2000: # discord message length limit for chunk in [response_content[i:i+2000] for i in range(0, len(response_content), 2000)]: await message.channel.send(chunk) else: await message.channel.send(response_content) discord_q_bot.run() ``` * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#step-9-test-the-bot)**Step 9: Test the Bot** 1. Invite your bot to your Discord server using the OAuth2 URL from the Discord Developer Portal. 2. Run the notebook. 3. Start chatting with the bot in your Discord server. It will retrieve context from Qdrant and provide relevant answers based on your queries. ![agentic-rag-discord-bot-what-is-quantization](https://qdrant.tech/documentation/examples/agentic-rag-camelai-discord/example.png) * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-camelai-discord/\#conclusion) Conclusion Nice work! You’ve built an agentic RAG-powered Discord bot that retrieves relevant information with Qdrant, generates smart responses with OpenAI, and handles multi-step reasoning using CAMEL-AI. Here’s a quick recap: - **Smart Knowledge Retrieval:** Your chatbot can now pull relevant info from large datasets using Qdrant’s vector search. - **Autonomous Reasoning with CAMEL-AI:** Enables multi-step reasoning instead of just regurgitating text. - **Live Discord Deployment:** You launched the chatbot on Discord, making it interactive and ready to help real users. One of the biggest advantages of CAMEL-AI is the abstraction it provides, allowing you to focus on designing intelligent interactions rather than worrying about low-level implementation details. 
You’re now well-equipped to tackle more complex real-world problems that require scalable, autonomous knowledge systems. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/agentic-rag-camelai-discord.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/agentic-rag-camelai-discord.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-167-lllmstxt|> ## llama-index-multitenancy - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Multitenancy with LlamaIndex --- # [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#multitenancy-with-llamaindex) Multitenancy with LlamaIndex If you are building a service that serves vectors for many independent users, and you want to isolate their data, the best practice is to use a single collection with payload-based partitioning. This approach is called **multitenancy**. Our guide on the [Separate Partitions](https://qdrant.tech/documentation/guides/multiple-partitions/) describes how to set it up in general, but if you use [LlamaIndex](https://qdrant.tech/documentation/integrations/llama-index/) as a backend, you may prefer reading a more specific instruction. So here it is! ## [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#prerequisites) Prerequisites This tutorial assumes that you have already installed Qdrant and LlamaIndex. If you haven’t, please run the following commands: ```bash pip install llama-index llama-index-vector-stores-qdrant ``` We are going to use a local Docker-based instance of Qdrant. If you want to use a remote instance, please adjust the code accordingly. Here is how we can start a local instance: ```bash docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest ``` ## [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#setting-up-llamaindex-pipeline) Setting up LlamaIndex pipeline We are going to implement an end-to-end example of multitenant application using LlamaIndex. We’ll be indexing the documentation of different Python libraries, and we definitely don’t want any users to see the results coming from a library they are not interested in. In real case scenarios, this is even more dangerous, as the documents may contain sensitive information. ### [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#creating-vector-store) Creating vector store [QdrantVectorStore](https://docs.llamaindex.ai/en/stable/examples/vector_stores/QdrantIndexDemo.html) is a wrapper around Qdrant that provides all the necessary methods to work with your vector database in LlamaIndex. Let’s create a vector store for our collection. It requires setting a collection name and passing an instance of `QdrantClient`. 
```python from qdrant_client import QdrantClient from llama_index.vector_stores.qdrant import QdrantVectorStore client = QdrantClient("http://localhost:6333") vector_store = QdrantVectorStore( collection_name="my_collection", client=client, ) ``` ### [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#defining-chunking-strategy-and-embedding-model) Defining chunking strategy and embedding model Any semantic search application requires a way to convert text queries into vectors - an embedding model. `ServiceContext` is a bundle of commonly used resources used during the indexing and querying stage in any LlamaIndex application. We can also use it to set up an embedding model - in our case, a local [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5): ```python from llama_index.core import ServiceContext service_context = ServiceContext.from_defaults( embed_model="local:BAAI/bge-small-en-v1.5", ) ``` _Note:_ in case you are using a Large Language Model different from OpenAI’s ChatGPT, you should specify the `llm` parameter for `ServiceContext`. We can also control how our documents are split into chunks, or nodes in LlamaIndex’s terminology. The `SimpleNodeParser` splits documents into fixed-length chunks with an overlap. The defaults are reasonable, but we can also adjust them if we want to. Both values are defined in tokens. ```python from llama_index.core.node_parser import SimpleNodeParser node_parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=32) ``` Now we also need to inform the `ServiceContext` about our choices: ```python service_context = ServiceContext.from_defaults( embed_model="local:BAAI/bge-small-en-v1.5", node_parser=node_parser, ) ``` Both the embedding model and the selected node parser will be implicitly used during indexing and querying. ### [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#combining-everything-together) Combining everything together The last missing piece, before we can start indexing, is the `VectorStoreIndex`. It is a wrapper around `VectorStore` that provides a convenient interface for indexing and querying. It also requires a `ServiceContext` to be initialized. ```python from llama_index.core import VectorStoreIndex index = VectorStoreIndex.from_vector_store( vector_store=vector_store, service_context=service_context ) ``` ## [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#indexing-documents) Indexing documents No matter how our documents are generated, LlamaIndex will automatically split them into nodes, if required, encode them using the selected embedding model, and then store them in the vector store. Let’s define some documents manually and insert them into the Qdrant collection. Our documents are going to have a single metadata attribute - a library name they belong to.
```python from llama_index.core.schema import Document documents = [\ Document(\ text="LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models.",\ metadata={\ "library": "llama-index",\ },\ ),\ Document(\ text="Qdrant is a vector database & vector similarity search engine.",\ metadata={\ "library": "qdrant",\ },\ ),\ ] ``` Now we can index them using our `VectorStoreIndex`: ```python for document in documents: index.insert(document) ``` ### [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#performance-considerations) Performance considerations Our documents have been split into nodes, encoded using the embedding model, and stored in the vector store. However, we don’t want to allow our users to search for all the documents in the collection, but only for the documents that belong to a library they are interested in. For that reason, we need to set up the Qdrant [payload index](https://qdrant.tech/documentation/concepts/indexing/#payload-index), so the search is more efficient. ```python from qdrant_client import models client.create_payload_index( collection_name="my_collection", field_name="metadata.library", field_type=models.PayloadSchemaType.KEYWORD, ) ``` The payload index is not the only thing we want to change. Since none of the search queries will be executed on the whole collection, we can also change its configuration, so the HNSW graph is not built globally. This is also done due to [performance reasons](https://qdrant.tech/documentation/guides/multiple-partitions/#calibrate-performance). **You should not be changing these parameters, if you know there will be some global search operations** **done on the collection.** ```python client.update_collection( collection_name="my_collection", hnsw_config=models.HnswConfigDiff(payload_m=16, m=0), ) ``` Once both operations are completed, we can start searching for our documents. ## [Anchor](https://qdrant.tech/documentation/examples/llama-index-multitenancy/\#querying-documents-with-constraints) Querying documents with constraints Let’s assume we are searching for some information about large language models, but are only allowed to use Qdrant documentation. LlamaIndex has a concept of retrievers, responsible for finding the most relevant nodes for a given query. Our `VectorStoreIndex` can be used as a retriever, with some additional constraints - in our case value of the `library` metadata attribute. ```python from llama_index.core.vector_stores.types import MetadataFilters, ExactMatchFilter qdrant_retriever = index.as_retriever( filters=MetadataFilters( filters=[\ ExactMatchFilter(\ key="library",\ value="qdrant",\ )\ ] ) ) nodes_with_scores = qdrant_retriever.retrieve("large language models") for node in nodes_with_scores: print(node.text, node.score) --- # Output: Qdrant is a vector database & vector similarity search engine. 0.60551536 ``` The description of Qdrant was the best match, even though it didn’t mention large language models at all. However, it was the only document that belonged to the `qdrant` library, so there was no other choice. Let’s try to search for something that is not present in the collection. 
Let’s define another retriever, this time for the `llama-index` library: ```python llama_index_retriever = index.as_retriever( filters=MetadataFilters( filters=[\ ExactMatchFilter(\ key="library",\ value="llama-index",\ )\ ] ) ) nodes_with_scores = llama_index_retriever.retrieve("large language models") for node in nodes_with_scores: print(node.text, node.score) # Output: LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. 0.63576734 ``` The results returned by both retrievers are different, due to the different constraints, so we have implemented a real multitenant search application! ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/llama-index-multitenancy.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/llama-index-multitenancy.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-168-lllmstxt|> ## web-ui - [Documentation](https://qdrant.tech/documentation/) - Qdrant Web UI --- # [Anchor](https://qdrant.tech/documentation/web-ui/\#qdrant-web-ui) Qdrant Web UI You can manage both local and cloud Qdrant deployments through the Web UI. If you’ve set up a deployment locally with the Qdrant [Quickstart](https://qdrant.tech/documentation/quick-start/), navigate to http://localhost:6333/dashboard. If you’ve set up a deployment in a cloud cluster, find your Cluster URL in your cloud dashboard, at [https://cloud.qdrant.io](https://cloud.qdrant.io/). Add `:6333/dashboard` to the end of the URL. ## [Anchor](https://qdrant.tech/documentation/web-ui/\#access-the-web-ui) Access the Web UI Qdrant’s Web UI is an intuitive and efficient graphical interface for your Qdrant Collections, REST API and data points. In the **Console**, you may use the REST API to interact with Qdrant, while in **Collections**, you can manage all the collections and upload Snapshots. ![Qdrant Web UI](https://qdrant.tech/articles_data/qdrant-1.3.x/web-ui.png) ### [Anchor](https://qdrant.tech/documentation/web-ui/\#qdrant-web-ui-features) Qdrant Web UI features In the Qdrant Web UI, you can: - Run HTTP-based calls from the console - List and search existing [collections](https://qdrant.tech/documentation/concepts/collections/) - Learn from our interactive tutorial You can navigate to these options directly. For example, if you used our [quick start](https://qdrant.tech/documentation/quick-start/) to set up a cluster on localhost, you can review our tutorial at http://localhost:6333/dashboard#/tutorial. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/web-ui.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue.
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/web-ui.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-169-lllmstxt|> ## recommendation-system-ovhcloud - [Documentation](https://qdrant.tech/documentation/) - [Examples](https://qdrant.tech/documentation/examples/) - Movie Recommendation System --- # [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#movie-recommendation-system) Movie Recommendation System | Time: 120 min | Level: Advanced | Output: [GitHub](https://github.com/infoslack/qdrant-example/blob/main/HC-demo/HC-OVH.ipynb) | | | --- | --- | --- | --- | In this tutorial, you will build a mechanism that recommends movies based on defined preferences. Vector databases like Qdrant are good for storing high-dimensional data, such as user and item embeddings. They can enable personalized recommendations by quickly retrieving similar entries based on advanced indexing techniques. In this specific case, we will use [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) to create an efficient and accurate recommendation system. **Privacy and Sovereignty:** Since preference data is proprietary, it should be stored in a secure and controlled environment. Our vector database can easily be hosted on [OVHcloud](https://ovhcloud.com/), our trusted [Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/) partner. This means that Qdrant can be run from your OVHcloud region, but the database itself can still be managed from within Qdrant Cloud’s interface. Both products have been tested for compatibility and scalability, and we recommend their [managed Kubernetes](https://www.ovhcloud.com/en/public-cloud/kubernetes/) service. > To see the entire output, use our [notebook with complete instructions](https://github.com/infoslack/qdrant-example/blob/main/HC-demo/HC-OVH.ipynb). ## [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#components) Components - **Dataset:** The [MovieLens dataset](https://grouplens.org/datasets/movielens/) contains a list of movies and ratings given by users. - **Cloud:** [OVHcloud](https://ovhcloud.com/), with managed Kubernetes. - **Vector DB:** [Qdrant Hybrid Cloud](https://hybrid-cloud.qdrant.tech/) running on [OVHcloud](https://ovhcloud.com/). **Methodology:** We’re adopting a collaborative filtering approach to construct a recommendation system from the dataset provided. Collaborative filtering works on the premise that if two users share similar tastes, they’re likely to enjoy similar movies. Leveraging this concept, we’ll identify users whose ratings align closely with ours, and explore the movies they liked but we haven’t seen yet. To do this, we’ll represent each user’s ratings as a vector in a high-dimensional, sparse space. Using Qdrant, we’ll index these vectors and search for users whose ratings vectors closely match ours. Ultimately, we will see which movies were enjoyed by users similar to us. ![](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/architecture-diagram.png) ## [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#deploying-qdrant-hybrid-cloud-on-ovhcloud) Deploying Qdrant Hybrid Cloud on OVHcloud [Service Managed Kubernetes](https://www.ovhcloud.com/en-in/public-cloud/kubernetes/), powered by OVH Public Cloud Instances, a leading European cloud provider. 
With OVHcloud Load Balancers and disks built in. OVHcloud Managed Kubernetes provides high availability, compliance, and CNCF conformance, allowing you to focus on your containerized software layers with total reversibility. 1. To start using managed Kubernetes on OVHcloud, follow the [platform-specific documentation](https://qdrant.tech/documentation/hybrid-cloud/platform-deployment-options/#ovhcloud). 2. Once your Kubernetes clusters are up, [you can begin deploying Qdrant Hybrid Cloud](https://qdrant.tech/documentation/hybrid-cloud/). ## [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#prerequisites) Prerequisites Download and unzip the MovieLens dataset: ```shell mkdir -p data wget https://files.grouplens.org/datasets/movielens/ml-1m.zip unzip ml-1m.zip -d data ``` The necessary Python libraries are installed using `pip`, including `pandas` for data manipulation, `qdrant-client` for interfacing with Qdrant, and `python-dotenv` for managing environment variables. ```python !pip install -U \ pandas \ qdrant-client \ python-dotenv ``` The `.env` file is used to store sensitive information like the Qdrant host URL and API key securely. ```shell QDRANT_HOST QDRANT_API_KEY ``` Load all environment variables into the setup: ```python import os from dotenv import load_dotenv load_dotenv('./.env') ``` ## [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#implementation) Implementation Load the data from the MovieLens dataset into pandas DataFrames to facilitate data manipulation and analysis. ```python from qdrant_client import QdrantClient, models import pandas as pd ``` Load user data: ```python users = pd.read_csv( 'data/ml-1m/users.dat', sep='::', names=['user_id', 'gender', 'age', 'occupation', 'zip'], engine='python' ) users.head() ``` Add movies: ```python movies = pd.read_csv( 'data/ml-1m/movies.dat', sep='::', names=['movie_id', 'title', 'genres'], engine='python', encoding='latin-1' ) movies.head() ``` Finally, add the ratings: ```python ratings = pd.read_csv( 'data/ml-1m/ratings.dat', sep='::', names=['user_id', 'movie_id', 'rating', 'timestamp'], engine='python' ) ratings.head() ``` ### [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#normalize-the-ratings) Normalize the ratings Sparse vectors can take advantage of negative values, so we can normalize ratings to have a mean of 0 and a standard deviation of 1. This normalization ensures that ratings are consistent and centered around zero, enabling accurate similarity calculations. In this scenario, we can also take into account movies that we don’t like. ```python ratings.rating = (ratings.rating - ratings.rating.mean()) / ratings.rating.std() ``` To get the results: ```python ratings.head() ``` ### [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#data-preparation) Data preparation Now you will transform user ratings into sparse vectors, where each vector represents ratings for different movies. This step prepares the data for indexing in Qdrant. First, create a collection with configured sparse vectors. For sparse vectors, you don’t need to specify the dimension, because it’s extracted from the data automatically.
```python from collections import defaultdict user_sparse_vectors = defaultdict(lambda: {"values": [], "indices": []}) for row in ratings.itertuples(): user_sparse_vectors[row.user_id]["values"].append(row.rating) user_sparse_vectors[row.user_id]["indices"].append(row.movie_id) ``` Connect to Qdrant and create a collection called **movielens**: ```python client = QdrantClient( url = os.getenv("QDRANT_HOST"), api_key = os.getenv("QDRANT_API_KEY") ) client.create_collection( "movielens", vectors_config={}, sparse_vectors_config={ "ratings": models.SparseVectorParams() } ) ``` Upload user ratings to the **movielens** collection in Qdrant as sparse vectors, along with user metadata. This step populates the database with the necessary data for recommendation generation. ```python def data_generator(): for user in users.itertuples(): yield models.PointStruct( id=user.user_id, vector={ "ratings": user_sparse_vectors[user.user_id] }, payload=user._asdict() ) client.upload_points( "movielens", data_generator() ) ``` ## [Anchor](https://qdrant.tech/documentation/examples/recommendation-system-ovhcloud/\#recommendations) Recommendations Personal movie ratings are specified, where positive ratings indicate likes and negative ratings indicate dislikes. These ratings serve as the basis for finding similar users with comparable tastes. Personal ratings are converted into a sparse vector representation suitable for querying Qdrant. This vector represents the user’s preferences across different movies. Let’s try to recommend something for ourselves: ``` 1 = Like -1 = dislike ``` ```python --- # Search with movies[movies.title.str.contains("Matrix", case=False)]. my_ratings = { 2571: 1, # Matrix 329: 1, # Star Trek 260: 1, # Star Wars 2288: -1, # The Thing 1: 1, # Toy Story 1721: -1, # Titanic 296: -1, # Pulp Fiction 356: 1, # Forrest Gump 2116: 1, # Lord of the Rings 1291: -1, # Indiana Jones 1036: -1 # Die Hard } inverse_ratings = {k: -v for k, v in my_ratings.items()} def to_vector(ratings): vector = models.SparseVector( values=[], indices=[] ) for movie_id, rating in ratings.items(): vector.values.append(rating) vector.indices.append(movie_id) return vector ``` Query Qdrant to find users with similar tastes based on the provided personal ratings. The search returns a list of similar users along with their ratings, facilitating collaborative filtering. ```python results = client.query_points( "movielens", query=to_vector(my_ratings), using="ratings", with_vectors=True, # We will use those to find new movies limit=20 ).points ``` Movie scores are computed based on how frequently each movie appears in the ratings of similar users, weighted by their ratings. This step identifies popular movies among users with similar tastes. Calculate how frequently each movie is found in similar users’ ratings ```python def results_to_scores(results): movie_scores = defaultdict(lambda: 0) for user in results: user_scores = user.vector['ratings'] for idx, rating in zip(user_scores.indices, user_scores.values): if idx in my_ratings: continue movie_scores[idx] += rating return movie_scores ``` The top-rated movies are sorted based on their scores and printed as recommendations for the user. These recommendations are tailored to the user’s preferences and aligned with their tastes. 
Sort movies by score and print top five: ```python movie_scores = results_to_scores(results) top_movies = sorted(movie_scores.items(), key=lambda x: x[1], reverse=True) for movie_id, score in top_movies[:5]: print(movies[movies.movie_id == movie_id].title.values[0], score) ``` Result: ```text Star Wars: Episode V - The Empire Strikes Back (1980) 20.02387858 Star Wars: Episode VI - Return of the Jedi (1983) 16.443184379999998 Princess Bride, The (1987) 15.840068229999996 Raiders of the Lost Ark (1981) 14.94489462 Sixth Sense, The (1999) 14.570322149999999 ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/recommendation-system-ovhcloud.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/recommendation-system-ovhcloud.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-170-lllmstxt|> ## search - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Search --- # [Anchor](https://qdrant.tech/documentation/concepts/search/\#similarity-search) Similarity search Searching for the nearest vectors is at the core of many representational learning applications. Modern neural networks are trained to transform objects into vectors so that objects close in the real world appear close in vector space. It could be, for example, texts with similar meanings, visually similar pictures, or songs of the same genre. ![This is how vector similarity works](https://qdrant.tech/docs/encoders.png) This is how vector similarity works ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#query-api) Query API _Available as of v1.10.0_ Qdrant provides a single interface for all kinds of search and exploration requests - the `Query API`. Here is a reference list of what kind of queries you can perform with the `Query API` in Qdrant: Depending on the `query` parameter, Qdrant might prefer different strategies for the search. 
| | | | --- | --- | | Nearest Neighbors Search | Vector Similarity Search, also known as k-NN | | Search By Id | Search by an already stored vector - skip embedding model inference | | [Recommendations](https://qdrant.tech/documentation/concepts/explore/#recommendation-api) | Provide positive and negative examples | | [Discovery Search](https://qdrant.tech/documentation/concepts/explore/#discovery-api) | Guide the search using context as a one-shot training set | | [Scroll](https://qdrant.tech/documentation/concepts/points/#scroll-points) | Get all points with optional filtering | | [Grouping](https://qdrant.tech/documentation/concepts/search/#grouping-api) | Group results by a certain field | | [Order By](https://qdrant.tech/documentation/concepts/hybrid-queries/#re-ranking-with-stored-values) | Order points by payload key | | [Hybrid Search](https://qdrant.tech/documentation/concepts/hybrid-queries/#hybrid-search) | Combine multiple queries to get better results | | [Multi-Stage Search](https://qdrant.tech/documentation/concepts/hybrid-queries/#multi-stage-queries) | Optimize performance for large embeddings | | [Random Sampling](https://qdrant.tech/documentation/concepts/search/#random-sampling) | Get random points from the collection | **Nearest Neighbors Search** httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7] // <--- Dense vector } ``` ```python client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], # <--- Dense vector ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], // <--- Dense vector }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{Condition, Filter, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(Query::new_nearest(vec![0.2, 0.1, 0.9, 0.7])) ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collectionName}") .setQuery(nearest(List.of(0.2f, 0.1f, 0.9f, 0.7f))) .build()).get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), }) ``` **Search By Id** httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": "43cf51e2-8777-4f52-bc74-c2cbde0c8b04" // <--- point id } ``` ```python client.query_points( collection_name="{collection_name}", query="43cf51e2-8777-4f52-bc74-c2cbde0c8b04", # <--- point id ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: 
'43cf51e2-8777-4f52-bc74-c2cbde0c8b04', // <--- point id }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{Condition, Filter, PointId, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(Query::new_nearest(PointId::new("43cf51e2-8777-4f52-bc74-c2cbde0c8b04"))) ) .await?; ``` ```java import java.util.UUID; import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collectionName}") .setQuery(nearest(UUID.fromString("43cf51e2-8777-4f52-bc74-c2cbde0c8b04"))) .build()).get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: Guid.Parse("43cf51e2-8777-4f52-bc74-c2cbde0c8b04") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryID(qdrant.NewID("43cf51e2-8777-4f52-bc74-c2cbde0c8b04")), }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#metrics) Metrics There are many ways to estimate the similarity of vectors with each other. In Qdrant terms, these ways are called metrics. The choice of metric depends on the vectors obtained and, in particular, on the neural network encoder training method. Qdrant supports these most popular types of metrics: - Dot product: `Dot` \- [https://en.wikipedia.org/wiki/Dot\_product](https://en.wikipedia.org/wiki/Dot_product) - Cosine similarity: `Cosine` \- [https://en.wikipedia.org/wiki/Cosine\_similarity](https://en.wikipedia.org/wiki/Cosine_similarity) - Euclidean distance: `Euclid` \- [https://en.wikipedia.org/wiki/Euclidean\_distance](https://en.wikipedia.org/wiki/Euclidean_distance) - Manhattan distance: `Manhattan`\* \- [https://en.wikipedia.org/wiki/Taxicab\_geometry](https://en.wikipedia.org/wiki/Taxicab_geometry) _\*Available as of v1.7_ The most typical metric used in similarity learning models is the cosine metric. ![Embeddings](https://qdrant.tech/docs/cos.png) Qdrant computes this metric in two steps, which achieves a higher search speed. The first step is to normalize the vector when adding it to the collection. It happens only once for each vector. The second step is the comparison of vectors. In this case, it becomes equivalent to a dot product - a very fast operation due to SIMD. Depending on the query configuration, Qdrant might prefer different strategies for the search. Read more about it in the [query planning](https://qdrant.tech/documentation/concepts/search/#query-planning) section. ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#search-api) Search API Let’s look at an example of a search query.
REST API - API Schema definition is available [here](https://api.qdrant.tech/api-reference/search/query-points) httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.79], "filter": { "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ] }, "params": { "hnsw_ef": 128, "exact": false }, "limit": 3 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], query_filter=models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(\ value="London",\ ),\ )\ ] ), search_params=models.SearchParams(hnsw_ef=128, exact=False), limit=3, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], filter: { must: [\ {\ key: "city",\ match: {\ value: "London",\ },\ },\ ], }, params: { hnsw_ef: 128, exact: false, }, limit: 3, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, QueryPointsBuilder, SearchParamsBuilder}; use qdrant_client::Qdrant; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .filter(Filter::must([Condition::matches(\ "city",\ "London".to_string(),\ )])) .params(SearchParamsBuilder::default().hnsw_ef(128).exact(false)), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SearchParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setFilter(Filter.newBuilder().addMust(matchKeyword("city", "London")).build()) .setParams(SearchParams.newBuilder().setExact(false).setHnswEf(128).build()) .setLimit(3) .build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, filter: MatchKeyword("city", "London"), searchParams: new SearchParams { Exact = false, HnswEf = 128 }, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), }, }, Params: &qdrant.SearchParams{ Exact: qdrant.PtrOf(false), HnswEf: qdrant.PtrOf(uint64(128)), }, }) ``` In this example, we are looking for vectors similar to vector `[0.2, 0.1, 0.9, 0.7]`. Parameter `limit` (or its alias - `top`) specifies the amount of most similar results we would like to retrieve. Values under the key `params` specify custom parameters for the search. Currently, it could be: - `hnsw_ef` \- value that specifies `ef` parameter of the HNSW algorithm. 
- `exact` \- option to not use the approximate search (ANN). If set to true, the search may run for a long time, as it performs a full scan to retrieve exact results. - `indexed_only` \- With this option you can disable the search in those segments where the vector index is not built yet. This may be useful if you want to minimize the impact on search performance while the collection is also being updated. Using this option may lead to a partial result if the collection is not fully indexed yet; consider using it only if eventual consistency is acceptable for your use case (a short example using these parameters is shown further below). Since the `filter` parameter is specified, the search is performed only among those points that satisfy the filter condition. See details of possible filters and how they work in the [filtering](https://qdrant.tech/documentation/concepts/filtering/) section. An example result of this API would be: ```json { "result": [\ { "id": 10, "score": 0.81 },\ { "id": 14, "score": 0.75 },\ { "id": 11, "score": 0.73 }\ ], "status": "ok", "time": 0.001 } ``` The `result` contains a list of found point ids ordered by `score`. Note that payload and vector data are missing in these results by default. See [payload and vector in the result](https://qdrant.tech/documentation/concepts/search/#payload-and-vector-in-the-result) on how to include it. If the collection was created with multiple vectors, the name of the vector to use for searching should be provided: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "using": "image", "limit": 3 } ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], using="image", limit=3, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], using: "image", limit: 3, }); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .using("image"), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setUsing("image") .setLimit(3) .build()).get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, usingVector: "image", limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Using: qdrant.PtrOf("image"), }) ``` The search is processed only among vectors with the same name.
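Returning to the search parameters listed above, here is a minimal Python sketch of the `exact` and `indexed_only` options. The collection name and query vector are placeholders, and the sketch assumes your client version exposes `indexed_only` on `models.SearchParams`:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Exact (full-scan) search: slower, but bypasses the approximate HNSW index
client.query_points(
    collection_name="{collection_name}",
    query=[0.2, 0.1, 0.9, 0.7],
    search_params=models.SearchParams(exact=True),
    limit=3,
)

# Skip segments whose vector index is not built yet: keeps latency low while
# the collection is being updated, at the cost of possibly partial results
client.query_points(
    collection_name="{collection_name}",
    query=[0.2, 0.1, 0.9, 0.7],
    search_params=models.SearchParams(indexed_only=True),
    limit=3,
)
```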
If the collection was created with sparse vectors, the name of the sparse vector to use for searching should be provided: You can still use payload filtering and other features of the search API with sparse vectors. There are however important differences between dense and sparse vector search: | Index | Sparse Query | Dense Query | | --- | --- | --- | | Scoring Metric | Default is `Dot product`, no need to specify it | `Distance` has supported metrics e.g. Dot, Cosine | | Search Type | Always exact in Qdrant | HNSW is an approximate NN | | Return Behaviour | Returns only vectors with non-zero values in the same indices as the query vector | Returns `limit` vectors | In general, the speed of the search is proportional to the number of non-zero values in the query vector. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": { "indices": [1, 3, 5, 7], "values": [0.1, 0.2, 0.3, 0.4] }, "using": "text" } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") result = client.query_points( collection_name="{collection_name}", query=models.SparseVector(indices=[1, 3, 5, 7], values=[0.1, 0.2, 0.3, 0.4]), using="text", ).points ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: { indices: [1, 3, 5, 7], values: [0.1, 0.2, 0.3, 0.4] }, using: "text", limit: 3, }); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![(1, 0.2), (3, 0.1), (5, 0.9), (7, 0.7)]) .limit(10) .using("text"), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setUsing("text") .setQuery(nearest(List.of(0.1f, 0.2f, 0.3f, 0.4f), List.of(1, 3, 5, 7))) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new (float, uint)[] {(0.1f, 1), (0.2f, 3), (0.3f, 5), (0.4f, 7)}, usingVector: "text", limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuerySparse( []uint32{1, 3, 5, 7}, []float32{0.1, 0.2, 0.3, 0.4}), Using: qdrant.PtrOf("text"), }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/search/\#filtering-results-by-score) Filtering results by score In addition to payload filtering, it might be useful to filter out results with a low similarity score. For example, if you know the minimal acceptance score for your model and do not want any results which are less similar than the threshold. In this case, you can use `score_threshold` parameter of the search query. It will exclude all results with a score worse than the given. 
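As a brief illustration, here is a minimal Python sketch of `score_threshold` with `query_points`. The collection name, query vector, and the threshold value of `0.5` are placeholders chosen for illustration and should be calibrated for your model and metric:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Return at most 3 results, excluding anything scoring worse than 0.5.
# For similarity metrics (Cosine, Dot) this drops scores below the threshold;
# for distance metrics (Euclid, Manhattan) it drops scores above it.
client.query_points(
    collection_name="{collection_name}",
    query=[0.2, 0.1, 0.9, 0.7],
    score_threshold=0.5,
    limit=3,
)
```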
### [Anchor](https://qdrant.tech/documentation/concepts/search/\#payload-and-vector-in-the-result) Payload and vector in the result By default, retrieval methods do not return any stored information such as payload and vectors. Additional parameters `with_vectors` and `with_payload` alter this behavior. Example: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "with_vectors": true, "with_payload": true } ``` ```python client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], with_vectors=True, with_payload=True, ) ``` ```typescript client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], with_vector: true, with_payload: true, }); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .with_payload(true) .with_vectors(true), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.WithVectorsSelectorFactory; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.WithPayloadSelectorFactory.enable; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setWithPayload(enable(true)) .setWithVectors(WithVectorsSelectorFactory.enable(true)) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, payloadSelector: true, vectorsSelector: true, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), WithPayload: qdrant.NewWithPayload(true), WithVectors: qdrant.NewWithVectors(true), }) ``` You can use `with_payload` to scope to or filter a specific payload subset. 
You can even specify an array of items to include, such as `city`, `village`, and `town`: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "with_payload": ["city", "village", "town"] } ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], with_payload=["city", "village", "town"], ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], with_payload: ["city", "village", "town"], }); ``` ```rust use qdrant_client::qdrant::{with_payload_selector::SelectorOptions, QueryPointsBuilder}; use qdrant_client::Qdrant; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .with_payload(SelectorOptions::Include( vec![\ "city".to_string(),\ "village".to_string(),\ "town".to_string(),\ ] .into(), )) .with_vectors(true), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.WithPayloadSelectorFactory.include; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setWithPayload(include(List.of("city", "village", "town"))) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, payloadSelector: new WithPayloadSelector { Include = new PayloadIncludeSelector { Fields = { new string[] { "city", "village", "town" } } } }, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), WithPayload: qdrant.NewWithPayloadInclude("city", "village", "town"), }) ``` Or use `include` or `exclude` explicitly. 
For example, to exclude `city`: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "with_payload": { "exclude": ["city"] } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], with_payload=models.PayloadSelectorExclude( exclude=["city"], ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], with_payload: { exclude: ["city"], }, }); ``` ```rust use qdrant_client::qdrant::{with_payload_selector::SelectorOptions, QueryPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .with_payload(SelectorOptions::Exclude(vec!["city".to_string()].into())) .with_vectors(true), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.WithPayloadSelectorFactory.exclude; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setWithPayload(exclude(List.of("city"))) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, payloadSelector: new WithPayloadSelector { Exclude = new PayloadExcludeSelector { Fields = { new string[] { "city" } } } }, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), WithPayload: qdrant.NewWithPayloadExclude("city"), }) ``` It is possible to target nested fields using a dot notation: - `payload.nested_field` \- for a nested field - `payload.nested_array[].sub_field` \- for projecting nested fields within an array Accessing array elements by index is currently not supported. ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#batch-search-api) Batch search API The batch search API enables to perform multiple search requests via a single request. Its semantic is straightforward, `n` batched search requests are equivalent to `n` singular search requests. This approach has several advantages. Logically, fewer network connections are required which can be very beneficial on its own. More importantly, batched requests will be efficiently processed via the query planner which can detect and optimize requests if they have the same `filter`. This can have a great effect on latency for non trivial filters as the intermediary results can be shared among the request. In order to use it, simply pack together your search requests. All the regular attributes of a search request are of course available. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query/batch { "searches": [\ {\ "query": [0.2, 0.1, 0.9, 0.7],\ "filter": {\ "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ]\ },\ "limit": 3\ },\ {\ "query": [0.5, 0.3, 0.2, 0.3],\ "filter": {\ "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ]\ },\ "limit": 3\ }\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") filter_ = models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(\ value="London",\ ),\ )\ ] ) search_queries = [\ models.QueryRequest(query=[0.2, 0.1, 0.9, 0.7], filter=filter_, limit=3),\ models.QueryRequest(query=[0.5, 0.3, 0.2, 0.3], filter=filter_, limit=3),\ ] client.query_batch_points(collection_name="{collection_name}", requests=search_queries) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); const filter = { must: [\ {\ key: "city",\ match: {\ value: "London",\ },\ },\ ], }; const searches = [\ {\ query: [0.2, 0.1, 0.9, 0.7],\ filter,\ limit: 3,\ },\ {\ query: [0.5, 0.3, 0.2, 0.3],\ filter,\ limit: 3,\ },\ ]; client.queryBatch("{collection_name}", { searches, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, QueryBatchPointsBuilder, QueryPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let filter = Filter::must([Condition::matches("city", "London".to_string())]); let searches = vec![\ QueryPointsBuilder::new("{collection_name}")\ .query(vec![0.1, 0.2, 0.3, 0.4])\ .limit(3)\ .filter(filter.clone())\ .build(),\ QueryPointsBuilder::new("{collection_name}")\ .query(vec![0.5, 0.3, 0.2, 0.3])\ .limit(3)\ .filter(filter)\ .build(),\ ]; client .query_batch(QueryBatchPointsBuilder::new("{collection_name}", searches)) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.ConditionFactory.matchKeyword; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); Filter filter = Filter.newBuilder().addMust(matchKeyword("city", "London")).build(); List searches = List.of( QueryPoints.newBuilder() .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setFilter(filter) .setLimit(3) .build(), QueryPoints.newBuilder() .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setFilter(filter) .setLimit(3) .build()); client.queryBatchAsync("{collection_name}", searches).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); var filter = MatchKeyword("city", "London"); var queries = new List { new() { CollectionName = "{collection_name}", Query = new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, Filter = filter, Limit = 3 }, new() { CollectionName = "{collection_name}", Query = new float[] { 0.5f, 0.3f, 0.2f, 0.3f }, Filter = filter, Limit = 3 } }; await client.QueryBatchAsync(collectionName: "{collection_name}", queries: queries); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) filter := qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", 
"London"), }, } client.QueryBatch(context.Background(), &qdrant.QueryBatchPoints{ CollectionName: "{collection_name}", QueryPoints: []*qdrant.QueryPoints{ { CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Filter: &filter, }, { CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.5, 0.3, 0.2, 0.3), Filter: &filter, }, }, }) ``` The result of this API contains one array per search requests. ```json { "result": [\ [\ { "id": 10, "score": 0.81 },\ { "id": 14, "score": 0.75 },\ { "id": 11, "score": 0.73 }\ ],\ [\ { "id": 1, "score": 0.92 },\ { "id": 3, "score": 0.89 },\ { "id": 9, "score": 0.75 }\ ]\ ], "status": "ok", "time": 0.001 } ``` ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#query-by-id) Query by ID Whenever you need to use a vector as an input, you can always use a [point ID](https://qdrant.tech/documentation/concepts/points/#point-ids) instead. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": "43cf51e2-8777-4f52-bc74-c2cbde0c8b04" // <--- point id } ``` ```python client.query_points( collection_name="{collection_name}", query="43cf51e2-8777-4f52-bc74-c2cbde0c8b04", # <--- point id ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: '43cf51e2-8777-4f52-bc74-c2cbde0c8b04', // <--- point id }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{Condition, Filter, PointId, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(Query::new_nearest(PointId::new("43cf51e2-8777-4f52-bc74-c2cbde0c8b04"))) ) .await?; ``` ```java import java.util.UUID; import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collectionName}") .setQuery(nearest(UUID.fromString("43cf51e2-8777-4f52-bc74-c2cbde0c8b04"))) .build()).get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: Guid.Parse("43cf51e2-8777-4f52-bc74-c2cbde0c8b04") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryID(qdrant.NewID("43cf51e2-8777-4f52-bc74-c2cbde0c8b04")), }) ``` The above example will fetch the default vector from the point with this id, and use it as the query vector. If the `using` parameter is also specified, Qdrant will use the vector with that name. It is also possible to reference an ID from a different collection, by setting the `lookup_from` parameter. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": "43cf51e2-8777-4f52-bc74-c2cbde0c8b04", // <--- point id "using": "512d-vector" "lookup_from": { "collection": "another_collection", // <--- other collection name "vector": "image-512" // <--- vector name in the other collection } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query="43cf51e2-8777-4f52-bc74-c2cbde0c8b04", # <--- point id using="512d-vector", lookup_from=models.LookupLocation( collection="another_collection", # <--- other collection name vector="image-512", # <--- vector name in the other collection ) ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: '43cf51e2-8777-4f52-bc74-c2cbde0c8b04', // <--- point id using: '512d-vector', lookup_from: { collection: 'another_collection', // <--- other collection name vector: 'image-512', // <--- vector name in the other collection } }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{LookupLocationBuilder, PointId, Query, QueryPointsBuilder}; let client = Qdrant::from_url("http://localhost:6334").build()?; client.query( QueryPointsBuilder::new("{collection_name}") .query(Query::new_nearest("43cf51e2-8777-4f52-bc74-c2cbde0c8b04")) .using("512d-vector") .lookup_from( LookupLocationBuilder::new("another_collection") .vector_name("image-512") ) ).await?; ``` ```java import static io.qdrant.client.QueryFactory.nearest; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.LookupLocation; import io.qdrant.client.grpc.Points.QueryPoints; import java.util.UUID; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(UUID.fromString("43cf51e2-8777-4f52-bc74-c2cbde0c8b04"))) .setUsing("512d-vector") .setLookupFrom( LookupLocation.newBuilder() .setCollectionName("another_collection") .setVectorName("image-512") .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: Guid.Parse("43cf51e2-8777-4f52-bc74-c2cbde0c8b04"), // <--- point id usingVector: "512d-vector", lookupFrom: new() { CollectionName = "another_collection", // <--- other collection name VectorName = "image-512" // <--- vector name in the other collection } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryID(qdrant.NewID("43cf51e2-8777-4f52-bc74-c2cbde0c8b04")), Using: qdrant.PtrOf("512d-vector"), LookupFrom: &qdrant.LookupLocation{ CollectionName: "another_collection", VectorName: qdrant.PtrOf("image-512"), }, }) ``` In the case above, Qdrant will fetch the `"image-512"` vector from the specified point id in the collection `another_collection`. 
## [Anchor](https://qdrant.tech/documentation/concepts/search/\#pagination) Pagination Search and [recommendation](https://qdrant.tech/documentation/concepts/explore/#recommendation-api) APIs allow you to skip the first results of the search and return only results starting from a specified offset: Example: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "with_vectors": true, "with_payload": true, "limit": 10, "offset": 100 } ``` ```python from qdrant_client import QdrantClient client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], with_vectors=True, with_payload=True, limit=10, offset=100, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], with_vector: true, with_payload: true, limit: 10, offset: 100, }); ``` ```rust use qdrant_client::qdrant::QueryPointsBuilder; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .with_payload(true) .with_vectors(true) .limit(10) .offset(100), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.WithVectorsSelectorFactory; import io.qdrant.client.grpc.Points.QueryPoints; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.WithPayloadSelectorFactory.enable; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setWithPayload(enable(true)) .setWithVectors(WithVectorsSelectorFactory.enable(true)) .setLimit(10) .setOffset(100) .build()) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, payloadSelector: true, vectorsSelector: true, limit: 10, offset: 100 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), WithPayload: qdrant.NewWithPayload(true), WithVectors: qdrant.NewWithVectors(true), Offset: qdrant.PtrOf(uint64(100)), }) ``` This is equivalent to retrieving the 11th page with 10 records per page. Vector-based retrieval in general, and the HNSW index in particular, is not designed for pagination. It is impossible to retrieve the Nth closest vector without retrieving the first N vectors first. However, using the offset parameter saves resources by reducing network traffic and the number of storage accesses. Using an `offset` parameter requires internally retrieving `offset + limit` points, but payload and vector data are read from storage only for the points that are actually returned. ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#grouping-api) Grouping API It is possible to group results by a certain field.
This is useful when you have multiple points for the same item, and you want to avoid redundancy of the same item in the results. For example, if you have a large document split into multiple chunks, and you want to search or [recommend](https://qdrant.tech/documentation/concepts/explore/#recommendation-api) on a per-document basis, you can group the results by the document ID. Consider having points with the following payloads: ```json [\ {\ "id": 0,\ "payload": {\ "chunk_part": 0,\ "document_id": "a"\ },\ "vector": [0.91]\ },\ {\ "id": 1,\ "payload": {\ "chunk_part": 1,\ "document_id": ["a", "b"]\ },\ "vector": [0.8]\ },\ {\ "id": 2,\ "payload": {\ "chunk_part": 2,\ "document_id": "a"\ },\ "vector": [0.2]\ },\ {\ "id": 3,\ "payload": {\ "chunk_part": 0,\ "document_id": 123\ },\ "vector": [0.79]\ },\ {\ "id": 4,\ "payload": {\ "chunk_part": 1,\ "document_id": 123\ },\ "vector": [0.75]\ },\ {\ "id": 5,\ "payload": {\ "chunk_part": 0,\ "document_id": -10\ },\ "vector": [0.6]\ }\ ] ``` With the _**groups**_ API, you will be able to get the best _N_ points for each document, assuming that the payload of the points contains the document ID. Of course there will be times where the best _N_ points cannot be fulfilled due to lack of points or a big distance with respect to the query. In every case, the `group_size` is a best-effort parameter, akin to the `limit` parameter. ### [Anchor](https://qdrant.tech/documentation/concepts/search/\#search-groups) Search groups REST API ( [Schema](https://api.qdrant.tech/api-reference/search/query-points-groups)): httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query/groups { // Same as in the regular query API "query": [1.1], // Grouping parameters "group_by": "document_id", // Path of the field to group by "limit": 4, // Max amount of groups "group_size": 2 // Max amount of points per group } ``` ```python client.query_points_groups( collection_name="{collection_name}", # Same as in the regular query_points() API query=[1.1], # Grouping parameters group_by="document_id", # Path of the field to group by limit=4, # Max amount of groups group_size=2, # Max amount of points per group ) ``` ```typescript client.queryGroups("{collection_name}", { query: [1.1], group_by: "document_id", limit: 4, group_size: 2, }); ``` ```rust use qdrant_client::qdrant::QueryPointGroupsBuilder; client .query_groups( QueryPointGroupsBuilder::new("{collection_name}", "document_id") .query(vec![0.2, 0.1, 0.9, 0.7]) .group_size(2u64) .with_payload(true) .with_vectors(true) .limit(4u64), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.grpc.Points.SearchPointGroups; client.queryGroupsAsync( QueryPointGroups.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setGroupBy("document_id") .setLimit(4) .setGroupSize(2) .build()) .get(); ``` ```csharp using Qdrant.Client; var client = new QdrantClient("localhost", 6334); await client.QueryGroupsAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, groupBy: "document_id", limit: 4, groupSize: 2 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.QueryGroups(context.Background(), &qdrant.QueryPointGroups{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), GroupBy: "document_id", GroupSize: qdrant.PtrOf(uint64(2)), }) ``` The output of a _**groups**_ call looks like 
this: ```json { "result": { "groups": [\ {\ "id": "a",\ "hits": [\ { "id": 0, "score": 0.91 },\ { "id": 1, "score": 0.85 }\ ]\ },\ {\ "id": "b",\ "hits": [\ { "id": 1, "score": 0.85 }\ ]\ },\ {\ "id": 123,\ "hits": [\ { "id": 3, "score": 0.79 },\ { "id": 4, "score": 0.75 }\ ]\ },\ {\ "id": -10,\ "hits": [\ { "id": 5, "score": 0.6 }\ ]\ }\ ] }, "status": "ok", "time": 0.001 } ``` The groups are ordered by the score of the top point in the group. Inside each group the points are sorted too. If the `group_by` field of a point is an array (e.g. `"document_id": ["a", "b"]`), the point can be included in multiple groups (e.g. `"document_id": "a"` and `document_id: "b"`). **Limitations**: - Only [keyword](https://qdrant.tech/documentation/concepts/payload/#keyword) and [integer](https://qdrant.tech/documentation/concepts/payload/#integer) payload values are supported for the `group_by` parameter. Payload values with other types will be ignored. - At the moment, pagination is not enabled when using **groups**, so the `offset` parameter is not allowed. ### [Anchor](https://qdrant.tech/documentation/concepts/search/\#lookup-in-groups) Lookup in groups Having multiple points for parts of the same item often introduces redundancy in the stored data. Which may be fine if the information shared by the points is small, but it can become a problem if the payload is large, because it multiplies the storage space needed to store the points by a factor of the amount of points we have per group. One way of optimizing storage when using groups is to store the information shared by the points with the same group id in a single point in another collection. Then, when using the [**groups** API](https://qdrant.tech/documentation/concepts/search/#grouping-api), add the `with_lookup` parameter to bring the information from those points into each group. ![Group id matches point id](https://qdrant.tech/docs/lookup_id_linking.png) This has the extra benefit of having a single point to update when the information shared by the points in a group changes. For example, if you have a collection of documents, you may want to chunk them and store the points for the chunks in a separate collection, making sure that you store the point id from the document it belongs in the payload of the chunk point. 
In this case, to bring the information from the documents into the chunks grouped by the document id, you can use the `with_lookup` parameter: httppythontypescriptrustjavacsharpgo ```http POST /collections/chunks/points/query/groups { // Same as in the regular query API "query": [1.1], // Grouping parameters "group_by": "document_id", "limit": 2, "group_size": 2, // Lookup parameters "with_lookup": { // Name of the collection to look up points in "collection": "documents", // Options for specifying what to bring from the payload // of the looked up point, true by default "with_payload": ["title", "text"], // Options for specifying what to bring from the vector(s) // of the looked up point, true by default "with_vectors": false } } ``` ```python client.query_points_groups( collection_name="chunks", # Same as in the regular query_points() API query=[1.1], # Grouping parameters group_by="document_id", # Path of the field to group by limit=2, # Max amount of groups group_size=2, # Max amount of points per group # Lookup parameters with_lookup=models.WithLookup( # Name of the collection to look up points in collection="documents", # Options for specifying what to bring from the payload # of the looked up point, True by default with_payload=["title", "text"], # Options for specifying what to bring from the vector(s) # of the looked up point, True by default with_vectors=False, ), ) ``` ```typescript client.queryGroups("{collection_name}", { query: [1.1], group_by: "document_id", limit: 2, group_size: 2, with_lookup: { collection: "documents", with_payload: ["title", "text"], with_vectors: false, }, }); ``` ```rust use qdrant_client::qdrant::{with_payload_selector::SelectorOptions, QueryPointGroupsBuilder, WithLookupBuilder}; client .query_groups( QueryPointGroupsBuilder::new("{collection_name}", "document_id") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(2u64) .group_size(2u64) .with_lookup( WithLookupBuilder::new("documents") .with_payload(SelectorOptions::Include( vec!["title".to_string(), "text".to_string()].into(), )) .with_vectors(false), ), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.grpc.Points.QueryPointGroups; import io.qdrant.client.grpc.Points.WithLookup; import static io.qdrant.client.QueryFactory.nearest; import static io.qdrant.client.WithVectorsSelectorFactory.enable; import static io.qdrant.client.WithPayloadSelectorFactory.include; client.queryGroupsAsync( QueryPointGroups.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setGroupBy("document_id") .setLimit(2) .setGroupSize(2) .setWithLookup( WithLookup.newBuilder() .setCollection("documents") .setWithPayload(include(List.of("title", "text"))) .setWithVectors(enable(false)) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.SearchGroupsAsync( collectionName: "{collection_name}", vector: new float[] { 0.2f, 0.1f, 0.9f, 0.7f}, groupBy: "document_id", limit: 2, groupSize: 2, withLookup: new WithLookup { Collection = "documents", WithPayload = new WithPayloadSelector { Include = new PayloadIncludeSelector { Fields = { new string[] { "title", "text" } } } }, WithVectors = false } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.QueryGroups(context.Background(), &qdrant.QueryPointGroups{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7),
GroupBy: "document_id", GroupSize: qdrant.PtrOf(uint64(2)), WithLookup: &qdrant.WithLookup{ Collection: "documents", WithPayload: qdrant.NewWithPayloadInclude("title", "text"), }, }) ``` For the `with_lookup` parameter, you can also use the shorthand `with_lookup="documents"` to bring the whole payload and vector(s) without explicitly specifying it. The looked up result will show up under `lookup` in each group. ```json { "result": { "groups": [\ {\ "id": 1,\ "hits": [\ { "id": 0, "score": 0.91 },\ { "id": 1, "score": 0.85 }\ ],\ "lookup": {\ "id": 1,\ "payload": {\ "title": "Document A",\ "text": "This is document A"\ }\ }\ },\ {\ "id": 2,\ "hits": [\ { "id": 1, "score": 0.85 }\ ],\ "lookup": {\ "id": 2,\ "payload": {\ "title": "Document B",\ "text": "This is document B"\ }\ }\ }\ ] }, "status": "ok", "time": 0.001 } ``` Since the lookup is done by matching directly with the point id, the lookup collection must be pre-populated with points where the `id` matches the `group_by` value (e.g., document\_id) from your primary collection. Any group id that is not an existing (and valid) point id in the lookup collection will be ignored, and the `lookup` field will be empty. ## [Anchor](https://qdrant.tech/documentation/concepts/search/\#random-sampling) Random Sampling _Available as of v1.11.0_ In some cases, it might be useful to retrieve a random sample of points from the collection, for example for debugging, testing, or providing entry points for exploration. The random sampling API is part of the [Universal Query API](https://qdrant.tech/documentation/concepts/search/#query-api) and can be used in the same way as the regular search API. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": { "sample": "random" } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") sampled = client.query_points( collection_name="{collection_name}", query=models.SampleQuery(sample=models.Sample.RANDOM) ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); const sampled = await client.query("{collection_name}", { query: { sample: "random", }, }); ``` ```rust use qdrant_client::Qdrant; use qdrant_client::qdrant::{Query, QueryPointsBuilder, Sample}; let client = Qdrant::from_url("http://localhost:6334").build()?; let sampled = client .query( QueryPointsBuilder::new("{collection_name}") .query(Query::new_sample(Sample::Random)) ) .await?; ``` ```java import static io.qdrant.client.QueryFactory.sample; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.Sample; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(sample(Sample.Random)) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync(collectionName: "{collection_name}", query: Sample.Random); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuerySample(qdrant.Sample_Random), }) ```
## [Anchor](https://qdrant.tech/documentation/concepts/search/\#query-planning) Query planning Depending on the filter used in the search, there are several possible scenarios for query execution. Qdrant chooses one of the query execution options depending on the available indexes, the complexity of the conditions and the cardinality of the filtering result. This process is called query planning. The strategy selection process relies heavily on heuristics and can vary from release to release. However, the general principles are: - planning is performed for each segment independently (see [storage](https://qdrant.tech/documentation/concepts/storage/) for more information about segments) - prefer a full scan if the number of points is below a threshold - estimate the cardinality of a filtered result before selecting a strategy - retrieve points using payload index (see [indexing](https://qdrant.tech/documentation/concepts/indexing/)) if cardinality is below threshold - use filterable vector index if the cardinality is above a threshold You can adjust the threshold using a [configuration file](https://github.com/qdrant/qdrant/blob/master/config/config.yaml), as well as independently for each collection. <|page-171-lllmstxt|> ## quantization - [Documentation](https://qdrant.tech/documentation/) - [Guides](https://qdrant.tech/documentation/guides/) - Quantization --- # [Anchor](https://qdrant.tech/documentation/guides/quantization/\#quantization) Quantization Quantization is an optional feature in Qdrant that enables efficient storage and search of high-dimensional vectors. By transforming original vectors into new representations, quantization compresses data while preserving close to original relative distances between vectors. Different quantization methods have different mechanics and tradeoffs. We will cover them in this section. Quantization is primarily used to reduce the memory footprint and accelerate the search process in high-dimensional vector spaces. In the context of Qdrant, quantization allows you to optimize the search engine for specific use cases, striking a balance between accuracy, storage efficiency, and search speed. There are tradeoffs associated with quantization. On the one hand, quantization allows for significant reductions in storage requirements and faster search times. This can be particularly beneficial in large-scale applications where minimizing the use of resources is a top priority. On the other hand, quantization introduces an approximation error, which can lead to a slight decrease in search quality. The level of this tradeoff depends on the quantization method and its parameters, as well as the characteristics of the data.
## [Anchor](https://qdrant.tech/documentation/guides/quantization/\#scalar-quantization) Scalar Quantization _Available as of v1.1.0_ Scalar quantization, in the context of vector search engines, is a compression technique that compresses vectors by reducing the number of bits used to represent each vector component. For instance, Qdrant uses 32-bit floating numbers to represent the original vector components. Scalar quantization allows you to reduce the number of bits used to 8. In other words, Qdrant performs `float32 -> uint8` conversion for each vector component. Effectively, this means that the amount of memory required to store a vector is reduced by a factor of 4. In addition to reducing the memory footprint, scalar quantization also speeds up the search process. Qdrant uses a special SIMD CPU instruction to perform fast vector comparison. This instruction works with 8-bit integers, so the conversion to `uint8` allows Qdrant to perform the comparison faster. The main drawback of scalar quantization is the loss of accuracy. The `float32 -> uint8` conversion introduces an error that can lead to a slight decrease in search quality. However, this error is usually negligible, and tends to be less significant for high-dimensional vectors. In our experiments, we found that the error introduced by scalar quantization is usually less than 1%. However, this value depends on the data and the quantization parameters. Please refer to the [Quantization Tips](https://qdrant.tech/documentation/guides/quantization/#quantization-tips) section for more information on how to optimize the quantization parameters for your use case. ## [Anchor](https://qdrant.tech/documentation/guides/quantization/\#binary-quantization) Binary Quantization _Available as of v1.5.0_ Binary quantization is an extreme case of scalar quantization. This feature lets you represent each vector component as a single bit, effectively reducing the memory footprint by a **factor of 32**. This is the fastest quantization method, since it lets you perform a vector comparison with a few CPU instructions. Binary quantization can achieve up to a **40x** speedup compared to the original vectors. However, binary quantization is only efficient for high-dimensional vectors and require a centered distribution of vector components. At the moment, binary quantization shows good accuracy results with the following models: - OpenAI `text-embedding-ada-002` \- 1536d tested with [dbpedia dataset](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) achieving 0.98 recall@100 with 4x oversampling - Cohere AI `embed-english-v2.0` \- 4096d tested on Wikipedia embeddings - 0.98 recall@50 with 2x oversampling Models with a lower dimensionality or a different distribution of vector components may require additional experiments to find the optimal quantization parameters. We recommend using binary quantization only with rescoring enabled, as it can significantly improve the search quality with just a minor performance impact. Additionally, oversampling can be used to tune the tradeoff between search speed and search quality in the query time. ### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#binary-quantization-as-hamming-distance) Binary Quantization as Hamming Distance The additional benefit of this method is that you can efficiently emulate Hamming distance with dot product. 
Specifically, if original vectors contain `{-1, 1}` as possible values, then the dot product of two vectors is equal to the Hamming distance by simply replacing `-1` with `0` and `1` with `1`. **Sample truth table** | Vector 1 | Vector 2 | Dot product | | --- | --- | --- | | 1 | 1 | 1 | | 1 | -1 | -1 | | -1 | 1 | -1 | | -1 | -1 | 1 | | Vector 1 | Vector 2 | Hamming distance | | --- | --- | --- | | 1 | 1 | 0 | | 1 | 0 | 1 | | 0 | 1 | 1 | | 0 | 0 | 0 | As you can see, both functions are equal up to a constant factor, which makes similarity search equivalent. Binary quantization makes it efficient to compare vectors using this representation. ## [Anchor](https://qdrant.tech/documentation/guides/quantization/\#product-quantization) Product Quantization _Available as of v1.2.0_ Product quantization is a method of compressing vectors to minimize their memory usage by dividing them into chunks and quantizing each segment individually. Each chunk is approximated by a centroid index that represents the original vector component. The positions of the centroids are determined through the utilization of a clustering algorithm such as k-means. For now, Qdrant uses only 256 centroids, so each centroid index can be represented by a single byte. Product quantization can compress by a more prominent factor than a scalar one. But there are some tradeoffs. Product quantization distance calculations are not SIMD-friendly, so it is slower than scalar quantization. Also, product quantization has a loss of accuracy, so it is recommended to use it only for high-dimensional vectors. Please refer to the [Quantization Tips](https://qdrant.tech/documentation/guides/quantization/#quantization-tips) section for more information on how to optimize the quantization parameters for your use case. ## [Anchor](https://qdrant.tech/documentation/guides/quantization/\#how-to-choose-the-right-quantization-method) How to choose the right quantization method Here is a brief table of the pros and cons of each quantization method: | Quantization method | Accuracy | Speed | Compression | | --- | --- | --- | --- | | Scalar | 0.99 | up to x2 | 4 | | Product | 0.7 | 0.5 | up to 64 | | Binary | 0.95\* | up to x40 | 32 | `*` \- for compatible models - **Binary Quantization** is the fastest method and the most memory-efficient, but it requires a centered distribution of vector components. It is recommended to use with tested models only. - **Scalar Quantization** is the most universal method, as it provides a good balance between accuracy, speed, and compression. It is recommended as default quantization if binary quantization is not applicable. - **Product Quantization** may provide a better compression ratio, but it has a significant loss of accuracy and is slower than scalar quantization. It is recommended if the memory footprint is the top priority and the search speed is not critical. ## [Anchor](https://qdrant.tech/documentation/guides/quantization/\#setting-up-quantization-in-qdrant) Setting up Quantization in Qdrant You can configure quantization for a collection by specifying the quantization parameters in the `quantization_config` section of the collection configuration. Quantization will be automatically applied to all vectors during the indexation process. Quantized vectors are stored alongside the original vectors in the collection, so you will still have access to the original vectors if you need them. _Available as of v1.1.1_ The `quantization_config` can also be set on a per vector basis by specifying it in a named vector. 
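To illustrate the per-vector option mentioned above, here is a minimal Python sketch that applies scalar quantization to only one of two named vectors when creating a collection. The vector names `text` and `image` and their sizes are placeholder values chosen for this example, not part of the reference above, and exact parameter support may vary by client version.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    vectors_config={
        # Scalar quantization is configured only for the "text" named vector
        "text": models.VectorParams(
            size=768,
            distance=models.Distance.COSINE,
            quantization_config=models.ScalarQuantization(
                scalar=models.ScalarQuantizationConfig(
                    type=models.ScalarType.INT8,
                    always_ram=True,
                ),
            ),
        ),
        # The "image" named vector keeps the original float32 representation
        "image": models.VectorParams(size=512, distance=models.Distance.COSINE),
    },
)
```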
### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#setting-up-scalar-quantization) Setting up Scalar Quantization To enable scalar quantization, you need to specify the quantization parameters in the `quantization_config` section of the collection configuration. When enabling scalar quantization on an existing collection, use a PATCH request or the corresponding `update_collection` method and omit the vector configuration, as it’s already defined. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "quantization_config": { "scalar": { "type": "int8", "quantile": 0.99, "always_ram": true } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, quantile=0.99, always_ram=True, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, quantization_config: { scalar: { type: "int8", quantile: 0.99, always_ram: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, QuantizationType, ScalarQuantizationBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .quantization_config( ScalarQuantizationBuilder::default() .r#type(QuantizationType::Int8.into()) .quantile(0.99) .always_ram(true), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.QuantizationType; import io.qdrant.client.grpc.Collections.ScalarQuantization; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setScalar( ScalarQuantization.newBuilder() .setType(QuantizationType.Int8) .setQuantile(0.99f) .setAlwaysRam(true) .build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, quantizationConfig: new QuantizationConfig { Scalar = new ScalarQuantization { Type = QuantizationType.Int8, Quantile = 0.99f, AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: 
"localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), QuantizationConfig: qdrant.NewQuantizationScalar( &qdrant.ScalarQuantization{ Type: qdrant.QuantizationType_Int8, Quantile: qdrant.PtrOf(float32(0.99)), AlwaysRam: qdrant.PtrOf(true), }, ), }) ``` There are 3 parameters that you can specify in the `quantization_config` section: `type` \- the type of the quantized vector components. Currently, Qdrant supports only `int8`. `quantile` \- the quantile of the quantized vector components. The quantile is used to calculate the quantization bounds. For instance, if you specify `0.99` as the quantile, 1% of extreme values will be excluded from the quantization bounds. Using quantiles lower than `1.0` might be useful if there are outliers in your vector components. This parameter only affects the resulting precision and not the memory footprint. It might be worth tuning this parameter if you experience a significant decrease in search quality. `always_ram` \- whether to keep quantized vectors always cached in RAM or not. By default, quantized vectors are loaded in the same way as the original vectors. However, in some setups you might want to keep quantized vectors in RAM to speed up the search process. In this case, you can set `always_ram` to `true` to store quantized vectors in RAM. ### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#setting-up-binary-quantization) Setting up Binary Quantization To enable binary quantization, you need to specify the quantization parameters in the `quantization_config` section of the collection configuration. When enabling binary quantization on an existing collection, use a PATCH request or the corresponding `update_collection` method and omit the vector configuration, as it’s already defined. 
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 1536, "distance": "Cosine" }, "quantization_config": { "binary": { "always_ram": true } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE), quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 1536, distance: "Cosine", }, quantization_config: { binary: { always_ram: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ BinaryQuantizationBuilder, CreateCollectionBuilder, Distance, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(1536, Distance::Cosine)) .quantization_config(BinaryQuantizationBuilder::new(true)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.BinaryQuantization; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(1536) .setDistance(Distance.Cosine) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setBinary(BinaryQuantization.newBuilder().setAlwaysRam(true).build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 1536, Distance = Distance.Cosine }, quantizationConfig: new QuantizationConfig { Binary = new BinaryQuantization { AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 1536, Distance: qdrant.Distance_Cosine, }), QuantizationConfig: qdrant.NewQuantizationBinary( &qdrant.BinaryQuantization{ AlwaysRam: qdrant.PtrOf(true), }, ), }) ``` `always_ram` \- whether to keep quantized vectors always cached in RAM or not. By default, quantized vectors are loaded in the same way as the original vectors. However, in some setups you might want to keep quantized vectors in RAM to speed up the search process. In this case, you can set `always_ram` to `true` to store quantized vectors in RAM. 
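As noted above, quantization can also be enabled on a collection that already exists by updating it instead of recreating it. The following is a minimal Python sketch of that flow for binary quantization, assuming the collection and its vector configuration are already in place; the update call mirrors the PATCH request described earlier, and exact parameter names may differ slightly between client versions.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Enable binary quantization on an existing collection.
# The vector configuration is omitted because it is already defined.
client.update_collection(
    collection_name="{collection_name}",
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)
```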
### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#setting-up-product-quantization) Setting up Product Quantization To enable product quantization, you need to specify the quantization parameters in the `quantization_config` section of the collection configuration. When enabling product quantization on an existing collection, use a PATCH request or the corresponding `update_collection` method and omit the vector configuration, as it’s already defined. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "quantization_config": { "product": { "compression": "x16", "always_ram": true } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), quantization_config=models.ProductQuantization( product=models.ProductQuantizationConfig( compression=models.CompressionRatio.X16, always_ram=True, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, quantization_config: { product: { compression: "x16", always_ram: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ CompressionRatio, CreateCollectionBuilder, Distance, ProductQuantizationBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .quantization_config( ProductQuantizationBuilder::new(CompressionRatio::X16.into()).always_ram(true), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CompressionRatio; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.ProductQuantization; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setProduct( ProductQuantization.newBuilder() .setCompression(CompressionRatio.x16) .setAlwaysRam(true) .build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, quantizationConfig: new QuantizationConfig { Product = new ProductQuantization { Compression = CompressionRatio.X16, AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), 
&qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), QuantizationConfig: qdrant.NewQuantizationProduct( &qdrant.ProductQuantization{ Compression: qdrant.CompressionRatio_x16, AlwaysRam: qdrant.PtrOf(true), }, ), }) ``` There are two parameters that you can specify in the `quantization_config` section: `compression` \- compression ratio. Compression ratio represents the size of the quantized vector in bytes divided by the size of the original vector in bytes. In this case, the quantized vector will be 16 times smaller than the original vector. `always_ram` \- whether to keep quantized vectors always cached in RAM or not. By default, quantized vectors are loaded in the same way as the original vectors. However, in some setups you might want to keep quantized vectors in RAM to speed up the search process. Then set `always_ram` to `true`. ### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#searching-with-quantization) Searching with Quantization Once you have configured quantization for a collection, you don’t need to do anything extra to search with quantization. Qdrant will automatically use quantized vectors if they are available. However, there are a few options that you can use to control the search process: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "params": { "quantization": { "ignore": false, "rescore": true, "oversampling": 2.0 } }, "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], search_params=models.SearchParams( quantization=models.QuantizationSearchParams( ignore=False, rescore=True, oversampling=2.0, ) ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], params: { quantization: { ignore: false, rescore: true, oversampling: 2.0, }, }, limit: 10, }); ``` ```rust use qdrant_client::qdrant::{ QuantizationSearchParamsBuilder, QueryPointsBuilder, SearchParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(10) .params( SearchParamsBuilder::default().quantization( QuantizationSearchParamsBuilder::default() .ignore(false) .rescore(true) .oversampling(2.0), ), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QuantizationSearchParams; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SearchParams; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setParams( SearchParams.newBuilder() .setQuantization( QuantizationSearchParams.newBuilder() .setIgnore(false) .setRescore(true) .setOversampling(2.0) .build()) .build()) .setLimit(10) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 
6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, searchParams: new SearchParams { Quantization = new QuantizationSearchParams { Ignore = false, Rescore = true, Oversampling = 2.0 } }, limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Params: &qdrant.SearchParams{ Quantization: &qdrant.QuantizationSearchParams{ Ignore: qdrant.PtrOf(false), Rescore: qdrant.PtrOf(true), Oversampling: qdrant.PtrOf(2.0), }, }, }) ``` `ignore` \- Toggle whether to ignore quantized vectors during the search process. By default, Qdrant will use quantized vectors if they are available. `rescore` \- Having the original vectors available, Qdrant can re-evaluate top-k search results using the original vectors. This can improve the search quality, but may slightly decrease the search speed, compared to the search without rescore. It is recommended to disable rescore only if the original vectors are stored on a slow storage (e.g. HDD or network storage). By default, rescore is enabled. **Available as of v1.3.0** `oversampling` \- Defines how many extra vectors should be pre-selected using quantized index, and then re-scored using original vectors. For example, if oversampling is 2.4 and limit is 100, then 240 vectors will be pre-selected using quantized index, and then top-100 will be returned after re-scoring. Oversampling is useful if you want to tune the tradeoff between search speed and search quality in the query time. ## [Anchor](https://qdrant.tech/documentation/guides/quantization/\#quantization-tips) Quantization tips #### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#accuracy-tuning) Accuracy tuning In this section, we will discuss how to tune the search precision. The fastest way to understand the impact of quantization on the search quality is to compare the search results with and without quantization. 
In order to disable quantization, you can set `ignore` to `true` in the search request: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "params": { "quantization": { "ignore": true } }, "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], search_params=models.SearchParams( quantization=models.QuantizationSearchParams( ignore=True, ) ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], params: { quantization: { ignore: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ QuantizationSearchParamsBuilder, QueryPointsBuilder, SearchParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .params( SearchParamsBuilder::default() .quantization(QuantizationSearchParamsBuilder::default().ignore(true)), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QuantizationSearchParams; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SearchParams; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setParams( SearchParams.newBuilder() .setQuantization( QuantizationSearchParams.newBuilder().setIgnore(true).build()) .build()) .setLimit(10) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, searchParams: new SearchParams { Quantization = new QuantizationSearchParams { Ignore = true } }, limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Params: &qdrant.SearchParams{ Quantization: &qdrant.QuantizationSearchParams{ Ignore: qdrant.PtrOf(true), }, }, }) ``` - **Adjust the quantile parameter**: The quantile parameter in scalar quantization determines the quantization bounds. By setting it to a value lower than 1.0, you can exclude extreme values (outliers) from the quantization bounds. For example, if you set the quantile to 0.99, 1% of the extreme values will be excluded. By adjusting the quantile, you can find an optimal value that will provide the best search quality for your collection. - **Enable rescore**: Having the original vectors available, Qdrant can re-evaluate top-k search results using the original vectors. On large collections, this can improve the search quality, with just a minor performance impact.
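To make the with/without comparison described at the start of this section concrete, here is a minimal Python sketch that runs the same query twice, once ignoring quantized vectors and once using them with rescoring, and reports how many of the top results overlap. The query vector and limit are placeholders; adapt them to your own collection.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

query_vector = [0.2, 0.1, 0.9, 0.7]  # placeholder query vector
limit = 10

# Baseline: ignore quantized vectors entirely
original = client.query_points(
    collection_name="{collection_name}",
    query=query_vector,
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(ignore=True)
    ),
    limit=limit,
)

# Quantized search with rescoring enabled
quantized = client.query_points(
    collection_name="{collection_name}",
    query=query_vector,
    search_params=models.SearchParams(
        quantization=models.QuantizationSearchParams(ignore=False, rescore=True)
    ),
    limit=limit,
)

original_ids = {point.id for point in original.points}
quantized_ids = {point.id for point in quantized.points}

# Fraction of baseline top-k results preserved by the quantized search
overlap = len(original_ids & quantized_ids) / limit
print(f"top-{limit} overlap: {overlap:.2f}")
```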
#### [Anchor](https://qdrant.tech/documentation/guides/quantization/\#memory-and-speed-tuning) Memory and speed tuning In this section, we will discuss how to tune the memory and speed of the search process with quantization. There are 3 possible modes to place storage of vectors within the qdrant collection: - **All in RAM** \- all vector, original and quantized, are loaded and kept in RAM. This is the fastest mode, but requires a lot of RAM. Enabled by default. - **Original on Disk, quantized in RAM** \- this is a hybrid mode, allows to obtain a good balance between speed and memory usage. Recommended scenario if you are aiming to shrink the memory footprint while keeping the search speed. This mode is enabled by setting `always_ram` to `true` in the quantization config while using memmap storage: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine", "on_disk": true }, "quantization_config": { "scalar": { "type": "int8", "always_ram": true } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE, on_disk=True), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, always_ram=True, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", on_disk: true, }, quantization_config: { scalar: { type: "int8", always_ram: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, QuantizationType, ScalarQuantizationBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .quantization_config( ScalarQuantizationBuilder::default() .r#type(QuantizationType::Int8.into()) .always_ram(true), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.QuantizationType; import io.qdrant.client.grpc.Collections.ScalarQuantization; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .setOnDisk(true) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setScalar( ScalarQuantization.newBuilder() .setType(QuantizationType.Int8) .setAlwaysRam(true) .build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( 
collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine, OnDisk = true }, quantizationConfig: new QuantizationConfig { Scalar = new ScalarQuantization { Type = QuantizationType.Int8, AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, OnDisk: qdrant.PtrOf(true), }), QuantizationConfig: qdrant.NewQuantizationScalar(&qdrant.ScalarQuantization{ Type: qdrant.QuantizationType_Int8, AlwaysRam: qdrant.PtrOf(true), }), }) ``` In this scenario, the number of disk reads may play a significant role in the search speed. In a system with high disk latency, the re-scoring step may become a bottleneck. Consider disabling `rescore` to improve the search speed: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "params": { "quantization": { "rescore": false } }, "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], search_params=models.SearchParams( quantization=models.QuantizationSearchParams(rescore=False) ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], params: { quantization: { rescore: false, }, }, }); ``` ```rust use qdrant_client::qdrant::{ QuantizationSearchParamsBuilder, QueryPointsBuilder, SearchParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .params( SearchParamsBuilder::default() .quantization(QuantizationSearchParamsBuilder::default().rescore(false)), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QuantizationSearchParams; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SearchParams; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setParams( SearchParams.newBuilder() .setQuantization( QuantizationSearchParams.newBuilder().setRescore(false).build()) .build()) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, searchParams: new SearchParams { Quantization = new QuantizationSearchParams { Rescore = false } }, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), 
Params: &qdrant.SearchParams{ Quantization: &qdrant.QuantizationSearchParams{ Rescore: qdrant.PtrOf(false), }, }, }) ``` - **All on Disk** \- all vectors, original and quantized, are stored on disk. This mode allows to achieve the smallest memory footprint, but at the cost of the search speed. It is recommended to use this mode if you have a large collection and fast storage (e.g. SSD or NVMe). This mode is enabled by setting `always_ram` to `false` in the quantization config while using mmap storage: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine", "on_disk": true }, "quantization_config": { "scalar": { "type": "int8", "always_ram": false } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE, on_disk=True), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, always_ram=False, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", on_disk: true, }, quantization_config: { scalar: { type: "int8", always_ram: false, }, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, QuantizationType, ScalarQuantizationBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine).on_disk(true)) .quantization_config( ScalarQuantizationBuilder::default() .r#type(QuantizationType::Int8.into()) .always_ram(false), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.QuantizationType; import io.qdrant.client.grpc.Collections.ScalarQuantization; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .setOnDisk(true) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setScalar( ScalarQuantization.newBuilder() .setType(QuantizationType.Int8) .setAlwaysRam(false) .build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine, OnDisk = true}, quantizationConfig: new QuantizationConfig { Scalar = new ScalarQuantization { Type = QuantizationType.Int8, AlwaysRam = false } } ); ``` ```go import ( "context" 
"github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, OnDisk: qdrant.PtrOf(true), }), QuantizationConfig: qdrant.NewQuantizationScalar( &qdrant.ScalarQuantization{ Type: qdrant.QuantizationType_Int8, AlwaysRam: qdrant.PtrOf(false), }, ), }) ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/quantization.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/quantization.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-172-lllmstxt|> ## vector-similarity-beyond-search - [Articles](https://qdrant.tech/articles/) - Vector Similarity: Going Beyond Full-Text Search \| Qdrant [Back to Data Exploration](https://qdrant.tech/articles/data-exploration/) --- # Vector Similarity: Going Beyond Full-Text Search \| Qdrant Luis Cossío · August 08, 2023 ![Vector Similarity: Going Beyond Full-Text Search | Qdrant](https://qdrant.tech/articles_data/vector-similarity-beyond-search/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#vector-similarity-unleashing-data-insights-beyond-traditional-search) Vector Similarity: Unleashing Data Insights Beyond Traditional Search When making use of unstructured data, there are traditional go-to solutions that are well-known for developers: - **Full-text search** when you need to find documents that contain a particular word or phrase. - **[Vector search](https://qdrant.tech/documentation/overview/vector-search/)** when you need to find documents that are semantically similar to a given query. Sometimes people mix those two approaches, so it might look like the vector similarity is just an extension of full-text search. However, in this article, we will explore some promising new techniques that can be used to expand the use-case of unstructured data and demonstrate that vector similarity creates its own stack of data exploration tools. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#what-is-vector-similarity-search) What is vector similarity search? Vector similarity offers a range of powerful functions that go far beyond those available in traditional full-text search engines. From dissimilarity search to diversity and recommendation, these methods can expand the cases in which vectors are useful. Vector Databases, which are designed to store and process immense amounts of vectors, are the first candidates to implement these new techniques and allow users to exploit their data to its fullest. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#vector-similarity-search-vs-full-text-search) Vector similarity search vs. 
full-text search While there is an intersection in the functionality of these two approaches, there is also a vast area of functions that is unique to each of them. For example, exact phrase matching and counting of results are native to full-text search, while vector similarity support for this type of operation is limited. On the other hand, vector similarity easily allows cross-modal retrieval of images by text or vice-versa, which is impossible with full-text search. This mismatch in expectations might sometimes lead to confusion. Attempting to use vector similarity as a full-text search can result in a range of frustrations, from slow response times to poor search results, to limited functionality. As a result, users get only a fraction of the benefits of vector similarity. ![Full-text search and Vector Similarity Functionality overlap](https://qdrant.tech/articles_data/vector-similarity-beyond-search/venn-diagram.png) Full-text search and Vector Similarity Functionality overlap Below we will explore why the vector similarity stack deserves new interfaces and design patterns that will unlock the full potential of this technology, which can still be used in conjunction with full-text search. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#new-ways-to-interact-with-similarities) New ways to interact with similarities Having a vector representation of unstructured data unlocks new ways of interacting with it. For example, it can be used to measure semantic similarity between words, to cluster words or documents based on their meaning, to find related images, or even to generate new text. However, these interactions can go beyond finding the nearest neighbors (kNN). There are several other techniques that can be leveraged by vector representations beyond the traditional kNN search. These include dissimilarity search, diversity search, recommendations, and discovery functions. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#dissimilarity-ssearch) Dissimilarity search The dissimilarity —or farthest— search is the most straightforward concept after nearest-neighbor search, and it cannot be reproduced with traditional full-text search. It aims to find the most dissimilar or distant documents across the collection. ![Dissimilarity Search](https://qdrant.tech/articles_data/vector-similarity-beyond-search/dissimilarity.png) Dissimilarity Search Unlike full-text match, vector similarity can compare any pair of documents (or points) and assign a similarity score. It doesn’t rely on keywords or other metadata. With vector similarity, we can easily achieve a dissimilarity search by inverting the search objective from maximizing similarity to minimizing it. The dissimilarity search can find items in areas where previously no other search could be used. Let’s look at a few examples. ### [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#case-mislabeling-detection) Case: mislabeling detection For example, we have a dataset of furniture in which we have classified our items by what kind of furniture they are: tables, chairs, lamps, etc. To ensure our catalog is accurate, we can use a dissimilarity search to highlight items that are most likely mislabeled. To do this, we only need to search for the most dissimilar items using the embedding of the category title itself as a query.
This can be too broad, so by combining it with filters —a [Qdrant superpower](https://qdrant.tech/articles/filtrable-hnsw/)—, we can narrow down the search to a specific category. ![Mislabeling Detection](https://qdrant.tech/articles_data/vector-similarity-beyond-search/mislabelling.png) Mislabeling Detection The output of this search can be further processed with heavier models or human supervision to detect actual mislabeling. ### [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#case-outlier-detection) Case: outlier detection In some cases, we might not even have labels, but it is still possible to try to detect anomalies in our dataset. Dissimilarity search can be used for this purpose as well. ![Anomaly Detection](https://qdrant.tech/articles_data/vector-similarity-beyond-search/anomaly-detection.png) Anomaly Detection The only thing we need is a bunch of reference points that we consider “normal”. Then we can search for the points most dissimilar to this reference set and use them as candidates for further analysis. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#diversity-search) Diversity search Even with no query vector provided, (dis-)similarity can improve the overall selection of items from the dataset. The naive approach is to do random sampling. However, unless our dataset has a uniform distribution, the results of such sampling might be biased toward more frequent types of items. ![Example of random sampling](https://qdrant.tech/articles_data/vector-similarity-beyond-search/diversity-random.png) Example of random sampling The similarity information can increase the diversity of those results and make the first overview more interesting. That is especially useful when users do not yet know what they are looking for and want to explore the dataset. ![Example of similarity-based sampling](https://qdrant.tech/articles_data/vector-similarity-beyond-search/diversity-force.png) Example of similarity-based sampling The power of vector similarity, in the context of being able to compare any two points, makes a diverse selection of the collection possible without any labeling effort. By maximizing the distance between all points in the response, we can have an algorithm that will sequentially output dissimilar results. ![Diversity Search](https://qdrant.tech/articles_data/vector-similarity-beyond-search/diversity.png) Diversity Search Some forms of diversity sampling are already used in the industry and are known as [Maximal Marginal Relevance](https://python.langchain.com/docs/integrations/vectorstores/qdrant#maximum-marginal-relevance-search-mmr) (MMR). Techniques like this were developed to enhance similarity on a universal search API. However, there is still room for new ideas, particularly regarding diversity retrieval. By utilizing more advanced vector-native engines, it could be possible to take use cases to the next level and achieve even better results. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#vector-similarity-recommendations) Vector similarity recommendations Vector similarity can go beyond a single query vector. It can combine multiple positive and negative examples for a more accurate retrieval. Building a recommendation API in a vector database can take advantage of using already stored vectors as part of the queries, by specifying the point ID. Doing this, we can skip query-time neural network inference and make the recommendation search faster.
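As a concrete illustration, here is a hedged sketch of a recommendation query with the Python client, using already stored point IDs as examples. The collection name and IDs are made up, and the `strategy` field (assuming a recent `qdrant-client` version) selects between the two approaches described next: averaging the example vectors or comparing relative distances.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Recommend items similar to the stored points 101 and 205, while steering
# away from point 404 (all IDs are hypothetical).
client.query_points(
    collection_name="furniture",
    query=models.RecommendQuery(
        recommend=models.RecommendInput(
            positive=[101, 205],
            negative=[404],
            # AVERAGE_VECTOR merges the examples into a single query vector;
            # BEST_SCORE judges candidates by their distances to the examples.
            strategy=models.RecommendStrategy.AVERAGE_VECTOR,
        )
    ),
    limit=10,
)
```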
There are multiple ways to implement recommendations with vectors. ### [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#vector-features-recommendations) Vector-features recommendations The first approach is to take all positive and negative examples and average them to create a single query vector. In this technique, the more significant components of positive vectors are canceled out by the negative ones, and the resulting vector is a combination of all the features present in the positive examples, but not in the negative ones. ![Vector-Features Based Recommendations](https://qdrant.tech/articles_data/vector-similarity-beyond-search/feature-based-recommendations.png) Vector-Features Based Recommendations This approach is already implemented in Qdrant. While it works great when each dimension of the vectors can be assumed to represent some feature of the data, sometimes distances are a better tool for judging negative and positive examples. ### [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#relative-distance-recommendations) Relative distance recommendations Another approach is to use the distances from the negative examples to the candidates to create exclusion areas. In this technique, we perform searches near the positive examples while excluding the points that are closer to a negative example than to a positive one. ![Relative Distance Recommendations](https://qdrant.tech/articles_data/vector-similarity-beyond-search/relative-distance-recommendations.png) Relative Distance Recommendations The main use-case of both approaches —of course— is to take some history of user interactions and recommend new items based on it. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#discovery) Discovery In many exploration scenarios, the desired destination is not known in advance. The search process in this case can consist of multiple steps, where each step would provide a little more information to guide the search in the right direction. To get more intuition about the possible ways to implement this approach, let’s take a look at how similarity models are trained in the first place: The most well-known loss function used to train similarity models is a [triplet-loss](https://en.wikipedia.org/wiki/Triplet_loss). In this loss, the model is trained by fitting the information of relative similarity of 3 objects: the Anchor, Positive, and Negative examples. ![Triplet Loss](https://qdrant.tech/articles_data/vector-similarity-beyond-search/triplet-loss.png) Triplet Loss Using the same mechanics, we can look at the training process from the other side. Given a trained model, the user can provide positive and negative examples, and the goal of the discovery process is then to find suitable anchors across the stored collection of vectors. ![Reversed triplet loss](https://qdrant.tech/articles_data/vector-similarity-beyond-search/discovery.png) Reversed triplet loss Multiple positive-negative pairs can be provided to make the discovery process more accurate. It is worth mentioning that, just as in NN training, the dataset may contain noise and some portion of contradictory information, so a discovery process should be tolerant of this kind of data imperfection.
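As a rough sketch of how such positive-negative pairs could be passed to Qdrant, the snippet below uses the Python client’s context query. The model names (`ContextQuery`, `ContextPair`) follow recent `qdrant-client` versions and the point IDs are made up, so treat this as an assumption to verify against your client version rather than a definitive recipe.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Each pair only states "the result should be closer to this positive example
# than to that negative one"; no final target is assumed.
client.query_points(
    collection_name="furniture",
    query=models.ContextQuery(
        context=[
            models.ContextPair(positive=42, negative=17),
            models.ContextPair(positive=7, negative=99),
        ]
    ),
    limit=10,
)
```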
![Sample pairs](https://qdrant.tech/articles_data/vector-similarity-beyond-search/discovery-noise.png) Sample pairs The important difference between this and the recommendation method is that the positive-negative pairs in the discovery method don’t assume that the final result should be close to the positive example; they only assume that it should be closer to the positive than to the negative one. ![Discovery vs Recommendation](https://qdrant.tech/articles_data/vector-similarity-beyond-search/discovery-vs-recommendations.png) Discovery vs Recommendation In combination with filtering or similarity search, the additional context information provided by the discovery pairs can be used as a re-ranking factor. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#a-new-api-stack-for-vector-databases) A new API stack for vector databases When you introduce vector similarity capabilities into your text search engine, you extend its functionality. However, it doesn’t work the other way around, because vector similarity as a concept is much broader than any task-specific implementation of full-text search. [Vector databases](https://qdrant.tech/), which introduce built-in full-text functionality, must make several compromises: - Choose a specific full-text search variant. - Either sacrifice API consistency or limit vector similarity functionality to only basic kNN search. - Introduce additional complexity to the system. Qdrant, on the contrary, puts vector similarity at the center of its API and architecture, which allows us to move towards a new stack of vector-native operations. We believe that this is the future of vector databases, and we are excited to see what new use-cases will be unlocked by these techniques. ## [Anchor](https://qdrant.tech/articles/vector-similarity-beyond-search/\#key-takeaways) Key takeaways: - Vector similarity offers advanced data exploration tools beyond traditional full-text search, including dissimilarity search, diversity sampling, and recommendation systems. - Practical applications of vector similarity include improving data quality through mislabeling detection and anomaly identification. - Enhanced user experiences are achieved by leveraging advanced search techniques, providing users with intuitive data exploration, and improving decision-making processes. Ready to unlock the full potential of your data? [Try a free demo](https://qdrant.tech/contact-us/) to explore how vector similarity can revolutionize your data insights and drive smarter decision-making. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/vector-similarity-beyond-search.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue.
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/vector-similarity-beyond-search.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-173-lllmstxt|> ## what-is-rag-in-ai - [Articles](https://qdrant.tech/articles/) - What is RAG: Understanding Retrieval-Augmented Generation [Back to RAG & GenAI](https://qdrant.tech/articles/rag-and-genai/) --- # What is RAG: Understanding Retrieval-Augmented Generation Sabrina Aquino · March 19, 2024 ![What is RAG: Understanding Retrieval-Augmented Generation](https://qdrant.tech/articles_data/what-is-rag-in-ai/preview/title.jpg) > Retrieval-augmented generation (RAG) integrates external information retrieval into the process of generating responses by Large Language Models (LLMs). It searches a database for information beyond its pre-trained knowledge base, significantly improving the accuracy and relevance of the generated responses. Language models have exploded on the internet ever since ChatGPT came out, and rightfully so. They can write essays, code entire programs, and even make memes (though we’re still deciding on whether that’s a good thing). But as brilliant as these chatbots become, they still have **limitations** in tasks requiring external knowledge and factual information. Yes, it can describe the honeybee’s waggle dance in excruciating detail. But they become far more valuable if they can generate insights from **any data** that we provide, rather than just their original training data. Since retraining those large language models from scratch costs millions of dollars and takes months, we need better ways to give our existing LLMs access to our custom data. While you could be more creative with your prompts, it is only a short-term solution. LLMs can consider only a **limited** amount of text in their responses, known as a [context window](https://www.hopsworks.ai/dictionary/context-window-for-llms). Some models like GPT-3 can see up to around 12 pages of text (that’s 4,096 tokens of context). That’s not good enough for most knowledge bases. ![How a RAG works](https://qdrant.tech/articles_data/what-is-rag-in-ai/how-rag-works.jpg) The image above shows how a basic RAG system works. Before forwarding the question to the LLM, we have a layer that searches our knowledge base for the “relevant knowledge” to answer the user query. Specifically, in this case, the spending data from the last month. Our LLM can now generate a **relevant non-hallucinated** response about our budget. As your data grows, you’ll need [efficient ways](https://qdrant.tech/rag/rag-evaluation-guide/) to identify the most relevant information for your LLM’s limited memory. This is where you’ll want a proper way to store and retrieve the specific data you’ll need for your query, without needing the LLM to remember it. **Vector databases** store information as **vector embeddings**. This format supports efficient similarity searches to retrieve relevant data for your query. For example, Qdrant is specifically designed to perform fast, even in scenarios dealing with billions of vectors. This article will focus on RAG systems and architecture. If you’re interested in learning more about vector search, we recommend the following articles: [What is a Vector Database?](https://qdrant.tech/articles/what-is-a-vector-database/) and [What are Vector Embeddings?](https://qdrant.tech/articles/what-are-embeddings/). 
## [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#rag-architecture) RAG architecture At its core, a RAG architecture includes the **retriever** and the **generator**. Let’s start by understanding what each of these components does. ### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#the-retriever) The Retriever When you ask a question to the retriever, it uses **similarity search** to scan through a vast knowledge base of vector embeddings. It then pulls out the most **relevant** vectors to help answer that query. There are a few different techniques it can use to know what’s relevant: #### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#how-indexing-works-in-rag-retrievers) How indexing works in RAG retrievers The indexing process organizes the data into your vector database in a way that makes it easily searchable. This allows the RAG to access relevant information when responding to a query. ![How indexing works](https://qdrant.tech/articles_data/what-is-rag-in-ai/how-indexing-works.jpg) As shown in the image above, here’s the process: - Start with a _loader_ that gathers _documents_ containing your data. These documents could be anything from articles and books to web pages and social media posts. - Next, a _splitter_ divides the documents into smaller chunks, typically sentences or paragraphs. - This is because RAG models work better with smaller pieces of text. In the diagram, these are _document snippets_. - Each text chunk is then fed into an _embedding machine_. This machine uses complex algorithms to convert the text into [vector embeddings](https://qdrant.tech/articles/what-are-embeddings/). All the generated vector embeddings are stored in a knowledge base of indexed information. This supports efficient retrieval of similar pieces of information when needed. #### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#query-vectorization) Query vectorization Once you have vectorized your knowledge base you can do the same to the user query. When the model sees a new query, it uses the same preprocessing and embedding techniques. This ensures that the query vector is compatible with the document vectors in the index. ![How retrieval works](https://qdrant.tech/articles_data/what-is-rag-in-ai/how-retrieval-works.jpg) #### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#retrieval-of-relevant-documents) Retrieval of relevant documents When the system needs to find the most relevant documents or passages to answer a query, it utilizes vector similarity techniques. **Vector similarity** is a fundamental concept in machine learning and natural language processing (NLP) that quantifies the resemblance between vectors, which are mathematical representations of data points. The system can employ different vector similarity strategies depending on the type of vectors used to represent the data: ##### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#sparse-vector-representations) Sparse vector representations A sparse vector is characterized by a high dimensionality, with most of its elements being zero. The classic approach is **keyword search**, which scans documents for the exact words or phrases in the query. The search creates sparse vector representations of documents by counting word occurrences and inversely weighting common words. Queries with rarer words get prioritized. 
![Sparse vector representation](https://qdrant.tech/articles_data/what-is-rag-in-ai/sparse-vectors.jpg) [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) (Term Frequency-Inverse Document Frequency) and [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) are two classic related algorithms. They’re simple and computationally efficient. However, they can struggle with synonyms and don’t always capture semantic similarities. If you’re interested in going deeper, refer to our article on [Sparse Vectors](https://qdrant.tech/articles/sparse-vectors/). ##### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#dense-vector-embeddings) Dense vector embeddings This approach uses large language models like [BERT](https://en.wikipedia.org/wiki/BERT_%28language_model%29) to encode the query and passages into dense vector embeddings. These embeddings are compact numerical representations that capture semantic meaning. Vector databases like Qdrant store these embeddings, allowing retrieval based on **semantic similarity** rather than just keywords, using distance metrics like cosine similarity. This allows the retriever to match based on semantic understanding rather than just keywords. So if I ask about “compounds that cause BO,” it can retrieve relevant info about “molecules that create body odor” even if those exact words weren’t used. We explain more about it in our [What are Vector Embeddings](https://qdrant.tech/articles/what-are-embeddings/) article. #### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#hybrid-search) Hybrid search However, neither keyword search nor vector search is always perfect. Keyword search may miss relevant information expressed differently, while vector search can sometimes struggle with specificity or neglect important statistical word patterns. Hybrid methods aim to combine the strengths of different techniques. ![Hybrid search overview](https://qdrant.tech/articles_data/what-is-rag-in-ai/hybrid-search.jpg) Some common hybrid approaches include: - Using keyword search to get an initial set of candidate documents. Next, the documents are re-ranked/re-scored using semantic vector representations. - Starting with semantic vectors to find generally topically relevant documents. Next, the documents are filtered/re-ranked based on keyword matches or other metadata. - Considering both semantic vector closeness and statistical keyword patterns/weights in a combined scoring model. - Having multiple stages where different techniques are applied. One example: start with an initial keyword retrieval, followed by semantic re-ranking, then a final re-ranking using even more complex models. When you combine the strengths of different search methods in a complementary way, you can provide higher quality, more comprehensive results. Check out our article on [Hybrid Search](https://qdrant.tech/articles/hybrid-search/) if you’d like to learn more. ### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#the-generator) The Generator With the top relevant passages retrieved, it’s now the generator’s job to produce a final answer by synthesizing and expressing that information in natural language. The LLM is typically a model like GPT, BART or T5, trained on massive datasets to understand and generate human-like text. It now takes not only the query (or question) as input but also the relevant documents or passages that the retriever identified as potentially containing the answer to generate its response.
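As a rough sketch of how the retriever’s output reaches the generator, the snippet below fetches passages from Qdrant and assembles them into a prompt. The collection name and the `text` payload field are placeholders, and `embed()` stands in for whatever embedding model you use; the final LLM call is left out on purpose.

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

def build_rag_prompt(question: str, embed) -> str:
    # Retriever: embed the question and fetch the most similar passages.
    hits = client.query_points(
        collection_name="knowledge-base",  # placeholder collection name
        query=embed(question),             # embed() is a stand-in for your embedding model
        limit=5,
    ).points

    # Concatenate the retrieved passages into the context for the generator.
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # The generator (GPT, BART, T5, ...) receives both the question and the context.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```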
![How a Generator works](https://qdrant.tech/articles_data/what-is-rag-in-ai/how-generation-works.png) The retriever and generator don’t operate in isolation. The image below shows how the output of the retrieval feeds the generator to produce the final generated response. ![The entire architecture of a RAG system](https://qdrant.tech/articles_data/what-is-rag-in-ai/rag-system.jpg) ## [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#where-is-rag-being-used) Where is RAG being used? Because of their more knowledgeable and contextual responses, we can find RAG models being applied in many areas today, especially those that need factual accuracy and knowledge depth. ### [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#real-world-applications) Real-World Applications: **Question answering:** This is perhaps the most prominent use case for RAG models. They power advanced question-answering systems that can retrieve relevant information from large knowledge bases and then generate fluent answers. **Language generation:** RAG enables more factual and contextualized text generation, such as contextualized summarization from multiple sources. **Data-to-text generation:** By retrieving relevant structured data, RAG models can generate product/business intelligence reports from databases or describe insights from data visualizations and charts. **Multimedia understanding:** RAG isn’t limited to text - it can retrieve multimodal information like images, video, and audio to enhance understanding, for example by answering questions about images/videos using retrieved textual context. ## [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#creating-your-first-rag-chatbot-with-langchain-groq-and-openai) Creating your first RAG chatbot with Langchain, Groq, and OpenAI Are you ready to create your own RAG chatbot from the ground up? We have a video explaining everything from the beginning. Daniel Romero will guide you through: - Setting up your chatbot - Preprocessing and organizing data for your chatbot’s use - Applying vector similarity search algorithms - Enhancing the efficiency and response quality After building your RAG chatbot, you’ll be able to [evaluate its performance](https://qdrant.tech/rag/rag-evaluation-guide/) against that of a chatbot powered solely by a Large Language Model (LLM). Watch the video: [Chatbot with RAG, using LangChain, OpenAI, and Groq](https://www.youtube.com/watch?v=O60-KuZZeQA) ## [Anchor](https://qdrant.tech/articles/what-is-rag-in-ai/\#whats-next) What’s next?
Have a RAG project you want to bring to life? Join our [Discord community](https://discord.gg/qdrant) where we’re always sharing tips and answering questions on vector search and retrieval. Learn more about how to properly evaluate your RAG responses: [Evaluating Retrieval Augmented Generation - a framework for assessment](https://superlinked.com/vectorhub/evaluating-retrieval-augmented-generation-a-framework-for-assessment). <|page-174-lllmstxt|> ## hybrid-cloud-cluster-creation - [Documentation](https://qdrant.tech/documentation/) - [Hybrid cloud](https://qdrant.tech/documentation/hybrid-cloud/) - Create a Cluster --- # [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#creating-a-qdrant-cluster-in-hybrid-cloud) Creating a Qdrant Cluster in Hybrid Cloud Once you have created a Hybrid Cloud Environment, you can create a Qdrant cluster in that environment. Use the same process to [Create a cluster](https://qdrant.tech/documentation/cloud/create-cluster/). Make sure to select your Hybrid Cloud Environment as the target. ![Create Hybrid Cloud Cluster](https://qdrant.tech/documentation/cloud/hybrid_cloud_create_cluster.png) Note that in the “Kubernetes Configuration” section you can additionally configure: - Node selectors for the Qdrant database pods - Tolerations for the Qdrant database pods - Additional labels for the Qdrant database pods - A service type and annotations for the Qdrant database service These settings can also be changed after the cluster is created, on the cluster detail page. ![Create Hybrid Cloud Cluster - Kubernetes Configuration](https://qdrant.tech/documentation/cloud/hybrid_cloud_kubernetes_configuration.png) ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#scheduling-configuration) Scheduling Configuration When creating or editing a cluster, you can configure how the database Pods get scheduled in your Kubernetes cluster. This can be useful to ensure that the Qdrant databases will run on dedicated nodes. You can configure the necessary node selectors and tolerations in the “Kubernetes Configuration” section during cluster creation, or on the cluster detail page. ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#authentication-to-your-qdrant-clusters) Authentication to your Qdrant Clusters In Hybrid Cloud the authentication information is provided by Kubernetes secrets. You can configure authentication for your Qdrant clusters in the “Configuration” section of the Qdrant Cluster detail page. There you can configure the Kubernetes secret name and key to be used as an API key and/or read-only API key.
![Hybrid Cloud API Key configuration](https://qdrant.tech/documentation/cloud/hybrid_cloud_api_key.png) One way to create a secret is with kubectl: ```shell kubectl create secret generic qdrant-api-key --from-literal=api-key=your-secret-api-key --namespace the-qdrant-namespace ``` The resulting secret will look like this: ```yaml apiVersion: v1 data: api-key: ... kind: Secret metadata: name: qdrant-api-key namespace: the-qdrant-namespace type: Opaque ``` With this command, the secret name would be `qdrant-api-key` and the key would be `api-key`. If you want to retrieve the secret again, you can also use `kubectl`: ```shell kubectl get secret qdrant-api-key -o jsonpath="{.data.api-key}" --namespace the-qdrant-namespace | base64 --decode ``` #### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#watch-the-video) Watch the Video In this tutorial, we walk you through the steps to expose your Qdrant database cluster running on Qdrant Hybrid Cloud to external applications or users outside your Kubernetes cluster. Learn how to configure TLS certificates for secure communication, set up authentication, and explore different methods like load balancers, ingress, and port configurations. Watch the video: [How to Securely Expose Qdrant on Hybrid Cloud to External Applications](https://www.youtube.com/watch?v=ikofKaUc4x0) ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#exposing-qdrant-clusters-to-your-client-applications) Exposing Qdrant clusters to your client applications You can expose your Qdrant clusters to your client applications using Kubernetes services and ingresses. By default, a `ClusterIP` service is created for each Qdrant cluster. Within your Kubernetes cluster, you can access the Qdrant cluster using the service name and port: ``` http://qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24.qdrant-namespace.svc:6333 ``` This endpoint is also visible on the cluster detail page. If you want to access the database from your local developer machine, you can use `kubectl port-forward` to forward the service port to your local machine: ``` kubectl --namespace your-qdrant-namespace port-forward service/qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24 6333:6333 ``` You can also expose the database outside the Kubernetes cluster with a `LoadBalancer` (if supported in your Kubernetes environment) or `NodePort` service or an ingress.
The service type and necessary annotations can be configured in the “Kubernetes Configuration” section during cluster creation, or on the cluster detail page. ![Hybrid Cloud API Key configuration](https://qdrant.tech/documentation/cloud/hybrid_cloud_service.png) Especially if you create a LoadBalancer service, you may need to provide annotations for the load balancer configuration. Please refer to the documentation of your cloud provider for more details. Examples: - [AWS EKS LoadBalancer annotations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/) - [Azure AKS Public LoadBalancer annotations](https://learn.microsoft.com/en-us/azure/aks/load-balancer-standard) - [Azure AKS Internal LoadBalancer annotations](https://learn.microsoft.com/en-us/azure/aks/internal-lb) - [GCP GKE LoadBalancer annotations](https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer-parameters) You could also create a LoadBalancer service manually like this: ```yaml apiVersion: v1 kind: Service metadata: name: qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24-lb namespace: qdrant-namespace spec: type: LoadBalancer ports: - name: http port: 6333 - name: grpc port: 6334 selector: app: qdrant cluster-id: 9a9f48c7-bb90-4fb2-816f-418a46a74b24 ``` An ingress could look like this: ```yaml apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24 namespace: qdrant-namespace spec: rules: - host: qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24.your-domain.com http: paths: - path: / pathType: Prefix backend: service: name: qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24 port: number: 6333 ``` Please refer to the Kubernetes, ingress controller and cloud provider documentation for more details. If you expose the database like this, the endpoint will also be reflected on the cluster detail page, and the Qdrant database dashboard link will point to it. ### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#configuring-tls) Configuring TLS If you want to configure TLS for accessing your Qdrant database in Hybrid Cloud, there are two options: - You can offload TLS at the ingress or load balancer level. - You can configure TLS directly in the Qdrant database. If you want to offload TLS at the ingress or load balancer level, please refer to the respective documentation. If you want to configure TLS directly in the Qdrant database, you can reference a secret containing the TLS certificate and key in the “Configuration” section of the Qdrant Cluster detail page. ![Hybrid Cloud API Key configuration](https://qdrant.tech/documentation/cloud/hybrid_cloud_tls.png) To create such a secret, you can use `kubectl`: ```shell kubectl create secret tls qdrant-tls --cert=mydomain.com.crt --key=mydomain.com.key --namespace the-qdrant-namespace ``` The resulting secret will look like this: ```yaml apiVersion: v1 data: tls.crt: ... tls.key: ... kind: Secret metadata: name: qdrant-tls namespace: the-qdrant-namespace type: kubernetes.io/tls ``` With this command, the secret name to enter into the UI would be `qdrant-tls` and the keys would be `tls.crt` and `tls.key`.
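Once the cluster is exposed and secured, a client can connect from outside the Kubernetes cluster. The hedged sketch below reuses the hostname from the ingress example above and the API key stored in the `qdrant-api-key` secret; both values are illustrative and depend on your own setup.

```python
from qdrant_client import QdrantClient

# Hostname from the ingress example above; port 443 assumes TLS is terminated
# at the ingress. The API key is the value stored in the qdrant-api-key secret.
client = QdrantClient(
    url="https://qdrant-9a9f48c7-bb90-4fb2-816f-418a46a74b24.your-domain.com",
    port=443,
    api_key="your-secret-api-key",
)

print(client.get_collections())
```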
### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/hybrid-cloud-cluster-creation/\#configuring-cpu-and-memory-resource-reservations) Configuring CPU and memory resource reservations When creating a Qdrant database cluster, Qdrant Cloud schedules Pods with specific CPU and memory requests and limits to ensure optimal performance. It will use equal requests and limits for stability. Ideally, Kubernetes nodes should match the Pod size, with one database Pod per VM. By default, Qdrant Cloud will reserve 20% of available CPU and memory on each Pod. This is done to leave room for the operating system, Kubernetes, and system components. This conservative default may need adjustment depending on node size: smaller nodes might require a larger reservation, and larger nodes a smaller one. You can modify this reservation in the “Configuration” section of the Qdrant Cluster detail page. If you want to check how many resources are available on an empty Kubernetes node, you can use the following command: ```shell kubectl describe node ``` This will give you a breakdown of the resources available to Kubernetes and how much is already reserved and used for system Pods. <|page-175-lllmstxt|> ## retrieval-quality - [Documentation](https://qdrant.tech/documentation/) - [Beginner tutorials](https://qdrant.tech/documentation/beginner-tutorials/) - Measure Search Quality --- # [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#measure-and-improve-retrieval-quality-in-semantic-search) Measure and Improve Retrieval Quality in Semantic Search | Time: 30 min | Level: Intermediate | | | | --- | --- | --- | --- | Semantic search pipelines are only as good as the embeddings they use. If your model cannot properly represent input data, similar objects might be far away from each other in the vector space. No surprise that the search results will be poor in this case. There is, however, another component of the process which can also degrade the quality of the search results. It is the ANN algorithm itself. In this tutorial, we will show how to measure the quality of semantic retrieval and how to tune the parameters of HNSW, the ANN algorithm used in Qdrant, to obtain the best results. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#embeddings-quality) Embeddings quality The quality of the embeddings is a topic for a separate tutorial. In a nutshell, it is usually measured and compared by benchmarks, such as the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard). The evaluation process itself is pretty straightforward and is based on a ground truth dataset built by humans.
We have a set of queries and a set of documents we would expect to receive for each of them. In the [evaluation process](https://qdrant.tech/rag/rag-evaluation-guide/), we take a query, find the most similar documents in the vector space and compare them with the ground truth. In that setup, **finding the most similar documents is implemented as full kNN search, without any approximation**. As a result, we can measure the quality of the embeddings themselves, without the influence of the ANN algorithm. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#retrieval-quality) Retrieval quality Embeddings quality is indeed the most important factor in semantic search quality. However, vector search engines, such as Qdrant, do not perform pure kNN search. Instead, they use **Approximate Nearest Neighbors** (ANN) algorithms, which are much faster than the exact search, but can return suboptimal results. We can also **measure the retrieval quality of that approximation**, which also contributes to the overall search quality. ### [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#quality-metrics) Quality metrics There are various ways to quantify the quality of semantic search. Some of them, such as [Precision@k](https://en.wikipedia.org/wiki/Evaluation_measures_%28information_retrieval%29#Precision_at_k), are based on the number of relevant documents in the top-k search results. Others, such as [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank), take into account the position of the first relevant document in the search results. [DCG and NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) metrics are, in turn, based on the relevance score of the documents. If we treat the search pipeline as a whole, we could use them all. The same is true for the embeddings quality evaluation. However, for the ANN algorithm itself, anything based on the relevance score or ranking is not applicable. Ranking in vector search relies on the distance between the query and the document in the vector space. That distance is not going to change due to approximation, as the distance function stays the same. Therefore, it only makes sense to measure the quality of the ANN algorithm by the number of relevant documents in the top-k search results, such as `precision@k`. It is calculated as the number of relevant documents in the top-k search results divided by `k`. When testing just the ANN algorithm, we can use the exact kNN search as the ground truth, with `k` being fixed. It will be a measure of **how well the ANN algorithm approximates the exact search**. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#measure-the-quality-of-the-search-results) Measure the quality of the search results Let’s build a quality [evaluation](https://qdrant.tech/rag/rag-evaluation-guide/) of the ANN algorithm in Qdrant. We will first call the search endpoint in a standard way to obtain the approximate search results. Then, we will call the exact search endpoint to obtain the exact matches, and finally compare both results in terms of precision. Before we start, let’s create a collection, fill it with some data and then start our evaluation.
We will use the same dataset as in the [Loading a dataset from Hugging Face hub](https://qdrant.tech/documentation/tutorials/huggingface-datasets/) tutorial, `Qdrant/arxiv-titles-instructorxl-embeddings` from the [Hugging Face hub](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings). Let’s download it in a streaming mode, as we are only going to use part of it. ```python from datasets import load_dataset dataset = load_dataset( "Qdrant/arxiv-titles-instructorxl-embeddings", split="train", streaming=True ) ``` We need some data to be indexed and another set for the testing purposes. Let’s get the first 50000 items for the training and the next 1000 for the testing. ```python dataset_iterator = iter(dataset) train_dataset = [next(dataset_iterator) for _ in range(60000)] test_dataset = [next(dataset_iterator) for _ in range(1000)] ``` Now, let’s create a collection and index the training data. This collection will be created with the default configuration. Please be aware that it might be different from your collection settings, and it’s always important to test exactly the same configuration you are going to use later in production. ```python from qdrant_client import QdrantClient, models client = QdrantClient("http://localhost:6333") client.create_collection( collection_name="arxiv-titles-instructorxl-embeddings", vectors_config=models.VectorParams( size=768, # Size of the embeddings generated by InstructorXL model distance=models.Distance.COSINE, ), ) ``` We are now ready to index the training data. Uploading the records is going to trigger the indexing process, which will build the HNSW graph. The indexing process may take some time, depending on the size of the dataset, but your data is going to be available for search immediately after receiving the response from the `upsert` endpoint. **As long as the indexing is not finished, and HNSW not built, Qdrant will perform** **the exact search**. We have to wait until the indexing is finished to be sure that the approximate search is performed. ```python client.upload_points( # upload_points is available as of qdrant-client v1.7.1 collection_name="arxiv-titles-instructorxl-embeddings", points=[\ models.PointStruct(\ id=item["id"],\ vector=item["vector"],\ payload=item,\ )\ for item in train_dataset\ ] ) while True: collection_info = client.get_collection(collection_name="arxiv-titles-instructorxl-embeddings") if collection_info.status == models.CollectionStatus.GREEN: # Collection status is green, which means the indexing is finished break ``` ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#standard-mode-vs-exact-search) Standard mode vs exact search Qdrant has a built-in exact search mode, which can be used to measure the quality of the search results. In this mode, Qdrant performs a full kNN search for each query, without any approximation. It is not suitable for production use with high load, but it is perfect for the evaluation of the ANN algorithm and its parameters. It might be triggered by setting the `exact` parameter to `True` in the search request. We are simply going to use all the examples from the test dataset as queries and compare the results of the approximate search with the results of the exact search. Let’s create a helper function with `k` being a parameter, so we can calculate the `precision@k` for different values of `k`. 
```python def avg_precision_at_k(k: int): precisions = [] for item in test_dataset: ann_result = client.query_points( collection_name="arxiv-titles-instructorxl-embeddings", query=item["vector"], limit=k, ).points knn_result = client.query_points( collection_name="arxiv-titles-instructorxl-embeddings", query=item["vector"], limit=k, search_params=models.SearchParams( exact=True, # Turns on the exact search mode ), ).points # We can calculate the precision@k by comparing the ids of the search results ann_ids = set(item.id for item in ann_result) knn_ids = set(item.id for item in knn_result) precision = len(ann_ids.intersection(knn_ids)) / k precisions.append(precision) return sum(precisions) / len(precisions) ``` Calculating the `precision@5` is as simple as calling the function with the corresponding parameter: ```python print(f"avg(precision@5) = {avg_precision_at_k(k=5)}") ``` Response: ```text avg(precision@5) = 0.9935999999999995 ``` As we can see, the precision of the approximate search vs the exact search is pretty high. There are, however, some scenarios when we need higher precision and can accept higher latency. HNSW is pretty tunable, and we can increase the precision by changing its parameters. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#tweaking-the-hnsw-parameters) Tweaking the HNSW parameters HNSW is a hierarchical graph, where each node has a set of links to other nodes. The number of edges per node is called the `m` parameter. The larger its value, the higher the search precision, but the more memory is required. The `ef_construct` parameter is the number of neighbours to consider during the index building. Again, the larger the value, the higher the precision, but the longer the indexing time. The default values of these parameters are `m=16` and `ef_construct=100`. Let’s try to increase them to `m=32` and `ef_construct=200` and see how it affects the precision. Of course, we need to wait until the indexing is finished before we can perform the search. ```python client.update_collection( collection_name="arxiv-titles-instructorxl-embeddings", hnsw_config=models.HnswConfigDiff( m=32, # Increase the number of edges per node from the default 16 to 32 ef_construct=200, # Increase the number of neighbours from the default 100 to 200 ) ) while True: collection_info = client.get_collection(collection_name="arxiv-titles-instructorxl-embeddings") if collection_info.status == models.CollectionStatus.GREEN: # Collection status is green, which means the indexing is finished break ``` The same function can be used to calculate the average `precision@5`: ```python print(f"avg(precision@5) = {avg_precision_at_k(k=5)}") ``` Response: ```text avg(precision@5) = 0.9969999999999998 ``` The precision has obviously increased, and we know how to control it. However, there is a trade-off between the precision and the search latency and memory requirements. In some specific cases, we may want to increase the precision as much as possible, so now we know how to do it. ## [Anchor](https://qdrant.tech/documentation/beginner-tutorials/retrieval-quality/\#wrapping-up) Wrapping up Assessing the quality of retrieval is a critical aspect of [evaluating](https://qdrant.tech/rag/rag-evaluation-guide/) semantic search performance. It is imperative to measure retrieval quality when aiming for optimal quality of your search results.
Qdrant provides a built-in exact search mode, which can be used to measure the quality of the ANN algorithm itself, even in an automated way, as part of your CI/CD pipeline. Again, **the quality of the embeddings is the most important factor**. HNSW does a pretty good job in terms of precision, and it is parameterizable and tunable when required. There are some other ANN algorithms available out there, such as [IVF\*](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes#cell-probe-methods-indexivf-indexes), but they usually [perform worse than HNSW in terms of quality and performance](https://nirantk.com/writing/pgvector-vs-qdrant/#correctness). <|page-176-lllmstxt|> ## qa-with-cohere-and-qdrant - [Articles](https://qdrant.tech/articles/) - Question Answering as a Service with Cohere and Qdrant [Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/) --- # Question Answering as a Service with Cohere and Qdrant Kacper Łukawski · November 29, 2022 ![Question Answering as a Service with Cohere and Qdrant](https://qdrant.tech/articles_data/qa-with-cohere-and-qdrant/preview/title.jpg) Bi-encoders are probably the most efficient way of setting up a semantic Question Answering system. This architecture relies on the same neural model that creates vector embeddings for both questions and answers. The assumption is that both question and answer should have representations close to each other in the latent space, because they both describe the same semantic concept. That doesn’t apply to answers like “Yes” or “No”, but standard FAQ-like problems are a bit easier, as there is typically an overlap between both texts, not necessarily in terms of wording, but in their semantics. ![Bi-encoder structure. Both queries (questions) and documents (answers) are vectorized by the same neural encoder. Output embeddings are then compared by a chosen distance function, typically cosine similarity.](https://qdrant.tech/articles_data/qa-with-cohere-and-qdrant/biencoder-diagram.png) And yes, you need to **bring your own embeddings** in order to even start. There are various ways to obtain them, but using the Cohere [co.embed API](https://docs.cohere.ai/reference/embed) is probably the easiest and most convenient method. ## [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#why-coembed-api-and-qdrant-go-well-together) Why co.embed API and Qdrant go well together? Maintaining a **Large Language Model** might be hard and expensive. Scaling it up and down when the traffic changes requires even more effort and becomes unpredictable. That might definitely be a blocker for any semantic search system.
But if you want to start right away, you may consider using a SaaS model, Cohere’s [co.embed API](https://docs.cohere.ai/reference/embed) in particular. It gives you state-of-the-art language models available as a highly available HTTP service, with no need to train or maintain your own service. As all the communication is done with JSONs, you can simply provide the co.embed output as Qdrant input. ```python # Putting the co.embed API response directly as Qdrant method input qdrant_client.upsert( collection_name="collection", points=rest.Batch( ids=[...], vectors=cohere_client.embed(...).embeddings, payloads=[...], ), ) ``` Both tools are easy to combine, so you can start working with semantic search in a few minutes, not days. And what if your needs are so specific that you need to fine-tune a general usage model? Co.embed API goes beyond pre-trained encoders and allows you to provide custom datasets to [customize the embedding model with your own data](https://docs.cohere.com/docs/finetuning). As a result, you get the quality of domain-specific models, but without worrying about infrastructure. ## [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#system-architecture-overview) System architecture overview In real systems, answers get vectorized and stored in an efficient vector search database. We typically don’t even need to provide specific answers, but just use sentences or paragraphs of text and vectorize them instead. Still, if a longer piece of text contains the answer to a particular question, its distance to the question embedding should not be that large, and it should certainly be smaller than for all the other, non-matching answers. Storing the answer embeddings in a vector database makes the search process way easier. ![Building the database of possible answers. All the texts are converted into their vector embeddings and those embeddings are stored in a vector database, i.e. Qdrant.](https://qdrant.tech/articles_data/qa-with-cohere-and-qdrant/vector-database.png) ## [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#looking-for-the-correct-answer) Looking for the correct answer Once our database is working and all the answer embeddings are already in place, we can start querying it. We basically perform the same vectorization on a given question and ask the database to provide the nearest neighbours. We rely on the embeddings to be close to each other, so we expect the points with the smallest distance in the latent space to contain the proper answer. ![While searching, a question gets vectorized by the same neural encoder. Vector database is a component that looks for the closest answer vectors using i.e. cosine similarity. A proper system, like Qdrant, will make the lookup process more efficient, as it won’t calculate the distance to all the answer embeddings. Thanks to HNSW, it will be able to find the nearest neighbours with sublinear complexity.](https://qdrant.tech/articles_data/qa-with-cohere-and-qdrant/search-with-vector-database.png) ## [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#implementing-the-qa-search-system-with-saas-tools) Implementing the QA search system with SaaS tools We don’t want to maintain our own service for the neural encoder, nor even set up a Qdrant instance. There are SaaS solutions for both — Cohere’s [co.embed API](https://docs.cohere.ai/reference/embed) and [Qdrant Cloud](https://qdrant.to/cloud), so we’ll use them instead of on-premise tools.
### [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#question-answering-on-biomedical-data) Question Answering on biomedical data We’re going to implement the Question Answering system for biomedical data. There is a _[pubmed\_qa](https://huggingface.co/datasets/pubmed_qa)_ dataset, with its _pqa\_labeled_ subset containing 1,000 examples of questions and answers labelled by domain experts. Our system is going to be fed with the embeddings generated by the co.embed API, and we’ll load them into Qdrant. Using Qdrant Cloud vs your own instance does not matter much here. There is a subtle difference in how to connect to the cloud instance, but all the other operations are executed in the same way. ```python from datasets import load_dataset # Loading the dataset from the Hugging Face hub. It consists of several columns: pubid, # question, context, long_answer and final_decision. For the purposes of our system, # we’ll use question and long_answer. dataset = load_dataset("pubmed_qa", "pqa_labeled") ``` | **pubid** | **question** | **context** | **long\_answer** | **final\_decision** | | --- | --- | --- | --- | --- | | 18802997 | Can calprotectin predict relapse risk in infla… | … | Measuring calprotectin may help to identify UC… | maybe | | 20538207 | Should temperature be monitorized during kidne… | … | The new storage can affords more stable temper… | no | | 25521278 | Is plate clearing a risk factor for obesity? | … | The tendency to clear one’s plate when eating … | yes | | 17595200 | Is there an intrauterine influence on obesity? | … | Comparison of mother-offspring and father-offs.. | no | | 15280782 | Is unsafe sexual behaviour increasing among HI… | … | There was no evidence of a trend in unsafe sex… | no | ### [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#using-cohere-and-qdrant-to-build-the-answers-database) Using Cohere and Qdrant to build the answers database In order to start generating the embeddings, you need to [create a Cohere account](https://dashboard.cohere.ai/welcome/register). That will start your trial period, so you’ll be able to vectorize the texts for free. Once logged in, your default API key will be available in [Settings](https://dashboard.cohere.ai/api-keys). We’ll need it to call the co.embed API with the official Python package. ```python import cohere cohere_client = cohere.Client(COHERE_API_KEY) # Generating the embeddings with the Cohere client library embeddings = cohere_client.embed( texts=["A test sentence"], model="large", ) vector_size = len(embeddings.embeddings[0]) print(vector_size) # output: 4096 ``` Let’s connect to the Qdrant instance first and create a collection with the proper configuration, so we can put some embeddings into it later on. ```python from qdrant_client import QdrantClient # Connecting to Qdrant Cloud with qdrant-client requires providing the api_key. # If you use an on-premise instance, it has to be skipped. qdrant_client = QdrantClient( host="xyz-example.eu-central.aws.cloud.qdrant.io", prefer_grpc=True, api_key=QDRANT_API_KEY, ) ``` Now we’re able to vectorize all the answers. They are going to form our collection, so we can also put them already into Qdrant, along with the payloads and identifiers. That will make our dataset easily searchable.
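Note that the target collection has to exist before any points are uploaded; the creation step is not shown in this excerpt. A minimal sketch of what it might look like, assuming the collection name `pubmed_qa` used below, the 4096-dimensional vectors measured above, and cosine distance:

```python
from qdrant_client import models

# Assumed setup: the collection name and distance metric are not confirmed by the article;
# the vector size matches the Cohere "large" model output measured above.
qdrant_client.create_collection(
    collection_name="pubmed_qa",
    vectors_config=models.VectorParams(
        size=vector_size,  # 4096 for the Cohere "large" model
        distance=models.Distance.COSINE,
    ),
)
```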
```python answer_response = cohere_client.embed( texts=dataset["train"]["long_answer"], model="large", ) vectors = [ # Conversion to float is required for Qdrant list(map(float, vector)) for vector in answer_response.embeddings ] ids = [entry["pubid"] for entry in dataset["train"]] # Filling up the Qdrant collection with the embeddings generated by the Cohere co.embed API qdrant_client.upsert( collection_name="pubmed_qa", points=rest.Batch( ids=ids, vectors=vectors, payloads=list(dataset["train"]), ) ) ``` And that’s it. Without even setting up a single server on our own, we created a system that might be easily asked a question. I don’t want to call it serverless, as this term is already taken, but co.embed API with Qdrant Cloud makes everything way easier to maintain. ### [Anchor](https://qdrant.tech/articles/qa-with-cohere-and-qdrant/\#answering-the-questions-with-semantic-search--the-quality) Answering the questions with semantic search — the quality It’s high time to query our database with some questions. It might be interesting to somehow measure the quality of the system in general. In those kinds of problems we typically use _top-k accuracy_. We assume the prediction of the system was correct if the correct answer was present in the first _k_ results. ```python from tqdm import tqdm # Vectorizing all the questions with the same model, so they can be compared to the answer embeddings question_response = cohere_client.embed( texts=dataset["train"]["question"], model="large", ) # Finding the position at which Qdrant provided the expected answer for each question. # That allows us to calculate accuracy@k for different values of k. k_max = 10 answer_positions = [] for embedding, pubid in tqdm(zip(question_response.embeddings, ids)): response = qdrant_client.search( collection_name="pubmed_qa", query_vector=embedding, limit=k_max, ) answer_ids = [record.id for record in response] if pubid in answer_ids: answer_positions.append(answer_ids.index(pubid)) else: answer_positions.append(-1) ``` Saved answer positions allow us to calculate the metric for different _k_ values. ```python # Prepared answer positions are being used to calculate different values of accuracy@k for k in range(1, k_max + 1): correct_answers = len( list( filter(lambda x: 0 <= x < k, answer_positions) ) ) print(f"accuracy@{k} =", correct_answers / len(dataset["train"])) ``` Here are the values of the top-k accuracy for different values of k: | **metric** | **value** | | --- | --- | | accuracy@1 | 0.877 | | accuracy@2 | 0.921 | | accuracy@3 | 0.942 | | accuracy@4 | 0.950 | | accuracy@5 | 0.956 | | accuracy@6 | 0.960 | | accuracy@7 | 0.964 | | accuracy@8 | 0.971 | | accuracy@9 | 0.976 | | accuracy@10 | 0.977 | It seems like our system worked pretty well even if we consider just the first result, with the lowest distance. We failed on around 12% of the questions, but the numbers get better with higher values of k. It might also be valuable to check out the questions our system failed to answer, their expected matches, and our guesses. We managed to implement a working Question Answering system within just a few lines of code. If you are fine with the results achieved, then you can start using it right away. Still, if you feel you need a slight improvement, then fine-tuning the model is a way to go. If you want to check out the full source code, it is available on [Google Colab](https://colab.research.google.com/drive/1YOYq5PbRhQ_cjhi6k4t1FnWgQm8jZ6hm?usp=sharing).
<|page-177-lllmstxt|> ## geo-polygon-filter-gsoc - [Articles](https://qdrant.tech/articles/) - Google Summer of Code 2023 - Polygon Geo Filter for Qdrant Vector Database [Back to Qdrant Internals](https://qdrant.tech/articles/qdrant-internals/) --- # Google Summer of Code 2023 - Polygon Geo Filter for Qdrant Vector Database Zein Wen · October 12, 2023 ![Google Summer of Code 2023 - Polygon Geo Filter for Qdrant Vector Database](https://qdrant.tech/articles_data/geo-polygon-filter-gsoc/preview/title.jpg) ## [Anchor](https://qdrant.tech/articles/geo-polygon-filter-gsoc/\#introduction) Introduction Greetings, I’m Zein Wen, and I was a Google Summer of Code 2023 participant at Qdrant. I got to work with an amazing mentor, Arnaud Gourlay, on enhancing the Qdrant Geo Polygon Filter. This new feature allows users to refine their query results using polygons. As the latest addition to the Geo Filter family of radius and rectangle filters, this enhancement promises greater flexibility in querying geo data, unlocking interesting new use cases. ## [Anchor](https://qdrant.tech/articles/geo-polygon-filter-gsoc/\#project-overview) Project Overview ![A Use Case of Geo Filter](https://qdrant.tech/articles_data/geo-polygon-filter-gsoc/geo-filter-example.png) A Use Case of Geo Filter ( [https://traveltime.com/blog/map-postcode-data-catchment-area](https://traveltime.com/blog/map-postcode-data-catchment-area)) Because Qdrant is a powerful vector database, it presents immense potential for machine learning-driven applications, such as recommendation. However, the scope of vector queries alone may not always meet user requirements. Consider a scenario where you’re seeking restaurant recommendations; it’s not just about a list of restaurants, but those within your neighborhood. This is where the Geo Filter comes into play, enhancing queries by incorporating additional filtering criteria. Up until now, Qdrant’s geographic filter options were confined to circular and rectangular shapes, which may not align with the diverse boundaries found in the real world. This scenario was exactly what led to a user feature request, and we decided it would be a good feature to tackle, since it introduces greater capability for geo-related queries. ## [Anchor](https://qdrant.tech/articles/geo-polygon-filter-gsoc/\#technical-challenges) Technical Challenges **1\. Geo Geometry Computation** ![Geo Space Basic Concept](https://qdrant.tech/articles_data/geo-polygon-filter-gsoc/basic-concept.png) Geo Space Basic Concept Internally, the Geo Filter doesn’t start by testing each individual geo location, as this would be computationally expensive. Instead, we create a geohash layer that [divides the world](https://en.wikipedia.org/wiki/Grid_%28spatial_index%29#Grid-based_spatial_indexing) into rectangles. When a spatial index is created for Qdrant entries, it assigns each entry to the geohash for its location.
During a query, we first identify all potential geohashes that satisfy the filters and subsequently check for location candidates within those hashes. Accomplishing this search involves two critical geometry computations: 1. determining if a polygon intersects with a rectangle 2. ascertaining if a point lies within a polygon. ![Geometry Computation Testing](https://qdrant.tech/articles_data/geo-polygon-filter-gsoc/geo-computation-testing.png) Geometry Computation Testing While we have a geo crate (a Rust library) that provides APIs for these computations, we dug in deeper to understand the underlying algorithms and verify their accuracy. This led us to conduct extensive testing and visualization to determine correctness. In addition to assessing the current crate, we also discovered that there are multiple algorithms available for these computations. We invested time in exploring different approaches, such as [winding number](https://en.wikipedia.org/wiki/Point_in_polygon#Winding%20number%20algorithm:~:text=of%20the%20algorithm.-,Winding%20number%20algorithm,-%5Bedit%5D) and [ray casting](https://en.wikipedia.org/wiki/Point_in_polygon#Winding%20number%20algorithm:~:text=.%5B2%5D-,Ray%20casting%20algorithm,-%5Bedit%5D), to grasp their distinctions and pave the way for future improvements. Through this process, I enjoyed honing my ability to swiftly grasp unfamiliar concepts. In addition, I needed to develop analytical strategies to dissect and draw meaningful conclusions from them. This experience has been invaluable in expanding my problem-solving toolkit. **2\. Proto and JSON format design** Considerable effort was devoted to designing the ProtoBuf and JSON interfaces for this new feature. This component is directly exposed to users, requiring a consistent and user-friendly interface, which in turn helps drive a positive user experience and fewer code modifications in the future. Initially, we contemplated aligning our interface with the [GeoJSON](https://geojson.org/) specification, given its prominence as a standard for many geo-related APIs. However, we soon realized that the way GeoJSON defines geometries significantly differs from our current JSON and ProtoBuf coordinate definitions for our point radius and rectangular filters. As a result, we prioritized API-level consistency and user experience, opting to align the new polygon definition with all our existing definitions. We also planned to develop a separate multi-polygon filter in addition to the polygon one. However, after careful consideration, we recognized that, for our use case, polygon filters can achieve the same result as a multi-polygon filter. This relationship mirrors how we currently handle multiple circles or rectangles. Consequently, we deemed the multi-polygon filter redundant, as it would only introduce unnecessary complexity to the API. Doing this work illustrated to me the challenge of navigating real-world solutions that require striking a balance between adhering to established standards and prioritizing user experience. It was also key to understanding the wisdom of focusing on developing what’s truly necessary for users, without overextending our efforts. ## [Anchor](https://qdrant.tech/articles/geo-polygon-filter-gsoc/\#outcomes) Outcomes **1\. Capability of Deep Dive** Navigating unfamiliar code bases, concepts, APIs, and techniques is a common challenge for developers. Participating in GSoC was akin to going from the safety of a swimming pool right into the expanse of the ocean.
Having my mentor’s support during this transition was invaluable. He provided me with numerous opportunities to independently delve into areas I had never explored before. I have grown to no longer fear unknown technical areas, whether it’s unfamiliar code, techniques, or concepts in specific domains. I’ve gained confidence in my ability to learn them step by step and use them to create the things I envision. **2\. Always Put Users in Mind** Another crucial lesson I learned is the importance of considering the user’s experience and their specific use cases. While development may sometimes entail iterative processes, every aspect that directly impacts the user must be approached and executed with empathy. Neglecting this consideration can not only lead to functional errors but also erode the trust of users due to inconsistency and confusion, which then leads to them no longer using my work. **3\. Speak Up and Effectively Communicate** Finally, in the course of development, encountering differing opinions is commonplace. It’s essential to remain open to others’ ideas, while also possessing the resolve to communicate one’s own perspective clearly. This fosters productive discussions and ultimately elevates the quality of the development process. ### [Anchor](https://qdrant.tech/articles/geo-polygon-filter-gsoc/\#wrap-up) Wrap up Being selected for Google Summer of Code 2023 and collaborating with Arnaud and the other Qdrant engineers, along with all the other community members, has been a true privilege. I’m deeply grateful to those who invested their time and effort in reviewing my code, engaging in discussions about alternatives and design choices, and offering assistance when needed. Through these interactions, I’ve experienced firsthand the essence of open source and the culture that encourages collaboration. This experience not only allowed me to write Rust code for a real-world product for the first time, but it also opened the door to the amazing world of open source. Without a doubt, I’m eager to continue growing alongside this community and contribute to new features and enhancements that elevate the product. I’ve also become an advocate for Qdrant, introducing this project to numerous coworkers and friends in the tech industry. I’m excited to witness new users and contributors emerge from within my own network! If you want to try out my work, read the [documentation](https://qdrant.tech/documentation/concepts/filtering/#geo-polygon) and then either sign up for a free [cloud account](https://cloud.qdrant.io/) or download the [Docker image](https://hub.docker.com/r/qdrant/qdrant). I look forward to seeing how people are using my work in their own applications!
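For reference, a polygon-filtered query with the Python client might look roughly like the sketch below. The collection name, the payload key `location`, the query vector, and the coordinates are all made up for illustration; note that the polygon ring is closed by repeating the first point.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("http://localhost:6333")

# Hypothetical query: only return points whose "location" payload lies inside the polygon.
results = client.query_points(
    collection_name="restaurants",
    query=[0.2, 0.1, 0.9, 0.7],  # made-up query vector
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="location",
                geo_polygon=models.GeoPolygon(
                    exterior=models.GeoLineString(
                        points=[
                            models.GeoPoint(lon=-73.99, lat=40.73),
                            models.GeoPoint(lon=-73.98, lat=40.76),
                            models.GeoPoint(lon=-73.96, lat=40.74),
                            models.GeoPoint(lon=-73.99, lat=40.73),  # close the ring
                        ]
                    ),
                ),
            )
        ]
    ),
    limit=10,
)
```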
<|page-178-lllmstxt|> ## detecting-coffee-anomalies - [Articles](https://qdrant.tech/articles/) - Metric Learning for Anomaly Detection [Back to Machine Learning](https://qdrant.tech/articles/machine-learning/) --- # Metric Learning for Anomaly Detection Yusuf Sarıgöz · May 04, 2022 ![Metric Learning for Anomaly Detection](https://qdrant.tech/articles_data/detecting-coffee-anomalies/preview/title.jpg) Anomaly detection is a much-needed yet challenging task that has numerous use cases across various industries. The complexity results mainly from the fact that the task is data-scarce by definition. Similarly, anomalies are, again by definition, subject to frequent change, and they may take unexpected forms. For that reason, supervised classification-based approaches are: - Data-hungry - requiring quite a lot of labeled data; - Expensive - data labeling is an expensive task itself; - Time-consuming - you would try to obtain what is necessarily scarce; - Hard to maintain - you would need to re-train the model repeatedly in response to changes in the data distribution. These are not desirable features if you want to put your model into production in a rapidly-changing environment. And, despite all the mentioned difficulties, they do not necessarily offer superior performance compared to the alternatives. In this post, we will detail the lessons learned from such a use case. ## [Anchor](https://qdrant.tech/articles/detecting-coffee-anomalies/\#coffee-beans) Coffee Beans [Agrivero.ai](https://agrivero.ai/) is a company making an AI-enabled solution for quality control & traceability of green coffee for producers, traders, and roasters. They have collected and labeled more than **30 thousand** images of coffee beans with various defects - wet, broken, chipped, or bug-infested samples. This data is used to train a classifier that evaluates crop quality and highlights possible problems. ![Anomalies in coffee](https://qdrant.tech/articles_data/detecting-coffee-anomalies/detection.gif) Anomalies in coffee We should note that anomalies are very diverse, so the enumeration of all possible anomalies is a challenging task on its own. In the course of work, new types of defects appear, and shooting conditions change. Thus, a one-time labeled dataset becomes insufficient. Let’s find out how metric learning might help to address this challenge. ## [Anchor](https://qdrant.tech/articles/detecting-coffee-anomalies/\#metric-learning-approach) Metric Learning Approach In this approach, we aimed to encode images in an n-dimensional vector space and then use learned similarities to label images during the inference. The simplest way to do this is KNN classification. The algorithm retrieves the K nearest neighbors of a given query vector and assigns a label based on the majority vote. In a production environment, the kNN classifier could easily be replaced with the [Qdrant](https://github.com/qdrant/qdrant) vector search engine. ![Production deployment](https://qdrant.tech/articles_data/detecting-coffee-anomalies/anomalies_detection.png) Production deployment This approach has the following advantages: - We can benefit from unlabeled data, considering labeling is time-consuming and expensive.
- The relevant metric, e.g., precision or recall, can be tuned according to changing requirements during the inference without re-training. - Queries labeled with a high score can be added to the KNN classifier on the fly as new data points. To apply metric learning, we need to have a neural encoder, a model capable of transforming an image into a vector. Training such an encoder from scratch may require a significant amount of data we might not have. Therefore, we will divide the training into two steps: - The first step is to train the autoencoder, with which we will prepare a model capable of representing the target domain. - The second step is finetuning. Its purpose is to train the model to distinguish the required types of anomalies. ![Model training architecture](https://qdrant.tech/articles_data/detecting-coffee-anomalies/anomaly_detection_training.png) Model training architecture ### [Anchor](https://qdrant.tech/articles/detecting-coffee-anomalies/\#step-1---autoencoder-for-unlabeled-data) Step 1 - Autoencoder for Unlabeled Data First, we pretrained a Resnet18-like model in a vanilla autoencoder architecture by leaving the labels aside. An autoencoder is a model architecture composed of an encoder and a decoder, with the latter trying to recreate the original input from the low-dimensional bottleneck output of the former. There is no intuitive evaluation metric to indicate the performance in this setup, but we can evaluate the success by examining the recreated samples visually. ![Example of image reconstruction with Autoencoder](https://qdrant.tech/articles_data/detecting-coffee-anomalies/image_reconstruction.png) Example of image reconstruction with Autoencoder Then we encoded a subset of the data into 128-dimensional vectors by using the encoder, and created a KNN classifier on top of these embeddings and associated labels. Although the results are promising, we can do even better by finetuning with metric learning. ### [Anchor](https://qdrant.tech/articles/detecting-coffee-anomalies/\#step-2---finetuning-with-metric-learning) Step 2 - Finetuning with Metric Learning We started by selecting 200 labeled samples randomly without replacement. In this step, the model was composed of the encoder part of the autoencoder with a randomly initialized projection layer stacked on top of it. We applied transfer learning from the frozen encoder and trained only the projection layer with Triplet Loss and an online batch-all triplet mining strategy. Unfortunately, the model overfitted quickly in this attempt. In the next experiment, we used an online batch-hard strategy with a trick to prevent the vector space from collapsing. We will describe our approach in further articles. This time it converged smoothly, and our evaluation metrics also improved considerably to match the supervised classification approach. ![Metrics for the autoencoder model with KNN classifier](https://qdrant.tech/articles_data/detecting-coffee-anomalies/ae_report_knn.png) Metrics for the autoencoder model with KNN classifier ![Metrics for the finetuned model with KNN classifier](https://qdrant.tech/articles_data/detecting-coffee-anomalies/ft_report_knn.png) Metrics for the finetuned model with KNN classifier We repeated this experiment with 500 and 2000 samples, but it showed only a slight improvement. Thus we decided to stick to 200 samples - see below for why.
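As a rough illustration of how the kNN classification described earlier maps onto a vector search engine in production, a majority vote over Qdrant search results might look like the sketch below. The collection name, the payload key `label`, and the client URL are assumptions for illustration, not the project's actual code.

```python
from collections import Counter

from qdrant_client import QdrantClient

client = QdrantClient("http://localhost:6333")

def classify_embedding(embedding: list[float], k: int = 10) -> str:
    """Label an image embedding by majority vote over its k nearest labeled neighbours."""
    neighbours = client.query_points(
        collection_name="coffee-beans",  # assumed collection of labeled bean embeddings
        query=embedding,                 # 128-dimensional vector from the encoder
        limit=k,
        with_payload=True,
    ).points
    votes = Counter(point.payload["label"] for point in neighbours)
    return votes.most_common(1)[0][0]
```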
## [Anchor](https://qdrant.tech/articles/detecting-coffee-anomalies/\#supervised-classification-approach) Supervised Classification Approach We also wanted to compare our results with the metrics of a traditional supervised classification model. For this purpose, a Resnet50 model was finetuned with the ~30k labeled images made available for training. Surprisingly, the F1 score was around 0.86. Please note that we used only 200 labeled samples in the metric learning approach instead of ~30k in the supervised classification approach. These numbers indicate a huge saving with no considerable compromise in the performance. ## [Anchor](https://qdrant.tech/articles/detecting-coffee-anomalies/\#conclusion) Conclusion We obtained results comparable to those of the supervised classification method by using **only 0.66%** of the labeled data with metric learning. This approach is time-saving and resource-efficient, and it may be improved further. Possible next steps might be: - Collect more unlabeled data and pretrain a larger autoencoder. - Obtain high-quality labels for a small number of images instead of tens of thousands for finetuning. - Use hyperparameter optimization and possibly gradual unfreezing in the finetuning step. - Use a [vector search engine](https://github.com/qdrant/qdrant) to serve metric learning in production. We are actively looking into these, and we will continue to publish our findings in this challenge and other use cases of metric learning. <|page-179-lllmstxt|> ## agentic-rag-crewai-zoom - [Documentation](https://qdrant.tech/documentation/) - Simple Agentic RAG System ![agentic-rag-crewai-zoom](https://qdrant.tech/documentation/examples/agentic-rag-crewai-zoom/agentic-rag-1.png) --- # [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#agentic-rag-with-crewai--qdrant-vector-database) Agentic RAG With CrewAI & Qdrant Vector Database | Time: 45 min | Level: Beginner | Output: [GitHub](https://github.com/qdrant/examples/tree/master/agentic_rag_zoom_crewai) | | | --- | --- | --- | --- | By combining the power of Qdrant for vector search and CrewAI for orchestrating modular agents, you can build systems that don’t just answer questions but analyze, interpret, and act. Traditional RAG systems focus on fetching data and generating responses, but they lack the ability to reason deeply or handle multi-step processes. In this tutorial, we’ll walk you through building an Agentic RAG system step by step. By the end, you’ll have a working framework for storing data in a Qdrant Vector Database and extracting insights using CrewAI agents in conjunction with Vector Search over your data. We already built this app for you.
[Clone this repository](https://github.com/qdrant/examples/tree/master/agentic_rag_zoom_crewai) and follow along with the tutorial. ## [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#what-youll-build) What You’ll Build In this hands-on tutorial, we’ll create a system that: 1. Uses Qdrant to store and retrieve meeting transcripts as vector embeddings 2. Leverages CrewAI agents to analyze and summarize meeting data 3. Presents insights in a simple Streamlit interface for easy interaction This project demonstrates how to build a Vector Search powered Agentic workflow to extract insights from meeting recordings. By combining Qdrant’s vector search capabilities with CrewAI agents, users can search through and analyze their own meeting content. The application first converts the meeting transcript into vector embeddings and stores them in a Qdrant vector database. It then uses CrewAI agents to query the vector database and extract insights from the meeting content. Finally, it uses Anthropic Claude to generate natural language responses to user queries based on the extracted insights from the vector database. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#how-does-it-work) How Does It Work? When you interact with the system, here’s what happens behind the scenes: First the user submits a query to the system. In this example, we want to find out the average length of Marketing meetings. Since one of the data points from the meetings is the duration of the meeting, the agent can calculate the average duration of the meetings by averaging the duration of all meetings with the keyword “Marketing” in the topic or content. ![User Query Interface](https://qdrant.tech/articles_data/agentic-rag-crewai-zoom/query1.png) Next, the agent used the `search_meetings` tool to search the Qdrant vector database for the most semantically similar meeting points. We asked about Marketing meetings, so the agent searched the database with the search meeting tool for all meetings with the keyword “Marketing” in the topic or content. ![Vector Search Results](https://qdrant.tech/articles_data/agentic-rag-crewai-zoom/output0.png) Next, the agent used the `calculator` tool to find the average duration of the meetings. ![Duration Calculation](https://qdrant.tech/articles_data/agentic-rag-crewai-zoom/output.png) Finally, the agent used the `Information Synthesizer` tool to synthesize the analysis and present it in a natural language format. ![Synthesized Analysis](https://qdrant.tech/articles_data/agentic-rag-crewai-zoom/output4.png) The user sees the final output in a chat-like interface. ![Chat Interface](https://qdrant.tech/articles_data/agentic-rag-crewai-zoom/app.png) The user can then continue to interact with the system by asking more questions. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#architecture) Architecture The system is built on three main components: - **Qdrant Vector Database**: Stores meeting transcripts and summaries as vector embeddings, enabling semantic search - **CrewAI Framework**: Coordinates AI agents that handle different aspects of meeting analysis - **Anthropic Claude**: Provides natural language understanding and response generation 1. **Data Processing Pipeline** - Processes meeting transcripts and metadata - Creates embeddings with SentenceTransformer - Manages Qdrant collection and data upload 2. **AI Agent System** - Implements CrewAI agent logic - Handles vector search integration - Processes queries with Claude 3. 
**User Interface** - Provides chat-like web interface - Shows real-time processing feedback - Maintains conversation history * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#getting-started) Getting Started ![agentic-rag-crewai-zoom](https://qdrant.tech/documentation/examples/agentic-rag-crewai-zoom/agentic-rag-2.png) 1. **Get API Credentials for Qdrant**: - Sign up for an account at [Qdrant Cloud](https://cloud.qdrant.io/signup). - Create a new cluster and copy the **Cluster URL** (format: [https://xxx.gcp.cloud.qdrant.io](https://xxx.gcp.cloud.qdrant.io/)). - Go to **Data Access Control** and generate an **API key**. 2. **Get API Credentials for AI Services**: - Get an API key from [Anthropic](https://www.anthropic.com/) - Get an API key from [OpenAI](https://platform.openai.com/) * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#setup) Setup 1. **Clone the Repository**: ```bash git clone https://github.com/qdrant/examples.git cd examples/agentic_rag_zoom_crewai ``` 2. **Create and Activate a Python Virtual Environment with Python 3.10 for compatibility**: ```bash python3.10 -m venv venv source venv/bin/activate # Windows: venv\Scripts\activate ``` 3. **Install Dependencies**: ```bash pip install -r requirements.txt ``` 4. **Configure Environment Variables**: Create a `.env.local` file with: ```bash openai_api_key=your_openai_key_here anthropic_api_key=your_anthropic_key_here qdrant_url=your_qdrant_url_here qdrant_api_key=your_qdrant_api_key_here ``` * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#usage) Usage ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#1-process-meeting-data) 1\. Process Meeting Data The [`data_loader.py`](https://github.com/qdrant/examples/blob/master/agentic_rag_zoom_crewai/vector/data_loader.py) script processes meeting data and stores it in Qdrant: ```bash python vector/data_loader.py ``` After this script has run, you should see a new collection in your Qdrant Cloud account called `zoom_recordings`. This collection contains the vector embeddings of the meeting transcripts. The points in the collection contain the original meeting data, including the topic, content, and summary. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#2-launch-the-interface) 2\. Launch the Interface The [`streamlit_app.py`](https://github.com/qdrant/examples/blob/master/agentic_rag_zoom_crewai/vector/streamlit_app.py) script is located in the `vector` folder. To launch it, run: ```bash streamlit run vector/streamlit_app.py ``` When you run this script, you will be able to interact with the system through a chat-like interface. Ask questions about the meeting content, and the system will use the AI agents to find the most relevant information and present it in a natural language format. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#the-data-pipeline) The Data Pipeline At the heart of our system is the data processing pipeline: ```python class MeetingData: def _initialize(self): self.data_dir = Path(__file__).parent.parent / 'data' self.meetings = self._load_meetings() self.qdrant_client = QdrantClient( url=os.getenv('qdrant_url'), api_key=os.getenv('qdrant_api_key') ) self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2') ``` The singleton pattern in data\_loader.py is implemented through a `MeetingData` class that uses Python’s `__new__` and `__init__` methods.
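The mechanism is easier to see in code. A condensed sketch of the pattern described here, not the exact implementation from the repository:

```python
class MeetingData:
    _instance = None  # shared across all MeetingData() calls

    def __new__(cls):
        # Create the single instance on first use, then keep returning it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        # Run the (potentially expensive) setup only once
        if not self._initialized:
            self._initialize()
            self._initialized = True
```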
The class maintains a private `_instance` variable to track if an instance exists, and a `_initialized` flag to ensure the initialization code only runs once. When creating a new instance with `MeetingData()`, `__new__` first checks if `_instance` exists - if it doesn’t, it creates one and sets the initialization flag to False. The `__init__` method then checks this flag, and if it’s False, runs the initialization code and sets the flag to True. This ensures that all subsequent calls to `MeetingData()` return the same instance with the same initialized resources. When processing meetings, we need to consider both the content and context. Each meeting gets converted into a rich text representation before being transformed into a vector: ```python text_to_embed = f""" Topic: {meeting.get('topic', '')} Content: {meeting.get('vtt_content', '')} Summary: {json.dumps(meeting.get('summary', {}))} """ ``` This structured format ensures our vector embeddings capture the full context of each meeting. But processing meetings one at a time would be inefficient. Instead, we batch process our data: ```python batch_size = 100 for i in range(0, len(points), batch_size): batch = points[i:i + batch_size] self.qdrant_client.upsert( collection_name='zoom_recordings', points=batch ) ``` ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#building-the-ai-agent-system) Building the AI Agent System Our AI system uses a tool-based approach. Let’s start with the simplest tool - a calculator for meeting statistics: ```python class CalculatorTool(BaseTool): name: str = "calculator" description: str = "Perform basic mathematical calculations" def _run(self, a: int, b: int) -> dict: return { "addition": a + b, "multiplication": a * b } ``` But the real power comes from our vector search integration. This tool converts natural language queries into vector representations and searches our meeting database: ```python class SearchMeetingsTool(BaseTool): def _run(self, query: str) -> List[Dict]: response = openai_client.embeddings.create( model="text-embedding-ada-002", input=query ) query_vector = response.data[0].embedding return self.qdrant_client.search( collection_name='zoom_recordings', query_vector=query_vector, limit=10 ) ``` The search results then feed into our analysis tool, which uses Claude to provide deeper insights: ```python class MeetingAnalysisTool(BaseTool): def _run(self, meeting_data: dict) -> Dict: meetings_text = self._format_meetings(meeting_data) message = client.messages.create( model="claude-3-sonnet-20240229", messages=[{ "role": "user", "content": f"Analyze these meetings:\n\n{meetings_text}" }] ) ``` ### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#orchestrating-the-workflow) Orchestrating the Workflow The magic happens when we bring these tools together under our agent framework. We create two specialized agents: ```python researcher = Agent( role='Research Assistant', goal='Find and analyze relevant information', tools=[calculator, searcher, analyzer] ) synthesizer = Agent( role='Information Synthesizer', goal='Create comprehensive and clear responses' ) ``` These agents work together in a coordinated workflow. The researcher gathers and analyzes information, while the synthesizer creates clear, actionable responses. This separation of concerns allows each agent to focus on its strengths.
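The snippet above only defines the agents; the wiring that chains them is not shown in this excerpt. A rough sketch of how they might be combined with CrewAI tasks (the task descriptions and the sequential process are assumptions, not the repository's exact code):

```python
from crewai import Crew, Process, Task

research_task = Task(
    description="Search the meeting database and gather facts relevant to: {query}",
    expected_output="A list of relevant meetings with the key figures extracted",
    agent=researcher,
)

synthesis_task = Task(
    description="Turn the research findings into a clear, concise answer for the user",
    expected_output="A short natural-language answer",
    agent=synthesizer,
)

crew = Crew(
    agents=[researcher, synthesizer],
    tasks=[research_task, synthesis_task],
    process=Process.sequential,  # the researcher runs first, the synthesizer builds on its output
)

result = crew.kickoff(inputs={"query": "What is the average length of Marketing meetings?"})
```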
### [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#building-the-user-interface) Building the User Interface The Streamlit interface provides a clean, chat-like experience for interacting with our AI system. Let’s start with the basic setup: ```python st.set_page_config( page_title="Meeting Assistant", page_icon="🤖", layout="wide" ) ``` To make the interface more engaging, we add custom styling that makes the output easier to read: ```python st.markdown(""" """, unsafe_allow_html=True) ``` One of the key features is real-time feedback during processing. We achieve this with a custom output handler: ```python class ConsoleOutput: def __init__(self, placeholder): self.placeholder = placeholder self.buffer = [] self.update_interval = 0.5 # seconds self.last_update = time.time() def write(self, text): self.buffer.append(text) if time.time() - self.last_update > self.update_interval: self._update_display() ``` This handler buffers the output and updates the display periodically, creating a smooth user experience. When a user sends a query, we process it with visual feedback: ```python with st.chat_message("assistant"): message_placeholder = st.empty() progress_bar = st.progress(0) console_placeholder = st.empty() try: console_output = ConsoleOutput(console_placeholder) with contextlib.redirect_stdout(console_output): progress_bar.progress(0.3) full_response = get_crew_response(prompt) progress_bar.progress(1.0) ``` The interface maintains a chat history, making it feel like a natural conversation: ```python if "messages" not in st.session_state: st.session_state.messages = [] for message in st.session_state.messages: with st.chat_message(message["role"]): st.markdown(message["content"]) ``` We also include helpful examples and settings in the sidebar: ```python with st.sidebar: st.header("Settings") search_limit = st.slider("Number of results", 1, 10, 5) analysis_depth = st.select_slider( "Analysis Depth", options=["Basic", "Standard", "Detailed"], value="Standard" ) ``` This combination of features creates an interface that’s both powerful and approachable. Users can see their query being processed in real-time, adjust settings to their needs, and maintain context through the chat history. * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-crewai-zoom/\#conclusion) Conclusion ![agentic-rag-crewai-zoom](https://qdrant.tech/documentation/examples/agentic-rag-crewai-zoom/agentic-rag-3.png) This tutorial has demonstrated how to build a sophisticated meeting analysis system that combines vector search with AI agents. Let’s recap the key components we’ve covered: 1. **Vector Search Integration** - Efficient storage and retrieval of meeting content using Qdrant - Semantic search capabilities through vector embeddings - Batched processing for optimal performance 2. **AI Agent Framework** - Tool-based approach for modular functionality - Specialized agents for research and analysis - Integration with Claude for intelligent insights 3. **Interactive Interface** - Real-time feedback and progress tracking - Persistent chat history - Configurable search and analysis settings The resulting system demonstrates the power of combining vector search with AI agents to create an intelligent meeting assistant. 
By following this tutorial, you’ve learned how to: - Process and store meeting data efficiently - Implement semantic search capabilities - Create specialized AI agents for analysis - Build an intuitive user interface This foundation can be extended in many ways, such as: - Adding more specialized agents - Implementing additional analysis tools - Enhancing the user interface - Integrating with other data sources The code is available in the [repository](https://github.com/qdrant/examples/tree/master/agentic_rag_zoom_crewai), and we encourage you to experiment with your own modifications and improvements. <|page-180-lllmstxt|> ## rag-deepseek - [Documentation](https://qdrant.tech/documentation/) - 5 Minute RAG with Qdrant and DeepSeek ![deepseek-rag-qdrant](https://qdrant.tech/documentation/examples/rag-deepseek/deepseek.png) --- # [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#5-minute-rag-with-qdrant-and-deepseek) 5 Minute RAG with Qdrant and DeepSeek | Time: 5 min | Level: Beginner | Output: [GitHub](https://github.com/qdrant/examples/blob/master/rag-with-qdrant-deepseek/deepseek-qdrant.ipynb) | | | --- | --- | --- | --- | This tutorial demonstrates how to build a **Retrieval-Augmented Generation (RAG)** pipeline using Qdrant as a vector storage solution and DeepSeek for semantic query enrichment. RAG pipelines enhance Large Language Model (LLM) responses by providing contextually relevant data. ## [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#overview) Overview In this tutorial, we will: 1. Take sample text and turn it into vectors with FastEmbed. 2. Send the vectors to a Qdrant collection. 3. Connect Qdrant and DeepSeek into a minimal RAG pipeline. 4. Ask DeepSeek different questions and test answer accuracy. 5. Enrich DeepSeek prompts with content retrieved from Qdrant. 6. Evaluate answer accuracy before and after. #### [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#architecture) Architecture: ![deepseek-rag-architecture](https://qdrant.tech/documentation/examples/rag-deepseek/architecture.png) * * * ## [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#prerequisites) Prerequisites Ensure you have the following: - Python environment (3.9+) - Access to [Qdrant Cloud](https://qdrant.tech/) - A DeepSeek API key from [DeepSeek Platform](https://platform.deepseek.com/api_keys) ## [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#setup-qdrant) Setup Qdrant ```bash pip install "qdrant-client[fastembed]>=1.14.1" ``` [Qdrant](https://qdrant.tech/) will act as a knowledge base providing the context information for the prompts we’ll be sending to the LLM. You can get a free-forever Qdrant cloud instance at [http://cloud.qdrant.io](http://cloud.qdrant.io/).
Learn about setting up your instance from the [Quickstart](https://qdrant.tech/documentation/quickstart-cloud/). ```python QDRANT_URL = "https://xyz-example.eu-central.aws.cloud.qdrant.io:6333" QDRANT_API_KEY = "" ``` ### [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#instantiating-qdrant-client) Instantiating Qdrant Client ```python from qdrant_client import QdrantClient, models client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY) ``` ### [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#building-the-knowledge-base) Building the knowledge base Qdrant will use vector embeddings of our facts to enrich the original prompt with some context. Thus, we need to store the vector embeddings and the facts used to generate them. We’ll be using the [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) model via [FastEmbed](https://github.com/qdrant/fastembed/) - a lightweight, fast Python library for embeddings generation. The Qdrant client provides a handy integration with FastEmbed that makes building a knowledge base very straightforward. First, we need to create a collection, so Qdrant would know what vectors it will be dealing with, and then we just pass our raw documents wrapped into `models.Document` to compute and upload the embeddings. ```python collection_name = "knowledge_base" model_name = "BAAI/bge-small-en-v1.5" client.create_collection( collection_name=collection_name, vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE) ) ``` ```python documents = [ "Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!", "Docker helps developers build, share, and run applications anywhere — without tedious environment configuration or management.", "PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.", "MySQL is an open-source relational database management system (RDBMS). A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database.", "NGINX is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.", "FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.", "SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similar, semantic search, or paraphrase mining.", "The cron command-line utility is a job scheduler on Unix-like operating systems.
Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed times, dates, or intervals.",\ ] client.upsert( collection_name=collection_name, points=[\ models.PointStruct(\ id=idx,\ vector=models.Document(text=document, model=model_name),\ payload={"document": document},\ )\ for idx, document in enumerate(documents)\ ], ) ``` ## [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#setup-deepseek) Setup DeepSeek RAG changes the way we interact with Large Language Models. We’re converting a knowledge-oriented task, in which the model may create a counterfactual answer, into a language-oriented task. The latter expects the model to extract meaningful information and generate an answer. LLMs, when implemented correctly, are supposed to be carrying out language-oriented tasks. The task starts with the original prompt sent by the user. The same prompt is then vectorized and used as a search query for the most relevant facts. Those facts are combined with the original prompt to build a longer prompt containing more information. But let’s start simply by asking our question directly. ```python prompt = """ What tools should I need to use to build a web service using vector embeddings for search? """ ``` Using the Deepseek API requires providing the API key. You can obtain it from the [DeepSeek platform](https://platform.deepseek.com/api_keys). Now we can finally call the completion API. ```python import requests import json --- # Fill the environmental variable with your own Deepseek API key --- # See: https://platform.deepseek.com/api_keys API_KEY = "" HEADERS = { "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json", } def query_deepseek(prompt): data = { "model": "deepseek-chat", "messages": [{"role": "user", "content": prompt}], "stream": False, } response = requests.post( "https://api.deepseek.com/chat/completions", headers=HEADERS, data=json.dumps(data) ) if response.ok: result = response.json() return result["choices"][0]["message"]["content"] else: raise Exception(f"Error {response.status_code}: {response.text}") ``` and also the query ```python query_deepseek(prompt) ``` The response is: ```bash "Building a web service that uses vector embeddings for search involves several components, including data processing, embedding generation, storage, search, and serving the service via an API. Below is a list of tools and technologies you can use for each step:\n\n---\n\n### 1. **Data Processing**\n - **Python**: For general data preprocessing and scripting.\n - **Pandas**: For handling tabular data.\n - **NumPy**: For numerical operations.\n - **NLTK/Spacy**: For text preprocessing (tokenization, stemming, etc.).\n - **LLM models**: For generating embeddings if you're using pre-trained models.\n\n---\n\n### 2. **Embedding Generation**\n - **Pre-trained Models**:\n - Embeddings (e.g., `text-embedding-ada-002`).\n - Hugging Face Transformers (e.g., `Sentence-BERT`, `all-MiniLM-L6-v2`).\n - Google's Universal Sentence Encoder.\n - **Custom Models**:\n - TensorFlow/PyTorch: For training custom embedding models.\n - **Libraries**:\n - `sentence-transformers`: For generating sentence embeddings.\n - `transformers`: For using Hugging Face models.\n\n---\n\n### 3. 
**Vector Storage**\n - **Vector Databases**:\n - Pinecone: Managed vector database for similarity search.\n - Weaviate: Open-source vector search engine.\n - Milvus: Open-source vector database.\n - FAISS (Facebook AI Similarity Search): Library for efficient similarity search.\n - Qdrant: Open-source vector search engine.\n - Redis with RedisAI: For storing and querying vectors.\n - **Traditional Databases with Vector Support**:\n - PostgreSQL with pgvector extension.\n - Elasticsearch with dense vector support.\n\n---\n\n### 4. **Search and Retrieval**\n - **Similarity Search Algorithms**:\n - Cosine similarity, Euclidean distance, or dot product for comparing vectors.\n - **Libraries**:\n - FAISS: For fast nearest-neighbor search.\n - Annoy (Approximate Nearest Neighbors Oh Yeah): For approximate nearest neighbor search.\n - **Vector Databases**: Most vector databases (e.g., Pinecone, Weaviate) come with built-in search capabilities.\n\n---\n\n### 5. **Web Service Framework**\n - **Backend Frameworks**:\n - Flask/Django/FastAPI (Python): For building RESTful APIs.\n - Node.js/Express: If you prefer JavaScript.\n - **API Documentation**:\n - Swagger/OpenAPI: For documenting your API.\n - **Authentication**:\n - OAuth2, JWT: For securing your API.\n\n---\n\n### 6. **Deployment**\n - **Containerization**:\n - Docker: For packaging your application.\n - **Orchestration**:\n - Kubernetes: For managing containers at scale.\n - **Cloud Platforms**:\n - AWS (EC2, Lambda, S3).\n - Google Cloud (Compute Engine, Cloud Functions).\n - Azure (App Service, Functions).\n - **Serverless**:\n - AWS Lambda, Google Cloud Functions, or Vercel for serverless deployment.\n\n---\n\n### 7. **Monitoring and Logging**\n - **Monitoring**:\n - Prometheus + Grafana: For monitoring performance.\n - **Logging**:\n - ELK Stack (Elasticsearch, Logstash, Kibana).\n - Fluentd.\n - **Error Tracking**:\n - Sentry.\n\n---\n\n### 8. **Frontend (Optional)**\n - **Frontend Frameworks**:\n - React, Vue.js, or Angular: For building a user interface.\n - **Libraries**:\n - Axios: For making API calls from the frontend.\n\n---\n\n### Example Workflow\n1. Preprocess your data (e.g., clean text, tokenize).\n2. Generate embeddings using a pre-trained model (e.g., Hugging Face).\n3. Store embeddings in a vector database (e.g., Pinecone or FAISS).\n4. Build a REST API using FastAPI or Flask to handle search queries.\n5. Deploy the service using Docker and Kubernetes or a serverless platform.\n6. Monitor and scale the service as needed.\n\n---\n\n### Example Tools Stack\n- **Embedding Generation**: Hugging Face `sentence-transformers`.\n- **Vector Storage**: Pinecone or FAISS.\n- **Web Framework**: FastAPI.\n- **Deployment**: Docker + AWS/GCP.\n\nBy combining these tools, you can build a scalable and efficient web service for vector embedding-based search." ``` ### [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#extending-the-prompt) Extending the prompt Even though the original answer sounds credible, it didn’t answer our question correctly. Instead, it gave us a generic description of an application stack. To improve the results, enriching the original prompt with the descriptions of the tools available seems like one of the possibilities. Let’s use a semantic knowledge base to augment the prompt with the descriptions of different technologies! 
```python results = client.query_points( collection_name=collection_name, query=models.Document(text=prompt, model=model_name), limit=3, ) results ``` Here is the response: ```bash QueryResponse(points=[\ ScoredPoint(id=0, version=0, score=0.67437416, payload={'document': 'Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!'}, vector=None, shard_key=None, order_value=None),\ ScoredPoint(id=6, version=0, score=0.63144326, payload={'document': 'SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similar, semantic search, or paraphrase mining.'}, vector=None, shard_key=None, order_value=None),\ ScoredPoint(id=5, version=0, score=0.6064749, payload={'document': 'FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.'}, vector=None, shard_key=None, order_value=None)\ ]) ``` We used the original prompt to perform a semantic search over the set of tool descriptions. Now we can use these descriptions to augment the prompt and create more context. ```python context = "\n".join(r.payload['document'] for r in results.points) context ``` The response is: ```bash 'Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!\nFastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.\nPyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.' ``` Finally, let’s build a metaprompt, the combination of the assumed role of the LLM, the original question, and the results from our semantic search that will force our LLM to use the provided context. By doing this, we effectively convert the knowledge-oriented task into a language task and hopefully reduce the chances of hallucinations. It also should make the response sound more relevant. ```python metaprompt = f""" You are a software architect. Answer the following question using the provided context. If you can't find the answer, do not pretend you know it, but answer "I don't know". Question: {prompt.strip()} Context: {context.strip()} Answer: """ --- # Look at the full metaprompt print(metaprompt) ``` **Response:** ```bash You are a software architect. Answer the following question using the provided context. If you can't find the answer, do not pretend you know it, but answer "I don't know". Question: What tools should I need to use to build a web service using vector embeddings for search? Context: Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. 
With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more! FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints. PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing. Answer: ``` Our current prompt is much longer, and we also used a couple of strategies to make the responses even better: 1. The LLM has the role of software architect. 2. We provide more context to answer the question. 3. If the context contains no meaningful information, the model shouldn’t make up an answer. Let’s find out if that works as expected. **Question:** ```python query_deepseek(metaprompt) ``` **Answer:** ```bash 'To build a web service using vector embeddings for search, you can use the following tools:\n\n1. **Qdrant**: As a vector database and similarity search engine, Qdrant will handle the storage and retrieval of high-dimensional vectors. It provides an API service for searching and matching vectors, making it ideal for applications that require vector-based search functionality.\n\n2. **FastAPI**: This web framework is perfect for building the API layer of your web service. It is fast, easy to use, and based on Python type hints, which makes it a great choice for developing the backend of your service. FastAPI will allow you to expose endpoints that interact with Qdrant for vector search operations.\n\n3. **PyTorch**: If you need to generate vector embeddings from your data (e.g., text, images), PyTorch can be used to create and train neural network models that produce these embeddings. PyTorch is a powerful machine learning framework that supports a wide range of applications, including natural language processing and computer vision.\n\n### Summary:\n- **Qdrant** for vector storage and search.\n- **FastAPI** for building the web service API.\n- **PyTorch** for generating vector embeddings (if needed).\n\nThese tools together provide a robust stack for building a web service that leverages vector embeddings for search functionality.' ``` ### [Anchor](https://qdrant.tech/documentation/rag-deepseek/\#testing-out-the-rag-pipeline) Testing out the RAG pipeline By leveraging the semantic context we provided our model is doing a better job answering the question. Let’s enclose the RAG as a function, so we can call it more easily for different prompts. ```python def rag(question: str, n_points: int = 3) -> str: results = client.query_points( collection_name=collection_name, query=models.Document(text=question, model=model_name), limit=n_points, ) context = "\n".join(r.payload["document"] for r in results.points) metaprompt = f""" You are a software architect. Answer the following question using the provided context. If you can't find the answer, do not pretend you know it, but only answer "I don't know". Question: {question.strip()} Context: {context.strip()} Answer: """ return query_deepseek(metaprompt) ``` Now it’s easier to ask a broad range of questions. **Question:** ```python rag("What can the stack for a web api look like?") ``` **Answer:** ```bash 'The stack for a web API can include the following components based on the provided context:\n\n1. **Web Framework**: FastAPI can be used as the web framework for building the API. It is modern, fast, and leverages Python type hints for better development and performance.\n\n2. 
**Reverse Proxy/Web Server**: NGINX can be used as a reverse proxy or web server to handle incoming HTTP requests, load balancing, and serving static content. It is known for its high performance and low resource consumption.\n\n3. **Containerization**: Docker can be used to containerize the application, making it easier to build, share, and run the API consistently across different environments without worrying about configuration issues.\n\nThis stack provides a robust, scalable, and efficient setup for building and deploying a web API.' ``` **Question:** ```python rag("Where is the nearest grocery store?") ``` **Answer:** ```bash "I don't know. The provided context does not contain any information about the location of the nearest grocery store." ``` Our model can now: 1. Take advantage of the knowledge in our vector datastore. 2. Answer, based on the provided context, that it can not provide an answer. We have just shown a useful mechanism to mitigate the risks of hallucinations in Large Language Models.
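To close the loop on step 6 of the overview (evaluating answer accuracy before and after retrieval), you can run the same questions through the plain `query_deepseek` call and through the `rag` pipeline and compare the outputs side by side. Below is a minimal, illustrative sketch; the question list is only an example and assumes the functions defined above are available in the same session.

```python
# Compare plain DeepSeek answers with RAG-augmented answers side by side.
# Assumes `query_deepseek` and `rag` from the snippets above are already defined.
questions = [
    "What tools should I need to use to build a web service using vector embeddings for search?",
    "What can the stack for a web api look like?",
    "Where is the nearest grocery store?",
]

for question in questions:
    print(f"Question: {question}")
    print("--- Without retrieval ---")
    print(query_deepseek(question))
    print("--- With Qdrant context ---")
    print(rag(question))
    print()
```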
<|page-181-lllmstxt|> ## filtering --- # [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#filtering) Filtering With Qdrant, you can set conditions when searching or retrieving points. For example, you can impose conditions on both the [payload](https://qdrant.tech/documentation/concepts/payload/) and the `id` of the point. Setting additional conditions is important when it is impossible to express all the features of the object in the embedding. Examples include a variety of business requirements: stock availability, user location, or desired price range. ## [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#related-content) Related Content | [A Complete Guide to Filtering in Vector Search](https://qdrant.tech/articles/vector-search-filtering/) | Developer advice on proper usage and advanced practices. | | --- | --- | ## [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#filtering-clauses) Filtering clauses Qdrant allows you to combine conditions in clauses. Clauses are different logical operations, such as `OR`, `AND`, and `NOT`. Clauses can be recursively nested into each other so that you can reproduce an arbitrary boolean expression. Let’s take a look at the clauses implemented in Qdrant. Suppose we have a set of points with the following payload: ```json [\ { "id": 1, "city": "London", "color": "green" },\ { "id": 2, "city": "London", "color": "red" },\ { "id": 3, "city": "London", "color": "blue" },\ { "id": 4, "city": "Berlin", "color": "red" },\ { "id": 5, "city": "Moscow", "color": "green" },\ { "id": 6, "city": "Moscow", "color": "blue" }\ ] ``` ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#must) Must Example: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [\ { "key": "city", "match": { "value": "London" } },\ { "key": "color", "match": { "value": "red" } }\ ] } ...
} ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(value="London"),\ ),\ models.FieldCondition(\ key="color",\ match=models.MatchValue(value="red"),\ ),\ ] ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.scroll("{collection_name}", { filter: { must: [\ {\ key: "city",\ match: { value: "London" },\ },\ {\ key: "color",\ match: { value: "red" },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::must([\ Condition::matches("city", "london".to_string()),\ Condition::matches("color", "red".to_string()),\ ])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addAllMust( List.of(matchKeyword("city", "London"), matchKeyword("color", "red"))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); // & operator combines two conditions in an AND conjunction(must) await client.ScrollAsync( collectionName: "{collection_name}", filter: MatchKeyword("city", "London") & MatchKeyword("color", "red") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), qdrant.NewMatch("color", "red"), }, }, }) ``` Filtered points would be: ```json [{ "id": 2, "city": "London", "color": "red" }] ``` When using `must`, the clause becomes `true` only if every condition listed inside `must` is satisfied. In this sense, `must` is equivalent to the operator `AND`. 
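The same `Filter` is not limited to the scroll API. In a vector search you can pass it as well, so that only points satisfying the clause are scored; with recent versions of the Python client this is the `query_filter` argument of `query_points`. A minimal sketch (the 4-dimensional query vector is only a placeholder for a real embedding):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Search for the nearest vectors, but only among points that match the filter.
client.query_points(
    collection_name="{collection_name}",
    query=[0.2, 0.1, 0.9, 0.7],  # placeholder query vector
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="city", match=models.MatchValue(value="London")),
            models.FieldCondition(key="color", match=models.MatchValue(value="red")),
        ]
    ),
    limit=3,
)
```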
### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#should) Should Example: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "should": [\ { "key": "city", "match": { "value": "London" } },\ { "key": "color", "match": { "value": "red" } }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( should=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(value="London"),\ ),\ models.FieldCondition(\ key="color",\ match=models.MatchValue(value="red"),\ ),\ ] ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { should: [\ {\ key: "city",\ match: { value: "London" },\ },\ {\ key: "color",\ match: { value: "red" },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::should([\ Condition::matches("city", "london".to_string()),\ Condition::matches("color", "red".to_string()),\ ])), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; import java.util.List; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addAllShould( List.of(matchKeyword("city", "London"), matchKeyword("color", "red"))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); // | operator combines two conditions in an OR disjunction(should) await client.ScrollAsync( collectionName: "{collection_name}", filter: MatchKeyword("city", "London") | MatchKeyword("color", "red") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Should: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), qdrant.NewMatch("color", "red"), }, }, }) ``` Filtered points would be: ```json [\ { "id": 1, "city": "London", "color": "green" },\ { "id": 2, "city": "London", "color": "red" },\ { "id": 3, "city": "London", "color": "blue" },\ { "id": 4, "city": "Berlin", "color": "red" }\ ] ``` When using `should`, the clause becomes `true` if at least one condition listed inside `should` is satisfied. In this sense, `should` is equivalent to the operator `OR`. 
### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#must-not) Must Not Example: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must_not": [\ { "key": "city", "match": { "value": "London" } },\ { "key": "color", "match": { "value": "red" } }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must_not=[\ models.FieldCondition(key="city", match=models.MatchValue(value="London")),\ models.FieldCondition(key="color", match=models.MatchValue(value="red")),\ ] ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must_not: [\ {\ key: "city",\ match: { value: "London" },\ },\ {\ key: "color",\ match: { value: "red" },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::must_not([\ Condition::matches("city", "london".to_string()),\ Condition::matches("color", "red".to_string()),\ ])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addAllMustNot( List.of(matchKeyword("city", "London"), matchKeyword("color", "red"))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); // The ! operator negates the condition(must not) await client.ScrollAsync( collectionName: "{collection_name}", filter: !(MatchKeyword("city", "London") & MatchKeyword("color", "red")) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ MustNot: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), qdrant.NewMatch("color", "red"), }, }, }) ``` Filtered points would be: ```json [\ { "id": 5, "city": "Moscow", "color": "green" },\ { "id": 6, "city": "Moscow", "color": "blue" }\ ] ``` When using `must_not`, the clause becomes `true` if none of the conditions listed inside `must_not` is satisfied. In this sense, `must_not` is equivalent to the expression `(NOT A) AND (NOT B) AND (NOT C)`. 
### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#clauses-combination) Clauses combination It is also possible to use several clauses simultaneously: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [\ { "key": "city", "match": { "value": "London" } }\ ], "must_not": [\ { "key": "color", "match": { "value": "red" } }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.FieldCondition(key="city", match=models.MatchValue(value="London")),\ ], must_not=[\ models.FieldCondition(key="color", match=models.MatchValue(value="red")),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ key: "city",\ match: { value: "London" },\ },\ ], must_not: [\ {\ key: "color",\ match: { value: "red" },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter { must: vec![Condition::matches("city", "London".to_string())], must_not: vec![Condition::matches("color", "red".to_string())], ..Default::default() }), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addMust(matchKeyword("city", "London")) .addMustNot(matchKeyword("color", "red")) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: MatchKeyword("city", "London") & !MatchKeyword("color", "red") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), }, MustNot: []*qdrant.Condition{ qdrant.NewMatch("color", "red"), }, }, }) ``` Filtered points would be: ```json [\ { "id": 1, "city": "London", "color": "green" },\ { "id": 3, "city": "London", "color": "blue" }\ ] ``` In this case, the conditions are combined by `AND`. Also, the conditions could be recursively nested. 
Example: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must_not": [\ {\ "must": [\ { "key": "city", "match": { "value": "London" } },\ { "key": "color", "match": { "value": "red" } }\ ]\ }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must_not=[\ models.Filter(\ must=[\ models.FieldCondition(\ key="city", match=models.MatchValue(value="London")\ ),\ models.FieldCondition(\ key="color", match=models.MatchValue(value="red")\ ),\ ],\ ),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must_not: [\ {\ must: [\ {\ key: "city",\ match: { value: "London" },\ },\ {\ key: "color",\ match: { value: "red" },\ },\ ],\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::must_not([Filter::must(\ [\ Condition::matches("city", "London".to_string()),\ Condition::matches("color", "red".to_string()),\ ],\ )\ .into()])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.filter; import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addMustNot( filter( Filter.newBuilder() .addAllMust( List.of( matchKeyword("city", "London"), matchKeyword("color", "red"))) .build())) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: new Filter { MustNot = { MatchKeyword("city", "London") & MatchKeyword("color", "red") } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ MustNot: []*qdrant.Condition{ qdrant.NewFilterAsCondition(&qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), qdrant.NewMatch("color", "red"), }, }), }, }, }) ``` Filtered points would be: ```json [\ { "id": 1, "city": "London", "color": "green" },\ { "id": 3, "city": "London", "color": "blue" },\ { "id": 4, "city": "Berlin", "color": "red" },\ { "id": 5, "city": "Moscow", "color": "green" },\ { "id": 6, "city": "Moscow", "color": "blue" }\ ] ``` ## [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#filtering-conditions) Filtering conditions Different types of values in payload correspond to different kinds of queries that we can apply to them. Let’s look at the existing condition variants and what types of data they apply to. 
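Filtering works without any extra setup, but on large collections it is common to create a payload index for the fields you filter on most often, so that conditions do not require a full scan. A minimal Python sketch (the field name is taken from the example payload above; see the indexing documentation for all options):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Index the "city" keyword field to speed up filters that match on it.
client.create_payload_index(
    collection_name="{collection_name}",
    field_name="city",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```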
### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#match) Match jsonpythontypescriptrustjavacsharpgo ```json { "key": "color", "match": { "value": "red" } } ``` ```python models.FieldCondition( key="color", match=models.MatchValue(value="red"), ) ``` ```typescript { key: 'color', match: {value: 'red'} } ``` ```rust Condition::matches("color", "red".to_string()) ``` ```java matchKeyword("color", "red"); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; MatchKeyword("color", "red"); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewMatch("color", "red") ``` For the other types, the match condition will look exactly the same, except for the type used: jsonpythontypescriptrustjavacsharpgo ```json { "key": "count", "match": { "value": 0 } } ``` ```python models.FieldCondition( key="count", match=models.MatchValue(value=0), ) ``` ```typescript { key: 'count', match: {value: 0} } ``` ```rust Condition::matches("count", 0) ``` ```java import static io.qdrant.client.ConditionFactory.match; match("count", 0); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; Match("count", 0); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewMatchInt("count", 0) ``` The simplest kind of condition is one that checks if the stored value equals the given one. If several values are stored, at least one of them should match the condition. You can apply it to [keyword](https://qdrant.tech/documentation/concepts/payload/#keyword), [integer](https://qdrant.tech/documentation/concepts/payload/#integer) and [bool](https://qdrant.tech/documentation/concepts/payload/#bool) payloads. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#match-any) Match Any _Available as of v1.1.0_ In case you want to check if the stored value is one of multiple values, you can use the Match Any condition. Match Any works as a logical OR for the given values. It can also be described as a `IN` operator. You can apply it to [keyword](https://qdrant.tech/documentation/concepts/payload/#keyword) and [integer](https://qdrant.tech/documentation/concepts/payload/#integer) payloads. Example: jsonpythontypescriptrustjavacsharpgo ```json { "key": "color", "match": { "any": ["black", "yellow"] } } ``` ```python models.FieldCondition( key="color", match=models.MatchAny(any=["black", "yellow"]), ) ``` ```typescript { key: 'color', match: {any: ['black', 'yellow']} } ``` ```rust Condition::matches("color", vec!["black".to_string(), "yellow".to_string()]) ``` ```java import static io.qdrant.client.ConditionFactory.matchKeywords; matchKeywords("color", List.of("black", "yellow")); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; Match("color", ["black", "yellow"]); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewMatchKeywords("color", "black", "yellow") ``` In this example, the condition will be satisfied if the stored value is either `black` or `yellow`. If the stored value is an array, it should have at least one value matching any of the given values. E.g. if the stored value is `["black", "green"]`, the condition will be satisfied, because `"black"` is in `["black", "yellow"]`. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#match-except) Match Except _Available as of v1.2.0_ In case you want to check if the stored value is not one of multiple values, you can use the Match Except condition. Match Except works as a logical NOR for the given values. It can also be described as a `NOT IN` operator. 
You can apply it to [keyword](https://qdrant.tech/documentation/concepts/payload/#keyword) and [integer](https://qdrant.tech/documentation/concepts/payload/#integer) payloads. Example: jsonpythontypescriptrustjavacsharpgo ```json { "key": "color", "match": { "except": ["black", "yellow"] } } ``` ```python models.FieldCondition( key="color", match=models.MatchExcept(**{"except": ["black", "yellow"]}), ) ``` ```typescript { key: 'color', match: {except: ['black', 'yellow']} } ``` ```rust use qdrant_client::qdrant::r#match::MatchValue; Condition::matches( "color", !MatchValue::from(vec!["black".to_string(), "yellow".to_string()]), ) ``` ```java import static io.qdrant.client.ConditionFactory.matchExceptKeywords; matchExceptKeywords("color", List.of("black", "yellow")); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; Match("color", ["black", "yellow"]); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewMatchExcept("color", "black", "yellow") ``` In this example, the condition will be satisfied if the stored value is neither `black` nor `yellow`. If the stored value is an array, it should have at least one value not matching any of the given values. E.g. if the stored value is `["black", "green"]`, the condition will be satisfied, because `"green"` does not match `"black"` nor `"yellow"`. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#nested-key) Nested key _Available as of v1.1.0_ Payloads being arbitrary JSON object, it is likely that you will need to filter on a nested field. For convenience, we use a syntax similar to what can be found in the [Jq](https://stedolan.github.io/jq/manual/#Basicfilters) project. Suppose we have a set of points with the following payload: ```json [\ {\ "id": 1,\ "country": {\ "name": "Germany",\ "cities": [\ {\ "name": "Berlin",\ "population": 3.7,\ "sightseeing": ["Brandenburg Gate", "Reichstag"]\ },\ {\ "name": "Munich",\ "population": 1.5,\ "sightseeing": ["Marienplatz", "Olympiapark"]\ }\ ]\ }\ },\ {\ "id": 2,\ "country": {\ "name": "Japan",\ "cities": [\ {\ "name": "Tokyo",\ "population": 9.3,\ "sightseeing": ["Tokyo Tower", "Tokyo Skytree"]\ },\ {\ "name": "Osaka",\ "population": 2.7,\ "sightseeing": ["Osaka Castle", "Universal Studios Japan"]\ }\ ]\ }\ }\ ] ``` You can search on a nested field using a dot notation. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "should": [\ {\ "key": "country.name",\ "match": {\ "value": "Germany"\ }\ }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( should=[\ models.FieldCondition(\ key="country.name", match=models.MatchValue(value="Germany")\ ),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { should: [\ {\ key: "country.name",\ match: { value: "Germany" },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::should([\ Condition::matches("country.name", "Germany".to_string()),\ ])), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addShould(matchKeyword("country.name", "Germany")) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync(collectionName: "{collection_name}", filter: MatchKeyword("country.name", "Germany")); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Should: []*qdrant.Condition{ qdrant.NewMatch("country.name", "Germany"), }, }, }) ``` You can also search through arrays by projecting inner values using the `[]` syntax. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "should": [\ {\ "key": "country.cities[].population",\ "range": {\ "gte": 9.0,\ }\ }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( should=[\ models.FieldCondition(\ key="country.cities[].population",\ range=models.Range(\ gt=None,\ gte=9.0,\ lt=None,\ lte=None,\ ),\ ),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { should: [\ {\ key: "country.cities[].population",\ range: {\ gt: null,\ gte: 9.0,\ lt: null,\ lte: null,\ },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, Range, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::should([\ Condition::range(\ "country.cities[].population",\ Range {\ gte: Some(9.0),\ ..Default::default()\ },\ ),\ ])), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.range; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.Range; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addShould( range( "country.cities[].population", Range.newBuilder().setGte(9.0).build())) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: Range("country.cities[].population", new Qdrant.Client.Grpc.Range { Gte = 9.0 }) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Should: []*qdrant.Condition{ qdrant.NewRange("country.cities[].population", &qdrant.Range{ Gte: qdrant.PtrOf(9.0), }), }, }, }) ``` This query would only output the point with id 2 as only Japan has a city with population greater than 9.0. And the leaf nested field can also be an array. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "should": [\ {\ "key": "country.cities[].sightseeing",\ "match": {\ "value": "Osaka Castle"\ }\ }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( should=[\ models.FieldCondition(\ key="country.cities[].sightseeing",\ match=models.MatchValue(value="Osaka Castle"),\ ),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { should: [\ {\ key: "country.cities[].sightseeing",\ match: { value: "Osaka Castle" },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::should([\ Condition::matches("country.cities[].sightseeing", "Osaka Castle".to_string()),\ ])), ) .await?; ``` ```java import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addShould(matchKeyword("country.cities[].sightseeing", "Osaka Castle")) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: MatchKeyword("country.cities[].sightseeing", "Osaka Castle") ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Should: []*qdrant.Condition{ qdrant.NewMatch("country.cities[].sightseeing", "Osaka Castle"), }, }, }) ``` This query would only output the point with id 2, as only Japan has a city with “Osaka Castle” in its sightseeing list. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#nested-object-filter) Nested object filter _Available as of v1.2.0_ By default, conditions take into account the entire payload of a point.
For instance, given two points with the following payload: ```json [\ {\ "id": 1,\ "dinosaur": "t-rex",\ "diet": [\ { "food": "leaves", "likes": false},\ { "food": "meat", "likes": true}\ ]\ },\ {\ "id": 2,\ "dinosaur": "diplodocus",\ "diet": [\ { "food": "leaves", "likes": true},\ { "food": "meat", "likes": false}\ ]\ }\ ] ``` The following query would match both points: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [\ {\ "key": "diet[].food",\ "match": {\ "value": "meat"\ }\ },\ {\ "key": "diet[].likes",\ "match": {\ "value": true\ }\ }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.FieldCondition(\ key="diet[].food", match=models.MatchValue(value="meat")\ ),\ models.FieldCondition(\ key="diet[].likes", match=models.MatchValue(value=True)\ ),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ key: "diet[].food",\ match: { value: "meat" },\ },\ {\ key: "diet[].likes",\ match: { value: true },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::must([\ Condition::matches("diet[].food", "meat".to_string()),\ Condition::matches("diet[].likes", true),\ ])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.match; import static io.qdrant.client.ConditionFactory.matchKeyword; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addAllMust( List.of(matchKeyword("diet[].food", "meat"), match("diet[].likes", true))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: MatchKeyword("diet[].food", "meat") & Match("diet[].likes", true) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("diet[].food", "meat"), qdrant.NewMatchBool("diet[].likes", true), }, }, }) ``` This happens because both points are matching the two conditions: - the “t-rex” matches food=meat on `diet[1].food` and likes=true on `diet[1].likes` - the “diplodocus” matches food=meat on `diet[1].food` and likes=true on `diet[0].likes` To retrieve only the points which are matching the conditions on an array element basis, that is the point with id 1 in this example, you would need to use a nested object filter. Nested object filters allow arrays of objects to be queried independently of each other. It is achieved by using the `nested` condition type formed by a payload key to focus on and a filter to apply. The key should point to an array of objects and can be used with or without the bracket notation (“data” or “data\[\]”). 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [{\ "nested": {\ "key": "diet",\ "filter":{\ "must": [\ {\ "key": "food",\ "match": {\ "value": "meat"\ }\ },\ {\ "key": "likes",\ "match": {\ "value": true\ }\ }\ ]\ }\ }\ }] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.NestedCondition(\ nested=models.Nested(\ key="diet",\ filter=models.Filter(\ must=[\ models.FieldCondition(\ key="food", match=models.MatchValue(value="meat")\ ),\ models.FieldCondition(\ key="likes", match=models.MatchValue(value=True)\ ),\ ]\ ),\ )\ )\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ nested: {\ key: "diet",\ filter: {\ must: [\ {\ key: "food",\ match: { value: "meat" },\ },\ {\ key: "likes",\ match: { value: true },\ },\ ],\ },\ },\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, NestedCondition, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::must([NestedCondition {\ key: "diet".to_string(),\ filter: Some(Filter::must([\ Condition::matches("food", "meat".to_string()),\ Condition::matches("likes", true),\ ])),\ }\ .into()])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.match; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.ConditionFactory.nested; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addMust( nested( "diet", Filter.newBuilder() .addAllMust( List.of( matchKeyword("food", "meat"), match("likes", true))) .build())) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: Nested("diet", MatchKeyword("food", "meat") & Match("likes", true)) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewNestedFilter("diet", &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("food", "meat"), qdrant.NewMatchBool("likes", true), }, }), }, }, }) ``` The matching logic is modified to be applied at the level of an array element within the payload. Nested filters work in the same way as if the nested filter was applied to a single element of the array at a time. Parent document is considered to match the condition if at least one element of the array matches the nested filter. **Limitations** The `has_id` condition is not supported within the nested object filter. If you need it, place it in an adjacent `must` clause. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter":{ "must":[\ {\ "nested":{\ "key":"diet",\ "filter":{\ "must":[\ {\ "key":"food",\ "match":{\ "value":"meat"\ }\ },\ {\ "key":"likes",\ "match":{\ "value":true\ }\ }\ ]\ }\ }\ },\ {\ "has_id":[\ 1\ ]\ }\ ] } } ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.NestedCondition(\ nested=models.Nested(\ key="diet",\ filter=models.Filter(\ must=[\ models.FieldCondition(\ key="food", match=models.MatchValue(value="meat")\ ),\ models.FieldCondition(\ key="likes", match=models.MatchValue(value=True)\ ),\ ]\ ),\ )\ ),\ models.HasIdCondition(has_id=[1]),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ nested: {\ key: "diet",\ filter: {\ must: [\ {\ key: "food",\ match: { value: "meat" },\ },\ {\ key: "likes",\ match: { value: true },\ },\ ],\ },\ },\ },\ {\ has_id: [1],\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, NestedCondition, ScrollPointsBuilder}; client .scroll( ScrollPointsBuilder::new("{collection_name}").filter(Filter::must([\ NestedCondition {\ key: "diet".to_string(),\ filter: Some(Filter::must([\ Condition::matches("food", "meat".to_string()),\ Condition::matches("likes", true),\ ])),\ }\ .into(),\ Condition::has_id([1]),\ ])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.hasId; import static io.qdrant.client.ConditionFactory.match; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.ConditionFactory.nested; import static io.qdrant.client.PointIdFactory.id; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addMust( nested( "diet", Filter.newBuilder() .addAllMust( List.of( matchKeyword("food", "meat"), match("likes", true))) .build())) .addMust(hasId(id(1))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync( collectionName: "{collection_name}", filter: Nested("diet", MatchKeyword("food", "meat") & Match("likes", true)) & HasId(1) ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewNestedFilter("diet", &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("food", "meat"), qdrant.NewMatchBool("likes", true), }, }), qdrant.NewHasID(qdrant.NewIDNum(1)), }, }, }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#full-text-match) Full Text Match _Available as of v0.10.0_ A special case of the `match` condition is the `text` match condition. It allows you to search for a specific substring, token or phrase within the text field. Exact texts that will match the condition depend on full-text index configuration. Configuration is defined during the index creation and describe at [full-text index](https://qdrant.tech/documentation/concepts/indexing/#full-text-index). If there is no full-text index for the field, the condition will work as exact substring match. 
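If you want token- or phrase-level matching rather than plain substring matching, the full-text index is created up front on the payload field. A minimal Python sketch (the tokenizer settings here are illustrative; the indexing documentation lists all options):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Create a full-text index on the "description" payload field.
client.create_payload_index(
    collection_name="{collection_name}",
    field_name="description",
    field_schema=models.TextIndexParams(
        type="text",
        tokenizer=models.TokenizerType.WORD,
        min_token_len=2,
        max_token_len=20,
        lowercase=True,
    ),
)
```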
jsonpythontypescriptrustjavacsharpgo ```json { "key": "description", "match": { "text": "good cheap" } } ``` ```python models.FieldCondition( key="description", match=models.MatchText(text="good cheap"), ) ``` ```typescript { key: 'description', match: {text: 'good cheap'} } ``` ```rust use qdrant_client::qdrant::Condition; Condition::matches_text("description", "good cheap") ``` ```java import static io.qdrant.client.ConditionFactory.matchText; matchText("description", "good cheap"); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; MatchText("description", "good cheap"); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewMatchText("description", "good cheap") ``` If the query has several words, then the condition will be satisfied only if all of them are present in the text. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#range) Range jsonpythontypescriptrustjavacsharpgo ```json { "key": "price", "range": { "gt": null, "gte": 100.0, "lt": null, "lte": 450.0 } } ``` ```python models.FieldCondition( key="price", range=models.Range( gt=None, gte=100.0, lt=None, lte=450.0, ), ) ``` ```typescript { key: 'price', range: { gt: null, gte: 100.0, lt: null, lte: 450.0 } } ``` ```rust use qdrant_client::qdrant::{Condition, Range}; Condition::range( "price", Range { gt: None, gte: Some(100.0), lt: None, lte: Some(450.0), }, ) ``` ```java import static io.qdrant.client.ConditionFactory.range; import io.qdrant.client.grpc.Points.Range; range("price", Range.newBuilder().setGte(100.0).setLte(450).build()); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; Range("price", new Qdrant.Client.Grpc.Range { Gte = 100.0, Lte = 450 }); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewRange("price", &qdrant.Range{ Gte: qdrant.PtrOf(100.0), Lte: qdrant.PtrOf(450.0), }) ``` The `range` condition sets the range of possible values for stored payload values. If several values are stored, at least one of them should match the condition. Comparisons that can be used: - `gt` \- greater than - `gte` \- greater than or equal - `lt` \- less than - `lte` \- less than or equal Can be applied to [float](https://qdrant.tech/documentation/concepts/payload/#float) and [integer](https://qdrant.tech/documentation/concepts/payload/#integer) payloads. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#datetime-range) Datetime Range The datetime range is a unique range condition, used for [datetime](https://qdrant.tech/documentation/concepts/payload/#datetime) payloads, which supports RFC 3339 formats. You do not need to convert dates to UNIX timestaps. During comparison, timestamps are parsed and converted to UTC. 
_Available as of v1.8.0_ jsonpythontypescriptrustjavacsharpgo ```json { "key": "date", "range": { "gt": "2023-02-08T10:49:00Z", "gte": null, "lt": null, "lte": "2024-01-31 10:14:31Z" } } ``` ```python models.FieldCondition( key="date", range=models.DatetimeRange( gt="2023-02-08T10:49:00Z", gte=None, lt=None, lte="2024-01-31T10:14:31Z", ), ) ``` ```typescript { key: 'date', range: { gt: '2023-02-08T10:49:00Z', gte: null, lt: null, lte: '2024-01-31T10:14:31Z' } } ``` ```rust use qdrant_client::qdrant::{Condition, DatetimeRange, Timestamp}; Condition::datetime_range( "date", DatetimeRange { gt: Some(Timestamp::date_time(2023, 2, 8, 10, 49, 0).unwrap()), gte: None, lt: None, lte: Some(Timestamp::date_time(2024, 1, 31, 10, 14, 31).unwrap()), }, ) ``` ```java import static io.qdrant.client.ConditionFactory.datetimeRange; import com.google.protobuf.Timestamp; import io.qdrant.client.grpc.Points.DatetimeRange; import java.time.Instant; long gt = Instant.parse("2023-02-08T10:49:00Z").getEpochSecond(); long lte = Instant.parse("2024-01-31T10:14:31Z").getEpochSecond(); datetimeRange("date", DatetimeRange.newBuilder() .setGt(Timestamp.newBuilder().setSeconds(gt)) .setLte(Timestamp.newBuilder().setSeconds(lte)) .build()); ``` ```csharp using Qdrant.Client.Grpc; Conditions.DatetimeRange( field: "date", gt: new DateTime(2023, 2, 8, 10, 49, 0, DateTimeKind.Utc), lte: new DateTime(2024, 1, 31, 10, 14, 31, DateTimeKind.Utc) ); ``` ```go import ( "time" "github.com/qdrant/go-client/qdrant" "google.golang.org/protobuf/types/known/timestamppb" ) qdrant.NewDatetimeRange("date", &qdrant.DatetimeRange{ Gt: timestamppb.New(time.Date(2023, 2, 8, 10, 49, 0, 0, time.UTC)), Lte: timestamppb.New(time.Date(2024, 1, 31, 10, 14, 31, 0, time.UTC)), }) ``` ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#uuid-match) UUID Match _Available as of v1.11.0_ Matching of UUID values works similarly to the regular `match` condition for strings. Functionally, it will work with `keyword` and `uuid` indexes exactly the same, but `uuid` index is more memory efficient. 
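If you filter on UUID payload values frequently, it may be worth creating the more memory-efficient `uuid` index explicitly. A minimal sketch with the Python client (assumes qdrant-client 1.11 or later; collection and field names are placeholders):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Index the "uuid" payload field with the dedicated UUID schema;
# a keyword index would also work, but uses more memory.
client.create_payload_index(
    collection_name="{collection_name}",
    field_name="uuid",
    field_schema=models.PayloadSchemaType.UUID,
)
```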
jsonpythontypescriptrustjavacsharpgo ```json { "key": "uuid", "match": { "value": "f47ac10b-58cc-4372-a567-0e02b2c3d479" } } ``` ```python models.FieldCondition( key="uuid", match=models.MatchValue(value="f47ac10b-58cc-4372-a567-0e02b2c3d479"), ) ``` ```typescript { key: 'uuid', match: {value: 'f47ac10b-58cc-4372-a567-0e02b2c3d479'} } ``` ```rust Condition::matches("uuid", "f47ac10b-58cc-4372-a567-0e02b2c3d479".to_string()) ``` ```java matchKeyword("uuid", "f47ac10b-58cc-4372-a567-0e02b2c3d479"); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; MatchKeyword("uuid", "f47ac10b-58cc-4372-a567-0e02b2c3d479"); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewMatch("uuid", "f47ac10b-58cc-4372-a567-0e02b2c3d479") ``` ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#geo) Geo #### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#geo-bounding-box) Geo Bounding Box jsonpythontypescriptrustjavacsharpgo ```json { "key": "location", "geo_bounding_box": { "bottom_right": { "lon": 13.455868, "lat": 52.495862 }, "top_left": { "lon": 13.403683, "lat": 52.520711 } } } ``` ```python models.FieldCondition( key="location", geo_bounding_box=models.GeoBoundingBox( bottom_right=models.GeoPoint( lon=13.455868, lat=52.495862, ), top_left=models.GeoPoint( lon=13.403683, lat=52.520711, ), ), ) ``` ```typescript { key: 'location', geo_bounding_box: { bottom_right: { lon: 13.455868, lat: 52.495862 }, top_left: { lon: 13.403683, lat: 52.520711 } } } ``` ```rust use qdrant_client::qdrant::{Condition, GeoBoundingBox, GeoPoint}; Condition::geo_bounding_box( "location", GeoBoundingBox { bottom_right: Some(GeoPoint { lon: 13.455868, lat: 52.495862, }), top_left: Some(GeoPoint { lon: 13.403683, lat: 52.520711, }), }, ) ``` ```java import static io.qdrant.client.ConditionFactory.geoBoundingBox; geoBoundingBox("location", 52.520711, 13.403683, 52.495862, 13.455868); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; GeoBoundingBox("location", 52.520711, 13.403683, 52.495862, 13.455868); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewGeoBoundingBox("location", 52.520711, 13.403683, 52.495862, 13.455868) ``` It matches `location` values that lie inside a rectangle whose upper-left corner is given by `top_left` and whose lower-right corner is given by `bottom_right`. #### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#geo-radius) Geo Radius jsonpythontypescriptrustjavacsharpgo ```json { "key": "location", "geo_radius": { "center": { "lon": 13.403683, "lat": 52.520711 }, "radius": 1000.0 } } ``` ```python models.FieldCondition( key="location", geo_radius=models.GeoRadius( center=models.GeoPoint( lon=13.403683, lat=52.520711, ), radius=1000.0, ), ) ``` ```typescript { key: 'location', geo_radius: { center: { lon: 13.403683, lat: 52.520711 }, radius: 1000.0 } } ``` ```rust use qdrant_client::qdrant::{Condition, GeoPoint, GeoRadius}; Condition::geo_radius( "location", GeoRadius { center: Some(GeoPoint { lon: 13.403683, lat: 52.520711, }), radius: 1000.0, }, ) ``` ```java import static io.qdrant.client.ConditionFactory.geoRadius; geoRadius("location", 52.520711, 13.403683, 1000.0f); ``` ```csharp using static Qdrant.Client.Grpc.Conditions; GeoRadius("location", 52.520711, 13.403683, 1000.0f); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewGeoRadius("location", 52.520711, 13.403683, 1000.0) ``` It matches `location` values that lie inside a circle centered at `center` with a radius of `radius` meters.
If several values are stored, at least one of them should match the condition. These conditions can only be applied to payloads that match the [geo-data format](https://qdrant.tech/documentation/concepts/payload/#geo). #### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#geo-polygon) Geo Polygon Geo Polygons search is useful for when you want to find points inside an irregularly shaped area, for example a country boundary or a forest boundary. A polygon always has an exterior ring and may optionally include interior rings. A lake with an island would be an example of an interior ring. If you wanted to find points in the water but not on the island, you would make an interior ring for the island. When defining a ring, you must pick either a clockwise or counterclockwise ordering for your points. The first and last point of the polygon must be the same. Currently, we only support unprojected global coordinates (decimal degrees longitude and latitude) and we are datum agnostic. jsonpythontypescriptrustjavacsharpgo ```json { "key": "location", "geo_polygon": { "exterior": { "points": [\ { "lon": -70.0, "lat": -70.0 },\ { "lon": 60.0, "lat": -70.0 },\ { "lon": 60.0, "lat": 60.0 },\ { "lon": -70.0, "lat": 60.0 },\ { "lon": -70.0, "lat": -70.0 }\ ] }, "interiors": [\ {\ "points": [\ { "lon": -65.0, "lat": -65.0 },\ { "lon": 0.0, "lat": -65.0 },\ { "lon": 0.0, "lat": 0.0 },\ { "lon": -65.0, "lat": 0.0 },\ { "lon": -65.0, "lat": -65.0 }\ ]\ }\ ] } } ``` ```python models.FieldCondition( key="location", geo_polygon=models.GeoPolygon( exterior=models.GeoLineString( points=[\ models.GeoPoint(\ lon=-70.0,\ lat=-70.0,\ ),\ models.GeoPoint(\ lon=60.0,\ lat=-70.0,\ ),\ models.GeoPoint(\ lon=60.0,\ lat=60.0,\ ),\ models.GeoPoint(\ lon=-70.0,\ lat=60.0,\ ),\ models.GeoPoint(\ lon=-70.0,\ lat=-70.0,\ ),\ ] ), interiors=[\ models.GeoLineString(\ points=[\ models.GeoPoint(\ lon=-65.0,\ lat=-65.0,\ ),\ models.GeoPoint(\ lon=0.0,\ lat=-65.0,\ ),\ models.GeoPoint(\ lon=0.0,\ lat=0.0,\ ),\ models.GeoPoint(\ lon=-65.0,\ lat=0.0,\ ),\ models.GeoPoint(\ lon=-65.0,\ lat=-65.0,\ ),\ ]\ )\ ], ), ) ``` ```typescript { key: "location", geo_polygon: { exterior: { points: [\ {\ lon: -70.0,\ lat: -70.0\ },\ {\ lon: 60.0,\ lat: -70.0\ },\ {\ lon: 60.0,\ lat: 60.0\ },\ {\ lon: -70.0,\ lat: 60.0\ },\ {\ lon: -70.0,\ lat: -70.0\ }\ ] }, interiors: [\ {\ points: [\ {\ lon: -65.0,\ lat: -65.0\ },\ {\ lon: 0,\ lat: -65.0\ },\ {\ lon: 0,\ lat: 0\ },\ {\ lon: -65.0,\ lat: 0\ },\ {\ lon: -65.0,\ lat: -65.0\ }\ ]\ }\ ] } } ``` ```rust use qdrant_client::qdrant::{Condition, GeoLineString, GeoPoint, GeoPolygon}; Condition::geo_polygon( "location", GeoPolygon { exterior: Some(GeoLineString { points: vec![\ GeoPoint {\ lon: -70.0,\ lat: -70.0,\ },\ GeoPoint {\ lon: 60.0,\ lat: -70.0,\ },\ GeoPoint {\ lon: 60.0,\ lat: 60.0,\ },\ GeoPoint {\ lon: -70.0,\ lat: 60.0,\ },\ GeoPoint {\ lon: -70.0,\ lat: -70.0,\ },\ ], }), interiors: vec![GeoLineString {\ points: vec![\ GeoPoint {\ lon: -65.0,\ lat: -65.0,\ },\ GeoPoint {\ lon: 0.0,\ lat: -65.0,\ },\ GeoPoint { lon: 0.0, lat: 0.0 },\ GeoPoint {\ lon: -65.0,\ lat: 0.0,\ },\ GeoPoint {\ lon: -65.0,\ lat: -65.0,\ },\ ],\ }], }, ) ``` ```java import static io.qdrant.client.ConditionFactory.geoPolygon; import io.qdrant.client.grpc.Points.GeoLineString; import io.qdrant.client.grpc.Points.GeoPoint; geoPolygon( "location", GeoLineString.newBuilder() .addAllPoints( List.of( GeoPoint.newBuilder().setLon(-70.0).setLat(-70.0).build(), GeoPoint.newBuilder().setLon(60.0).setLat(-70.0).build(), 
GeoPoint.newBuilder().setLon(60.0).setLat(60.0).build(), GeoPoint.newBuilder().setLon(-70.0).setLat(60.0).build(), GeoPoint.newBuilder().setLon(-70.0).setLat(-70.0).build())) .build(), List.of( GeoLineString.newBuilder() .addAllPoints( List.of( GeoPoint.newBuilder().setLon(-65.0).setLat(-65.0).build(), GeoPoint.newBuilder().setLon(0.0).setLat(-65.0).build(), GeoPoint.newBuilder().setLon(0.0).setLat(0.0).build(), GeoPoint.newBuilder().setLon(-65.0).setLat(0.0).build(), GeoPoint.newBuilder().setLon(-65.0).setLat(-65.0).build())) .build())); ``` ```csharp using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; GeoPolygon( field: "location", exterior: new GeoLineString { Points = { new GeoPoint { Lat = -70.0, Lon = -70.0 }, new GeoPoint { Lat = 60.0, Lon = -70.0 }, new GeoPoint { Lat = 60.0, Lon = 60.0 }, new GeoPoint { Lat = -70.0, Lon = 60.0 }, new GeoPoint { Lat = -70.0, Lon = -70.0 } } }, interiors: [\ new()\ {\ Points =\ {\ new GeoPoint { Lat = -65.0, Lon = -65.0 },\ new GeoPoint { Lat = 0.0, Lon = -65.0 },\ new GeoPoint { Lat = 0.0, Lon = 0.0 },\ new GeoPoint { Lat = -65.0, Lon = 0.0 },\ new GeoPoint { Lat = -65.0, Lon = -65.0 }\ }\ }\ ] ); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewGeoPolygon("location", &qdrant.GeoLineString{ Points: []*qdrant.GeoPoint{ {Lat: -70, Lon: -70}, {Lat: 60, Lon: -70}, {Lat: 60, Lon: 60}, {Lat: -70, Lon: 60}, {Lat: -70, Lon: -70}, }, }, &qdrant.GeoLineString{ Points: []*qdrant.GeoPoint{ {Lat: -65, Lon: -65}, {Lat: 0, Lon: -65}, {Lat: 0, Lon: 0}, {Lat: -65, Lon: 0}, {Lat: -65, Lon: -65}, }, }) ``` A match is considered any point location inside or on the boundaries of the given polygon’s exterior but not inside any interiors. If several location values are stored for a point, then any of them matching will include that point as a candidate in the resultset. These conditions can only be applied to payloads that match the [geo-data format](https://qdrant.tech/documentation/concepts/payload/#geo). ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#values-count) Values count In addition to the direct value comparison, it is also possible to filter by the amount of values. For example, given the data: ```json [\ { "id": 1, "name": "product A", "comments": ["Very good!", "Excellent"] },\ { "id": 2, "name": "product B", "comments": ["meh", "expected more", "ok"] }\ ] ``` We can perform the search only among the items with more than two comments: jsonpythontypescriptrustjavacsharpgo ```json { "key": "comments", "values_count": { "gt": 2 } } ``` ```python models.FieldCondition( key="comments", values_count=models.ValuesCount(gt=2), ) ``` ```typescript { key: 'comments', values_count: {gt: 2} } ``` ```rust use qdrant_client::qdrant::{Condition, ValuesCount}; Condition::values_count( "comments", ValuesCount { gt: Some(2), ..Default::default() }, ) ``` ```java import static io.qdrant.client.ConditionFactory.valuesCount; import io.qdrant.client.grpc.Points.ValuesCount; valuesCount("comments", ValuesCount.newBuilder().setGt(2).build()); ``` ```csharp using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; ValuesCount("comments", new ValuesCount { Gt = 2 }); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewValuesCount("comments", &qdrant.ValuesCount{ Gt: qdrant.PtrOf(uint64(2)), }) ``` The result would be: ```json [{ "id": 2, "name": "product B", "comments": ["meh", "expected more", "ok"] }] ``` If stored value is not an array - it is assumed that the amount of values is equals to 1. 
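As with the other condition types on this page, the `values_count` snippet above is only the condition object; it still has to be wrapped in a filter and sent with a request. A minimal sketch of the corresponding scroll call with the Python client (the collection name is a placeholder):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Return only points whose "comments" payload array holds more than two values
client.scroll(
    collection_name="{collection_name}",
    scroll_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="comments",
                values_count=models.ValuesCount(gt=2),
            ),
        ],
    ),
)
```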
### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#is-empty) Is Empty Sometimes it is also useful to filter out records that are missing some value. The `IsEmpty` condition may help you with that: jsonpythontypescriptrustjavacsharpgo ```json { "is_empty": { "key": "reports" } } ``` ```python models.IsEmptyCondition( is_empty=models.PayloadField(key="reports"), ) ``` ```typescript { is_empty: { key: "reports" } } ``` ```rust use qdrant_client::qdrant::Condition; Condition::is_empty("reports") ``` ```java import static io.qdrant.client.ConditionFactory.isEmpty; isEmpty("reports"); ``` ```csharp using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; IsEmpty("reports"); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewIsEmpty("reports") ``` This condition will match all records where the field `reports` either does not exist, or has `null` or `[]` value. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#is-null) Is Null It is not possible to test for `NULL` values with the **match** condition. We have to use `IsNull` condition instead: jsonpythontypescriptrustjavacsharpgo ```json { "is_null": { "key": "reports" } } ``` ```python models.IsNullCondition( is_null=models.PayloadField(key="reports"), ) ``` ```typescript { is_null: { key: "reports" } } ``` ```rust use qdrant_client::qdrant::Condition; Condition::is_null("reports") ``` ```java import static io.qdrant.client.ConditionFactory.isNull; isNull("reports"); ``` ```csharp using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; IsNull("reports"); ``` ```go import "github.com/qdrant/go-client/qdrant" qdrant.NewIsNull("reports") ``` This condition will match all records where the field `reports` exists and has `NULL` value. ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#has-id) Has id This type of query is not related to payload, but can be very useful in some situations. For example, the user could mark some specific search results as irrelevant, or we want to search only among the specified points. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [\ { "has_id": [1,3,5,7,9,11] }\ ] } ... 
} ``` ```python client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.HasIdCondition(has_id=[1, 3, 5, 7, 9, 11]),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ has_id: [1, 3, 5, 7, 9, 11],\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .scroll( ScrollPointsBuilder::new("{collection_name}") .filter(Filter::must([Condition::has_id([1, 3, 5, 7, 9, 11])])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.hasId; import static io.qdrant.client.PointIdFactory.id; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addMust(hasId(List.of(id(1), id(3), id(5), id(7), id(9), id(11)))) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync(collectionName: "{collection_name}", filter: HasId([1, 3, 5, 7, 9, 11])); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewHasID( qdrant.NewIDNum(1), qdrant.NewIDNum(3), qdrant.NewIDNum(5), qdrant.NewIDNum(7), qdrant.NewIDNum(9), qdrant.NewIDNum(11), ), }, }, }) ``` Filtered points would be: ```json [\ { "id": 1, "city": "London", "color": "green" },\ { "id": 3, "city": "London", "color": "blue" },\ { "id": 5, "city": "Moscow", "color": "green" }\ ] ``` ### [Anchor](https://qdrant.tech/documentation/concepts/filtering/\#has-vector) Has vector _Available as of v1.13.0_ This condition enables filtering by the presence of a given named vector on a point. For example, suppose our collection defines two named dense vectors and two named sparse vectors: ```http PUT /collections/{collection_name} { "vectors": { "image": { "size": 4, "distance": "Dot" }, "text": { "size": 8, "distance": "Cosine" } }, "sparse_vectors": { "sparse-image": {}, "sparse-text": {} } } ``` Some points in the collection might have all vectors, some might have only a subset of them.
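For reference, roughly the same collection could be created with the Python client. A sketch mirroring the HTTP request above (vector names and sizes are taken from that request):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Two named dense vectors plus two named sparse vectors, as in the HTTP example
client.create_collection(
    collection_name="{collection_name}",
    vectors_config={
        "image": models.VectorParams(size=4, distance=models.Distance.DOT),
        "text": models.VectorParams(size=8, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse-image": models.SparseVectorParams(),
        "sparse-text": models.SparseVectorParams(),
    },
)
```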
This is how you can search for points which have the dense `image` vector defined: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/scroll { "filter": { "must": [\ { "has_vector": "image" }\ ] } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.scroll( collection_name="{collection_name}", scroll_filter=models.Filter( must=[\ models.HasVectorCondition(has_vector="image"),\ ], ), ) ``` ```typescript client.scroll("{collection_name}", { filter: { must: [\ {\ has_vector: "image",\ },\ ], }, }); ``` ```rust use qdrant_client::qdrant::{Condition, Filter, ScrollPointsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .scroll( ScrollPointsBuilder::new("{collection_name}") .filter(Filter::must([Condition::has_vector("image")])), ) .await?; ``` ```java import java.util.List; import static io.qdrant.client.ConditionFactory.hasVector; import static io.qdrant.client.PointIdFactory.id; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.ScrollPoints; client .scrollAsync( ScrollPoints.newBuilder() .setCollectionName("{collection_name}") .setFilter( Filter.newBuilder() .addMust(hasVector("image")) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.ScrollAsync(collectionName: "{collection_name}", filter: HasVector("image")); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Scroll(context.Background(), &qdrant.ScrollPoints{ CollectionName: "{collection_name}", Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewHasVector( "image", ), }, }, }) ``` ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/filtering.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/concepts/filtering.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-182-lllmstxt|> ## create-snapshot - [Documentation](https://qdrant.tech/documentation/) - [Database tutorials](https://qdrant.tech/documentation/database-tutorials/) - Create & Restore Snapshots --- # [Anchor](https://qdrant.tech/documentation/database-tutorials/create-snapshot/\#backup-and-restore-qdrant-collections-using-snapshots) Backup and Restore Qdrant Collections Using Snapshots | Time: 20 min | Level: Beginner | | | | --- | --- | --- | --- | A collection is a basic unit of data storage in Qdrant. It contains vectors, their IDs, and payloads. However, keeping the search efficient requires additional data structures to be built on top of the data. Building these data structures may take a while, especially for large collections. That’s why using snapshots is the best way to export and import Qdrant collections, as they contain all the bits and pieces required to restore the entire collection efficiently. 
This tutorial will show you how to create a snapshot of a collection and restore it. Since working with snapshots in a distributed environment is slightly more involved, we will use a 3-node Qdrant cluster. However, the same approach applies to a single-node setup. You can use the techniques described on this page to migrate a cluster. Follow the instructions in this tutorial to create and download snapshots, and then use the [Restore from snapshot](https://qdrant.tech/documentation/database-tutorials/create-snapshot/#restore-from-snapshot) section to restore that data to the new cluster. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/create-snapshot/\#prerequisites) Prerequisites Let's assume you already have a running Qdrant instance or a cluster. If not, you can follow the [installation guide](https://qdrant.tech/documentation/guides/installation/) to set up a local Qdrant instance or use [Qdrant Cloud](https://cloud.qdrant.io/) to create a cluster in a few clicks. Once the cluster is running, let's install the required dependencies: ```shell pip install qdrant-client datasets ``` ### [Anchor](https://qdrant.tech/documentation/database-tutorials/create-snapshot/\#establish-a-connection-to-qdrant) Establish a connection to Qdrant We are going to use the Python SDK and raw HTTP calls to interact with Qdrant. Since we are going to use a 3-node cluster, we need to know the URLs of all the nodes. For simplicity, let's keep them all in constants, along with the API key, so we can refer to them later: ```python QDRANT_MAIN_URL = "https://my-cluster.com:6333" QDRANT_NODES = ( "https://node-0.my-cluster.com:6333", "https://node-1.my-cluster.com:6333", "https://node-2.my-cluster.com:6333", ) QDRANT_API_KEY = "my-api-key" ``` We can now create a client instance: ```python from qdrant_client import QdrantClient client = QdrantClient(QDRANT_MAIN_URL, api_key=QDRANT_API_KEY) ``` First of all, we are going to create a collection from a precomputed dataset. If you already have a collection, you can skip this step and start by [creating a snapshot](https://qdrant.tech/documentation/database-tutorials/create-snapshot/#create-and-download-snapshots). (Optional) Create collection and import data ### Load the dataset We are going to use a dataset with precomputed embeddings, available on Hugging Face Hub. The dataset is called [Qdrant/arxiv-titles-instructorxl-embeddings](https://huggingface.co/datasets/Qdrant/arxiv-titles-instructorxl-embeddings) and was created using the [InstructorXL](https://huggingface.co/hkunlp/instructor-xl) model. It contains 2.25M embeddings for the titles of the papers from the [arXiv](https://arxiv.org/) dataset. Loading the dataset is as simple as: ```python from datasets import load_dataset dataset = load_dataset( "Qdrant/arxiv-titles-instructorxl-embeddings", split="train", streaming=True ) ``` We use streaming mode, so the dataset is not loaded into memory. Instead, we can iterate through it and extract the id and vector embedding: ```python for payload in dataset: id_ = payload.pop("id") vector = payload.pop("vector") print(id_, vector, payload) ``` A single payload looks like this: ```json { "title": "Dynamics of partially localized brane systems", "DOI": "1109.1415" } ``` ### Create a collection First things first, we need to create our collection. We are not going to tweak its configuration here, but it makes sense to set it up right away, because the configuration is also part of the collection snapshot.
```python from qdrant_client import models if not client.collection_exists("test_collection"): client.create_collection( collection_name="test_collection", vectors_config=models.VectorParams( size=768, # Size of the embedding vector generated by the InstructorXL model distance=models.Distance.COSINE ), ) ``` ### Upload the dataset Calculating the embeddings is usually a bottleneck of the vector search pipelines, but we are happy to have them in place already. Since the goal of this tutorial is to show how to create a snapshot, **we are going to upload only a small part of the dataset**. ```python ids, vectors, payloads = [], [], [] for payload in dataset: id_ = payload.pop("id") vector = payload.pop("vector") ids.append(id_) vectors.append(vector) payloads.append(payload) # We are going to upload only 1000 vectors if len(ids) == 1000: break client.upsert( collection_name="test_collection", points=models.Batch( ids=ids, vectors=vectors, payloads=payloads, ), ) ``` Our collection is now ready to be used for search. Let’s create a snapshot of it. If you already have a collection, you can skip the previous step and start by [creating a snapshot](https://qdrant.tech/documentation/database-tutorials/create-snapshot/#create-and-download-snapshots). ## [Anchor](https://qdrant.tech/documentation/database-tutorials/create-snapshot/\#create-and-download-snapshots) Create and download snapshots Qdrant exposes an HTTP endpoint to request creating a snapshot, but we can also call it with the Python SDK. Our setup consists of 3 nodes, so we need to call the endpoint **on each of them** and create a snapshot on each node. While using Python SDK, that means creating a separate client instance for each node. pythonhttp ```python snapshot_urls = [] for node_url in QDRANT_NODES: node_client = QdrantClient(node_url, api_key=QDRANT_API_KEY) snapshot_info = node_client.create_snapshot(collection_name="test_collection") snapshot_url = f"{node_url}/collections/test_collection/snapshots/{snapshot_info.name}" snapshot_urls.append(snapshot_url) ``` ```http // for `https://node-0.my-cluster.com:6333` POST /collections/test_collection/snapshots // for `https://node-1.my-cluster.com:6333` POST /collections/test_collection/snapshots // for `https://node-2.my-cluster.com:6333` POST /collections/test_collection/snapshots ``` Response ```json { "result": { "name": "test_collection-559032209313046-2024-01-03-13-20-11.snapshot", "creation_time": "2024-01-03T13:20:11", "size": 18956800 }, "status": "ok", "time": 0.307644965 } ``` Once we have the snapshot URLs, we can download them. Please make sure to include the API key in the request headers. Downloading the snapshot **can be done only through the HTTP API**, so we are going to use the `requests` library. 
```python import requests import os # Create a directory to store snapshots os.makedirs("snapshots", exist_ok=True) local_snapshot_paths = [] for snapshot_url in snapshot_urls: snapshot_name = os.path.basename(snapshot_url) local_snapshot_path = os.path.join("snapshots", snapshot_name) response = requests.get( snapshot_url, headers={"api-key": QDRANT_API_KEY} ) response.raise_for_status() with open(local_snapshot_path, "wb") as f: f.write(response.content) local_snapshot_paths.append(local_snapshot_path) ``` Alternatively, you can use the `wget` command: ```bash wget https://node-0.my-cluster.com:6333/collections/test_collection/snapshots/test_collection-559032209313046-2024-01-03-13-20-11.snapshot \ --header="api-key: ${QDRANT_API_KEY}" \ -O node-0-snapshot.snapshot wget https://node-1.my-cluster.com:6333/collections/test_collection/snapshots/test_collection-559032209313047-2024-01-03-13-20-12.snapshot \ --header="api-key: ${QDRANT_API_KEY}" \ -O node-1-snapshot.snapshot wget https://node-2.my-cluster.com:6333/collections/test_collection/snapshots/test_collection-559032209313048-2024-01-03-13-20-13.snapshot \ --header="api-key: ${QDRANT_API_KEY}" \ -O node-2-snapshot.snapshot ``` The snapshots are now stored locally. We can use them to restore the collection to a different Qdrant instance, or treat them as a backup. We will create another collection using the same data on the same cluster. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/create-snapshot/\#restore-from-snapshot) Restore from snapshot Our brand-new snapshot is ready to be restored. Typically, it is used to move a collection to a different Qdrant instance, but we are going to use it to create a new collection on the same cluster. It is just going to have a different name, `test_collection_import`. We do not need to create a collection first, as it is going to be created automatically. Restoring a collection is also done separately on each node, but our Python SDK does not support it yet. We are going to use the HTTP API instead, and send a request to each node using the `requests` library. ```python for node_url, snapshot_path in zip(QDRANT_NODES, local_snapshot_paths): snapshot_name = os.path.basename(snapshot_path) requests.post( f"{node_url}/collections/test_collection_import/snapshots/upload?priority=snapshot", headers={ "api-key": QDRANT_API_KEY, }, files={"snapshot": (snapshot_name, open(snapshot_path, "rb"))}, ) ``` Alternatively, you can use the `curl` command: ```bash curl -X POST 'https://node-0.my-cluster.com:6333/collections/test_collection_import/snapshots/upload?priority=snapshot' \ -H "api-key: ${QDRANT_API_KEY}" \ -H 'Content-Type:multipart/form-data' \ -F 'snapshot=@node-0-snapshot.snapshot' curl -X POST 'https://node-1.my-cluster.com:6333/collections/test_collection_import/snapshots/upload?priority=snapshot' \ -H "api-key: ${QDRANT_API_KEY}" \ -H 'Content-Type:multipart/form-data' \ -F 'snapshot=@node-1-snapshot.snapshot' curl -X POST 'https://node-2.my-cluster.com:6333/collections/test_collection_import/snapshots/upload?priority=snapshot' \ -H "api-key: ${QDRANT_API_KEY}" \ -H 'Content-Type:multipart/form-data' \ -F 'snapshot=@node-2-snapshot.snapshot' ``` **Important:** We selected `priority=snapshot` to make sure that the snapshot is preferred over the data stored on the node. You can read more about snapshot priority in the [documentation](https://qdrant.tech/documentation/concepts/snapshots/#snapshot-priority).
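Once the upload has finished on every node, it is worth sanity-checking the restored collection before relying on it. A small sketch that reuses the constants defined earlier in this tutorial (the expected count of 1000 corresponds to the number of points uploaded above):

```python
from qdrant_client import QdrantClient

client = QdrantClient(QDRANT_MAIN_URL, api_key=QDRANT_API_KEY)

# The restored collection should now be listed next to the original one
print(client.get_collections())

# An exact count should match what was uploaded earlier (1000 points in this tutorial)
print(client.count(collection_name="test_collection_import", exact=True))
```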
Apart from Snapshots, Qdrant also provides the [Qdrant Migration Tool](https://github.com/qdrant/migration) that supports: - Migration between Qdrant Cloud instances. - Migrating vectors from other providers into Qdrant. - Migrating from Qdrant OSS to Qdrant Cloud. Follow our [migration guide](https://qdrant.tech/documentation/database-tutorials/migration/) to learn how to effectively use the Qdrant Migration tool. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/database-tutorials/create-snapshot.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/database-tutorials/create-snapshot.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-183-lllmstxt|> ## agentic-rag-langgraph - [Documentation](https://qdrant.tech/documentation/) - Agentic RAG With LangGraph --- # [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#agentic-rag-with-langgraph-and-qdrant) Agentic RAG With LangGraph and Qdrant Traditional Retrieval-Augmented Generation (RAG) systems follow a straightforward path: query → retrieve → generate. Sure, this works well for many scenarios. But let’s face it—this linear approach often struggles when you’re dealing with complex queries that demand multiple steps or pulling together diverse types of information. [Agentic RAG](https://qdrant.tech/articles/agentic-rag/) takes things up a notch by introducing AI agents that can orchestrate multiple retrieval steps and smartly decide how to gather and use the information you need. Think of it this way: in an Agentic RAG workflow, RAG becomes just one powerful tool in a much bigger and more versatile toolkit. By combining LangGraph’s robust state management with Qdrant’s cutting-edge vector search, we’ll build a system that doesn’t just answer questions—it tackles complex, multi-step information retrieval tasks with finesse. ## [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#what-well-build) What We’ll Build We’re building an AI agent to answer questions about Hugging Face and Transformers documentation using LangGraph. At the heart of our AI agent lies LangGraph, which acts like a conductor in an orchestra. It directs the flow between various components—deciding when to retrieve information, when to perform a web search, and when to generate responses. The components are: two Qdrant vector stores and the Brave web search engine. However, our agent doesn’t just blindly follow one path. Instead, it evaluates each query and decides whether to tap into the first vector store, the second one, or search the web. This selective approach gives your system the flexibility to choose the best data source for the job, rather than being locked into the same retrieval process every time, like traditional RAG. While we won’t dive into query refinement in this tutorial, the concepts you’ll learn here are a solid foundation for adding that functionality down the line. 
## [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#workflow) Workflow ![image1](https://qdrant.tech/documentation/examples/agentic-rag-langgraph/image1.png) | **Step** | **Description** | | --- | --- | | **1\. User Input** | You start by entering a query or request through an interface, like a chatbot or a web form. This query is sent straight to the AI Agent, the brain of the operation. | | **2\. AI Agent Processes the Query** | The AI Agent analyzes your query, figuring out what you’re asking and which tools or data sources will best answer your question. | | **3\. Tool Selection** | Based on its analysis, the AI Agent picks the right tool for the job. Your data is spread across two vector databases, and depending on the query, it chooses the appropriate one. For queries needing real-time or external web data, the agent taps into a web search tool powered by BraveSearchAPI. | | **4\. Query Execution** | The AI Agent then puts its chosen tool to work:
\- **RAG Tool 1** queries Vector Database 1.
\- **RAG Tool 2** queries Vector Database 2.
\- **Web Search Tool** dives into the internet using the search API. | | **5\. Data Retrieval** | The results roll in:
\- Vector Database 1 and 2 return the most relevant documents for your query.
\- The Web Search Tool provides up-to-date or external information. | | **6\. Response Generation** | Using a text generation model (like GPT), the AI Agent crafts a detailed and accurate response tailored to your query. | | **7\. User Response** | The polished response is sent back to you through the interface, ready to use. | ## [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#the-stack) The Stack The architecture taps into cutting-edge tools to power efficient Agentic RAG workflows. Here’s a quick overview of its components and the technologies you’ll need: - **AI Agent:** The mastermind of the system, this agent parses your queries, picks the right tools, and integrates the responses. We’ll use OpenAI’s _gpt-4o_ as the reasoning engine, managed seamlessly by LangGraph. - **Embedding:** Queries are transformed into vector embeddings using OpenAI’s _text-embedding-3-small_ model. - **Vector Database:** Embeddings are stored and used for similarity searches, with Qdrant stepping in as our database of choice. - **LLM:** Responses are generated using OpenAI’s _gpt-4o_, ensuring answers are accurate and contextually grounded. - **Search Tools:** To extend RAG’s capabilities, we’ve added a web search component powered by BraveSearchAPI, perfect for real-time and external data retrieval. - **Workflow Management:** The entire orchestration and decision-making flow is built with LangGraph, providing the flexibility and intelligence needed to handle complex workflows. Ready to start building this system from the ground up? Let’s get to it! ## [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#implementation) Implementation Before we dive into building our agent, let’s get everything set up. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#imports) Imports Here’s a list of key imports required: ```python import os import json from typing import Annotated, TypedDict from dotenv import load_dotenv from langchain.embeddings import OpenAIEmbeddings from langgraph import StateGraph, tool, ToolNode, ToolMessage from langchain.document_loaders import HuggingFaceDatasetLoader from langchain.text_splitter import RecursiveCharacterTextSplitter from langchain.llms import ChatOpenAI from qdrant_client import QdrantClient from qdrant_client.http.models import VectorParams from brave_search import BraveSearch ``` ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#qdrant-vector-database-setup) Qdrant Vector Database Setup We’ll use **Qdrant Cloud** as our vector store for document embeddings. Here’s how to set it up: | **Step** | **Description** | | --- | --- | | **1\. Create an Account** | If you don’t already have one, head to Qdrant Cloud and sign up. | | **2\. Set Up a Cluster** | Log in to your account and find the **Create New Cluster** button on the dashboard. Follow the prompts to configure:
\- Select your **preferred region**.
\- Choose the **free tier** for testing. | | **3\. Secure Your Details** | Once your cluster is ready, note these details:
\- **Cluster URL** (e.g., [https://xxx-xxx-xxx.aws.cloud.qdrant.io](https://xxx-xxx-xxx.aws.cloud.qdrant.io/))
\- **API Key** | Save these securely for future use! ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#openai-api-configuration) OpenAI API Configuration Your OpenAI API key will power both embedding generation and language model interactions. Visit [OpenAI’s platform](https://platform.openai.com/) and sign up for an account. In the API section of your dashboard, create a new API key. We’ll use the text-embedding-3-small model for embeddings and GPT-4 as the language model. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#brave-search) Brave Search To enhance search capabilities, we’ll integrate Brave Search. Visit the [Brave API](https://api.search.brave.com/) and complete their API access request process to obtain an API key. This key will enable web search functionality for our agent. For added security, store all API keys in a .env file. ```json OPENAI_API_KEY = QDRANT_KEY = QDRANT_URL = BRAVE_API_KEY = ``` * * * Then load the environment variables: ```python load_dotenv() qdrant_key = os.getenv("QDRANT_KEY") qdrant_url = os.getenv("QDRANT_URL") brave_key = os.getenv("BRAVE_API_KEY") ``` * * * ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#document-processing) Document Processing Before we can create our agent, we need to process and store the documentation. We’ll be working with two datasets from Hugging Face: their general documentation and Transformers-specific documentation. Here’s our document preprocessing function: ```python def preprocess_dataset(docs_list): text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder( chunk_size=700, chunk_overlap=50, disallowed_special=() ) doc_splits = text_splitter.split_documents(docs_list) return doc_splits ``` * * * This function processes our documents by splitting them into manageable chunks, ensuring important context is preserved at the chunk boundaries through overlap. We’ll use the HuggingFaceDatasetLoader to load the datasets into Hugging Face documents. ```python hugging_face_doc = HuggingFaceDatasetLoader("m-ric/huggingface_doc","text") transformers_doc = HuggingFaceDatasetLoader("m-ric/transformers_documentation_en","text") ``` * * * In this demo, we are selecting the first 50 documents from the dataset and passing them to the processing function. ```python hf_splits = preprocess_dataset(hugging_face_doc.load()[:number_of_docs]) transformer_splits = preprocess_dataset(transformers_doc.load()[:number_of_docs]) ``` * * * Our splits are ready. Let’s create a collection in Qdrant to store them. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#defining-the-state) Defining the State In LangGraph, a **state** refers to the data or information stored and maintained at a specific point during the execution of a process or a series of operations. States capture the intermediate or final results that the system needs to keep track of to manage and control the flow of tasks, LangGraph works with a state-based system. We define our state like this: ```python class State(TypedDict): messages: Annotated[list, add_messages] ``` * * * Let’s build our tools. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#building-the-tools) Building the Tools Our agent is equipped with three powerful tools: 1. **Hugging Face Documentation Retriever** 2. **Transformers Documentation Retriever** 3. **Web Search Tool** Let’s start by defining a retriever that takes documents and a collection name, then returns a retriever. 
The query is transformed into vectors using **OpenAIEmbeddings**. ```python def create_retriever(collection_name, doc_splits): vectorstore = QdrantVectorStore.from_documents( doc_splits, OpenAIEmbeddings(model="text-embedding-3-small"), url=qdrant_url, api_key=qdrant_key, collection_name=collection_name, ) return vectorstore.as_retriever() ``` * * * Both the Hugging Face documentation retriever and the Transformers documentation retriever use this same function. With this setup, it’s incredibly simple to create separate tools for each. ```python hf_retriever_tool = create_retriever_tool( hf_retriever, "retriever_hugging_face_documentation", "Search and return information about hugging face documentation, it includes the guide and Python code.", ) transformer_retriever_tool = create_retriever_tool( transformer_retriever, "retriever_transformer", "Search and return information specifically about transformers library", ) ``` * * * For web search, we create a simple yet effective tool using Brave Search: ```python @tool("web_search_tool") def search_tool(query): search = BraveSearch.from_api_key(api_key=brave_key, search_kwargs={"count": 3}) return search.run(query) ``` * * * The search\_tool function leverages the BraveSearch API to perform a search. It takes a query, retrieves the top 3 search results using the API key, and returns the results. Next, we’ll set up and integrate our tools with a language model: ```python tools = [hf_retriever_tool, transformer_retriever_tool, search_tool] tool_node = ToolNode(tools=tools) llm = ChatOpenAI(model="gpt-4o", temperature=0) llm_with_tools = llm.bind_tools(tools) ``` * * * Here, the ToolNode class handles and orchestrates our tools: ```python class ToolNode: def __init__(self, tools: list) -> None: self.tools_by_name = {tool.name: tool for tool in tools} def __call__(self, inputs: dict): if messages := inputs.get("messages", []): message = messages[-1] else: raise ValueError("No message found in input") outputs = [] for tool_call in message.tool_calls: tool_result = self.tools_by_name[tool_call["name"]].invoke( tool_call["args"] ) outputs.append( ToolMessage( content=json.dumps(tool_result), name=tool_call["name"], tool_call_id=tool_call["id"], ) ) return {"messages": outputs} ``` * * * The ToolNode class handles tool execution by initializing a list of tools and mapping tool names to their corresponding functions. It processes input dictionaries, extracts the last message, and checks for tool\_calls from LLM tool-calling capability providers such as Anthropic, OpenAI, and others. ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#routing-and-decision-making) Routing and Decision Making Our agent needs to determine when to use tools and when to end the cycle. 
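One piece referenced in the graph construction below is not spelled out in the tutorial: the `agent` node itself (similarly, `hf_retriever` and `transformer_retriever` would be created beforehand with the `create_retriever` helper shown above). A minimal sketch of the node under that assumption, reusing `State` and `llm_with_tools` from the previous steps:

```python
def agent(state: State):
    # The "agent" node hands the conversation so far to the tool-aware LLM,
    # which either answers directly or emits tool calls for the graph to dispatch.
    return {"messages": [llm_with_tools.invoke(state["messages"])]}
```

On every turn, the agent must still decide whether to hand control to the tools or to stop.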
This decision is managed by the routing function: ```python def route(state: State): if isinstance(state, list): ai_message = state[-1] elif messages := state.get("messages", []): ai_message = messages[-1] else: raise ValueError(f"No messages found in input state to tool_edge: {state}") if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0: return "tools" return END ``` * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#putting-it-all-together-the-graph) Putting It All Together: The Graph Finally, we’ll construct the graph that ties everything together: ```python graph_builder = StateGraph(State) graph_builder.add_node("agent", agent) graph_builder.add_node("tools", tool_node) graph_builder.add_conditional_edges( "agent", route, {"tools": "tools", END: END}, ) graph_builder.add_edge("tools", "agent") graph_builder.add_edge(START, "agent") ``` * * * This is what the graph looks like: ![image2](https://qdrant.tech/documentation/examples/agentic-rag-langgraph/image2.jpg) Fig. 3: Agentic RAG with LangGraph ### [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#running-the-agent) Running the Agent With everything set up, we can run our agent using a simple function: ```python def run_agent(user_input: str): for event in graph.stream({"messages": [("user", user_input)]}): for value in event.values(): print("Assistant:", value["messages"][-1].content) ``` * * * Now, you’re ready to ask questions about Hugging Face and Transformers! Our agent will intelligently combine information from the documentation with web search results when needed. For example, you can ask: ```txt In the Transformers library, are there any multilingual models? ``` The agent will dive into the Transformers documentation, extract relevant details about multilingual models, and deliver a clear, comprehensive answer. Here’s what the response might look like: ```txt Yes, the Transformers library includes several multilingual models. Here are some examples: BERT Multilingual: Models like `bert-base-multilingual-uncased` can be used just like monolingual models. XLM (Cross-lingual Language Model): Models like `xlm-mlm-ende-1024` (English-German), `xlm-mlm-enfr-1024` (English-French), and others use language embeddings to specify the language used at inference. M2M100: Models like `facebook/m2m100_418M` and `facebook/m2m100_1.2B` are used for multilingual translation. MBart: Models like `facebook/mbart-large-50-one-to-many-mmt` and `facebook/mbart-large-50-many-to-many-mmt` are used for multilingual machine translation across 50 languages. These models are designed to handle multiple languages and can be used for tasks like translation, classification, and more. ``` * * * ## [Anchor](https://qdrant.tech/documentation/agentic-rag-langgraph/\#conclusion) Conclusion We’ve successfully implemented Agentic RAG. But this is just the beginning—there’s plenty more you can explore to take your system to the next level. Agentic RAG is transforming how businesses connect data sources with AI, enabling smarter and more dynamic interactions. In this tutorial, you’ve learned how to build an Agentic RAG system that combines the power of LangGraph, Qdrant, and web search into one seamless workflow. This system doesn’t just stop at retrieving relevant information from Hugging Face and Transformers documentation. It also smartly falls back to web search when needed, ensuring no query goes unanswered. 
With Qdrant as the vector database backbone, you get fast, scalable semantic search that excels at retrieving precise information—even from massive datasets. To truly grasp the potential of this approach, why not apply these concepts to your own projects? Customize the template we’ve shared to fit your unique use case, and unlock the full potential of Agentic RAG for your business needs. The possibilities are endless. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/agentic-rag-langgraph.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/agentic-rag-langgraph.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-184-lllmstxt|> ## qdrant-1.2.x - [Articles](https://qdrant.tech/articles/) - Introducing Qdrant 1.2.x [Back to Qdrant Articles](https://qdrant.tech/articles/) --- # Introducing Qdrant 1.2.x Kacper Łukawski · May 24, 2023 ![Introducing Qdrant 1.2.x](https://qdrant.tech/articles_data/qdrant-1.2.x/preview/title.jpg) A brand-new Qdrant 1.2 release comes packed with a plethora of new features, some of which were highly requested by our users. If you want to shape the development of the Qdrant vector database, please [join our Discord community](https://qdrant.to/discord) and let us know how you use it! ## [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#new-features) New features As usual, a minor version update of Qdrant brings some interesting new features. We love to see your feedback, and we tried to include the features most requested by our community. ### [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#product-quantization) Product Quantization The primary focus of Qdrant was always performance. That’s why we built it in Rust, but we were always concerned about making vector search affordable. From the very beginning, Qdrant offered support for disk-stored collections, as storage space is way cheaper than memory. That’s also why we have introduced the [Scalar Quantization](https://qdrant.tech/articles/scalar-quantization/) mechanism recently, which makes it possible to reduce the memory requirements by up to four times. Today, we are bringing a new quantization mechanism to life. A separate article on [Product\\ Quantization](https://qdrant.tech/documentation/quantization/#product-quantization) will describe that feature in more detail. In a nutshell, you can **reduce the memory requirements by up to 64 times**! ### [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#optional-named-vectors) Optional named vectors Qdrant has been supporting multiple named vectors per point for quite a long time. Those may have utterly different dimensionality and distance functions used to calculate similarity. Having multiple embeddings per item is an essential real-world scenario. For example, you might be encoding textual and visual data using different models. Or you might be experimenting with different models but don’t want to make your payloads redundant by keeping them in separate collections. 
![Optional vectors](https://qdrant.tech/articles_data/qdrant-1.2.x/optional-vectors.png) However, up to the previous version, we requested that you provide all the vectors for each point. There have been many requests to allow nullable vectors, as sometimes you cannot generate an embedding or simply don’t want to for reasons we don’t need to know. ### [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#grouping-requests) Grouping requests Embeddings are great for capturing the semantics of the documents, but we rarely encode larger pieces of data into a single vector. Having a summary of a book may sound attractive, but in reality, we divide it into paragraphs or some different parts to have higher granularity. That pays off when we perform the semantic search, as we can return the relevant pieces only. That’s also how modern tools like Langchain process the data. The typical way is to encode some smaller parts of the document and keep the document id as a payload attribute. ![Query without grouping request](https://qdrant.tech/articles_data/qdrant-1.2.x/without-grouping-request.png) There are cases where we want to find relevant parts, but only up to a specific number of results per document (for example, only a single one). Up till now, we had to implement such a mechanism on the client side and send several calls to the Qdrant engine. But that’s no longer the case. Qdrant 1.2 provides a mechanism for [grouping requests](https://qdrant.tech/documentation/search/#grouping-api), which can handle that server-side, within a single call to the database. This mechanism is similar to the SQL `GROUP BY` clause. ![Query with grouping request](https://qdrant.tech/articles_data/qdrant-1.2.x/with-grouping-request.png) You are not limited to a single result per document, and you can select how many entries will be returned. ### [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#nested-filters) Nested filters Unlike some other vector databases, Qdrant accepts any arbitrary JSON payload, including arrays, objects, and arrays of objects. You can also [filter the search results using nested\\ keys](https://qdrant.tech/documentation/filtering/#nested-key), even though arrays (using the `[]` syntax). Before Qdrant 1.2 it was impossible to express some more complex conditions for the nested structures. For example, let’s assume we have the following payload: ```json { "country": "Japan", "cities": [\ {\ "name": "Tokyo",\ "population": 9.3,\ "area": 2194\ },\ {\ "name": "Osaka",\ "population": 2.7,\ "area": 223\ },\ {\ "name": "Kyoto",\ "population": 1.5,\ "area": 827.8\ }\ ] } ``` We want to filter out the results to include the countries with a city with over 2 million citizens and an area bigger than 500 square kilometers but no more than 1000. There is no such a city in Japan, looking at our data, but if we wrote the following filter, it would be returned: ```json { "filter": { "must": [\ {\ "key": "country.cities[].population",\ "range": {\ "gte": 2\ }\ },\ {\ "key": "country.cities[].area",\ "range": {\ "gt": 500,\ "lte": 1000\ }\ }\ ] }, "limit": 3 } ``` Japan would be returned because Tokyo and Osaka match the first criteria, while Kyoto fulfills the second. But that’s not what we wanted to achieve. That’s the motivation behind introducing a new type of nested filter. 
```json { "filter": { "must": [\ {\ "nested": {\ "key": "country.cities",\ "filter": {\ "must": [\ {\ "key": "population",\ "range": {\ "gte": 2\ }\ },\ {\ "key": "area",\ "range": {\ "gt": 500,\ "lte": 1000\ }\ }\ ]\ }\ }\ }\ ] }, "limit": 3 } ``` The syntax is consistent with all the other supported filters and enables new possibilities. In our case, it allows us to express the joined condition on a nested structure and make the results list empty but correct. ## [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#important-changes) Important changes The latest release focuses not only on the new features but also introduces some changes making Qdrant even more reliable. ### [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#recovery-mode) Recovery mode There has been an issue in memory-constrained environments, such as cloud, happening when users were pushing massive amounts of data into the service using `wait=false`. This data influx resulted in an overreaching of disk or RAM limits before the Write-Ahead Logging (WAL) was fully applied. This situation was causing Qdrant to attempt a restart and reapplication of WAL, failing recurrently due to the same memory constraints and pushing the service into a frustrating crash loop with many Out-of-Memory errors. Qdrant 1.2 enters recovery mode, if enabled, when it detects a failure on startup. That makes the service halt the loading of collection data and commence operations in a partial state. This state allows for removing collections but doesn’t support search or update functions. **Recovery mode [has to be enabled by user](https://qdrant.tech/documentation/administration/#recovery-mode).** ### [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#appendable-mmap) Appendable mmap For a long time, segments using mmap storage were `non-appendable` and could only be constructed by the optimizer. Dynamically adding vectors to the mmap file is fairly complicated and thus not implemented in Qdrant, but we did our best to implement it in the recent release. If you want to read more about segments, check out our docs on [vector storage](https://qdrant.tech/documentation/storage/#vector-storage). ## [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#security) Security There are two major changes in terms of [security](https://qdrant.tech/documentation/security/): 1. **API-key support** \- basic authentication with a static API key to prevent unwanted access. Previously API keys were only supported in [Qdrant Cloud](https://cloud.qdrant.io/). 2. **TLS support** \- to use encrypted connections and prevent sniffing/MitM attacks. ## [Anchor](https://qdrant.tech/articles/qdrant-1.2.x/\#release-notes) Release notes As usual, [our release notes](https://github.com/qdrant/qdrant/releases/tag/v1.2.0) describe all the changes introduced in the latest version. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/qdrant-1.2.x.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/qdrant-1.2.x.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-185-lllmstxt|> ## discovery-search - [Articles](https://qdrant.tech/articles/) - Discovery needs context [Back to Data Exploration](https://qdrant.tech/articles/data-exploration/) --- # Discovery needs context Luis Cossío · January 31, 2024 ![Discovery needs context](https://qdrant.tech/articles_data/discovery-search/preview/title.jpg) --- # [Anchor](https://qdrant.tech/articles/discovery-search/\#discovery-needs-context) Discovery needs context When Christopher Columbus and his crew sailed to cross the Atlantic Ocean, they were not looking for the Americas. They were looking for a new route to India because they were convinced that the Earth was round. They didn’t know anything about a new continent, but since they were going west, they stumbled upon it. They couldn’t reach their _target_, because the geography didn’t let them, but once they realized it wasn’t India, they claimed it a new “discovery” for their crown. If we consider that sailors need water to sail, then we can establish a _context_ which is positive in the water, and negative on land. Once the sailor’s search was stopped by the land, they could not go any further, and a new route was found. Let’s keep these concepts of _target_ and _context_ in mind as we explore the new functionality of Qdrant: **Discovery search**. ## [Anchor](https://qdrant.tech/articles/discovery-search/\#what-is-discovery-search) What is discovery search? In version 1.7, Qdrant [released](https://qdrant.tech/articles/qdrant-1.7.x/) this novel API that lets you constrain the space in which a search is performed, relying only on pure vectors. This is a powerful tool that lets you explore the vector space in a more controlled way. It can be used to find points that are not necessarily closest to the target, but are still relevant to the search. You can already select which points are available to the search by using payload filters. This by itself is very versatile because it allows us to craft complex filters that show only the points that satisfy their criteria deterministically. However, the payload associated with each point is arbitrary and cannot tell us anything about their position in the vector space. In other words, filtering out irrelevant points can be seen as creating a _mask_ rather than a hyperplane –cutting in between the positive and negative vectors– in the space. ## [Anchor](https://qdrant.tech/articles/discovery-search/\#understanding-context) Understanding context This is where a **vector _context_** can help. We define _context_ as a list of pairs. Each pair is made up of a positive and a negative vector. With a context, we can define hyperplanes within the vector space, which always prefer the positive over the negative vectors. This effectively partitions the space where the search is performed. After the space is partitioned, we then need a _target_ to return the points that are more similar to it. ![Discovery search visualization](https://qdrant.tech/articles_data/discovery-search/discovery-search.png) While positive and negative vectors might suggest the use of the [recommendation interface](https://qdrant.tech/documentation/concepts/explore/#recommendation-api), in the case of _context_ they require to be paired up in a positive-negative fashion. 
This is inspired from the machine-learning concept of [_triplet loss_](https://en.wikipedia.org/wiki/Triplet_loss), where you have three vectors: an anchor, a positive, and a negative. Triplet loss is an evaluation of how much the anchor is closer to the positive than to the negative vector, so that learning happens by “moving” the positive and negative points to try to get a better evaluation. However, during discovery, we consider the positive and negative vectors as static points, and we search through the whole dataset for the “anchors”, or result candidates, which fit this characteristic better. ![Triplet loss](https://qdrant.tech/articles_data/discovery-search/triplet-loss.png) [**Discovery search**](https://qdrant.tech/articles/discovery-search/#discovery-search), then, is made up of two main inputs: - **target**: the main point of interest - **context**: the pairs of positive and negative points we just defined. However, it is not the only way to use it. Alternatively, you can **only** provide a context, which invokes a [**Context Search**](https://qdrant.tech/articles/discovery-search/#context-search). This is useful when you want to explore the space defined by the context, but don’t have a specific target in mind. But hold your horses, we’ll get to that [later ↪](https://qdrant.tech/articles/discovery-search/#context-search). ## [Anchor](https://qdrant.tech/articles/discovery-search/\#real-world-discovery-search-applications) Real-world discovery search applications Let’s talk about the first case: context with a target. To understand why this is useful, let’s take a look at a real-world example: using a multimodal encoder like [CLIP](https://openai.com/blog/clip/) to search for images, from text **and** images. CLIP is a neural network that can embed both images and text into the same vector space. This means that you can search for images using either a text query or an image query. For this example, we’ll reuse our [food recommendations demo](https://food-discovery.qdrant.tech/) by typing “burger” in the text input: ![Burger text input in food demo](https://qdrant.tech/articles_data/discovery-search/search-for-burger.png) This is basically nearest neighbor search, and while technically we have only images of burgers, one of them is a logo representation of a burger. We’re looking for actual burgers, though. Let’s try to exclude images like that by adding it as a negative example: ![Try to exclude burger drawing](https://qdrant.tech/articles_data/discovery-search/try-to-exclude-non-burger.png) Wait a second, what has just happened? These pictures have **nothing** to do with burgers, and still, they appear on the first results. Is the demo broken? Turns out, multimodal encoders [might not work how you expect them to](https://modalitygap.readthedocs.io/en/latest/). Images and text are embedded in the same space, but they are not necessarily close to each other. This means that we can create a mental model of the distribution as two separate planes, one for images and one for text. ![Mental model of CLIP embeddings](https://qdrant.tech/articles_data/discovery-search/clip-mental-model.png) This is where discovery excels because it allows us to constrain the space considering the same mode (images) while using a target from the other mode (text). 
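As a rough sketch of how this could look with the Python client (the collection name, point IDs, and query vector below are made up for the example, and the exact interface may differ between client versions), the target comes from the text query while the context pairs reference image points we have already judged:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hypothetical food collection: the target would be a CLIP embedding of the text "burger",
# while the context pair prefers a real burger photo (positive) over the drawn logo (negative).
results = client.discover(
    collection_name="food",
    target=[0.2, 0.1, 0.9, 0.7],  # in practice, the 512-dimensional CLIP text embedding
    context=[
        models.ContextExamplePair(positive=101, negative=42),  # ids of existing image points
    ],
    limit=10,
)

# Passing only `context` (and no `target`) performs the context search described below.
```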
![Cross-modal search with discovery](https://qdrant.tech/articles_data/discovery-search/clip-discovery.png) Discovery search also lets us keep giving feedback to the search engine in the shape of more context pairs, so we can keep refining our search until we find what we are looking for. Another intuitive example: imagine you’re looking for a fish pizza, but pizza names can be confusing, so you can just type “pizza”, and prefer a fish over meat. Discovery search will let you use these inputs to suggest a fish pizza… even if it’s not called fish pizza! ![Simple discovery example](https://qdrant.tech/articles_data/discovery-search/discovery-example-with-images.png) ## [Anchor](https://qdrant.tech/articles/discovery-search/\#context-search) Context search Now, the second case: only providing context. Ever been caught in the same recommendations on your favorite music streaming service? This may be caused by getting stuck in a similarity bubble. As user input gets more complex, diversity becomes scarce, and it becomes harder to force the system to recommend something different. ![Context vs recommendation search](https://qdrant.tech/articles_data/discovery-search/context-vs-recommendation.png) **Context search** solves this by de-focusing the search around a single point. Instead, it selects points randomly from within a zone in the vector space. This search is the most influenced by _triplet loss_, as the score can be thought of as _“how much a point is closer to a negative than a positive vector?”_. If it is closer to the positive one, then its score will be zero, same as any other point within the same zone. But if it is on the negative side, it will be assigned a more and more negative score the further it gets. ![Context search visualization](https://qdrant.tech/articles_data/discovery-search/context-search.png) Creating complex tastes in a high-dimensional space becomes easier since you can just add more context pairs to the search. This way, you should be able to constrain the space enough so you select points from a per-search “category” created just from the context in the input. ![A more complex context search](https://qdrant.tech/articles_data/discovery-search/complex-context-search.png) This way you can give refreshing recommendations, while still being in control by providing positive and negative feedback, or even by trying out different permutations of pairs. ## [Anchor](https://qdrant.tech/articles/discovery-search/\#key-takeaways) Key takeaways: - Discovery search is a powerful tool for controlled exploration in vector spaces. Context, consisting of positive and negative vectors constrain the search space, while a target guides the search. - Real-world applications include multimodal search, diverse recommendations, and context-driven exploration. - Ready to learn more about the math behind it and how to use it? Check out the [documentation](https://qdrant.tech/documentation/concepts/explore/#discovery-api) ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/discovery-search.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. 
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/discovery-search.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-186-lllmstxt|> ## dimension-reduction-qsoc - [Articles](https://qdrant.tech/articles/) - Qdrant Summer of Code 2024 - WASM based Dimension Reduction [Back to Ecosystem](https://qdrant.tech/articles/ecosystem/) --- # Qdrant Summer of Code 2024 - WASM based Dimension Reduction Jishan Bhattacharya · August 31, 2024 ![Qdrant Summer of Code 2024 - WASM based Dimension Reduction](https://qdrant.tech/articles_data/dimension-reduction-qsoc/preview/title.jpg) ## [Anchor](https://qdrant.tech/articles/dimension-reduction-qsoc/\#introduction) Introduction Hello, everyone! I’m Jishan Bhattacharya, and I had the incredible opportunity to intern at Qdrant this summer as part of the Qdrant Summer of Code 2024. Under the mentorship of [Andrey Vasnetsov](https://www.linkedin.com/in/andrey-vasnetsov-75268897/), I dived into the world of performance optimization, focusing on enhancing vector visualization using WebAssembly (WASM). In this article, I’ll share the insights, challenges, and accomplishments from my journey — one filled with learning, experimentation, and plenty of coding adventures. ## [Anchor](https://qdrant.tech/articles/dimension-reduction-qsoc/\#project-overview) Project Overview Qdrant is a robust vector database and search engine designed to store vector data and perform tasks like similarity search and clustering. One of its standout features is the ability to visualize high-dimensional vectors in a 2D space. However, the existing implementation faced performance bottlenecks, especially when scaling to large datasets. My mission was to tackle this challenge by leveraging a WASM-based solution for dimensionality reduction in the visualization process. ## [Anchor](https://qdrant.tech/articles/dimension-reduction-qsoc/\#learnings--challenges) Learnings & Challenges Our weapon of choice was Rust, paired with WASM, and we employed the t-SNE algorithm for dimensionality reduction. For those unfamiliar, t-SNE (t-Distributed Stochastic Neighbor Embedding) is a technique that helps visualize high-dimensional data by projecting it into two or three dimensions. It operates in two main steps: 1. **Computing Pairwise Similarity:** This step involves calculating the similarity between each pair of data points in the original high-dimensional space. 2. **Iterative Optimization:** The second step is iterative, where the embedding is refined using gradient descent. Here, the similarity matrix from the first step plays a crucial role. At the outset, Andrey tasked me with rewriting the existing JavaScript implementation of t-SNE in Rust, introducing multi-threading along the way. Setting up WASM with Vite for multi-threaded execution was no small feat, but the effort paid off. The resulting Rust implementation outperformed the single-threaded JavaScript version, although it still struggled with large datasets. Next came the challenge of optimizing the algorithm further. A key aspect of t-SNE’s first step is finding the nearest neighbors for each data point, which requires an efficient data structure. I opted for a [Vantage Point Tree](https://en.wikipedia.org/wiki/Vantage-point_tree) (also known as a Ball Tree) to speed up this process. As for the second step, while it is inherently sequential, there was still room for improvement. 
I incorporated Barnes-Hut approximation to accelerate the gradient calculation. This method approximates the forces between points in low dimensional space, making the process more efficient. To illustrate, imagine dividing a 2D space into quadrants, each containing multiple points. Every quadrant is again subdivided into four quadrants. This is done until every point belongs to a single cell. ![Calculating the resultant force on red point using Barnes-Hut approximation](https://qdrant.tech/articles_data/dimension-reduction-qsoc/barnes_hut.png) Barnes-Hut Approximation We then calculate the center of mass for each cell represented by a blue circle as shown in the figure. Now let’s say we want to find all the forces, represented by dotted lines, on the red point. Barnes Hut’s approximation states that for points that are sufficiently distant, instead of computing the force for each individual point, we use the center of mass as a proxy, significantly reducing the computational load. This is represented by the blue dotted line in the figure. These optimizations made a remarkable difference — Barnes-Hut t-SNE was eight times faster than the exact t-SNE for 10,000 vectors. ![Image of visualizing 10,000 vectors using exact t-SNE which took 884.728s](https://qdrant.tech/articles_data/dimension-reduction-qsoc/rust_rewrite.jpg) Exact t-SNE - Total time: 884.728s ![Image of visualizing 10,000 vectors using Barnes-Hut t-SNE which took 110.728s](https://qdrant.tech/articles_data/dimension-reduction-qsoc/rust_bhtsne.jpg) Barnes-Hut t-SNE - Total time: 104.191s Despite these improvements, the first step of the algorithm was still a bottleneck, leading to noticeable delays and blank screens. I experimented with approximate nearest neighbor algorithms, but the performance gains were minimal. After consulting with my mentor, we decided to compute the nearest neighbors on the server side, passing the distance matrix directly to the visualization process instead of the raw vectors. While waiting for the distance-matrix API to be ready, I explored further optimizations. I observed that the worker thread sent results to the main thread for rendering at specific intervals, causing unnecessary delays due to serialization and deserialization. ![Image showing serialization and deserialization overhead due to message passing between threads](https://qdrant.tech/articles_data/dimension-reduction-qsoc/channels.png) Serialization and Deserialization Overhead To address this, I implemented a `SharedArrayBuffer`, allowing the main thread to access changes made by the worker thread instantly. This change led to noticeable improvements. Additionally, the previous architecture resulted in choppy animations due to the fixed intervals at which the worker thread sent results. ![Image showing the previous architecture of the frontend with fixed intervals for sending results](https://qdrant.tech/articles_data/dimension-reduction-qsoc/prev_arch.png) Previous architecture with fixed intervals I introduced a “rendering-on-demand” approach, where the main thread would signal the worker thread when it was ready to render the next result. This created smoother, more responsive animations. 
![Image showing the current architecture of the frontend with rendering-on-demand approach](https://qdrant.tech/articles_data/dimension-reduction-qsoc/curr_arch.png) Current architecture with rendering-on-demand

With these optimizations in place, the final step was wrapping up the project by creating a Node.js [package](https://www.npmjs.com/package/wasm-dist-bhtsne). This package exposed the necessary interfaces to accept the distance matrix, perform calculations, and return the results, making the solution easy to integrate into various projects.

## [Anchor](https://qdrant.tech/articles/dimension-reduction-qsoc/\#areas-for-improvement) Areas for Improvement

While reflecting on this transformative journey, there are still areas that offer room for improvement and future enhancements:

1. **Payload Parsing:** When requesting a large number of vectors, parsing the payload on the main thread can make the user interface unresponsive. Implementing a faster parser could mitigate this issue.
2. **Direct Data Requests:** Allowing the worker thread to request data directly could eliminate the initial transfer of data from the main thread, speeding up the overall process.
3. **Chart Library Optimization:** Profiling revealed that nearly 80% of the time was spent on the Chart.js update function. Switching to a WebGL-accelerated chart library could dramatically improve performance, especially for large datasets.

![Image showing profiling results with 80% time spent on Chart.js update function](https://qdrant.tech/articles_data/dimension-reduction-qsoc/profiling.png) Profiling Result

## [Anchor](https://qdrant.tech/articles/dimension-reduction-qsoc/\#conclusion) Conclusion

Participating in the Qdrant Summer of Code 2024 was a deeply rewarding experience. I had the chance to push the boundaries of my coding skills while exploring new technologies like Rust and WebAssembly. I'm incredibly grateful for the guidance and support from my mentor and the entire Qdrant team, who made this journey both educational and enjoyable.

This experience has not only honed my technical skills but also ignited a deeper passion for optimizing performance in real-world applications. I'm excited to apply the knowledge and skills I've gained to future projects and to see how Qdrant's enhanced vector visualization feature will benefit users worldwide.

Thank you for joining me on this coding adventure. I hope you found something valuable in my journey, and I look forward to sharing more exciting projects with you in the future. Happy coding!

##### Was this page useful?

![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No

Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/dimension-reduction-qsoc.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue.
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/articles/dimension-reduction-qsoc.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-187-lllmstxt|>

## networking-logging-monitoring

- [Documentation](https://qdrant.tech/documentation/)
- [Hybrid cloud](https://qdrant.tech/documentation/hybrid-cloud/)
- Networking, Logging & Monitoring

---

# [Anchor](https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/\#configuring-networking-logging--monitoring-in-qdrant-hybrid-cloud) Configuring Networking, Logging & Monitoring in Qdrant Hybrid Cloud

## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/\#configure-network-policies) Configure network policies

For security reasons, each database cluster is secured with network policies. By default, database pods only allow egress traffic between each other and allow ingress traffic to ports 6333 (REST) and 6334 (gRPC) from within the Kubernetes cluster.

You can modify the default network policies in the Hybrid Cloud environment configuration:

```yaml
qdrant:
  networkPolicies:
    ingress:
      - from:
          - ipBlock:
              cidr: 192.168.0.0/22
          - podSelector:
              matchLabels:
                app: client-app
            namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: client-namespace
          - podSelector:
              matchLabels:
                app: traefik
            namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: kube-system
        ports:
          - port: 6333
            protocol: TCP
          - port: 6334
            protocol: TCP
```

## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/\#logging) Logging

You can access the logs with kubectl or the Kubernetes log management tool of your choice. For example:

```bash
kubectl -n qdrant-namespace logs -l app=qdrant,cluster-id=9a9f48c7-bb90-4fb2-816f-418a46a74b24
```

**Configuring log levels:** You can configure log levels for the databases individually in the configuration section of the Qdrant Cluster detail page. The log level for the **Qdrant Cloud Agent** and **Operator** can be set in the [Hybrid Cloud Environment configuration](https://qdrant.tech/documentation/hybrid-cloud/operator-configuration/).

### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/\#integrating-with-a-log-management-system) Integrating with a log management system

You can integrate the logs into any log management system that supports Kubernetes. No Qdrant-specific configuration is necessary. Just configure the agents of your system to collect the logs from all Pods in the Qdrant namespace.

## [Anchor](https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/\#monitoring) Monitoring

The Qdrant Cloud console gives you access to basic metrics about CPU, memory and disk usage of your Qdrant clusters.
If you want to integrate the Qdrant metrics into your own monitoring system, you can instruct it to scrape the following endpoints that provide metrics in a Prometheus/OpenTelemetry compatible format:

- `/metrics` on port 6333 of every Qdrant database Pod, which provides metrics about the database itself and its internals
- `/metrics` on port 9290 of the Qdrant Operator Pod, which provides metrics about the Operator, as well as the status of Qdrant Clusters and Snapshots
- `/metrics` on port 9090 of the Qdrant Cloud Agent Pod, which provides metrics about the Agent and its connection to the Qdrant Cloud control plane
- `/metrics` on port 8080 of the [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) Pod, which provides metrics about the state of Kubernetes resources like Pods and PersistentVolumes within the Qdrant Hybrid Cloud namespace (useful if you are not already running kube-state-metrics cluster-wide)

### [Anchor](https://qdrant.tech/documentation/hybrid-cloud/networking-logging-monitoring/\#grafana-dashboard) Grafana dashboard

If you scrape the above metrics into your own monitoring system and you are using Grafana, you can use our [Grafana dashboard](https://github.com/qdrant/qdrant-cloud-grafana-dashboard) to visualize these metrics.

![Grafana dashboard](https://qdrant.tech/documentation/cloud/cloud-grafana-dashboard.png)

##### Was this page useful?

![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No

Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/hybrid-cloud/networking-logging-monitoring.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue.

On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/hybrid-cloud/networking-logging-monitoring.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-188-lllmstxt|>

## optimize

- [Documentation](https://qdrant.tech/documentation/)
- [Guides](https://qdrant.tech/documentation/guides/)
- Optimize Performance

---

# [Anchor](https://qdrant.tech/documentation/guides/optimize/\#optimizing-qdrant-performance-three-scenarios) Optimizing Qdrant Performance: Three Scenarios

Different use cases require different balances between memory usage, search speed, and precision. Qdrant is designed to be flexible and customizable so you can tune it to your specific needs. This guide will walk you through three main optimization strategies:

- High Speed Search & Low Memory Usage
- High Precision & Low Memory Usage
- High Precision & High Speed Search

![qdrant resource tradeoffs](https://qdrant.tech/docs/tradeoff.png)

## [Anchor](https://qdrant.tech/documentation/guides/optimize/\#1-high-speed-search-with-low-memory-usage) 1\. High-Speed Search with Low Memory Usage

To achieve high search speed with minimal memory usage, you can store vectors on disk while minimizing the number of disk reads. Vector quantization is a technique that compresses vectors, allowing more of them to be stored in memory, thus reducing the need to read from disk.

To configure in-memory quantization, with on-disk original vectors, you need to create a collection with the following parameters:

- `on_disk`: Stores original vectors on disk.
- `quantization_config`: Compresses quantized vectors to `int8` using the `scalar` method. - `always_ram`: Keeps quantized vectors in RAM. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine", "on_disk": true }, "quantization_config": { "scalar": { "type": "int8", "always_ram": true } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE, on_disk=True), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, always_ram=True, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", on_disk: true, }, quantization_config: { scalar: { type: "int8", always_ram: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, QuantizationType, ScalarQuantizationBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .quantization_config( ScalarQuantizationBuilder::default() .r#type(QuantizationType::Int8.into()) .always_ram(true), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.QuantizationType; import io.qdrant.client.grpc.Collections.ScalarQuantization; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .setOnDisk(true) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setScalar( ScalarQuantization.newBuilder() .setType(QuantizationType.Int8) .setAlwaysRam(true) .build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine, OnDisk = true }, quantizationConfig: new QuantizationConfig { Scalar = new ScalarQuantization { Type = QuantizationType.Int8, AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, OnDisk: qdrant.PtrOf(true), }), QuantizationConfig: 
qdrant.NewQuantizationScalar(&qdrant.ScalarQuantization{ Type: qdrant.QuantizationType_Int8, AlwaysRam: qdrant.PtrOf(true), }), }) ``` ### [Anchor](https://qdrant.tech/documentation/guides/optimize/\#disable-rescoring-for-faster-search-optional) Disable Rescoring for Faster Search (optional) This is completely optional. Disabling rescoring with search `params` can further reduce the number of disk reads. Note that this might slightly decrease precision. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "params": { "quantization": { "rescore": false } }, "limit": 10 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], search_params=models.SearchParams( quantization=models.QuantizationSearchParams(rescore=False) ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], params: { quantization: { rescore: false, }, }, }); ``` ```rust use qdrant_client::qdrant::{ QuantizationSearchParamsBuilder, QueryPointsBuilder, SearchParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .params( SearchParamsBuilder::default() .quantization(QuantizationSearchParamsBuilder::default().rescore(false)), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QuantizationSearchParams; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SearchParams; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setParams( SearchParams.newBuilder() .setQuantization( QuantizationSearchParams.newBuilder().setRescore(false).build()) .build()) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, searchParams: new SearchParams { Quantization = new QuantizationSearchParams { Rescore = false } }, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Params: &qdrant.SearchParams{ Quantization: &qdrant.QuantizationSearchParams{ Rescore: qdrant.PtrOf(true), }, }, }) ``` ## [Anchor](https://qdrant.tech/documentation/guides/optimize/\#2-high-precision-with-low-memory-usage) 2\. High Precision with Low Memory Usage If you require high precision but have limited RAM, you can store both vectors and the HNSW index on disk. This setup reduces memory usage while maintaining search precision. 
To store the vectors `on_disk`, you need to configure both the vectors and the HNSW index: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine", "on_disk": true }, "hnsw_config": { "on_disk": true } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE, on_disk=True), hnsw_config=models.HnswConfigDiff(on_disk=True), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", on_disk: true, }, hnsw_config: { on_disk: true, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, HnswConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine).on_disk(true)) .hnsw_config(HnswConfigDiffBuilder::default().on_disk(true)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.HnswConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .setOnDisk(true) .build()) .build()) .setHnswConfig(HnswConfigDiff.newBuilder().setOnDisk(true).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine, OnDisk = true }, hnswConfig: new HnswConfigDiff { OnDisk = true } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, OnDisk: qdrant.PtrOf(true), }), HnswConfig: &qdrant.HnswConfigDiff{ OnDisk: qdrant.PtrOf(true), }, }) ``` ### [Anchor](https://qdrant.tech/documentation/guides/optimize/\#improving-precision) Improving Precision Increase the `ef` and `m` parameters of the HNSW index to improve precision, even with limited RAM: ```json ... "hnsw_config": { "m": 64, "ef_construct": 512, "on_disk": true } ... ``` **Note:** The speed of this setup depends on the disk’s IOPS (Input/Output Operations Per Second). You can use [fio](https://gist.github.com/superboum/aaa45d305700a7873a8ebbab1abddf2b) to measure disk IOPS. ## [Anchor](https://qdrant.tech/documentation/guides/optimize/\#3-high-precision-with-high-speed-search) 3\. 
High Precision with High-Speed Search For scenarios requiring both high speed and high precision, keep as much data in RAM as possible. Apply quantization with re-scoring for tunable accuracy. Here is how you can configure scalar quantization for a collection: httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "quantization_config": { "scalar": { "type": "int8", "always_ram": true } } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), quantization_config=models.ScalarQuantization( scalar=models.ScalarQuantizationConfig( type=models.ScalarType.INT8, always_ram=True, ), ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, quantization_config: { scalar: { type: "int8", always_ram: true, }, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, QuantizationType, ScalarQuantizationBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .quantization_config( ScalarQuantizationBuilder::default() .r#type(QuantizationType::Int8.into()) .always_ram(true), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.QuantizationConfig; import io.qdrant.client.grpc.Collections.QuantizationType; import io.qdrant.client.grpc.Collections.ScalarQuantization; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setQuantizationConfig( QuantizationConfig.newBuilder() .setScalar( ScalarQuantization.newBuilder() .setType(QuantizationType.Int8) .setAlwaysRam(true) .build()) .build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine}, quantizationConfig: new QuantizationConfig { Scalar = new ScalarQuantization { Type = QuantizationType.Int8, AlwaysRam = true } } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), 
QuantizationConfig: qdrant.NewQuantizationScalar(&qdrant.ScalarQuantization{ Type: qdrant.QuantizationType_Int8, AlwaysRam: qdrant.PtrOf(true), }), }) ``` ### [Anchor](https://qdrant.tech/documentation/guides/optimize/\#fine-tuning-search-parameters) Fine-Tuning Search Parameters You can adjust search parameters like `hnsw_ef` and `exact` to balance between speed and precision: **Key Parameters:** - `hnsw_ef`: Number of neighbors to visit during search (higher value = better accuracy, slower speed). - `exact`: Set to `true` for exact search, which is slower but more accurate. You can use it to compare results of the search with different `hnsw_ef` values versus the ground truth. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": [0.2, 0.1, 0.9, 0.7], "params": { "hnsw_ef": 128, "exact": false }, "limit": 3 } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=[0.2, 0.1, 0.9, 0.7], search_params=models.SearchParams(hnsw_ef=128, exact=False), limit=3, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: [0.2, 0.1, 0.9, 0.7], params: { hnsw_ef: 128, exact: false, }, limit: 3, }); ``` ```rust use qdrant_client::qdrant::{QueryPointsBuilder, SearchParamsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query(vec![0.2, 0.1, 0.9, 0.7]) .limit(3) .params(SearchParamsBuilder::default().hnsw_ef(128).exact(false)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.SearchParams; import static io.qdrant.client.QueryFactory.nearest; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(nearest(0.2f, 0.1f, 0.9f, 0.7f)) .setParams(SearchParams.newBuilder().setHnswEf(128).setExact(false).build()) .setLimit(3) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new float[] { 0.2f, 0.1f, 0.9f, 0.7f }, searchParams: new SearchParams { HnswEf = 128, Exact = false }, limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQuery(0.2, 0.1, 0.9, 0.7), Params: &qdrant.SearchParams{ HnswEf: qdrant.PtrOf(uint64(128)), Exact: qdrant.PtrOf(false), }, }) ``` ## [Anchor](https://qdrant.tech/documentation/guides/optimize/\#balancing-latency-and-throughput) Balancing Latency and Throughput When optimizing search performance, latency and throughput are two main metrics to consider: - **Latency:** Time taken for a single request. - **Throughput:** Number of requests handled per second. The following optimization approaches are not mutually exclusive, but in some cases it might be preferable to optimize for one or another. 
### [Anchor](https://qdrant.tech/documentation/guides/optimize/\#minimizing-latency) Minimizing Latency To minimize latency, you can set up Qdrant to use as many cores as possible for a single request. You can do this by setting the number of segments in the collection to be equal to the number of cores in the system. In this case, each segment will be processed in parallel, and the final result will be obtained faster. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "optimizers_config": { "default_segment_number": 16 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), optimizers_config=models.OptimizersConfigDiff(default_segment_number=16), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, optimizers_config: { default_segment_number: 16, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, OptimizersConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .optimizers_config( OptimizersConfigDiffBuilder::default().default_segment_number(16), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setOptimizersConfig( OptimizersConfigDiff.newBuilder().setDefaultSegmentNumber(16).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, optimizersConfig: new OptimizersConfigDiff { DefaultSegmentNumber = 16 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), OptimizersConfig: &qdrant.OptimizersConfigDiff{ DefaultSegmentNumber: qdrant.PtrOf(uint64(16)), }, }) ``` ### [Anchor](https://qdrant.tech/documentation/guides/optimize/\#maximizing-throughput) Maximizing Throughput To maximize throughput, configure Qdrant to use as many cores as possible to process multiple 
requests in parallel. To do that, use fewer segments (usually 2) of larger size (default 200Mb per segment) to handle more requests in parallel. Large segments benefit from the size of the index and overall smaller number of vector comparisons required to find the nearest neighbors. However, they will require more time to build the HNSW index. httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "optimizers_config": { "default_segment_number": 2, "max_segment_size": 5000000 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), optimizers_config=models.OptimizersConfigDiff(default_segment_number=2, max_segment_size=5000000), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, optimizers_config: { default_segment_number: 2, max_segment_size: 5000000, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, OptimizersConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .optimizers_config( OptimizersConfigDiffBuilder::default().default_segment_number(2).max_segment_size(5000000), ), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setOptimizersConfig( OptimizersConfigDiff.newBuilder() .setDefaultSegmentNumber(2) .setMaxSegmentSize(5000000) .build() ) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, optimizersConfig: new OptimizersConfigDiff { DefaultSegmentNumber = 2, MaxSegmentSize = 5000000 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), OptimizersConfig: &qdrant.OptimizersConfigDiff{ DefaultSegmentNumber: qdrant.PtrOf(uint64(2)), MaxSegmentSize: qdrant.PtrOf(uint64(5000000)), }, }) ``` ## [Anchor](https://qdrant.tech/documentation/guides/optimize/\#summary) Summary 
By adjusting configurations like vector storage, quantization, and search parameters, you can optimize Qdrant for different use cases: - **Low Memory + High Speed:** Use vector quantization. - **High Precision + Low Memory:** Store vectors and HNSW index on disk. - **High Precision + High Speed:** Keep data in RAM, use quantization with re-scoring. - **Latency vs. Throughput:** Adjust segment numbers based on the priority. Choose the strategy that best fits your use case to get the most out of Qdrant’s performance capabilities. ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/optimize.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/guides/optimize.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-189-lllmstxt|> ## cluster-scaling - [Documentation](https://qdrant.tech/documentation/) - [Cloud](https://qdrant.tech/documentation/cloud/) - Scale Clusters --- # [Anchor](https://qdrant.tech/documentation/cloud/cluster-scaling/\#scaling-qdrant-cloud-clusters) Scaling Qdrant Cloud Clusters The amount of data is always growing and at some point you might need to upgrade or downgrade the capacity of your cluster. ![Cluster Scaling](https://qdrant.tech/documentation/cloud/cluster-scaling.png) There are different options for how it can be done. ## [Anchor](https://qdrant.tech/documentation/cloud/cluster-scaling/\#vertical-scaling) Vertical Scaling Vertical scaling is the process of increasing the capacity of a cluster by adding or removing CPU, storage and memory resources on each database node. You can start with a minimal cluster configuration of 2GB of RAM and resize it up to 64GB of RAM (or even more if desired) over the time step by step with the growing amount of data in your application. If your cluster consists of several nodes each node will need to be scaled to the same size. Please note that vertical cluster scaling will require a short downtime period to restart your cluster. In order to avoid a downtime you can make use of data replication, which can be configured on the collection level. Vertical scaling can be initiated on the cluster detail page via the button “scale”. If you want to scale your cluster down, the new, smaller memory size must be still sufficient to store all the data in the cluster. Otherwise, the database cluster could run out of memory and crash. Therefore, the new memory size must be at least as large as the current memory usage of the database cluster including a bit of buffer. Qdrant Cloud will automatically prevent you from scaling down the Qdrant database cluster with a too small memory size. Note, that it is not possible to scale down the disk space of the cluster due to technical limitations of the underlying cloud providers. ## [Anchor](https://qdrant.tech/documentation/cloud/cluster-scaling/\#horizontal-scaling) Horizontal Scaling Vertical scaling can be an effective way to improve the performance of a cluster and extend the capacity, but it has some limitations. 
The main disadvantage of vertical scaling is that there are limits to how much a cluster can be expanded. At some point, adding more resources to a cluster can become impractical or cost-prohibitive. In such cases, horizontal scaling may be a more effective solution. Horizontal scaling, also known as horizontal expansion, is the process of increasing the capacity of a cluster by adding more nodes and distributing the load and data among them. The horizontal scaling at Qdrant starts on the collection level. You have to choose the number of shards you want to distribute your collection around while creating the collection. Please refer to the [sharding documentation](https://qdrant.tech/documentation/guides/distributed_deployment/#sharding) section for details. After that, you can configure, or change the amount of Qdrant database nodes within a cluster during cluster creation, or on the cluster detail page via “Scale” button. Important: The number of shards means the maximum amount of nodes you can add to your cluster. In the beginning, all the shards can reside on one node. With the growing amount of data you can add nodes to your cluster and move shards to the dedicated nodes using the [cluster setup API](https://qdrant.tech/documentation/guides/distributed_deployment/#cluster-scaling). When scaling down horizontally, the cloud platform will automatically ensure that any shards that are present on the nodes to be deleted, are moved to the remaining nodes. We will be glad to consult you on an optimal strategy for scaling. [Let us know](https://qdrant.tech/documentation/support/) your needs and decide together on a proper solution. ## [Anchor](https://qdrant.tech/documentation/cloud/cluster-scaling/\#resharding) Resharding _Available as of Qdrant v1.13.0_ When creating a collection, it has a specific number of shards. The ideal number of shards might change as your cluster evolves. Resharding allows you to change the number of shards in your existing collections, both up and down, without having to recreate the collection from scratch. Resharding is a transparent process, meaning that the collection is still available while resharding is going on without having downtime. This allows you to scale from one node to any number of nodes and back, keeping your data perfectly distributed without compromise. To increase the number of shards (reshard up), use the [Update collection cluster setup API](https://api.qdrant.tech/master/api-reference/distributed/update-collection-cluster) to initiate the resharding process: ```http POST /collections/{collection_name}/cluster { "start_resharding": { "direction": "up", "shard_key": null } } ``` To decrease the number of shards (reshard down), you may specify the `"down"` direction. The current status of resharding is listed in the [collection cluster info](https://api.qdrant.tech/v-1-12-x/api-reference/distributed/collection-cluster-info) which can be fetched with: ```http GET /collections/{collection_name}/cluster ``` We always recommend to run an ongoing resharding operation till the end. 
But, if at any point the resharding operation needs to be aborted, you can use: ```http POST /collections/{collection_name}/cluster { "abort_resharding": {} } ``` A few things to be aware of with regards to resharding: - during resharding, performance of your cluster may be slightly reduced - during resharding, reported point counts will not be accurate - resharding may be a long running operation on huge collections - you can only run one resharding operation per collection at a time ##### Was this page useful? ![Thumb up icon](https://qdrant.tech/icons/outline/thumb-up.svg) Yes ![Thumb down icon](https://qdrant.tech/icons/outline/thumb-down.svg) No Thank you for your feedback! 🙏 We are sorry to hear that. 😔 You can [edit](https://qdrant.tech/github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/cluster-scaling.md) this page on GitHub, or [create](https://github.com/qdrant/landing_page/issues/new/choose) a GitHub issue. On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/cloud/cluster-scaling.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-190-lllmstxt|> ## large-scale-search - [Documentation](https://qdrant.tech/documentation/) - [Database tutorials](https://qdrant.tech/documentation/database-tutorials/) - Large Scale Search --- # [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#upload-and-search-large-collections-cost-efficiently) Upload and Search Large collections cost-efficiently | Time: 2 days | Level: Advanced | | | | --- | --- | --- | --- | In this tutorial, we will describe an approach to upload, index, and search a large volume of data cost-efficiently, on an example of the real-world dataset [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/). The goal of this tutorial is to demonstrate what minimal amount of resources is required to index and search a large dataset, while still maintaining a reasonable search latency and accuracy. All relevant code snippets are available in the [GitHub repository](https://github.com/qdrant/laion-400m-benchmark). The recommended Qdrant version for this tutorial is `v1.13.5` and higher. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#dataset) Dataset The dataset we will use is [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/), a collection of approximately 400 million vectors obtained from images extracted from a Common Crawl dataset. Each vector is 512-dimensional and generated using a [CLIP](https://openai.com/blog/clip/) model. Vectors are associated with a number of metadata fields, such as `url`, `caption`, `LICENSE`, etc. The overall payload size is approximately 200 GB, and the vectors are 400 GB. The dataset is available in the form of 409 chunks, each containing approximately 1M vectors. We will use the following [python script](https://github.com/qdrant/laion-400m-benchmark/blob/master/upload.py) to upload dataset chunks one by one. 
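The linked script is the authoritative implementation; as a rough illustration, its core loop boils down to something like the sketch below. The `load_laion_chunk` helper, the collection name, and the batching parameters are assumptions made for this example, not the repository’s exact code.

```python
import os

from qdrant_client import QdrantClient

client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
)

QDRANT_COLLECTION_NAME = "laion-400m"  # assumed collection name

# 409 chunks, each with roughly 1M vectors and their payloads
for chunk_id in range(409):
    # hypothetical helper: download and parse one dataset chunk
    vectors, payloads = load_laion_chunk(chunk_id)
    client.upload_collection(
        collection_name=QDRANT_COLLECTION_NAME,
        vectors=vectors,    # 512-dimensional CLIP vectors
        payload=payloads,   # url, caption, LICENSE, etc.
        batch_size=1024,    # send points in moderately sized batches
        parallel=4,         # several parallel upload workers
    )
```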
## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#hardware) Hardware After some initial experiments, we figured out a minimal hardware configuration for the task: - 8 CPU cores - 64Gb RAM - 650Gb Disk space ![Hardware configuration](https://qdrant.tech/documentation/tutorials/large-scale-search/hardware.png) Hardware configuration This configuration is enough to index and explore the dataset in a single-user mode; latency is reasonable enough to build interactive graphs and navigate in the dashboard. Naturally, you might need more CPU cores and RAM for production-grade configurations. It is important to ensure high network bandwidth for this experiment so you are running the client and server in the same region. ## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#uploading-and-indexing) Uploading and Indexing We will use the following [python script](https://github.com/qdrant/laion-400m-benchmark/blob/master/upload.py) to upload dataset chunks one by one. ```bash export QDRANT_URL="https://xxxx-xxxx.xxxx.cloud.qdrant.io" export QDRANT_API_KEY="xxxx-xxxx-xxxx-xxxx" python upload.py ``` This script will download chunks of the LAION dataset one by one and upload them to Qdrant. Intermediate data is not persisted on disk, so the script doesn’t require much disk space on the client side. Let’s take a look at the collection configuration we used: ```python client.create_collection( QDRANT_COLLECTION_NAME, vectors_config=models.VectorParams( size=512, # CLIP model output size distance=models.Distance.COSINE, # CLIP model uses cosine distance datatype=models.Datatype.FLOAT16, # We only need 16 bits for float, otherwise disk usage would be 800Gb instead of 400Gb on_disk=True # We don't need original vectors in RAM ), # Even though CLIP vectors don't work well with binary quantization, out of the box, # we can rely on query-time oversampling to get more accurate results quantization_config=models.BinaryQuantization( binary=models.BinaryQuantizationConfig( always_ram=True, ) ), optimizers_config=models.OptimizersConfigDiff( # Bigger size of segments are desired for faster search # However it might be slower for indexing max_segment_size=5_000_000, ), # Having larger M value is desirable for higher accuracy, # but in our case we care more about memory usage # We could still achieve reasonable accuracy even with M=6 + oversampling hnsw_config=models.HnswConfigDiff( m=6, # decrease M for lower memory usage on_disk=False ), ) ``` There are a few important points to note: - We use `FLOAT16` datatype for vectors, which allows us to store vectors in half the size compared to `FLOAT32`. There are no significant accuracy losses for this dataset. - We use `BinaryQuantization` with `always_ram=True` to enable query-time oversampling. This allows us to get an accurate and resource-efficient search, even though 512d CLIP vectors don’t work well with binary quantization out of the box. - We use `HnswConfig` with `m=6` to reduce memory usage. We will look deeper into memory usage in the next section. Goal of this configuration is to ensure that prefetch component of the search never needs to load data from disk, and at least a minimal version of vectors and vector index is always in RAM. The second stage of the search can explicitly determine how many times we can afford to load data from a disk. In our experiment, the upload process was going at 5000 points per second. 
The indexation process ran in parallel with the upload, at a rate of approximately 4000 points per second.

![Upload and indexation process](https://qdrant.tech/documentation/tutorials/large-scale-search/upload_process.png)

Upload and indexation process

## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#memory-usage) Memory Usage

After the upload and indexation process is finished, let’s take a detailed look at the memory usage of the Qdrant server.

![Memory usage](https://qdrant.tech/documentation/tutorials/large-scale-search/memory_usage.png)

Memory usage

On a high level, memory usage consists of 3 components:

- System memory - 8.34Gb - memory reserved for internal systems and the OS; it doesn’t depend on the dataset size.
- Data memory - 39.27Gb - the resident memory of the Qdrant process; it can’t be evicted, and the Qdrant process will crash if it exceeds the limit.
- Cache memory - 14.54Gb - the disk cache Qdrant uses. It is necessary for fast search but can be evicted if needed.

Data and Cache memory are of the most interest to us. Let’s look at what exactly is stored in these components.

In our scenario, Qdrant uses memory to store the following components:

- Vectors
- The vector index
- Information about IDs and versions of points

### [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#size-of-vectors) Size of vectors

In our scenario, we store only quantized vectors in RAM, so it is relatively easy to calculate the required size:

```text
400_000_000 * 512d / 8 bits / 1024 (Kb) / 1024 (Mb) / 1024 (Gb) = 23.84Gb
```

### [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#size-of-vector-index) Size of vector index

The vector index is a bit more complicated, as it is not a simple matrix. Internally, it is stored as a list of connections in a graph, and each connection is a 4-byte integer. The number of connections is defined by the `M` parameter of the HNSW index; in our case, it is `6` on the upper levels and `2 x M` on level 0. This gives us the following estimation:

```text
400_000_000 * (6 * 2) * 4 bytes / 1024 (Kb) / 1024 (Mb) / 1024 (Gb) = 17.881Gb
```

In practice, the size of the index is a bit smaller due to the [compression](https://qdrant.tech/blog/qdrant-1.13.x/#hnsw-graph-compression) we implemented in Qdrant v1.13.0, but it is still a good estimation.

The HNSW index in Qdrant is stored as a mmap, and it can be evicted from RAM if needed. So, the memory consumption of HNSW falls under the category of `Cache memory`.

### [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#size-of-ids-and-versions) Size of IDs and versions

Qdrant must store additional information about each point, such as its ID and version. This information is needed on each request, so it is very important to keep it in RAM for fast access.

Let’s take a look at Qdrant internals to understand how much memory is required for this information.

```rust
// This is a simplified version of the IdTracker struct.
// It omits all optimizations and small details,
// but gives a good estimation of memory usage.
IdTracker {
    // Mapping of internal id to version (u64), compressed to 4 bytes.
    // Required for versioning and conflict resolution between segments.
    internal_to_version, // 400M x 4 = 1.5Gb

    // Mapping of internal id to external id; external ids can be UUID-sized, 16 bytes per point.
    // Required to determine the original point ID after a search inside the segment.
    internal_to_external: Vec, // 400M x 16 = 6.4Gb

    // Mapping of external id to internal id. Numeric ids use 8 bytes,
    // UUIDs are stored as 16 bytes.
    // Required to determine the sequential point ID inside the segment.
    external_to_internal: Vec, // 400M x (8 + 4) = 4.5Gb
}
```

In v1.13.5 we introduced a [significant optimization](https://github.com/qdrant/qdrant/pull/6023) that reduces the memory usage of `IdTracker` by approximately 2 times. So the total memory usage of `IdTracker` in our case is approximately `12.4Gb`.

So the total expected RAM usage of the Qdrant server in our case is approximately `23.84Gb + 17.881Gb + 12.4Gb = 54.121Gb`, which is very close to the actual memory usage we observed: `39.27Gb + 14.54Gb = 53.81Gb`.

We had to apply some simplifications to the estimations, but they are good enough to understand the memory usage of the Qdrant server.

## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#search) Search

After the dataset is uploaded and indexed, we can start searching for similar vectors. We can start by exploring the dataset in the Web UI, so you can get an intuition of the search performance rather than just table numbers.

![Web-UI Bear image](https://qdrant.tech/documentation/tutorials/large-scale-search/web-ui-bear1.png)

Web-UI Bear image

![Web-UI similar Bear image](https://qdrant.tech/documentation/tutorials/large-scale-search/web-ui-bear2.png)

Web-UI similar Bear image

Web-UI default requests do not use oversampling, but the observed results are still good enough to see the resemblance between images.

### [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#ground-truth-data) Ground truth data

However, to estimate the search performance more accurately, we need to compare search results with the ground truth. Unfortunately, the LAION dataset doesn’t contain usable ground truth, so we had to generate it ourselves.

To do this, we need to perform a full-scan search for each query vector and store the results in a separate file. This process is very time-consuming and requires a lot of resources, so we had to limit the number of queries to 100. We provide a ready-to-use [ground truth file](https://github.com/qdrant/laion-400m-benchmark/blob/master/expected.py) and the [script](https://github.com/qdrant/laion-400m-benchmark/blob/master/full_scan.py) to generate it (this requires a machine with 512Gb of RAM and about 20 hours of execution time).

Our ground truth file contains 100 queries, each with 50 results. The first 100 vectors of the dataset itself were used to generate the queries.
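One way to obtain such ground truth with Qdrant itself is an exact (non-ANN) search; the linked full_scan.py is the authoritative implementation and may do this differently (for example, scoring vectors directly in memory). A minimal conceptual sketch, where `query_vector` and the collection name are illustrative, might look like this:

```python
from qdrant_client import models

# Force exact search (no HNSW, no quantization shortcuts) to obtain the true top-50.
response = client.query_points(
    collection_name=QDRANT_COLLECTION_NAME,
    query=query_vector,  # one of the first 100 dataset vectors
    limit=50,
    search_params=models.SearchParams(exact=True),
)
expected_ids = [point.id for point in response.points]
```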
### [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#search-query) Search Query To precisely control the amount of oversampling, we will use the following search query: ```python limit = 50 rescore_limit = 1000 # oversampling factor is 20 query = vectors[query_id] # One of existing vectors response = client.query_points( collection_name=QDRANT_COLLECTION_NAME, query=query, limit=limit, # Go to disk search_params=models.SearchParams( quantization=models.QuantizationSearchParams( rescore=True, ), ), # Prefetch is performed using only in-RAM data, # so querying even large amount of data is fast prefetch=models.Prefetch( query=query, limit=rescore_limit, params=models.SearchParams( quantization=models.QuantizationSearchParams( # Avoid rescoring in prefetch # We should do it explicitly on the second stage rescore=False, ), ) ) ) ``` As you can see, this query contains two stages: - First stage is a prefetch, which is performed using only in-RAM data. It is very fast and allows us to get a large amount of candidates. - The second stage is a rescore, which is performed with full-size vectors stored on disks. By using 2-stage search we can precisely control the amount of data loaded from disk and ensure the balance between search speed and accuracy. You can find the complete code of the search process in the [eval.py](https://github.com/qdrant/laion-400m-benchmark/blob/master/eval.py) ## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#performance-tweak) Performance tweak One important performance tweak we found useful for this dataset is to enable [Async IO](https://qdrant.tech/articles/io_uring) in Qdrant. By default, Qdrant uses synchronous IO, which is good for in-memory datasets but can be a bottleneck when we want to read a lot of data from a disk. Async IO (implemented with `io_uring`) allows to send parallel requests to the disk and saturate the disk bandwidth. This is exactly what we are looking for when performing large-scale re-scoring with original vectors. Instead of reading vectors one by one and waiting for the disk response 1000 times, we can send 1000 requests to the disk and wait for all of them to complete. This allows us to saturate the disk bandwidth and get faster results. To enable Async IO in Qdrant, you need to set the following environment variable: ```bash QDRANT__STORAGE__PERFORMANCE__ASYNC_SCORER=true ``` Or set parameter in config file: ```yaml storage: performance: async_scorer: true ``` In Qdrant Managed cloud Async IO can be enabled via `Advanced optimizations` section in cluster `Configuration` tab. ![Async IO configuration in Cloud](https://qdrant.tech/documentation/tutorials/large-scale-search/async_io.png) Async IO configuration in Cloud ## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#running-search-requests) Running search requests Once all the preparations are done, we can run the search requests and evaluate the results. You can find the full code of the search process in the [eval.py](https://github.com/qdrant/laion-400m-benchmark/blob/master/eval.py) This script will run 100 search requests with configured oversampling factor and compare the results with the ground truth. 
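Internally, the comparison reduces to computing precision@50 per query. Building on the `response` object from the query above, and assuming `expected` maps each query index to its ground-truth ID list (as in expected.py), a minimal sketch might look like this:

```python
def precision_at_k(found_points, expected_ids, k=50):
    """Fraction of the k ground-truth IDs that were actually retrieved."""
    found = {point.id for point in found_points}
    return len(found & set(expected_ids[:k])) / k

# `expected` is an assumed mapping from query index to ground-truth ID list.
print(precision_at_k(response.points, expected[query_id]))
```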
To run the evaluation:

```bash
python eval.py --rescore_limit 1000
```

In our runs, we achieved the following results:

| Rescore Limit | Precision@50 | Time per request |
| --- | --- | --- |
| 1000 | 75.2% | 0.7s |
| 5000 | 81.0% | 2.2s |

Additional experiments with `m=16` demonstrated that we can achieve `85%` precision with `rescore_limit=1000`, but this would require slightly more memory.

![Log of search evaluation](https://qdrant.tech/documentation/tutorials/large-scale-search/precision.png)

Log of search evaluation

## [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#conclusion) Conclusion

In this tutorial, we demonstrated how to upload, index, and search a large dataset in Qdrant cost-efficiently. Binary quantization can be applied even to 512d vectors if combined with query-time oversampling. Qdrant lets you precisely control where each part of the storage is located, which makes it possible to achieve a good balance between search speed and memory usage.

### [Anchor](https://qdrant.tech/documentation/database-tutorials/large-scale-search/\#potential-improvements) Potential improvements

In this experiment, we investigated in detail which parts of the storage are responsible for memory usage and how to control them. One especially interesting part is the `VectorIndex` component, which is responsible for storing the graph of connections between vectors. In our further research, we will investigate the possibility of making HNSW more disk-friendly so it can be offloaded to disk without significant performance losses.

<|page-191-lllmstxt|>

## modern-sparse-neural-retrieval

- [Articles](https://qdrant.tech/articles/) - Modern Sparse Neural Retrieval: From Theory to Practice

[Back to Machine Learning](https://qdrant.tech/articles/machine-learning/)

---

# Modern Sparse Neural Retrieval: From Theory to Practice

Evgeniya Sukhodolskaya · October 23, 2024

![Modern Sparse Neural Retrieval: From Theory to Practice](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/preview/title.jpg)

Finding enough time to study all the modern solutions while keeping your production running is rarely feasible. Dense retrievers, hybrid retrievers, late interaction… How do they work, and where do they fit best? If only we could compare retrievers as easily as products on Amazon!

We explored the most popular modern sparse neural retrieval models and broke them down for you. By the end of this article, you’ll have a clear understanding of the current landscape in sparse neural retrieval and how to navigate complex, math-heavy research papers with sky-high NDCG scores without getting overwhelmed.
[The first part](https://qdrant.tech/articles/modern-sparse-neural-retrieval/#sparse-neural-retrieval-evolution) of this article is theoretical, comparing different approaches used in modern sparse neural retrieval. [The second part](https://qdrant.tech/articles/modern-sparse-neural-retrieval/#splade-in-qdrant) is more practical, showing how the best model in modern sparse neural retrieval, `SPLADE++`, can be used in Qdrant and recommendations on when to choose sparse neural retrieval for your solutions. ## [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#sparse-neural-retrieval-as-if-keyword-based-retrievers-understood-meaning) Sparse Neural Retrieval: As If Keyword-Based Retrievers Understood Meaning **Keyword-based (lexical) retrievers** like BM25 provide a good explainability. If a document matches a query, it’s easy to understand why: query terms are present in the document, and if these are rare terms, they are more important for retrieval. ![Keyword-based (Lexical) Retrieval](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/LexicalRetrievers.png) With their mechanism of exact term matching, they are super fast at retrieval. A simple **inverted index**, which maps back from a term to a list of documents where this term occurs, saves time on checking millions of documents. ![Inverted Index](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/InvertedIndex.png) Lexical retrievers are still a strong baseline in retrieval tasks. However, by design, they’re unable to bridge **vocabulary** and **semantic mismatch** gaps. Imagine searching for a “ _tasty cheese_” in an online store and not having a chance to get “ _Gouda_” or “ _Brie_” in your shopping basket. **Dense retrievers**, based on machine learning models which encode documents and queries in dense vector representations, are capable of breaching this gap and finding you “ _a piece of Gouda_”. ![Dense Retrieval](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/DenseRetrievers.png) However, explainability here suffers: why is this query representation close to this document representation? Why, searching for “ _cheese_”, we’re also offered “ _mouse traps_”? What does each number in this vector representation mean? Which one of them is capturing the cheesiness? Without a solid understanding, balancing result quality and resource consumption becomes challenging. Since, hypothetically, any document could match a query, relying on an inverted index with exact matching isn’t feasible. This doesn’t mean dense retrievers are inherently slower. However, lexical retrieval has been around long enough to inspire several effective architectural choices, which are often worth reusing. Sooner or later, there should have been somebody who would say, “ _Wait, but what if I want something timeproof like BM25 but with semantic understanding?_” ## [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#sparse-neural-retrieval-evolution) Sparse Neural Retrieval Evolution Imagine searching for a “ _flabbergasting murder_” story. ” _Flabbergasting_” is a rarely used word, so a keyword-based retriever, for example, BM25, will assign huge importance to it. Consequently, there is a high chance that a text unrelated to any crimes but mentioning something “ _flabbergasting_” will pop up in the top results. What if we could instead of relying on term frequency in a document as a proxy of term’s importance as it happens in BM25, directly predict a term’s importance? 
The goal is for rare but non-impactful terms to be assigned a much smaller weight than important terms with the same frequency, whereas both would be treated equally in the BM25 scenario.

How can we determine if one term is more important than another? A word’s impact is related to its meaning, and its meaning can be derived from its context (the words which surround it). That’s how dense contextual embedding models come into the picture.

All sparse neural retrievers are based on the idea of taking a model which produces contextual dense vector representations for terms and teaching it to produce sparse ones. Very often, [Bidirectional Encoder Representations from Transformers (BERT)](https://huggingface.co/docs/transformers/en/model_doc/bert) is used as a base model, and a very simple trainable neural network is added on top of it to sparsify the representations. Training this small neural network is usually done by sampling a query, together with documents relevant and irrelevant to it, from the [MS MARCO](https://microsoft.github.io/msmarco/) dataset and shifting the parameters of the neural network in the direction of relevancy.

### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#the-pioneer-of-sparse-neural-retrieval) The Pioneer Of Sparse Neural Retrieval

![Deep Contextualized Term Weighting (DeepCT)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/DeepCT.png)

The authors of one of the first sparse retrievers, the [`Deep Contextualized Term Weighting framework (DeepCT)`](https://arxiv.org/pdf/1910.10687), predict an integer impact value separately for each unique word in a document and a query. They use a linear regression model on top of the contextual representations produced by the basic BERT model; the model’s output is rounded.

When documents are uploaded into a database, the importance of words in a document is predicted by the trained linear regression model and stored in the inverted index in the same way as term frequencies in BM25 retrievers. The retrieval process is then identical to the BM25 one.

_**Why is DeepCT not a perfect solution?**_

To train linear regression, the authors needed to provide the true value ( **ground truth**) of each word’s importance so the model could “see” what the right answer should be. This score is hard to define in a way that truly expresses query-document relevancy. Which score should the word most relevant to a query get when this word is taken from a five-page document? The second most relevant? The third?

### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#sparse-neural-retrieval-on-relevance-objective) Sparse Neural Retrieval on Relevance Objective

![DeepImpact](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/DeepImpact.png)

It’s much easier to define whether a document as a whole is relevant or irrelevant to a query. That’s why the authors of the [`DeepImpact`](https://arxiv.org/pdf/2104.12016) sparse neural retriever directly used the relevancy between a query and a document as the training objective. They take BERT’s contextualized embeddings of the document’s words, transform them through a simple 2-layer neural network into a single scalar score, and sum these scores up for each word overlapping with the query. The training objective is to make this score reflect the relevance between the query and the document.

_**Why is DeepImpact not a perfect solution?**_

When converting texts into dense vector representations, the BERT model does not work on a word level.
Sometimes, it breaks the words into parts. For example, the word “ _vector_” will be processed by BERT as one piece, but for some words that, for example, BERT hasn’t seen before, it is going to cut the word in pieces [as “Qdrant” turns to “Q”, “#dra” and “#nt”](https://huggingface.co/spaces/Xenova/the-tokenizer-playground) The DeepImpact model (like the DeepCT model) takes the first piece BERT produces for a word and discards the rest. However, what can one find searching for “ _Q_” instead of “ _Qdrant_”? ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#know-thine-tokenization) Know Thine Tokenization ![Term Independent Likelihood MoDEl v2 (TILDE v2)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/TILDEv2.png) To solve the problems of DeepImpact’s architecture, the [`Term Independent Likelihood MoDEl (TILDEv2)`](https://arxiv.org/pdf/2108.08513) model generates sparse encodings on a level of BERT’s representations, not on words level. Aside from that, its authors use the identical architecture to the DeepImpact model. _**Why is TILDEv2 not a perfect solution?**_ A single scalar importance score value might not be enough to capture all distinct meanings of a word. **Homonyms** (pizza, cocktail, flower, and female name “ _Margherita_”) are one of the troublemakers in information retrieval. ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#sparse-neural-retriever-which-understood-homonyms) Sparse Neural Retriever Which Understood Homonyms ![COntextualized Inverted List (COIL)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/COIL.png) If one value for the term importance score is insufficient, we could describe the term’s importance in a vector form! Authors of the [`COntextualized Inverted List (COIL)`](https://arxiv.org/pdf/2104.07186) model based their work on this idea. Instead of squeezing 768-dimensional BERT’s contextualised embeddings into one value, they down-project them (through the similar “relevance” training objective) to 32 dimensions. Moreover, not to miss a detail, they also encode the query terms as vectors. For each vector representing a query token, COIL finds the closest match (using the maximum dot product) vector of the same token in a document. So, for example, if we are searching for “ _Revolut bank _” and a document in a database has the sentence “ _Vivid bank was moved to the bank of Amstel _”, out of two “banks”, the first one will have a bigger value of a dot product with a “ _bank_” in the query, and it will count towards the final score. The final relevancy score of a document is a sum of scores of query terms matched. _**Why is COIL not a perfect solution?**_ This way of defining the importance score captures deeper semantics; more meaning comes with more values used to describe it. However, storing 32-dimensional vectors for every term is far more expensive, and an inverted index does not work as-is with this architecture. ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#back-to-the-roots) Back to the Roots ![Universal COntextualized Inverted List (UniCOIL)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/UNICOIL.png)[`Universal COntextualized Inverted List (UniCOIL)`](https://arxiv.org/pdf/2106.14807), made by the authors of COIL as a follow-up, goes back to producing a scalar value as the importance score rather than a vector, leaving unchanged all other COIL design decisions. 
It optimizes resources consumption but the deep semantics understanding tied to COIL architecture is again lost. ## [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#did-we-solve-the-vocabulary-mismatch-yet) Did we Solve the Vocabulary Mismatch Yet? With the retrieval based on the exact matching, however sophisticated the methods to predict term importance are, we can’t match relevant documents which have no query terms in them. If you’re searching for “ _pizza_” in a book of recipes, you won’t find “ _Margherita_”. A way to solve this problem is through the so-called **document expansion**. Let’s append words which could be in a potential query searching for this document. So, the “ _Margherita_” document becomes “ _Margherita pizza_”. Now, exact matching on “ _pizza_” will work! ![Document Expansion](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/DocumentExpansion.png) There are two types of document expansion that are used in sparse neural retrieval: **external** (one model is responsible for expansion, another one for retrieval) and **internal** (all is done by a single model). ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#external-document-expansion) External Document Expansion External document expansion uses a **generative model** (Mistral 7B, Chat-GPT, and Claude are all generative models, generating words based on the input text) to compose additions to documents before converting them to sparse representations and applying exact matching methods. #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#external-document-expansion-with-doct5query) External Document Expansion with docT5query ![External Document Expansion with docT5query](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/docT5queryDocumentExpansion.png)[`docT5query`](https://github.com/castorini/docTTTTTquery) is the most used document expansion model. It is based on the [Text-to-Text Transfer Transformer (T5)](https://huggingface.co/docs/transformers/en/model_doc/t5) model trained to generate top-k possible queries for which the given document would be an answer. These predicted short queries (up to ~50-60 words) can have repetitions in them, so it also contributes to the frequency of the terms if the term frequency is considered by the retriever. The problem with docT5query expansion is a very long inference time, as with any generative model: it can generate only one token per run, and it spends a fair share of resources on it. #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#external-document-expansion-with-term-independent-likelihood-model-tilde) External Document Expansion with Term Independent Likelihood MODel (TILDE) ![External Document Expansion with Term Independent Likelihood MODel (TILDE)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/TILDEDocumentExpansion.png) [`Term Independent Likelihood MODel (TILDE)`](https://github.com/ielab/TILDE) is an external expansion method that reduces the passage expansion time compared to docT5query by 98%. It uses the assumption that words in texts are independent of each other (as if we were inserting in our speech words without paying attention to their order), which allows for the parallelisation of document expansion. Instead of predicting queries, TILDE predicts the most likely terms to see next after reading a passage’s text ( **query likelihood paradigm**). 
TILDE takes the probability distribution of all tokens in a BERT vocabulary based on the document’s text and appends top-k of them to the document without repetitions. _**Problems of external document expansion:**_ External document expansion might not be feasible in many production scenarios where there’s not enough time or compute to expand each and every document you want to store in a database and then additionally do all the calculations needed for retrievers. To solve this problem, a generation of models was developed which do everything in one go, expanding documents “internally”. ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#internal-document-expansion) Internal Document Expansion Let’s assume we don’t care about the context of query terms, so we can treat them as independent words that we combine in random order to get the result. Then, for each contextualized term in a document, we are free to pre-compute how this term affects every word in our vocabulary. For each document, a vector of the vocabulary length is created. To fill this vector in, for each word in the vocabulary, it is checked if the influence of any document term on it is big enough to consider it. Otherwise, the vocabulary word’s score in a document vector will be zero. For example, by pre-computing vectors for the document “ _pizza Margherita_” on a vocabulary of 50,000 most used English words, for this small document of two words, we will get a 50,000-dimensional vector of zeros, where non-zero values will be for a “ _pizza_”, “ _pizzeria_”, “ _flower_”, “ _woman_”, “ _girl_”, “ _Margherita_”, “ _cocktail_” and “ _pizzaiolo_”. ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#sparse-neural-retriever-with-internal-document-expansion) Sparse Neural Retriever with Internal Document Expansion ![Sparse Transformer Matching (SPARTA)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/SPARTA.png) The authors of the [`Sparse Transformer Matching (SPARTA)`](https://arxiv.org/pdf/2009.13013) model use BERT’s model and BERT’s vocabulary (around 30,000 tokens). For each token in BERT vocabulary, they find the maximum dot product between it and contextualized tokens in a document and learn a threshold of a considerable (non-zero) effect. Then, at the inference time, the only thing to be done is to sum up all scores of query tokens in that document. _**Why is SPARTA not a perfect solution?**_ Trained on the MS MARCO dataset, many sparse neural retrievers, including SPARTA, show good results on MS MARCO test data, but when it comes to generalisation (working with other data), they [could perform worse than BM25](https://arxiv.org/pdf/2307.10488). ### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#state-of-the-art-of-modern-sparse-neural-retrieval) State-of-the-Art of Modern Sparse Neural Retrieval ![Sparse Lexical and Expansion Model Plus Plus, (SPLADE++)](https://qdrant.tech/articles_data/modern-sparse-neural-retrieval/SPLADE++.png) The authors of the [`Sparse Lexical and Expansion Model (SPLADE)]`](https://arxiv.org/pdf/2109.10086) family of models added dense model training tricks to the internal document expansion idea, which made the retrieval quality noticeably better. - The SPARTA model is not sparse enough by construction, so authors of the SPLADE family of models introduced explicit **sparsity regularisation**, preventing the model from producing too many non-zero values. 
- The SPARTA model mostly uses the BERT model as-is, without any additional neural network to capture the specifity of Information Retrieval problem, so SPLADE models introduce a trainable neural network on top of BERT with a specific architecture choice to make it perfectly fit the task. - SPLADE family of models, finally, uses **knowledge distillation**, which is learning from a bigger (and therefore much slower, not-so-fit for production tasks) model how to predict good representations. One of the last versions of the SPLADE family of models is [`SPLADE++`](https://arxiv.org/pdf/2205.04733). SPLADE++, opposed to SPARTA model, expands not only documents but also queries at inference time. We’ll demonstrate this in the next section. ## [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#splade-in-qdrant) SPLADE++ in Qdrant In Qdrant, you can use [`SPLADE++`](https://arxiv.org/pdf/2205.04733) easily with our lightweight library for embeddings called [FastEmbed](https://qdrant.tech/documentation/fastembed/). #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#setup) Setup Install `FastEmbed`. ```python pip install fastembed ``` Import sparse text embedding models supported in FastEmbed. ```python from fastembed import SparseTextEmbedding ``` You can list all sparse text embedding models currently supported. ```python SparseTextEmbedding.list_supported_models() ``` Output with a list of supported models ```bash [{'model': 'prithivida/Splade_PP_en_v1',\ 'vocab_size': 30522,\ 'description': 'Independent Implementation of SPLADE++ Model for English',\ 'size_in_GB': 0.532,\ 'sources': {'hf': 'Qdrant/SPLADE_PP_en_v1'},\ 'model_file': 'model.onnx'},\ {'model': 'prithvida/Splade_PP_en_v1',\ 'vocab_size': 30522,\ 'description': 'Independent Implementation of SPLADE++ Model for English',\ 'size_in_GB': 0.532,\ 'sources': {'hf': 'Qdrant/SPLADE_PP_en_v1'},\ 'model_file': 'model.onnx'},\ {'model': 'Qdrant/bm42-all-minilm-l6-v2-attentions',\ 'vocab_size': 30522,\ 'description': 'Light sparse embedding model, which assigns an importance score to each token in the text',\ 'size_in_GB': 0.09,\ 'sources': {'hf': 'Qdrant/all_miniLM_L6_v2_with_attentions'},\ 'model_file': 'model.onnx',\ 'additional_files': ['stopwords.txt'],\ 'requires_idf': True},\ {'model': 'Qdrant/bm25',\ 'description': 'BM25 as sparse embeddings meant to be used with Qdrant',\ 'size_in_GB': 0.01,\ 'sources': {'hf': 'Qdrant/bm25'},\ 'model_file': 'mock.file',\ 'additional_files': ['arabic.txt',\ 'azerbaijani.txt',\ 'basque.txt',\ 'bengali.txt',\ 'catalan.txt',\ 'chinese.txt',\ 'danish.txt',\ 'dutch.txt',\ 'english.txt',\ 'finnish.txt',\ 'french.txt',\ 'german.txt',\ 'greek.txt',\ 'hebrew.txt',\ 'hinglish.txt',\ 'hungarian.txt',\ 'indonesian.txt',\ 'italian.txt',\ 'kazakh.txt',\ 'nepali.txt',\ 'norwegian.txt',\ 'portuguese.txt',\ 'romanian.txt',\ 'russian.txt',\ 'slovene.txt',\ 'spanish.txt',\ 'swedish.txt',\ 'tajik.txt',\ 'turkish.txt'],\ 'requires_idf': True}] ``` Load SPLADE++. ```python sparse_model_name = "prithivida/Splade_PP_en_v1" sparse_model = SparseTextEmbedding(model_name=sparse_model_name) ``` The model files will be fetched and downloaded, with progress showing. #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#embed-data) Embed data We will use a toy movie description dataset. Movie description dataset ```python descriptions = ["In 1431, Jeanne d'Arc is placed on trial on charges of heresy. 
The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions.",\ "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch.",\ "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist.",\ "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place.",\ "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past.",\ "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre.",\ "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. Only a young female therapist, Paprika, can stop it.",\ "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop.",\ "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline.",\ "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent.",\ "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995).",\ "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers.",\ "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home.",\ "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies.",\ "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in.",\ "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household.",\ "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops.",\ "Story of 40-man Turkish task force who must defend a relay station.",\ "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour.",\ "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."] ``` Embed movie descriptions with SPLADE++. ```python sparse_descriptions = list(sparse_model.embed(descriptions)) ``` You can check how a sparse vector generated by SPLADE++ looks in Qdrant. ```python sparse_descriptions[0] ``` It is stored as **indices** of BERT tokens, weights of which are non-zero, and **values** of these weights. 
```bash SparseEmbedding( values=array([1.57449973, 0.90787691, ..., 1.21796167, 1.1321187]), indices=array([ 1040, 2001, ..., 28667, 29137]) ) ``` #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#upload-embeddings-to-qdrant) Upload Embeddings to Qdrant Install `qdrant-client` ```python pip install qdrant-client ``` Qdrant Client has a simple in-memory mode that allows you to experiment locally on small data volumes. Alternatively, you could use for experiments [a free tier cluster](https://qdrant.tech/documentation/cloud/create-cluster/#create-a-cluster) in Qdrant Cloud. ```python from qdrant_client import QdrantClient, models qdrant_client = QdrantClient(":memory:") # Qdrant is running from RAM. ``` Now, let’s create a [collection](https://qdrant.tech/documentation/concepts/collections/) in which could upload our sparse SPLADE++ embeddings. For that, we will use the [sparse vectors](https://qdrant.tech/documentation/concepts/vectors/#sparse-vectors) representation supported in Qdrant. ```python qdrant_client.create_collection( collection_name="movies", vectors_config={}, sparse_vectors_config={ "film_description": models.SparseVectorParams(), }, ) ``` To make this collection human-readable, let’s save movie metadata (name, description and movie’s length) together with an embeddings. Movie metadata ```python metadata = [{"movie_name": "The Passion of Joan of Arc", "movie_watch_time_min": 114, "movie_description": "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions."},\ {"movie_name": "Sherlock Jr.", "movie_watch_time_min": 45, "movie_description": "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch."},\ {"movie_name": "Heat", "movie_watch_time_min": 170, "movie_description": "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist."},\ {"movie_name": "Kagemusha", "movie_watch_time_min": 162, "movie_description": "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place."},\ {"movie_name": "Kubo and the Two Strings", "movie_watch_time_min": 101, "movie_description": "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past."},\ {"movie_name": "Sardar Udham", "movie_watch_time_min": 164, "movie_description": "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre."},\ {"movie_name": "Paprika", "movie_watch_time_min": 90, "movie_description": "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. 
Only a young female therapist, Paprika, can stop it."},\ {"movie_name": "After Hours", "movie_watch_time_min": 97, "movie_description": "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop."},\ {"movie_name": "Udta Punjab", "movie_watch_time_min": 148, "movie_description": "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline."},\ {"movie_name": "Philomena", "movie_watch_time_min": 98, "movie_description": "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent."},\ {"movie_name": "Neon Genesis Evangelion: The End of Evangelion", "movie_watch_time_min": 87, "movie_description": "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995)."},\ {"movie_name": "The Dirty Dozen", "movie_watch_time_min": 150, "movie_description": "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers."},\ {"movie_name": "Toy Story 3", "movie_watch_time_min": 103, "movie_description": "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home."},\ {"movie_name": "Edge of Tomorrow", "movie_watch_time_min": 113, "movie_description": "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies."},\ {"movie_name": "Some Like It Hot", "movie_watch_time_min": 121, "movie_description": "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in."},\ {"movie_name": "Snow White and the Seven Dwarfs", "movie_watch_time_min": 83, "movie_description": "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household."},\ {"movie_name": "It Happened One Night", "movie_watch_time_min": 105, "movie_description": "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops."},\ {"movie_name": "Nefes: Vatan Sagolsun", "movie_watch_time_min": 128, "movie_description": "Story of 40-man Turkish task force who must defend a relay station."},\ {"movie_name": "This Is Spinal Tap", "movie_watch_time_min": 82, "movie_description": "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour."},\ {"movie_name": "Let the Right One In", "movie_watch_time_min": 114, "movie_description": "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."}] ``` Upload embedded descriptions with movie metadata into the collection. 
```python qdrant_client.upsert( collection_name="movies", points=[\ models.PointStruct(\ id=idx,\ payload=metadata[idx],\ vector={\ "film_description": models.SparseVector(\ indices=vector.indices,\ values=vector.values\ )\ },\ )\ for idx, vector in enumerate(sparse_descriptions)\ ], ) ``` Implicitly generate sparse vectors (Click to expand) ```python qdrant_client.upsert( collection_name="movies", points=[\ models.PointStruct(\ id=idx,\ payload=metadata[idx],\ vector={\ "film_description": models.Document(\ text=description, model=sparse_model_name\ )\ },\ )\ for idx, description in enumerate(descriptions)\ ], ) ``` #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#querying) Querying Let’s query our collection! ```python query_embedding = list(sparse_model.embed("A movie about music"))[0] response = qdrant_client.query_points( collection_name="movies", query=models.SparseVector(indices=query_embedding.indices, values=query_embedding.values), using="film_description", limit=1, with_vectors=True, with_payload=True ) print(response) ``` Implicitly generate sparse vectors (Click to expand) ```python response = qdrant_client.query_points( collection_name="movies", query=models.Document(text="A movie about music", model=sparse_model_name), using="film_description", limit=1, with_vectors=True, with_payload=True, ) print(response) ``` Output looks like this: ```bash points=[ScoredPoint(\ id=18,\ version=0,\ score=9.6779785,\ payload={\ 'movie_name': 'This Is Spinal Tap',\ 'movie_watch_time_min': 82,\ 'movie_description': "Spinal Tap, one of England's loudest bands,\ is chronicled by film director Marty DiBergi on what proves to be a fateful tour."\ },\ vector={\ 'film_description': SparseVector(\ indices=[1010, 2001, ..., 25316, 25517],\ values=[0.49717945, 0.19760133, ..., 1.2124698, 0.58689135])\ },\ shard_key=None,\ order_value=None\ )] ``` As you can see, there are no overlapping words in the query and a description of a found movie, even though the answer fits the query, and yet we’re working with **exact matching**. This is possible due to the **internal expansion** of the query and the document that SPLADE++ does. #### [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#internal-expansion-by-splade) Internal Expansion by SPLADE++ Let’s check how did SPLADE++ expand the query and the document we got as an answer. For that, we will need to use the HuggingFace library called [Tokenizers](https://huggingface.co/docs/tokenizers/en/index). With it, we will be able to decode back to human-readable format **indices** of words in a vocabulary SPLADE++ uses. Firstly we will need to install this library. ```python pip install tokenizers ``` Then, let’s write a function which will decode SPLADE++ sparse embeddings and return words SPLADE++ uses for encoding the input. We would like to return them in the descending order based on the weight ( **impact score**), SPLADE++ assigned them. ```python from tokenizers import Tokenizer tokenizer = Tokenizer.from_pretrained('Qdrant/SPLADE_PP_en_v1') def get_tokens_and_weights(sparse_embedding, tokenizer): token_weight_dict = {} for i in range(len(sparse_embedding.indices)): token = tokenizer.decode([sparse_embedding.indices[i]]) weight = sparse_embedding.values[i] token_weight_dict[token] = weight # Sort the dictionary by weights token_weight_dict = dict(sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True)) return token_weight_dict ``` Firstly, we apply our function to the query. 
```python query_embedding = list(sparse_model.embed("A movie about music"))[0] print(get_tokens_and_weights(query_embedding, tokenizer)) ``` That’s how SPLADE++ expanded the query: ```bash { "music": 2.764289617538452, "movie": 2.674748420715332, "film": 2.3489091396331787, "musical": 2.276120901107788, "about": 2.124547004699707, "movies": 1.3825485706329346, "song": 1.2893378734588623, "genre": 0.9066758751869202, "songs": 0.8926399946212769, "a": 0.8900706768035889, "musicians": 0.5638002157211304, "sound": 0.49310919642448425, "musician": 0.46415239572525024, "drama": 0.462990403175354, "tv": 0.4398191571235657, "book": 0.38950803875923157, "documentary": 0.3758136034011841, "hollywood": 0.29099565744400024, "story": 0.2697228491306305, "nature": 0.25306591391563416, "concerning": 0.205053448677063, "game": 0.1546829640865326, "rock": 0.11775632947683334, "definition": 0.08842901140451431, "love": 0.08636035025119781, "soundtrack": 0.06807517260313034, "religion": 0.053535860031843185, "filmed": 0.025964470580220222, "sounds": 0.0004048719711136073 } ``` Then, we apply our function to the answer. ```python query_embedding = list(sparse_model.embed("A movie about music"))[0] response = qdrant_client.query_points( collection_name="movies", query=models.SparseVector(indices=query_embedding.indices, values=query_embedding.values), using="film_description", limit=1, with_vectors=True, with_payload=True ) print(get_tokens_and_weights(response.points[0].vector['film_description'], tokenizer)) ``` Implicitly generate sparse vectors (Click to expand) ```python response = qdrant_client.query_points( collection_name="movies", query=models.Document(text="A movie about music", model=sparse_model_name), using="film_description", limit=1, with_vectors=True, with_payload=True, ) print(get_tokens_and_weights(response.points[0].vector["film_description"], tokenizer)) ``` And that’s how SPLADE++ expanded the answer. 
```python {'spinal': 2.6548674, 'tap': 2.534881, 'marty': 2.223297, '##berg': 2.0402722, '##ful': 2.0030282, 'fate': 1.935915, 'loud': 1.8381964, 'spine': 1.7507898, 'di': 1.6161551, 'bands': 1.5897619, 'band': 1.589473, 'uk': 1.5385966, 'tour': 1.4758654, 'chronicle': 1.4577943, 'director': 1.4423795, 'england': 1.4301306, '##est': 1.3025658, 'taps': 1.2124698, 'film': 1.1069428, '##berger': 1.1044296, 'tapping': 1.0424755, 'best': 1.0327196, 'louder': 0.9229055, 'music': 0.9056678, 'directors': 0.8887502, 'movie': 0.870712, 'directing': 0.8396196, 'sound': 0.83609974, 'genre': 0.803052, 'dave': 0.80212915, 'wrote': 0.7849579, 'hottest': 0.7594193, 'filmed': 0.750105, 'english': 0.72807616, 'who': 0.69502294, 'tours': 0.6833075, 'club': 0.6375339, 'vertebrae': 0.58689135, 'chronicles': 0.57296354, 'dance': 0.57278687, 'song': 0.50987065, ',': 0.49717945, 'british': 0.4971719, 'writer': 0.495709, 'directed': 0.4875775, 'cork': 0.475757, '##i': 0.47122696, '##band': 0.46837863, 'most': 0.44112885, '##liest': 0.44084555, 'destiny': 0.4264851, 'prove': 0.41789067, 'is': 0.40306947, 'famous': 0.40230379, 'hop': 0.3897451, 'noise': 0.38770816, '##iest': 0.3737782, 'comedy': 0.36903998, 'sport': 0.35883865, 'quiet': 0.3552795, 'detail': 0.3397654, 'fastest': 0.30345848, 'filmmaker': 0.3013101, 'festival': 0.28146765, '##st': 0.28040633, 'tram': 0.27373192, 'well': 0.2599603, 'documentary': 0.24368097, 'beat': 0.22953634, 'direction': 0.22925079, 'hardest': 0.22293334, 'strongest': 0.2018861, 'was': 0.19760133, 'oldest': 0.19532987, 'byron': 0.19360808, 'worst': 0.18397793, 'touring': 0.17598206, 'rock': 0.17319143, 'clubs': 0.16090117, 'popular': 0.15969758, 'toured': 0.15917331, 'trick': 0.1530599, 'celebrity': 0.14458777, 'musical': 0.13888633, 'filming': 0.1363699, 'culture': 0.13616633, 'groups': 0.1340591, 'ski': 0.13049376, 'venue': 0.12992987, 'style': 0.12853126, 'history': 0.12696269, 'massage': 0.11969914, 'theatre': 0.11673525, 'sounds': 0.108338095, 'visit': 0.10516077, 'editing': 0.078659914, 'death': 0.066746496, 'massachusetts': 0.055702563, 'stuart': 0.0447934, 'romantic': 0.041140396, 'pamela': 0.03561337, 'what': 0.016409796, 'smallest': 0.010815808, 'orchestra': 0.0020691194} ``` Due to the expansion both the query and the document overlap in “ _music_”, “ _film_”, “ _sounds_”, and others, so **exact matching** works. ## [Anchor](https://qdrant.tech/articles/modern-sparse-neural-retrieval/\#key-takeaways-when-to-choose-sparse-neural-models-for-retrieval) Key Takeaways: When to Choose Sparse Neural Models for Retrieval Sparse Neural Retrieval makes sense: - In areas where keyword matching is crucial but BM25 is insufficient for initial retrieval, semantic matching (e.g., synonyms, homonyms) adds significant value. This is especially true in fields such as medicine, academia, law, and e-commerce, where brand names and serial numbers play a critical role. Dense retrievers tend to return many false positives, while sparse neural retrieval helps narrow down these false positives. - Sparse neural retrieval can be a valuable option for scaling, especially when working with large datasets. It leverages exact matching using an inverted index, which can be fast depending on the nature of your data. - If you’re using traditional retrieval systems, sparse neural retrieval is compatible with them and helps bridge the semantic gap. ##### Was this page useful? 
<|page-192-lllmstxt|> ## storage

---

# [Anchor](https://qdrant.tech/documentation/concepts/storage/\#storage) Storage

All data within one collection is divided into segments. Each segment has its own vector and payload storage as well as indexes.

Data stored in segments usually does not overlap. However, storing the same point in different segments will not cause problems, since the search contains a deduplication mechanism.

The segments consist of vector and payload storages, vector and payload [indexes](https://qdrant.tech/documentation/concepts/indexing/), and an id mapper, which stores the relationship between internal and external ids.

A segment can be `appendable` or `non-appendable` depending on the type of storage and index used. You can freely add, delete and query data in an `appendable` segment. With a `non-appendable` segment, you can only read and delete data.

The configuration of the segments in the collection can be different and independent of one another, but at least one `appendable` segment must be present in a collection.

## [Anchor](https://qdrant.tech/documentation/concepts/storage/\#vector-storage) Vector storage

Depending on the requirements of the application, Qdrant can use one of the data storage options. The choice is a trade-off between search speed and the amount of RAM used.

**In-memory storage** \- Stores all vectors in RAM and has the highest speed, since disk access is required only for persistence.

**Memmap storage** \- Creates a virtual address space associated with the file on disk. [Wiki](https://en.wikipedia.org/wiki/Memory-mapped_file). Mmapped files are not loaded directly into RAM. Instead, they use the page cache to access the contents of the file. This scheme allows flexible use of available memory. With sufficient RAM, it is almost as fast as in-memory storage.
### [Anchor](https://qdrant.tech/documentation/concepts/storage/\#configuring-memmap-storage) Configuring Memmap storage There are two ways to configure the usage of memmap(also known as on-disk) storage: - Set up `on_disk` option for the vectors in the collection create API: _Available as of v1.2.0_ httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine", "on_disk": true } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams( size=768, distance=models.Distance.COSINE, on_disk=True ), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", on_disk: true, }, }); ``` ```rust use qdrant_client::qdrant::{CreateCollectionBuilder, Distance, VectorParamsBuilder}; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine).on_disk(true)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.VectorParams; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( "{collection_name}", VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .setOnDisk(true) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( "{collection_name}", new VectorParams { Size = 768, Distance = Distance.Cosine, OnDisk = true } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, OnDisk: qdrant.PtrOf(true), }), }) ``` This will create a collection with all vectors immediately stored in memmap storage. This is the recommended way, in case your Qdrant instance operates with fast disks and you are working with large collections. - Set up `memmap_threshold` option. This option will set the threshold after which the segment will be converted to memmap storage. There are two ways to do this: 1. You can set the threshold globally in the [configuration file](https://qdrant.tech/documentation/guides/configuration/). The parameter is called `memmap_threshold` (previously `memmap_threshold_kb`). 2. You can set the threshold for each collection separately during [creation](https://qdrant.tech/documentation/concepts/collections/#create-collection) or [update](https://qdrant.tech/documentation/concepts/collections/#update-collection-parameters). 
httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine" }, "optimizers_config": { "memmap_threshold": 20000 } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE), optimizers_config=models.OptimizersConfigDiff(memmap_threshold=20000), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", }, optimizers_config: { memmap_threshold: 20000, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, OptimizersConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine)) .optimizers_config(OptimizersConfigDiffBuilder::default().memmap_threshold(20000)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.OptimizersConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .build()) .build()) .setOptimizersConfig( OptimizersConfigDiff.newBuilder().setMemmapThreshold(20000).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine }, optimizersConfig: new OptimizersConfigDiff { MemmapThreshold = 20000 } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, }), OptimizersConfig: &qdrant.OptimizersConfigDiff{ MaxSegmentSize: qdrant.PtrOf(uint64(20000)), }, }) ``` The rule of thumb to set the memmap threshold parameter is simple: - if you have a balanced use scenario - set memmap threshold the same as `indexing_threshold` (default is 20000). In this case the optimizer will not make any extra runs and will optimize all thresholds at once. - if you have a high write load and low RAM - set memmap threshold lower than `indexing_threshold` to e.g. 10000. In this case the optimizer will convert the segments to memmap storage first and will only apply indexing after that. In addition, you can use memmap storage not only for vectors, but also for HNSW index. 
To enable this, you need to set the `hnsw_config.on_disk` parameter to `true` during collection [creation](https://qdrant.tech/documentation/concepts/collections/#create-a-collection) or [updating](https://qdrant.tech/documentation/concepts/collections/#update-collection-parameters). httppythontypescriptrustjavacsharpgo ```http PUT /collections/{collection_name} { "vectors": { "size": 768, "distance": "Cosine", "on_disk": true }, "hnsw_config": { "on_disk": true } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.create_collection( collection_name="{collection_name}", vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE, on_disk=True), hnsw_config=models.HnswConfigDiff(on_disk=True), ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.createCollection("{collection_name}", { vectors: { size: 768, distance: "Cosine", on_disk: true, }, hnsw_config: { on_disk: true, }, }); ``` ```rust use qdrant_client::qdrant::{ CreateCollectionBuilder, Distance, HnswConfigDiffBuilder, VectorParamsBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .create_collection( CreateCollectionBuilder::new("{collection_name}") .vectors_config(VectorParamsBuilder::new(768, Distance::Cosine).on_disk(true)) .hnsw_config(HnswConfigDiffBuilder::default().on_disk(true)), ) .await?; ``` ```java import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Collections.CreateCollection; import io.qdrant.client.grpc.Collections.Distance; import io.qdrant.client.grpc.Collections.HnswConfigDiff; import io.qdrant.client.grpc.Collections.VectorParams; import io.qdrant.client.grpc.Collections.VectorsConfig; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client .createCollectionAsync( CreateCollection.newBuilder() .setCollectionName("{collection_name}") .setVectorsConfig( VectorsConfig.newBuilder() .setParams( VectorParams.newBuilder() .setSize(768) .setDistance(Distance.Cosine) .setOnDisk(true) .build()) .build()) .setHnswConfig(HnswConfigDiff.newBuilder().setOnDisk(true).build()) .build()) .get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.CreateCollectionAsync( collectionName: "{collection_name}", vectorsConfig: new VectorParams { Size = 768, Distance = Distance.Cosine, OnDisk = true }, hnswConfig: new HnswConfigDiff { OnDisk = true } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.CreateCollection(context.Background(), &qdrant.CreateCollection{ CollectionName: "{collection_name}", VectorsConfig: qdrant.NewVectorsConfig(&qdrant.VectorParams{ Size: 768, Distance: qdrant.Distance_Cosine, OnDisk: qdrant.PtrOf(true), }), HnswConfig: &qdrant.HnswConfigDiff{ OnDisk: qdrant.PtrOf(true), }, }) ``` ## [Anchor](https://qdrant.tech/documentation/concepts/storage/\#payload-storage) Payload storage Qdrant supports two types of payload storages: InMemory and OnDisk. InMemory payload storage is organized in the same way as in-memory vectors. 
The payload data is loaded into RAM at service startup, while disk and [Gridstore](https://qdrant.tech/articles/gridstore-key-value-storage/) are used for persistence only. This type of storage works quite fast, but it may require a lot of space to keep all the data in RAM, especially if the payload has large values attached - abstracts of text or even images.

In the case of large payload values, it might be better to use OnDisk payload storage. This type of storage reads and writes payload directly to RocksDB, so it does not require any significant amount of RAM. The downside, however, is the access latency. If you need to query vectors with payload-based conditions, checking values stored on disk might take too much time. In this scenario, we recommend creating a payload index for each field used in filtering conditions to avoid disk access. Once you create the field index, Qdrant will preserve all values of the indexed field in RAM regardless of the payload storage type.

You can specify the desired type of payload storage with the [configuration file](https://qdrant.tech/documentation/guides/configuration/) or with the collection parameter `on_disk_payload` during [creation](https://qdrant.tech/documentation/concepts/collections/#create-collection) of the collection (a minimal example follows at the end of this page).

## [Anchor](https://qdrant.tech/documentation/concepts/storage/\#versioning) Versioning

To ensure data integrity, Qdrant performs all data changes in two stages. In the first step, the data is written to the write-ahead log (WAL), which orders all operations and assigns them a sequential number. Once a change has been added to the WAL, it will not be lost even if a power loss occurs. Then the changes go into the segments. Each segment stores the last version of the change applied to it as well as the version of each individual point. If the new change has a sequential number less than the current version of the point, the updater will ignore the change. This mechanism allows Qdrant to safely and efficiently restore the storage from the WAL in case of an abnormal shutdown.
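Tying the storage options on this page together, the following is a minimal sketch of creating a collection with on-disk vectors and on-disk payload storage (`on_disk_payload`), plus a payload index for a field that is frequently used in filters. It assumes a local Qdrant instance, and the collection and field names (`documents`, `category`) are purely illustrative.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Collection with vectors stored via memmap and payloads stored on disk.
client.create_collection(
    collection_name="documents",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
        on_disk=True,
    ),
    on_disk_payload=True,
)

# Index the field used in filtering conditions, so its values stay in RAM
# and filtered queries avoid the on-disk payload lookup.
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```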
<|page-193-lllmstxt|> ## indexing-optimization

---

# Optimizing Memory for Bulk Uploads

Sabrina Aquino · February 13, 2025

![Optimizing Memory for Bulk Uploads](https://qdrant.tech/articles_data/indexing-optimization/preview/title.jpg)

---

# [Anchor](https://qdrant.tech/articles/indexing-optimization/\#optimizing-memory-consumption-during-bulk-uploads) Optimizing Memory Consumption During Bulk Uploads

Efficient memory management is a constant challenge when you’re dealing with **large-scale vector data**. In high-volume ingestion scenarios, even seemingly minor configuration choices can significantly impact stability and performance. Let’s take a look at the best practices and recommendations to help you optimize memory usage during bulk uploads in Qdrant. We’ll cover scenarios with both **dense** and **sparse** vectors, helping your deployments remain performant even under high load and avoiding out-of-memory errors.

## [Anchor](https://qdrant.tech/articles/indexing-optimization/\#indexing-for-dense-vs-sparse-vectors) Indexing for dense vs. sparse vectors

**Dense vectors**

Qdrant employs an **HNSW-based index** for fast similarity search on dense vectors. By default, HNSW is built or updated once the number of **unindexed** vectors in a segment exceeds a set `indexing_threshold`. Although it delivers excellent query speed, building or updating the HNSW graph can be **resource-intensive** if it occurs frequently or across many small segments.

**Sparse vectors**

Sparse vectors use an **inverted index**. This index is updated at the **time of upsertion**, meaning you cannot disable or postpone it for sparse vectors. In most cases, its overhead is smaller than that of building an HNSW graph, but you should still be aware that each upsert triggers a sparse index update.

## [Anchor](https://qdrant.tech/articles/indexing-optimization/\#bulk-upload-configuration-for-dense-vectors) Bulk upload configuration for dense vectors

When performing high-volume vector ingestion, you have **two primary options** for handling indexing overhead. You should choose one depending on your specific workload and memory constraints:

- **Disable HNSW indexing**

To reduce memory and CPU pressure during bulk ingestion, you can **disable HNSW indexing entirely** by setting `"m": 0`. For dense vectors, the `m` parameter defines how many edges each node in the HNSW graph can have. This way, no dense vector index will be built, preventing unnecessary CPU usage during ingestion.

**Figure 1:** A description of three key HNSW parameters.

![](https://qdrant.tech/articles_data/indexing-optimization/hnsw-parameters.png)

```json
PATCH /collections/your_collection
{
  "hnsw_config": {
    "m": 0
  }
}
```

**After ingestion is complete**, you can **re-enable HNSW** by setting `m` back to a production value (commonly 16 or 32). Remember that search won’t use HNSW until the index is built, so search performance may be slower during this period.

- **Disable optimizations completely**

The `indexing_threshold` tells Qdrant how many unindexed dense vectors can accumulate in a segment before building the HNSW graph. Setting `indexing_threshold` to `0` defers indexing entirely, keeping **ingestion speed at maximum**. However, this means uploaded vectors are not moved to disk while uploading, which can lead to **high RAM usage**.

```json
PATCH /collections/your_collection
{
  "optimizers_config": {
    "indexing_threshold": 0
  }
}
```

After bulk ingestion, set `indexing_threshold` to a positive value to ensure vectors are indexed and searchable via HNSW. **Vectors will not be searchable via HNSW until indexing is performed.** Small thresholds (e.g., 100) mean more frequent indexing, which can still be costly if many segments exist. Larger thresholds (e.g., 10000) delay indexing to batch more vectors at once, potentially using more RAM at the moment of index build, but fewer builds overall.

Between these two approaches, we generally recommend disabling HNSW (`m=0`) during bulk ingestion to keep memory usage predictable.
Using `indexing_threshold=0` can be an alternative, but only if your system has enough memory to accommodate the unindexed vectors in RAM.

* * *

## [Anchor](https://qdrant.tech/articles/indexing-optimization/\#on-disk-storage-in-qdrant) On-Disk storage in Qdrant

By default, Qdrant keeps **vectors**, **payload data**, and **indexes** in memory to ensure low-latency queries. However, in large-scale or memory-constrained scenarios, you can configure some or all of them to be stored on-disk. This helps reduce RAM usage at the cost of potential increases in query latency, particularly for cold reads.

**When to use on-disk**:

- You have **very large** or **rarely used** payload data or indexes, and freeing up RAM is worth the potential I/O overhead.
- Your dataset doesn’t fit comfortably in available memory.
- You want to reduce memory pressure.
- You can tolerate slower queries if it ensures the system remains stable under heavy loads.

* * *

## [Anchor](https://qdrant.tech/articles/indexing-optimization/\#memmap-storage-and-segmentation) Memmap storage and segmentation

Qdrant uses **memory-mapped files** (segments) to store data on-disk. Rather than loading all vectors into RAM, Qdrant maps each segment into its address space, paging data in and out on demand. This helps keep the active RAM footprint lower, because data can be paged out if memory pressure is high. But each segment still incurs overhead (metadata, page table entries, etc.).

During **high-volume ingestion**, you can accumulate dozens of small segments. Qdrant’s **optimizer** can later merge these into fewer, larger segments, reducing per-segment overhead and lowering total memory usage.

When you create a collection with `"on_disk": true`, Qdrant will store newly inserted vectors in memmap storage from the start. For example:

```json
PATCH /collections/your_collection
{
  "vectors": {
    "on_disk": true
  }
}
```

This approach immediately places all incoming vectors on disk, which can be very efficient in the case of bulk ingestion.

However, **vector data and indexes are stored separately**, so enabling `on_disk` for vectors does not automatically store their indexes on disk. To fully optimize memory usage, you may need to configure **both vector storage and index storage** independently.

For dense vectors, you can enable on-disk storage for both the **vector data** and the **HNSW index**:

```json
PATCH /collections/your_collection
{
  "vectors": {
    "on_disk": true
  },
  "hnsw_config": {
    "on_disk": true
  }
}
```

For sparse vectors, you need to enable `on_disk` for both the vector data and the sparse index separately:

```json
PATCH /collections/your_collection
{
  "sparse_vectors": {
    "text": {
      "on_disk": true,
      "index": {
        "on_disk": true
      }
    }
  }
}
```

* * *

## [Anchor](https://qdrant.tech/articles/indexing-optimization/\#best-practices-for-high-volume-vector-ingestion) **Best practices for high-volume vector ingestion**

Bulk ingestion can lead to high memory consumption and even out-of-memory (OOM) errors. **If you’re experiencing out-of-memory errors with your current setup**, scaling up temporarily (increasing available RAM) will provide a buffer while you adjust Qdrant’s configuration for more efficient data ingestion. The key here is to control indexing overhead.

Let’s walk through the best practices for high-volume vector ingestion in a constrained-memory environment.

### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#1-store-vector-data-on-disk-immediately) 1\. Store vector data on disk immediately
The most effective way to reduce memory usage is to store vector data on disk right from the start using `on_disk: true`. This prevents RAM from being overloaded with raw vectors before optimization kicks in.

```json
PATCH /collections/your_collection
{
  "vectors": {
    "on_disk": true
  }
}
```

Previously, vector data had to be held in RAM until optimizers could move it to disk, which caused significant memory pressure. Now, by writing vectors to disk directly, memory overhead is significantly reduced, making bulk ingestion much more efficient.

### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#2-disable-hnsw-for-dense-vectors-m0) 2\. Disable HNSW for dense vectors (`m=0`)

During an **initial bulk load**, you can **disable** dense indexing by setting `"m": 0`. This ensures Qdrant won’t build an HNSW graph for incoming vectors, avoiding unnecessary memory and CPU usage.

```json
PATCH /collections/your_collection
{
  "hnsw_config": {
    "m": 0
  },
  "optimizers_config": {
    "indexing_threshold": 10000
  }
}
```

### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#3-let-the-optimizer-run-after-bulk-uploads) 3\. Let the optimizer run **after** bulk uploads

Qdrant’s optimizers continuously restructure data to improve search efficiency. However, during a bulk upload, this can lead to excessive data movement and overhead as segments are constantly reorganized while new data is still arriving. To avoid this, **upload all data first**, then allow the optimizer to process everything in one go. This minimizes redundant operations and ensures a more efficient segment structure.

### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#4-wait-for-indexation-to-clear-up-memory) **4\. Wait for indexation to clear up memory**

Before performing additional operations, **allow Qdrant to finish any ongoing indexing**. Large indexing jobs can keep memory usage high until they fully complete. Monitor Qdrant logs or metrics to confirm when indexing finishes; once that happens, memory consumption should drop as intermediate data structures are freed.

### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#5-re-enable-hnsw-post-ingestion) 5\. Re-enable HNSW post-ingestion

After the ingestion phase is over and memory usage has stabilized, re-enable HNSW for dense vectors by setting `m` back to a production value (commonly `16` or `32`):

```json
PATCH /collections/your_collection
{
  "hnsw_config": {
    "m": 16
  }
}
```

### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#5-enable-quantization) 6\. Enable quantization

If you had planned to store all dense vectors on disk, be aware that searches can slow down drastically due to frequent disk I/O while memory pressure is high. A more balanced approach is **scalar quantization**: compress vectors (e.g., to `int8`) so they fit in RAM without occupying as much space as full floating-point values.

```json
PATCH /collections/your_collection
{
  "quantization_config": {
    "scalar": {
      "type": "int8",
      "always_ram": true
    }
  }
}
```

Quantized vectors remain **in-memory** yet consume less space, preserving much of the performance advantage of RAM-based search. Learn more about [vector quantization](https://qdrant.tech/articles/what-is-vector-quantization/).
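The same flow can be reproduced with the Python client. The sketch below is only illustrative (the collection name, vector size, and `m` values are assumptions, not values from this article): create the collection with on-disk vectors and HNSW disabled, upload the data, and then re-enable HNSW and add in-RAM scalar quantization once ingestion has finished.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# 1. Create the collection for bulk ingestion: vectors on disk, HNSW disabled (m=0).
client.create_collection(
    collection_name="bulk_demo",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
        on_disk=True,
    ),
    hnsw_config=models.HnswConfigDiff(m=0),
)

# 2. ...bulk-upload all points here, then wait for ongoing indexation to finish...

# 3. Re-enable HNSW and switch to int8 scalar quantization kept in RAM.
client.update_collection(
    collection_name="bulk_demo",
    hnsw_config=models.HnswConfigDiff(m=16),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            always_ram=True,
        )
    ),
)
```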
### [Anchor](https://qdrant.tech/articles/indexing-optimization/\#conclusion) Conclusion

High-volume vector ingestion can place significant memory demands on Qdrant, especially if dense vectors are indexed in real time. By following these tips, you can substantially reduce the risk of out-of-memory errors and maintain stable performance in a memory-limited environment.

As always, monitor your system’s behavior. Review logs, watch metrics, and keep an eye on memory usage. Each workload is different, so it’s wise to fine-tune Qdrant’s parameters according to your hardware and data scale.

<|page-194-lllmstxt|> ## cloud-rbac

---

# [Anchor](https://qdrant.tech/documentation/cloud-rbac/\#cloud-rbac) Cloud RBAC

## [Anchor](https://qdrant.tech/documentation/cloud-rbac/\#about-cloud-rbac) About Cloud RBAC

Qdrant Cloud enables you to manage permissions for your cloud resources with greater precision within the Qdrant Cloud console. This feature ensures that only authorized users have access to sensitive data and capabilities, covering the following areas:

- Billing
- Identity and Access Management
- Clusters\*
- Hybrid Cloud
- Account Configuration

_Note: Current permissions control access to ALL clusters. Per-cluster permissions will come in a future release._

> 💡 You can access this in **Access Management > User & Role Management** _if enabled._

## [Anchor](https://qdrant.tech/documentation/cloud-rbac/\#guides) Guides

- [Role Management](https://qdrant.tech/documentation/cloud-rbac/role-management/)
- [User Management](https://qdrant.tech/documentation/cloud-rbac/user-management/)

## [Anchor](https://qdrant.tech/documentation/cloud-rbac/\#reference) Reference

- [Permission List](https://qdrant.tech/documentation/cloud-rbac/permission-reference/)
<|page-195-lllmstxt|> ## case-study-dust-v2

---

# How Dust Scaled to 5,000+ Data Sources with Qdrant

Daniel Azoulai · April 29, 2025

![How Dust Scaled to 5,000+ Data Sources with Qdrant](https://qdrant.tech/blog/case-study-dust-v2/preview/title.jpg)

## [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#inside-dusts-vector-stack-overhaul-scaling-to-5000-data-sources-with-qdrant) Inside Dust’s Vector Stack Overhaul: Scaling to 5,000+ Data Sources with Qdrant

![How Dust Scaled to 5,000+ Data Sources with Qdrant](https://qdrant.tech/blog/case-study-dust-v2/case-study-dust-v2-v2-bento-dark.jpg)

### [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#the-challenge-scaling-ai-infrastructure-for-thousands-of-data-sources) The Challenge: Scaling AI Infrastructure for Thousands of Data Sources

Dust, an OS for AI-native companies enabling users to build AI agents powered by actions and company knowledge, faced a set of growing technical hurdles as it scaled its operations. The company’s core product enables users to give AI agents secure access to internal and external data resources, supporting enhanced workflows and faster access to information. However, this mission hit bottlenecks when their infrastructure began to strain under the weight of thousands of data sources and increasingly demanding user queries.

Initially, Dust employed a strategy of creating a separate vector collection per data source, which rapidly became unsustainable. As the number of data sources ballooned beyond 5,000, the platform began experiencing significant performance degradation. RAM consumption skyrocketed, and vector search performance slowed dramatically, especially as the memory-mapped vectors spilled onto disk storage. At one point, they were managing nearly a thousand collections simultaneously and processing over a million vector upsert and delete operations in a single cycle.

### [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#evaluation-and-selection-why-dust-chose-qdrant) Evaluation and Selection: Why Dust Chose Qdrant

The Dust team explored several popular vector databases. While each had merits, none met all of Dust’s increasingly complex needs. Some providers’ developer experience didn’t align with their workflows, and others lacked the deployment flexibility required. Dust needed a solution capable of handling multi-tenancy at scale, embedding model flexibility, efficient memory usage, and deep configurability.

Qdrant stood out thanks to its open-source Rust foundation, giving Dust the control they needed over memory, performance, and customization. Its intuitive API and strong developer community also made the integration experience more seamless. Critically, Qdrant’s design allowed Dust to consolidate their fragmented architecture, replacing thousands of individual collections with a few shared, multi-tenant ones powered by robust sharding and payload filtering.
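The consolidation pattern mentioned above boils down to keeping a tenant (or data-source) identifier in the payload of a shared collection, indexing it, and applying it as a filter on every query. The sketch below is not Dust’s actual client code; the collection and field names (`shared_documents`, `data_source_id`) and the query vector are assumptions used only for illustration.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Index the tenant field so filtered searches in the shared collection stay fast.
client.create_payload_index(
    collection_name="shared_documents",
    field_name="data_source_id",
    field_schema=models.PayloadSchemaType.KEYWORD,
)

# Every search is scoped to a single data source with a payload filter.
hits = client.query_points(
    collection_name="shared_documents",
    query=[0.12, 0.85, 0.33],  # query embedding (illustrative)
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="data_source_id",
                match=models.MatchValue(value="source-42"),
            )
        ]
    ),
    limit=10,
)
```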
### [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#implementation-highlights-advanced-architecture-with-qdrant) Implementation Highlights: Advanced Architecture with Qdrant One of the most impactful features Dust adopted was scalar quantization. This reduced vector storage size by a factor of four, enabling the team to keep data in memory rather than falling back to slower disk storage. This shift alone led to dramatic latency improvements. Where queries in large collections once took 5 to 10 seconds, they now returned in under a second. Even in collections with over a million vectors and heavy payloads, search responses consistently clocked in well below the one-second mark. Dust also built a custom `DustQdrantClient` to manage all vector-related operations. This client abstracted away differences between cluster versions, embedding models, and sharding logic, simplifying ongoing development. Their infrastructure runs in Google Cloud Platform, with Qdrant deployed in isolated VPCs that communicate with Dust’s core APIs using secure authentication. The architecture is replicated across two major regions—US and EU—ensuring both high availability and compliance with data residency laws. ### [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#results-faster-performance-lower-costs-better-user-experience) Results: Faster Performance, Lower Costs, Better User Experience The impact of Qdrant was felt immediately. Search latency was slashed from multi-second averages to sub-second responsiveness. Collections that once consumed over 30 GB of RAM were optimized to run efficiently at a quarter of that size. The shift to in-memory quantized vectors, while keeping original vectors on disk for fallback, proved to be the perfect hybrid model for balancing performance and resource usage. These backend improvements directly translated into user-facing gains. Dust’s AI agents became more responsive and reliable. Even as customers loaded larger and more complex datasets, the system continued to deliver consistent performance. The platform’s ability to scale without degrading UX marked a turning point, empowering Dust to expand its customer base with confidence. The move to a multi-embedding-model architecture also paid dividends. By grouping data sources by embedder, Dust enabled smoother migrations and more efficient model experimentation. Qdrant’s flexibility let them evolve their architecture without reindexing massive datasets or disrupting end-user functionality. ### [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#lessons-learned-and-roadmap) Lessons Learned and Roadmap As they scaled, Dust uncovered a critical insight: users tend to ask more structured, analytical questions when they know a database is involved—queries better suited to SQL than vector search. This prompted the team to pair Qdrant with a text-to-SQL system, blending unstructured and structured query capabilities for a more versatile agent. Looking forward, Qdrant remains a foundational pillar of Dust’s product roadmap. They’re building multi-region sharding for more granular data residency, scaling their clusters both vertically and horizontally, and supporting newer embedding models from providers like OpenAI and Mistral. Future collections will be organized by embedder, with tenant-aware sharding and index optimizations tailored to each use case. 
### [Anchor](https://qdrant.tech/blog/case-study-dust-v2/\#a-new-tier-of-performance-scalability-and-architectural-flexibility) A new tier of performance, scalability, and architectural flexibility

By adopting Qdrant, Dust unlocked a new tier of performance, scalability, and architectural flexibility. Their platform is now equipped to support millions of vectors, operate efficiently across regions, and deliver low-latency search, even at enterprise scale. For teams building sophisticated AI agents, Qdrant provides not just a vector database, but the infrastructure backbone to grow with confidence.

<|page-196-lllmstxt|> ## qdrant-0-11-release

---

# Introducing Qdrant 0.11

Kacper Łukawski · October 26, 2022

![Introducing Qdrant 0.11](https://qdrant.tech/articles_data/qdrant-0-11-release/preview/title.jpg)

We are excited to [announce the release of Qdrant v0.11](https://github.com/qdrant/qdrant/releases/tag/v0.11.0), which introduces a number of new features and improvements.

## [Anchor](https://qdrant.tech/articles/qdrant-0-11-release/\#replication) Replication

One of the key features in this release is replication support, which allows Qdrant to provide a high availability setup with distributed deployment out of the box. This, combined with sharding, enables you to horizontally scale both the size of your collections and the throughput of your cluster. This means that you can use Qdrant to handle large amounts of data without sacrificing performance or reliability.

## [Anchor](https://qdrant.tech/articles/qdrant-0-11-release/\#administration-api) Administration API

Another new feature is the administration API, which allows you to disable write operations to the service. This is useful in situations where search availability is more critical than updates, and can help prevent issues like memory usage watermarks from affecting your searches.

## [Anchor](https://qdrant.tech/articles/qdrant-0-11-release/\#exact-search) Exact search

We have also added the ability to report indexed payload points in the info API, which allows you to verify that payload values were properly formatted for indexing. In addition, we have introduced a new `exact` search parameter that allows you to force exact searches of vectors, even if an ANN index is built. This can be useful for validating the accuracy of your HNSW configuration (see the short sketch at the end of this page).

## [Anchor](https://qdrant.tech/articles/qdrant-0-11-release/\#backward-compatibility) Backward compatibility

This release is backward compatible with v0.10.5 storage in single-node deployment, but unfortunately, distributed deployment is not compatible with previous versions due to the large number of changes required for the replica set implementation. However, clients are tested for backward compatibility with the v0.10.x service.
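The `exact` parameter is still available in current Qdrant versions. Here is a minimal sketch of using it from today’s Python client to validate HNSW accuracy (the 0.11-era API used `search` rather than `query_points`; the collection name and query vector below are illustrative):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Full-scan search that bypasses the HNSW index. Comparing its results with a
# regular (approximate) query gives an estimate of the recall of the current
# HNSW configuration.
exact_hits = client.query_points(
    collection_name="my_collection",
    query=[0.05] * 768,  # query vector (illustrative)
    search_params=models.SearchParams(exact=True),
    limit=10,
)
```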
<|page-197-lllmstxt|> ## langchain-integration

---

# Using LangChain for Question Answering with Qdrant

Kacper Łukawski · January 31, 2023

![Using LangChain for Question Answering with Qdrant](https://qdrant.tech/articles_data/langchain-integration/preview/title.jpg)

---

# [Anchor](https://qdrant.tech/articles/langchain-integration/\#streamlining-question-answering-simplifying-integration-with-langchain-and-qdrant) Streamlining Question Answering: Simplifying Integration with LangChain and Qdrant

Building applications with Large Language Models doesn’t have to be complicated. A lot has been going on recently to simplify the development, so you can utilize already pre-trained models and support even complex pipelines with a few lines of code. [LangChain](https://langchain.readthedocs.io/) provides unified interfaces to different libraries, so you can avoid writing boilerplate code and focus on the value you want to bring.

## [Anchor](https://qdrant.tech/articles/langchain-integration/\#why-use-qdrant-for-question-answering-with-langchain) Why Use Qdrant for Question Answering with LangChain?

It has been reported millions of times recently, but let’s say that again. ChatGPT-like models struggle with generating factual statements if no context is provided. They have some general knowledge but cannot guarantee to produce a valid answer consistently. Thus, it is better to provide some facts we know to be true, so the model can just choose the valid parts and extract them from all the provided contextual data to give a comprehensive answer.

[Vector databases, such as Qdrant](https://qdrant.tech/), are of great help here, as their ability to perform a [semantic search](https://qdrant.tech/documentation/tutorials/search-beginners/) over a huge knowledge base is crucial for preselecting some possibly valid documents, so they can be provided to the LLM. That’s also one of the **chains** implemented in [LangChain](https://qdrant.tech/documentation/frameworks/langchain/), which is called `VectorDBQA`. Qdrant is integrated with the library, so such a chain can be built effortlessly.

### [Anchor](https://qdrant.tech/articles/langchain-integration/\#the-two-model-approach) The Two-Model Approach

Surprisingly enough, there will be two models required to set things up. First of all, we need an embedding model that will convert the set of facts into vectors and store those in Qdrant. That’s an identical process to any other semantic search application. We’re going to use one of the `SentenceTransformers` models, so it can be hosted locally. The embeddings created by that model will be put into Qdrant and used to retrieve the most similar documents, given the query.

However, when we receive a query, there are two steps involved. First of all, we ask Qdrant to provide the most relevant documents and simply combine all of them into a single text. Then, we build a prompt to the LLM (in our case [OpenAI](https://openai.com/)), including those documents as context, together with the question asked.
So the input to the LLM looks like the following:

```text
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

It's as certain as 2 + 2 = 4

...

Question: How much is 2 + 2?
Helpful Answer:
```

There might be several context documents combined, and it is solely up to the LLM to choose the right piece of content. But we expect the model to respond with just `4`.

## [Anchor](https://qdrant.tech/articles/langchain-integration/\#why-do-we-need-two-different-models) Why do we need two different models?

Each solves a different task. The first model performs feature extraction by converting the text into vectors, while the second one helps with text generation or summarization.

Disclaimer: This is not the only way to solve that task with LangChain. Such a chain is called `stuff` in the library nomenclature.

![](https://qdrant.tech/articles_data/langchain-integration/flow-diagram.png)

Enough theory! This sounds like a pretty complex application, as it involves several systems. But with LangChain, it might be implemented in just a few lines of code, thanks to the recent integration with [Qdrant](https://qdrant.tech/). We’re not even going to work directly with `QdrantClient`, as everything is already done in the background by LangChain. If you want to get into the source code right away, all the processing is available as a [Google Colab notebook](https://colab.research.google.com/drive/19RxxkZdnq_YqBH5kBV10Rt0Rax-kminD?usp=sharing).

## [Anchor](https://qdrant.tech/articles/langchain-integration/\#how-to-implement-question-answering-with-langchain-and-qdrant) How to Implement Question Answering with LangChain and Qdrant

### [Anchor](https://qdrant.tech/articles/langchain-integration/\#step-1-configuration) Step 1: Configuration

A journey of a thousand miles begins with a single step, in our case with the configuration of all the services. We’ll be using [Qdrant Cloud](https://cloud.qdrant.io/), so we need an API key. The same goes for OpenAI - the API key has to be obtained from their website.

![](https://qdrant.tech/articles_data/langchain-integration/code-configuration.png)

### [Anchor](https://qdrant.tech/articles/langchain-integration/\#step-2-building-the-knowledge-base) Step 2: Building the knowledge base

We also need some facts from which the answers will be generated. There are plenty of public datasets available, and [Natural Questions](https://ai.google.com/research/NaturalQuestions/visualization) is one of them. It consists of the whole HTML content of the websites they were scraped from. That means we need some preprocessing to extract plain text content. As a result, we’re going to have two lists of strings - one for questions and the other one for the answers.

The answers have to be vectorized with the first of our models. The `sentence-transformers/all-mpnet-base-v2` is one of the possibilities, but there are some other options available. LangChain will handle that part of the process in a single function call.

![](https://qdrant.tech/articles_data/langchain-integration/code-qdrant.png)

### [Anchor](https://qdrant.tech/articles/langchain-integration/\#step-3-setting-up-qa-with-qdrant-in-a-loop) Step 3: Setting up QA with Qdrant in a loop

`VectorDBQA` is a chain that performs the process described above. So it, first of all, loads some facts from Qdrant and then feeds them into the OpenAI LLM, which should analyze them to find the answer to a given question.
The last thing to do before using it is to put things together, also with a single function call.

![](https://qdrant.tech/articles_data/langchain-integration/code-vectordbqa.png)

## [Anchor](https://qdrant.tech/articles/langchain-integration/\#step-4-testing-out-the-chain) Step 4: Testing out the chain

And that’s it! We can put some queries, and LangChain will perform all the required processing to find the answer in the provided context.

![](https://qdrant.tech/articles_data/langchain-integration/code-answering.png)

```text
> what kind of music is scott joplin most famous for
Scott Joplin is most famous for composing ragtime music.

> who died from the band faith no more
Chuck Mosley

> when does maggie come on grey's anatomy
Maggie first appears in season 10, episode 1, which aired on September 26, 2013.

> can't take my eyes off you lyrics meaning
I don't know.

> who lasted the longest on alone season 2
David McIntyre lasted the longest on Alone season 2, with a total of 66 days.
```

The great thing about such a setup is that the knowledge base might be easily extended with some new facts and those will be included in the prompts sent to the LLM later on. Of course, assuming their similarity to the given question puts them in the top results returned by Qdrant. If you want to run the chain on your own, the simplest way to reproduce it is to open the [Google Colab notebook](https://colab.research.google.com/drive/19RxxkZdnq_YqBH5kBV10Rt0Rax-kminD?usp=sharing).

<|page-198-lllmstxt|> ## rag-customer-support-cohere-airbyte-aws

---

# [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#question-answering-system-for-ai-customer-support) Question-Answering System for AI Customer Support

| Time: 120 min | Level: Advanced |
| --- | --- |

Maintaining top-notch customer service is vital to business success. As your operation expands, so does the influx of customer queries. Many of these queries are repetitive, making automation a time-saving solution. Your support team’s expertise is typically kept private, but you can still use AI to automate responses securely.

In this tutorial we will set up a private AI service that answers customer support queries with high accuracy and effectiveness. By leveraging Cohere’s powerful models (deployed to [AWS](https://cohere.com/deployment-options/aws)) with Qdrant Hybrid Cloud, you can create a fully private customer support system. Data synchronization, facilitated by [Airbyte](https://airbyte.com/), will complete the setup.
![Architecture diagram](https://qdrant.tech/documentation/examples/customer-support-cohere-airbyte/architecture-diagram.png)

## [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#system-design) System design

The history of past interactions with your customers is not a static dataset. It is constantly evolving, as new questions are coming in. You probably have a ticketing system that stores all the interactions, or use a different way to communicate with your customers. No matter what the communication channel is, you need to bring the correct answers to the selected Large Language Model, and have an established way to do it in a continuous manner. Thus, we will build an ingestion pipeline and then a Retrieval Augmented Generation application that will use the data.

- **Dataset:** a [set of Frequently Asked Questions from Qdrant users](https://qdrant.tech/documentation/faq/qdrant-fundamentals/) as an incrementally updated Excel sheet
- **Embedding model:** Cohere `embed-multilingual-v3.0`, to support different languages with the same pipeline
- **Knowledge base:** Qdrant, running in Hybrid Cloud mode
- **Ingestion pipeline:** [Airbyte](https://airbyte.com/), loading the data into Qdrant
- **Large Language Model:** Cohere [Command-R](https://docs.cohere.com/docs/command-r)
- **RAG:** Cohere [RAG](https://docs.cohere.com/docs/retrieval-augmented-generation-rag) using our knowledge base through a custom connector

All the selected components are compatible with the [AWS](https://aws.amazon.com/) infrastructure. Thanks to the availability of Cohere models on AWS, you can build a fully private customer support system that completely isolates data within your infrastructure. Also, if you have AWS credits, you can now use them without spending additional money on the models or the semantic search layer.

### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#data-ingestion) Data ingestion

Building a RAG starts with a well-curated dataset. In your specific case you may prefer loading the data directly from a ticketing system, such as [Zendesk Support](https://airbyte.com/connectors/zendesk-support), [Freshdesk](https://airbyte.com/connectors/freshdesk), or maybe integrate it with a shared inbox. However, in the case of customer questions, quality over quantity is key. There should be a conscious decision on what data to include in the knowledge base, so we do not confuse the model with possibly irrelevant information. We’ll assume there is an [Excel sheet](https://docs.airbyte.com/integrations/sources/file) available over HTTP/FTP that Airbyte can access and load into Qdrant in an incremental manner.

### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#cohere--qdrant-connector-for-rag) Cohere <> Qdrant Connector for RAG

Cohere RAG relies on [connectors](https://docs.cohere.com/docs/connectors), which bring additional context to the model. The connector is a web service that implements a specific interface and exposes its data through an HTTP API. With that setup, the Large Language Model becomes responsible for communicating with the connectors, so building a prompt with the context is not needed anymore.

### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#answering-bot) Answering bot

Finally, we want to automate the responses and send them automatically when we are sure that the model is confident enough.
Again, the way such an application should be created strongly depends on the system you are using within the customer support team. If it exposes a way to set up a webhook whenever a new question is coming in, you can create a web service and use it to automate the responses. In general, our bot should be created specifically for the platform you use, so we’ll just cover the general idea here and build a simple CLI tool. ## [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#prerequisites) Prerequisites ### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#cohere-models-on-aws) Cohere models on AWS One of the possible ways to deploy Cohere models on AWS is to use AWS SageMaker. Cohere’s website has [a detailed\\ guide on how to deploy the models in that way](https://docs.cohere.com/docs/amazon-sagemaker-setup-guide), so you can follow the steps described there to set up your own instance. ### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#qdrant-hybrid-cloud-on-aws) Qdrant Hybrid Cloud on AWS Our documentation covers the deployment of Qdrant on AWS as a Hybrid Cloud Environment, so you can follow the steps described there to set up your own instance. The deployment process is quite straightforward, and you can have your Qdrant cluster up and running in a few minutes. Once you perform all the steps, your Qdrant cluster should be running on a specific URL. You will need this URL and the API key to interact with Qdrant, so let’s store them both in the environment variables: shellpython ```shell export QDRANT_URL="https://qdrant.example.com" export QDRANT_API_KEY="your-api-key" ``` ```python import os os.environ["QDRANT_URL"] = "https://qdrant.example.com" os.environ["QDRANT_API_KEY"] = "your-api-key" ``` ### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#airbyte-open-source) Airbyte Open Source Airbyte is an open-source data integration platform that helps you replicate your data in your warehouses, lakes, and databases. You can install it on your infrastructure and use it to load the data into Qdrant. The installation process is described in the [official documentation](https://docs.airbyte.com/deploying-airbyte/). Please follow the instructions to set up your own instance. #### [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#setting-up-the-connection) Setting up the connection Once you have an Airbyte up and running, you can configure the connection to load the data from the respective source into Qdrant. The configuration will require setting up the source and destination connectors. In this tutorial we will use the following connectors: - **Source:** [File](https://docs.airbyte.com/integrations/sources/file) to load the data from an Excel sheet - **Destination:** [Qdrant](https://docs.airbyte.com/integrations/destinations/qdrant) to load the data into Qdrant Airbyte UI will guide you through the process of setting up the source and destination and connecting them. Here is how the configuration of the source might look like: ![Airbyte source configuration](https://qdrant.tech/documentation/examples/customer-support-cohere-airbyte/airbyte-excel-source.png) Qdrant is our target destination, so we need to set up the connection to it. We need to specify which fields should be included to generate the embeddings. 
In our case it makes complete sense to embed just the questions, as we are going to look for similar questions asked in the past and provide the answers. ![Airbyte destination configuration](https://qdrant.tech/documentation/examples/customer-support-cohere-airbyte/airbyte-qdrant-destination.png) Once we have the destination set up, we can finally configure a connection. The connection defines the schedule of the data synchronization. ![Airbyte connection configuration](https://qdrant.tech/documentation/examples/customer-support-cohere-airbyte/airbyte-connection.png) Airbyte should now be ready to accept any data updates from the source and load them into Qdrant. You can monitor the progress of the synchronization in the UI. ## [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#rag-connector) RAG connector One of our previous tutorials guides you step-by-step through [implementing a custom connector for Cohere RAG](https://qdrant.tech/documentation/examples/cohere-rag-connector/) with Cohere Embed v3 and Qdrant. You can just point it to your Hybrid Cloud Qdrant instance running on AWS. The created connector can be deployed to Amazon Web Services in various ways, even in a [Serverless](https://aws.amazon.com/serverless/) manner using [AWS Lambda](https://aws.amazon.com/lambda/?c=ser&sec=srv). In general, a RAG connector has to expose a single endpoint that accepts POST requests with a `query` parameter and returns the matching documents as a JSON document with a specific structure. Our FastAPI implementation created [in the related tutorial](https://qdrant.tech/documentation/examples/cohere-rag-connector/) is a perfect fit for this task. The only difference is that you should point it to the Cohere models and Qdrant running on AWS infrastructure. > Our connector is a lightweight web service that exposes a single endpoint and glues the Cohere embedding model with our Qdrant Hybrid Cloud instance. Thus, it perfectly fits the serverless architecture, requiring no additional infrastructure to run. You can also run the connector as another service within your [Kubernetes cluster running on AWS (EKS)](https://aws.amazon.com/eks/), or by launching an [EC2](https://aws.amazon.com/ec2/) compute instance. This step depends on the way you deploy your other services, so we’ll leave it to you to decide how to run the connector. Eventually, the web service should be available under a specific URL, and it’s a good practice to store it in an environment variable, so the other services can easily access it. shellpython

```shell
export RAG_CONNECTOR_URL="https://rag-connector.example.com/search"
```

```python
os.environ["RAG_CONNECTOR_URL"] = "https://rag-connector.example.com/search"
```

## [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#customer-interface) Customer interface At this point we have all the data loaded into Qdrant, and the RAG connector is ready to serve the relevant context. The last missing piece is the customer interface, which will call the Command model to create the answer. Such a system should be built specifically for the platform you use and integrated into its workflow, but we will build a strong foundation for it and show how to use it in a simple CLI tool.
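Before wiring the connector into the Cohere chat calls below, it may be worth sending it a quick test request to confirm the deployment responds as expected. Here is a minimal sketch (not part of the original tutorial) that assumes the connector accepts a JSON body with a `query` field and replies with a JSON object containing a `results` list, as the Cohere connector interface expects:

```python
import os

import requests

# Hypothetical smoke test for the deployed RAG connector.
# It assumes the URL stored in RAG_CONNECTOR_URL accepts {"query": "..."}
# and responds with {"results": [...]}.
response = requests.post(
    os.environ["RAG_CONNECTOR_URL"],
    json={"query": "Why Qdrant does not return my vectors?"},
    timeout=10,
)
response.raise_for_status()
for document in response.json().get("results", []):
    print(document)
```

If the connector returns the FAQ entries loaded by Airbyte, the retrieval part of the system is ready.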
> Our application does not have to connect to Qdrant anymore, as the model will connect to the RAG connector directly. First of all, we have to create a connection to Cohere services through the Cohere SDK.

```python
import cohere

# Create a Cohere client pointing to the AWS instance
cohere_client = cohere.Client(...)
```

Next, our connector should be registered. **Please make sure to do it once, and store the id of the connector in an environment variable or in any other way that will be accessible to the application.**

```python
import os

connector_response = cohere_client.connectors.create(
    name="customer-support",
    url=os.environ["RAG_CONNECTOR_URL"],
)

# The id returned by the API should be stored for future use
connector_id = connector_response.connector.id
```

Finally, we can create a prompt and get the answer from the model. Additionally, we define which of the connectors should be used to provide the context, as we may have multiple connectors and want to use specific ones, depending on some conditions. Let’s start by asking a question.

```python
query = "Why Qdrant does not return my vectors?"
```

Now we can send the query to the model, get the response, and possibly send it back to the customer.

```python
response = cohere_client.chat(
    message=query,
    connectors=[
        cohere.ChatConnector(id=connector_id),
    ],
    model="command-r",
)

print(response.text)
```

The output should be the answer to the question, generated by the model, for example: > Qdrant is set up by default to minimize network traffic and therefore doesn’t return vectors in search results. However, you can make Qdrant return your vectors by setting the ‘with\_vector’ parameter of the Search/Scroll function to true. Customer support should not be fully automated, as some completely new issues might require human intervention. We should play with prompt engineering and expect the model to provide the answer with a certain confidence level. If the confidence is too low, we should not send the answer automatically but present it to the support team for review. ## [Anchor](https://qdrant.tech/documentation/examples/rag-customer-support-cohere-airbyte-aws/\#wrapping-up) Wrapping up This tutorial shows how to build a fully private customer support system using Cohere models, Qdrant Hybrid Cloud, and Airbyte, all running on AWS infrastructure. You can ensure your data does not leave your premises and focus on providing the best customer support experience without bothering your team with repetitive tasks.
On this page: - [Edit on Github](https://github.com/qdrant/landing_page/tree/master/qdrant-landing/content/documentation/examples/rag-customer-support-cohere-airbyte-aws.md) - [Create an issue](https://github.com/qdrant/landing_page/issues/new/choose) × [Powered by](https://qdrant.tech/) <|page-199-lllmstxt|> ## explore - [Documentation](https://qdrant.tech/documentation/) - [Concepts](https://qdrant.tech/documentation/concepts/) - Explore --- # [Anchor](https://qdrant.tech/documentation/concepts/explore/\#explore-the-data) Explore the data After mastering the concepts in [search](https://qdrant.tech/documentation/concepts/search/), you can start exploring your data in other ways. Qdrant provides a stack of APIs that allow you to find similar vectors in a different fashion, as well as to find the most dissimilar ones. These are useful tools for recommendation systems, data exploration, and data cleaning. ## [Anchor](https://qdrant.tech/documentation/concepts/explore/\#recommendation-api) Recommendation API In addition to the regular search, Qdrant also allows you to search based on multiple positive and negative examples. The API is called _**recommend**_, and the examples can be point IDs, so that you can leverage the already encoded objects; and, as of v1.6, you can also use raw vectors as input, so that you can create your vectors on the fly without uploading them as points. REST API - API Schema definition is available [here](https://api.qdrant.tech/api-reference/search/recommend-points) httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": { "recommend": { "positive": [100, 231], "negative": [718, [0.2, 0.3, 0.4, 0.5]], "strategy": "average_vector" } }, "filter": { "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ] } } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") client.query_points( collection_name="{collection_name}", query=models.RecommendQuery( recommend=models.RecommendInput( positive=[100, 231], negative=[718, [0.2, 0.3, 0.4, 0.5]], strategy=models.RecommendStrategy.AVERAGE_VECTOR, ) ), query_filter=models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(\ value="London",\ ),\ )\ ] ), limit=3, ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); client.query("{collection_name}", { query: { recommend: { positive: [100, 231], negative: [718, [0.2, 0.3, 0.4, 0.5]], strategy: "average_vector" } }, filter: { must: [\ {\ key: "city",\ match: {\ value: "London",\ },\ },\ ], }, limit: 3 }); ``` ```rust use qdrant_client::qdrant::{ Condition, Filter, QueryPointsBuilder, RecommendInputBuilder, RecommendStrategy, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; client .query( QueryPointsBuilder::new("{collection_name}") .query( RecommendInputBuilder::default() .add_positive(100) .add_positive(231) .add_positive(vec![0.2, 0.3, 0.4, 0.5]) .add_negative(718) .strategy(RecommendStrategy::AverageVector) .build(), ) .limit(3) .filter(Filter::must([Condition::matches(\ "city",\ "London".to_string(),\ )])), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.RecommendInput; import io.qdrant.client.grpc.Points.RecommendStrategy; import 
io.qdrant.client.grpc.Points.Filter; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.VectorInputFactory.vectorInput; import static io.qdrant.client.QueryFactory.recommend; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(recommend(RecommendInput.newBuilder() .addAllPositive(List.of(vectorInput(100), vectorInput(200), vectorInput(100.0f, 231.0f))) .addAllNegative(List.of(vectorInput(718), vectorInput(0.2f, 0.3f, 0.4f, 0.5f))) .setStrategy(RecommendStrategy.AverageVector) .build())) .setFilter(Filter.newBuilder().addMust(matchKeyword("city", "London"))) .setLimit(3) .build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new RecommendInput { Positive = { 100, 231 }, Negative = { 718 } }, filter: MatchKeyword("city", "London"), limit: 3 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryRecommend(&qdrant.RecommendInput{ Positive: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(100)), qdrant.NewVectorInputID(qdrant.NewIDNum(231)), }, Negative: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(718)), }, }), Filter: &qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), }, }, }) ``` Example result of this API would be ```json { "result": [\ { "id": 10, "score": 0.81 },\ { "id": 14, "score": 0.75 },\ { "id": 11, "score": 0.73 }\ ], "status": "ok", "time": 0.001 } ``` The algorithm used to get the recommendations is selected from the available `strategy` options. Each of them has its own strengths and weaknesses, so experiment and choose the one that works best for your case. ### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#average-vector-strategy) Average vector strategy The default and first strategy added to Qdrant is called `average_vector`. It preprocesses the input examples to create a single vector that is used for the search. Since the preprocessing step happens very fast, the performance of this strategy is on-par with regular search. The intuition behind this kind of recommendation is that each vector component represents an independent feature of the data, so, by averaging the examples, we should get a good recommendation. The way to produce the searching vector is by first averaging all the positive and negative examples separately, and then combining them into a single vector using the following formula: ```rust avg_positive + avg_positive - avg_negative ``` In the case of not having any negative examples, the search vector will simply be equal to `avg_positive`. This is the default strategy that’s going to be set implicitly, but you can explicitly define it by setting `"strategy": "average_vector"` in the recommendation request. ### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#best-score-strategy) Best score strategy _Available as of v1.6.0_ A new strategy introduced in v1.6, is called `best_score`. 
It is based on the idea that the best way to find similar vectors is to find the ones that are closer to a positive example, while avoiding the ones that are closer to a negative one. The way it works is that each candidate is measured against every example, and then we select the best positive and best negative scores. The final score is chosen with this step formula:

```rust
// Sigmoid function to normalize the score between 0 and 1
let sigmoid = |x| 0.5 * (1.0 + (x / (1.0 + x.abs())));

let score = if best_positive_score > best_negative_score {
    sigmoid(best_positive_score)
} else {
    -sigmoid(best_negative_score)
};
```

Since we are computing similarities to every example at each step of the search, the performance of this strategy is linearly impacted by the number of examples. This means that the more examples you provide, the slower the search will be. However, this strategy can be very powerful and should be more embedding-agnostic. To use this algorithm, you need to set `"strategy": "best_score"` in the recommendation request. #### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#using-only-negative-examples) Using only negative examples A beneficial side-effect of the `best_score` strategy is that you can use it with only negative examples. This allows you to find the most dissimilar vectors to the ones you provide, which can be useful for finding outliers in your data. Combining negative-only examples with filtering can be a powerful tool for data exploration and cleaning. ### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#sum-scores-strategy) Sum scores strategy Another strategy for using multiple query vectors simultaneously is to simply sum their scores against the candidates. In Qdrant, this is called the `sum_scores` strategy. This strategy was used in [this paper](https://arxiv.org/abs/2210.10695) by [UKP Lab](http://www.ukp.tu-darmstadt.de/), [hessian.ai](https://hessian.ai/) and [cohere.ai](https://cohere.ai/) to incorporate relevance feedback into a subsequent search. In the paper, this boosted nDCG@20 performance by 5.6 percentage points when using 2–8 positive feedback documents. The formula that this strategy implements is $$s_i = \sum_{v_q \in Q^+} s(v_q, v_i) - \sum_{v_q \in Q^-} s(v_q, v_i)$$ where \(Q^+\) is the set of positive examples, \(Q^-\) is the set of negative examples, and \(s(v_q, v_i)\) is the score of the query vector \(v_q\) against the candidate vector \(v_i\). As with `best_score`, this strategy also allows using only negative examples.
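To make the negative-only case concrete, here is a minimal Python sketch (not part of the original page) that asks for the points most dissimilar to a given example by providing only negative input together with the `best_score` strategy; it assumes a running Qdrant instance and reuses the client setup from the earlier examples:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Negative-only recommendation: with the best_score strategy, providing
# only negative examples returns the points *most dissimilar* to them,
# which is handy for outlier hunting and data cleaning.
client.query_points(
    collection_name="{collection_name}",
    query=models.RecommendQuery(
        recommend=models.RecommendInput(
            negative=[718],
            strategy=models.RecommendStrategy.BEST_SCORE,
        )
    ),
    limit=10,
)
```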
### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#multiple-vectors) Multiple vectors _Available as of v0.10.0_ If the collection was created with multiple vectors, the name of the vector should be specified in the recommendation request: httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": { "recommend": { "positive": [100, 231], "negative": [718] } }, "using": "image", "limit": 10 } ``` ```python client.query_points( collection_name="{collection_name}", query=models.RecommendQuery( recommend=models.RecommendInput( positive=[100, 231], negative=[718], ) ), using="image", limit=10, ) ``` ```typescript client.query("{collection_name}", { query: { recommend: { positive: [100, 231], negative: [718], } }, using: "image", limit: 10 }); ``` ```rust use qdrant_client::qdrant::{QueryPointsBuilder, RecommendInputBuilder}; client .query( QueryPointsBuilder::new("{collection_name}") .query( RecommendInputBuilder::default() .add_positive(100) .add_positive(231) .add_negative(718) .build(), ) .limit(10) .using("image"), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.RecommendInput; import static io.qdrant.client.VectorInputFactory.vectorInput; import static io.qdrant.client.QueryFactory.recommend; client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(recommend(RecommendInput.newBuilder() .addAllPositive(List.of(vectorInput(100), vectorInput(231))) .addAllNegative(List.of(vectorInput(718))) .build())) .setUsing("image") .setLimit(10) .build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new RecommendInput { Positive = { 100, 231 }, Negative = { 718 } }, usingVector: "image", limit: 10 ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryRecommend(&qdrant.RecommendInput{ Positive: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(100)), qdrant.NewVectorInputID(qdrant.NewIDNum(231)), }, Negative: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(718)), }, }), Using: qdrant.PtrOf("image"), }) ``` Parameter `using` specifies which stored vectors to use for the recommendation. ### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#lookup-vectors-from-another-collection) Lookup vectors from another collection _Available as of v0.11.6_ If you have collections with vectors of the same dimensionality, and you want to look for recommendations in one collection based on the vectors of another collection, you can use the `lookup_from` parameter. It might be useful, e.g. in the item-to-user recommendations scenario. Where user and item embeddings, although having the same vector parameters (distance type and dimensionality), are usually stored in different collections. 
httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/points/query { "query": { "recommend": { "positive": [100, 231], "negative": [718] } }, "limit": 10, "lookup_from": { "collection": "{external_collection_name}", "vector": "{external_vector_name}" } } ``` ```python client.query_points( collection_name="{collection_name}", query=models.RecommendQuery( recommend=models.RecommendInput( positive=[100, 231], negative=[718], ) ), using="image", limit=10, lookup_from=models.LookupLocation( collection="{external_collection_name}", vector="{external_vector_name}" ), ) ``` ```typescript client.query("{collection_name}", { query: { recommend: { positive: [100, 231], negative: [718], } }, using: "image", limit: 10, lookup_from: { collection: "{external_collection_name}", vector: "{external_vector_name}" } }); ``` ```rust use qdrant_client::qdrant::{LookupLocationBuilder, QueryPointsBuilder, RecommendInputBuilder}; client .query( QueryPointsBuilder::new("{collection_name}") .query( RecommendInputBuilder::default() .add_positive(100) .add_positive(231) .add_negative(718) .build(), ) .limit(10) .using("image") .lookup_from( LookupLocationBuilder::new("{external_collection_name}") .vector_name("{external_vector_name}"), ), ) .await?; ``` ```java import java.util.List; import io.qdrant.client.grpc.Points.LookupLocation; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.RecommendInput; import static io.qdrant.client.VectorInputFactory.vectorInput; import static io.qdrant.client.QueryFactory.recommend; client.queryAsync(QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(recommend(RecommendInput.newBuilder() .addAllPositive(List.of(vectorInput(100), vectorInput(231))) .addAllNegative(List.of(vectorInput(718))) .build())) .setUsing("image") .setLimit(10) .setLookupFrom( LookupLocation.newBuilder() .setCollectionName("{external_collection_name}") .setVectorName("{external_vector_name}") .build()) .build()).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; var client = new QdrantClient("localhost", 6334); await client.QueryAsync( collectionName: "{collection_name}", query: new RecommendInput { Positive = { 100, 231 }, Negative = { 718 } }, usingVector: "image", limit: 10, lookupFrom: new LookupLocation { CollectionName = "{external_collection_name}", VectorName = "{external_vector_name}", } ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) client.Query(context.Background(), &qdrant.QueryPoints{ CollectionName: "{collection_name}", Query: qdrant.NewQueryRecommend(&qdrant.RecommendInput{ Positive: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(100)), qdrant.NewVectorInputID(qdrant.NewIDNum(231)), }, Negative: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(718)), }, }), Using: qdrant.PtrOf("image"), LookupFrom: &qdrant.LookupLocation{ CollectionName: "{external_collection_name}", VectorName: qdrant.PtrOf("{external_vector_name}"), }, }) ``` Vectors are retrieved from the external collection by ids provided in the `positive` and `negative` lists. These vectors then used to perform the recommendation in the current collection, comparing against the “using” or default vector. 
## [Anchor](https://qdrant.tech/documentation/concepts/explore/\#batch-recommendation-api) Batch recommendation API _Available as of v0.10.0_ Similar to the batch search API in terms of usage and advantages, it enables the batching of recommendation requests. httppythontypescriptrustjavacsharpgo ```http POST /collections/{collection_name}/query/batch { "searches": [\ {\ "query": {\ "recommend": {\ "positive": [100, 231],\ "negative": [718]\ }\ },\ "filter": {\ "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ]\ },\ "limit": 10\ },\ {\ "query": {\ "recommend": {\ "positive": [200, 67],\ "negative": [300]\ }\ },\ "filter": {\ "must": [\ {\ "key": "city",\ "match": {\ "value": "London"\ }\ }\ ]\ },\ "limit": 10\ }\ ] } ``` ```python from qdrant_client import QdrantClient, models client = QdrantClient(url="http://localhost:6333") filter_ = models.Filter( must=[\ models.FieldCondition(\ key="city",\ match=models.MatchValue(\ value="London",\ ),\ )\ ] ) recommend_queries = [\ models.QueryRequest(\ query=models.RecommendQuery(\ recommend=models.RecommendInput(positive=[100, 231], negative=[718])\ ),\ filter=filter_,\ limit=3,\ ),\ models.QueryRequest(\ query=models.RecommendQuery(\ recommend=models.RecommendInput(positive=[200, 67], negative=[300])\ ),\ filter=filter_,\ limit=3,\ ),\ ] client.query_batch_points( collection_name="{collection_name}", requests=recommend_queries ) ``` ```typescript import { QdrantClient } from "@qdrant/js-client-rest"; const client = new QdrantClient({ host: "localhost", port: 6333 }); const filter = { must: [\ {\ key: "city",\ match: {\ value: "London",\ },\ },\ ], }; const searches = [\ {\ query: {\ recommend: {\ positive: [100, 231],\ negative: [718]\ }\ },\ filter,\ limit: 3,\ },\ {\ query: {\ recommend: {\ positive: [200, 67],\ negative: [300]\ }\ },\ filter,\ limit: 3,\ },\ ]; client.queryBatch("{collection_name}", { searches, }); ``` ```rust use qdrant_client::qdrant::{ Condition, Filter, QueryBatchPointsBuilder, QueryPointsBuilder, RecommendInputBuilder, }; use qdrant_client::Qdrant; let client = Qdrant::from_url("http://localhost:6334").build()?; let filter = Filter::must([Condition::matches("city", "London".to_string())]); let recommend_queries = vec![\ QueryPointsBuilder::new("{collection_name}")\ .query(\ RecommendInputBuilder::default()\ .add_positive(100)\ .add_positive(231)\ .add_negative(718)\ .build(),\ )\ .filter(filter.clone())\ .build(),\ QueryPointsBuilder::new("{collection_name}")\ .query(\ RecommendInputBuilder::default()\ .add_positive(200)\ .add_positive(67)\ .add_negative(300)\ .build(),\ )\ .filter(filter)\ .build(),\ ]; client .query_batch(QueryBatchPointsBuilder::new( "{collection_name}", recommend_queries, )) .await?; ``` ```java import java.util.List; import io.qdrant.client.QdrantClient; import io.qdrant.client.QdrantGrpcClient; import io.qdrant.client.grpc.Points.Filter; import io.qdrant.client.grpc.Points.QueryPoints; import io.qdrant.client.grpc.Points.RecommendInput; import static io.qdrant.client.ConditionFactory.matchKeyword; import static io.qdrant.client.VectorInputFactory.vectorInput; import static io.qdrant.client.QueryFactory.recommend; QdrantClient client = new QdrantClient(QdrantGrpcClient.newBuilder("localhost", 6334, false).build()); Filter filter = Filter.newBuilder().addMust(matchKeyword("city", "London")).build(); List recommendQueries = List.of( QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(recommend( RecommendInput.newBuilder() .addAllPositive(List.of(vectorInput(100), 
vectorInput(231))) .addAllNegative(List.of(vectorInput(731))) .build())) .setFilter(filter) .setLimit(3) .build(), QueryPoints.newBuilder() .setCollectionName("{collection_name}") .setQuery(recommend( RecommendInput.newBuilder() .addAllPositive(List.of(vectorInput(200), vectorInput(67))) .addAllNegative(List.of(vectorInput(300))) .build())) .setFilter(filter) .setLimit(3) .build()); client.queryBatchAsync("{collection_name}", recommendQueries).get(); ``` ```csharp using Qdrant.Client; using Qdrant.Client.Grpc; using static Qdrant.Client.Grpc.Conditions; var client = new QdrantClient("localhost", 6334); var filter = MatchKeyword("city", "london"); await client.QueryBatchAsync( collectionName: "{collection_name}", queries: [\ new QueryPoints()\ {\ CollectionName = "{collection_name}",\ Query = new RecommendInput {\ Positive = { 100, 231 },\ Negative = { 718 },\ },\ Limit = 3,\ Filter = filter,\ },\ new QueryPoints()\ {\ CollectionName = "{collection_name}",\ Query = new RecommendInput {\ Positive = { 200, 67 },\ Negative = { 300 },\ },\ Limit = 3,\ Filter = filter,\ }\ ] ); ``` ```go import ( "context" "github.com/qdrant/go-client/qdrant" ) client, err := qdrant.NewClient(&qdrant.Config{ Host: "localhost", Port: 6334, }) filter := qdrant.Filter{ Must: []*qdrant.Condition{ qdrant.NewMatch("city", "London"), }, } client.QueryBatch(context.Background(), &qdrant.QueryBatchPoints{ CollectionName: "{collection_name}", QueryPoints: []*qdrant.QueryPoints{ { CollectionName: "{collection_name}", Query: qdrant.NewQueryRecommend(&qdrant.RecommendInput{ Positive: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(100)), qdrant.NewVectorInputID(qdrant.NewIDNum(231)), }, Negative: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(718)), }, }, ), Filter: &filter, }, { CollectionName: "{collection_name}", Query: qdrant.NewQueryRecommend(&qdrant.RecommendInput{ Positive: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(200)), qdrant.NewVectorInputID(qdrant.NewIDNum(67)), }, Negative: []*qdrant.VectorInput{ qdrant.NewVectorInputID(qdrant.NewIDNum(300)), }, }, ), Filter: &filter, }, }, }, ) ``` The result of this API contains one array per recommendation requests. ```json { "result": [\ [\ { "id": 10, "score": 0.81 },\ { "id": 14, "score": 0.75 },\ { "id": 11, "score": 0.73 }\ ],\ [\ { "id": 1, "score": 0.92 },\ { "id": 3, "score": 0.89 },\ { "id": 9, "score": 0.75 }\ ]\ ], "status": "ok", "time": 0.001 } ``` ## [Anchor](https://qdrant.tech/documentation/concepts/explore/\#discovery-api) Discovery API _Available as of v1.7_ REST API Schema definition available [here](https://api.qdrant.tech/api-reference/search/discover-points) In this API, Qdrant introduces the concept of `context`, which is used for splitting the space. Context is a set of positive-negative pairs, and each pair divides the space into positive and negative zones. In that mode, the search operation prefers points based on how many positive zones they belong to (or how much they avoid negative zones). The interface for providing context is similar to the recommendation API (ids or raw vectors). Still, in this case, they need to be provided in the form of positive-negative pairs. Discovery API lets you do two new types of search: - **Discovery search**: Uses the context (the pairs of positive-negative vectors) and a target to return the points more similar to the target, but constrained by the context. 
- **Context search**: Using only the context pairs, get the points that live in the best zone, where loss is minimized. The way positive and negative examples should be arranged in the context pairs is completely up to you, so you have the flexibility of trying out different permutation techniques based on your model and data. ### [Anchor](https://qdrant.tech/documentation/concepts/explore/\#discovery-search) Discovery search This type of search works especially well for combining multimodal, vector-constrained searches. Qdrant already has extensive support for filters, which constrain the search based on the payload, but using discovery search, you can also constrain the vector space in which the search is performed. ![Discovery search](https://qdrant.tech/docs/discovery-search.png) The formula for the discovery score can be expressed as: $$\operatorname{rank}(v^{+}, v^{-}) = \begin{cases} 1, & s(v^{+}) \geq s(v^{-}) \\ -1, & s(v^{+}) < s(v^{-}) \end{cases}$$ where \(v^{+}\) and \(v^{-}\) are the positive and negative vectors of a context pair, and \(s(v)\) is the similarity score of a candidate against the vector \(v\). ## chatgpt-plugin - [Articles](https://qdrant.tech/articles/) - Extending ChatGPT with a Qdrant-based knowledge base [Back to Practical Examples](https://qdrant.tech/articles/practicle-examples/) --- # Extending ChatGPT with a Qdrant-based knowledge base Kacper Łukawski · March 23, 2023 ![Extending ChatGPT with a Qdrant-based knowledge base](https://qdrant.tech/articles_data/chatgpt-plugin/preview/title.jpg) In recent months, ChatGPT has revolutionised the way we communicate, learn, and interact with technology. Our social platforms got flooded with prompts, responses to them, whole articles, and countless other examples of using Large Language Models to generate content indistinguishable from that written by a human. Despite their numerous benefits, these models have flaws, as evidenced by the phenomenon of hallucination - the generation of incorrect or nonsensical information in response to user input. This issue, which can compromise the reliability and credibility of AI-generated content, has become a growing concern among researchers and users alike. Those concerns started another wave of entirely new libraries, such as Langchain, trying to overcome those issues, for example, by combining tools like vector databases to bring the required context into the prompts. And that is, so far, the best way to incorporate new and rapidly changing knowledge into the neural model. So good that OpenAI decided to introduce a way to extend the model capabilities with external plugins at the model level. These plugins, designed to enhance the model’s performance, serve as modular extensions that seamlessly interface with the core system. By adding a knowledge base plugin to ChatGPT, we can effectively provide the AI with a curated, trustworthy source of information, ensuring that the generated content is more accurate and relevant. Qdrant may act as a vector database where all the facts will be stored and served to the model upon request. If you’d like to ask ChatGPT questions about your data sources, such as files, notes, or emails, starting with the official [ChatGPT retrieval plugin repository](https://github.com/openai/chatgpt-retrieval-plugin) is the easiest way. Qdrant is already integrated, so you can use it right away. In the following sections, we will guide you through setting up the knowledge base using Qdrant and demonstrate how this powerful combination can significantly improve ChatGPT’s performance and output quality.
## [Anchor](https://qdrant.tech/articles/chatgpt-plugin/\#implementing-a-knowledge-base-with-qdrant) Implementing a knowledge base with Qdrant The official ChatGPT retrieval plugin uses a vector database to build your knowledge base. Your documents are chunked and vectorized with the OpenAI’s text-embedding-ada-002 model to be stored in Qdrant. That enables semantic search capabilities. So, whenever ChatGPT thinks it might be relevant to check the knowledge base, it forms a query and sends it to the plugin to incorporate the results into its response. You can now modify the knowledge base, and ChatGPT will always know the most recent facts. No model fine-tuning is required. Let’s implement that for your documents. In our case, this will be Qdrant’s documentation, so you can ask even technical questions about Qdrant directly in ChatGPT. Everything starts with cloning the plugin’s repository. ```bash git clone git@github.com:openai/chatgpt-retrieval-plugin.git ``` Please use your favourite IDE to open the project once cloned. ### [Anchor](https://qdrant.tech/articles/chatgpt-plugin/\#prerequisites) Prerequisites You’ll need to ensure three things before we start: 1. Create an OpenAI API key, so you can use their embeddings model programmatically. If you already have an account, you can generate one at [https://platform.openai.com/account/api-keys](https://platform.openai.com/account/api-keys). Otherwise, registering an account might be required. 2. Run a Qdrant instance. The instance has to be reachable from the outside, so you either need to launch it on-premise or use the [Qdrant Cloud](https://cloud.qdrant.io/) offering. A free 1GB cluster is available, which might be enough in many cases. We’ll use the cloud. 3. Since ChatGPT will interact with your service through the network, you must deploy it, making it possible to connect from the Internet. Unfortunately, localhost is not an option, but any provider, such as Heroku or fly.io, will work perfectly. We will use [fly.io](https://fly.io/), so please register an account. You may also need to install the flyctl tool for the deployment. The process is described on the homepage of fly.io. ### [Anchor](https://qdrant.tech/articles/chatgpt-plugin/\#configuration) Configuration The retrieval plugin is a FastAPI-based application, and its default functionality might be enough in most cases. However, some configuration is required so ChatGPT knows how and when to use it. However, we can start setting up Fly.io, as we need to know the service’s hostname to configure it fully. First, let’s login into the Fly CLI: ```bash flyctl auth login ``` That will open the browser, so you can simply provide the credentials, and all the further commands will be executed with your account. If you have never used fly.io, you may need to give the credit card details before running any instance, but there is a Hobby Plan you won’t be charged for. Let’s try to launch the instance already, but do not deploy it. We’ll get the hostname assigned and have all the details to fill in the configuration. The retrieval plugin uses TCP port 8080, so we need to configure fly.io, so it redirects all the traffic to it as well. ```bash flyctl launch --no-deploy --internal-port 8080 ``` We’ll be prompted about the application name and the region it should be deployed to. Please choose whatever works best for you. After that, we should see the hostname of the newly created application: ```text ... Hostname: your-application-name.fly.dev ... ``` Let’s note it down. 
We’ll need it for the configuration of the service. But we’re going to start with setting all the applications secrets: ```bash flyctl secrets set DATASTORE=qdrant \ OPENAI_API_KEY= \ QDRANT_URL=https://.aws.cloud.qdrant.io \ QDRANT_API_KEY= \ BEARER_TOKEN=eyJhbGciOiJIUzI1NiJ9.e30.ZRrHA1JJJW8opsbCGfG_HACGpVUMN_a9IV7pAx_Zmeo ``` The secrets will be staged for the first deployment. There is an example of a minimal Bearer token generated by [https://jwt.io/](https://jwt.io/). **Please adjust the token and do not expose** **it publicly, but you can keep the same value for the demo.** Right now, let’s dive into the application config files. You can optionally provide your icon and keep it as `.well-known/logo.png` file, but there are two additional files we’re going to modify. The `.well-known/openapi.yaml` file describes the exposed API in the OpenAPI format. Lines 3 to 5 might be filled with the application title and description, but the essential part is setting the server URL the application will run. Eventually, the top part of the file should look like the following: ```yaml openapi: 3.0.0 info: title: Qdrant Plugin API version: 1.0.0 description: Plugin for searching through the Qdrant doc… servers: - url: https://your-application-name.fly.dev ... ``` There is another file in the same directory, and that’s the most crucial piece to configure. It contains the description of the plugin we’re implementing, and ChatGPT uses this description to determine if it should communicate with our knowledge base. The file is called `.well-known/ai-plugin.json`, and let’s edit it before we finally deploy the app. There are various properties we need to fill in: | **Property** | **Meaning** | **Example** | | --- | --- | --- | | `name_for_model` | Name of the plugin for the ChatGPT model | _qdrant_ | | `name_for_human` | Human-friendly model name, to be displayed in ChatGPT UI | _Qdrant Documentation Plugin_ | | `description_for_model` | Description of the purpose of the plugin, so ChatGPT knows in what cases it should be using it to answer a question. | _Plugin for searching through the Qdrant documentation to find answers to questions and retrieve relevant information. Use it whenever a user asks something that might be related to Qdrant vector database or semantic vector search_ | | `description_for_human` | Short description of the plugin, also to be displayed in the ChatGPT UI. | _Search through Qdrant docs_ | | `auth` | Authorization scheme used by the application. By default, the bearer token has to be configured. | `{"type": "user_http", "authorization_type": "bearer"}` | | `api.url` | Link to the OpenAPI schema definition. Please adjust based on your application URL. | _[https://your-application-name.fly.dev/.well-known/openapi.yaml](https://your-application-name.fly.dev/.well-known/openapi.yaml)_ | | `logo_url` | Link to the application logo. Please adjust based on your application URL. | _[https://your-application-name.fly.dev/.well-known/logo.png](https://your-application-name.fly.dev/.well-known/logo.png)_ | A complete file may look as follows: ```json { "schema_version": "v1", "name_for_model": "qdrant", "name_for_human": "Qdrant Documentation Plugin", "description_for_model": "Plugin for searching through the Qdrant documentation to find answers to questions and retrieve relevant information. 
Use it whenever a user asks something that might be related to Qdrant vector database or semantic vector search", "description_for_human": "Search through Qdrant docs", "auth": { "type": "user_http", "authorization_type": "bearer" }, "api": { "type": "openapi", "url": "https://your-application-name.fly.dev/.well-known/openapi.yaml", "has_user_authentication": false }, "logo_url": "https://your-application-name.fly.dev/.well-known/logo.png", "contact_email": "email@domain.com", "legal_info_url": "email@domain.com" } ``` That was the last step before running the final command, the one that will deploy the application on the server:

```bash
flyctl deploy
```

The command will build the image using the Dockerfile and deploy the service at the given URL. Once the command is finished, the service should be running on the hostname we got previously:

```text
https://your-application-name.fly.dev
```

## [Anchor](https://qdrant.tech/articles/chatgpt-plugin/\#integration-with-chatgpt) Integration with ChatGPT Once we have deployed the service, we can point ChatGPT to it, so the model knows how to connect. When you open the ChatGPT UI, you should see a dropdown with a Plugins tab included: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-1.png) Once selected, you should be able to choose one of the available plugins or check the plugin store: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-2.png) There are some premade plugins available, but there’s also a possibility to install your own plugin by clicking on the “_Develop your own plugin_” option in the bottom right corner: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-3.png) We need to confirm our plugin is ready, but since we relied on the official retrieval plugin from OpenAI, this should be all fine: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-4.png) After clicking on “_My manifest is ready_”, we can already point ChatGPT to our newly created service: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-5.png) A successful plugin installation should end up with the following information: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-6.png) There is a name and a description of the plugin we provided. Let’s click on “_Done_” and return to the “_Plugin store_” window again. There is another option we need to choose in the bottom right corner: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-7.png) Our plugin is not officially verified, but we can, of course, use it freely. The installation requires just the service URL: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-8.png) OpenAI cannot guarantee the plugin provides factual information, so there is a warning we need to accept: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-9.png) Finally, we need to provide the Bearer token again: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-10.png) Our plugin is now ready to be tested. Since there is no data inside the knowledge base, extracting any facts is impossible, but we’re going to put some data using the Swagger UI exposed by our service at [https://your-application-name.fly.dev/docs](https://your-application-name.fly.dev/docs). We need to authorize first, and then call the upsert method with some docs.
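If you prefer to script this step rather than click through the Swagger UI, the same call can be made over HTTP. The sketch below is an illustration, not part of the original article; it assumes the default `/upsert` endpoint of the retrieval plugin and the Bearer token configured earlier, so adjust the hostname, token, and document to your deployment:

```python
import requests

# Hypothetical example: upsert a single document through the plugin API.
response = requests.post(
    "https://your-application-name.fly.dev/upsert",
    headers={"Authorization": "Bearer <your-bearer-token>"},
    json={
        "documents": [
            {
                "id": "qdrant-faq-1",
                "text": (
                    "Qdrant does not return vectors in search results by default. "
                    "Set with_vector to true in the request to include them."
                ),
            }
        ]
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```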
For demo purposes, we can just put a single document extracted from the Qdrant documentation to see whether the integration works properly: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-11.png) We can come back to the ChatGPT UI and send a prompt, but we need to make sure the plugin is selected: ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-12.png) Now, if our prompt seems somehow related to the plugin description provided, the model will automatically form a query and send it to the HTTP API. The query will get vectorized by our app and then used to find relevant documents that will serve as context to generate the response. ![](https://qdrant.tech/articles_data/chatgpt-plugin/step-13.png) We now have a powerful language model that can interact with our knowledge base and return not only grammatically correct but also factual information. And this is how your interactions with the model may start to look: [ChatGPT Plugin with Qdrant Vector Database (video)](https://www.youtube.com/watch?v=fQUGuHEYeog) However, a single document is not enough to enable the full power of the plugin. If you want to put more documents that you have collected, there are already some scripts available in the `scripts/` directory that allow converting JSON, JSON lines, or even zip archives.