# Vespa

> The Vespa documentation provides all the information required to use all Vespa features and deploy them in any supported environment.

---

# Source: https://docs.vespa.ai/en/learn/about-documentation.html.md

# About this documentation

The Vespa documentation ([https://docs.vespa.ai/](https://docs.vespa.ai/)) provides all the information required to use all Vespa features and deploy them in any supported environment. It is split into guides and tutorials, which explain features and how to use them to solve problems, and reference documentation, which lists complete information about all features and APIs.

## Applicability

The Vespa platform is open source, and can be deployed in self-managed systems and on the Vespa Cloud service. Some add-ons (but no core functionality) are only available under a commercial license. Documents that describe functionality with such limited applicability are clearly marked by one or more of the following chips:

| Chip | Meaning |
| --- | --- |
| Vespa Cloud | Only applicable to Vespa Cloud deployments. |
| Self-managed | Only applicable to self-managed deployments. |
| Enterprise | Not open source: Available commercially only (both self-managed and on cloud, unless also marked by one of the other chips above). |

For clarity, any document _not_ marked with any of these chips describes functionality that is open source and available both on Vespa Cloud and self-managed deployments.

## Contributing

If you find errors or want to improve the documentation, [create an issue](https://github.com/vespa-engine/vespa/issues) or [contribute a fix](contributing). See the [README](https://docs.vespa.ai/README.md) before contributing.

## Notation

_Italic_ is used for:

- Pathnames, filenames, program names, hostnames, and URLs
- New terms where they are defined

`Constant Width` is used for:

- Programming language elements, code examples, keywords, functions, classes, interfaces, methods, etc.
- Commands and command-line output

Commands meant to be run on the command line are shown like this, prepended by a $ for the prompt:

```
$ export PATH=$VESPA_HOME/bin:$PATH
```

Notes and other important pieces of information are shown like:

**Note:** Some info here

**Important:** Important info here

**Warning:** Warning here

**Deprecated:** Deprecation warning here

---

# Source: https://docs.vespa.ai/en/operations/access-logging.html.md

# Access Logging

The Vespa access log format allows the logs to be processed by a number of available tools handling JSON-based log files. With the ability to add custom key/value pairs to the log from any Searcher, you can easily track the decisions made by container components for given requests.

## Vespa Access Log Format

In the Vespa access log, each log event is logged as a JSON object on a single line. The log format defines a list of fields that can be logged with every request. In addition to these fields, [custom key/value pairs](#logging-key-value-pairs-to-the-json-access-log-from-searchers) can be logged via Searcher code.

Pre-defined fields:

| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| ip | string | The IP address the request came from | yes |
| time | number | UNIX timestamp with millisecond decimal precision (e.g.
1477828938.123) when request is received | yes | | duration | number | The duration of the request in seconds with millisecond decimal precision (e.g. 0.123) | yes | | responsesize | number | The size of the response in bytes | yes | | code | number | The HTTP status code returned | yes | | method | string | The HTTP method used (e.g. 'GET') | yes | | uri | string | The request URI from path and beyond (e.g. '/search?query=test') | yes | | version | string | The HTTP version (e.g. 'HTTP/1.1') | yes | | agent | string | The user agent specified in the request | yes | | host | string | The host header provided in the request | yes | | scheme | string | The scheme of the request | yes | | port | number | The IP port number of the interface on which the request was received | yes | | remoteaddr | string | The IP address of the [remote client](#logging-remote-address-port) if specified in HTTP header | no | | remoteport | string | The port used from the [remote client](#logging-remote-address-port) if specified in HTTP header | no | | peeraddr | string | Address of immediate client making request if different from _remoteaddr_ | no | | peerport | string | Port used by immediate client making request if different from _remoteport_ | no | | user-principal | string | The name of the authenticated user (java.security.Principal.getName()) if principal is set | no | | ssl-principal | string | The name of the x500 principal if client is authenticated through SSL/TLS | no | | search | object | Object holding search specific fields | no | | search.totalhits | number | The total number of hits for the query | no | | search.hits | number | The hits returned in this specific response | no | | search.coverage | object | Object holding [query coverage information](../performance/graceful-degradation.html) similar to that returned in result set. | no | | connection | string | Reference to the connection log entry. See [Connection log](#connection-log) | no | | attributes | object | Object holding [custom key/value pairs](#logging-key-value-pairs-to-the-json-access-log-from-searchers) logged in searcher. | no | **Note:** IP addresses can be both IPv4 addresses in standard dotted format (e.g. 127.0.0.1) or IPv6 addresses in standard form with leading zeros omitted (e.g. 2222:1111:123:1234:0:0:0:4321). An example log line will look like this (here, pretty-printed): ``` { "ip": "152.200.54.243", "time": 920880005.023, "duration": 0.122, "responsesize": 9875, "code": 200, "method": "GET", "uri": "/search?query=test¶m=value", "version": "HTTP/1.1", "agent": "Mozilla/4.05 [en] (Win95; I)", "host": "localhost", "search": { "totalhits": 1234, "hits": 0, "coverage": { "coverage": 98, "documents": 100, "degraded": { "non-ideal-state": true } } } } ``` **Note:** The log format is extendable by design such that the order of the fields can be changed and new fields can be added between minor versions. Make sure any programmatic log handling is using a proper JSON processor. Example: Decompress, pretty-print, with human-readable timestamps: ``` $[jq](https://stedolan.github.io/jq/)'. + {iso8601date:(.time | todateiso8601)}' \ <(unzstd -c /opt/vespa/logs/vespa/access/JsonAccessLog.default.20210601010000.zst) ``` ### Logging Remote Address/Port In some cases when a request passes through an intermediate service, this service may add HTTP headers indicating the IP address and port of the real origin client. These values are logged as _remoteaddr_ and _remoteport_ respectively. 
Vespa will log the contents of any of the following HTTP request headers as _remoteaddr_: _X-Forwarded-For_, _Y-RA_, _YahooRemoteIP_ or _Client-IP_. If more than one of these headers is present, the precedence is in the order listed here, i.e. _X-Forwarded-For_ takes precedence over _Y-RA_. The contents of the _Y-RP_ HTTP request header will be logged as _remoteport_.

If the remote address or port differs from the address or port of the connection the request arrived on, the address and port of the immediate client making the request are logged as _peeraddress_ and _peerport_ respectively.

## Configuring Logging

For details on the access logging configuration, see the [accesslog](../reference/applications/services/container.html#accesslog) element in the container section of _services.xml_. Key configuration options include:

- **fileNamePattern**: Pattern for log file names, with time variable support
- **rotationInterval**: Time-based rotation schedule (minutes since midnight)
- **rotationSize**: Size-based rotation threshold in bytes (0 = disabled)
- **rotationScheme**: Either 'sequence' or 'date'
- **compressionFormat**: GZIP or ZSTD compression for rotated files

### Logging Request Content

Vespa supports logging of request content for specific URI paths. This is useful for inspecting the query content of search POST requests or the document operations of Document v1 POST/PUT requests. The request content is logged as a base64-encoded string in the JSON access log. To configure request content logging, use the [request-content](../reference/applications/services/container.html#request-content) element in the accesslog configuration in _services.xml_.

Here is an example of how the request content appears in the JSON access log:

```
{
  ...
  "method": "POST",
  "uri": "/search",
  ...,
  "request-content": {
    "type": "application/json; charset=utf-8",
    "length": 12345,
    "body": ""
  }
}
```

### File name pattern

The file name pattern is expanded using the time when the file is created. The following parts in the file name are expanded:

| Field | Format | Meaning | Example |
| --- | --- | --- | --- |
| %Y | YYYY | Year | 2003 |
| %m | MM | Month, numeric | 08 |
| %x | MMM | Month, textual | Aug |
| %d | dd | Date | 25 |
| %H | HH | Hour | 14 |
| %M | mm | Minute | 30 |
| %S | ss | Seconds | 35 |
| %s | SSS | Milliseconds | 123 |
| %Z | Z | Time zone | -0400 |
| %T | Long | System.currentTimeMillis | 1349333576093 |
| %% | % | Escape percentage | % |

## Log rotation

Apache httpd style log _rotation_ can be configured by setting the _rotationScheme_. There are two alternatives for the rotationScheme: sequence and date. Rotation can be triggered by time intervals using _rotationInterval_ and/or by file size using _rotationSize_.

### Sequence rotation scheme

The _fileNamePattern_ is used for the active log file name (which in this case will often be a constant string). At rotation, this file is given the name fileNamePattern.N, where N is 1 + the largest integer found by extracting the integer suffixes from the files in the same directory that end with such a suffix.

### Date rotation scheme

The _fileNamePattern_ is used for the active log file name here too, but the log files are not renamed at rotation. Instead, you must specify a time-dependent fileNamePattern so that each time a new log file is created, the name is unique. In addition, a symlink is created pointing to the active log file. The name of the symlink is specified using _symlinkName_.
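As an illustration, a minimal date-scheme configuration might look like the sketch below. The attribute names are those listed above; the `type="json"` attribute, the file path and the symlink name are assumptions for this example - consult the [accesslog](../reference/applications/services/container.html#accesslog) reference for the exact element syntax.

```
<!-- Sketch: date-based access log rotation inside a container cluster in services.xml.
     Attribute values (path, symlink name) are examples only. -->
<container version="1.0">
    <accesslog type="json"
               fileNamePattern="logs/vespa/access/JsonAccessLog.%Y%m%d%H%M%S"
               rotationScheme="date"
               symlinkName="JsonAccessLog" />
</container>
```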
### Rotation interval

The time of rotation is controlled by setting _rotationInterval_. The rotationInterval is a list of numbers specifying when to do rotation. Each element represents the number of minutes since midnight. Ending the list with '...' means continuing the [arithmetic progression](https://en.wikipedia.org/wiki/Arithmetic_progression) defined by the two last numbers for the rest of the day. E.g. "0 100 240 480 ..." is expanded to "0 100 240 480 720 960 1200".

### Log retention

Access logs are rotated, but not deleted by Vespa processes. It is up to the application owner to take care of archiving of access logs.

## Logging Key/Value pairs to the JSON Access Log from Searchers

To add a key/value pair to the access log from a searcher, use

```
query/result.getContext(true).logValue(key,value)
```

Such key/value pairs may be added from any thread participating in handling the query without incurring synchronization overhead. If the same key is logged multiple times, the values written will be included in the log as an array of strings rather than a single string value. The key/value pairs are added to the _attributes_ object in the log. An example log line will then look something like this:

```
{"ip":"152.200.54.243","time":920880005.023,"duration":0.122,"responsesize":9875,"code":200,"method":"GET","uri":"/search?query=test&param=value","version":"HTTP/1.1","agent":"Mozilla/4.05 [en] (Win95; I)","host":"localhost","search":{"totalhits":1234,"hits":0},"attributes":{"singlevalue":"value1","multivalue":["value2","value3"]}}
```

A pretty print version of the same example:

```
{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0
  },
  "attributes": {
    "singlevalue": "value1",
    "multivalue": [
      "value2",
      "value3"
    ]
  }
}
```

## Connection log

In addition to the access log, one entry per connection is written to the connection log. This entry is written on connection close. Available fields:

| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| id | string | Unique ID of the connection, referenced from access log. | yes |
| timestamp | number | Timestamp (ISO8601 format) when the connection was opened | yes |
| duration | number | The duration of the connection in seconds with millisecond decimal precision (e.g. 0.123) | yes |
| peerAddress | string | IP address used by immediate client making request | yes |
| peerPort | number | Port used by immediate client making request | yes |
| localAddress | string | The local IP address the request was received on | yes |
| localPort | number | The local port the request was received on | yes |
| remoteAddress | string | Original client IP, if proxy protocol is enabled | no |
| remotePort | number | Original client port, if proxy protocol is enabled | no |
| httpBytesReceived | number | Number of HTTP bytes received over the connection | no |
| httpBytesSent | number | Number of HTTP bytes sent over the connection | no |
| requests | number | Number of requests sent by the client | no |
| responses | number | Number of responses sent to the client | no |
| ssl | object | Detailed information on the SSL connection | no |

## SSL information

| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| clientSubject | string | Client certificate subject | no |
| clientNotBefore | string | Client certificate valid from | no |
| clientNotAfter | string | Client certificate valid to | no |
| sessionId | string | SSL session id | no |
| protocol | string | SSL protocol | no |
| cipherSuite | string | Name of session cipher suite | no |
| sniServerName | string | SNI server name | no |

---

# Source: https://docs.vespa.ai/en/operations/self-managed/admin-procedures.html.md

# Administrative Procedures

## Install

Refer to the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application for a primer on how to set up a cluster - use this as a starting point. Try the [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample app to get familiar with interfaces and behavior.

## Vespa start / stop / restart

Start and stop all services on a node:

```
$ $VESPA_HOME/bin/vespa-start-services
$ $VESPA_HOME/bin/vespa-stop-services
```

Likewise, for the config server:

```
$ $VESPA_HOME/bin/vespa-start-configserver
$ $VESPA_HOME/bin/vespa-stop-configserver
```

There is no _restart_ command; do a _stop_ then _start_ to restart. Learn more about which processes / services are started at [Vespa startup](config-sentinel.html), read the [start sequence](configuration-server.html#start-sequence) and find training videos in the vespaengine [YouTube channel](https://www.youtube.com/@vespaai).
Use [vespa-sentinel-cmd](../../reference/operations/self-managed/tools.html#vespa-sentinel-cmd) to stop/start individual services. **Important:** Running _vespa-stop-services_ on a content node will call[prepareRestart](../../reference/operations/self-managed/tools.html#vespa-proton-cmd) to optimize restart time, and is the recommended way to stop Vespa on a node. See [multinode](multinode-systems.html#aws-ec2) for _systemd_ /_systemctl_ examples. [Docker containers](docker-containers.html) has relevant start/stop information, too. ### Content node maintenance mode When stopping a content node _temporarily_ (e.g. for a software upgrade), consider manually setting the node into [maintenance mode](../../reference/api/cluster-v2.html#maintenance) _before_ stopping the node to prevent automatic redistribution of data while the node is down. Maintenance mode must be manually removed once the node has come back online. See also: [cluster state](#cluster-state). Example of setting a node with [distribution key](../../reference/applications/services/content.html#node) 42 into `maintenance` mode using [vespa-set-node-state](../../reference/operations/self-managed/tools.html#vespa-set-node-state), additionally supplying a reason that will be recorded by the cluster controller: ``` $ vespa-set-node-state --type storage --index 42 maintenance "rebooting for software upgrade" ``` After the node has come back online, clear maintenance mode by marking the node as `up`: ``` $ vespa-set-node-state --type storage --index 42 up ``` Note that if the above commands are executed _locally_ on the host running the services for node 42, `--index 42` can be omitted; `vespa-set-node-state` will use the distribution key of the local node if no `--index` has been explicitly specified. ## System status - Use [vespa-config-status](../../reference/operations/self-managed/tools.html#vespa-config-status) on a node in [hosts.xml](../../reference/applications/hosts.html) to verify all services run with updated config - Make sure [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) is set and identical on all nodes in hosts.xml - Use the _cluster controller_ status page (below) to track the status of search/storage nodes. - Check [logs](../../reference/operations/log-files.html) - Use performance graphs, System Activity Report (_sar_) or [status pages](#status-pages) to track load - Use [query tracing](../../reference/api/query.html#trace.level) - Disk and/or memory might be exhausted and block feeding - recover from [feed block](/en/writing/feed-block.html) ## Status pages All Vespa services have status pages, for showing health, Vespa version, config, and metrics. Status pages are subject to change at any time - take care when automating. Procedure: 1. **Find the port:** The status pages runs on ports assigned by Vespa. To find status page ports, use [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to list the services run in the application. ``` $ vespa-model-inspect services ``` To find the status page port for a specific node for a specific service, pick the correct service and run: ``` $ vespa-model-inspect service [Options] ``` 2. **Get the status and metrics:**_distributor_, _storagenode_, _searchnode_ and _container-clustercontroller_ are content services with status pages. These ports are tagged HTTP. The cluster controller have multiple ports tagged HTTP, where the port tagged STATE is the one with the status page. 
Try connecting to the root at the port, or /state/v1/metrics. The _distributor_ and _storagenode_ status pages are available at `/`: ``` $ vespa-model-inspect service searchnode searchnode @ myhost.mydomain.com : search search/search/cluster.search/0 tcp/myhost.mydomain.com:19110 (STATUS ADMIN RTC RPC) tcp/myhost.mydomain.com:19111 (FS4) tcp/myhost.mydomain.com:19112 (TEST HACK SRMP) tcp/myhost.mydomain.com:19113 (ENGINES-PROVIDER RPC)tcp/myhost.mydomain.com:19114 (HEALTH JSON HTTP)$ curl http://myhost.mydomain.com:19114/state/v1/metrics ... $ vespa-model-inspect service distributor distributor @ myhost.mydomain.com : content search/distributor/0 tcp/myhost.mydomain.com:19116 (MESSAGING) tcp/myhost.mydomain.com:19117 (STATUS RPC)tcp/myhost.mydomain.com:19118 (STATE STATUS HTTP)$ curl http://myhost.mydomain.com:19118/state/v1/metrics ... $ curl http://myhost.mydomain.com:19118/ ... ``` 3. **Use the cluster controller status page**: A status page for the cluster controller is available at the status port at `http://hostname:port/clustercontroller-status/v1/`. If _clustername_ is not specified, the available clusters will be listed. The cluster controller leader status page will show if any nodes are operating with differing cluster state versions. It will also show how many data buckets are pending merging (document set reconciliation) due to either missing or being out of sync. ``` $[vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect)service container-clustercontroller | grep HTTP ``` With multiple cluster controllers, look at the one with a "/0" suffix in its config ID; it is the preferred leader. The cluster state version is listed under the _SSV_ table column. Divergence here usually points to host or networking issues. ## Cluster state Cluster and node state information is available through the [/cluster/v2 API](../../reference/api/cluster-v2.html). This API can also be used to set a _user state_ for a node - alternatively use: - [vespa-get-cluster-state](../../reference/operations/self-managed/tools.html#vespa-get-cluster-state) - [vespa-get-node-state](../../reference/operations/self-managed/tools.html#vespa-get-node-state) - [vespa-set-node-state](../../reference/operations/self-managed/tools.html#vespa-set-node-state) Also see the cluster controller [status page](#status-pages). State is persisted in a ZooKeeper cluster, restarting/changing a cluster controller preserves: - Last cluster state version number, for new cluster controller handover at restarts - User states, set by operators - i.e. nodes manually set to down / maintenance In case of state data lost, the cluster state is reset - see [cluster controller](../../content/content-nodes.html#cluster-controller) for implications. ## Cluster controller configuration It is recommended to run cluster controllers on the same hosts as [config servers](configuration-server.html), as they share a zookeeper cluster for state and deploying three nodes is best practise for both. See the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample app for a working example. To configure the cluster controller, use [services.xml](../../reference/applications/services/content.html#cluster-controller) and/or add [configuration](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) under the _services_ element - example: ``` 5000 ``` A broken content node may end up with processes constantly restarting. 
It may die during initialization due to accessing corrupt files, or it may die when it starts receiving requests of a given type, triggering a node-local bug. This is bad for distributor nodes, as these restarts create constant ownership transfer between distributors, causing windows where buckets are unavailable.

The cluster controller has functionality for detecting such nodes. If a node restarts in a way that is not detected as a controlled shutdown more than [max\_premature\_crashes](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) times, the cluster controller will set the wanted state of this node to be down.

Detecting a controlled restart is currently a bit tricky. A controlled restart is typically initiated by sending a TERM signal to the process. Not having any other sign, the content layer has to assume that all TERM signals are the cause of controlled shutdowns. Thus, if the process keeps being killed by the kernel due to using too much memory, this will look like controlled shutdowns to the content layer.

## Monitor distance to ideal state

Refer to the [distribution algorithm](../../content/idealstate.html). Use distributor [status pages](#status-pages) to inspect state metrics, see [metrics](../../content/content-nodes.html#metrics). `idealstate.merge_bucket.pending` is the best metric to track; it is 0 when the cluster is balanced - a non-zero value indicates buckets out of sync.

## Cluster configuration

- Running `vespa prepare` will not change served configuration until `vespa activate` is run. `vespa prepare` will warn about all config changes that require restart.
- Refer to [schemas](../../basics/schemas.html) for how to add/change/remove schemas.
- Refer to [elasticity](../../content/elasticity.html) for how to add/remove capacity from a Vespa cluster, procedure below.
- See [chained components](../../applications/chaining.html) for how to add or remove searchers and document processors.
- Refer to the [sizing examples](../../performance/sizing-examples.html) for changing from a _flat_ to _grouped_ content cluster.

## Add or remove a content node

1. **Node setup:** Prepare the node by installing software, set up the file systems/directories and set [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables). [Start](#vespa-start-stop-restart) the node.
2. **Modify configuration:** Add/remove a [node](../../reference/applications/services/content.html#node) element in _services.xml_ and [hosts.xml](../../reference/applications/hosts.html) - see the sketch after this list. Refer to [multinode install](multinode-systems.html). Make sure the _distribution-key_ is unique.
3. **Deploy:** Deploy the changed application package, then [observe metrics](#monitor-distance-to-ideal-state) to track progress as the cluster redistributes documents. Use the [cluster controller](../../content/content-nodes.html#cluster-controller) to monitor the state of the cluster.
4. **Tune performance (optional):** Use [maxpendingidealstateoperations](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/config/stor-distributormanager.def) to tune the concurrency of bucket merge operations from distributor nodes. Likewise, tune [merges](../../reference/applications/services/content.html#merges) - concurrent merge operations per content node. The tradeoff is speed of bucket replication vs use of resources, which impacts the application's regular load.
5. **Finish:** The cluster is done redistributing when `idealstate.merge_bucket.pending` is zero on all distributors.
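To illustrate the configuration change in step 2, a minimal sketch of adding a fourth node to an existing content cluster might look like this. The cluster id, host names and aliases are hypothetical, and other required elements of the content cluster are omitted; distribution keys must stay unique and match your existing layout.

```
<!-- services.xml (sketch): add a new node element to the content cluster.
     Other required elements (documents, etc.) are omitted for brevity. -->
<content id="mycluster" version="1.0">
    <redundancy>2</redundancy>
    <nodes>
        <node hostalias="node1" distribution-key="0" />
        <node hostalias="node2" distribution-key="1" />
        <node hostalias="node3" distribution-key="2" />
        <node hostalias="node4" distribution-key="3" /> <!-- new node -->
    </nodes>
</content>
```

```
<!-- hosts.xml (sketch): map the new host alias to an actual host,
     alongside the existing host entries -->
<hosts>
    <host name="node4.example.com">
        <alias>node4</alias>
    </host>
</hosts>
```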
Do not remove more than _redundancy_-1 nodes at a time, to avoid data loss. Observe `idealstate.merge_bucket.pending` to know bucket replica status, when zero on all distributor nodes, it is safe to remove more nodes. If [grouped distribution](../../content/elasticity.html#grouped-distribution) is used to control bucket replicas, remove all nodes in a group if the redundancy settings ensure replicas in each group. To increase bucket redundancy level before taking nodes out, [retire](../../content/content-nodes.html) nodes. Again, track `idealstate.merge_bucket.pending` to know when done. Use the [/cluster/v2 API](../../reference/api/cluster-v2.html) or [vespa-set-node-state](../../reference/operations/self-managed/tools.html#vespa-set-node-state) to set a node to the _retired_ state. The [cluster controller's](../../content/content-nodes.html#cluster-controller) status page lists node states. An alternative to increasing cluster size is building a new cluster, then migrate documents to it. This is supported using [visiting](../../writing/visiting.html). To _merge_ two content clusters, add nodes to the cluster like above, considering: - [distribution-keys](../../reference/applications/services/content.html#node) must be unique. Modify paths like _$VESPA\_HOME/var/db/vespa/search/mycluster/n3_ before adding the node. - Set [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables), then start the node. ## Topology change Read [changing topology first](../../content/elasticity.html#changing-topology), and plan the sequence of steps. Make sure to not change the `distribution-key` for nodes in _services.xml_. It is not required to restart nodes as part of this process ## Add or remove services on a node It is possible to run multiple Vespa services on the same host. If changing the services on a given host, stop Vespa on the given host before running `vespa activate`. This is because the services are dynamically allocated port numbers, depending on what is running on the host. Consider if some of the services changed are used by services on other hosts. In that case, restart services on those hosts too. Procedure: 1. Edit _services.xml_ and _hosts.xml_ 2. Stop Vespa on the nodes that have changes 3. Run `vespa prepare` and `vespa activate` 4. Start Vespa on the nodes that have changes ## Troubleshooting Also see the [FAQ](../../learn/faq). | No endpoint | Most problems with the quick start guides are due to Docker out of memory. Make sure at least 6G memory is allocated to Docker: ``` $ docker info | grep "Total Memory" or $ podman info | grep "memTotal" ``` OOM symptoms include ``` INFO: Problem with Handshake localhost:8080 ssl=false: localhost:8080 failed to respond ``` The container is named _vespa_ in the guides, for a shell do: ``` $ docker exec -it vespa bash ``` | | Log viewing | Use [vespa-logfmt](../../reference/operations/self-managed/tools.html#vespa-logfmt) to view the vespa log - example: ``` $ /opt/vespa/bin/vespa-logfmt -l warning,error ``` | | Json | For json pretty-print, append ``` | python -m json.tool ``` to commands that output json - or use [jq](https://stedolan.github.io/jq/). | | Routing | Vespa lets application set up custom document processing / indexing, with different feed endpoints. Refer to [indexing](../../writing/indexing.html) for how to configure this in _services.xml_. [#13193](https://github.com/vespa-engine/vespa/issues/13193) has a summary of problems and solutions. 
| | Tracing | Use [tracelevel](../../reference/api/document-v1.html#request-parameters) to dump the routes and hops for a write operation - example: ``` $ curl -H Content-Type:application/json --data-binary @docs.json \ $ENDPOINT/document/v1/mynamespace/doc/docid/1?tracelevel=4 | jq . { "pathId": "/document/v1/mynamespace/doc/docid/1", "id": "id:mynamespace:doc::1", "trace": [ { "message": "[1623413878.905] Sending message (version 7.418.23) from client to ..." }, { "message": "[1623413878.906] Message (type 100004) received at 'default/container.0' ..." }, { "message": "[1623413878.907] Sending message (version 7.418.23) from 'default/container.0' ..." }, { "message": "[1623413878.907] Message (type 100004) received at 'default/container.0' ..." }, { "message": "[1623413878.909] Selecting route" }, { "message": "[1623413878.909] No cluster state cached. Sending to random distributor." } ``` | ## Clean start mode There has been rare occasions were Vespa stored data that was internally inconsistent. For those circumstances it is possible to start the node in a [validate\_and\_sanitize\_docstore](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/proton.def) mode. This will do its best to clean up inconsistent data. However, detecting that this is required is not easy, consult the Vespa Team first. In order for this approach to work, all nodes must be stopped before enabling this feature - this to make sure the data is not redistributed. ## Content cluster configuration | Availability vs resources | Keeping index structures costs resources. Not all replicas of buckets are necessarily searchable, unless configured using [searchable-copies](../../reference/applications/services/content.html#searchable-copies). As Vespa indexes buckets on-demand, the most cost-efficient setting is 1, if one can tolerate temporary coverage loss during node failures. | | Data retention vs size | When a document is removed, the document data is not immediately purged. Instead, _remove-entries_ (tombstones of removed documents) are kept for a configurable amount of time. The default is two weeks, refer to [removed-db prune age](../../reference/applications/services/content.html#removed-db-prune-age). This ensures that removed documents stay removed in a distributed system where nodes change state. Entries are removed periodically after expiry. Hence, if a node comes back up after being down for more than two weeks, removed documents are available again, unless the data on the node is wiped first. A larger _prune age_ will grow the storage size as this keeps document and tombstones longer. **Note:** The backend does not store remove-entries for nonexistent documents. This to prevent clients sending wrong document identifiers from filling a cluster with invalid remove-entries. A side effect is that if a problem has caused all replicas of a bucket to be unavailable, documents in this bucket cannot be marked removed until at least one replica is available again. Documents are written in new bucket replicas while the others are down - if these are removed, then older versions of these will not re-emerge, as the most recent change wins. | | Transition time | See [transition-time](../../reference/applications/services/content.html#transition-time) for tradeoffs for how quickly nodes are set down vs. system stability. | | Removing unstable nodes | One can configure how many times a node is allowed to crash before it will automatically be removed. 
The crash count is reset if the node has been up or down continuously for more than the [stable state period](../../reference/applications/services/content.html#stable-state-period). If the crash count exceeds [max premature crashes](../../reference/applications/services/content.html#max-premature-crashes), the node will be disabled. Refer to [troubleshooting](#troubleshooting). | | Minimal amount of nodes required to be available | A cluster is typically sized to handle a given load. A given percentage of the cluster resources are required for normal operations, and the remainder is the available resources that can be used if some of the nodes are no longer usable. If the cluster loses enough nodes, it will be overloaded: - Remaining nodes may create disk full situation. This will likely fail a lot of write operations, and if disk is shared with OS, it may also stop the node from functioning. - Partition queues will grow to maximum size. As queues are processed in FIFO order, operations are likely to get long latencies. - Many operations may time out while being processed, causing the operation to be resent, adding more load to the cluster. - When new nodes are added, they cannot serve requests before data is moved to the new nodes from the already overloaded nodes. Moving data puts even more load on the existing nodes, and as moving data is typically not high priority this may never actually happen. To configure what the minimal cluster size is, use [min-distributor-up-ratio](../../reference/applications/services/content.html#min-distributor-up-ratio) and [min-storage-up-ratio](../../reference/applications/services/content.html#min-storage-up-ratio). | Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Install](#install) - [Vespa start / stop / restart](#vespa-start-stop-restart) - [Content node maintenance mode](#content-node-maintenance-mode) - [System status](#system-status) - [Status pages](#status-pages) - [Cluster state](#cluster-state) - [Cluster controller configuration](#cluster-controller-configuration) - [Monitor distance to ideal state](#monitor-distance-to-ideal-state) - [Cluster configuration](#cluster-configuration) - [Add or remove a content node](#add-or-remove-a-content-node) - [Topology change](#topology-change) - [Add or remove services on a node](#add-or-remove-services-on-a-node) - [Troubleshooting](#troubleshooting) - [Clean start mode](#clean-start-mode) - [Content cluster configuration](#content-cluster-configuration) --- # Source: https://docs.vespa.ai/en/reference/applications/services/admin.html.md # services.xml - 'admin' Reference documentation for `` in [services.xml](services.html). Find a working example of this configuration in the sample application _multinode-HA_[services.xml](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml). 
``` admin [version][adminserver [hostalias]](#adminserver)[cluster-controllers](#cluster-controllers)[cluster-controller [hostalias, baseport, jvm-options, jvm-gc-options]](#cluster-controller)[configservers](#configservers)[configserver [hostalias, baseport]](#configserver)[logserver [jvm-options, jvm-gc-options]](#logserver)[slobroks](#slobroks)[slobrok [hostalias, baseport]](#slobrok)[monitoring [systemname]](#monitoring)[metrics](#metrics)[consumer [id]](#consumer)[metric-set [id]](#metric-set)[metric [id]](#metric)[cloudwatch [region, namespace]](#cloudwatch)[shared-credentials [file, profile]](#shared-credentials)[logging](#logging) ``` | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | version | required | number | | 2.0 | ## adminserver The configured node will be the default administration node in your Vespa system, which means that unless configured otherwise all administrative services - i.e. the log server, the configuration server, the slobrok, and so on - will run on this node. Use [configservers](#configservers), [logserver](#logserver),[slobroks](#slobroks) elements if you need to specify baseport or jvm options for any of these services. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | hostalias | required | string | | | | baseport | optional | number | | | ## cluster-controllers Container for one or more [cluster-controller](#cluster-controller) elements. When having one or more [content](content.html) clusters, configuring at least one cluster controller is required. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | standalone-zookeeper | optional | true/false | false | Will by default share the ZooKeeper instance with configserver. If configured to true a separate ZooKeeper instance will be configured and started on the set of nodes where you run cluster controller on. The set of cluster controllers nodes cannot overlap with the set of nodes where config server is running. If this setting is changed from false to true in a running system, all previous cluster state information will be lost as the underlying ZooKeeper changes. Cluster controllers will re-discover the state, but nodes that have been manually set as down will again be considered to be up. | ## cluster-controller Specifies a host on which to run the [Cluster Controller](../../../content/content-nodes.html#cluster-controller) service. The Cluster Controller manages the state of the cluster in order to provide elasticity and failure detection. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | hostalias | required | string | | | | baseport | optional | number | | | | jvm-options | optional | string | | | ## configservers Container for one or more `configserver` elements. ## configserver Specifies a host on which to run the [Configuration Server](/en/operations/self-managed/configuration-server.html) service. If contained directly below `` you may only have one, so if you need to configure multiple instances of this service, contain them within the [``](#configservers) element. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | hostalias | required | string | | | | baseport | optional | number | | | ## logserver Specifies a host on which to run the [Vespa Log Server](../../operations/log-files.html#log-server) service. 
If not specified, the logserver is placed on the [adminserver](#adminserver), like in the [example](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml). | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | hostalias | required | string | | | | baseport | optional | number | | | | jvm-options | optional | string | | | | jvm-gc-options | optional | string | | | Example: ``` ``` ``` ``` ## slobroks This is a container for one or more `slobrok` elements. ## slobrok Specifies a host on which to run the [Service Location Broker (slobrok)](/en/operations/self-managed/slobrok.html) service. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | hostalias | required | string | | | | baseport | optional | number | | | ## monitoring Settings for how to pass metrics to a monitoring service - see [monitoring](/en/operations/self-managed/monitoring.html). ``` ``` ``` ``` | systemname | The name of the application in question in the monitoring system, default is "vespa" | ## logging Used for tuning log levels of Java plug-ins. If you (temporarily) need to enable debug logging from some class or package, or if some third-party component is spamming your log with unnecessary INFO level messages, you can turn levels on or off. Example: ``` ``` ``` ``` Note that tuning also affects sub-packages, so the above would also affect all packages with `org.anotherorg.` as prefix. And if there is a `org.myorg.tricky.package.foo.InternalClass` you will get even "spam" level logging from it! The default for `levels` is `"all -debug -spam"` and as seen above you can add and remove specific levels. ## metrics Used for configuring the forwarding of metrics to graphing applications - add `consumer` child elements. Also see [monitoring](/en/operations/self-managed/monitoring.html). Example: ``` ``` ``` ``` ## consumer Configure a metrics consumer. The metrics contained in this element will be exported to the consumer with the given id. `consumer` is a request parameter in [/metrics/v1/values](../../api/metrics-v1.html), [/metrics/v2/values](../../api/metrics-v2.html) and [/prometheus/v1/values](../../api/prometheus-v1.html). Add `metric` and/or `metric-set` children. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | string | | The name of the consumer to export metrics to. | ## metric-set Include a pre-defined set of metrics to the consumer. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | string | | The id of the metric set to include. Built-in metric sets are: - `default` - `Vespa` | ## metric Configure a metric. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | string | | The name of the metric as defined in custom code or in [process metrics api](../../api/state-v1.html#state-v1-metrics) | Note that metric id needs to include the metric specific suffix, e.g. _.average_. In this example, there is one metric added to a custom consumer in addition to the default metric set. Use _&consumer=my-custom-consumer_ parameter for the prometheus endpoint. Also notice the .count suffix, see [process metrics api](../../api/state-v1.html#state-v1-metrics). The per process metrics api endpoint _/state/v1/metrics_ also includes a description of each emitted metric. 
The _/state/v1/metrics_ endpoint also includes the metric aggregates (.count, .average, .rate, .max). ``` ``` ``` ``` ## cloudwatch Specifies that the metrics from this consumer should be forwarded to CloudWatch. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | region | required | string | | Your AWS region | | namespace | required | string | | The metrics namespace in CloudWatch | Example: ``` ``` ``` ``` ## shared-credentials Specifies that a profile from a shared-credentials file should be used for authentication to CloudWatch. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | file | required | string | | The path to the shared-credentials file | | profile | optional | string | default | The profile in the shared-credentials file | Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [adminserver](#adminserver) - [cluster-controllers](#cluster-controllers) - [cluster-controller](#cluster-controller) - [configservers](#configservers) - [configserver](#configserver) - [logserver](#logserver) - [slobroks](#slobroks) - [slobrok](#slobrok) - [monitoring](#monitoring) - [logging](#logging) - [metrics](#metrics) - [consumer](#consumer) - [metric-set](#metric-set) - [metric](#metric) - [cloudwatch](#cloudwatch) - [shared-credentials](#shared-credentials) --- # Source: https://docs.vespa.ai/en/reference/api/api.html.md # Vespa API and interfaces ## Deployment and configuration - [Deploy API](deploy-v2.html): Deploy [application packages](../../basics/applications.html) to configure a Vespa application - [Config API](config-v2.html): Get and Set configuration - [Tenant API](application-v2.html): Configure multiple tenants in the config servers ## Document API - [Reads and writes](../../writing/reads-and-writes.html): APIs and binaries to read and update documents - [/document/v1/](document-v1.html): REST API for operations based on document ID (get, put, remove, update) - [Feeding API](../../clients/vespa-feed-client.html): High performance feeding API, the recommended API for feeding data - [JSON feed format](../schemas/document-json-format.html): The Vespa Document format - [Vespa Java Document API](../../writing/document-api-guide.html) ## Query and grouping - [Query API](../../querying/query-api.html), [Query API reference](query.html) - [Query Language](../../querying/query-language.html), [Query Language reference](../querying/yql.html), [Simple Query Language reference](../querying/simple-query-language.html), [Predicate fields](../../schemas/predicate-fields.html) - [Vespa Query Profiles](../../querying/query-profiles.html) - [Grouping API](../../querying/grouping.html), [Grouping API reference](../querying/grouping-language.html) ## Processing - [Vespa Processing](../../applications/processing.html): Request-Response processing - [Vespa Document Processing](../../applications/document-processors.html): Feed processing ## Request processing - [Searcher API](../../applications/searchers.html) - [Federation API](../../querying/federation.html) - [Web service API](../../applications/web-services.html) ## Result processing - [Custom renderer API](../../applications/result-renderers.html) ## Status and state - [Health and Metric APIs](../../operations/metrics.html) - [/cluster/v2 API](cluster-v2.html) Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Deployment and configuration](#deployment-and-configuration) - [Document API](#document-api) - [Query and grouping](#query-and-grouping) - 
[Processing](#processing) - [Request processing](#request-processing) - [Result processing](#result-processing) - [Status and state](#status) --- # Source: https://docs.vespa.ai/en/reference/applications/application-packages.html.md # Application package reference This is the [application package](../../basics/applications.html) reference. An application package is the deployment unit in Vespa. To deploy an application, create an application package and [vespa deploy](../../clients/vespa-cli.html#deployment) or use the [deploy API](../api/deploy-v2.html). The application package is a directory of files and subdirectories: | Directory/file | Required | Description | | --- | --- | --- | | [services.xml](services/services.html) | Yes | Describes which services to run where, and their main configuration. | | [hosts.xml](hosts.html) | No | Vespa Cloud: Not used. See node counts in [services.xml](services/services.html). Self-managed: The mapping from logical nodes to actual hosts. | | [deployment.xml](deployment.html) | Yes, for Vespa Cloud | Specifies which environments and regions the application is deployed to during automated application deployment, as which application instances. This file also specifies other deployment-related configurations like [cloud accounts](../../operations/enclave/enclave) and [private endpoints](../../operations/private-endpoints.html). The file is required when deploying to the [prod environment](../../operations/environments.html#prod) - it is ignored (with some exceptions) when deploying to the _dev_ environment. | | [validation-overrides.xml](validation-overrides.html) | No | Override, allowing this package to deploy even if it fails validation. | | [.vespaignore](../../applications/vespaignore.html) | No | Contains a list of path patterns that should be excluded from the `application.zip` deployed to Vespa. | | [models](../ranking/model-files.html)/ | No | Machine-learned models in the application package. Refer to [stateless model evaluation](../../ranking/stateless-model-evaluation.html), [Tensorflow](../../ranking/tensorflow), [Onnx](../../ranking/onnx), [XGBoost](../../ranking/xgboost), and [LightGBM](../../ranking/lightgbm). | | [schemas](../../basics/schemas.html)/ | No | Contains the \*.sd files describing the document types of the application and how they should be queried and processed. | | [schemas/[schema]](../schemas/schemas.html#rank-profile)/ | No | Contains \*.profile files defining [rank profiles](../../basics/ranking.html#rank-profiles). This is an alternative to defining rank profiles inside the schema. | | [security/clients.pem](../../security/guide) | Yes, for Vespa Cloud | PEM encoded X.509 certificates for data plane access. See the [security guide](../../security/guide) for how to generate and use. | | [components](../../applications/components.html)/ | No | Contains \*.jar files containing searcher(s) for the JDisc Container. 
| | [rules](../querying/semantic-rules.html)/ | No | Contains \*.sr files containing rule bases for semantic recognition and translation of the query | | [search/query-profiles](../querying/query-profiles.html)/ | No | Contains \*.xml files containing a named set of search request parameters with values | | [constants](../../ranking/tensor-user-guide.html#constant-tensors)/ | No | Constant tensors | | [tests](testing.html)/ | No | Test files for automated tests | | ext/ | No | Files that are guaranteed to be ignored by Vespa: They are excluded when processing the application package and cannot be referenced from any other element in it. | Additional files and directories can be placed anywhere in the application package. These will be not be processed explicitly by Vespa when deploying the application package (i.e. they will only be considered if they are referred to from within the application package), but there is no guarantee to how these might be processed in a future release. To extend the application package in a way that is guaranteed to be ignored by Vespa in all future releases, use the _ext/_ directory. ## Deploy | Command | Description | | --- | --- | | upload | Uploads an application package to the config server. Normally not used, as _prepare_ includes _upload_ | | prepare | 1. Verifies that a configuration server is up and running 2. Uploads the application to the configuration server, which stores it in _$VESPA\_HOME/var/db/vespa/config\_server/serverdb/tenants/default/sessions/[sessionid]_. _[sessionid]_ increases for each _prepare_-call. The config server also stores the application in a [ZooKeeper](/en/operations/self-managed/configuration-server.html) instance at _/config/v2/tenants/default/sessions/[sessionid]_ - this distributes the application to all config servers 3. Creates metadata about the deployed the applications package (which user deployed it, which directory was it deployed from and at what time was it deployed) and stores it in _...sessions/[sessionid]/.applicationMetaData_ 4. Verifies that the application package contains the required files and performs a consistency check 5. Validates the xml config files using the [schema](https://github.com/vespa-engine/vespa/tree/master/config-model/src/main/resources/schema), found in _$VESPA\_HOME/share/vespa/schema_ 6. Checks if there are config changes between the active application and this prepared application that require actions like restart or re-feed (like changes to [schemas](../../basics/schemas.html)). These actions are returned as part of the prepare step in the [deployment API](../api/deploy-v2.html#prepare-session). This prevents breaking changes to production - also read about [validation overrides](validation-overrides.html) 7. Distributes constant tensors and bundles with [components](../../applications/components.html) to nodes using [file distribution](/en/applications/deployment.html#file-distribution). Files are downloaded to _$VESPA\_HOME/var/db/vespa/filedistribution_, URL download starts downloading to _$VESPA\_HOME/var/db/vespa/download_ | | activate | 1. Waits for prepare to complete 2. Activates new configuration version 3. Signals to containers to load new bundles - read more in [container components](../../applications/components.html) | | fetch | Use _fetch_ to download the active application package | An application package can be zipped for deployment: ``` $ zip -r ../app.zip . 
``` Use any name for the zip file - then refer to the file instead of the path in [deploy](../../clients/vespa-cli.html#deployment) commands. **Important:** Using `tar` / `gzip` is not supported.[Details](https://github.com/vespa-engine/vespa/issues/17837). ## Preprocess directives Use preprocess directives to: - _preprocess:properties_: define properties that one can refer to everywhere in _services.xml_ - _preprocess:include_: split _services.xml_ in smaller chunks Below, _${container.port}_ is replaced by _4099_. The contents of _content.xml_ is placed at the _include_ point. This is applied recursively, one can use preprocess directives in included files, as long as namespaces are defined in the top level file: ``` \ \4099\ \ \ ``` Sample _content.xml_: ``` 1 ``` ## Versioning application packages An application can be given a user-defined version, available at[/ApplicationStatus](../../applications/components.html#monitoring-the-active-application). Configure the version in [services.xml](../applications/services/services.html) (at top level): ``` 42 ... ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/reference/api/application-v2.html.md # /application/v2/tenant API reference This is the /application/v2/tenant API reference with examples for the HTTP REST API to [list](#list-tenants), [create](#create-tenant) and [delete](#delete-tenant) a tenant, which can be used to [deploy](deploy-v2.html) an application. The response format is JSON. The tenant value is "default". The current API version is 2. The API port is 19071 - use [vespa-model-inspect](/en/reference/operations/self-managed/tools.html#vespa-model-inspect) service configserver to find config server hosts - example: `http://myconfigserver.mydomain.com:19071/application/v2/tenant/` ## HTTP requests | HTTP request | application/v2/tenant operation | Description | | --- | --- | --- | | GET | List tenant information. | | | List tenants | ``` /application/v2/tenant/ ``` Example response: ``` ``` [ "default" ] ``` ``` | | | Get tenant | ``` /application/v2/tenant/default ``` Example response: ``` ``` { "message": "Tenant 'default' exists." } ``` ``` | | PUT | Create a new tenant. | | | Create tenant | ``` /application/v2/tenant/default ``` Response: A message with the name of the tenant created - example: ``` ``` { "message" : "Tenant default created." } ``` ``` **Note:** This operation is asynchronous, it will eventually propagate to all config servers. | | DELETE | Delete a tenant. | | | Delete tenant | ``` /application/v2/tenant/default ``` Response: A message with the deleted tenant: ``` ``` { "message" : "Tenant default deleted." } ``` ``` **Note:** This operation is asynchronous, it will eventually propagate to all config servers. | ## Request parameters None. ## HTTP status codes Non-exhaustive list of status codes. Any additional info is included in the body of the return call, JSON-formatted. | Code | Description | | --- | --- | | 400 | Bad request. Client error. The error message should indicate the cause. | | 404 | Not found. For example using a session id that does not exist. | | 405 | Method not implemented. E.g. using GET where only POST or PUT is allowed. | | 500 | Internal server error. Generic error. The error message should indicate the cause. | ## Response format Responses are in JSON format, with the following fields: | Field | Description | | --- | --- | | message | An info/error message. 
| Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/basics/applications.html.md # Vespa applications You use Vespa by deploying an _application_ to it. Why applications? Because Vespa handles both data and the computations you do over them - together an application. An application is specified by an _application package_ - a directory with some files. The application package contains _everything_ that is needed to run your application: Config, schemas, components, ML models, and so on. The _only_ way to change an application is to make the change in the application package and then deploy it again. Vespa will then safely change the running system to match the new application package revision, without impacting queries, writes, or data. ## A minimal application package You can create a complete application package with just a single file: services.xml. This file specifies the clusters that your application should run. It could just be a single stateless cluster - what's called _container_ - like this: ``` ``` ``` ``` Put this in a file called services.xml, and you have created the world's smallest application package. However, this won't do much; usually you want to have a `content` cluster which can store data, maintain indexes, and run the distributed part of queries. You'll also want your container cluster to load the necessary middleware for this. With that we get a services file like this: ``` ``` 2 ``` ``` This specifies a pretty normal simple Vespa application, but now we need another file: the schema of the document type we'll use. This goes into the directory `schemas/`, so our application package now looks like this: ``` services.xml schemas/myschema.sd ``` The schema file describes a kind of data and the computations (such as ranking/scoring) you want to do over it. At minimum it just lists the fields of that data type and if and how each field should be indexed: ``` schema myschema { document myschema { field text type string { indexing: summary | index } field embedding type tensor(x[384]) { indexing: attribute | index } field popularity type double { indexing: summary | attribute } } } ``` With these two files we have specified a fully functional application that can do text, vector and hybrid search with filtering. Rather than creating applications from scratch like this, you can also clone one of our sample applications as a starting point like we did in [getting started](deploy-an-application.html). To read more on schemas, see the [schemas](schemas.html) guide. To see everything an application package can contain, see the [application package reference](../reference/applications/application-packages.html). ## Deploying applications To create running instances of an application, or make the changes to one take effect, you _deploy_ it. Deployments to the dev zone and to self-managed clusters set up a single instance, while deployments to production can set up multiple instances in one or more regions. To deploy an application package you use the [deploy command](../clients/vespa-cli.html#deployment) in Vespa CLI: ``` $ vespa deploy . ``` This will deploy the application package in the current directory to the current target and the default dev zone (use `vespa deploy -h` to see other options). Deployment to production zones uses a separate command: ``` $ vespa prod deploy . ``` Production deployments also require an additional file in the application package to specify where it should be deployed: deployment.xml.
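For illustration, a minimal deployment.xml targeting a single production region could look like the sketch below (the region name is just an example):

```
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
```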
See [production deployment](../operations/production-deployment.html). The recommended way to deploy to production is by setting up a continuous deployment job, see[automated deployments](../operations/automated-deployments.html). Deploying a change to an application package is generally safe to do at any time. It does not disrupt queries and writes, and invalid or destructive changes are rejected before taking effect. You can also add tests that verifies the application before deployment to production zones. #### Next: [Schemas](schemas.html) Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/querying/approximate-nn-hnsw.html.md # Approximate nearest neighbor search using HNSW index This document describes how to speed up searches for nearest neighbors in vector spaces by adding[HNSW index](../reference/schemas/schemas.html#index-hnsw) to tensor fields. For an introduction to nearest neighbor search, see [nearest neighbor search](nearest-neighbor-search) documentation, for practical usage of Vespa's nearest neighbor search, see [nearest neighbor search - a practical guide](nearest-neighbor-search-guide), and to have Vespa create vectors for you, see [embedding](../rag/embedding.html). Vespa implements a modified version of the Hierarchical Navigable Small World (HNSW) graph algorithm [paper](https://arxiv.org/abs/1603.09320). The implementation in Vespa supports: - **Filtering** - The search for nearest neighbors can be constrained by query filters. The [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) query operator can be combined with other filters or query terms using the [Vespa query language](query-language.html). See the query examples in the [practical guide](nearest-neighbor-search-guide#combining-approximate-nearest-neighbor-search-with-query-filters). - **Multi-field vector Indexing** - A schema can include multiple indexed tensor fields and search any combination of them in a query. This is useful to support multiple models, multiple text sources, and multi-modal search such as indexing both a textual description and image for the same entity. - **Multi-vector Indexing** - A single document field can contain any number of vector values by defining it as a mixed tensor (a "map of vectors"). Documents will then be retrieved by the closest vector in each document compared to the query vector. See the [Multi-vector indexing sample application](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing)for examples. This is commonly used to [index documents with multiple chunks](../rag/working-with-chunks.html). See also [this blog post](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/#implementation). - **Real Time Indexing** - CRUD (Create, Add, Update, Remove) vectors in the index in true real time. - **Mutable HNSW Graph** - No query or indexing overhead from searching multiple _HNSW_ graphs. In Vespa, there is one graph per tensor field per content node. No segmented or partitioned graph where a query against a content node need to scan multiple HNSW graphs. - **Multithreaded Indexing** - The costly part when performing real time changes to the _HNSW_ graph is distance calculations while searching the graph layers to find which links to change. These distance calculations are performed by multiple indexing threads. - **Multiple value types** - The cost driver of vector search is often storing the vectors in memory, which is required to produce accurate results at low latency. 
An effective way to reduce cost is to reduce the size of each vector value. Vespa supports double, float, bfloat16, int8 and [single-bit values](../rag/binarizing-vectors.html). Changing from float to bfloat16 can halve cost with negligible impact on accuracy, while single-bit values greatly reduce both memory and cpu costs, and can be effectively combined with larger vector values stored on disk as a paged attribute to be used for ranking. - **Optimized HNSW lookups** - ANN searches in Vespa [support](https://blog.vespa.ai/tweaking-ann-parameters/) both pre-and post-filtering, beam exploration, and filtering before distance calculation ("Acorn 1"). Tuning parameters for these makes it possible to strike a good balance between performance and accuracy for any data set. Vespa's [ANN tuning tool](https://vespa-engine.github.io/pyvespa/examples/ann-parameter-tuning-vespa-cloud.html) can be used to automate the process. ## Using Vespa's approximate nearest neighbor search The query examples in [nearest neighbor search](nearest-neighbor-search) uses exact search, which has perfect accuracy. However, this is computationally expensive for large document volumes as distances are calculated for every document which matches the query filters. To enable fast approximate matching, the tensor field definition needs an `index` directive. A Vespa [document schema](../basics/schemas.html) can declare multiple tensor fields with `HNSW` enabled. ``` field image_embeddings type tensor(i{},x[512]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 100 } } } field text_embedding type tensor(x[384]) { indexing: summary | attribute | index attribute { distance-metric: prenormalized-angular } index { hnsw { max-links-per-node: 24 neighbors-to-explore-at-insert: 200 } } } ``` In the schema snippet above, fast approximate search is enabled by building an `HNSW` index for the`image_embeddings` and the `text_embedding` tensor fields.`image_embeddings` indexes multiple vectors per document, while `text_embedding` indexes one vector per document. The two vector fields use different [distance-metric](../reference/schemas/schemas.html#distance-metric)and `HNSW` index settings: - `max-links-per-node` - a higher value increases recall accuracy, but also memory usage, indexing and search cost. - `neighbors-to-explore-at-insert` - a higher value increases recall accuracy, but also indexing cost. Choosing the value of these parameters affects both accuracy, search performance, memory usage and indexing performance. See [Billion-scale vector search with Vespa - part two](https://blog.vespa.ai/billion-scale-knn-part-two/)for a detailed description of these tradeoffs. See [HNSW index reference](../reference/schemas/schemas.html#index-hnsw) for details on the index parameters. ### Indexing throughput ![Real-time indexing throughput](https://blog.vespa.ai/assets/2022-01-27-billion-scale-knn-part-two/throughput.png) The `HNSW` settings impacts indexing throughput. Higher values of `max-links-per-node` and `neighbors-to-explore-at-insert`reduces indexing throughput. Example from [Billion-scale vector search with Vespa - part two](https://blog.vespa.ai/billion-scale-knn-part-two/). 
### Memory usage A higher `max-links-per-node` value means higher memory usage: ![Memory footprint](https://blog.vespa.ai/assets/2022-01-27-billion-scale-knn-part-two/memory.png) ### Accuracy ![Accuracy](https://blog.vespa.ai/assets/2022-01-27-billion-scale-knn-part-two/ann.png) Higher `max-links-per-node` and `neighbors-to-explore-at-insert` improve the quality of the graph and recall accuracy. As the search-time parameter [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) is increased, the lower combination reaches about 70% recall@10, while the higher combination reaches about 92% recall@10. The improvement in accuracy needs to be weighed against the impact on indexing performance and memory usage. ## Using approximate nearest neighbor search With an _HNSW_ index enabled on the tensor field, one can choose between approximate or exact (brute-force) search by using the [approximate query annotation](../reference/querying/yql.html#approximate): ``` { "yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)", "hits": 10, "input.query(query_image_embedding)": [0.21,0.12,....], "ranking.profile": "image_similarity" } ``` By default, `approximate` is true when searching a tensor field with `HNSW` index enabled. The `approximate` parameter allows quantifying the accuracy loss of using approximate search. The loss can be calculated by performing an exact neighbor search using `approximate:false`, comparing the retrieved documents with those from `approximate:true`, and calculating the overlap@k metric. Note that exact searches over a large vector volume require adjustment of the [query timeout](../reference/api/query.html#timeout). The default [query timeout](../reference/api/query.html#timeout) is 500ms, which will be too low for an exact search over many vectors. In addition to [targetHits](../reference/querying/yql.html#targethits), there is a [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) parameter which controls how many extra nodes in the graph (in addition to `targetHits`) are explored during the graph search. This parameter is used to tune accuracy versus query performance. ## Combining approximate nearest neighbor search with filters The [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) query operator can be combined with other query filters using the [Vespa query language](../reference/querying/yql.html) and its query operators. There are two high-level strategies for combining query filters with approximate nearest neighbor search: - [pre-filtering](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#pre-filtering-strategy) (the default) - [post-filtering](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#post-filtering-strategy) These strategies can be configured in a rank profile using [approximate-threshold](../reference/schemas/schemas.html#approximate-threshold) and [post-filter-threshold](../reference/schemas/schemas.html#post-filter-threshold). See [Controlling the filtering behavior with approximate nearest neighbor search](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#controlling-the-filtering-behavior-with-approximate-nearest-neighbor-search) for more details.
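For illustration, a rank profile can set these thresholds explicitly - a sketch using the `text_embedding` field from the schema above, with example values only:

```
rank-profile filtered_similarity {
    first-phase {
        expression: closeness(field, text_embedding)
    }
    approximate-threshold: 0.05
    post-filter-threshold: 1.0
}
```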
Note that when using `pre-filtering` the following query operators are not included when evaluating the filter part of the query: - [geoLocation](../reference/querying/yql.html#geolocation) - [predicate](../reference/querying/yql.html#predicate) These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`. This might cause the search to expose fewer hits to ranking than the wanted `targetHits`. Since Vespa 8.78 the `pre-filter` can be evaluated using[multiple threads per query](../performance/practical-search-performance-guide.html#multithreaded-search-and-ranking). This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant. Note that searching the `HNSW` index is always single-threaded per query. Multithreaded evaluation when using `post-filtering` has always been supported, but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `targetHits`. ## Nearest Neighbor Search Considerations - **targetHits**: The [targetHits](../reference/querying/yql.html#targethits)specifies how many hits one wants to expose to [ranking](../basics/ranking.html) _per content node_. Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](../ranking/phased-ranking.html)pipeline. See [performance sizing](../performance/sizing-search.html). - **Pagination**: Pagination uses the standard [hits](../reference/api/query.html#hits) and [offset](../reference/api/query.html#offset) query api parameters. There is no caching of results in between pagination requests, so a query for a higher `offset` will cause the search to be performed over again. This aspect is no different from [sparse search](../ranking/wand.html) not using nearest neighbor query operator. - **Total hit count is not accurate**: Technically, all vectors in the searchable index are neighbors. There is no strict boundary between a match and no match. Both exact (`approximate:false`) and approximate (`approximate:true`) usages of the [nearestNeighbor](../reference/querying/yql.html#nearestneighbor) query operator does not produce an accurate `totalCount`. This is the same behavior as with sparse dynamic pruning search algorithms like[weakAnd](../reference/querying/yql.html#weakand) and [wand](../reference/querying/yql.html#wand). - **Grouping** counts are not accurate: Grouping counts from [grouping](grouping.html) are not accurate when using [nearestNeighbor](../reference/querying/yql.html#nearestneighbor)search. This is the same behavior as with other dynamic pruning search algorithms like[weakAnd](../reference/querying/yql.html#weakand) and[wand](../reference/querying/yql.html#wand). See the [Result diversification](https://blog.vespa.ai/result-diversification-with-vespa/) blog post on how grouping can be combined with nearest neighbor search. ## Scaling Approximate Nearest Neighbor Search ### Memory Vespa tensor fields are [in-memory](../content/attributes.html) data structures and so is the `HNSW` graph data structure. For large vector datasets the primary memory resource usage relates to the raw vector field memory usage. Using lower tensor cell type precision can reduce memory footprint significantly, for example using `bfloat16` instead of `float` saves close to 50% memory usage without significant accuracy loss. Vespa [tensor cell value types](../performance/feature-tuning.html#cell-value-types) include: - `int8` - 1 byte per value. 
Also used to represent [packed binary values](../rag/binarizing-vectors.html). - `bfloat16` - 2 bytes per value. See [bfloat16 floating-point format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format). - `float` - 4 bytes per value. Standard float. - `double` - 8 bytes per value. Standard double. ### Search latency and document volume The `HNSW` greedy search algorithm is sublinear (close to log(N) where N is the number of vectors in the graph). This has interesting properties when attempting to add more nodes horizontally using [flat data distribution](../performance/sizing-search.html#data-distribution). Even if the document volume per node is reduced by a factor of 10, the search latency is only reduced by 50%. Still, flat scaling helps scale document volume, and increasing indexing throughput as vectors are partitioned randomly over a set of nodes. Pure vector search applications (without filtering, or re-ranking) should attempt to scale up document volume by using larger instance type and maximize the number of vectors per node. To scale with query throughput, use [grouped data distribution](../performance/sizing-search.html#data-distribution) to replicate content. Note that strongly sublinear search is not necessarily true if the application uses nearest neighbor search for candidate retrieval in a [multiphase ranking](../phased-ranking.html) pipeline, or combines nearest neighbor search with filters. ## HNSW Operations Changing the [distance-metric](../reference/schemas/schemas.html#distance-metric)for a tensor field with `hnsw` index requires [restarting](../reference/schemas/schemas.html#changes-that-require-restart-but-not-re-feed), but not re-indexing (re-feed vectors). Similar, changing the `max-links-per-node` and`neighbors-to-explore-at-insert` construction parameters requires re-starting. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Using Vespa's approximate nearest neighbor search](#using-vespas-approximate-nearest-neighbor-search) - [Indexing throughput](#indexing-throughput) - [Memory usage](#memory-usage) - [Accuracy](#accuracy) - [Using approximate nearest neighbor search](#using-approximate-nearest-neighbor-search) - [Combining approximate nearest neighbor search with filters](#combining-approximate-nearest-neighbor-search-with-filters) - [Nearest Neighbor Search Considerations](#nearest-neighbor-search-considerations) - [Scaling Approximate Nearest Neighbor Search](#scaling-approximate-nearest-neighbor-search) - [Memory](#memory) - [Search latency and document volume](#search-latency-and-document-volume) - [HNSW Operations](#hnsw-operations) --- # Source: https://docs.vespa.ai/en/operations/kubernetes/architecture.html.md # Architecture ![Vespa Operator Architecture](/assets/img/vespa-operator-architecture.png) The Vespa Operator is an implementation of the [Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) that extends Kubernetes with custom orchestration capabilities for Vespa. It relies on a [Custom Resource Definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) called a `VespaSet`, which represents a quorum of [ConfigServers](https://docs.vespa.ai/en/operations/self-managed/configuration-server.html) in a Kubernetes namespace. The Vespa Operator is responsible for the deployment and lifecycle of the `VespaSet` resource and its ConfigServers, which collectively entails the infrastructure for Vespa on Kubernetes. 
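Once the operator is installed, the `VespaSet` resources can be inspected with standard Kubernetes tooling - a sketch, assuming the custom resource is exposed under these names and the namespace is illustrative:

```
$ kubectl get vespasets -A
$ kubectl describe vespaset vespa -n vespa-system
```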
[Application Packages](https://docs.vespa.ai/en/basics/applications.html) are deployed to the [ConfigServers](https://docs.vespa.ai/en/operations/self-managed/configuration-server.html) to create Vespa applications. The ConfigServers will dynamically instantiate the services as individual Pods based on the settings provided in the Application Package. After an Application Package is deployed, the ConfigServers will remain responsible for the management and lifecycle of the Vespa application. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/archive/archive-guide-aws.html.md # AWS Archive guide Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in AWS S3. This guide explains how to access this data. Access to the data must happen through an AWS account controlled by the tenant. Data traffic to access this data is charged to this AWS account. These resources are needed to get started: - An AWS account - An IAM Role in that AWS account - The [AWS command line client](https://aws.amazon.com/cli/) Access is configured through the Vespa Cloud Console in the tenant account screen. Choose the "archive" tab to see the settings below. ## Register IAM Role ![Authorize IAM Role](/assets/img/archive-1-aws.png) First, the IAM Role must be granted access to the S3 buckets in Vespa Cloud. This is done by entering the IAM Role in the setting seen above. Vespa Cloud will then grant access to that role to the S3 buckets. ## Grant access to Vespa Cloud resources ![Allow access to IAM Role](/assets/img/archive-2-aws.png) Second, the IAM Role must be granted access to resources inside Vespa Cloud. AWS requires both permissions to be registered in both Vespa Cloud's AWS account (step 1) and the tenant's AWS account (step 2). Copy the policy from the user interface and attach it to the IAM Role - or make your own equivalent policy should you have other requirements. For more information, see the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html). ## Access files using AWS CLI ![Download files](/assets/img/archive-3-aws.png) Once permissions have been granted, the IAM Role can access the contents of the archive buckets. Any AWS S3 client will work, but the AWS command line client is an easy tool to use. The settings page will list all buckets where data is stored, typically one bucket per zone the tenant has applications. The `--request-payer=requester` parameter is mandatory to make sure network traffic is charged to the correct AWS account. Refer to [access-log-lambda](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/access-log-lambda/README.md)for how to install and use `aws cli`, which can be used to download logs as in the illustration, or e.g. list objects: ``` $ aws s3 ls --profile=archive --request-payer=requester \ s3://vespa-cloud-data-prod.aws-us-east-1c-9eb633/vespa-team/ PRE album-rec-searcher/ PRE cord-19/ PRE vespacloud-docsearch/ ``` In the example above, the S3 bucket name is _vespa-cloud-data-prod.aws-us-east-1c-9eb633_and the tenant name is _vespa-team_ (for that particular prod zone). 
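The `--profile=archive` flag in the example above assumes an AWS CLI profile configured to assume the registered IAM Role - a minimal sketch, where the account id, role name and region are placeholders:

```
# ~/.aws/config
[profile archive]
role_arn = arn:aws:iam::123456789012:role/vespa-archive-access
source_profile = default
region = us-east-1
```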
Archiving is per tenant, and a log file is normally stored with a key like: ``` /vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst ``` The URI to this object is hence: ``` s3://vespa-cloud-data-prod.aws-us-east-1c-9eb633/vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst ``` Objects are exported once generated - access log files are compressed and exported at least once per hour. If you are having problems accessing the files, please run ``` aws sts get-caller-identity ``` to verify that you are correctly assuming the role which has been granted access. ## Lambda processing When processing logs using a lambda function, write a minimal function to list objects, to sort out access / keys / roles: ``` const aws = require("aws-sdk"); const s3 = new aws.S3({ apiVersion: "2006-03-01" }); const findRelevantKeys = ({ Bucket, Prefix }) => { console.log(`Finding relevant keys in bucket ${Bucket}`); return s3 .listObjectsV2({ Bucket: Bucket, Prefix: Prefix, RequestPayer: "requester" }) .promise() .then((res) => res.Contents.map((content) => ({ Bucket, Key: content.Key })) ) .catch((err) => Error(err)); }; exports.handler = async (event, context) => { const options = { Bucket: "vespa-cloud-data-prod.aws-us-east-1c-9eb633", Prefix: "MY-TENANT-NAME/" }; return findRelevantKeys(options) .then((res) => { console.log("response: ", res); return { statusCode: 200 }; }) .catch((err) => ({ statusCode: 500, message: err })); }; ``` Note: Always set `RequestPayer: "requester"` to access the objects - transfer cost is assigned to the requester. Once the above lists the log files from S3, review [access-log-lambda](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/access-log-lambda/README.md)for how to write a function to decompress and handle the log data. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/archive/archive-guide-gcp.html.md # GCP Archive guide Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in Google Cloud Storage. This guide explains how to access this data. Access to the data is through a GCP project controlled by the tenant. Data traffic to access this data is charged to this GCP project. These resources are needed to get started: - A GCP project - A Google user account - The [gcloud command line interface](https://cloud.google.com/sdk/docs/install) Access is configured through the Vespa Cloud Console in the tenant account screen. Choose the "archive" tab, then "GCP" tab to see the settings below. ## Register IAM principal ![Register IAM principal](/assets/img/archive-1-gcp.png) First, a principal must be granted access to the Cloud Storage bucket in Vespa Cloud. This is done by entering a [principal](https://cloud.google.com/iam/docs/overview) with a supported prefix. See the accepted format in the description below the input field. ## Access files using Gcloud CLI ![Download files](/assets/img/archive-2-gcp.png) Once permissions have been granted, the GPC member can access the contents of the archive buckets. Any Cloud Storage client will work, but the `gsutil` command line client is an easy tool to use. The settings page will list all buckets where data is stored, typically one bucket per zone the tenant has applications. The `-u user-project` parameter is mandatory to make sure network traffic is charged to the correct GCP project. 
``` $ gsutil -u my-project ls \ gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/ gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/album-rec-searcher/ gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/cord-19/ gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/ ``` In the example above, the bucket name is _vespa-cloud-data-prod.gcp-us-central1-f-73770f_and the tenant name is _vespa-team_ (for that particular prod zone). Archiving is per tenant, and a log file is normally stored with a key like: ``` /vespa-team/vespacloud-docsearch/default/h7644a/logs/access/JsonAccessLog.20221011080000.zst ``` The URI to this object is hence: ``` gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst ``` Objects are exported once generated - access log files are compressed and exported at least once per hour. Note: Always set a user project to access the objects - transfer cost is assigned to the requester. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/archive/archive-guide.html.md # Archive guide Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to storage buckets. The bucket system used will depend on which cloud provider is backing the zone your application is running in. AWS S3 will be used in the AWS zones, and Cloud Storage will be used in the GCP zones. How to access and use the storage buckets is found in the documentation for the respective cloud providers: - [AWS S3](archive-guide-aws) - [Google Cloud Storage](archive-guide-gcp) ## Examples These examples use GCP as source, replace with AWS commands as needed. Here, _resonant-triode-123456_ is the Google project ID that owns the target bucket _my\_access\_logs_ for data copy (and will get the data download cost, if any). Use the CLUSTERS view in the Vespa Cloud Console to find hostname(s) for the nodes to export logs from - then list contents: ``` $ gsutil -u resonant-triode-123456 ls \ gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/ $ gsutil -u resonant-triode-123456 ls \ gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/myinstance $ gsutil -u resonant-triode-123456 ls \ gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/myinstance/h404a/logs/access ``` Copy files for a host to the _my\_access\_logs_ bucket: ``` $ gsutil -u resonant-triode-123456 \ -m -o "GSUtil:parallel_process_count=1" \ cp -r \ gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \ gs://my_access_logs/vespa-files ``` `rsync` can be used to reduce number of files copied, using `-x` to exclude paths: ``` $ gsutil -u resonant-triode-123456 \ -m -o "GSUtil:parallel_process_count=1" \ rsync -r \ -x '.*/connection/.*|.*/vespa/.*|.*/zookeeper/.*' \ gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \ gs://my_access_logs/vespa-files ``` Refer to [cloud-functions](https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions)and [lambda](https://github.com/vespa-engine/sample-apps/tree/master/examples/aws/lambda)for how to write and deploy simple functions to process files in Google Cloud and AWS. 
For local processing, copy files for a host to local file system (or use `rsync`): ``` $ gsutil -u resonant-triode-123456 \ -m -o "GSUtil:parallel_process_count=1" \ cp -r \ gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \ . ``` Use [zstd](https://facebook.github.io/zstd/) to decompress files: ``` $ zstd -d * ``` Example: Filter out healthchecks using [jq](https://stedolan.github.io/jq/): ``` $ cat JsonAccessLog.20230117* | jq '. | select (.uri != "/status.html") | select (.uri != "/state/v1/metrics") | select (.uri != "/state/v1/health")' ``` Add a human-readable date field per access log entry: ``` $ cat JsonAccessLog.20230117* | jq '. | select (.uri != "/status.html") | select (.uri != "/state/v1/metrics") | select (.uri != "/state/v1/health") | . +{iso8601date:(.time|todateiso8601)}' ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/enclave/archive.html.md # Log archive in Vespa Cloud Enclave **Warning:** The structure of log archive buckets may change without notice After Vespa Cloud Enclave is established in your cloud provider account using Terraform, the module will have created a storage bucket per Vespa Cloud zone you configured in your enclave. These storage buckets are used to archive logs from the machines that run Vespa inside your account. There will be one storage bucket per Vespa Cloud Zone that is configured in the enclave. The name of the bucket will depend on the cloud provider you are setting up the enclave in. Files are synchronized to the archive bucket when the file is rotated by the logging system, or when a virtual machine is deprovisioned from the application. The consequence of this is that frequency of uploads will depend on the activity of the Vespa application. ## Directory structure The directory structure in the bucket is as follows: ``` ////logs// ``` - `tenant` is the tenant ID. - `application` is the application ID that generated the log. - `instance` is the instance ID of the generated log, e.g. `default`. - `host` is the name prefix of the host that generated the log, e.g. `e103a`. - `logtype` is the type of log in the directory (see below). - `logfile` is the specific file of the log. ## Log types There are three log types that are synced to this bucket. - `vespa`: [Vespa logs](../../reference/operations/log-files.html) - `access`: [Access logs](../access-logging.html) - `connection`: [Connection logs](../access-logging.html#connection-log) Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/content/attributes.html.md # Attributes An _attribute_ is a [schema](../reference/schemas/schemas.html#attribute) keyword, specifying the indexing for a field: ``` field price type int { indexing: attribute } ``` Attribute properties and use cases: - Flexible [match modes](../reference/schemas/schemas.html#match) including exact match, prefix match, and case-sensitive matching, but not text matching (tokenization and linguistic processing). - High sustained update rates (avoiding read-apply-write patterns). Any mutating operation against an attribute field is written to Vespa's [transaction log](proton.html#transaction-log) and persisted, but appending to the log is sequential access, not random. Read more in [partial updates](../writing/partial-updates.html). - Instant query updates - values are immediately searchable. - [Document Summaries](../querying/document-summaries.html) are memory-only operations if all fields are attributes. 
- [Numerical range queries](../reference/querying/yql.html#numeric). ``` where price > 100 ``` - [Grouping](../querying/grouping.html) - aggregate results into groups - it is also great for generating diversity in results. ``` all(group(customer) each(max(3) each(output(summary())))) ``` - [Ranking](../basics/ranking.html) - use attribute values directly in rank functions. ``` rank-profile rank_fresh { first-phase { expression { freshness(timestamp) } } } ``` - [Sorting](../reference/querying/sorting-language.html) - order results by attribute value. ``` order by price asc, release_date desc ``` - [Parent/child](../schemas/parent-child.html) - import attribute values from global parent documents. ``` import field advertiser_ref.name as advertiser_name {} ``` The other field option is _index_ - use [index](proton.html#index) for fields used for [text search](../querying/text-matching.html), with [stemming](../linguistics/linguistics-opennlp.html#stemming) and [normalization](../linguistics/linguistics-opennlp.html#normalization). An attribute is an in-memory data structure. Attributes speed up query execution and [document updates](../writing/partial-updates.html), trading off memory. As data structures are regularly optimized, consider both static and temporary resource usage - see [attribute memory usage](#attribute-memory-usage) below. Use attributes in document summaries to limit access to storage when generating result sets. ![Attribute is an in-memory data structure](/assets/img/attributes-update.svg) Configuration overview: | fast-search | Also see the [reference](../reference/schemas/schemas.html#attribute). Add an [index structure](#index-structures) to improve query performance: ``` field titles type array<string> { indexing: summary | attribute attribute: fast-search } ``` | | fast-access | For high-throughput updates, all nodes with a replica should have the attribute loaded in memory. Depending on replication factor and other configuration, this is not always the case. Use [fast-access](../reference/schemas/schemas.html#attribute) to increase feed rates by having replicas on all nodes in memory - see the [reference](../reference/schemas/schemas.html#attribute) and [sizing feeding](../performance/sizing-feeding.html). ``` field titles type array<string> { indexing: summary | attribute attribute: fast-access } ``` | | distance-metric | Features like [nearest neighbor search](../querying/nearest-neighbor-search) require a [distance-metric](../reference/schemas/schemas.html#distance-metric), and can also have an `hnsw` index to speed up queries. Read more in [approximate nearest neighbor](../querying/approximate-nn-hnsw). Pay attention to the field's `index` setting to enable the index: ``` field image_sift_encoding type tensor(x[128]) { indexing: summary | attribute | index attribute { distance-metric: euclidean } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 500 } } } ``` | ## Data structures The attribute field's data type decides which data structures are used by the attribute to store values for that field across all documents on a content node. For some data types, a combination of data structures is used: - _Attribute Multivalue Mapping_ stores arrays of values for array and weighted set types. - _Attribute Enum Store_ stores unique strings for all string attributes and unique values for attributes with [fast-search](attributes.html#fast-search). - _Attribute Tensor Store_ stores tensor values for all tensor attributes.
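As an illustration of how field types map onto these stores (field names are examples):

```
field price type int { indexing: attribute }           # single fixed-size value in the attribute vector
field tags type array<string> { indexing: attribute }  # arrays in the Multivalue Mapping, unique strings in the Enum Store
field embedding type tensor<float>(x[128]) { indexing: attribute }  # tensor values in the Tensor Store
```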
In the following illustration, a row represents a document, while a named column represents an attribute. ![Attribute in-memory stores](/assets/img/attributes.svg) Attributes can be: | Type | Size | Description | | --- | --- | --- | | Single-valued | Fixed | Like the "A" attribute, example `int`. The element size is the size of the type, like 4 bytes for an integer. A memory buffer (indexed by Local ID) holds all values directly. | | Multi-valued | Fixed | Like the "B" attribute, example `array`. A memory buffer (indexed by Local ID) is holding references (32 bit) to where in the _Multivalue Mapping_ the arrays are stored. The _Multivalue Mapping_ consists of multiple memory buffers, where arrays of the same size are co-located in the same buffer. | | Multi-valued | Variable | Like the "B" attribute, example `array`. A memory buffer (indexed by Local ID) is holding references (32 bit) to where in the _Multivalue Mapping_ the arrays are stored. The unique strings are stored in the _Enum Store_, and the arrays in the _Multivalue Mapping_ stores the references (32 bit) to the strings in the _Enum Store_. The _Enum Store_ consists of multiple memory buffers. | | Single-valued | Variable | Like the "C" attribute, example `string`. A memory buffer (indexed by Local ID) is holding references (32 bit) to where in the _Enum Store_ the strings are stored. | | Tensor | Fixed / Variable | Like the "D" attribute, example `tensor(x{},y[64])`. A memory buffer (indexed by Local ID) is holding references (32 bit) to where in the _Tensor Store_ the tensor values are stored. The memory layout in the _Tensor Store_ depends on the tensor type. | The "A", "B", "C" and "D" attribute memory buffers have attribute values or references in Local ID (LID) order - see [document meta store](#document-meta-store). When updating an attribute, the full value is written. This also applies to [multivalue](../basics/schemas.html#document-fields) fields - example adding an item to an array: 1. Space for the new array is reserved in a memory buffer 2. The current value is copied 3. The new element is written This means that larger fields will copy more data at updates. It also implies that updates to [weighted sets](../reference/schemas/schemas.html#weightedset) are faster when using numeric keys (less memory and easier comparisons). Data stored in the _Multivalue Mapping_, _Enum Store_ and _Tensor Store_ is referenced using 32 bit references. This address space can go full, and then feeding is blocked - [learn more](../writing/feed-block.html). For array or weighted set attributes, the max limit on the number of documents that can have the same number of values is approx 2 billion per node. For string attributes or attributes with [fast-search](attributes.html#fast-search), the max limit on the number of unique values is approx 2 billion per node. ## Index structures Without `fast-search`, attribute access is a memory lookup, being one value or all values, depending on query execution. An attribute is a linear array-like data structure - matching documents potentially means scanning _all_ attribute values. Setting [fast-search](../reference/schemas/schemas.html#attribute) creates an index structure for quicker lookup and search. This consists of a [dictionary](../reference/schemas/schemas.html#dictionary) pointing to posting lists. This uses more memory, and also more CPU when updating documents. It increases steady state memory usage for all attribute types and also add initialization overhead for numeric types. 
The default dictionary is a b-tree of attribute _values_, pointing to an _occurrence_ b-tree (posting list) of local doc IDs for each value, exemplified in the A-attribute below. Using `dictionary: hash` on the attribute generates a hash table of attributes values pointing to the posting lists, as in the C-attribute (short posting lists are represented as arrays instead of b-trees): ![Attribute index structures](/assets/img/attributes-indexes.svg) Notes: - If a value occurs in many documents, the _occurrence_ b-tree grows large. For such values, a boolean-occurrence list (i.e. bitvector) is generated in addition to the b-tree. - Setting `fast-search` is not observable in the files on disk, other than size. - `fast-search` causes a memory increase even for empty fields, due to the extra index structures created. E.g. single value fields will have the "undefined value" when empty, and there is a posting list for this value. - The _value_ b-tree enables fast range-searches in numerical attributes. This is also available for `hash`-based dictionaries, but slower as a full scan is needed. Using `fast-search` has many implications, read more in [when to use fast-search](../performance/feature-tuning.html#when-to-use-fast-search-for-attribute-fields). ## Attribute memory usage Attribute structures are regularly optimized, and this causes temporary resource usage - read more in [maintenance jobs](proton.html#proton-maintenance-jobs). The memory footprint of an attribute depends on a few factors, data type being the most important: - Numeric (int, long, byte, and double) and Boolean (bit) types - fixed length and fix cost per document - String type - the footprint depends on the length of the strings and how many unique strings that needs to be stored. Collection types like array and weighted sets increases the memory usage some, but the main factor is the average number of values per document. String attributes are typically the largest attributes, and requires most memory during initialization - use boolean/numeric types where possible. Example, refer to formulas below: ``` schema foo { document bar { field titles type array { indexing: summary | attribute } } } ``` - Assume average 10 values per document, average string length 15, 100k unique strings and 20M documents. - Steady state memory usage is approx 1 GB (20M\*4\*(6/5) + 20M\*10\*4\*(6/5) + 100k\*(15+1+4+4)\*(6/5)). - During initialization (loading attribute from disk) an additional 2.4 GB is allocated (20M\*10\*(4+4+4), for each value: - local document id - enum value - weight - Increasing the average number of values per document to 20 (double) will also double the memory footprint during initialization (4.8 GB). When doing the capacity planning, keep in mind the maximum footprint, which occurs during initialization. For the steady state footprint, the number of unique values is important for string attributes. Check the [Example attribute sizing spreadsheet](../../assets/attribute-memory-Vespa.xls), with various data types and collection types. It also contains estimates for how many documents a 48 GB RAM node can hold, taking initialization into account. [Multivalue](../basics/schemas.html#document-fields) attributes use an adaptive approach in how data is stored in memory, and up to 2 billion documents per node is supported. **Pro-tip:** The proton _/state/v1/_ interface can be explored for attribute memory usage. 
This is an undocumented debug-interface, subject to change at any moment - example: _http://localhost:19110/state/v1/custom/component/documentdb/music/subdb/ready/attribute/artist_ ## Attribute file usage Attribute data is stored in two locations on disk: - The attribute store in memory, which is regularly flushed to disk. At startup, the flushed files are used to quickly populate the memory structures, resulting in a much quicker startup compared to generating the attribute store from the source in the document store. - The document store on disk. Documents here are used to (re)generate index structures, as well as being the source for replica generation across nodes. The different field types use various data types for storage, see below, a conservative rule of thumb for steady-state disk usage is hence twice the data size. ## Sizing Attribute sizing is not an exact science but rather an approximation. The reason is that they vary in size. Both the number of documents, number of values, and uniqueness of the values are variable. The components of the attributes that occupy memory are: | Abbreviation | Concept | Comment | | --- | --- | --- | | D | Number of documents | Number of documents on the node, or rather the maximum number of local document ids allocated | | V | Average number of values per document | Only applicable for arrays and weighted sets | | U | Number of unique values | Only applies for strings or if [fast-search](../reference/schemas/schemas.html#attribute) is set | | FW | Fixed data width | sizeof(T) for numerics, 1 byte for strings, 1 bit for boolean | | WW | Weight width | Width of the weight in a weighted set, 4 bytes. 0 bytes for arrays. | | EIW | Enum index width | Width of the index into the enum store, 4 bytes. Used by all strings and other attributes if [fast-search](../reference/schemas/schemas.html#attribute) is set | | VW | Variable data width | strlen(s) for strings, 0 bytes for the rest | | PW | Posting entry width | Width of a posting list entry, 4 bytes for singlevalue, 8 bytes for array and weighted sets. Only applies if [fast-search](../reference/schemas/schemas.html#attribute) is set. | | PIW | Posting index width | Width of the index into the store of posting lists; 4 bytes | | MIW | Multivalue index width | Width of the index into the multivalue mapping; 4 bytes | | ROF | Resize overhead factor | Default is 6/5. This is the average overhead in any dynamic vector due to resizing strategy. Resize strategy is 50% indicating that structure is 5/6 full on average. | ### Components | Component | Formula | Approx Factor | Applies to | | --- | --- | --- | --- | | Document vector | D \* ((FW or EIW) or MIW) | ROF | FW for singlevalue numeric attributes and MIW for multivalue attributes. EIW for singlevalue string or if the attribute is singlevalue fast-search | | Multivalue mapping | D \* V \* ((FW or EIW) + WW) | ROF | Applicable only for array or weighted sets. EIW if string or fast-search | | Enum store | U \* ((FW + VW) + 4 + ((EIW + PIW) or EIW)) | ROF | Applicable for strings or if fast-search is set. (EIW + PIW) if fast-search is set, EIW otherwise. 
| | Posting list | D \* V \* PW | ROF | Applicable if fast-search is set | ### Variants | Type | Components | Formula | | --- | --- | --- | | Numeric singlevalue plain | Document vector | D \* FW \* ROF | | Numeric multivalue value plain | Document vector, Multivalue mapping | D \* MIW \* ROF + D \* V \* (FW+WW) \* ROF | | Numeric singlevalue fast-search | Document vector, Enum store, Posting List | D \* EIW \* ROF + U \* (FW+4+EIW+PIW) \* ROF + D \* PW \* ROF | | Numeric multivalue value fast-search | Document vector, Multivalue mapping, Enum store, Posting List | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+4+EIW+PIW) \* ROF + D \* V \* PW \* ROF | | Singlevalue string plain | Document vector, Enum store | D \* EIW \* ROF + U \* (FW+VW+4+EIW) \* ROF | | Singlevalue string fast-search | Document vector, Enum store, Posting List | D \* EIW \* ROF + U \* (FW+VW+4+EIW+PIW) \* ROF + D \* PW \* ROF | | Multivalue string plain | Document vector, Multivalue mapping, Enum store | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+VW+4+EIW) \* ROF | | Multivalue string fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+VW+4+EIW+PIW) \* ROF + D \* V \* PW \* ROF | | Boolean singlevalue | Document vector | D \* FW \* ROF | ## Paged attributes Regular attribute fields are guaranteed to be in-memory, while the [paged](../reference/schemas/schemas.html#attribute) attribute setting allows paging the attribute data out of memory to disk. The `paged` setting is _not_ supported for the following types: - [tensor](../reference/schemas/schemas.html#tensor) with [fast-rank](../reference/schemas/schemas.html#attribute). - [predicate](../reference/schemas/schemas.html#predicate). For attribute fields using [fast-search](../reference/schemas/schemas.html#attribute), the memory needed for dictionary and index structures are never paged out to disk. Using the `paged` setting for attributes is an alternative when there are memory resource constraints and the attribute data is only accessed by a limited number of hits per query during ranking. E.g. a dense tensor attribute which is only used during a [re-ranking phase](../ranking/phased-ranking.html), where the number of attribute accesses are limited by the re-ranking phase count. For example using a second phase [rerank-count](../reference/schemas/schemas.html#secondphase-rerank-count) of 100 will limit the maximum number of page-ins/disk access per query to 100. Running at 100 QPS would need up to 10K disk accesses per second. This is the worst case if none of the accessed attribute data were paged into memory already. This depends on access locality and memory pressure (size of the attribute data versus available memory). In this example, we have a dense tensor with 1024 [int8](../reference/ranking/tensor.html#tensor-type-spec) values. The tensor attribute is only accessed during re-ranking (second-phase ranking expression): ``` schema foo { document foo { field tensordata type tensor(x[1024]) { indexing: attribute attribute: paged } } rank-profile foo { first-phase {} second-phase { rerank-count: 100 expression: sum(attribute(tensordata)) } } } ``` For some use cases where serving latency SLA is not strict and query throughput is low, the `paged` attribute setting might be a tuning alternative, as it allows storing more data per node. 
### Paged attributes disadvantages The disadvantages of using _paged_ attributes are many: - Unpredictable query latency as attribute access might touch disk. Limited queries per second throughput per node (depends on the locality of document re-ranking requests). - Paged attributes are implemented by file-backed memory mappings. The performance depends on the [Linux virtual memory management](https://tldp.org/LDP/tlk/mm/memory.html) ability to page data in and out. Using many threads per search/high query throughput might cause high system (kernel) CPU and system unresponsiveness. - The content node's total memory utilization will be close to 100% when using paged attributes. It's up to the Linux kernel to determine what part of the attribute data is paged into memory based on access patterns. A good understanding of how the Linux virtual memory management system works is recommended before enabling paged attributes. - The[memory usage metrics](/en/performance/sizing-search.html#metrics-for-vespa-sizing)from content nodes are not reflecting the reality when using paged attributes. They can indicate a usage that is much higher than the available memory on the node. This is because attribute memory usage is reported as the amount of data contained in the attribute, and whether this data is paged out to disk is controlled by the Linux kernel. - Using paged attributes doubles the disk usage of attribute data. For example if the original attribute size is 92 GB (100M documents of the above 1024 int8 per document schema), using the `paged` setting will double the attribute disk usage to close to 200 GB. - Changing the `paged` setting (e.g. removing the option) on a running system might cause hard out-of-memory situations as without `paged`, the content nodes will attempt loading the attribute into memory without the option for page outs. - Using a paged attribute in [first-phase](../ranking/phased-ranking.html) ranking can result in extremely high query latency if a large amount of the corpus is retrieved by the query. The number of disk accesses will, in the worst case, be equal to the number of hits the query produces. A similar problem can occur if running a query that searches a paged attribute. - Using `paged` in combination with [HNSW indexing](../querying/approximate-nn-hnsw) is _strongly_ discouraged._HNSW_ indexing also searches and reads tensors during indexing, causing random access during feeding. Once the system memory usage reaches 100%, the Linux kernel will start paging pages in and out of memory. This can cause a high system (kernel) CPU and slows down HNSW indexing throughput significantly. ## Mutable attributes [Mutable attributes](../reference/schemas/schemas.html#mutate) is document metadata for matching and ranking performance per document. The attribute values are mutated as part of query execution, as defined in rank profiles - see [rank phase statistics](../ranking/phased-ranking.html#rank-phase-statistics) for details. ## Document meta store The document meta store is an in-memory data structure for all documents on a node. It is an _implicit attribute_, and is [compacted](proton.html#lid-space-compaction) and [flushed](proton.html#attribute-flush). Memory usage for applications with small documents / no other attributes can be dominated by this attribute. The document meta store scales linearly with number of documents - using approximately 30 bytes per document. 
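For example, at roughly 30 bytes per document, a node holding 100 million documents spends about 100M × 30 bytes ≈ 3 GB on the document meta store alone.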
The metric _content.proton.documentdb.ready.attribute.memory\_usage.allocated\_bytes_ for `"field": "[documentmetastore]"` is the size of the document meta store in memory - use the [metric API](../reference/api/state-v1.html#state-v1-metrics) to find the size - in this example, the node has 9M ready documents with 52 bytes in memory per document: ``` { "name": "content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes", "description": "The number of allocated bytes", "values": { "average": 4.69736008E8, "count": 12, "rate": 0.2, "min": 469736008, "max": 469736008,"last": 469736008}, "dimensions": { "documenttype": "doctype","field": "[documentmetastore]"} }, ``` The above is for the _ready_ documents, also check _removed_ and _notready_ - refer to [sub-databases](proton.html#sub-databases). Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Data structures](#data-structures) - [Index structures](#index-structures) - [Attribute memory usage](#attribute-memory-usage) - [Attribute file usage](#attribute-file-usage) - [Sizing](#sizing) - [Components](#components) - [Variants](#variants) - [Paged attributes](#paged-attributes) - [Paged attributes disadvantages](#paged-attributes-disadvantages) - [Mutable attributes](#mutable-attributes) - [Document meta store](#document-meta-store) --- # Source: https://docs.vespa.ai/en/operations/automated-deployments.html.md # Automated Deployments ![Picture of an automated deployment](/assets/img/automated-deployments-overview.png) Vespa Cloud provides: - A [CD test framework](#cd-tests) for safe deployments to production zones. - [Multi-zone deployments](#deployment-orchestration) with orchestration and test steps. This guide goes through details of an orchestrated deployment. Read / try [production deployment](production-deployment.html) first to have a baseline understanding. The [developer guide](../applications/developer-guide.html) is useful for writing tests. Use [example GitHub Actions](#automating-with-github-actions) for automation. ## CD tests Before deployment in production zones, [system tests](#system-tests) and [staging tests](#staging-tests) are run. Tests are run in a dedicated and [downsized](environments.html) environment. These tests are optional, see details in the sections below. Status and logs of ongoing tests can be found in the _Deployment_ view in the [Vespa Cloud Console](https://console.vespa-cloud.com/): ![Minimal deployment pipeline](/assets/img/deployment-with-system-test.png) These tests are also run during [Vespa Cloud upgrades](#vespa-cloud-upgrades). Find deployable example applications in [CI-CD](https://github.com/vespa-cloud/examples/tree/main/CI-CD). ### System tests When a system test is run, the application is deployed in the [test environment](environments.html#test). The system test suite is then run against the endpoints of the test deployment. The test deployment is empty when the test execution begins. The application package and Vespa platform version is the same as that to be deployed to production. A test suite includes at least one [system test](../applications/testing.html#system-tests). An application can be deployed to a production zone without system tests - this step will then only test that the application starts successfully. See [production deployment](production-deployment.html) for an example without tests. Read more about [system tests](../applications/testing.html#system-tests). 
### Staging tests A staging test verifies the transition of a deployment of a new application package - i.e., from application package `Appold` to `Appnew`. A test suite includes at least one [staging setup](../applications/testing.html#staging-tests), and [staging test](../applications/testing.html#staging-tests). 1. All production zone deployments are polled for the current versions. As there can be multiple versions already being deployed (i.e. multiple `Appold`), there can be a series of staging test runs. 2. The application at revision `Appold` is deployed in the [staging environment](environments.html#staging). 3. The staging setup test code is run, typically making the cluster reasonably similar to a production cluster. 4. The test deployment is then upgraded to application revision `Appnew`. 5. Finally, the staging test code is run, to verify the deployment works as expected after the upgrade. An application can be deployed to a production zone without staging tests - this step will then only test that the application starts successfully before and after the change. See [production deployment](production-deployment.html) for an example without tests. Read more about [staging tests](../applications/testing.html#staging-tests). ### Disabling tests To deploy without testing, remove the test files from the application package. Tests are always run, regardless of _deployment.xml_. To temporarily deploy without testing, run `deploy` and hit the "Abort" button (see illustration above, hover over the test step in the Console) - this skips the test step and makes the orchestration progress to the next step. ### Running tests only To run a system test, without deploying to any nodes after, add a new test instance. In _deployment.xml_, add the instance without `dev` or`prod` elements, like: ``` ``` ... ``` ``` Note that this will leave an empty instance in the console, as the deployment is for testing only, so no resources deployed to after test. Make sure to run `vespa prod deploy` to invoke the pipeline for testing, and use a separate application for this test. ## Deployment orchestration The _deployment orchestration_ is flexible. One can configure dependencies between deployments to production zones, production verification tests, and configured delays; by ordering these in parallel and serial blocks of steps: ![Picture of a complex automated deployment](/assets/img/automated-deployments-complex.png) On a higher level, instances can also depend on each other in the same way. This makes it easy to configure a deployment process which gradually rolls out changes to increasingly larger subsets of production nodes, as confidence grows with successful production verification tests. Refer to [deployment.xml](../reference/applications/deployment.html) for details. Deployments run sequentially by default, but can be configured to [run in parallel](../reference/applications/deployment.html). Inside each zone, Vespa Cloud orchestrates the deployment, such that the change is applied without disruption to read or write traffic against the application. A production deployment in a zone is complete when the new configuration is active on all nodes. Most changes are instant, making this a quick process. If node restarts are needed, e.g., during platform upgrades, these will happen automatically and safely as part of the deployment. When this is necessary, deployments will take longer to complete. 
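As an illustration of such orchestration, a sketch of a _deployment.xml_ that runs a production test after the first region, waits, and then deploys to two more regions in parallel (region names and the delay are examples only):

```
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
        <test>aws-us-east-1c</test>
        <delay hours="2"/>
        <parallel>
            <region>aws-us-west-2a</region>
            <region>aws-eu-west-1a</region>
        </parallel>
    </prod>
</deployment>
```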
System and staging tests, if present, must always be successfully run before the application package is deployed to production zones. ### Source code repository integration Each new _submission_ is assigned an increasing build number, which can be used to track the roll-out of the new package to the instances and their zones. With the submission, add a source code repository reference for easy integration - this makes it easy to track changes: ![Build numbers and source code repository reference](/assets/img/CI-integration.png) Add the source diff link to the pull request - see example [GitHub Action](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/.github/workflows/deploy-vespa-documentation-search.yaml): ``` $ vespa prod deploy \ --source-url "$(git config --get remote.origin.url | sed 's+git@\(.*\):\(.*\)\.git+https://\1/\2+')/commit/$(git rev-parse HEAD)" ``` ### Block-windows Use block-windows to block deployments during certain windows throughout the week, e.g., avoid rolling out changes during peak hours / during vacations. Hover over the instance (here "default") to find block status - see [block-change](../reference/applications/deployment.html#block-change): ![Application block window](/assets/img/block-window.png) ### Validation overrides Some configuration changes are potentially destructive / change the application behavior - examples are removing fields and changing linguistic processing. These changes are disallowed by default, the deploy-command will fail. To override and force a deploy, use a [validation override](../reference/applications/validation-overrides.html): ``` ``` tensor-type-change ``` ``` ### Production tests Production tests are optional and configured in [deployment.xml](../reference/applications/deployment.html). Production tests do not have access to the Vespa endpoints, for security reasons. Dependent steps in the release pipeline will stop if the tests fail, but upgraded regions will remain on the version where the test failed. A production test is hence used to block deployments to subsequent zones and only makes sense in a multi-zone deployment. ### Deploying Components Vespa is [backwards compatible](../learn/releases.html#versions) within major versions, and major versions rarely change. This means that [Components](../applications/components.html) compiled against an older version of Vespa APIs can always be run on the same major version. However, if the application package is compiled against a newer API version, and then deployed to an older runtime version in production, it might fail. See [vespa:compileVersion](production-deployment.html#production-deployment-with-components) for how to solve this. ## Automating with GitHub Actions Auto-deploy production applications using GitHub Actions - examples: - [deploy-vector-search.yaml](https://github.com/vespa-cloud/vector-search/blob/main/.github/workflows/deploy-vector-search.yaml) deploys an application to a production environment - a good example to start from! - [deploy.yaml](https://github.com/vespa-cloud/examples/blob/main/.github/workflows/deploy.yaml) deploys an application with basic HTTP tests. - [deploy-vespa-documentation-search.yaml](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/.github/workflows/deploy-vespa-documentation-search.yaml) deploys an application with Java-tests. 
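Such workflows essentially run the Vespa CLI with an API key supplied through the environment. A minimal sketch of the deploy step (tenant/application names are illustrative; in CI, the key content typically comes from a repository secret):

```
$ vespa config set target cloud
$ vespa config set application mytenant.myapp.default
$ export VESPA_CLI_API_KEY="$(cat mytenant.api-key.pem)"
$ vespa prod deploy
```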
The automation scripts use an API-KEY to deploy: ``` $ vespa auth api-key ``` This creates a key, or outputs: ``` Error: refusing to overwrite /Users/me/.vespa/mytenant.api-key.pem Hint: Use -f to overwrite it This is your public key: -----BEGIN PUBLIC KEY----- ABCDEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEB2UFsh8ZjoWNtkrDhyuMyaZQe1ze qLB9qquTKUDQTuM2LOr2dawUs02nfSc3UTfC08Lgr/dvnTnHpc0/fY+3Aw== -----END PUBLIC KEY----- Its fingerprint is: 12:34:56:78:65:30:77:90:30:ab:83:ee:a9:67:68:2c To use this key in Vespa Cloud click 'Add custom key' at https://console.vespa-cloud.com/tenant/mytenant/account/keys and paste the entire public key including the BEGIN and END lines. ``` This means, if there is a key, it is not overwritten, it is safe to run. Make sure to add the deploy-key to the tenant using the Vespa Cloud Console. After the deploy-key is added, everything is ready for deployment. You can upload or create new Application keys in the console, and store them as a secret in the repository like the GitHub actions example above. Some services like [Travis CI](https://travis-ci.com) do not accept multi-line values for Environment Variables in Settings. A workaround is to use the output of ``` $ openssl base64 -A -a < mykey.pem && echo ``` in a variable, say `VESPA_MYAPP_API_KEY`, in Travis Settings. `VESPA_MYAPP_API_KEY` is exported in the Travis environment, example output: ``` Setting environment variables from repository settings $ export VESPA_MYAPP_API_KEY=[secure] ``` Then, before deploying to Vespa Cloud, regenerate the key value: ``` $ MY_API_KEY=`echo $VESPA_MYAPP_API_KEY | openssl base64 -A -a -d` ``` and use `${MY_API_KEY}` in the deploy command. ## Vespa Cloud upgrades Vespa upgrades follows the same pattern as for new application revisions in [CD tests](#cd-tests), and can be tracked via its version number in the Vespa Cloud Console. System tests are run the same way as for deploying a new application package. A staging test verifies the upgrade from application package `Appold` to `Appnew`, and from Vespa platform version `Vold` to `Vnew`. The staging test then consists of the following steps: 1. All production zone deployments are polled for the current `Vold` / `Appold` versions. As there can be multiple versions already being deployed (i.e. multiple `Vold` / `Appold`), there can be a series of staging test runs. 2. The application at revision `Appold` is deployed on platform version `Vold`, to a zone in the [staging environment](environments.html#staging). 3. The _staging setup_ test code is run, typically making the cluster reasonably similar to a production cluster. 4. The test deployment is then upgraded to application revision `Appnew` and platform version `Vnew`. 5. Finally, the _staging test_ test code is run, to verify the deployment works as expected after the upgrade. Note that one or both of the application revision and platform may be upgraded during the staging test, depending on what upgrade scenario the test is run to verify. These changes are usually kept separate, but in some cases is necessary to allow them to roll out together. ## Next steps - Read more about [feature switches and bucket tests](../applications/testing.html#feature-switches-and-bucket-tests). - A challenge with continuous deployment can be integration testing across multiple services: Another service depends on this Vespa application for its own integration testing. Use a separate [application instance](../reference/applications/deployment.html#instance) for such integration testing. 
- Set up a deployment badge - available from the console's deployment view - example: ![vespa-team.vespacloud-docsearch.default overview](https://api-ctl.vespa-cloud.com/badge/v1/vespa-team/vespacloud-docsearch/default) - Set up a [global query endpoint](../reference/applications/deployment.html#endpoints-global). Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [CD tests](#cd-tests) - [System tests](#system-tests) - [Staging tests](#staging-tests) - [Disabling tests](#disabling-tests) - [Running tests only](#running-tests-only) - [Deployment orchestration](#deployment-orchestration) - [Source code repository integration](#source-code-repository-integration) - [Block-windows](#block-windows) - [Validation overrides](#validation-overrides) - [Production tests](#production-tests) - [Deploying Components](#deploying-components) - [Automating with GitHub Actions](#automating-with-github-actions) - [Vespa Cloud upgrades](#vespa-cloud-upgrades) - [Next steps](#next-steps) --- # Source: https://docs.vespa.ai/en/operations/autoscaling.html.md # Autoscaling Autoscaling lets you adjust the hardware resources allocated to application clusters automatically depending on actual usage. It will attempt to keep utilization of all allocated resources close to ideal, and will automatically reconfigure to the cheapest option allowed by the ranges when necessary. You can turn it on by specifying _ranges_ in square brackets for the [nodes](../reference/applications/services/services.html#nodes) and/or [node resource](../reference/applications/services/services.html#resources) values in _services.xml_. Vespa Cloud will monitor the resource utilization of your clusters and automatically choose the cheapest resource allocation within ranges that produces close to optimal utilization. You can see the status and recent actions of the autoscaler in the _Resources_ view under a deployment in the console. Autoscaling is not considering latency differences achieved by different configurations. If your application has certain configurations that produce good throughput but too high latency, you should not include these configurations in your autoscaling ranges. Adjusting the allocation of a cluster may happen quickly for stateless container clusters, and much more slowly for content clusters with a lot of data. Autoscaling will adjust each cluster on the timescale it typically takes to rescale it (including any data redistribution). The ideal utilization takes into account that a node may be down or failing, that another region may be down causing doubling of traffic, and that we need headroom for maintenance operations and handling requests with low latency. It acts on what it has observed on your system in the recent past. If you need much more capacity in the near future than you do currently, you may want to set the lower limit to take this into account. Upper limits should be set to the maximum size that makes business sense. ## When to use autoscaling Autoscaling is useful in a number of scenarios. Some typical ones are: - You have a new application which you can't benchmark with realistic data and usage, making you unsure what resources to allocate: Set wide ranges for all resource parameters and let the system choose a configuration. Once you gain experience you can consider tightening the configuration space. 
- You have load that varies quickly during the day, or that may suddenly increase quickly due to some event, and want container cluster resources to quickly adjust to the load: Set a range for the number of nodes and/or vcpu on containers. - You expect your data volume to grow over time, but you don't want to allocate resources prematurely, nor constantly worry about whether it is time to increase: Configure ranges for content nodes and/or node resources such that the size of the system grows with the data. ## Resource tradeoffs Some other considerations when deciding resources: - Making changes to resources/nodes is easy and safe, and one of Vespa Cloud's strengths. We advise you make controlled changes and observe effect on latencies, data migration and cost. Everything is automated, just deploy a new application package. This is useful learning when later needed during load peaks and capacity requirement changes. - Node resources cannot be chosen freely in all zones, CPU/Memory often comes in increments of x 2. Try to make sure that the resource configuration is a good fit. - CPU is the most expensive component, optimize for this for most applications. - Having few nodes means more overcapacity as Vespa requires that the system will handle one node being down (or one group, in content clusters having multiple groups). 4-5 nodes minimum is a good rule of thumb. Whether 4-5 or 9-10 nodes of half the size is better depends on quicker upgrade cycles vs. smoother resource auto-scale curves. Latencies can be better or worse, depending on static vs dynamic query cost. - Changing a node resource may mean allocating a new node, so it may be faster to scale container nodes by changing the number of nodes. - As a consequence, during resource shortage (say almost full disk), add nodes and keep the rest unchanged. - It is easiest to reason over capacity when changing one thing at a time. It is often safe to follow the _suggested resources_ advice when shown in the console and feel free to contact us if you have questions. ## Mixed load A Vespa application must handle a combination of reads and writes, from multiple sources. User load often resembles a sine-like curve. Machine-generated load, like a batch job, can be spiky and abrupt. In the default Vespa configuration, all kinds of load uses _one_ default container cluster. Example: An application where daily batch jobs update the corpus at high rate: ![nodes and resources](/assets/img/load.png) Autoscaling scales _up_ much quicker than _down_, as the probability of a new spike is higher after one has been observed. In this example, see the rapid cluster growth for the daily load spike - followed by a slow decay. The best solution for this case is to slow down the batch job, as it is of short duration. It is not always doable to slow down jobs - in these cases, setting up multiple[container clusters](../applications/containers.html)can be a smart thing - optimize each cluster for its load characteristics. This could be a combination of clusters using autoscale and clusters with a fixed size. Autoscaling often works best for the user-generated load, whereas the machine-generated load could either be tuned or routed to a different cluster in the same Vespa application. 
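A sketch of what such a split could look like in _services.xml_ - an autoscaled container cluster for user queries and a fixed-size one for machine-generated feed traffic (cluster IDs and sizes are illustrative):

```
<container id="query" version="1.0">
    <search/>
    <nodes count="[2, 8]">
        <resources vcpu="4" memory="16Gb" disk="50Gb"/>
    </nodes>
</container>

<container id="feed" version="1.0">
    <document-api/>
    <nodes count="2">
        <resources vcpu="4" memory="16Gb" disk="50Gb"/>
    </nodes>
</container>
```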
## Examples

Below is an example of node resources with autoscaling that would work well for a container cluster (values are illustrative - adjust the ranges to your own workload):

```
<nodes count="[2, 8]">
    <resources vcpu="[4, 16]" memory="16Gb" disk="100Gb"/>
</nodes>
```

The above would in general **not be recommended for a content cluster.** Changing cpu, memory or disk usually leads to allocating new nodes to fulfil the new node resources spec. When that happens, there will be redistribution of documents between the old and new nodes, and this might impact service quality to some degree. For a content cluster it would usually be better to stick to the same node resources and add or remove nodes, e.g. something like (illustrative values):

```
<nodes count="[4, 12]">
    <resources vcpu="8" memory="32Gb" disk="300Gb"/>
</nodes>
```

If a content cluster is configured to autoscale based on node resources (not just number of nodes or groups) this will work fine, but note that using paged attributes or HNSW indexes will make it more expensive and time-consuming to redistribute documents when scaling up or down. When doing the initial feeding of a cluster it is best to avoid autoscaling, as changing the topology will require redistribution of documents, possibly several times.

When using groups in a content cluster it is possible to scale the number of groups instead of the number of nodes, e.g. with a fixed group size and a range for the node count - illustrative values, here the group size is fixed at 8 nodes, so the count range corresponds to 1-4 groups:

```
<nodes count="[8, 32]" group-size="8">
    <resources vcpu="8" memory="32Gb" disk="300Gb"/>
</nodes>
```

Note that at the moment it is not possible to autoscale GPU resources.

## Related reading

- [Feed sizing](../performance/sizing-feeding.html)
- [Query sizing](../performance/sizing-search.html)

Copyright © 2026 - [Cookie Preferences](#)

### On this page:

- [When to use autoscaling](#When-to-use-autoscaling)
- [Resource tradeoffs](#resource-tradeoffs)
- [Mixed load](#mixed-load)
- [Examples](#examples)
- [Related reading](#)

---

# Source: https://docs.vespa.ai/en/operations/enclave/aws-architecture.html.md

# Vespa Cloud Enclave AWS Architecture

Each Vespa Cloud Enclave in the tenant AWS account corresponds to a Vespa Cloud [zone](../zones.html). Inside the tenant AWS account, one enclave is contained within one single [VPC](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html).

![Enclave architecture](/assets/img/vespa-cloud-enclave-aws.png)

#### EC2 Instances, Load Balancers, and S3 buckets

Configuration Servers inside the Vespa Cloud zone make the decision to create or destroy EC2 instances ("Vespa Hosts" in the diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Network Load Balancers needed to communicate with the deployed Vespa application.

Each Vespa Host will periodically sync its logs to an S3 bucket ("Log Archive"). This bucket is "local" to the enclave and provisioned by the Terraform module inside the tenant's AWS account.

#### Networking

The enclave VPC is very network restricted. Vespa Hosts do not have public IPv4 addresses, and there is no [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) available in the VPC. Vespa Hosts have public IPv6 addresses and are able to make outbound connections. Inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers, and to report operational metrics back to Vespa Cloud.

When a Vespa Host is booted, it will set up an encrypted tunnel back to the Configuration Servers. All communication between Configuration Servers and the Vespa Hosts will run over this tunnel after it is set up.

### Security

The Vespa Cloud operations team does _not_ have any direct access to the resources that are part of the customer account.
The only possible access is through the management APIs needed to run Vespa itself. In case it is needed for, e.g. incident debugging, direct access can only be granted to the Vespa team by the tenant itself. For further details, see the documentation for the[`ssh`-submodule](https://registry.terraform.io/modules/vespa-cloud/enclave/aws/latest/submodules/ssh). All communication between the enclave and the Vespa Cloud configuration servers is encrypted, authenticated and authorized using[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the[Athenz](https://www.athenz.io/) service. All data stored is encrypted at rest using[KMS](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html). All keys are managed by the tenant in the tenant's AWS account. The resources provisioned in the tenant AWS account are either provisioned by the Terraform module executed by the tenant, or by the orchestration services inside a Vespa Cloud Zone. Resources are provisioned by the Vespa Cloud configuration servers, using the[`provision_policy`](https://github.com/vespa-cloud/terraform-aws-enclave/blob/main/modules/provision/main.tf)AWS IAM policy document defined in the Terraform module. The tenant that registered the AWS account is the only tenant that can deploy applications targeting the enclave. For more general information about security in Vespa Cloud, see the[whitepaper](../../security/whitepaper). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/enclave/aws-getting-started.html.md # Getting started with Vespa Cloud Enclave in AWS Setting up Vespa Cloud Enclave requires: 1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use a pre-existing tenant. 2. Registration of the AWS account ID in Vespa Cloud 3. Running a [Terraform](https://www.terraform.io/) configuration to provision AWS resources in the account. Go through the [AWS tutorial](https://developer.hashicorp.com/terraform/tutorials/aws-get-started) as needed. 4. Deployment of a Vespa application. ### 1. Vespa Cloud Tenant setup Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/). ### 2. Onboarding Contact [support@vespa.ai](mailto:support@vespa.ai) stating which tenant should be on-boarded to use Vespa Cloud Enclave. Also include the [AWS account ID](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId)to associate with the tenant. **Note:** We recommend using a _dedicated_ account for your Vespa Cloud Enclave. Vespa Cloud will manage resources in the Enclave VPCs created in the AWS resource provisioning step. Primarily EC2 instances, load balancers and service endpoints. One account can host all your Vespa applications, there is no need for multiple tenants or accounts. ### 3. Configure AWS Account The same AWS account used in step two must be prepared for deploying Vespa applications using either _Terraform_ or _Cloudformation_. #### Terraform Use [Terraform](https://www.terraform.io/) to set up the necessary resources using the[modules](https://registry.terraform.io/modules/vespa-cloud/enclave/aws/latest) published by the Vespa team. Modify the[multi-region Terraform files](https://github.com/vespa-cloud/terraform-aws-enclave/blob/main/examples/multi-region/main.tf)for your deployment. 
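Once the Terraform files are adapted to your account and regions, the standard Terraform workflow applies:

```
$ terraform init    # downloads the vespa-cloud/enclave/aws module and providers
$ terraform plan    # review the resources to be created
$ terraform apply
```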
If you are unfamiliar with Terraform: It is a tool to manage resources and their configuration in various cloud providers, like AWS and GCP. Terraform has published an[AWS](https://developer.hashicorp.com/terraform/tutorials/aws-get-started)tutorial, and we strongly encourage enclave users to read and follow the Terraform recommendations for[CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform). The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. In order for your enclave applications to use the new features you must re-apply your terraform templates with the latest release. The [notification system](../notifications)will let you know when a new release is available. #### Cloudformation Vespa also supports Cloudformation if you prefer the AWS-native solution. Download the Cloudformation stacks in our [github repository](https://github.com/vespa-cloud/cloudformation-aws-enclave) and refer to the README for stack-specific instructions. ### 4. Deploy a Vespa application By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your enclave account, update [deployment.xml](../../reference/applications/deployment.html) to reference the account used in step two: ``` ``` Useful resources are [getting started](../../basics/deploy-an-application-java.html)and [migrating to Vespa Cloud](../../learn/migrating-to-cloud) - put _deployment.xml_ next to _services.xml_. ## Next steps After a successful deployment to the [dev](../environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles. For production serving, deploy to the [prod](../environments.html#prod) environment - follow the steps in [production deployment](../production-deployment.html). ## Enclave teardown To tear down a Vespa Cloud Enclave system, do the steps above in reverse order: 1. [Undeploy the application(s)](../deleting-applications.html) 2. Undeploy the Terraform changes It is important to undeploy the Vespa application(s) first. After running the Terraform, Vespa Cloud cannot manage the resources allocated, so you must clean up these yourself. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [1. Vespa Cloud Tenant setup](#1-vespa-cloud-tenant-setup) - [2. Onboarding](#2-onboarding) - [3. Configure AWS Account](#3-configure-aws-account) - [4. Deploy a Vespa application](#4-deploy-a-vespa-application) - [Next steps](#next-steps) - [Enclave teardown](#enclave-teardown) --- # Source: https://docs.vespa.ai/en/operations/enclave/azure-architecture.html.md # Architecture for Vespa Cloud Enclave in Azure ### Architecture With Vespa Cloud Enclave, all Azure resources associated with your Vespa Cloud applications are in your enclave Azure subscription, as opposed to a shared Vespa Cloud subscription. Each Vespa Cloud [zone](../zones.html) has an associated zone resource group (RG) in the enclave subscription, that contains all the resources for that zone. For instance, it has one Virtual Network (VNet aka [VPC](https://cloud.google.com/vpc/)). 
![Enclave architecture](/assets/img/vespa-cloud-enclave-azure.png) #### Virtual Machines, Load Balancers, and Blob Storage Configuration Servers inside the Vespa Cloud subscription make the decision to create or destroy virtual machines ("Vespa Hosts" in diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Container Load Balancers needed to communicate with the deployed Vespa application. Each Vespa Host will periodically sync its logs to a Blob Storage container ("Log Archive") in a Storage Account in the zone RG. This storage account is "local" to the enclave and provisioned by the Terraform module inside your Azure subscription. #### Networking The Zone Virtual Network (VNet aka VPC) is very network restricted. The Vespa Hosts do not have a public IPv4 address. But your application can connect to external IPv4 services using a [NAT gateway](https://learn.microsoft.com/en-us/azure/nat-gateway/nat-overview). Vespa Hosts have public IPv6 addresses and are able to make outbound connections. Inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers, and to report operational metrics back to Vespa Cloud. When a Vespa Host is booted, it will set up an encrypted tunnel back to the Configuration Servers. All communication between Configuration Servers and the Vespa Hosts will be run over this tunnel after it is set up. ### Security The Vespa Cloud operations team does _not_ have any direct access to the resources in your subscription. The only possible access is through the management APIs needed to run Vespa itself. In case it is needed for, e.g. incident debugging, direct access can only be granted to the Vespa team by you. Enable direct access by setting the`enable_ssh` input to true in the enclave module. For further details, see the documentation for the[enclave module inputs](https://registry.terraform.io/modules/vespa-cloud/enclave/azure/latest/?tab=inputs). All communication between the enclave and the Vespa Cloud Configuration servers is encrypted, authenticated, and authorized using[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the[Athenz](https://www.athenz.io/) service. All data stored is encrypted at rest using[Encryption At Host](https://learn.microsoft.com/en-us/azure/virtual-machines/disk-encryption-overview). All keys are managed automatically by the Azure platform. The resources provisioned in your Azure subscription are either provisioned by the Vespa Cloud Enclave Terraform module you apply, or by the orchestration services inside a Vespa Cloud zone. Resources are provisioned by the Vespa Cloud Configuration servers, using the[`id-provisioner`](https://github.com/vespa-cloud/terraform-azure-enclave/blob/main/provisioner.tf)user-assigned managed identity defined in the Terraform module. Only your Vespa tenant (that registered this Azure subscription) can deploy applications targeting your enclave. For more general information about security in Vespa Cloud, see the[whitepaper](../../security/whitepaper). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/enclave/azure-getting-started.html.md # Getting started with Vespa Cloud Enclave in Azure Setting up Vespa Cloud Enclave requires: 1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use a pre-existing Vespa tenant. 2. 
Running a [Terraform](https://www.terraform.io/) configuration to provision necessary Azure resources in the subscription. 3. Registration of the Azure subscription in Vespa Cloud. 4. Deployment of a Vespa application. ### 1. Vespa Cloud Tenant setup Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing Vespa tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/). ### 2. Configure Azure subscription Choose an Azure subscription to use for Vespa Cloud Enclave. **Note:** We recommend using a _dedicated_ subscription for your Vespa Cloud Enclave. Resources in this subscription will be fully managed by Vespa Cloud. One subscription can host all your Vespa applications, there is no need for multiple Vespa tenants or Azure subscriptions. The subscription must be prepared for deploying Vespa applications. Use [Terraform](https://www.terraform.io/) to set up the necessary resources using the[modules](https://registry.terraform.io/modules/vespa-cloud/enclave/azure/latest)published by the Vespa team. Feel free to use the[example](https://github.com/vespa-cloud/terraform-azure-enclave/blob/main/examples/basic/main.tf)to get started. If you are unfamiliar with Terraform: It is a tool to manage resources and their configuration in various cloud providers, like AWS, Azure, and GCP. Terraform has published a[Get Started - Azure](https://developer.hashicorp.com/terraform/tutorials/azure-get-started)tutorial, and we strongly encourage enclave users to read and follow the Terraform recommendations for[CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform). The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. In order for your enclave applications to use the new features you must re-apply your terraform templates with the latest release. The [notification system](../notifications.html)will let you know when a new release is available. ### 3. Onboarding Contact [support@vespa.ai](mailto:support@vespa.ai) and provide the `enclave_config` output after applying the Terraform, see[Outputs](https://github.com/vespa-cloud/terraform-azure-enclave?tab=readme-ov-file#outputs). The `enclave_config` includes which Vespa tenant should be on-boarded to use Vespa Cloud Enclave. And the Azure tenant ID, the subscription ID, and a client ID of an Athenz identity the Terraform created. ### 4. Deploy a Vespa application By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your Azure enclave subscription instead, update [deployment.xml](../../reference/applications/deployment.html) to reference the subscription ID from step 2: ``` ``` Useful resources are [getting started](../../basics/deploy-an-application.html)and [migrating to Vespa Cloud](../../learn/migrating-to-cloud) - put _deployment.xml_ next to _services.xml_. ## Next steps After a successful deployment to the [dev](../environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles. For production serving, deploy to the [prod](../environments.html#prod) environment - follow the steps in [production deployment](../production-deployment.html). ## Enclave teardown To tear down a Vespa Cloud Enclave system, do the steps above in reverse order: 1. [Undeploy the application(s)](../deleting-applications.html) 2. 
Undeploy the Terraform changes

It is important to undeploy the Vespa application(s) first. After the Terraform changes are undeployed, Vespa Cloud can no longer manage the resources it allocated, so you must clean up any remaining resources yourself.

Copyright © 2026 - [Cookie Preferences](#)

### On this page:

- [1. Vespa Cloud Tenant setup](#1-vespa-cloud-tenant-setup)
- [2. Configure Azure subscription](#2-configure-azure-subscription)
- [3. Onboarding](#3-onboarding)
- [4. Deploy a Vespa application](#4-deploy-a-vespa-application)
- [Next steps](#next-steps)
- [Enclave teardown](#enclave-teardown)

---

# Source: https://docs.vespa.ai/en/writing/batch-delete.html.md

# Batch delete

Options for batch deleting documents:

1. Use [vespa feed](../clients/vespa-cli.html#documents):

   ```
   $ vespa feed -t my-endpoint deletes.json
   ```

2. Find documents using a query, delete, repeat. Pseudocode:

   ```
   while True; do
       query and read document ids; if empty, exit
       delete document ids using /document/v1
       wait a sec   # optional, add wait to reduce load while deleting
   done
   ```

3. Use a [document selection](../schemas/documents.html#document-expiry) to expire documents. This deletes all documents _not_ matching the expression. It is possible to use parent documents and imported fields for expiry of a document set. The content node will iterate over the corpus and delete documents (that are later compacted out) - e.g. (illustrative selection):

   ```
   <documents garbage-collection="true">
       <document type="music" selection="music.timestamp &gt; now() - 86400"/>
   </documents>
   ```

4. Use [/document/v1](../reference/api/document-v1.html#delete) to delete documents identified by a [document selection](../reference/writing/document-selector-language.html) - example dropping all documents from the _my\_doctype_ schema. The _cluster_ value is the ID of the content cluster in _services.xml_, e.g., `<content id="my_cluster" version="1.0">`:

   ```
   $ curl -X DELETE \
     "$ENDPOINT/document/v1/my_namespace/my_doctype/docid?selection=true&cluster=my_cluster"
   ```

5. It is possible to drop a schema, with all its content, by removing the mapping to the content cluster.

## Example

This is an end-to-end example on how to track the number of documents, and delete a subset using a [selection string](../reference/writing/document-selector-language.html).

### Feed sample documents

Feed a batch of documents, e.g. using the [vector-search](https://github.com/vespa-cloud/vector-search) sample application:

```
$ vespa feed <(python3 feed.py 100000 3)
```

See the number of documents for a node using the [content.proton.documentdb.documents.total](../reference/operations/metrics/searchnode.html#content_proton_documentdb_documents_total) metric (here 100,000):

```
$ docker exec vespa curl -s http://localhost:19092/prometheus/v1/values | grep ^content.proton.documentdb.documents.total
content_proton_documentdb_documents_total_max{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000
content_proton_documentdb_documents_total_last{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000
```

Using the metric above is useful while feeding this example. Another alternative is [visiting](visiting.html) all documents to print the ID:

```
$ vespa visit --field-set "[id]" | wc -l
100000
```

At this point, there are 100,000 documents in the index.

### Define selection

Define the subset of documents to delete - e.g. by age or other criteria. In this example, select a random 1%.
Do a test run: ``` $ vespa visit --field-set "[id]" --selection 'id.hash().abs() % 100 == 0' | wc -l 1016 ``` Hence, the selection string `id.hash().abs() % 100 == 0` hits 1,016 documents. ### Delete documents Delete documents, see the number of documents deleted in the response: ``` $ curl -X DELETE \ "http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors" { "pathId":"/document/v1/mynamespace/vector/docid", "documentCount":1016 } ``` In case of a large result set, a continuation token might be returned in the response, too: ``` "continuation": "AAAAEAAAA" ``` If so, add the token and redo the request: ``` $ curl -X DELETE \ "http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors&continuation=AAAAEAAAA" ``` Repeat as long as there are tokens in the output. The token changes in every response. ### Validate Check that all documents matching the selection criterion are deleted: ``` $ vespa visit --selection 'id.hash().abs() % 100 == 0' --field-set "[id]" | wc -l 0 ``` List remaining documents: ``` $ vespa visit --field-set "[id]" | wc -l 98984 ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Example](#example) - [Feed sample documents](#feed-sample-documents) - [Define selection](#define-selection) - [Delete documents](#delete-documents) - [Validate](#validate) --- # Source: https://docs.vespa.ai/en/performance/benchmarking-cloud.html.md # Benchmarking This is a step-by-step guide to get started with benchmarking on Vespa Cloud, based on the [Vespa benchmarking guide](benchmarking.html), using the [sample app](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation). Overview: ![Vespa Cloud Benchmarking](/assets/img/cloud-benchmarks.svg) ## Set up a performance test instance Use an instance in a [dev zone](../operations/environments.html#dev) for benchmarks. To deploy an instance there, use the [getting started](../basics/deploy-an-application.html) guide, and make sure to specify the resources using a `deploy:environment="dev"` attribute: ``` ``` ``` ``` ``` $ vespa deploy --wait 600 ``` Feed documents: ``` $ vespa feed ext/documents.jsonl ``` Query documents to validate the feed: ``` $ vespa query "select * from music where true" ``` Query documents using curl: ``` $ curl \ --cert ~/.vespa/mytenant.myapp.default/data-plane-public-cert.pem \ --key ~/.vespa/mytenant.myapp.default/data-plane-private-key.pem \ -H "Content-Type: application/json" \ --data '{"yql" : "select * from music where true"}' \ https://baaae1db.b68ddc0d.z.vespa-app.cloud/search/ ``` At this point, the instance is ready, with data, and can be queried using data-plane credentials. 
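For reference, the `deploy:environment="dev"` resource override mentioned at the start of this section can look like the following sketch (values are illustrative, and the `services` root element must declare `xmlns:deploy="vespa"`):

```
<nodes deploy:environment="dev" count="1">
    <resources vcpu="4" memory="16Gb" disk="125Gb"/>
</nodes>
```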
## Test using vespa-fbench The rest of the guide assumes the data-plane credentials are in working directory: ``` $ ls -1 *.pem data-plane-private-key.pem data-plane-public-cert.pem ``` Prepare a query file: ``` $ echo "/search/?yql=select+*+from+music+where+true" > query001.txt ``` Test using [vespa-fbench](../reference/operations/tools.html#vespa-fbench) running in a docker container: ``` $ docker run -v $(pwd):/files -w /files \ --entrypoint /opt/vespa/bin/vespa-fbench \ vespaengine/vespa \ -C data-plane-public-cert.pem \ -K data-plane-private-key.pem \ -T /etc/ssl/certs/ca-bundle.crt \ -n 1 -q query001.txt -s 1 -c 0 \ -o output.txt \ baaae1db.b68ddc0d.z.vespa-app.cloud 443 ``` `-o output.txt` is useful when validating the test - remove this option when load testing. Make sure there are no `SSL_do_handshake` errors in the output. Expect HTTP status code 200: ``` Starting clients... Stopping clients Clients stopped. . Clients Joined. ***HTTP keep-alive statistics*** connection reuse count -- 4 *****************Benchmark Summary***************** clients: 1 ran for: 1 seconds cycle time: 0 ms lower response limit: 0 bytes skipped requests: 0 failed requests: 0 successful requests: 5 cycles not held: 5 minimum response time: 128.17 ms maximum response time: 515.35 ms average response time: 206.38 ms 25 percentile: 128.70 ms 50 percentile: 129.60 ms 75 percentile: 130.20 ms 90 percentile: 361.32 ms 95 percentile: 438.36 ms 99 percentile: 499.99 ms actual query rate: 4.80 Q/s utilization: 99.03 % zero hit queries: 5 http request status breakdown: 200 : 5 ``` At this point, running queries using _vespa-fbench_ works well from local laptop. ## Run queries inside data center Next step is to run this from the same location (data center) as the dev zone. In this example, an AWS [zone](../operations/zones.html). Deduce the AWS zone from Vespa Cloud zone name. Below is an example using a host with Amazon Linux 2023 AMI (HVM) image: 1. Create the host - here assume key pair is named _key.pem_. No need to do anything other than default. 2. Log in, update, install docker: 3. Copy credentials for endpoint access, log in and validate docker setup: 4. Make a dummy query: 5. Run vespa-fbench and verify 200 response: At this point, you are able to benchmark using _vespa-fbench_ in the same zone as the Vespa Cloud dev instance. ## Run benchmark Use the [Vespa Benchmarking Guide](../performance/benchmarking.html) to plan and run benchmarks. Also see [sizing](#sizing) below. Make sure the client running the benchmark tool has sufficient resources. Export [metrics](../operations/metrics.html): ``` $ curl \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values ``` Notes: - Periodically dump all metrics using `consumer=Vespa`. - Make sure you will not exhaust your serving threads on your container nodes while in production. This can be verified by making sure this expression stays well below 100% (typically below 50%) for the traffic you expect: `100 * (jdisc.thread_pool.active_threads.sum / jdisc.thread_pool.active_threads.count) / jdisc.thread_pool.size.max` for each `threadpool` value. You can increase the number of threads in the pools by using larger container nodes, more container nodes or by tuning the number of threads as described in [services-search](../reference/applications/services/search.html#threadpool). 
In the case you do exhaust a threadpool and its queue you will experience HTTP 503 responses for requests that are rejected by the container. ## Making changes Whenever deploying changes to configuration, track progress in the Deployment dashboard. Some changes, like changing [requestthreads](../reference/applications/services/content.html#requestthreads) will restart content nodes, and this is done in sequence and takes time. Wait for successful completion in _Wait for services and endpoints to come online_. When changing node type/count, wait for auto data redistribution to complete, watching the `vds.idealstate.merge_bucket.pending.average` metric: ``` $ while true; do curl -s \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values?consumer=Vespa | \ grep idealstate.merge_bucket.pending.average; \ sleep 10; done ``` Notes: - Dump all metrics using `consumer=Vespa`. - After changing the number of content nodes, this metric will jump, then decrease (not necessarily linearly) - speed depending on data volume. ## Sizing Using Vespa Cloud enables the Vespa Team to assist you to optimise the application to reduce resource spend. Based on 150 applications running on Vespa Cloud today, savings are typically 50%. Cost optimization is hard to do without domain knowledge - but few teams are experts in both their application and its serving platform. Sizing means finding both the right node size and the right cluster topology: ![Resize to fewer and smaller nodes](/assets/img/nodes.svg) Applications use Vespa for their primary business use cases. Availability and performance vs. cost are business decisions. The best sized application can handle all expected load situations, and is configured to degrade quality gracefully for the unexpected. Even though Vespa is cost-efficient out of the box, Vespa experts can usually spot over/under-allocations in CPU, memory and disk space/IO, and discuss trade-offs with the application team. Using [automated deployments](../operations/automated-deployments.html) applications go live with little risk. After launch, right-size the application based on true load after using Vespa’s elasticity features with automated data migration. Use the [Vespa sizing guide](../performance/sizing-search.html)to size the application and find metrics used there. Pro-tips: - 60% is a good max memory allocation - 50% is a good max CPU allocation, although application dependent. - 70% is a good max disk allocation Rules of thumb: - Memory and disk scales approximately linearly for indexed fields' data - attributes have a fixed cost for empty fields. - Data variance will impact memory usage. - Undersized instances will [block writes](../writing/feed-block.html). - If is often a good idea to use the `dev` zone to test memory impact of adding large fields, e.g. adding an embedding. ## Notes - The user running benchmarks must have read access to the endpoint - if you already have, you can skip this section. Refer to the [Vespa security guide](../security/guide). - [Monitoring](../operations/monitoring.html) is useful to track metrics when benchmarking. 
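As a worked example of the thread-pool utilization expression in the [Run benchmark](#run-benchmark) section, with illustrative metric values:

```
jdisc.thread_pool.active_threads.sum   = 1200
jdisc.thread_pool.active_threads.count = 100     # 1200 / 100 = 12 active threads on average
jdisc.thread_pool.size.max             = 64

100 * (1200 / 100) / 64 ≈ 18.8%                  # well below 50%, so there is headroom
```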
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Set up a performance test instance](#set-up-a-performance-test-instance) - [Test using vespa-fbench](#test-using-vespa-fbench) - [Run queries inside data center](#run-queries-inside-data-center) - [Run benchmark](#run-benchmark) - [Making changes](#making-changes) - [Sizing](#sizing) - [Notes](#notes) --- # Source: https://docs.vespa.ai/en/performance/benchmarking.html.md # Vespa Benchmarking Benchmarking a Vespa application is essential to get an idea of how well the test configuration performs. Thus, benchmarking is an essential part of sizing a search cluster itself. Benchmarking a cluster can answer the following questions: - What throughput and latency to expect from a search node? - Which resource is the bottleneck in the system? These in turn indirectly answers other questions such as how many nodes are needed, and if it will help to upgrade disk or CPU. Thus, benchmarking will help in finding the optimal Vespa configuration, using all resources optimally, which in turn lowers costs. A good rule is to benchmark whenever the workload changes. Benchmarking should also be done when adding new features to queries. Having an understanding of the query mix and SLA will help to set the test parameters. Before benchmarking, consider: - What is the expected query mix? Having a representative query mix to test with is essential in order to get valid results. Splitting up in different types of queries is also a useful way to get an idea of which query classes are resource intensive. - What is the expected SLA, both in terms of latency and query throughput? - How important is real-time behavior? What is the rate of incoming documents, if any? - Timeout, in a benchmarking scenario, is it ok for requests to time out? Default [timeout](/en/reference/querying/yql.html#timeout) is 500 ms, and [softtimeout](/en/reference/api/query.html#ranking.softtimeout.enable) is enabled. If the full cost of all queries are to be considered: - Disable soft timeout with execution parameter - by a [query profile](../querying/query-profiles.html) - by appending: `&ranking.softtimeout.enable=false` to with the [vespa-fbench](#vespa-fbench) `-a` option - Set timeout to e.g. 5 seconds - Note that `timeout` in YQL takes precedence - Replace timeout in YQL or use the execution parameter [timeout](/en/reference/api/query.html#timeout) as above. If benchmarking using Vespa Cloud, see [Vespa Cloud Benchmarking](https://cloud.vespa.ai/en/benchmarking). ## vespa-fbench Vespa provides a query load generator tool,[vespa-fbench](/en/reference/operations/tools.html#vespa-fbench), to run queries and generate statistics - much like a traditional web server load generator. It allows running any number of _clients_(i.e. the more clients, the higher load), for any length of time, and adjust the client response time before issuing the next query. It outputs the throughput, max, min, and average latency, as well as the 25, 50, 75, 90, 95, 99 and 99.9 latency percentiles. This provides quite accurate information of how well the system manages the workload. **Disclaimer:** _vespa-fbench_ is a tool to drive load for benchmarking and tuning. It is not a tool for finding the maximum load or latencies in a production setting. This is due to the way it is implemented: It is run with `-n` number of clients per run. It is good for testing, as proton can be observed at different levels of concurrency. 
In the real world, the number of clients and query arrival will follow a different distribution, and impact 95p / 99p latency percentiles. ### Prepare queries vespa-fbench uses _query files_ for GET and POST queries - see the [reference](/en/reference/operations/tools.html#vespa-fbench) - examples:_HTTP GET_ requests: ``` /search/?yql=select%20%2A%20from%20sources%20%2A%20where%20true ``` _HTTP POST_ requests format: ``` /search/ {"yql" : "select * from sources * where true"} ``` ### Run queries A typical vespa-fbench command looks like: ``` $ vespa-fbench -n 8 -q queries.txt -s 300 -c 0 myhost.mydomain.com 8080 ``` This starts 8 clients, using requests read from `queries.txt`. The `-s` parameter indicates that the benchmark will run for 300 seconds. The `-c` parameter, states that each client thread should wait for 0 milliseconds between each query. The last two parameters are container hostname and port. Multiple hosts and ports can be provided, and the clients will be uniformly distributed to query the containers round-robin. A more complex example, using docker, hitting a Vespa Cloud endpoint: ``` $ docker run -v /Users/myself/tmp:/testfiles \ -w /testfiles --entrypoint '' vespaengine/vespa \ /opt/vespa/bin/vespa-fbench \ -C data-plane-public-cert.pem -K data-plane-private-key.pem -T /etc/ssl/certs/ca-bundle.crt \ -n 10 -q queries.txt -o result.txt -s 300 -c 0 \ myapp.mytenant.aws-us-east-1c.z.vespa-app.cloud 443 ``` When using a query file with HTTP POST requests (`-P` option) one also need to pass the _Content-Type_ header using the `-H` header option. ``` $ docker run -v /Users/myself/tmp:/testfiles \ -w /testfiles --entrypoint '' vespaengine/vespa \ /opt/vespa/bin/vespa-fbench \ -C data-plane-public-cert.pem -K data-plane-private-key.pem -T /etc/ssl/certs/ca-bundle.crt \ -n 10 -P -H "Content-Type: application/json" -q queries_post.txt -o output.txt -s 300 -c 0 \ myapp.mytenant.aws-us-east-1c.z.vespa-app.cloud 443 ``` ### Post Processing After each run, a summary is written to stdout (and possibly an output file from each client) - example: ``` *****************Benchmark Summary***************** clients: 30 ran for: 1800 seconds cycle time: 0 ms lower response limit: 0 bytes skipped requests: 0 failed requests: 0 successful requests: 12169514 cycles not held: 12169514 minimum response time: 0.82 ms maximum response time: 3010.53 ms average response time: 4.44 ms 25 percentile: 3.00 ms 50 percentile: 4.00 ms 75 percentile: 6.00 ms 90 percentile: 7.00 ms 95 percentile: 8.00 ms 99 percentile: 11.00 ms actual query rate: 6753.90 Q/s utilization: 99.93 % ``` Take note of the number of _failed requests_, as a high number here can indicate that the system is overloaded, or that the queries are invalid. - In some modes of operation, vespa-fbench waits before sending the next query. "utilization" represents the time that vespa-fbench is sending queries and waiting for responses. For example, a 'system utilization' of 50% means that vespa-fbench is stress testing the system 50% of the time, and is doing nothing the remaining 50% of the time - vespa-fbench latency results include network latency between the client and the Vespa instance. Measure and subtract network latency to obtain the true vespa query latency. ## Benchmark Strategy: find optimal _requestthreads_ number, then find capacity by increasing number of parallel test clients: 1. Test with single client (n=1), single thread to find a _latency baseline_. 
For each test run, increase [threads](../reference/applications/services/content.html#requestthreads): 2. use #threads sweet spot, then increase number of clients, observe latency and CPU. ### Metrics The _container_ nodes expose the[/metrics/v2/values](../operations/metrics.html) interface - use this to dump metrics during benchmarks. Example - output all metrics from content node: ``` $ curl http://localhost:8080/metrics/v2/values | \ jq '.nodes[] | select(.role=="content/mysearchcluster/0/0") | .node.metrics[].values' ``` Output CPU util: ``` $ curl http://localhost:8080/metrics/v2/values | \ jq '.nodes[] | select(.role=="content/mysearchcluster/0/0") | .node.metrics[].values."cpu.util"' ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [vespa-fbench](#vespa-fbench) - [Prepare queries](#prepare-queries) - [Run queries](#run-queries) - [Post Processing](#post-processing) - [Benchmark](#benchmark) - [Metrics](#metrics) --- # Source: https://docs.vespa.ai/en/rag/binarizing-vectors.html.md # Binarizing Vectors Binarization in this context is mapping numbers in a vector (embedding) to bits (reducing the value range), and representing the vector of bits efficiently using the `int8` data type. Examples: | input vector | binarized floats | pack\_bits (to INT8) | | --- | --- | --- | | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | -1 | | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 | | [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 | | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -128 | | [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | -127 | Binarization is key to reducing memory requirements and, therefore, cost. Binarization can also improve feeding performance, as the memory bandwidth requirements go down accordingly. Refer to [embedding](../rag/embedding.html) for more details on how to create embeddings from text. ## Summary This guide maps all the steps required to run a successful binarization project using Vespa only - there is no need to re-feed data. This makes a project feasible with limited incremental resource usage and man-hours required. Approximate Nearest Neighbor vector operations are run using an HNSW index in Vespa, with online data structures. The cluster is operational during the procedure, gradually building the required data structures. This guide is useful to map the steps and tradeoffs made for a successful vector binarization. Other relevant articles on how to reduce vector size in memory are: - [Exploring the potential of OpenAI Matryoshka 🪆 embeddings with Vespa](https://blog.vespa.ai/matryoshka-embeddings-in-vespa/) - [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/) Adding to this, using algorithms like SPANN can solve problems with huge vector data sizes, read more in [Billion-scale vector search using hybrid HNSW-IF](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/). A binarization project normally involves iteration over different configuration settings, measuring quality loss for each iteration - this procedure it built with that in mind. ## Converters Vespa’s built-in indexing language [converters](../reference/writing/indexing-language.html#converters)`binarize` and `pack_bits` let you easily generate binarized vectors. 
Example schema definitions used to generate the vectors in the table above: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute } } field doc_embedding_binarized_floats type tensor(x[8]) { indexing: input doc_embedding | binarize | attribute } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute } } ``` We see that the `binarize` function itself will not compress vectors to a smaller size, as the output cell type is the same as the input - it is only the values that are mapped to 0 or 1. Above, the vectors are binarized using a threshold value of 0, the Vespa default - any number \> 0 will map to 1 - this threshold is configurable. `pack_bits` reads binarized vectors and represents them using int8. In the example above: - `tensor(x[8])` is 8 x sizeof(float) = 8 x 32 bits = 256 bits = 32 bytes - `tensor(x[1])` is 1 x sizeof(int8) = 1 x 8 bits = 8 bits = 1 byte In other words, a compression factor of 32, which is expected, mapping a 32-bit float into 1 bit. As memory usage often is the cost driver for applications, this has huge potential. However, there is a loss of precision, so the tradeoff must be evaluated. Read more in [billion-scale-knn](https://blog.vespa.ai/billion-scale-knn/) and[combining-matryoshka-with-binary-quantization-using-embedder](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/). ## Binarizing an existing embedding field In the example above, we see that `doc_embedding` has the original embedding data, and the fields `doc_embedding_binarized_floats` and `doc_embedding_binarized` are generated from `doc_embedding`. This is configured through the `indexing: input …` statement, and defining the generated fields outside the `document { … }` block. **Note:** The `doc_embedding_binarized_floats` field is just for illustration purposes, as input to the `doc_embedding_binarized` field, which is the target binarized and packed field with low memory requirements. From here, we will call this the binarized embedding. This is a common case for many applications - how to safely binarize and evaluate the binarized data for subsequent use. The process can be broken down into: - Pre-requisites. - Define the new binarized embedding, normally as an addition to the original field. - Deploy and re-index the data to populate the binarized embedding. - Create new ranking profiles with the binarized embeddings. - Evaluate the quality of the binarized embedding. - Remove the original embedding field from memory to save cost. ## Pre-requisites Adding a new field takes resources, on disk and in memory. A new binarized embedding field is smaller - above, it is 1/32 of the original field. Also note that embedding fields often have an index configured, like: ``` field doc_embeddings type tensor(x[8]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 100 } } } ``` The index is used for approximate nearest neighbor (ANN) searches, and also consumes memory. Use the Vespa Cloud console to evaluate the size of original fields and size of indexes to make sure that there is room for the new embedding field, possibly with an index. **Note:** The size of an index is a function of the number of documents, regardless of tensor type. 
In this context, this means that adding a new field with and index, the new index will have the same size as the index of the existing embedding field. Use status pages to find the index size in memory - example: https://api-ctl.vespa-cloud.com/application/v4/tenant/ TENANT\_NAME/application/APP\_NAME/instance/INSTANCE\_NAME/environment/prod/region/REGION/ service/searchnode/NODE\_HOSTNAME/ state/v1/custom/component/documentdb/SCHEMA/subdb/ready/attribute/ATTRIBUTE\_NAME ### Example ``` tensor: { compact_generation: 33946879, ref_vector: { memory_usage: { used: 1402202052, dead: 0, allocated: 1600126976, onHold: 0 } }, tensor_store: { memory_usage: { used: 205348904436, dead: 10248636768, allocated:206719921232, onHold: 0 } }, nearest_neighbor_index: { memory_usage: { all: { used: 10452397992, dead: 360247164, allocated:13346516304, onHold: 0 } ``` In this example, the index is 13G, the tensor data is 206G, so the index is 6.3% of the tensor data. The original tensor is of type `bfloat16`, a binarized version is 1/16 of this and hence 13G. As an extra index is 13G, the temporal incremental memory usage is approximately 26G during the procedure. ## Define the binarized embedding field The new field is _added_ to the schema, example schema, before: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute } } } ``` After: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute } } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute } } ``` The above are simple examples, with no ANN settings on the fields. Following is a more complex example - schema before: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } } ``` Schema after: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute | index attribute { distance-metric: hamming } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } ``` Note that we replicate the index settings to the new field. ## Deploy and reindex the binarized embedding field Deploying the field will trigger a reindexing on Vespa Cloud to populate the binarized embedding, fully automated. Self-hosted, the `deploy` operation will output the below - [trigger a reindex](../operations/reindexing.html). ``` $ vespa deploy Uploading application package... done Success: Deployed '.' with session ID 3 WARNING Change(s) between active and new application that may require re-index: reindexing: Consider re-indexing document type 'doc' in cluster 'doc' because: 1) Document type 'doc': Non-document field 'doc_embedding_binarized' added; this may be populated by reindexing ``` Depending on the size of the corpus and resources configured, the reindexing process takes time. ## Create new ranking profiles and queries using the binarized embeddings After reindexing, you can query using the new, binarized embedding field. 
Assuming a query using the doc\_embedding field: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \ 'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \ 'ranking=app_ranking' ``` The same query, with a binarized query vector, to the binarized field: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin' ``` See [tensor-hex-dump](../reference/schemas/document-json-format.html#tensor-hex-dump)for more information about how to create the int8-typed tensor. ### Quick Hamming distance intro Example embeddings: | document embedding | binarized floats | pack\_bits (to INT8) | | --- | --- | --- | | [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 | | **query embedding** | **binarized floats** | **to INT8** | | [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0] | -119 | Use [matchfeatures](../reference/schemas/schemas.html#match-features)to debug ranking (see ranking profile `app_ranking_bin` below): ``` "matchfeatures": { "attribute(doc_embedding_binarized)": { "type": "tensor(x[1])", "values": [0] }, "distance(field,doc_embedding_binarized)": 3.0, "query(q_bin)": { "type": "tensor(x[1])", "values": [-119] } } ``` See distance calculated to 3.0, which is the number of bits different in the binarized vectors, which is the hamming distance. ## Rank profiles and queries Assuming a rank profile like: ``` rank-profile app_ranking { match-features { distance(field, doc_embedding) query(q) attribute(doc_embedding) } inputs { query(q) tensor(x[8]) } first-phase { expression: closeness(field, doc_embedding) } } ``` Query: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \ 'input.query(q)=[2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]' \ 'ranking=app_ranking' ``` A binarized version is like: ``` rank-profile app_ranking_bin { match-features { distance(field, doc_embedding_binarized) query(q_bin) attribute(doc_embedding_binarized) } inputs { query(q_bin) tensor(x[1]) } first-phase { expression: closeness(field, doc_embedding_binarized) } } ``` Query: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin' ``` Query with full-precision query vector, against a binarized vector - rank profile: ``` rank-profile app_ranking_bin_full { match-features { distance(field, doc_embedding_binarized) query(q) query(q_bin) attribute(doc_embedding_binarized) } function unpack_to_float() { expression: 2*unpack_bits(attribute(doc_embedding_binarized), float)-1 } function dot_product() { expression: sum(query(q) * unpack_to_float) } inputs { query(q) tensor(x[8]) query(q_bin) tensor(x[1]) } first-phase { expression: closeness(field, doc_embedding_binarized) } second-phase { expression: dot_product } } ``` Notes: - The first-phase ranking is as the binarized query above. - The second-phase ranking is using the full-precision query vector query(q) with a bit-precision vector cast to float for type match. - Both query vectors must be supplied in the query. 
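To make these two phases concrete, here is a minimal Python sketch (numpy assumed; an illustration of the arithmetic, not Vespa code) of what the hamming `distance` and the `unpack_to_float`/`dot_product` functions compute over single int8 values, using the example values from the matchfeatures output above:

```python
import numpy as np

def hamming_int8(a: int, b: int) -> int:
    """Number of differing bits between two packed int8 values (distance-metric: hamming)."""
    return bin((a & 0xFF) ^ (b & 0xFF)).count("1")

def unpack_to_float(packed: int) -> np.ndarray:
    """Unpack one int8 into 8 bits (most significant bit first, matching how pack_bits
    packed them) and map {0, 1} -> {-1, +1}, mirroring 2*unpack_bits(...)-1 above."""
    bits = np.unpackbits(np.array([packed & 0xFF], dtype=np.uint8))
    return 2.0 * bits.astype(np.float32) - 1.0

# Document attribute value 0 vs. query value -119 (bit pattern 10001001):
print(hamming_int8(0, -119))   # 3 -> the distance(field, doc_embedding_binarized) = 3.0 shown above

# Second phase: dot product of the full-precision query with the unpacked document vector.
q = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0], dtype=np.float32)
print(float(np.dot(q, unpack_to_float(-119))))   # 3.0 for a document with the same bit pattern as the query
```

With the query vector [2.0, 0, 0, 0, 1.0, 0, 0, 1.0] the same dot product gives 4.0, which is consistent with the relevance scores in the queries below.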
Note the differences when using full values in the query tensor, see the relevance score for the results: ``` $ vespa query \ 'yql=select * from music where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin_full' ... "relevance": 3.0 ``` ``` $ vespa query \ 'yql=select * from music where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q)=[2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin_full' "relevance": 4.0 ``` Read the [closeness](../reference/ranking/rank-features.html#closeness(dimension,name)) reference documentation. ### TargetHits for ANN Given the lower precision with binarization, it might be a good idea to increase the `{targetHits:5}` annotation in the query, to generate more candidates for later ranking phases. ## Evaluate the quality of the binarized embeddings This exercise is about evaluating a lower-precision retrieval phase, using the original full-sized (here we use floats) query-result pairs as reference. Experiments, query-document precision: 1. float-float 2. binarized-binarized 3. float-binarized 4. float-float, with binarized retrieval To evaluate the precision, compute the differences for each query @10, like: ``` def compute_list_differences(list1, list2): set1 = set(list1) set2 = set(list2) return len(set1 - set2) list1 = [1, 3, 5, 7, 9, 11, 13, 15, 17, 20] list2 = [2, 3, 5, 7, 9, 11, 14, 15, 18, 20] num_hits = compute_list_differences(list1, list2) print(f"Hits different: {num_hits}") ``` ## Remove the original embedding field from memory The purpose of the binarization is reducing memory footprint. Given the results of the evaluation above, store the full-precision embeddings on disk or remove them altogether. Example with paging the attribute to disk-only: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute | index attribute: paged } } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute | index attribute { distance-metric: hamming } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } ``` This example only indexes the binarized embedding, with data binarized before indexing: ``` schema doc { document doc { field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute | index attribute { distance-metric: hamming } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } } ``` ## Appendix: Binarizing from text input To generate the embedding from other data types, like text, use the [converters](../reference/writing/indexing-language.html#converters) - example: ``` field doc_embedding type tensor(x[1]) { indexing: (input title || "") . " " . (input content || "") | embed | attribute attribute { distance-metric: hamming } } ``` Find examples in [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/). 
## Appendix: conversion to int8 Find examples of how to binarize values in code: ``` import numpy as np def floats_to_bits(floats): if len(floats) != 8: raise ValueError("Input must be a list of 8 floats.") bits = [1 if f > 0 else 0 for f in floats] return bits def bits_to_int8(bits): bit_string = ''.join(str(bit) for bit in bits) int_value = int(bit_string, 2) int8_value = np.int8(int_value) return int8_value def floats_to_int8(floats): bits = floats_to_bits(floats) int8_value = bits_to_int8(bits) return int8_value floats = [0.5, -1.2, 3.4, 0.0, -0.5, 2.3, -4.5, 1.2] int8_value = floats_to_int8(floats) print(f"The int8 value is: {int8_value}") ``` ``` import numpy as np def binarize_tensor(tensor: torch.Tensor) -> str: """ Binarize a floating-point 1-d tensor by thresholding at zero and packing the bits into bytes. Returns the hex str representation of the bytes. """ if not tensor.is_floating_point(): raise ValueError("Input tensor must be of floating-point type.") return ( np.packbits(np.where(tensor > 0, 1, 0), axis=0).astype(np.int8).tobytes().hex() ) ``` Multivector example, from[ColPali: Efficient Document Retrieval with Vision Language Models](https://vespa-engine.github.io/pyvespa/examples/colpali-document-retrieval-vision-language-models-cloud.html): ``` import numpy as np from typing import Dict, List from binascii import hexlify def binarize_token_vectors_hex(vectors: List[torch.Tensor]) -> Dict[str, str]: vespa_tensor = list() for page_id in range(0, len(vectors)): page_vector = vectors[page_id] binarized_token_vectors = np.packbits( np.where(page_vector > 0, 1, 0), axis=1 ).astype(np.int8) for patch_index in range(0, len(page_vector)): values = str( hexlify(binarized_token_vectors[patch_index].tobytes()), "utf-8" ) if ( values == "00000000000000000000000000000000" ): # skip empty vectors due to padding of batch continue vespa_tensor_cell = { "address": {"page": page_id, "patch": patch_index}, "values": values, } vespa_tensor.append(vespa_tensor_cell) return vespa_tensor ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Summary](#summary) - [Converters](#converters) - [Binarizing an existing embedding field](#binarizing-an-existing-embedding-field) - [Pre-requisites](#pre-requisites) - [Example](#example) - [Define the binarized embedding field](#define-the-binarized-embedding-field) - [Deploy and reindex the binarized embedding field](#deploy-and-reindex-the-binarized-embedding-field) - [Create new ranking profiles and queries using the binarized embeddings](#create-new-ranking-profiles-and-queries-using-the-binarized-embeddings) - [Quick Hamming distance intro](#quick-hamming-distance-intro) - [Rank profiles and queries](#rank-profiles-and-queries) - [TargetHits for ANN](#targethits-for-ann) - [Evaluate the quality of the binarized embeddings](#evaluate-the-quality-of-the-binarized-embeddings) - [Remove the original embedding field from memory](#remove-the-original-embedding-field-from-memory) - [Appendix: Binarizing from text input](#appendix-binarizing-from-text-input) - [Appendix: conversion to int8](#appendix-conversion-to-int8) --- # Source: https://docs.vespa.ai/en/ranking/bm25.html.md # The BM25 rank feature The[bm25 rank feature](../reference/ranking/rank-features.html#bm25)implements the[Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25)ranking function used to estimate the relevance of a text document given a search query. 
It is a pure text ranking feature which operates over an [indexed string field](../reference/schemas/schemas.html#indexing-index). The feature is cheap to compute, about 3-4 times faster than [nativeRank](nativerank.html), while still providing a good rank score quality-wise. It is a good candidate to use in a first-phase ranking function when ranking text documents. ## Ranking function The _bm25_ feature calculates a score for how well a query with terms $q_1, \ldots, q_n$ matches an indexed string field _t_ in a document _D_. The score is calculated as follows: $$\sum_{i=1}^{n} IDF(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{field\_len}{avg\_field\_len}\right)}$$ Where the components in the function are: - $IDF(q_i)$: The [inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Inverse_document_frequency) (_IDF_) of query term _i_ in field _t_. This is calculated as $\log\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)$, where $N$ is the number of documents on the content node and $n(q_i)$ is the number of those documents that contain query term _i_. - $f(q_i, D)$: The number of occurrences (term frequency) of query term _i_ in the field _t_ of document _D_. For multi-value fields we use the sum of occurrences over all elements. - field\_len: The field length (in number of words) of field _t_ in document _D_. For multi-value fields we use the sum of field lengths over all elements. - avg\_field\_len: The average field length of field _t_ among the documents on the content node. Can be configured using [rank-properties](../reference/ranking/rank-feature-configuration.html#bm25). - $k_1$: A parameter used to limit how much a single query term can affect the score for document _D_. With a higher value the score for a single term can continue to go up relatively more when more occurrences of that term exist. Default value is 1.2. Can be configured using [rank-properties](../reference/ranking/rank-feature-configuration.html#bm25). - $b$: A parameter used to control the effect of the field length of field _t_ compared to the average field length. Default value is 0.75. Can be configured using [rank-properties](../reference/ranking/rank-feature-configuration.html#bm25). ## Example In the following example we have an indexed string field _content_, and a rank profile using the _bm25_ rank feature. Note that the field must be enabled for usage with the bm25 feature by setting the _enable-bm25_ flag in the [index](../reference/schemas/schemas.html#index) section of the field definition. ``` schema example { document example { field content type string { indexing: index | summary index: enable-bm25 } } rank-profile default { first-phase { expression { bm25(content) } } } } ``` If the _enable-bm25_ flag is turned on after documents are already fed, then [proton](../content/proton.html) performs a [memory index flush](../content/proton.html#memory-index-flush) followed by a [disk index fusion](../content/proton.html#disk-index-fusion) to prepare the posting lists for use with _bm25_. Use the [custom component state API](../content/proton.html#custom-component-state-api) on each content node and examine `pending_urgent_flush` to determine if the preparation is still ongoing: ``` /state/v1/custom/component/documentdb/mydoctype/subdb/ready/index ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/content/buckets.html.md # Buckets The content layer splits the document space into chunks called _buckets_, and algorithmically maps documents to buckets by their id. The cluster automatically splits and joins buckets to maintain a uniform distribution across all nodes and to keep bucket sizes within configurable limits. Documents have string identifiers that map to a 58-bit numeric location.
A bucket is defined as all the documents that shares a given amount of the least significant bits within the location. The amount of bits used controls how many buckets will exist. For instance, if a bucket contains all documents whose 8 LSB bits is 0x01, the bucket can be split in two by using the 9th bit in the location to split them. Similarly, buckets can be joined by requiring one less bit in common. ## Distribution Distribution happens in several layers. - Documents map to 58 bit numeric locations. - Locations map to buckets - Buckets map to distributors responsible for handling requests related to those buckets. - Buckets map to content nodes responsible for storing replicas of buckets. ### Document to location distribution Document identifiers use [document identifier schemes](../schemas/documents.html)to map documents to locations. This way it is possible to co-locate data within buckets by enforcing some documents to have common LSB bits. Specifying a group or numeric value with the n and g options overrides the 32 LSB bits of the location. Only use when required, e.g. when using streaming search for personal search. ### Location to bucket distribution The cluster state contains a distribution bit count, which is the amount of location bits to use to generate buckets which can be mapped to distributors. The cluster state may change the number of distribution bits to adjust the number of buckets distributed at this level. When adding more nodes to the cluster, the number of buckets increases in order for the distribution to remain uniform. Altering the distribution bit count causes a redistribution of all buckets. If locations have been overridden to co-localize documents into few units, the distribution of documents into these buckets may be skewed. ### Bucket to distributor distribution Buckets are mapped to distributors using the ideal state algorithm. ### Bucket to content node distribution Buckets are mapped to content nodes using the ideal state algorithm. As the content nodes persist data, changing bucket ownership takes more time/resources than on the distributors. There is usually a replica of a bucket on the same content node as the distributor owning the bucket, as the same algorithm is used. The distributors may split the buckets further than the distribution bit count indicates, allowing more units to be distributed among the content nodes to create a more even distribution, while not affecting routing from client to distributors. ## Maintenance operations The content layer defines a set of maintenance operations to keep the cluster balanced. Distributors schedule maintenance operations and issue them to content nodes. Maintenance operations are typically not high priority requests. Scheduling a maintenance operation does not block any external operations. | Split bucket | Split a bucket in two, by enforcing the documents within the new buckets to have more location bits in common. Buckets are split either because they have grown too big, or because the cluster wants to use more distribution bits. | | Join bucket | Join two buckets into one. If a bucket has been previously split due to being large, but documents have now been deleted, the bucket can be joined again. | | Merge bucket | If there are multiple replicas of a bucket, but they do not store the same set of versioned documents, _merge_ is used to synchronize the replicas. A special case of a merge is a one-way merge, which may be done if some of the replicas are to be deleted right after the merge. 
Merging is used not only to fix inconsistent bucket replicas, but also to move buckets between nodes. To move a bucket, an empty replica is created on the target node, a merge is executed, and the source bucket is deleted. | | Create bucket | This operation exist merely for the distributor to notify a content node that it is now to store documents for this bucket too. This allows content nodes to refuse operations towards buckets it does not own. The ability to refuse traffic is a safeguard to avoid inconsistencies. If a client talks to a distributor that is no longer working correctly, we rather want its requests to fail than to alter the content cluster in strange ways. | | Delete bucket | Drop stored state for a bucket and reject further requests for it | | (De)activate bucket | Activate bucket for search results - refer to [bucket management](proton.html#bucket-management) | | Garbage collections | If configured, documents are periodically garbage collected through background maintenance operations. | ### Bucket split size The distributors may split existing buckets further to keep bucket sizes at manageable levels, or to ensure more units to split among the backends and their partitions. Using small buckets, the distribution will be more uniform and bucket operations will be smaller. Using large buckets, less memory is needed for metadata operations and bucket splitting and joining is less frequent. The size limits may be altered by configuring [bucket splitting](../reference/applications/services/content.html#bucket-splitting). ## Document to bucket distribution Each document has a document identifier following a document identifier[uri scheme](../schemas/documents.html). From this scheme a 58 bit numeric _location_ is generated. Typically, all the bits are created from an MD5 checksum of the whole identifier. Schemes specifying a _groupname_, will have the LSB bits of the location set to a hash of the _groupname_. Thus, all documents belonging to that group will have locations with similar least significant bits, which will put them in the same bucket. If buckets end up split far enough to use more bits than the hash bits overridden by the group, the data will be split into many buckets, but each will typically only contain data for that group. MD5 checksums maps document identifiers to random locations. This creates a uniform bucket distribution, and is default. For some use cases, it is better to co-locate documents, optimizing grouped access - an example is personal documents. By enforcing some documents to map to similar locations, these documents are likely to end up in the same actual buckets. There are several use cases for where this may be useful: - When migrating documents for some entity between clusters, this may be implemented more efficient if the entity is contained in just a few buckets rather than having documents scattered around all the existing buckets. - If operations to the cluster is clustered somehow, clustering the documents equally in the backend may make better use of caches. For instance, if a service stores data for users, and traffic is typically created for users at short intervals while the users are actively using the service, clustering user data may allow a lot of the user traffic to be easily cached by generic bucket caches. If the `n=` option is specified, the 32 LSB bits of the given number overrides the 32 LSB bits of the location. 
If the `g=` option is specified, a hash is created of the group name, the hash value is then used as if it were specified with `n=`. When the location is calculated, it is mapped to a bucket. Clients map locations to buckets using[distribution bits](#location-to-bucket-distribution). Distributors map locations to buckets by searching their bucket database, which is sorted in inverse location order. The common case is that there is one. If there are several, there is currently inconsistent bucket splitting. If there are none, the distributor will create a new bucket for the request if it is a request that may create new data. Typically, new buckets are generated split according to the distribution bit count. Content nodes should rarely need to map documents to buckets, as distributors specify bucket targets for all requests. However, as external operations are not queued during bucket splits and joins, the content nodes remap operations to avoid having to fail them due to a bucket having recently been split or joined. ### Limitations One basic limitation to the document to location mapping is that it may never change. If it changes, then documents will suddenly be in the wrong buckets in the cluster. This would violate a core invariant in the system, and is not supported. To allow new functionality, document identifier schemes may be extended or created that maps to location in new ways, but the already existing ones must map the same way as they have always done. Current document identifier schemes typically allow the 32 least significant bits to be overridden for co-localization, while the remaining 26 bits are reserved for bits created from the MD5 checksum. ### Splitting When there are enough documents co-localized to the same bucket, causing the bucket to be split, it will typically need to split past the 32 LSB. At this split-level and beyond, there is no longer a 1-1 relationship between the node owning the bucket and the nodes its replica data will be stored on. The effect of this is that documents sharing a location will be spread across nodes in the entire cluster once they reach a certain size. This enables efficient parallel processing. ## Bucket space Buckets exist in the _default_ or _global_ bucket space. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Distribution](#distribution) - [Document to location distribution](#documents-to-location-distribution) - [Location to bucket distribution](#location-to-bucket-distribution) - [Bucket to distributor distribution](#bucket-to-distributor-distribution) - [Bucket to content node distribution](#bucket-to-content-node-distribution) - [Maintenance operations](#maintenance-operations) - [Bucket split size](#bucket-split-size) - [Document to bucket distribution](#document-to-bucket-distribution) - [Limitations](#limitations) - [Splitting](#splitting) - [Bucket space](#bucket-space) --- # Source: https://docs.vespa.ai/en/operations/self-managed/build-install.html.md # Build / install Vespa To develop with Vespa, follow the [guide](https://github.com/vespa-engine/vespa#building) to set up a development environment on AlmaLinux 8 using Docker. Build Vespa Java artifacts with Java \>= 17 and Maven \>= 3.6.3. Once built, Vespa Java artifacts are ready to be used and one can build a Vespa application using the [bundle plugin](../../applications/bundles.html#maven-bundle-plugin). ``` $ export MAVEN_OPTS="-Xms128m -Xmx1024m" $ ./bootstrap.sh java && mvn install ``` See [vespa.ai releases](../../learn/releases.html). 
## Container images | Image | Description | | --- | --- | | [docker.io/vespaengine/vespa](https://hub.docker.com/r/vespaengine/vespa) [ghcr.io/vespa-engine/vespa](https://github.com/orgs/vespa-engine/packages/container/package/vespa) | Container image for running Vespa. | | [docker.io/vespaengine/vespa-build-almalinux-8](https://hub.docker.com/r/vespaengine/vespa-build-almalinux-8) | Container image for building Vespa on AlmaLinux 8. | | [docker.io/vespaengine/vespa-dev-almalinux-8](https://hub.docker.com/r/vespaengine/vespa-dev-almalinux-8) | Container image for development of Vespa on AlmaLinux 8. Used for incremental building and system testing. | ## RPMs Dependency graph: ![RPM overview](/assets/img/rpms.svg) Installing Vespa on AlmaLinux 8: ``` $ dnf config-manager \ --add-repo https://raw.githubusercontent.com/vespa-engine/vespa/master/dist/vespa-engine.repo $ dnf config-manager --enable powertools $ dnf install -y epel-release $ dnf install -y vespa ``` Package repository hosting is graciously provided by [Cloudsmith](https://cloudsmith.com) which is a fully hosted, cloud-native and universal package management solution:[![OSS hosting by Cloudsmith](https://img.shields.io/badge/OSS%20hosting%20by-cloudsmith-blue?logo=cloudsmith&style=flat-square)](https://cloudsmith.com) **Important:** Please note that the retention of released RPMs in the repository is limited to the latest 50 releases. Use the Docker images (above) for installations of specific versions older than this. Any problems with released rpm packages will be fixed in subsequent releases, please [report any issues](https://vespa.ai/support/) - troubleshoot using the [install example](/en/operations/self-managed/multinode-systems.html#aws-ec2-singlenode). Refer to [vespa.spec](https://github.com/vespa-engine/vespa/blob/master/dist/vespa.spec). 
Build RPMs for a given Vespa version X.Y.Z: ``` $ git clone https://github.com/vespa-engine/vespa $ cd vespa $ git checkout vX.Y.Z $ docker run --rm -ti -v $(pwd):/wd:Z -w /wd \ docker.io/vespaengine/vespa-build-almalinux-8:latest \ make -f .copr/Makefile rpms outdir=/wd $ ls *.rpm | grep -v debug vespa-8.634.24-1.el8.src.rpm vespa-8.634.24-1.el8.x86_64.rpm vespa-ann-benchmark-8.634.24-1.el8.x86_64.rpm vespa-base-8.634.24-1.el8.x86_64.rpm vespa-base-libs-8.634.24-1.el8.x86_64.rpm vespa-clients-8.634.24-1.el8.x86_64.rpm vespa-config-model-fat-8.634.24-1.el8.x86_64.rpm vespa-jars-8.634.24-1.el8.x86_64.rpm vespa-libs-8.634.24.el8.x86_64.rpm vespa-malloc-8.634.24-1.el8.x86_64.rpm vespa-node-admin-8.634.24-1.el8.x86_64.rpm vespa-tools-8.634.24-1.el8.x86_64.rpm ``` Find most utilities in the vespa-x.y.z\*.rpm - other RPMs: | RPM | Description | | --- | --- | | vespa-tools | Tools accessing Vespa endpoints for query or document operations: - [vespa-destination](/en/reference/operations/self-managed/tools.html#vespa-destination) - [vespa-fbench](/en/reference/operations/tools.html#vespa-fbench) - [vespa-feeder](/en/reference/operations/self-managed/tools.html#vespa-feeder) - [vespa-get](/en/reference/operations/self-managed/tools.html#vespa-get) - [vespa-query-profile-dump-tool](/en/reference/operations/tools.html#vespa-query-profile-dump-tool) - [vespa-stat](/en/reference/operations/self-managed/tools.html#vespa-stat) - [vespa-summary-benchmark](/en/reference/operations/self-managed/tools.html#vespa-summary-benchmark) - [vespa-visit](/en/reference/operations/self-managed/tools.html#vespa-visit) - [vespa-visit-target](/en/reference/operations/self-managed/tools.html#vespa-visit-target) | | vespa-malloc | Vespa has its own memory allocator, _vespa-malloc_ - refer to _/opt/vespa/etc/vespamalloc.conf_ | | vespa-clients | _vespa-feed-client.jar_ - see [vespa-feed-client](../../clients/vespa-feed-client.html) | Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/applications/bundles.html.md # Bundles The Container uses [OSGi](https://osgi.org) to provide a modular platform for developing applications that can be composed of many reusable components. The user can deploy, upgrade and remove these components at runtime. ## OSGi OSGi is a framework for modular development of Java applications, where a set of resources called _bundles_ can be installed. OSGi allows the developer to control which resources (Java packages) in a bundle that should be available to other bundles. Hence, you can explicitly declare a bundle's public API, and also ensure that internal implementation details remain hidden. Unless you're already familiar with OSGi, we recommend reading Richard S. Hall's presentation [Learning to ignore OSGi](https://cwiki.apache.org/confluence/download/attachments/7956/Learning_to_ignore_OSGi.pdf), which explains the most important aspects that you must relate to as a bundle developer. There are other good OSGi tutorials available: - [OSGi for Dummies](https://thiloshon.wordpress.com/2020/03/04/osgi-for-dummies/) - [OSGi Modularity and Services - Tutorial](https://www.vogella.com/tutorials/OSGi/article.html) (You can ignore the part about OSGi services.) JDisc uses OSGi's _module_ and _lifecycle_ layers, and does not provide any functionality from the _service_ layer. 
## OSGi bundles An OSGi bundle is a regular JAR file with a MANIFEST.MF file that describes its content, what the bundle requires (imports) from other bundles, and what it provides (exports) to other bundles. Below is an example of a typical bundle manifest with the most important headers: ``` Bundle-SymbolicName: com.yahoo.helloworld Bundle-Description: A Hello World bundle Bundle-Version: 1.0.0 Export-Package: com.yahoo.helloworld;version="1.0.0" Import-Package: org.osgi.framework;version="1.3.0" ``` The meaning of the headers in this bundle manifest is as follows: - `Bundle-SymbolicName` - The unique identifier of the bundle. - `Bundle-Description` - A human-readable description of the bundle's functionality. - `Bundle-Version` - Designates a version number to the bundle. - `Export-Package` - Expresses which Java packages contained in a bundle will be made available to the outside world. - `Import-Package` - Indicates which Java packages will be required from the outside world to fulfill the dependencies needed in a bundle. Note that OSGi has a strict definition of version numbers that need to be followed for bundles to work correctly. See the [OSGi javadoc](https://docs.osgi.org/javadoc/r4v42/org/osgi/framework/Version.html#Version(java.lang.String)) for details. As a general advice, never use more than three numbers in the version (major, minor, micro). ## Building an OSGi bundle As long as the project was created by following steps in the [Developer Guide](developer-guide.html), the code is already being packaged into an OSGi bundle by the [Maven bundle plugin](#maven-bundle-plugin). However, if migrating an existing Maven project, change the packaging statement to: ``` ``` container-plugin ``` ``` and add the plugin to the build instructions: ``` ``` com.yahoo.vespa bundle-plugin 8.634.24 true true ``` ``` Because OSGi introduces a different runtime environment from what Maven provides when running unit tests, one will not observe any loading and linking errors until trying to deploy the application onto a running Container. Errors triggered at this stage will be the likes of `ClassNotFoundException` and `NoClassDefFoundError`. To debug these types of errors, inspect the stack traces in the [error log](../reference/operations/log-files.html), and refer to [troubleshooting](#troubleshooting). [vespa-logfmt](../reference/operations/self-managed/tools.html#vespa-logfmt) with its _--nldequote_ option is useful when reading logs. The test suite needs to cover deployment of the application bundle to ensure that its dynamic loading and linking issues are covered. ## Depending on non-OSGi ready libraries Unfortunately, many popular Java libraries have yet to be bundled with the appropriate manifest that makes them OSGi-compatible. The simplest solution to this is to set the scope of the problematic dependency to **compile** in your pom.xml file. This will cause the bundle plugin to package the whole library into your bundle's JAR file. Until the offending library becomes available as an OSGi bundle, it means that your bundle will be bigger (in number of bytes), and that classes of that library can not be shared across application bundles. The practical implication of this feature is that the bundle plugin copies the compile-scoped dependency, and its transitive dependencies, into the final JAR file, and adds a `Bundle-ClassPath` instruction to its manifest that references those dependencies. 
Although this approach works for most non-OSGi libraries, it only works for libraries where the jar file is _self-contained_. If, on the other hand, the library depends on other installed files, it must be treated as if it was a [JNI library](#depending-on-JNI-libraries). ## Depending on JNI Libraries This section details alternatives for using native code in the container. ### OSGi bundles containing native code OSGi jars may contain .so files, which can be loaded in the standard way from Java code in the bundle. Note that since only one instance of an .so can be loaded at any time, it is not possible to hot swap a jar containing .so files - when such jars are changed the [new configuration will not take effect until the container is restarted](components.html#JNI-requires-restart). Therefore, it is often a good idea to package a .so file and its Java API into a separate bundle from the rest of your code to avoid having to restart the container on all code changes. ### Add JNI code to the global classpath When the JNI dependency cannot be packaged in a bundle, and you run on an environment where you can install files locally on the container nodes, you can add the dependency to the container's classpath and explicitly export the packages to make them visible to OSGi bundles. Add the following configuration in the top level _services_ element in [services.xml](../reference/applications/services/container.html): ``` ``` /lib/jars/foo.jar:/path/bar.jar com.foo,com.bar ... ``` ``` Adding the config at the top level ensures that it's applied to all jdisc clusters. The packages are now available and visible, but they must still be imported by the application bundle that uses the library. Here is how to configure the bundle plugin to enforce an import of the packages to the bundle: ``` com.yahoo.vespa bundle-plugin true\\com.foo,com.bar\\ ``` When adding a library to the classpath it becomes globally visible, and exempt from the package visibility management of OSGi. If another bundle contains the same library, there will be class loading issues. ## Maven bundle plugin The _bundle-plugin_ is used to build and package components for the [Vespa Container](components.html) with Maven. Refer to the [multiple-bundles sample app](https://github.com/vespa-engine/sample-apps/tree/master/examples/multiple-bundles) for a practical example. The minimal Maven _pom.xml_ configuration is: ``` 4.0.0 com.yahoo.example basic-application container-plugin\ 8.634.24 \ ``` ``` **Note:** If the requested document-summary only contains fields that are[attributes](../content/attributes.html), the summary store (and cache) is not used. ## Protocol phases caches _ranking.queryCache_ and _groupingSessionCache_described in the [Query API reference](../reference/api/query.html)are only caching data in between phases for a given a query, so other queries do not get any benefits, but these caches saves container - content node(s) round-trips for a _given_ query. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/applications/chaining.html.md # Chained Components [Processors](processing.html), [searcher plug-ins](searchers.html) and [document processors](document-processors.html) are chained components. They are executed serially, with each providing some service or transform, and other optionally depending on these. In other words, a chain is a set of components with dependencies. 
Javadoc: [com.yahoo.component.chain.Chain](https://javadoc.io/doc/com.yahoo.vespa/chain/latest/com/yahoo/component/chain/Chain.html) It is useful to read the [federation guide](../querying/federation.html) before this document. A chained component has three basic differences from a component in general: - The named services it _provides_ to other components in the chain. - The list of services or checkpoints which the component itself should be _before_ in a chain, in other words, its dependents. - The list of services or checkpoints which the component itself should be _after_ in a chain, in other words, its dependencies. What a component should be placed before, what it should be placed after and what itself provides, may be either defined using Java annotations directly on the component class, or it may be added specifically to the component declarations in [services.xml](../reference/applications/services/container.html). In general, the implementation should have as many of the necessary annotations as practical, leaving the application specific configuration clean and simple to work with. ## Ordering Components The execution order of the components in a chain is not defined by the order of the components in the configuration. Instead, the order is defined by adding the _ordering constraints_ to the components: - Any component may declare that it `@Provides` some named functionality (the names are just labels that have no meaning to the container). - Any component may declare that it must be placed `@Before` some named functionality, - or that it must be placed `@After` some functionality. The container will pick any ordering of a chain consistent with the constraints of the components in the chain. Dependencies can be added in two ways. Dependencies which are due to the code should be added as annotations in the code: ``` import com.yahoo.processing.*; import com.yahoo.component.chain.dependencies.*;@Provides("SourceSelection") @Before("Federation") @After("IntentModel")public class SimpleProcessor extends Processor { @Override public Response process(Request request, Execution execution) { //TODO: Implement this } } ``` Multiple functionality names may be specified by using the syntax `@Provides/Before/After({"A", "B"})`. Annotations which do not belong in the code may be added in the[configuration](../reference/applications/services/container.html): ``` \ai.vespa.examples.Processor1\ ``` For convenience, components always `Provides` their own fully qualified class name (the package and simple class name concatenated, e.g.`ai.vespa.examples.SimpleProcessor`) and their simple name (that is, only the class name, like`SimpleProcessor` in our searcher case), so it is always possible to declare that one must execute before or after some particular component. This goes for both general processors, searchers and document processors. Finally, note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”. ## Chain Inheritance As implied by examples above, chains may inherit other chains in _services.xml_. ``` ``` ``` ``` A chain will include all components from the chains named in the optional `inherits` attribute, exclude from that set all components named in the also optional`excludes` attribute and add all the components listed inside the defining tag. Both `inherits` and`excludes` are space delimited lists of reference names. 
For search chains, there are two built-in search chains which are especially useful to inherit from, `native` and `vespa`.`native` is a basic search chain, containing the basic functionality most systems will need anyway,`vespa` inherits from `native` and adds a few extra searchers which most installations containing Vespa backends will need. ``` ``` ``` ``` ## Unit Tests A component should be unit tested in a chain containing the components it depends on. It is not necessary to run the dependency handling framework to achieve that, as the `com.yahoo.component.chain.Chain` class has several constructors which are easy to use while testing. ``` Chain c = new Chain(new UselessSearcher("first"), new UselessSearcher("second"), new UselessSearcher("third")); Execution e = new Execution(c, Execution.Context.createContextStub(null)); Result r = e.search(new Query()); ``` The above is a rather useless test, but it illustrates how the basic workflow can be simulated. The constructor will create a chain with supplied searchers in the given order (it will not analyze any annotations). ## Passing Information Between Components When different searchers or document processors depend on shared classes or field names, it is good practice defining the name only in a single place. An [example](searchers.html#passing-information-between-searchers) in the searcher development introduction illustrates an easy way to do that. ## Invoking a Specific Search Chain The search chain to use can be selected in the request, by adding the request parameter:`searchChain=myChain` If no chain is selected in the query, the chain called`default` will be used. If no chain called`default` has been configured, the chain called`native` will be used. The _native_ chain is always present and contains a basic set of searchers needed in most applications. Custom chains will usually inherit the native chain to include those searchers. The search chain can also be set in a [query profile](../querying/query-profiles.html). ## Example: Configuration Annotations which do not belong in the code may be added in the configuration, here a simple example with[search chains](../reference/applications/services/search.html#chain): ``` \Cache\\Statistics\\Logging\\SimpleTest\ ``` And for [document processor chains](../reference/applications/services/docproc.html), it becomes: ``` \TextMetrics\ ``` For searcher plugins the class[com.yahoo.search.searchchain.PhaseNames](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/searchchain/PhaseNames.html)defines a set of checkpoints third party searchers may use to help order themselves when extending the Vespa search chains. Note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”. ## Example: Cache with async write Use case: In a search chain, do early return and do further search asynchronously using [ExecutorService](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/ExecutorService.html). Pseudocode: If cache hit (e.g. using Redis), just return cached data. 
If cache miss, return null data and let the following searcher finish further query and write back to cache: ``` ``` public Result search(Query query, Execution execution) { // cache lookup if (cache_hit) { return result; } else { execution.search(query); // invoke async cache update searcher next in chain return result; } } ``` ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Ordering Components](#ordering-components) - [Chain Inheritance](#chain-inheritance) - [Unit Tests](#unit-tests) - [Passing Information Between Components](#passing-information-between-components) - [Invoking a Specific Search Chain](#invoking-a-specific-search-chain) - [Example: Configuration](#example-configuration) - [Example: Cache with async write](#example-cache-with-async-write) --- # Source: https://docs.vespa.ai/en/reference/rag/chunking.html.md # Chunking Reference Reference configuration for _chunkers_: Components that splits text into pieces in[chunk indexing expressions](../writing/indexing-language.html#chunk), as in ``` indexing: input myTextField | chunk fixed-length 500 | index ``` See also the [guide to working with chunks](../../rag/working-with-chunks.html). ## Built-in chunkers Vespa provides these built-in chunkers: | Chunker id | Arguments | Description | | --- | --- | --- | | sentence | - | Splits the text into chunks at sentence boundaries. | | fixed-length | target chunk length in characters | Splits the text into chunks with roughly equal length. This will prefer to make chunks of similar length, and to split at reasonable locations over matching the target length exactly. | ## Chunker components Chunkers are [components](../../applications/components.html), so you can also add your own: ``` ``` foo ``` ``` You create a chunker component by implementing the[com.yahoo.language.process.Chunker](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/process/Chunker.java)interface, see [these examples](https://github.com/vespa-engine/vespa/tree/master/linguistics/src/main/java/ai/vespa/language/chunker). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/cloning.html.md # Cloning applications and data This is a guide on how to replicate a Vespa application in different environments, with or without data. Use cases for cloning include: - Get a copy of the application and (some) data on a laptop to work offline, or attach a debugger. - Deploy local experiments to the `dev` environment to easily cooperate and share. - Set up a copy of the application and (some) data to test a new major version of Vespa. - Replicate a bug report in a non-production environment. - Set up a copy of the application and (some) data in a `prod` environment to experiment with a CI/CD pipeline, without touching the current production serving. - Onboard a new team member by setting up a copy of the application and test data in a `dev` environment. - Clone to a `dev` environment for load testing. This guide uses _applications_. One can also use _instances_, but that will not work across Vespa major versions on Vespa Cloud - refer to [tenant, applications, instances](../learn/tenant-apps-instances) for details. Vespa Cloud has different environments `dev` and `prod`, with different characteristics -[details](environments.html). Clone to `dev` for short-lived experiments/development/benchmarking, use `prod` for serving applications with a [CI/CD pipeline](automated-deployments.html). 
As some steps are similar, it is a good idea to read through all, as details are added only the first time for brevity. Examples are based on the[album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation) sample application. **Note:** When done, it is easy to tear down resources in Vespa Cloud. E.g., _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deploy_ or_https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default_ to find a delete-link. Instances in `dev` environments are auto-expired ([details](environments.html)), so application cloning is a safe way to work with Vespa. Find more information in [deleting applications](deleting-applications). ## Cloning - self-hosted to Vespa Cloud **Source setup:** ``` $ docker run --detach --name vespa1 --hostname vespa-container1 \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa $ vespa deploy -t http://localhost:19071 ``` **Target setup:** [Create a tenant](../basics/deploy-an-application.html) in the Vespa Cloud console, in this guide using "mytenant". **Export source application package:** This gets the application package and copies it out of the container to local file system: ``` $ vespa fetch -t http://localhost:19071 && \ unzip application.zip -x application.zip ``` **Deploy target application package** The procedure differs a little whether deploying to dev or prod [environment](environments.html). The `mvn -U clean package` step is only needed for applications with custom code. Configure application name and create data plane credentials: ``` $ vespa config set target cloud && \ vespa config set application mytenant.myapp $ vespa auth login $ vespa auth cert -f $ mvn -U clean package ``` **Note:** When deploying to a new app, one will often want to generate a new data plane cert/key pair. To do this, use `vespa auth cert -f`. If reusing a cert/key pair, drop `-f` and make sure to put the pair in _.vespa_, to avoid errors like`Error: open /Users/me/.vespa/mytenant.myapp.default/data-plane-public-cert.pem: no such file or directory`in the subsequent deploy step. Then deploy the application. Depending on the use case, deploy to `dev` or `prod`: - `dev`: ``` $ vespa deploy ``` Expect something like: ``` Uploading application package ... done Success: Triggered deployment of . with run ID 1 Use vespa status for deployment status, or follow this deployment at https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default/job/dev-aws-us-east-1c/run/1 ``` - Deployments to the `prod` environment requires [deployment.xml](/en/reference/applications/deployment.html) - select which [zone](https://cloud.vespa.ai/en/reference/zones) to deploy to: ``` $ cat < deployment.xml aws-us-east-1c EOF ``` `prod` deployments also require `resources` specifications in [services.xml](https://cloud.vespa.ai/en/reference/services) - use [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/services.xml) as an example and add/replace `nodes` elements for `container` and `content` clusters. If in doubt, just add a small config to start with, and change later: ``` ``` Deploy the application package: ``` $ vespa prod deploy ``` Expect something like: ``` Hint: See[production deployment](production-deployment.html)Success: Deployed . 
See https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deployment for deployment progress ``` A proper deployment to a `prod` zone should have automated tests, read more in [automated deployments](automated-deployments.html) **Data copy** Export documents from the local instance and feed to the Vespa Cloud instance: ``` $ vespa visit -t http://localhost:8080 | vespa feed - ``` Add more parameters as needed to `vespa feed` for other endpoints. **Get access log from source:** ``` $ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default ``` ## Cloning - Vespa Cloud to self-hosted **Download application from Vespa Cloud** Validate the endpoint, and fetch the application package: ``` $ vespa config get application application = mytenant.myapp.default $ vespa fetch Downloading application package... done Success: Application package written to application.zip ``` The application package can also be downloaded from the Vespa Cloud Console: - dev: Navigate to _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default_, click _Application_ to download: - prod: Navigate to _https://console.vespa-cloud.com/tenant/mytenant1/application/myapp/prod/deployment?tab=builds_ and select the version of the application to download: **Target setup:** Note the name of the application package .zip-file just downloaded. If changes are needed, unzip it and use `vespa deploy -t http://localhost:19071 `to deploy from current directory: ``` $ docker run --detach --name vespa1 --hostname vespa-container1 \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa $ vespa config set target local $ vespa deploy -t http://localhost:19071 mytenant.myapp.default.dev.aws-us-east-1c.zip ``` **Data copy** Set config target cloud for `vespa visit` and pipe the jsonl output into `vespa feed` to the local instance: ``` $ vespa config set target cloud $ vespa visit | vespa feed - -t http://localhost:8080 ``` **data copy - minimal** For use cases requiring a few documents, visit just a few documents: ``` $ vespa visit --chunk-count 10 ``` **Get access log from source:** Use the Vespa Cloud Console to get access logs ## Cloning - Vespa Cloud to Vespa Cloud This is a combination of the procedures above. Download the application package from dev or prod, make note of the source name, like mytenant.myapp.default. Then use `vespa deploy` or `vespa prod deploy` as above to deploy to dev or prod. If cloning from `dev` to `prod`, pay attention to changes in _deployment.xml_ and _services.xml_as in [cloning to Vespa Cloud](#cloning---self-hosted-to-vespa-cloud). **Data copy** Set the feed endpoint name / paths, e.g. mytenant.myapp-new.default: ``` $ vespa config set target cloud $ vespa visit | vespa feed - -t https://default.myapp-new.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud ``` **Data copy 5%**Set the –selection argument to `vespa visit` to select a subset of the documents. ## Cloning - self-hosted to self-hosted Creating a copy from one self-hosted application to another. Self-hosted means running [Vespa](https://vespa.ai/) on a laptop or a [multinode system](self-managed/multinode-systems.html). This example sets up a source app and deploys the [application package](../basics/applications.html) - use [album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation)as an example. The application package is then exported from the source and deployed to a new target app. 
Steps: **Source setup:** ``` $ vespa config set target local $ docker run --detach --name vespa1 --hostname vespa-container1 \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa $ vespa deploy -t http://localhost:19071 ``` **Target setup:** ``` $ docker run --detach --name vespa2 --hostname vespa-container2 \ --publish 8081:8080 --publish 19072:19071 \ vespaengine/vespa ``` **Export source application package** Export files: ``` $ vespa fetch -t http://localhost:19071 ``` **Deploy application package to target** Before deploying, one can make changes to the application package files as needed. Deploy to target: ``` $ vespa deploy -t http://localhost:19072 application.zip ``` **Data copy from source to target** This pipes the source data directly into `vespa feed` - another option is to save the data to files temporarily and feed these individually: ``` $ vespa visit -t http://localhost:8080 | vespa feed - -t http://localhost:8081 ``` **Data copy 5%** This is an example on how to use a [selection](../reference/writing/document-selector-language.html)to specify a subset of the documents - here a "random" 5% selection: ``` $ vespa visit -t http://localhost:8080 --selection 'id.hash().abs() % 20 = 0' | \ vespa feed - -t http://localhost:8081 ``` **Get access log from source** Get the current query access log from the source application (there might be more files there): ``` $ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/security/cloudflare-workers.html.md # Using Cloudflare Workers with Vespa Cloud This guide describes how you can access mutual TLS protected Vespa Cloud endpoints using[Cloudflare Workers](https://workers.cloudflare.com/). ## Writing and reading from Vespa Cloud Endpoints Vespa Cloud's endpoints are protected using mutual TLS. This means the client must present a TLS certificate that the Vespa application trusts. The application knows which certificate to trust because the certificate is included in the Vespa application package. ### mTLS Configuration Mutual TLS certificates can be created using the[Vespa CLI](../clients/vespa-cli.html): For example, for tenant `samples` with application `vsearch` and instance `default`: ``` $ vespa auth cert --application samples.vsearch.default Success: Certificate written to security/clients.pem Success: Certificate written to $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem Success: Private key written to $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem ``` Refer to the [security guide](guide) for details. ### Creating a Cloudflare Worker to interact with mTLS Vespa Cloud endpoints In March 2023, Cloudflare announced [Mutual TLS available for Workers](https://blog.cloudflare.com/mtls-workers/), see also [Workers Runtime API mTLS](https://developers.cloudflare.com/workers/runtime-apis/mtls/). Install wrangler and create a worker project. Wrangler is the Cloudflare command line interface (CLI), refer to[Workers:Get started guide](https://developers.cloudflare.com/workers/get-started/guide/). Once configured and authenticated, one can upload the Vespa Cloud data plane certificates to Cloudflare. 
Upload Vespa Cloud mTLS certificates to Cloudflare: ``` $ npx wrangler mtls-certificate upload \ --cert $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem \ --key $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem \ --name vector-search-dev ``` The output will look something like this: ``` Uploading mTLS Certificate vector-search-dev... Success! Uploaded mTLS Certificate vector-search-dev ID: 63316464-1404-4462-baf7-9e9f81114d81 Issuer: CN=cloud.vespa.example Expires on 3/11/2033 ``` Notice the `ID` in the output; This is the `certificate_id` of the uploaded mTLS certificate. To use the certificate in the worker code, add an `mtls_certificates` variable to the `wrangler.toml` file in the project to bind a name to the certificate id. In this case, bind to `VESPA_CERT`: ``` mtls_certificates = [ { binding = "VESPA_CERT", certificate_id = "63316464-1404-4462-baf7-9e9f81114d81" } ] ``` With the above binding in place, you can access the `VESPA_CERT` in Worker code like this: ``` export default { async fetch(request, env) { return await env.VESPA_CERT.fetch("https://vespa-cloud-endpoint"); } } ``` Notice that `env` is a variable passed by the Cloudflare worker infrastructure. ### Worker example The following worker example forwards POST and GET HTTP requests to the `/search/` path of the Vespa cloud endpoint. It rejects other paths or other HTTP methods. ``` /** * Simple Vespa proxy that forwards read (POST and GET) requests to the * /search/ endpoint * Learn more at https://developers.cloudflare.com/workers/ */ export default { async fetch(request, env, ctx) { //Change to your endpoint url, obtained from the Vespa Cloud Console. //Use global endpoint if you have global routing with multiple Vespa regions const vespaEndpoint = "https://vsearch.samples.aws-us-east-1c.dev.z.vespa-app.cloud"; async function MethodNotAllowed(request) { return new Response(`Method ${request.method} not allowed.`, { status: 405, headers: { Allow: 'GET,POST', } }); } async function NotAcceptable(request) { return new Response(`Path not Acceptable.`, { status: 406, }); } if (request.method !== 'GET' && request.method !== 'POST') { return MethodNotAllowed(request); } let url = new URL(request.url) const { pathname, search } = url; if (!pathname.startsWith("/search/")) { return NotAcceptable(request); } const destinationURL = `${vespaEndpoint}${pathname}${search}`; let new_request = new Request(destinationURL, request); return await env.VESPA_CERT.fetch(new_request) }, }; ``` To deploy the above to the worldwide global edge network of Cloudflare, use: ``` $ npx wrangler publish ``` To start a local instance, use: ``` $ npx wrangler dev ``` Test using `curl`: ``` $ curl --json '{"yql": "select * from sources * where true"}' http://127.0.0.1:8787/search/ ``` After publishing to Cloudflare production: ``` $ curl --json '{"yql": "select * from sources * where true"}' https://your-worker-name.workers.dev/search/ ``` ## Data plane access control permissions Vespa Cloud supports having multiple certificates to separate `read` and `write` access. This way, one can upload the read-only certificate to a Cloudflare worker to limit write access. See [Data plane access control permissions](guide#permissions). 
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Writing and reading from Vespa Cloud Endpoints](#writing-and-reading-from-vespa-cloud-endpoints) - [mTLS Configuration](#mtls-configuration) - [Creating a Cloudflare Worker to interact with mTLS Vespa Cloud endpoints](#creating-a-cloudflare-worker-to-interact-with-mtls-vespa-cloud-endpoints) - [Worker example](#worker-example) - [Data plane access control permissions](#data-plane-access-control-permissions) --- # Source: https://docs.vespa.ai/en/reference/api/cluster-v2.html.md # /cluster/v2 API reference The cluster controller has a /cluster/v2 API for viewing and modifying a content cluster state. To find the URL to access this API, identify the [cluster controller services](../../content/content-nodes.html#cluster-controller) in the application. Only the master cluster controller will be able to respond. The master cluster controller is the cluster controller alive that has the lowest index. Thus, one will typically use cluster controller 0, but if contacting it fails, try number 1 and so on. Using [vespa-model-inspect](/en/reference/operations/self-managed/tools.html#vespa-model-inspect): ``` $ vespa-model-inspect service -u container-clustercontroller container-clustercontroller @ hostname.domain.com : admin admin/cluster-controllers/0 http://hostname.domain.com:19050/ (STATE EXTERNAL QUERY HTTP) http://hostname.domain.com:19117/ (EXTERNAL HTTP) tcp/hostname.domain.com:19118 (MESSAGING RPC) tcp/hostname.domain.com:19119 (ADMIN RPC) ``` In this example, there is only one clustercontroller, and the State Rest API is available on the port marked STATE and HTTP, 19050 in this example. This information can also be retrieved through the model config in the config server. Find examples of API usage in [content nodes](../../content/content-nodes.html#cluster-v2-API-examples). ## HTTP requests | HTTP request | cluster/v2 operation | Description | | --- | --- | --- | | GET | List cluster and nodes. Get cluster, node or disk states. | | | List content clusters | ``` /cluster/v2/ ``` | | | Get cluster state and list service types within cluster | ``` /cluster/v2/ ``` | | | List nodes per service type for cluster | ``` /cluster/v2// ``` | | | Get node state | ``` /cluster/v2/// ``` | | PUT | Set node state | | | Set node user state | ``` /cluster/v2/// ``` | ## Node state Content and distributor nodes have state: | State | Description | | --- | --- | | `Up` | The node is up and available to keep buckets and serve requests. | | `Down` | The node is not available, and can not be used. | | `Stopping` | This node is stopping and is expected to be down soon. This state is typically only exposed to the cluster controller to tell why the node stopped. The cluster controller will expose the node as down or in maintenance mode for the rest of the cluster. This state is thus not seen by the distribution algorithm. | | `Maintenance` | This node is temporarily unavailable. The node is available for bucket placement, so redundancy is lower. Using this mode, new replicas of the documents stored on this node will not be created, allowing the node to be down with less of a performance impact on the rest of the cluster. This mode is typically used to mask a down state during controlled node restarts, or by an administrator that need to do some short maintenance work, like upgrading software or restart the node. | | `Retired` | A retired node is available and serves requests. This state is used to remove nodes while keeping redundancy. 
Buckets are moved to other nodes (with low priority), until empty. Special considerations apply when using [grouped distribution](../../content/elasticity.html#grouped-distribution) as buckets are not necessarily removed. | Distributor nodes start / transfer buckets quickly and are hence not in `maintenance` or `retired`. Refer to [examples](../../content/content-nodes.html#cluster-v2-API-examples) of manipulating states. ## Types | Type | Spec | Description | | --- | --- | --- | | cluster | _\_ | The name given to a content cluster in a Vespa application. | | description | _.\*_ | Description can contain anything that is valid JSON. However, as the information is presented in various interfaces, some which may present reasons for all the states in a cluster or similar, keeping it short and to the point makes it easier to fit the information neatly into a table and get a better cluster overview. | | group-spec | _\_(\._\_)\* | The hierarchical group assignment of a given content node. This is a dot separated list of identifiers given in the application services.xml configuration. | | node | [0-9]+ | The index or distribution key identifying a given node within the context of a content cluster and a service type. | | service-type | (distributor|storage) | The type of the service to look at state for, within the context of a given content cluster. | | state-disk | (up|down) | One of the valid disk states. | | state-unit | [up](#up) | [stopping](#stopping) | [down](#down) | The cluster controller fetches states from all nodes, called _unit states_. States reported from the nodes are either `up` or `stopping`. If the node can not be reached, a `down` state is assumed. This means, the cluster controller detects failed nodes. The subsequent _generated states_ will have nodes in `down`, and the [ideal state algorithm](../../content/idealstate.html) will redistribute [buckets](../../content/buckets.html) of documents. | | state-user | [up](#up) | [down](#down) | [maintenance](#maintenance) | [retired](#retired) | Use tools for [user state management](/en/operations/self-managed/admin-procedures.html#cluster-state). - Retire a node from a cluster - use `retired` to move buckets to other nodes - Short-lived maintenance work - use `maintenance` to avoid merging buckets to other nodes - Fail a bad node. The cluster controller or an operator can set a node `down` | | state-generated | [up](#up) | [down](#down) | [maintenance](#maintenance) | [retired](#retired) | The cluster controller generates the cluster state from the `unit` and `user` states, over time. The generated state is called the _cluster state_. | ## Request parameters | Parameter | Type | Description | | --- | --- | --- | | recursive | number | Number of levels, or `true` for all levels. Examples: - Use `recursive=1` for a node request to also see all data - use `recursive=2` to see all the node data within each service type In recursive mode, you will see the same output as found in the spec below. However, where there is a `{ "link" : "" }` element, this element will be replaced by the content of that request, given a recursive value of one less than the request above. | ## HTTP status codes Non-exhaustive list of status codes: | Code | Description | | --- | --- | | 200 | OK. | | 303 | Cluster controller not master - master known. This error means communicating with the wrong cluster controller. This returns a standard HTTP redirect, so the HTTP client can automatically redo the request on the correct cluster controller. 
As the cluster controller available with the lowest index will be the master, the cluster controllers are normally queried in index order. Hence, it is unlikely to ever get this error, but rather fail to connect to the cluster controller if it is not the current master. ``` HTTP/1.1 303 See Other Location: http://\/\Content-Type: application/json { "message" : "Cluster controllerindexnot master. Use master at indexindex. } ``` | | 503 | Cluster controller not master - unknown or no master. This error is used if the cluster controller asked is not master, and it doesn't know who the master is. This can happen, e.g. in a network split, where cluster controller 0 no longer can reach cluster controller 1 and 2, in which case cluster controller 0 knows it is not master, as it can't see the majority, and cluster controller 1 and 2 will vote 1 to master. ``` HTTP/1.1 503 Service Unavailable Content-Type: application/json { "message" : "No known master cluster controller currently exist." } ``` | ## Response format Responses are in JSON format, with the following fields: | Field | Description | | --- | --- | | message | An error message — included for failed requests. | | ToDo | Add more fields here. | Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [HTTP requests](#http-requests) - [Node state](#node-state) - [Types](#types) - [Request parameters](#request-parameters) - [HTTP status codes](#http-status-codes) - [Response format](#response-format) --- # Source: https://docs.vespa.ai/en/reference/operations/metrics/clustercontroller.html.md # ClusterController Metrics | Name | Unit | Description | | --- | --- | --- | | cluster-controller.down.count | node | Number of content nodes down | | cluster-controller.initializing.count | node | Number of content nodes initializing | | cluster-controller.maintenance.count | node | Number of content nodes in maintenance | | cluster-controller.retired.count | node | Number of content nodes that are retired | | cluster-controller.stopping.count | node | Number of content nodes currently stopping | | cluster-controller.up.count | node | Number of content nodes up | | cluster-controller.cluster-state-change.count | node | Number of nodes changing state | | cluster-controller.nodes-not-converged | node | Number of nodes not converging to the latest cluster state version | | cluster-controller.stored-document-count | document | Total number of unique documents stored in the cluster | | cluster-controller.stored-document-bytes | byte | Combined byte size of all unique documents stored in the cluster (not including replication) | | cluster-controller.cluster-buckets-out-of-sync-ratio | fraction | Ratio of buckets in the cluster currently in need of syncing | | cluster-controller.busy-tick-time-ms | millisecond | Time busy | | cluster-controller.idle-tick-time-ms | millisecond | Time idle | | cluster-controller.work-ms | millisecond | Time used for actual work | | cluster-controller.is-master | binary | 1 if this cluster controller is currently the master, or 0 if not | | cluster-controller.remote-task-queue.size | operation | Number of remote tasks queued | | cluster-controller.node-event.count | operation | Number of node events | | cluster-controller.resource\_usage.nodes\_above\_limit | node | The number of content nodes above resource limit, blocking feed | | cluster-controller.resource\_usage.max\_memory\_utilization | fraction | Current memory utilisation, for content node with the highest value | | 
cluster-controller.resource\_usage.max\_disk\_utilization | fraction | Current disk space utilisation, for content node with the highest value | | cluster-controller.resource\_usage.memory\_limit | fraction | Memory space limit as a fraction of available memory | | cluster-controller.resource\_usage.disk\_limit | fraction | Disk space limit as a fraction of available disk space | | reindexing.progress | fraction | Re-indexing progress | Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/applications/components.html.md # Source: https://docs.vespa.ai/en/reference/applications/components.html.md # Component reference A component is any Java class whose lifetime is controlled by the container, see the [Developer Guide](../../applications/developer-guide.html) for an introduction. Components are specified and configured in services.xml and can have other components, and config (represented by generated "Config" classes) [injected](../../applications/dependency-injection.html) at construction time, and in turn be injected into other components. Whenever a component or a resource your component depends on is changed by a redeployment, your component is reconstructed. Once all changed components are reconstructed, new requests are atomically switched to use the new set and the old ones are destructed. If you have multiple constructors in your component, annotate the one to use for injection by `@com.yahoo.component.annotation.Inject`. Identifiable components must implement `com.yahoo.component.Component`, and components that need to destruct resources at removal must subclass `com.yahoo.component.AbstractComponent` and implement `deconstruct()`. See the [example](../../operations/metrics.html#example-qa) for common questions about component uniqueness / lifetime. ## Component Types Vespa defined various component types (superclasses) for common tasks: | Component type | Description | | --- | --- | | Request handler | [Request handlers](../../applications/request-handlers.html) allow applications to implement arbitrary HTTP APIs. A request handler accepts a request and returns a response. Custom request handlers are subclasses of [ThreadedHttpRequestHandler](https://javadoc.io/doc/com.yahoo.vespa/container-disc/latest/com/yahoo/container/jdisc/ThreadedHttpRequestHandler.html). | | Processor | The [processing framework](../../applications/processing.html) can be used to create general composable synchronous request-response systems. Searchers and search chains are an instantiation (through subclasses) of this general framework for a specific domain. Processors are invoked synchronously and the response is a tree of arbitrary data elements. Custom output formats can be defined by adding [renderers](#renderers). | | Renderer | Renderers convert a Response (or query Result) into a serialized form sent over the network. Renderers are subclasses of [com.yahoo.processing.rendering.Renderer](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/processing/rendering/Renderer.java). | | Searcher | Searchers processes Queries and their Results. Since they are synchronous, they can issue multiple queries serially or in parallel to e.g. implement federation or decorate queries with information fetched from a content cluster. Searchers are composed into _search chains_ defined in services.xml. A query request selects a particular search chain which implements the logic of that query. [Read more](../../applications/searchers.html). 
| | Document processor | Document processors processes incoming document operations. Similar to Searchers and Processors they can be composed in chains, but document processors are asynchronous. [Read more](../../applications/document-processors.html). | | Binding | A binding matches a request URI to the correct [filter chain](#filter) or [request handler](#request-handlers), and route outgoing requests to the correct [client](#client). For instance, the binding _http://\*/\*_ would match any HTTP request, while _http://\*/processing_ would only match that specific path. If several bindings match, the most specific one is chosen. | Server binding | A server binding is a rule for matching incoming requests to the correct request handler, basically the JDisc building block for implementing RESTful APIs. | | Client binding | A client binding is a pattern which is used to match requests originating inside the container, e.g. when doing federation, to a client provider. That is, it is a rule which determines what code should handle a given outgoing request. | | | Filter | A filter is a lightweight request checker. It may set some specific request property, or it may do security checking and simply block requests missing some mandatory property or header. | | Client | Clients, or client providers, are implementations of clients for different protocols, or special rules for given protocols. When a JDisc application acts as a client, e.g. fetches a web page from another host, it is a client provider that handles the transaction. Bindings are used, as with request handlers and filters, to choose the correct client, matching protocol, server, etc., and then hands off the request to the client provider. There is no problem in using arbitrary other types of clients for external services in processors and request handlers. | ## Component configurations This illustrates a typical component configuration set up by the Vespa container: ![Vespa container component configuration](/assets/img/container-components.svg) The network layer associates a Request with a _response handler_ and routes it to the correct type of [request handler](#request-handlers) (typically based on URI binding patterns). If an application needs lightweight request-response processing using decomposition by a series of chained logical units, the [processing framework](../../applications/processing.html) is the correct family of components to use. The request will be routed from ProcessingHandler through one or more chains of [Processor](#processors) instances. The exact format of the output is customizable using a [Renderer](#renderers). If doing queries, SearchHandler will create a Query object, route that to the pertinent chain of [Searcher](#searchers) instances, and associate the returned Result with the correct [Renderer](#renderers) instance for optional customization of the output format. The DocumentProcessingHandler is usually invoked from messagebus, and used for feeding documents into an index or storage. The incoming data is used to build a Document object, and this is then feed through a chain of [DocumentProcessor](#document-processors) instances. If building an application with custom HTTP APIs, for instance arbitrary REST APIs, the easiest way is building a custom [RequestHandler](#request-handlers). This gets the Request, which is basically a set of key-value pairs, and returns a stream of arbitrary data back to the network. 
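As a rough illustration of that pattern, here is a minimal request handler sketch assuming the `ThreadedHttpRequestHandler` base class mentioned above; the package, class name, and response text are illustrative and not part of any Vespa API:

```java
package com.mydomain.example;

import com.yahoo.container.jdisc.HttpRequest;
import com.yahoo.container.jdisc.HttpResponse;
import com.yahoo.container.jdisc.ThreadedHttpRequestHandler;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executor;

// Minimal sketch: a handler that answers every request bound to it with a fixed text body.
public class EchoHandler extends ThreadedHttpRequestHandler {

    public EchoHandler(Executor executor) {
        super(executor); // the injected Executor is the container's default request threadpool
    }

    @Override
    public HttpResponse handle(HttpRequest request) {
        return new HttpResponse(200) {
            @Override
            public void render(OutputStream outputStream) throws IOException {
                outputStream.write("Hello from EchoHandler\n".getBytes(StandardCharsets.UTF_8));
            }
        };
    }
}
```

Such a handler is then declared as a handler element with a server binding in services.xml, which is what routes matching request URIs to it.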
## Injectable Components These components are available from Vespa for [injection](../../applications/dependency-injection.html) into applications in various contexts: | Component | Description | | --- | --- | | Always available | | --- | | [AthenzIdentityProvider](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/jdisc/athenz/AthenzIdentityProvider.java) | Provides the application's Athenz-identity and gives access to identity/role certificate and tokens. | | [BertBaseEmbedder](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/java/ai/vespa/embedding/BertBaseEmbedder.java) | A BERT-Base compatible embedder, see [BertBase embedder](../../rag/embedding.html#bert-embedder). | | [ConfigInstance](https://github.com/vespa-engine/vespa/blob/master/config-lib/src/main/java/com/yahoo/config/ConfigInstance.java) | Configuration is injected into components as `ConfigInstance` components - see [configuring components](../../applications/configuring-components.html). | | [Executor](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html) | Default threadpool for processing requests in threaded request handler | | [Linguistics](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/Linguistics.java) | Inject a Linguistics component like [SimpleLinguistics](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/simple/SimpleLinguistics.java) or provide a custom implementation - see [linguistics](../../linguistics/linguistics.html). | | [Metric](https://github.com/vespa-engine/vespa/blob/master/jdisc_core/src/main/java/com/yahoo/jdisc/Metric.java) | Jdisc core interface for metrics. Required by all subclasses of ThreadedRequestHandler. | | [MetricReceiver](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/metrics/simple/MetricReceiver.java) | Use to emit metrics from a component. Find an example in the [metrics](../../operations/metrics.html#metrics-from-custom-components) guide. | | [ModelsEvaluator](https://github.com/vespa-engine/vespa/blob/master/model-evaluation/src/main/java/ai/vespa/models/evaluation/ModelsEvaluator.java) | Evaluates machine-learned models added to Vespa applications and available as config form. | | [SentencePieceEmbedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/sentencepiece/SentencePieceEmbedder.java) | A native Java implementation of SentencePiece, see [SentencePiece embedder](../rag/embedding.html#sentencepiece-embedder). | | [VespaCurator](https://github.com/vespa-engine/vespa/blob/master/zkfacade/src/main/java/com/yahoo/vespa/curator/api/VespaCurator.java) | A client for ZooKeeper. For use in container clusters that have ZooKeeper enabled. See [using ZooKeeper](../../applications/using-zookeeper). | | [VipStatus](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/handler/VipStatus.java) | Use this to gain control over the service status (up/down) to be emitted from this container. | | [WordPieceEmbedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/wordpiece/WordPieceEmbedder.java) | An implementation of the WordPiece embedder, usually used with BERT models. Refer to [WordPiece embedder](../rag/embedding.html#wordpiece-embedder). 
| | [SystemInfo](https://github.com/vespa-engine/vespa/blob/master/hosted-zone-api/src/main/java/ai/vespa/cloud/SystemInfo.java) | Vespa Cloud: Provides information about the environment the component is running in. [Read more](/en/applications/components.html#the-systeminfo-injectable-component). | | Available in containers having `search` | | --- | | [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java) | To use the [Document API](../../writing/document-api-guide.html). | | [ExecutionFactory](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/searchchain/ExecutionFactory.java) | To execute new queries from code. [Read more](../../applications/web-services.html#queries). | | [Map\](https://github.com/vespa-engine/vespa/blob/master/model-evaluation/src/main/java/ai/vespa/models/evaluation/Model.java) | Use to inject a set of Models, see [Stateless Model Evaluation](../../ranking/stateless-model-evaluation.html). | | Available in containers having `document-api` or `document-processing` | | --- | | [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java) | To use the [Document API](../../writing/document-api-guide.html). | ## Component Versioning Components as well as many other artifacts in the container can be versioned. This document explains the format and semantics of these versions and how they are referred. ### Format Versions are on the form: ``` version ::= major ["." minor [ "." micro [ "." qualifier]]] ``` Where `major`, `minor`, and `micro` are integers and `qualifier` is any string. A version is appended to an id separated by a colon. In cases where a file is created for each component version, the colon is replaced by a dash in the file name. ### Ordering Versions are ordered first by major, then minor, then micro and then by doing a lexical ordering on the qualifier. This means that `a:1 < a:1.0 < a:1.0.0 < a:1.1 < a:1.1.0 < a:2` ### Referencing a versioned Component Whenever component is referenced by id (in code or configuration), a fully or partially specified version may be included in the reference by using the form `id:versionSpecification`. Such references are resolved using the following rules: - An id without any version specification resolves to the highest version not having a qualifier. - A partially or full version specification resolves to the highest version not having a qualifier which matches the specification. - Versions with qualifiers are matched only by exact match. Example: Given a component with id `a` having these versions: `[1.1, 1.2, 1.2, 1.3.test, 2.0]` - The reference `a` will resolve to `a:2.0` - The reference `a:1` will resolve to `a:1.2` - The only way to resolve to the "test" qualified version is by using the exact reference `a:1.3.test` - These references will not resolve: `a:1.3`, `a:3`, `1.2.3` ### Merging specifications for chained Components In some cases, there is a need for merging multiple references into one. An example is inheritance of chains of version references, where multiple inherited chains may reference the same component. Two version references are said to be _compatible_ if one is a prefix of the other. In this case the most specific version is used. If they are not compatible they are _conflicting_. 
Example: ``` bundle="the name in in your pom.xml" bundle="the name in in your pom.xml" bundle="the name in in your pom.xml" ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Component Types](#component-types) - [Component configurations](#component-configurations) - [Injectable Components](#injectable-components) - [Component Versioning](#component-versioning) - [Format](#format) - [Ordering](#ordering) - [Referencing a versioned Component](#referencing-a-versioned-component) - [Merging specifications for chained Components](#merging-specifications-for-chained-components) --- # Source: https://docs.vespa.ai/en/schemas/concrete-documents.html.md # Concrete documents In [document processing](../applications/document-processors.html),`setFieldValue()` and `getFieldValue()`is used to access fields in a `Document`. The data for each of the fields in the document instance is wrapped in field values. If the documents use structs, they are handled the same way. Example: ``` book.setFieldValue("title", new StringFieldValue("Moby Dick")); ``` Alternatively, use code generation to get a _concrete document type_, a `Document` subclass that represents the exact document type (defined for example in the file `book.sd`). To generate, include it in the build, plugins section in _pom.xml_: ``` com.yahoo.vespa vespa-documentgen-plugin 8.634.24 \etc/schemas\ document-gen document-gen ``` `schemasDirectory` contains the[schemas](../reference/schemas/schemas.html). Generated classes will be in _target/generated-sources_. The document type `book` will be represented as the Java class `Book`, and it will have native methods for data access, so the code example above becomes: ``` book.setTitle("Moby Dick"); ``` | Configuration | Description | | --- | --- | | Java package | Specify the Java package of the generated types by using the following configuration: ``` com.yahoo.mypackage ``` | | User provided annotation types | To provide the Java implementation of a given annotation type, yielding _behaviour of annotations_ (implementing additional interfaces may be one scenario): ``` etc/schemas NodeImpl com.yahoo.vespa.document.NodeImpl DocumentImpl com.yahoo.vespa.document.DocumentImpl ``` Here, the plugin will not generate a type for `NodeImpl` and `DocumentImpl`, but the `ConcreteDocumentFactory` will support them, so that code depending on this will work. | | Abstract annotation types | Make a generated annotation type abstract: ``` myabstractannotationtype ``` | ## Inheritance If input document types use single inheritance, the generated Java types will inherit accordingly. However, if a document type inherits from more than one type (example: `document myDoc inherits base1, base2`), the Java type for `myDoc` will just inherit from `Document`, since Java has single inheritance. Refer to [schema inheritance](inheritance-in-schemas.html) for examples. ## Feeding Concrete types are often used in a docproc, used for feeding data into stateful clusters. To make Vespa use the correct type during feeding and serialization, include in `` in [services.xml](../reference/applications/services/services.html ): ``` in your pom.xml"class="com.yahoo.mypackage.Book"/> ``` Vespa will make the type `Book` and all other concrete document, annotation and struct types from the bundle available to the docproc(s) in the container. The specified bundle must be the `Bundle-SymbolicName`. It will also use the given Java type when feeding through a docproc chain. 
If the class is not in the specified bundle, the container will emit an error message about not being able to load`ConcreteDocumentFactory` as a component, and not start. There is no need to `Export-Package` the concrete document types from the bundle, a `package-info.java` is generated that does that. ## Factory and copy constructor Along with the actual types, the Maven plugin will also generate a class `ConcreteDocumentFactory`, which holds information about the actual concrete types present. It can be used to initialize an object given the document type: ``` Book b = (Book) ConcreteDocumentFactory.getDocument("book", new DocumentId("id:book:book::0")); ``` This can be done for example during deserialization, when a document is created. The concrete types also have copy constructors that can take a generic`Document` object of the same type. The contents will be deep-copied: ``` Document bookGeneric; // … Book book = new Book(bookGeneric, bookGeneric.getId()); ``` All the accessor and mutator methods on `Document` will work as expected on concrete types. Note that `getFieldValue()` will _generate_ an ad-hoc `FieldValue` _every time_, since concrete types don't use them to store data.`setFieldValue()` will pack the data into the native Java field of the type. ## Document processing In a document processor, cast the incoming document base into the concrete document type before accessing it. Example: ``` public class ConcreteDocDocProc extends DocumentProcessor { public Progress process(Processing processing) { DocumentPut put = (DocumentPut) processing.getDocumentOperations().get(0); Book b = (Book) (put.getDocument()); b.setTitle("The Title"); return Progress.DONE; } } ``` Concrete document types are not supported for document updates or removes. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/reference/applications/config-files.html.md # Custom Configuration File Reference This is the reference for config file definitions. It is useful for developing applications that has[configurable components](../../applications/configuring-components.html)for the [Vespa Container](../../applications/containers.html), where configuration for individual components may be provided by defining[``](#generic-configuration-in-services-xml)elements within the component's scope in services.xml. ## Config definition files Config definition files are part of the source code of your application and have a _.def_ suffix. Each file defines and documents the content and semantics of one configuration type. Vespa's builtin _.def_ files are found in`$VESPA_HOME/share/vespa/configdefinitions/`. ### Package Package is a mandatory statement that is used to define the package for the java class generated to represent the file. For [container component](../../applications/components.html) developers, it is recommended to use a separate package for each bundle that needs to export config classes, to avoid conflicts between bundles that contain configurable components. Package must be the first non-comment line, and can only contain lower-case characters and dots: ``` package=com.mydomain.mypackage ``` ### Parameter names Config definition files contain lines on the form: ``` parameterName type [default=value] [range=[min,max]] ``` camelCase in parameter names is recommended for readability. 
### Parameter types Supported types for variables in the _.def_ file: | int | 32 bit signed integer value | | long | 64 bit signed integer value | | double | 64 bit IEEE float value | | enum | Enumerated types. A set of strings representing the valid values for the parameter, e.g: ``` foo enum {BAR, BAZ, QUUX} default=BAR ``` | | bool | A boolean (true/false) value | | string | A String value. Default values must be enclosed in quotation marks (" "), and any internal quotation marks must be escaped by backslash. Likewise, newlines must be escaped to `\n` | | path | A path to a physical file or directory in the application package. This makes it possible to access files from the application package in container components. The path is relative to the root of the [application package](../../basics/applications.html). A path parameter cannot have a default value, but may be optional (using the _optional_ keyword after the type). An optional path does not have to be set, in which case it will be an empty value. The content will be available as a `java.nio.file.Path` instance when the component accessing this config is constructed, or an `Optional` if the _optional_ keyword is used. | | url | Similar to `path`, an arbitrary URL of a file that should be downloaded and made available to container components. The file content will be available as a java.io.File instance when the component accessing this config is constructed. Note that if the file takes a long time to download, it will also take a long time for the container to come up with the configuration referencing it. See also the [note about changing contents for such a url](../../applications/configuring-components.html#adding-files-to-the-component-configuration). | | model | A pointer to a machine-learned model. This can be a model-id, url or path, and multiple of these can be specified as a single config value, where one is used depending on the deployment environment: - If a model-id is specified and the application is deployed on Vespa Cloud, the model-id is used. - Otherwise, if a URL is specified, it is used. - Otherwise, path is used. You may also use remote URLs protected by bearer-token authentication by supplying the optional `secret-ref` attribute. See [using private Huggingface models](../rag/embedding.html#private-model-hub). On the receiving side, this config value is simply represented as a file path regardless of how it is resolved. This makes it easy to refer to models in multiple ways such that the appropriate one is used depending on the context. The special syntax for setting these config values is documented in [adding files to the configuration](../../applications/configuring-components.html#adding-files-to-the-component-configuration). | | reference | A config id to another configuration (only for internal vespa usage) | ### Structs Structs are used to group a number of parameters that naturally belong together. A struct is declared by adding a '.' between the struct name and each member's name: ``` basicStruct.foo string basicStruct.bar int ``` ### Arrays Arrays are declared by appending square brackets to the parameter name. Arrays can either contain simple values, or have children. Children can be simple parameters and/or structs and/or other arrays. Arbitrarily complex structures can be built to any depth. 
Examples: ``` intArr[] int # Integer value array row[].column[] int # Array of integer value arrays complexArr[].foo string # Complex array that contains complexArr[].bar double # … two simple parameters complexArr[].coord.x int # … and a struct called 'coord' complexArr[].coord.y int complexArr[].coord.depths[] double # … that contains a double array ``` Note that arrays cannot have default values, even for simple value arrays. An array that has children cannot contain simple values, and vice versa. In the example above, `intArr` and `row.column` could not have children, while `row` and `complexArr` are not allowed to contain values. ### Maps Maps are declared by appending curly brackets to the parameter name. Arbitrarily complex structures are supported also here. Examples: ``` myMap{} int complexMap{}.nestedMap{}.id int complexMap{}.nestedMap{}.name string ``` ## Generic configuration in services.xml `services.xml`has four types of elements: | individual service elements | (e.g. _searcher_, _handler_, _searchnode_) - creates a service, but has no child elements that create services | | service group elements | (e.g. _content_, _container_, _document-processing_ - creates a group of services and can have all types of child elements | | dedicated config elements | (e.g. _accesslog_) - configures a service or a group of services and can only have other dedicated config elements as children | | generic config elements | always named _config_ | Generic config elements can be added to most elements that lead to one or more services being created - i.e. service group elements and individual service elements. The config is then applied to all services created by that element and all descendant elements. For example, by adding _config_ for _container_, the config will be applied to all container components in that cluster. Config at a deeper level has priority, so this config can be overridden for individual components by setting the same config values in e.g. _handler_ or _server_ elements. Given the following config definition, let's say its name is `type-examples.def`: ``` package=com.mydomain stringVal string myArray[].name string myArray[].type enum {T1, T2, T3} default=T1 myArray[].intArr[] int myMap{} string basicStruct.foo string basicStruct.bar int default=0 range=[-100,100] boolVal bool myFile path myUrl url myOptionalPath path optional ``` To set all the values for this config in `services.xml`, add the following xml at the desired element (the name should be _\.\_): ``` val elem_0 T2 0 1 elem_1 T3 0 1 val1 val2 str 3 true components/file1.txt https://docs.vespa.ai/en/reference/query-api-reference.html ``` Note that each '.' in the parameter's definition corresponds to a child element in the xml. It is not necessary to set values that already have a default in the _.def_ file, if you want to keep the default value. Hence, in the example above, `basicStruct.bar` and `myArray[].type`could have been omitted in the xml without generating any errors when deploying the application. ### Configuring arrays Assigning values to _arrays_ is done by using the `` element. This ensures that the given config values do not overwrite any existing array elements from higher-level xml elements in services, or from Vespa itself. 
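To show how such a definition surfaces in component code, here is a sketch of a component consuming the `type-examples` config above through constructor injection. It assumes the generated class is named `TypeExamplesConfig` in the `com.mydomain` package declared in the definition file; accessor names follow the camelCase parameter names, but treat the details here as illustrative rather than actual generated output:

```java
package com.mydomain.example;

import com.mydomain.TypeExamplesConfig; // class assumed generated from type-examples.def
import com.yahoo.component.AbstractComponent;

import java.nio.file.Path;

// Sketch of a container component receiving the generated config class at construction time.
public class TypeExamplesUser extends AbstractComponent {

    private final String stringVal;
    private final Path myFile;

    public TypeExamplesUser(TypeExamplesConfig config) {
        this.stringVal = config.stringVal();          // simple string parameter
        this.myFile = config.myFile();                // path parameter, resolved to a local file
        boolean flag = config.boolVal();              // bool parameter
        int bar = config.basicStruct().bar();         // struct member, default 0 unless overridden
        String firstName = config.myArray(0).name();  // first element of the array
        String mapped = config.myMap("key1");         // map lookup by an illustrative key
    }
}
```

Whenever these config values change on redeployment, the container reconstructs the component with the updated config, as described in the component reference.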
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Config definition files](#config-definition-files) - [Package](#package) - [Parameter names](#parameter-names) - [Parameter types](#parameter-types) - [Structs](#structs) - [Arrays](#arrays) - [Maps](#maps) - [Generic configuration in services.xml](#generic-configuration-in-services-xml) - [Configuring arrays](#configuring-arrays) --- # Source: https://docs.vespa.ai/en/operations/self-managed/config-proxy.html.md # Configuration proxy Read [application packages](../../basics/applications.html) for an overview of the cloud config system. The _config proxy_ runs on every Vespa node. It has a set of config sources, defined in [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables). The config proxy will act as a proxy for config clients on the same machine, so that all clients can ask for config on _localhost:19090_. The _config source_ that the config proxy uses is set in [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) and consists of one or more config sources (the addresses of [config servers](configuration-server.html)). The proxy has a memory cache that is used to serve configs if it is possible. In default mode, the proxy will have an outstanding request to the config server that will return when the config has changed (a new generation of config). This means that every time config changes on the config server, the proxy will get a response, update its cache and respond to all its clients with the changed config. The config proxy has two modes: | Mode | Description | | --- | --- | | default | Gets config from server and stores in memory cache. The config proxy will always be started in _default_ mode. Serves from cache if possible. Always uses a config source. If restarted, it will lose all configs that were cached in memory. | | memorycache | Serves config from memory cache only. Never uses a config source. A restart will lose all cached configs. Setting the mode to _memorycache_ will make all applications on the node work as before (given that they have previously been running and requested config), since the config proxy will serve config from cache and work without connection to any config server. Applications on this node will not work if the config proxy stops, is restarted or crashes. | Use [vespa-configproxy-cmd](../../reference/operations/self-managed/tools.html#vespa-configproxy-cmd)to inspect cached configs, mode, config sources etc., there are also some commands to change some of the settings. Run the command as: ``` $ vespa-configproxy-cmd -m ``` to see all possible commands. ## Detaching from config servers ``` $ vespa-configproxy-cmd -m setmode memorycache ``` ## Inspecting config To inspect the configuration for a service, in this example a searchnode (proton) instance, do: 1. Find the active config generation used by the service, using [/state/v1/config](../../reference/api/state-v1.html#state-v1-config) - example for _http://localhost:19110/state/v1/config_, here the generation is 2: ``` ``` { "config": { "generation": 2, "proton": { "generation": 2 }, "proton.documentdb.music": { "generation": 2 } } } ``` ``` 2. 
Find the relevant _config definition name_, _config id_ and _config generation_ using [vespa-configproxy-cmd](../../reference/operations/self-managed/tools.html#vespa-configproxy-cmd) - e.g.: ``` $ vespa-configproxy-cmd | grep protonvespa.config.search.core.proton,music/search/cluster.music/0,2,MD5:40087d6195cedb1840721b55eb333735,XXHASH64:43829e79cea8e714 ``` `vespa.config.search.core.proton` is the _config definition name_ for this particular config, `music/search/cluster.music/0` is the _config id_ used by the proton service instance on this node and `2` is the active config generation. This means, the service is using the correct config generation as it is matching the /state/v1/config response (a restart can be required for some config changes). 3. Get the generated config using [vespa-get-config](../../reference/operations/self-managed/tools.html#vespa-get-config) - e.g.: ``` $ vespa-get-config -n vespa.config.search.core.proton -i music/search/cluster.music/0 basedir "/opt/vespa/var/db/vespa/search/cluster.music/n0" rpcport 19106 httpport 19110 ... ``` **Important:** Omitting `-i` will return the default configuration, meaning not generated for the active service instance. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/self-managed/config-sentinel.html.md # Config sentinel The config sentinel starts and stops services - and restart failed services unless they are manually stopped. All nodes in a Vespa system have at least these running processes: | Process | Description | | --- | --- | | [config-proxy](config-proxy.html) | Proxies config requests between Vespa applications and the configserver node. All configuration is cached locally so that this node can maintain its current configuration, even if the configserver shuts down. | | config-sentinel | Registers itself with the _config-proxy_ and subscribes to and enforces node configuration, meaning the configuration of what services should be run locally, and with what parameters. | | [vespa-logd](../../reference/operations/log-files.html#logd) | Monitors _$VESPA\_HOME/logs/vespa/vespa.log_, which is used by all other services, and relays everything to the [log-server](../../reference/operations/log-files.html#log-server). | | [metrics-proxy](monitoring.html#metrics-proxy) | Provides APIs for metrics access to all nodes and services. | ![Vespa node configuration, startup and logs](/assets/img/config-sentinel.svg) Start sequence: 1. _config server(s)_ are started and application config is deployed to them - see [config server operations](configuration-server.html). 2. _config-proxy_ is started. The environment variables [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) and [VESPA\_CONFIGSERVER\_RPC\_PORT](files-processes-and-ports.html#environment-variables) are used to connect to the [config-server(s)](configuration-server.html). It will retry all config servers in case some are down. 3. _config-sentinel_ is started, and subscribes to node configuration (i.e. a service list) from _config-proxy_ using its hostname as the [config id](../../applications/configapi-dev.html#config-id). See [Node and network setup](node-setup.html) for details about how the hostname is detected and how to override it. The config for the config-sentinel (the service list) lists the processes to be started, along with the _config id_ to assign to each, typically the logical name of that service instance. 4. 
_config-proxy_ subscribes to node configuration from _config-server_, caches it, and returns the result to _config-sentinel_ 5. _config-sentinel_ starts the services given in the node configuration, with the config id as argument. See example output below, like _id="search/qrservers/qrserver.0"_. _logd_ and _metrics-proxy_ are always started, regardless of configuration. Each service: 1. Subscribes to configuration from _config-proxy_. 2. _config-proxy_ subscribes to configuration from _config-server_, caches it and returns result to the service. 3. The service runs according to its configuration, logging to _$VESPA\_HOME/logs/vespa/vespa.log_. The processes instantiate internal components, each assigned the same or another config id, and instantiating further components. Also see [cluster startup](#cluster-startup) for a minimum nodes-up start setting. When new config is deployed to _config-servers_ they propagate the changed configuration to nodes subscribing to it. In turn, these nodes reconfigure themselves accordingly. ## User interface The config sentinel runs an RPC service which can be used to list, start and stop the services supposed to run on that node. This can be useful for testing and debugging. Use [vespa-sentinel-cmd](../../reference/operations/self-managed/tools.html#vespa-sentinel-cmd) to trigger these actions. Example output from `vespa-sentinel-cmd list`: ``` vespa-sentinel-cmd 'sentinel.ls' OK. container state=RUNNING mode=AUTO pid=27993 exitstatus=0 id="default/container.0" container-clustercontroller state=RUNNING mode=AUTO pid=27997 exitstatus=0 id="admin/cluster-controllers/0" distributor state=RUNNING mode=AUTO pid=27996 exitstatus=0 id="search/distributor/0" logd state=RUNNING mode=AUTO pid=5751 exitstatus=0 id="hosts/r6-3/logd" logserver state=RUNNING mode=AUTO pid=27994 exitstatus=0 id="admin/logserver" searchnode state=RUNNING mode=AUTO pid=27995 exitstatus=0 id="search/search/cluster.search/0" slobrok state=RUNNING mode=AUTO pid=28000 exitstatus=0 id="admin/slobrok.0" ``` To learn more about the processes and services, see [files and processes](files-processes-and-ports.html). Use [vespa-model-inspect host _hostname_](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to list services running on a node. ## Cluster startup The config sentinel will not start services on a node unless it has connectivity to a minimum of other nodes, default 50%. Find an example of this feature in the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA#start-the-admin-server) example application. Example configuration: ``` ``` 20 1 ``` ``` Example: `minOkPercent 10` means that services will be started only if more than or equal to 10% of nodes are up. If there are 11 nodes in the application, the first node started will not start its services - when the second node is started, services will be started on both. `maxBadCount` is for connectivity checks where the other node is up, but we still do not have proper two-way connectivity. Normally, one-way connectivity means network configuration is broken and needs looking into, so this may be set low (1 or even 0 are the recommended values). If there are some temporary problems (in the example below non-responding DNS which leads to various issues at startup) the config sentinel will loop and retry, so the service startup will just be slightly delayed. 
Example log: ``` [2021-06-15 14:33:25] EVENT : starting/1 name="sbin/vespa-config-sentinel -c hosts/le40808.ostk (pid 867)" [2021-06-15 14:33:25] EVENT : started/1 name="config-sentinel" [2021-06-15 14:33:25] CONFIG : Sentinel got 4 service elements [tenant(footest), application(bartest), instance(default)] for config generation 1001 [2021-06-15 14:33:25] CONFIG : Booting sentinel 'hosts/le40808.ostk' with [stateserver port 19098] and [rpc port 19097] [2021-06-15 14:33:25] CONFIG : listening on port 19097 [2021-06-15 14:33:25] CONFIG : Sentinel got model info [version 7.420.21] for 35 hosts [config generation 1001] [2021-06-15 14:33:25] CONFIG : connectivity.maxBadCount = 3 [2021-06-15 14:33:25] CONFIG : connectivity.minOkPercent = 40 [2021-06-15 14:33:28] INFO : Connectivity check details: 2086533.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le01287.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23256.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23267.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23297.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23312.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le23317.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23319.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le30550.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le30553.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le30556.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le30560.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le30567.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40387.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40389.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40808.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40817.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40833.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40834.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le40841.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le40858.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40860.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le40863.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le40873.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40892.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40900.ostk -> OK: both ways connectivity 
verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40905.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40914.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02318.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02324.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02340.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40672.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40712.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40728.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt41329.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] WARNING : 8 of 35 nodes up but with network connectivity problems (max is 3) [2021-06-15 14:33:28] WARNING : Bad network connectivity (try 1) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le30556.ostk' -> '1234:5678:90:123::abcd' (5.00528 s) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le40834.ostk' -> '1234:5678:90:456::efab' (5.00527 s) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le40860.ostk' -> '1234:5678:90:789::cdef' (5.00459 s) [2021-06-15 14:33:31] INFO : Connectivity check details: le23312.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le23319.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le30553.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le30556.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40834.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40841.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40860.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:31] INFO : Connectivity check details: le40863.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Enough connectivity checks OK, proceeding with service startup [2021-06-15 14:33:31] EVENT : starting/1 name="searchnode" ... ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/applications/config-system.html.md # The Config System The config system in Vespa is responsible for turning the application package into live configuration of all the nodes, processes and components that realizes the running system. Here we deep dive into various aspects of how this works. ## Node configuration The problem of configuring nodes can be divided into three parts, each addressed by different solutions: - **Node system level configuration:** Configure OS level settings such as time zone as well as user privileges on the node. - **Package management**: Ensure that the correct set of software packages is installed on the nodes. This functionality is provided by three tools working together. - **Vespa configuration:** Starts the configured set of processes on each node with their configured startup parameters and provides dynamic configuration to the modules run by these services. 
_Configuration_ here is any data which:

- cannot be fixed at compile time
- is static most of the time

Note that by these definitions, all the nodes can have the same software packages (disregarding version differences, discussed later), as variations in which services are run on each node, and in their behavior, are achieved entirely by using Vespa Configuration. This allows managing the complexity of node variations completely within the configuration system, rather than across multiple systems.

Configuring a system can be divided into:

- **Configuration assembly:** Assembly of a complete set of configurations for delivery from the inputs provided by the parties involved in configuring the system
- **Configuration delivery:** Definition of individual configurations, APIs for requesting and accessing configuration, and the mechanism for delivering configurations from their source to the receiving components

This division allows the problem of reliable configuration delivery in large distributed systems to be addressed in configuration delivery, while the complexities of assembling complete configurations can be treated as a VM-local design problem.

An important feature of Vespa Configuration is the nature of the interface between the delivery and assembly subsystems. The assembly subsystem creates as output a (Java) object model of the distributed system. The delivery subsystem queries this model to obtain concrete configurations of all the components of the system. This allows the assembly subsystem to accept higher-level, and simpler to use, abstractions as input and automatically derive detailed configurations with the correct interdependencies. This division insulates the external interface and the components being configured from changes in each other. In addition, the system model provides the home for logic implementing node/component instance variations of configuration.

## Configuration assembly

Config assembly is the process of turning the configuration input sources into an object model of the desired system, which can respond to queries for configs given a name and config id. Config assembly for Vespa systems can become complex, because it involves merging information owned by multiple parties:

- **Vespa operations** own the nodes and control assignment of nodes to services/applications
- **Vespa service providers** own services which host multiple applications running on Vespa
- **Vespa applications** define the final applications running on nodes and shared services

The current config model assembly procedure uses a single source - the _application package_. The application package is a directory structure containing defined files and subdirectories which together completely define the system - including which nodes belong in the system, which services they should run, and the configuration of these services and their components.

When the application deployer wants to change the application, [vespa prepare](../reference/clients/vespa-cli/vespa_prepare) is issued to a config server, with the application package as argument. At this point the system model is assembled and validated, and any feedback is issued to the deployer. If the deployer decides to make the new configuration active, a [vespa activate](../reference/clients/vespa-cli/vespa_activate) is then issued, causing the config server cluster to switch to the new system model and respond with new configs on any active subscriptions where the new system model caused the config to change.
This ensures that subscribers get new configs in a timely manner when there are changes, and that the changes propagated are the minimal set, such that small changes to an application package cause correspondingly small changes to the system.

![The config server assembles app config](/assets/img/config-assembly.svg)

The config model itself is pluggable, so that service providers may write plugins for assembling a particular service. The plugins are written in Java, and are installed together with the Vespa Configuration. Service plugins define their own syntax for specifying services that may be configured by Vespa applications. This allows the applications to be specified in an abstract manner, decoupled from the configuration that is delivered to the components.

## Configuration delivery

Configuration delivery encompasses the following aspects:

- Definition of configurations
- The component view (API) of configuration
- Configuration delivery mechanism

These aspects work together to realize the following goals:

- Eliminate inconsistency between code and configuration.
- Eliminate inconsistency between the desired configuration and the state on each node.
- Limit temporary inconsistencies after reconfiguration.

The next three subsections discuss the three aspects above, followed by subsections on two special concerns - bootstrapping and system upgrades.

### Configuration definitions

A _configuration_ is a set of simple or array key-values with a name and a type, which can possibly be nested - example:

```
myProperty "myvalue"
myArray[1]
myArray[0].key1 "someValue"
myArray[0].key2 1337
```

The _type definition_ (or class) of a configuration object defines and documents the set of fields a configuration may contain with their types and default values. It has a name as well as a namespace. For example, the above config instance may have this definition:

```
namespace=foo.bar

# Documentation of this key
myProperty string default="foo"

# etc.
myArray[].key1 string
myArray[].key2 int default=0
```

An individual config typically contains a coherent set of settings regarding some topic, such as _logging_ or _indexing_. A complete system consists of many instances of many config types.

### Component view

Individual components of a system consume one or more such configs and use their values to influence their behavior. APIs are needed for _requesting_ configs and for _accessing_ the values of those configs as they are provided. _Access_ to configs happens through a (Java or C++) class generated from the config definition file. This ensures that any inconsistency between the fields declared in a config type and the expectations of the code accessing it is caught at compile time. The config definition is best viewed as another class with an alternative form of source syntax belonging to the components consuming it. A Maven target is provided for generating such classes from config definition types.

Components may use two different methods for _requesting_ configurations: subscription and dependency injection.

**Subscription:** The component sets up a _ConfigSubscriber_, then subscribes to one or more configs. This is the simple approach; there are [other ways of getting configs](configapi-dev.html) too:

```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MyConfig> handle = subscriber.subscribe(MyConfig.class, "myId");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
if (handle.isChanged()) {
    String message = handle.getConfig().myKey();
    // ... consume the rest of this config
}
```
**Dependency injection:** The component declares its config dependencies in the constructor, and subscriptions are set up on its behalf. When changed configs are available, a new instance of the component is created. The advantage of this method is that configs are immutable throughout the lifetime of the component, such that no thread coordination is required. This method is currently only available in Java using the [Container](containers.html).

```
public MyComponent(MyConfig config) {
    String myKey = config.myKey();
    // ... consume the rest of this config
}
```

For unit testing, [configs can be created with Builders](configapi-dev.html#unit-testing) and submitted directly to components.

### Delivery mechanism

The config delivery mechanism is responsible for ensuring that a new config instance is delivered to subscribing components each time there is a change to the system model causing that config instance to change. A config subscription is identified by two parameters: the _config definition name and namespace_ and the [config id](configapi-dev.html#config-id) used to identify the particular component instance making the subscription.

The in-process config library will forward these subscription requests to a node-local [config proxy](../operations/self-managed/config-proxy.html), which provides caching and fan-in from processes to the node. The proxy in turn issues these subscriptions to a node in the configuration server cluster, each of which hosts a copy of the system model and resolves config requests by querying the system model. To provide config server failover, the config subscriptions are implemented as long-timeout gets, which are immediately resent when they time out, but conceptually this is best understood as push subscriptions:

![Nodes get config from a config server cluster](/assets/img/config-delivery.svg)

As configs are not stored as files locally on the nodes, there is no possibility of inconsistencies due to local edits, or of nodes coming out of maintenance with a stale configuration. As configuration changes are pushed as soon as the config server cluster allows, time inconsistencies during reconfigurations are minimized, although not avoided, as there is no global transaction. Application code and config are generally pulled from the config server - it is however possible to use the [url](../reference/applications/config-files.html#url) config type to refer to any resource to download to nodes.

### Bootstrapping

Each Vespa node runs a [config-sentinel](../operations/self-managed/config-sentinel.html) process which starts and maintains the services running on the node.

### System upgrades

The configuration server will up/downgrade between config versions on the fly on minor upgrades, which cause discrepancies between the config definitions requested and those produced by the configuration model. Major upgrades, which involve incompatible changes to the configuration protocol or the system model, require a [procedure](../operations/self-managed/config-proxy.html).

## Notes

Find more information for using the Vespa config API in the [reference doc](configapi-dev.html).

Vespa Configuration makes the following assumptions about the nodes using it:

- All nodes have the software packages needed to run the configuration system and any services which will be configured to run on the node.
This usually means that all nodes have the same software, although this is not a requirement - All nodes have [VESPA\_CONFIGSERVERS](../operations/self-managed/files-processes-and-ports.html#environment-variables) set - All nodes know their fully qualified domain name Reading this document is not necessary in order to use Vespa or to develop Java components for the Vespa container - for this purpose, refer to[Configuring components](configuring-components.html). ## Further reads - [Configuration server operations](../operations/self-managed/configuration-server.html) is a good resource for troubleshooting. - Refer to the [bundle plugin](bundles.html#maven-bundle-plugin) for how to build an application package with Java components. - During development on a local instance it can be handy to just wipe the state completely and start over: 1. [Delete all config server state](../operations/self-managed/configuration-server.html#zookeeper-recovery) on all config servers 2. Run [vespa-remove-index](../reference/operations/self-managed/tools.html#vespa-remove-index) to wipe content nodes Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Node configuration](#node-configuration) - [Configuration assembly](#configuration-assembly) - [Configuration delivery](#configuration-delivery) - [Configuration definitions](#configuration-definitions) - [Component view](#component-view) - [Delivery mechanism](#delivery-mechanism) - [Bootstrapping](#bootstrapping) - [System upgrades](#upgrades) - [Notes](#notes) - [Further reads](#further-reads) --- # Source: https://docs.vespa.ai/en/reference/api/config-v2.html.md # Config API Vespa provides a REST API for listing and retrieving config - alternatives are the [programmatic Java API](../../applications/configapi-dev.html#the-java-config-api). The Config API provides a way to inspect and retrieve all the config that can be generated by the config model for a given [tenant's active application](deploy-v2.html). Some, but not necessarily all, of those configs are used by services by [subscribing](../../applications/configapi-dev.html) to them. The response format is JSON. The current API version is 2. All config servers provide the REST API. The API port is 19071 - use [vespa-model-inspect](../operations/self-managed/tools.html#vespa-model-inspect) service configserver to find config server hosts. Example: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/msbe/application/articlesearch/` The API is available after an application has been [deployed and activated](../../basics/applications.html#deploying-applications). ## The application id The API provides two ways to identify your application, given a tenant: one using only an application name, and one using application name, environment, region and instance. For the former, "short" form, a default environment, region and instance is used. More formally, an _application id_ is a tuple of the form (_application_, _environment_, _region_, _instance_). The system currently provides shorthand to the id (_application_, "default", "default", "default"). Note: Multiple environments, regions and instances are not currently supported for application deployments, _default_ is always used. 
Example URL using only application name: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/media/application/articlesearch/media.config.server-list/clusters/0` | Part | Description | | --- | --- | | media | Tenant | | articlesearch | Application | | media.config | Namespace of the requested config | | server-list | Name of the requested config | | clusters/0 | Config id of the requested config | Example URL using full application id:`http://myconfigserver.mydomain.com:19071/config/v2/tenant/media/application/articlesearch/environment/test/region/us/instance/staging/media.config.server-list/clusters/0` | Part | Description | | --- | --- | | media | Tenant | | articlesearch | Name of the application | | test | Environment | | us | Region | | staging | Instance | | media.config | Namespace of the requested config | | server-list | Name of the requested config | | clusters/0 | Config id of the requested config | In this API specification, the short form of the application id, i.e. only the application name, is used. The tenant `mytenant` and the application name `myapplication` is used throughout in examples. ## GET /config/v2/tenant/mytenant/application/myapplication/ List the configs in the model, as [config id](../../applications/configapi-dev.html#config-id) specific URLs. | Parameters | | Parameter | Default | Description | | --- | --- | --- | | recursive | false | If true, include each config id in the model which produces the config, and list only the links to the config payload. If false, include the first level of the config ids in the listing of new list URLs, as explained above. | | | Request body | None | | Response | A list response includes two arrays: - List-links to descend one level down in the config id hierarchy, named `children`. - [Config payload](#payload) links for the current (top) level, named `configs`. | | Error Response | N/A | Examples: `GET /config/v2/tenant/mytenant/application/myapplication/` ``` ``` { "children": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/myconfigserver.mydomain.com/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/hosts/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.model/admin/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/container.components/search/" ], "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.model", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/container.components" ] ``` ``` `GET /config/v2/tenant/mytenant/application/myapplication/?recursive=true` ``` ``` { "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/myconfigserver.mydomain.com", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/hosts/myconfigserver.mydomain.com" ``` ``` ## GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/ | Parameters | Same as above. | | Request body | None | | Response | List the configs in the model with the given namespace and name. List semantics as above. 
| | Error Response | 404 if the given namespace.name is not known to the config model. | Examples: `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/` ``` ``` { "children": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/", "http://myconfigserver.mydomain.com:19071/config/v1/vespaclient.config.feeder/docproc/" ] "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder", ] } ``` ``` `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/?recursive=true` ``` ``` { "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/default", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/gateways", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/gateways/gateway", ``` ``` ## GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/subid]/ | Parameters | Same as above. | | Request body | None | | Response | List the configs in the model with the given namespace and name, and for which the given config id segment is a prefix. | | Error Response | - 404 if the given namespace.name is not known to the config model. - 404 if the given config id is not in the model. | Examples: `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/` ``` ``` { "children": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/" ] "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search" ] } ``` ``` `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/?recursive=true` ``` ``` { "configs": [ "http://myhost.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/default" ] } ``` ``` ## GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/id] | Parameters | None | | Request body | None | | Response | Returns the config payload of the given `namespace.name/config/id`, formatted as JSON. | | Error Response | Same as above. 
| Example: `GET /config/v2/tenant/mytenant/application/myapplication/container.core.container-http/search/qrsclusters/default/qrserver.0` ``` ``` { "enabled": "true", "requestbuffersize": "65536", "port": { "search": "8080", "host": "" }, "fileserver": { "throughsearch": "true" } } ``` ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [The application id](#application-id) - [GET /config/v2/tenant/mytenant/application/myapplication/](#list-configs) - [GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/](#list-namespace) - [GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/subid]/](#list-prefix) - [GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/id]](#payload) --- # Source: https://docs.vespa.ai/en/applications/configapi-dev.html.md # Cloud Config API This document describes how to use the C++ and Java versions of the Cloud config API (the 'config API'). This API is used internally in Vespa, and reading this document is not necessary in order to use Vespa or to develop Java components for the Vespa container. For this purpose, please refer to[Configuring components](configuring-components.html) instead. Throughout this document, we will use as example an application serving up a configurable message. ## Creating a Config Definition The first thing to do when deciding to use the config API is to define the config you want to use in your application. This is described in the[configuration file reference](../reference/applications/config-files.html). Here we will use the definition `motd.def`from the complete example at the end of the document: ``` namespace=myproject message string default="NO MESSAGE" port int default=1337 ``` ## Generating Source Code and Accessing Config in Code Before you can access config in your program you will need to generate source code for the config definition. Simple steps for how you can generate API code and use the API are provided for[Java](#the-java-config-api). See also [javadoc](https://javadoc.io/doc/com.yahoo.vespa/config-lib)) We also recommend that you read the [general guidelines](#guidelines)for examples of advanced usage and recommendations for how to use the API. ## Config ID The config id specified when requesting config is essentially an identifier of the component requesting config. The config server contains a config object model, which maps a request for a given config name and config id to the correct configproducer instance, which will merge default values from the config definition with config from the object model and config set in`services.xml` to produce the final config instance. The config id is given to a service via the VESPA\_CONFIG\_ID environment variable. The [config sentinel](/en/operations/self-managed/config-sentinel.html)sets the environment variable to the id given by the config model. This id should then be used by the service to subscribe for config. If you are running multiple services, each of them will be assigned a **unique config id** for that service, and a service should not subscribe using any config id other than its own. If you need to get config for a services that is not part of the model (i.e. it is not specified in the services.xml), but that you want to specify values for in services.xml, use the config id `client`. 
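Putting this together: below is a minimal sketch (not from the Vespa sources) of a service that reads its own config id from VESPA\_CONFIG\_ID and subscribes with it. It assumes the generated `MotdConfig` class from the `motd.def` example above, and that `ConfigHandle` lives in the same `com.yahoo.config.subscription` package as `ConfigSubscriber`:

```
import com.yahoo.config.subscription.ConfigHandle;
import com.yahoo.config.subscription.ConfigSubscriber;
import myproject.MotdConfig;

public class MotdService {

    public static void main(String[] args) {
        // The config sentinel sets VESPA_CONFIG_ID for each service it starts
        String configId = System.getenv("VESPA_CONFIG_ID");

        ConfigSubscriber subscriber = new ConfigSubscriber();
        ConfigHandle<MotdConfig> handle = subscriber.subscribe(MotdConfig.class, configId);
        if (!subscriber.nextConfig())
            throw new RuntimeException("Config timed out.");

        // Use the accessors generated from motd.def
        MotdConfig config = handle.getConfig();
        System.out.println("Serving '" + config.message() + "' on port " + config.port());
    }
}
```

Subscribing with the id handed out by the config sentinel, rather than a hard-coded one, is what ties the running process to the component instance defined for it in the config model.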
## Schema Compatibility Rules

A schema incompatibility occurs if the config class (for example `MotdConfig` in the C++ and Java sections above) was built from a different def-file than the one the server is seeing and using to serve config. Some such incompatibilities are automatically handled by the config system; others lead to errors. This is useful to know during development/testing of a config schema.

Let _S_ denote a config definition called _motd_ which the server is using, and _C_ denote a config definition also called _motd_ which the client is using, i.e. the one that created `MotdConfig` used when subscribing. The following is the system's behavior:

| Compatible Changes | These schema mismatches are handled automatically by the configserver: - C is missing a config value that S has: The server will omit that value from the response. - C has an additional config value with a default value: The server will include that value in the response. - C and S both have a config value, but the default values differ: The server will use C's default value. |
| Incompatible Changes | These schema mismatches are not handled by the config server, and will typically lead to errors in the subscription API because of missing values (though in principle some consumers of config may tolerate them): - C has an additional config value without a default value: The server will not include anything for that value. - C has the type of a config value changed, for example from string to int: The server will print an error message, and not include anything for that value. The user must use an entirely new name for the config if such a change must be made. |

As with any data schema, it is wise to be conservative about changing it if the system will have new versions in the future. For a `def` schema, removing a config value constitutes a semantic change that may lead to problems when an older version of some config subscriber asks for config. In large deployments, the risk associated with this increases, because of the higher cost of a full restart of everything. Consequently, one should prefer creating a new config name to removing a config value from a schema.

## Creating a Deployable Application Package

The application package consists of the following files:

```
app/services.xml
app/hosts.xml
```

The services file contains the services that are handled by the config model plugin. The hosts file maps host names to the aliases used in _services.xml_, for example:

```
<?xml version="1.0" encoding="utf-8" ?>
<hosts>
  <host name="myhost.mydomain.com">
    <alias>node0</alias>
  </host>
</hosts>
```

## Setting Up a Running System

To get a running system, first install the cloudconfig package, start the config server, then deploy the application:

Prepare the application:

```
$ vespa prepare /path/to/app/folder
```

Activate the application:

```
$ vespa activate /path/to/app/folder
```

Then, start vespa. This will start the application and pass it its config id via the VESPA\_CONFIG\_ID environment variable.

## Advanced Usage of the Config API

For a simple application, having only one config may suffice. In a typical server application, however, the number of config settings can become large. Therefore, we **encourage** you to split the config settings into multiple logical classes. This section covers how you can use a ConfigSubscriber to subscribe to multiple configs and how you should group configs based on their dependencies. Configs can either be:

- Independent static configs
- Dependent static configs
- Dependent dynamic configs

We will give a few examples of how you can cope with these different scenarios.
The code examples are given in a pseudo format common to C++ and Java, but they should be easy to convert to their language specific equivalents. ### Independent Static Configs Independent configs means that it does not matter if one of them is updated independently of the other. In this case, you might as well use one ConfigSubscriber for each of the configs, but it might become tedious to check all of them. Therefore, the recommended way is to manage all of these configs using one ConfigSubscriber. In this setup, it is also typical to split the subscription phase from the config check/retrieval part. The subscribing part: | C++ | ``` ``` ConfigSubscriber subscriber; ConfigHandle::UP fooHandle = subscriber.subscribe(...); ConfigHandle::UP barHandle = subscriber.subscribe(...); ConfigHandle::UP bazHandle = subscriber.subscribe(...); ``` ``` | | Java | ``` ``` ConfigSubscriber subscriber; ConfigHandle fooHandle = subscriber.subscribe(FooConfig.class, ...); ConfigHandle barHandle = subscriber.subscribe(BarConfig.class, ...); ConfigHandle bazHandle = subscriber.subscribe(BazConfig.class, ...); ``` ``` | And the retrieval part: ``` if (subscriber.nextConfig()) { if (fooHandle->isChanged()) { // Reconfigure foo } if (barHandle->isChanged()) { // Reconfigure bar } if (bazHandle->isChanged()) { // Reconfigure baz } } ``` This allows you to perform the config fetch part either in its own thread or as part of some other event thread in your application. ### Dependent Static Configs Dependent configs means that one of your configs depends on the value in another config. The most common is that you have one config which contains the config id to use when subscribing to the second config. In addition, your system may need that the configs are updated to the same **generation**. **Note:** A generation is a monotonically increasing number which is increased each time an application is deployed with `vespa deploy`. Certain applications may require that all configs are of the same generation to ensure consistency, especially container-like applications. All configs subscribed to by a ConfigSubscriber are guaranteed to be of the same generation. The configs are static in the sense that the config id used does not change. The recommended way to approach this is to use a two phase setup, where you fetch the initial configs in the first phase, and then subscribe to both the initial and derived configs in order to ensure that they are of the same generation. Assume that the InitialConfig config contains two fields named _derived1_ and _derived2_: | C++ | ``` ``` ConfigSubscriber initialSubscriber; ConfigHandle::UP initialHandle = subscriber.subscribe(...); while (!subscriber.nextConfig()); // Ensure that we actually get initial config. std::auto_ptr initialConfig = initialHandle->getConfig(); ConfigSubscriber subscriber; ... = subscriber.subscribe(...); ... = subscriber.subscribe(initialConfig->derived1); ... = subscriber.subscribe(initialConfig->derived1); ``` ``` | | Java | ``` ``` ConfigSubscriber initialSubscriber; ConfigHandle initialHandle = subscriber.subscribe(InitialConfig.class, ...); while (!subscriber.nextConfig()); // Ensure that we actually get initial config. InitialConfig initialConfig = initialHandle.getConfig(); ConfigSubscriber subscriber; ... = subscriber.subscribe(InitialConfig.class, ...); ... = subscriber.subscribe(DerivedConfig.class, initialConfig.derived1); ... 
= subscriber.subscribe(DerivedConfig.class, initialConfig.derived1); ``` ``` | You can then check the configs in the same way as for independent static configs, and be sure that all your configs are of the same generation. The reason why you need to create a new ConfigSubscriber is that **once you have called nextConfig(), you cannot add or remove new subscribers**. ### Dependent Dynamic Configs Dynamic configs mean that the set of configs that you subscribe for may change between each deployment. This is the hardest case to solve, and how hard it is depends on how many levels of configs you have. The most common one is to have a set of bootstrap configs, and another set of configs that may change depending on the bootstrap configs (typically in an application that has plugins). To cover this case, you can use a class named `ConfigRetriever`. Currently, it is **only available in the C++ API**. The ConfigRetriever uses the same mechanisms as the ConfigSubscriber to ensure that you get a consistent set of configs. In addition, two more classes called`ConfigKeySet` and `ConfigSnapshot` are added. The ConfigRetriever takes in a set of configs used to bootstrap the system in its constructor. This set does not change. It then provides one method, `getConfigs(ConfigKeySet)`. The method returns a ConfigSnapshot of the next generation of bootstrap configs or derived configs. To create the ConfigRetriever, you must first populate a set of bootstrap configs: ``` ``` ConfigKeySet bootstrapKeys; bootstrapKeys.add(configId); bootstrapKeys.add(configId); ``` ``` The bootstrap configs are typically configs that will always be needed by your application. Once you have defined your set, you can create the retriever and fetch a ConfigSnapshot of the bootstrap configs: ``` ConfigRetriever retriever(bootstrapKeys); ConfigSnapshot bootstrapConfigs = retriever.getConfigs(); ``` The ConfigSnapshot contains the bootstrap config, and you may use that to fetch the individual configs. You need to provide the config id and the type in order for the snapshot to know which config to look for: ``` ``` if (!bootstrapConfigs.empty()) { std::auto_ptr bootstrapFoo = bootstrapConfigs.getConfig(configId); std::auto_ptr bootstrapBar = bootstrapConfigs.getConfig(configId); ``` ``` The snapshot returned is empty if the retriever was unable to get the configs. In that case, you can try calling the same method again. Once you have the bootstrap configs, you know the config ids for the other components that you should subscribe for, and you can define a new key set. Let's assume that bootstrapFoo contains an array of config ids we should subscribe for. ``` ``` ConfigKeySet pluginKeySet; for (size_t i = 0; i < (*bootstrapFoo).pluginConfigId.size; i++) { pluginKeySet.add((*bootstrapFoo).pluginConfigId[i]); } ``` ``` In this example we know the type of config requested, but this could be done in another way letting the plugin add keys to the set. Now that the derived configs have been added to the pluginKeySet, we can request a snapshot of them: ``` ConfigSnapshot pluginConfigs = retriever.getConfigs(pluginKeySet); if (!pluginConfigs.empty()) { // Configure each plugin with a config picked from the snapshot. } ``` And that's it. When calling the method without any key parameters, the snapshot returned by this method may be empty if **the config could not be fetched within the timeout**, or **the generation of configs has changed**. To check if you should call getBootstrapConfigs() again, you can use the `bootstrapRequired()` method. 
If it returns true, you will have to call getBootstrapConfigs() again, because the plugin configs have been updated, and you need a new bootstrap generation to match them. If it returns false, you may call getConfigs() again to try and get a new generation of plugin configs. We recommend that you use the retriever API if you have a use case like this. The alternative is to create your own mechanism using two ConfigSubscriber classes, but this is **not** recommended.

### Advice on Config Modelling

Regardless of which of these types of configs you have, it is recommended that you always fetch all the configs you need **before** you start configuring your system. This is because the user may deploy multiple different versions of the config, which may cause your components to get conflicting config values. A common pitfall is to treat dependent configs as independent, thereby causing inconsistency in your application when a config update for config A arrives before config B. The ConfigSubscriber was created to minimize the possibility of making this mistake, by ensuring that all of the configs come from the same config reload.

**Tip:** Set up your entire _tree_ of configs in one thread to ensure consistency, and configure your system once all of the configs have arrived. This also maps best to the ConfigSubscriber, since it is not thread safe.

## The Java config API

Assumption: a [def file](configapi-dev.html), which is the schema for one of your configs, is created and put in `src/main/resources/configdefinitions/`. To generate source code for the def-file, invoke the `config-class-plugin` from _pom.xml_, in the `<build>`, `<plugins>` section:

```
<plugin>
  <groupId>com.yahoo.vespa</groupId>
  <artifactId>config-class-plugin</artifactId>
  <version>${vespa.version}</version>
  <executions>
    <execution>
      <id>config-gen</id>
      <goals>
        <goal>config-gen</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

The generated classes will be saved to `target/generated-sources/vespa-configgen-plugin` when the `generate-sources` phase of the build is executed. The def-file [`motd.def`](configapi-dev.html) is used in this tutorial, and a class called `MotdConfig` was generated (in the package `myproject`). It is a subtype of `ConfigInstance`. When using only the config system (and not other parts of Vespa or the JDisc container), pull it in by using this in pom.xml:

```
<dependency>
  <groupId>com.yahoo.vespa</groupId>
  <artifactId>config</artifactId>
  <version>${vespa.version}</version>
  <scope>provided</scope>
</dependency>
```

## Subscribing and getting config

To retrieve the config in the application, create a `ConfigSubscriber`. A `ConfigSubscriber` is capable of subscribing to one or more configs. The example shown here uses simplified error handling:

```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> handle = subscriber.subscribe(MotdConfig.class, "motdserver2/0");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
if (handle.isChanged()) {
    String message = handle.getConfig().message();
    int port = handle.getConfig().port();
}
```

Note that `isChanged()` will always be true after the first call to `nextConfig()`; it is included here to illustrate the API. In many cases one will do this from a thread which loops the `nextConfig()` call, and reconfigures your application if `isChanged()` is true. The second parameter to `subscribe()`, _"motdserver2/0"_, is the [config id](configapi-dev.html#config-id). If one `ConfigSubscriber` subscribes to multiple configs, `nextConfig()` will only return true if the configs are of the same generation, i.e. they are "in sync". See the [com.yahoo.config](https://javadoc.io/doc/com.yahoo.vespa/config-lib) javadoc for details.
Example:

```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> motdHandle = subscriber.subscribe(MotdConfig.class, "motdserver2/0");
ConfigHandle<AnotherConfig> anotherHandle = subscriber.subscribe(AnotherConfig.class, "motdserver2/0");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
// We now have a synchronized new generation for these two configs.
if (motdHandle.isChanged()) {
    String message = motdHandle.getConfig().message();
    int port = motdHandle.getConfig().port();
}
if (anotherHandle.isChanged()) {
    String myfield = anotherHandle.getConfig().getMyField();
}
```

## Simplified subscription

In cases like the first example above, where you only subscribe to one config, you may also subscribe using the `ConfigSubscriber.SingleSubscriber` interface. In this case, you define a `configure()` method from the interface, and call a special `subscribe()`. The method will start a dedicated config fetcher thread for you. The method will throw an exception in the user thread if initial configuration fails, and print a warning in the config thread if it fails afterwards. Example:

```
public class MyConfigSubscriber implements ConfigSubscriber.SingleSubscriber<MotdConfig> {

    public MyConfigSubscriber(String configId) {
        new ConfigSubscriber().subscribe(this, MotdConfig.class, configId);
    }

    @Override
    public void configure(MotdConfig config) {
        // configuration logic here
    }
}
```

The disadvantage of using this is that one cannot implement custom error handling or otherwise track config changes. If needed, use the generic method above.

## Unit testing config

When instantiating a [ConfigSubscriber](https://javadoc.io/doc/com.yahoo.vespa/config/latest/com/yahoo/config/subscription/ConfigSubscriber.html), one can give it a [ConfigSource](https://javadoc.io/doc/com.yahoo.vespa/config/latest/com/yahoo/config/subscription/ConfigSource.html). One such source is a `ConfigSet`. It consists of a set of `Builder`s. This is an example of instantiating a subscriber using this - it uses two types of config, generated from the files `app.def` and `string.def`:

```
ConfigSet myConfigs = new ConfigSet();
AppConfig.Builder a0builder = new AppConfig.Builder().message("A message, 0").times(88);
AppConfig.Builder a1builder = new AppConfig.Builder().message("A message, 1").times(89);
myConfigs.add("app/0", a0builder);
myConfigs.add("app/1", a1builder);
myConfigs.add("bar", new StringConfig.Builder().stringVal("StringVal"));
ConfigSubscriber subscriber = new ConfigSubscriber(myConfigs);
```

To help with unit testing, each config type has a corresponding builder type. The `Builder` is mutable whereas the `ConfigInstance` is not. Use this to set up config fixtures for unit tests. The `ConfigSubscriber` has a `reload()` method which is used in tests to force the subscriptions into a new generation. It emulates a `vespa activate` operation after you have updated the `ConfigSet`. A full example can be found in [ConfigSetSubscriptionTest.java](https://github.com/vespa-engine/vespa/blob/master/config/src/test/java/com/yahoo/config/subscription/ConfigSetSubscriptionTest.java).
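As a closing illustration, the `ConfigSet` fixture above can be combined with an ordinary subscription in a unit test. This is a minimal sketch, assuming the generated `AppConfig` class from `app.def` is on the test classpath and that `ConfigSet` and `ConfigHandle` live in the `com.yahoo.config.subscription` package:

```
import com.yahoo.config.subscription.ConfigHandle;
import com.yahoo.config.subscription.ConfigSet;
import com.yahoo.config.subscription.ConfigSubscriber;

public class AppConfigTest {

    public void requiresConfiguredMessage() {
        // Build the fixture: a config source holding one AppConfig instance
        ConfigSet configs = new ConfigSet();
        configs.add("app/0", new AppConfig.Builder().message("A message, 0").times(88));

        // Subscribe against the ConfigSet instead of a real config server
        ConfigSubscriber subscriber = new ConfigSubscriber(configs);
        ConfigHandle<AppConfig> handle = subscriber.subscribe(AppConfig.class, "app/0");
        if (!subscriber.nextConfig())
            throw new RuntimeException("Config timed out.");

        // Accessor names follow the builder calls above (message, times)
        AppConfig config = handle.getConfig();
        assert config.message().equals("A message, 0");
        assert config.times() == 88;
    }
}
```

Updating one of the `Builder`s in the `ConfigSet` and calling `reload()` on the subscriber then drives reconfiguration in the test, the same way a `vespa activate` would in a running system.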
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Creating a Config Definition](#creating-config-definition) - [Generating Source Code and Accessing Config in Code](#generate-source) - [Config ID](#config-id) - [Schema Compatibility Rules](#def-compatibility) - [Creating a Deployable Application Package](#deploy) - [Setting Up a Running System](#setting-up) - [Advanced Usage of the Config API](#guidelines) - [Independent Static Configs](#independent-static-configs) - [Dependent Static Configs](#guidelines-dependent-static) - [Dependent Dynamic Configs](#guidelines-dependent-dynamic) - [Advice on Config Modelling](#guidelines-tips) - [The Java config API](#the-java-config-api) - [Subscribing and getting config](#subscribing-and-getting-config) - [Simplified subscription](#simplified-subscription) - [Unit testing config](#unit-testing) --- # Source: https://docs.vespa.ai/en/reference/operations/metrics/configserver.html.md # ConfigServer Metrics | Name | Unit | Description | | --- | --- | --- | | configserver.requests | request | Number of requests processed | | configserver.failedRequests | request | Number of requests that failed | | configserver.latency | millisecond | Time to complete requests | | configserver.cacheConfigElems | item | Time to complete requests | | configserver.cacheChecksumElems | item | Number of checksum elements in the cache | | configserver.hosts | node | The number of nodes being served configuration from the config server cluster | | configserver.tenants | instance | The number of tenants being served configuration from the config server cluster | | configserver.applications | instance | The number of applications being served configuration from the config server cluster | | configserver.delayedResponses | response | Number of delayed responses | | configserver.sessionChangeErrors | session | Number of session change errors | | configserver.unknownHostRequests | request | Config requests from unknown hosts | | configserver.newSessions | session | New config sessions | | configserver.preparedSessions | session | Prepared config sessions | | configserver.activeSessions | session | Active config sessions | | configserver.inactiveSessions | session | Inactive config sessions | | configserver.addedSessions | session | Added config sessions | | configserver.removedSessions | session | Removed config sessions | | configserver.rpcServerWorkQueueSize | item | Number of elements in the RPC server work queue | | maintenanceDeployment.transientFailure | operation | Number of maintenance deployments that failed with a transient failure | | maintenanceDeployment.failure | operation | Number of maintenance deployments that failed with a permanent failure | | maintenance.successFactorDeviation | fraction | Configserver: Maintenance Success Factor Deviation | | maintenance.duration | millisecond | Configserver: Maintenance Duration | | maintenance.congestion | failure | Configserver: Maintenance Congestion | | configserver.zkConnectionLost | connection | Number of ZooKeeper connections lost | | configserver.zkReconnected | connection | Number of ZooKeeper reconnections | | configserver.zkConnected | node | Number of ZooKeeper nodes connected | | configserver.zkSuspended | node | Number of ZooKeeper nodes suspended | | configserver.zkZNodes | node | Number of ZooKeeper nodes present | | configserver.zkAvgLatency | millisecond | Average latency for ZooKeeper requests | | configserver.zkMaxLatency | millisecond | Max latency for ZooKeeper requests | | 
configserver.zkConnections | connection | Number of ZooKeeper connections | | configserver.zkOutstandingRequests | request | Number of ZooKeeper requests in flight | | orchestrator.lock.acquire-latency | second | Time to acquire zookeeper lock | | orchestrator.lock.acquire-success | operation | Number of times zookeeper lock has been acquired successfully | | orchestrator.lock.acquire-timedout | operation | Number of times zookeeper lock couldn't be acquired within timeout | | orchestrator.lock.acquire | operation | Number of attempts to acquire zookeeper lock | | orchestrator.lock.acquired | operation | Number of times zookeeper lock was acquired | | orchestrator.lock.hold-latency | second | Time zookeeper lock was held before it was released | | nodes.active | node | The number of active nodes in a cluster | | nodes.nonActive | node | The number of non-active nodes in a cluster | | nodes.nonActiveFraction | node | The fraction of non-active nodes vs total nodes in a cluster | | nodes.exclusiveSwitchFraction | fraction | The fraction of nodes in a cluster on exclusive network switches | | nodes.emptyExclusive | node | The number of exclusive hosts that do not have any nodes allocated to them | | nodes.expired.deprovisioned | node | The number of deprovisioned nodes that have expired | | nodes.expired.dirty | node | The number of dirty nodes that have expired | | nodes.expired.inactive | node | The number of inactive nodes that have expired | | nodes.expired.provisioned | node | The number of provisioned nodes that have expired | | nodes.expired.reserved | node | The number of reserved nodes that have expired | | cluster.cost | dollar\_per\_hour | The cost of the nodes allocated to a certain cluster, in $/hr | | cluster.load.ideal.cpu | fraction | The ideal cpu load of a certain cluster | | cluster.load.ideal.memory | fraction | The ideal memory load of a certain cluster | | cluster.load.ideal.disk | fraction | The ideal disk load of a certain cluster | | cluster.load.peak.cpu | fraction | The peak cpu load in the period considered of a certain cluster | | cluster.load.peak.memory | fraction | The peak memory load in the period considered of a certain cluster | | cluster.load.peak.disk | fraction | The peak disk load in the period considered of a certain cluster | | zone.working | binary | The value 1 if zone is considered healthy, 0 if not. 
This is decided by considering the number of non-active nodes vs the number of active nodes in a zone | | cache.nodeObject.hitRate | fraction | The fraction of cache hits vs cache lookups for the node object cache | | cache.nodeObject.evictionCount | item | The number of cache elements evicted from the node object cache | | cache.nodeObject.size | item | The number of cache elements in the node object cache | | cache.curator.hitRate | fraction | The fraction of cache hits vs cache lookups for the curator cache | | cache.curator.evictionCount | item | The number of cache elements evicted from the curator cache | | cache.curator.size | item | The number of cache elements in the curator cache | | wantedRestartGeneration | generation | Wanted restart generation for tenant node | | currentRestartGeneration | generation | Current restart generation for tenant node | | wantToRestart | binary | One if node wants to restart, zero if not | | wantedRebootGeneration | generation | Wanted reboot generation for tenant node | | currentRebootGeneration | generation | Current reboot generation for tenant node | | wantToReboot | binary | One if node wants to reboot, zero if not | | retired | binary | One if node is retired, zero if not | | wantedVespaVersion | version | Wanted vespa version for the node, in the form MINOR.PATCH. Major version is not included here | | currentVespaVersion | version | Current vespa version for the node, in the form MINOR.PATCH. Major version is not included here | | wantToChangeVespaVersion | binary | One if node want to change Vespa version, zero if not | | hasWireguardKey | binary | One if node has a WireGuard key, zero if not | | wantToRetire | binary | One if node wants to retire, zero if not | | wantToDeprovision | binary | One if node wants to be deprovisioned, zero if not | | failReport | binary | One if there is a fail report for the node, zero if not | | suspended | binary | One if the node is suspended, zero if not | | suspendedSeconds | second | The number of seconds the node has been suspended | | activeSeconds | second | The number of seconds the node has been active | | numberOfServicesUp | instance | The number of services confirmed to be running on a node | | numberOfServicesNotChecked | instance | The number of services supposed to run on a node, that has not checked | | numberOfServicesDown | instance | The number of services confirmed to not be running on a node | | someServicesDown | binary | One if one or more services has been confirmed to not run on a node, zero if not | | numberOfServicesUnknown | instance | The number of services the config server does not know is running on a node | | nodeFailerBadNode | binary | One if the node is failed due to being bad, zero if not | | downInNodeRepo | binary | One if the node is registered as being down in the node repository, zero if not | | numberOfServices | instance | Number of services supposed to run on a node | | lockAttempt.acquireMaxActiveLatency | second | Maximum duration for keeping a lock, ending during the metrics snapshot, or still being kept at the end or this snapshot period | | lockAttempt.acquireHz | operation\_per\_second | Average number of locks acquired per second the snapshot period | | lockAttempt.acquireLoad | operation | Average number of locks held concurrently during the snapshot period | | lockAttempt.lockedLatency | second | Longest lock duration in the snapshot period | | lockAttempt.lockedLoad | operation | Average number of locks held concurrently during the snapshot period | | 
lockAttempt.acquireTimedOut | operation | Number of locking attempts that timed out during the snapshot period | | lockAttempt.deadlock | operation | Number of lock grab deadlocks detected during the snapshot period | | lockAttempt.errors | operation | Number of other lock related errors detected during the snapshot period | | hostedVespa.docker.totalCapacityCpu | vcpu | Total number of VCPUs on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.totalCapacityMem | gigabyte | Total amount of memory on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.totalCapacityDisk | gigabyte | Total amount of disk space on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.freeCapacityCpu | vcpu | Total number of free VCPUs on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.freeCapacityMem | gigabyte | Total amount of free memory on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.freeCapacityDisk | gigabyte | Total amount of free disk space on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.allocatedCapacityCpu | vcpu | Total number of allocated VCPUs on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.allocatedCapacityMem | gigabyte | Total amount of allocated memory on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.allocatedCapacityDisk | gigabyte | Total amount of allocated disk space on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.pendingRedeployments | task | The number of hosted Vespa re-deployments pending | | hostedVespa.docker.skew | fraction | A number in the range 0..1 indicating how well allocated resources are balanced with availability on hosts | | hostedVespa.activeHosts | host | The number of managed hosts that are in state "active" | | hostedVespa.breakfixedHosts | host | The number of managed hosts that are in state "breakfixed" | | hostedVespa.deprovisionedHosts | host | The number of managed hosts that are in state "deprovisioned" | | hostedVespa.dirtyHosts | host | The number of managed hosts that are in state "dirty" | | hostedVespa.failedHosts | host | The number of managed hosts that are in state "failed" | | hostedVespa.inactiveHosts | host | The number of managed hosts that are in state "inactive" | | hostedVespa.parkedHosts | host | The number of managed hosts that are in state "parked" | | hostedVespa.provisionedHosts | host | The number of managed hosts that are in state "provisioned" | | hostedVespa.readyHosts | host | The number of managed hosts that are in state "ready" | | hostedVespa.reservedHosts | host | The number of managed hosts that are in state "reserved" | | hostedVespa.activeNodes | host | The number of managed nodes that are in state "active" | | hostedVespa.breakfixedNodes | host | The number of managed nodes that are in state "breakfixed" | | hostedVespa.deprovisionedNodes | host | The number of managed nodes that are in state "deprovisioned" | | hostedVespa.dirtyNodes | host | The number of managed nodes that are in state "dirty" | | hostedVespa.failedNodes | host | The number of managed nodes that are in state "failed" | | hostedVespa.inactiveNodes | host | The number of managed nodes that are in state "inactive" | | hostedVespa.parkedNodes | host | The number of managed nodes that are in state "parked" | | hostedVespa.provisionedNodes | host | The number of managed nodes that are in state "provisioned" | | hostedVespa.readyNodes | host | The number of managed nodes that 
are in state "ready" | | hostedVespa.reservedNodes | host | The number of managed nodes that are in state "reserved" | | overcommittedHosts | host | The number of hosts with over-committed resources | | spareHostCapacity | host | The number of spare hosts | | throttledHostFailures | host | Number of host failures stopped due to throttling | | throttledNodeFailures | host | Number of node failures stopped due to throttling | | nodeFailThrottling | binary | Metric indicating when node failure throttling is active. The value 1 means active, 0 means inactive | | clusterAutoscaled | operation | Number of times a cluster has been rescaled by the autoscaler | | clusterAutoscaleDuration | second | The currently predicted duration of a rescaling of this cluster | | deployment.prepareMillis | millisecond | Duration of deployment preparations | | deployment.activateMillis | millisecond | Duration of deployment activations | | throttledHostProvisioning | binary | Value 1 if host provisioning is throttled, 0 if not |

Copyright © 2026 - [Cookie Preferences](#)

---

# Source: https://docs.vespa.ai/en/operations/self-managed/configuration-server.html.md

# Configuration Servers

Vespa Configuration Servers host the endpoint where application packages are deployed - and serve generated configuration to all services - see the [overview](../../learn/overview.html) and [application packages](../../basics/applications.html) for details. I.e., one cannot configure Vespa without config servers, and services cannot run without them. It is useful to understand the [Vespa start sequence](config-sentinel.html). Refer to the sample applications [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) and [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) for practical examples of multi-configserver configuration.

Vespa configuration is set up using one or more configuration servers (config servers). A config server uses [Apache ZooKeeper](https://zookeeper.apache.org/) as distributed data storage for the configuration system. In addition, each node runs a config proxy to cache configuration data - find an overview at [services start](config-sentinel.html).

## Status and config generation

Check the health of a running config server using (replace localhost with the hostname):

```
$ curl http://localhost:19071/state/v1/health
```

Note that the config server is a service in itself, and runs with file-based configuration. The application packages deployed will not change the config server - the config server serves this configuration to all other Vespa nodes. This will hence always be config generation 0:

```
$ curl http://localhost:19071/state/v1/config
```

Details in [start-configserver](https://github.com/vespa-engine/vespa/blob/master/configserver/src/main/sh/start-configserver).

## Redundancy

The config servers are defined in [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables), [services.xml](../../reference/applications/services/services.html) and [hosts.xml](/en/reference/applications/hosts.html):

```
$ VESPA_CONFIGSERVERS=myserver0.mydomain.com,myserver1.mydomain.com,myserver2.mydomain.com
```

```
<admin version="2.0">
  <configservers>
    <configserver hostalias="admin0" />
    <configserver hostalias="admin1" />
    <configserver hostalias="admin2" />
  </configservers>
</admin>
```

```
<hosts>
  <host name="myserver0.mydomain.com">
    <alias>admin0</alias>
  </host>
  <host name="myserver1.mydomain.com">
    <alias>admin1</alias>
  </host>
  <host name="myserver2.mydomain.com">
    <alias>admin2</alias>
  </host>
</hosts>
```

[VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) must be set on all nodes.
This is a comma- or whitespace-separated list with the hostname of all config servers, like _myhost1.mydomain.com,myhost2.mydomain.com,myhost3.mydomain.com_. When there are multiple config servers, the [config proxy](config-proxy.html) will pick a config server randomly (to achieve load balancing between config servers). The config proxy is fault-tolerant and will switch to another config server (if there is more than one) if the one it is using becomes unavailable or there is an error in the configuration it receives. For the system to tolerate _n_ failures, [ZooKeeper](#zookeeper) by design requires using _(2\*n)+1_ nodes. Consequently, only an odd numbers of nodes is useful, so you need minimum 3 nodes to have a fault-tolerant config system. Even when using just one config server, the application will work if the server goes down (but deploying application changes will not work). Since the _config proxy_ runs on every node and caches configs, it will continue to serve config to the services on that node. However, restarting a node when config servers are unavailable means that services on the node will be unable to start since the cache will be destroyed when restarting the config proxy. Refer to the [admin model reference](../../reference/applications/services/admin.html#configservers) for more details on _services.xml_. ## Start sequence To bootstrap a Vespa application instance, the high-level steps are: - Start config servers - Deploy config - Start Vespa nodes [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) is a great guide on how to start a multinode Vespa application instance - try this first. Detailed steps for config server startup: 1. Set [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) on all nodes, using fully qualified hostnames and the same value on all nodes, including the config servers. 2. Start the config server on the nodes configured in _services/hosts.xml_. Make sure the startup is successful by inspecting [/state/v1/health](../../reference/api/state-v1.html#state-v1-health), default on port 19071: ``` $ curl http://localhost:19071/state/v1/health ``` ``` ``` { "time" : 1651147368066, "status" : { "code" : "up" }, "metrics" : { "snapshot" : { "from" : 1.651147308063E9, "to" : 1.651147367996E9 } } } ``` ``` If there is no response on the health API, two things can have happened: - The config server process did not start - inspect logs using `vespa-logfmt`, or check _$VESPA\_HOME/logs/vespa/vespa.log_, normally _/opt/vespa/logs/vespa/vespa.log_. - The config server process started, and is waiting for [Zookeeper quorum](#zookeeper): ``` $ vespa-logfmt -S configserver ``` ``` configserver Container.com.yahoo.vespa.zookeeper.ZooKeeperRunner Starting ZooKeeper server with /opt/vespa/var/zookeeper/conf/zookeeper.cfg. Trying to establish ZooKeeper quorum (members: [node0.vespanet, node1.vespanet, node2.vespanet], attempt 1)configserver Container.com.yahoo.container.handler.threadpool.ContainerThreadpoolImpl Threadpool 'default-pool': min=12, max=600, queue=0 configserver Container.com.yahoo.vespa.config.server.tenant.TenantRepository Adding tenant 'default', created 2022-04-28T13:02:24.182Z. 
Bootstrapping in PT0.175576S configserver Container.com.yahoo.vespa.config.server.rpc.RpcServer Rpc server will listen on port 19070 configserver Container.com.yahoo.container.jdisc.state.StateMonitor Changing health status code from 'initializing' to 'up' configserver Container.com.yahoo.jdisc.http.server.jetty.Janitor Creating janitor executor with 2 threads configserver Container.com.yahoo.jdisc.http.server.jetty.JettyHttpServer Threadpool size: min=22, max=22 configserver Container.org.eclipse.jetty.server.Server jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 11.0.14.1+1- configserver Container.org.eclipse.jetty.server.handler.ContextHandler Started o.e.j.s.ServletContextHandler@341c0dfc{19071,/,null,AVAILABLE} configserver Container.org.eclipse.jetty.server.AbstractConnector Started configserver@3cd6d147{HTTP/1.1, (http/1.1, h2c)}{0.0.0.0:19071} configserver Container.org.eclipse.jetty.server.Server Started @21955ms configserver Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components.Application config generation: 0 ``` It will hang until quorum is reached, and the second highlighted log line is emitted. Root causes for missing quorum can be: - No connectivity between the config servers. Zookeeper logs the members like `(members: [node0.vespanet, node1.vespanet, node2.vespanet], attempt 1)`. Verify that the nodes running config server can reach each other on port 2181. - No connectivity can be wrong network config. [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) uses a docker network, make sure there are no underscores in the hostnames. 3. Once all config servers return `up` on _state/v1/health_, an application package can be deployed. This means, if deploy fails, it is always a good idea to verify the config server health first - if config servers are up, and deploy fails, it is most likely an issue with the application package - if so, refer to [application packages](../../basics/applications.html). 4. A successful deployment logs the following, for the _prepare_ and _activate_ steps: ``` Container.com.yahoo.vespa.config.server.ApplicationRepository Session 2 prepared successfully. Container.com.yahoo.vespa.config.server.deploy.Deployment Session 2 activated successfully using no host provisioner. Config generation 2. File references: [file '9cfc8dc57f415c72'] Container.com.yahoo.vespa.config.server.session.SessionRepository Session activated: 2 ``` 5. Start the Vespa nodes. Technically, they can be started at any time. When troubleshooting, it is easier to make sure the config servers are started successfully, and deployment was successful - before starting any other nodes. Refer to the [Vespa start sequence](config-sentinel.html) and [Vespa start / stop / restart](admin-procedures.html#vespa-start-stop-restart). Make sure to look for logs on all config servers when debugging. ## Scaling up Add a config server node for increased fault tolerance or when replacing a node. Read up on [ZooKeeper configuration](#zookeeper-configuration) before continuing. Although it is _possible_ to add more than one config server at a time, doing it one by one is recommended, to keep the ZooKeeper quorum intact. Due to the ZooKeeper majority vote, use one or three config servers. 1. Install _vespa_ on new config server node. 2. 
Append the config server node's hostname to VESPA\_CONFIGSERVERS on all nodes, then (re)start all config servers in sequence to update the ZooKeeper config. By appending, the current config server nodes keep their current ZooKeeper index. Restart the existing config server(s) first. Config server will log which servers are configured when starting up to vespa log. 3. Update _services.xml_ and _hosts.xml_ with the new set of config servers, then _vespa prepare_ and _vespa activate_. 4. Restart other nodes one by one to start using the new config servers. This will let the vespa nodes use the updated set of config servers. The config servers will automatically redistribute the application data to new nodes. ## Scaling down This is the inverse of scaling up, and the procedure is the same. Remove config servers from the end of _VESPA\_CONFIGSERVERS_, and here one can remove two nodes in one go, if going from three to one. ## Replacing nodes - Make sure to replace only one node at a time. - If you have only one config server you need to first scale up with a new node, then scale down by removing the old node. - If you have 3 or more you can replace one of the old nodes in VESPA\_CONFIGSERVERS with the new one instead of adding one, otherwise same procedure as in [Scaling up](#scaling-up). Repeat for each node you want to replace. ## Tools Tools to access config: - [vespa-get-config](../../reference/operations/self-managed/tools.html#vespa-get-config) - [vespa-configproxy-cmd](../../reference/operations/self-managed/tools.html#vespa-configproxy-cmd) - [Config API](../../reference/api/config-v2.html) ## ZooKeeper [ZooKeeper](https://zookeeper.apache.org/) handles data consistency across multiple config servers. The config server Java application runs a ZooKeeper server, embedded with an RPC frontend that the other nodes use. ZooKeeper stores data internally in _nodes_ that can have _sub-nodes_, similar to a file system. At [vespa prepare](../../reference/clients/vespa-cli/vespa_prepare), the application's files, along with global configurations, are stored in ZooKeeper. The application data is stored under _/config/v2/tenants/default/sessions/[sessionid]/userapp_. At [vespa activate](../../reference/clients/vespa-cli/vespa_activate), the newest application is activated _live_ by writing the session id into _/config/v2/tenants/default/applications/default:default:default_. It is at that point the other nodes get configured. Use _vespa-zkcli_ to inspect state, replace with actual session id: ``` $ vespa-zkcli ls /config/v2/tenants/default/sessions/sessionid/userapp $ vespa-zkcli get /config/v2/tenants/default/sessions/sessionid/userapp/services.xml ``` The ZooKeeper server logs to _$VESPA\_HOME/logs/vespa/zookeeper.configserver.0.log (files are rotated with sequence number)_ ### ZooKeeper configuration The members of the ZooKeeper cluster is generated based on the contents of [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables). _$VESPA\_HOME/var/zookeeper/conf/zookeeper.cfg_ is written when (re)starting the config server. Hence, config server(s) must all be restarted when `VESPA_CONFIGSERVERS` changes. The order of the nodes is used to create indexes in _zookeeper.cfg_, do not change node order. ### ZooKeeper recovery If the config server(s) should experience data corruption, for instance a hardware failure, use the following recovery procedure. 
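For reference, the numbered recovery steps listed below correspond to roughly this command sequence on each affected config server - a sketch only; the application package path is an assumption and must point at your own package:
```
$ vespa-stop-configserver
$ vespa-configserver-remove-state
$ vespa-start-configserver
$ vespa prepare path/to/application-package && vespa activate
```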
One example of such a scenario is if _$VESPA\_HOME/logs/vespa/zookeeper.configserver.0.log_ says _java.io.IOException: Negative seek offset at java.io.RandomAccessFile.seek(Native Method)_, which indicates ZooKeeper has not been able to recover after a full disk. There is no need to restart Vespa on other nodes during the procedure: 1. [vespa-stop-configserver](../../reference/operations/self-managed/tools.html#vespa-stop-configserver) 2. [vespa-configserver-remove-state](../../reference/operations/self-managed/tools.html#vespa-configserver-remove-state) 3. [vespa-start-configserver](../../reference/operations/self-managed/tools.html#vespa-start-configserver) 4. [vespa](../../clients/vespa-cli.html#deployment) prepare \ 5. [vespa](../../clients/vespa-cli.html#deployment) activate This procedure completely cleans out ZooKeeper's internal data snapshots and deploys from scratch. Note that by default the [cluster controller](../../content/content-nodes.html#cluster-controller) that maintains the state of the content cluster will use the shared same ZooKeeper instance, so the content cluster state is also reset when removing state. Manually set state will be lost (e.g. a node with user state _down_). It is possible to run cluster-controllers in standalone zookeeper mode - see [standalone-zookeeper](../../reference/applications/services/admin.html#cluster-controllers). ### ZooKeeper barrier timeout If the config servers are heavily loaded, or the applications being deployed are big, the internals of the server may time out when synchronizing with the other servers during deploy. To work around, increase the timeout by setting: [VESPA\_CONFIGSERVER\_ZOOKEEPER\_BARRIER\_TIMEOUT](files-processes-and-ports.html#environment-variables) to 600 (seconds) or higher, and restart the config servers. ## Configuration To access config from a node not running the config system (e.g. doing feeding via the Document API), use the environment variable [VESPA\_CONFIG\_SOURCES](files-processes-and-ports.html#environment-variables): ``` $ export VESPA_CONFIG_SOURCES="myadmin0.mydomain.com:19071,myadmin1.mydomain.com:19071" ``` Alternatively, for Java programs, use the system property _configsources_ and set it programmatically or on the command line with the _-D_ option to Java. The syntax for the value is the same as for _VESPA\_CONFIG\_SOURCES_. ### System requirements The minimum heap size for the JVM it runs under is 128 Mb and max heap size is 2 GB (which can be changed with a [setting](../../performance/container-tuning.html#config-server-and-config-proxy)). It writes a transaction log that is regularly purged of old items, so little disk space is required. Note that running on a server that has a lot of disk I/O will adversely affect performance and is not recommended. ### Ports The config server RPC port can be changed by setting [VESPA\_CONFIGSERVER\_RPC\_PORT](files-processes-and-ports.html#environment-variables) on all nodes in the system. Changing HTTP port requires changing the port in _$VESPA\_HOME/conf/configserver-app/services.xml_: ``` ``` ``` ``` When deploying, use the _-p_ option, if port is changed from the default. ## Troubleshooting | Problem | Description | | --- | --- | | Health checks | Verify that a config server is up and running using [/state/v1/health](../../reference/api/state-v1.html#state-v1-health), see [start sequence](#start-sequence). Status code is `up` if the server is up and has finished bootstrapping. 
Alternatively, use [http://localhost:19071/status.html](http://localhost:19071/status.html) which will return response code 200 if server is up and has finished bootstrapping. Metrics are found at [/state/v1/metrics](../../reference/api/state-v1.html#state-v1-metrics). Use [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to find host and port number, port is 19071 by default. | | Consistency | When having more than one config server, consistency between the servers is crucial. [http://localhost:19071/status](http://localhost:19071/status) can be used to check that settings for config servers are the same for all servers. [vespa-config-status](../../reference/operations/self-managed/tools.html#vespa-config-status) can be used to check config on nodes. [http://localhost:19071/application/v2/tenant/default/application/default](http://localhost:19071/application/v2/tenant/default/application/default) displays active config generation and should be the same on all servers, and the same as in response from running [vespa deploy](../../clients/vespa-cli.html#deployment) | | Bad Node | If running with more than one config server and one of these goes down or has hardware failure, the cluster will still work and serve config as usual (clients will switch to use one of the good servers). It is not necessary to remove a bad server from the configuration. Deploying applications will take longer, as [vespa deploy](../../clients/vespa-cli.html#deployment) will not be able to complete a deployment on all servers when one of them is down. If this is troublesome, lower the [barrier timeout](#zookeeper-barrier-timeout) - (default value is 120 seconds). Note also that if you have not configured [cluster controllers](../../reference/applications/services/admin.html#cluster-controller) explicitly, these will run on the config server nodes and the operation of these might be affected. This is another reason for not trying to manually remove a bad node from the config server setup. | | Stuck filedistribution | The config system distributes binary files (such as jar bundle files) using [file-distribution](../../applications/deployment.html#file-distribution) - use [vespa-status-filedistribution](../../reference/operations/self-managed/tools.html#vespa-status-filedistribution) to see detailed status if it gets stuck. | | Memory | Insufficient memory on the host / in the container running the config server will cause startup or deploy / configuration problems - see [Docker containers](docker-containers.html). | | ZooKeeper | The following can be caused by a full disk on the config server, or clocks out of sync: ``` at com.yahoo.vespa.zookeeper.ZooKeeperRunner.startServer(ZooKeeperRunner.java:92) Caused by: java.io.IOException: The accepted epoch, 10 is less than the current epoch, 48 ``` Users have reported that "Copying the currentEpoch to acceptedEpoch fixed the problem". 
| Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Status and config generation](#status-and-config-generation) - [Redundancy](#redundancy) - [Start sequence](#start-sequence) - [Scaling up](#scaling-up) - [Scaling down](#scaling-down) - [Replacing nodes](#replacing-nodes) - [Tools](#tools) - [ZooKeeper](#zookeeper) - [ZooKeeper configuration](#zookeeper-configuration) - [ZooKeeper recovery](#zookeeper-recovery) - [ZooKeeper barrier timeout](#zookeeper-barrier-timeout) - [Configuration](#configuration) - [System requirements](#system-requirements) - [Ports](#ports) - [Troubleshooting](#troubleshooting) --- # Source: https://docs.vespa.ai/en/applications/configuring-components.html.md # Configuring Java components Any Java component might require some sort of configuration, be it simple strings or integers, or more complex structures. Because of all the boilerplate code that commonly goes into classes to hold such configuration, this often degenerates into a collection of key-value string pairs (e.g. [javax.servlet.FilterConfig](https://docs.oracle.com/javaee/6/api/javax/servlet/FilterConfig.html)). To avoid this, Vespa provides custom, type-safe configuration for all [Container](containers.html) components. Get started with the [Developer Guide](developer-guide.html), and try the [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java) sample application. Configurable components in short: - Create a [config definition](../reference/applications/config-files.html#config-definition-files) file - Use the Vespa [bundle plugin](bundles.html#maven-bundle-plugin) to generate a config class from the definition - Inject config objects in the application code The application code interfaces with config through the generated code — code and config are always in sync. This configuration should be used for all state which is assumed to stay constant for the _lifetime of the component instance_. Use [deploy](../basics/applications.html) to push and activate code and config changes. ## Config definition Write a [config definition](../reference/applications/config-files.html#config-definition-files) file and place it in the application's `src/main/resources/configdefinitions/` directory, e.g. `src/main/resources/configdefinitions/my-component.def`: ``` package=com.mydomain.mypackage myCode int default=42 myMessage string default="" ``` ## Generating config classes Generating config classes is done by the _bundle plugin_: ``` $ mvn generate-resources ``` The generated config classes are written to `target/generated-sources/vespa-configgen-plugin/`. In the above example, the config definition file was named _my-component.def_ and its package declaration is _com.mydomain.mypackage_. The full name of the generated Java class will be _com.mydomain.mypackage.MyComponentConfig_. It is a good idea to generate the config classes first, _then_ resolve dependencies and compile in the IDE. ## Using config in code The generated config class is now available for the component through [constructor injection](dependency-injection.html), which means that the component can declare the generated class as one of its constructor arguments: ``` package com.mydomain.mypackage; public class MyComponent { private final int code; private final String message; @Inject public MyComponent(MyComponentConfig config) { code = config.myCode(); message = config.myMessage(); } } ``` The Container will create and inject the config instance. 
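Constructor injection is not limited to config classes; the same constructor can declare other container-managed dependencies alongside the generated config. A minimal sketch, assuming the `my-component.def` definition above - the class name and the use of the container's default `java.util.concurrent.Executor` (described under container tuning later in this documentation) are illustrative assumptions:
```
package com.mydomain.mypackage;

import java.util.concurrent.Executor;

// Hypothetical component: combines the generated config class with another
// injected dependency, here the container's default Executor.
public class MyAsyncComponent {

    private final String message;
    private final Executor executor;

    public MyAsyncComponent(MyComponentConfig config, Executor executor) {
        this.message = config.myMessage();
        this.executor = executor;
    }

    public void greetAsync() {
        // Run work on the container's shared default thread pool
        executor.execute(() -> System.out.println(message));
    }
}
```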
To override the default values of the config, [specify](../reference/applications/config-files.html#generic-configuration-in-services-xml) values in `src/main/application/services.xml`, like:
```
<config name="com.mydomain.mypackage.my-component">
    <myCode>132</myCode>
    <myMessage>Hello, World!</myMessage>
</config>
```
and the deployed instance of `MyComponent` is constructed using a corresponding instance of `MyComponentConfig`. ## Unit testing configurable components The generated config class provides a builder API that makes it easy to create config objects for unit testing. Example that sets up a unit test for the `MyComponent` class from the example above: ``` import static com.mydomain.mypackage.MyComponentConfig.*; public class MyComponentTest { @Test public void requireThatMyComponentGetsConfig() { MyComponentConfig config = new MyComponentConfig.Builder() .myCode(668) .myMessage("Neighbour of the beast") .build(); MyComponent component = new MyComponent(config); … } } ``` The config class used here is simple — see a separate example of [building a complex configuration object](unit-testing.html#unit-testing-configurable-components). ## Adding files to the component configuration This section describes what to do if the component needs larger configuration objects that are stored in files, e.g. machine-learned models, [automata](../reference/operations/tools.html#vespa-makefsa) or large tables. Before proceeding, take a look at how to create [provider components](dependency-injection.html#special-components) — instead of integrating large objects into e.g. a searcher or processor, it might be better to split the resource-demanding part of the component's configuration into a separate provider component. The procedure described below can be applied to any component type. Files can be transferred using either [file distribution](deployment.html#file-distribution) or URL download. File distribution is used when the files are added to the application package. If for some reason this is not convenient, e.g. due to size, origin of file or update frequency, Vespa can download the file and make it available for the component. Both types are set up in the config definition file. File distribution uses the `path` config type, and URL downloading the `url` type. You can also use the `model` type for machine-learned models that can be referenced by both model-id, used on Vespa Cloud, and url/path, used on self-hosted deployments. See [the config file reference](../reference/applications/config-files.html) for details. In the following example we will show the usage of all three types. Assume this config definition, named `my-component.def`: ``` package=com.mydomain.mypackage myFile path myUrl url myModel model ``` The file must reside in the application package, and the path (relative to the application package root) must be given in the component's configuration in `services.xml`:
```
<config name="com.mydomain.mypackage.my-component">
    <myFile>my-files/my-file.txt</myFile>
    <myUrl>https://docs.vespa.ai/en/reference/query-api-reference.html</myUrl>
</config>
```
An example component that uses these files:
```
package com.mydomain.mypackage;

import java.io.File;
import java.nio.file.Path;

public class MyComponent {

    private final Path fileFromFileDistribution;
    private final File fileFromUrlDownload;
    private final Path modelFilePath;

    public MyComponent(MyComponentConfig config) {
        fileFromFileDistribution = config.myFile();
        fileFromUrlDownload = config.myUrl();
        modelFilePath = config.myModel();
    }
}
```
The `myFile()` and `myModel()` getters return a `java.nio.file.Path` object, while the `myUrl()` getter returns a `java.io.File` object. 
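As a usage illustration, the distributed file can be read directly at construction time. A minimal sketch, assuming the config definition above - the class name and the assumption that the file is line-oriented text are illustrative:
```
package com.mydomain.mypackage;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical component reading the file-distributed file when constructed.
public class MyFileBackedComponent {

    private final List<String> lines;

    public MyFileBackedComponent(MyComponentConfig config) {
        Path file = config.myFile(); // resolved to a local copy of my-files/my-file.txt
        try {
            this.lines = Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + file, e);
        }
    }

    public int lineCount() {
        return lines.size();
    }
}
```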
The container framework guarantees that these files are fully present at the given location before the component constructor is invoked, so they can always be accessed right away. When the client asks for config that uses the `url` or `model` config type with a URL, the content will be downloaded and cached on the nodes that need it. If you want to change the content, the application package needs to be updated with a new URL for the changed content and the application [deployed](../basics/applications.html), otherwise the cached content will still be used. This avoids unintended changes to the application if the content of a URL changes. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Config definition](#config-definition) - [Generating config classes](#generate-config-class) - [Using config in code](#use-config-in-code) - [Unit testing configurable components](#unit-testing-configurable-components) - [Adding files to the component configuration](#adding-files-to-the-component-configuration) --- # Source: https://docs.vespa.ai/en/content/consistency.html.md # Vespa Consistency Model Vespa offers configurable data redundancy with eventual consistency across replicas. It's designed for high efficiency under workloads where eventual consistency is an acceptable tradeoff. This document aims to go into some detail on what these tradeoffs are, and what you, as a user, can expect. ### Vespa and CAP Vespa may be considered a limited subset of AP under the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem). Under CAP, there is a fundamental limitation of whether any distributed system can offer guarantees on consistency (C) or availability (A) in scenarios where nodes are partitioned (P) from each other. Since there is no escaping that partitions can and will happen, we talk either of systems that are _either_ CP or AP. Consistency (C) in CAP implies that reads and writes are strongly consistent, i.e. the system offers_linearizability_. Weaker forms such as causal consistency or "read your writes" consistency is _not_ sufficient. As mentioned initially, Vespa is an eventually consistent data store and therefore does not offer this property. In practice, Consistency requires the use of a majority consensus algorithm, which Vespa does not currently use. Availability (A) in CAP implies that _all requests_ receive a non-error response regardless of how the network may be partitioned. Vespa is dependent on a centralized (but fault tolerant) node health checker and coordinator. A network partition may take place between the coordinator and a subset of nodes. Operations to nodes in this subset aren't guaranteed to succeed until the partition heals. As a consequence, Vespa is not _guaranteed_ to be strongly available, so we treat this as a "limited subset" of AP (though this is not technically part of the CAP definition). In _practice_, the best-effort semantics of Vespa have proven to be both robust and highly available in common datacenter networks. ### Write durability and consistency When a client receives a successful [write](../writing/reads-and-writes.html) response, the operation has been written and synced to disk. The replication level is configurable. Operations are by default written on _all_ available replica nodes before sending a response. "Available" here means being Up in the [cluster state](content-nodes.html#cluster-state), which is determined by the fault-tolerant, centralized Cluster Controller service. 
If a cluster has a total of 3 nodes, 2 of these are available and the replication factor is 3, writes will be ACKed to the client if both the available nodes ACK the operation. On each replica node, operations are persisted to a write-ahead log before being applied. The system will automatically recover after a crash by replaying logged operations. Writes are guaranteed to be synced to durable storage prior to sending a successful response to the client, so acknowledged writes are retained even in the face of sudden power loss. If a client receives a failure response for a write operation, the operation may or may not have taken place on a subset of the replicas. If not all replicas could be written to, they are considered divergent (out of sync). The system detects and reconciles divergent replicas. This happens without any required user intervention. Each document write assigns a new wall-clock timestamp to the resulting document version. As a consequence, configure servers with NTP to keep clock drift as small as possible. Large clock drifts may result in timestamp collisions or unexpected operation orderings. Vespa has support for conditional writes for individual documents through test-and-set operations. Multi-document transactions are not supported. After a successful response, changes to the search indexes are immediately visible by default. ### Read consistency Reads are consistent on a best-effort basis and are not guaranteed to be linearizable. When using a [Get](../reference/api/document-v1.html#get) or [Visit](../writing/visiting.html) operation, the client will never observe a partially updated document. For these read operations, writes behave as if they are atomic. Searches may observe partial updates, as updates are not atomic across index structures. This can only happen _after_ a write has started, but _before_ it's complete. Once a write is complete, all index updates are visible. Searches may observe transient loss of coverage when nodes go down. Vespa will restore coverage automatically when this happens. How fast this happens depends on the configured [searchable-copies](../reference/applications/services/content.html#searchable-copies) value. If replicas diverge during a Get, Vespa performs a read-repair. This fetches the requested document from all divergent replicas. The client then receives the version with the newest timestamp. If replicas diverge during a Visit, the behavior is slightly different between the Document V1 API and [vespa-visit](/en/reference/operations/self-managed/tools.html#vespa-visit): - Document V1 will prefer immediately visiting the replica that contains the most documents. This means it's possible for a subset of documents in a bucket to not be returned. - `vespa-visit` will by default retry visiting the bucket until it is in sync. This may take a long time if large parts of the system are out of sync. The rationale for this difference in behavior is that Document V1 is usually called in a real-time request context, whereas `vespa-visit` is usually called in a background/batch processing context. Visitor operations iterate over the document corpus in an implementation-specific order. Any given document is returned in the state it was in at the time the visitor iterated over the data bucket containing the document. 
This means there is _no snapshot isolation_—a document mutation happening concurrently with a visitor may or may not be reflected in the returned document set, depending on whether the mutation happened before or after iteration of the bucket containing the document. ### Replica reconciliation Reconciliation is the act of bringing divergent replicas back into sync. This usually happens after a node restarts or fails. It will also happen after network partitions. Unlike several other eventually consistent databases, Vespa doesn't use distributed replica operation logs. Instead, reconciling replicas involves exchanging sets of timestamped documents. Reconciliation is complete once the union set of documents is present on all replicas. Metadata is checksummed to determine whether replicas are in sync with each other. When reconciling replicas, the newest available version of a document will "win" and become visible. This version may be a remove (tombstone). Tombstones are replicated in the same way as regular documents. Reconciliation happens the document level, not at the field level. I.e. there is no merging of individual fields across different versions. If a test-and-set operation updates at least one replica, it will eventually become visible on the other replicas. The reconciliation operation is referred to as a "merge" in the rest of the Vespa documentation. Tombstone entries have a configurable time-to-live before they are compacted away. Nodes that have been partitioned away from the network for a longer period of time than this TTL should ideally have their indexes removed before being allowed back into the cluster. If not, there is a risk of resurrecting previously removed documents. Vespa does not currently detect or handle this scenario automatically. See the documentation on [data-retention-vs-size](/en/operations/self-managed/admin-procedures.html#data-retention-vs-size). ### Q/A #### How does Vespa perform read-repair for Get-operations, and how many replicas are consulted? When the distributor process that is responsible for a particular data bucket receives a Get operation, it checks its locally cached replica metadata state for inconsistencies. If all replicas have consistent metadata, the operation is routed to a single replica—preferably located on the same host as the distributor, if present. This is the normal case when the bucket replicas are in sync. If there is at least one replica metadata mismatch, the distributor automatically initiates a read-repair process: 1. The distributor splits the bucket replicas into subsets based on their metadata, where all replicas in each subset have the same metadata. It then sends a lightweight metadata-only Get to one replica in each subset. The core assumption is that all these replicas have the same set of document versions, and that it suffices to consult one replica in the set. If a metadata read fails, the distributor will automatically fail over to another replica in the subset. 2. It then sends one full Get to a node in the replica set that returned the _highest_timestamp. This means that if you have 100 replicas and 1 has different metadata from the remaining 99, only 2 nodes in total will be initially queried, and only 1 will receive the actual (full) Get read. Similar algorithms are used by other operations that may trigger read/write-repair. #### Since Vespa performs read-repair when inconsistencies are detected, does this mean replies are strongly consistent? Unfortunately not. 
Vespa does not offer any cross-document transactions, so in this case strong consistency implies single-object _linearizability_ (as opposed to_strict serializability_ across multiple objects). Linearizability requires the ability to reach a majority consensus amongst a particular known and stable configuration of replicas (side note: replica sets can be reconfigured in strongly consistent algorithms like Raft and Paxos, but such a reconfiguration must also be threaded through the consensus machinery). The active replica set for a given data bucket (and thus the documents it logically contains) is ephemeral and dynamic based on the nodes that are currently available in the cluster (as seen from the cluster controller). This precludes having a stable set of replicas that can be used for reaching majority decisions. See also [Vespa and CAP](#vespa-and-cap). #### In what kind of scenario might Vespa return a stale version of a document? Stale document versions may be returned when all replicas containing the most recent document version have become unavailable. Example scenario (for simplicity—but without loss of generality—assuming redundancy 1) in a cluster with two nodes {A, B}: 1. Document X is stored in a replica on node A with timestamp 100. 2. Node A goes down; node B takes over ownership. 3. A write request is received for document X; it is stored on node B with timestamp 200 and ACKed to the client. 4. Node B goes down. 5. Node A comes back up. 6. A read request arrives for document X. The only visible replica is on node A, which ends up serving the request. 7. The document version at timestamp 100 is returned to the client. Since the write at `t=200` _happens-after_ the write at `t=100`, returning the version at`t=100` violates linearizability. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Vespa and CAP](#vespa-and-cap) - [Write durability and consistency](#write-durability-and-consistency) - [Read consistency](#read-consistency) - [Replica reconciliation](#replica-reconciliation) - [Q/A](#qa) --- # Source: https://docs.vespa.ai/en/reference/ranking/constant-tensor-json-format.html.md # Constant Tensor JSON Format This document describes with examples the JSON formats accepted when reading tensor constants from a file. For convenience, compactness, and readability there are various formats that can be used depending on the detailed tensor type: - [Dense tensors](#dense-tensors): indexed dimensions only - [Sparse tensors](#sparse-tensors): mapped dimensions only - [Mixed tensors](#mixed-tensors): both indexed and mapped dimensions ## Canonical type A tensor type can be declared with its dimension in any order, but internally they will always be sorted in alphabetical order. So the type "`tensor(category{}, brand{}, a[3], x[768], d0[1])`" has the canonical string representation "`tensor(a[3],brand{},category{},d0[1],x[768])`" and the "x" dimension with size 768 is the innermost. For constants, all indexed dimensions must have a known size. ## Dense tensors Tensors using only indexed dimensions are used for storing a vector, a matrix, and so on and are collectively known as "dense" tensors. These are particularly easy to handle, as they always have a known number of cells in a well-defined order. They can be input as nested arrays of numerical values. 
Example with vector of size 5: ``` { "type": "tensor(x[5])", "values": [13.25, -22, 0.4242, 0, -17.0] } ``` The "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is similar to "Indexed tensors short form" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-indexed). Example of a 3x4 matrix; note that the dimension names will always be processed in [alphabetical order](#canonical-type) from outermost to innermost. ``` { "type": "tensor(bar[3],foo[4])", "values": [ [2.5, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 2.0], [2.0, 3.0, 2.0, 1.5] ] } ``` Note that the arrays must have exactly the declared number of elements for each dimension, and be correctly nested. Example of an ONNX model input where we have an extra "batch" dimension which is unused (size 1) for this particular input, but still requires extra brackets: ``` { "type": "tensor(d0[1],d1[5],d2[2])", "values": [ [ [1.1, 1.2], [2.1, 2.2], [3.1, 3.2], [4.1, 4.2], [5.1, 5.2] ] ] } ``` ## Sparse tensors Tensors using only mapped dimensions are collectively known as "sparse" tensors. JSON input for these will list the cells directly. Tensors with only one mapped dimension can use as simple JSON object as input: ``` { "type": "tensor(category{})", "cells": { "tag": 2.5, "another": 2.75 } } ``` The "type" field is optional. This format is similar to "Short form for tensors with a single mapped dimension" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-mapped). Tensors with multiple mapped dimensions must use an array of objects, where each object has an "address" containing the labels for all dimensions, and a "value" with the cell value: ``` { "type": "tensor(category{},product{})", "cells": [ { "address": { "category": "foo", "product": "bar" }, "value": 1.5 }, { "address": { "category": "qux", "product": "zap" }, "value": 3.5 }, { "address": { "category": "pop", "product": "rip" }, "value": 6.5 } ] } ``` Again, the "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is also known as the [general verbose form](../schemas/document-json-format.html#tensor), and it's possible to use it for any tensor type. ## Mixed tensors Tensors with both mapped and indexed dimensions can use a "blocks" format; this is similar to the "cells" formats for sparse tensors, but instead of a single cell value you get a block of values for each address. With one mapped dimension and two indexed dimensions: ``` { "type": "tensor(a{},x[3],y[4])", "blocks": { "bar": [ [1.0, 2.0, 0.0, 3.0], [2.0, 2.5, 2.0, 0.5], [3.0, 6.0, 9.0, 9.0] ], "foo": [ [1.0, 0.0, 2.0, 3.0], [2.0, 2.5, 2.0, 0.5], [3.0, 3.0, 6.0, 9.0] ] } } ``` The "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is similar to the first variant of "Mixed tensors short form" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-mixed). With two mapped dimensions and one indexed dimensions: ``` { "type": "tensor(a{},b{},x[3])", "blocks": [ { "address": { "a": "qux", "b": "zap" }, "values": [2.5, 3.5, 4.5] }, { "address": { "a": "foo", "b": "bar" }, "values": [1.5, 2.5, 3.5] }, { "address": { "a": "pop", "b": "rip" }, "values": [3.5, 4.5, 5.5] } ] } ``` Again, the "type" field is optional. 
This format is similar to the second variant of "Mixed tensors short form" in the [document JSON format](../schemas/document-json-format.html#tensor-short-form-mixed). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/performance/container-http.html.md # HTTP Performance Testing of the Container using Gatling For container testing, more flexibility and more detailed checking than straightforward saturating an interface with HTTP requests is often required. The stress test tool [Gatling](https://gatling.io/) provides such capabilities in a flexible manner with the possibility of writing arbitrary plug-ins and a DSL for the most common cases. This document shows how to get started using Gatling with Vespa. Experienced Gatling users should find there is nothing special with testing Vespa versus other HTTP services. ## Install Gatling Refer to Gatling's [documentation for getting started](https://gatling.io/docs/gatling/reference/current/), or simply get the newest version from the[Gatling front page](https://gatling.io/), unpack the tar ball and jump straight into it. The tool runs happily from the directory created when unpacking it. This tutorial is written with Gatling 2 in mind. ## Configure the First Test with a Query Log Refer to the Gatling documentation on how to set up the recorder. This tool acts as a browser proxy, recording what you do in the browser, allowing you to replay that as a test scenario. After running _bin/recorder.sh_ and setting package to _com.vespa.example_and class name to _VespaTutorial_, running a simple query against your node _mynode_ (running e.g.[album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java)), should create a basic simulation looking something like the following in_user-files/simulations/com/vespa/example/VespaTutorial.scala_: ``` package com.vespa.example import io.gatling.core.Predef._ import io.gatling.core.session.Expression import io.gatling.http.Predef._ import io.gatling.jdbc.Predef._ import io.gatling.http.Headers.Names._ import io.gatling.http.Headers.Values._ import scala.concurrent.duration._ import bootstrap._ import assertions._ class VespaTutorial extends Simulation { val httpProtocol = http .baseURL("http://mynode:8080") .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8") .acceptEncodingHeader("gzip, deflate") .connection("keep-alive") .userAgentHeader("Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0") val headers_1 = Map("""Cache-Control""" -> """max-age=0""") val scn = scenario("Scenario Name") .exec(http("request_1") .get("""/search/?query=bad""") .headers(headers_1)) setUp(scn.inject(atOnce(1 user))).protocols(httpProtocol) } ``` Running a single query over and over again is not useful, so we have a tiny query log in a CSV file we want to run in our test,_user-files/data/userinput.csv_: ``` userinput bad religion bad lucky oops radiohead bad jackson ``` As usual for CSV files, the first line names the parameters. A literal comma may be escaped with backslash as "\,". Gatling takes hand of URL quoting, there is no need to e.g. encode space as "%20". 
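To drive the test with more realistic traffic, the CSV can also be generated from an existing list of user queries rather than written by hand. A minimal sketch, assuming the queries are available one per line in a plain-text file (the _my-queries.txt_ filename is an assumption); literal commas are escaped as described above:
```
$ echo "userinput" > user-files/data/userinput.csv
$ sed 's/,/\\,/g' my-queries.txt >> user-files/data/userinput.csv
```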
Add a feeder: ``` package com.vespa.example import io.gatling.core.Predef._ import io.gatling.core.session.Expression import io.gatling.http.Predef._ import io.gatling.jdbc.Predef._ import io.gatling.http.Headers.Names._ import io.gatling.http.Headers.Values._ import scala.concurrent.duration._ import bootstrap._ import assertions._ class VespaTutorial extends Simulation { val httpProtocol = http .baseURL("http://mynode:8080") .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8") .acceptEncodingHeader("gzip, deflate") .connection("keep-alive") .userAgentHeader("Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0") val headers_1 = Map("""Cache-Control""" -> """max-age=0""") val scn = scenario("Scenario Name") .feed(csv("userinput.csv").random) .exec(http("request_1") .get("/search/") .queryParam("query", "${userinput}") .headers(headers_1)) setUp(scn.inject(constantRate(100 usersPerSec) during (10 seconds))) .protocols(httpProtocol) } ``` Now, we have done a couple of changes to the original scenario. First, we have added the feeder. Since we do not have enough queries available for running long enough to get a scenario for some traffic, we chose the "random" strategy. This means a random user input string will be chosen for each invocation, and it might be reused. Also, we have changed how the test is run, from just a single query, into a constant rate of 100 users for 10 seconds. We should expect something as close as possible to 100 QPS in our test report. ## Running a Benchmark We now have something we can run both on a headless node and on a personal laptop, sample run output: ``` $ ./bin/gatling.sh GATLING_HOME is set to ~/tmp/gatling-charts-highcharts-2.0.0-M3a Choose a simulation number: [0] advanced.AdvancedExampleSimulation [1] basic.BasicExampleSimulation [2] com.vespa.example.VespaTutorial 2 Select simulation id (default is 'vespatutorial'). Accepted characters are a-z, A-Z, 0-9, - and _ Select run description (optional) Simulation com.vespa.example.VespaTutorial started... 
================================================================================ 2014-04-09 11:54:33 0s elapsed ---- Scenario Name ------------------------------------------------------------- [-] 0% waiting: 998 / running: 2 / done:0 ---- Requests ------------------------------------------------------------------ > Global (OK=0 KO=0 ) ================================================================================ ================================================================================ 2014-04-09 11:54:38 5s elapsed ---- Scenario Name ------------------------------------------------------------- [####################################] 49% waiting: 505 / running: 0 / done:495 ---- Requests ------------------------------------------------------------------ > Global (OK=495 KO=0 ) > request_1 (OK=495 KO=0 ) ================================================================================ ================================================================================ 2014-04-09 11:54:43 10s elapsed ---- Scenario Name ------------------------------------------------------------- [#########################################################################] 99% waiting: 8 / running: 0 / done:992 ---- Requests ------------------------------------------------------------------ > Global (OK=992 KO=0 ) > request_1 (OK=992 KO=0 ) ================================================================================ ================================================================================ 2014-04-09 11:54:43 10s elapsed ---- Scenario Name ------------------------------------------------------------- [##########################################################################]100% waiting: 0 / running: 0 / done:1000 ---- Requests ------------------------------------------------------------------ > Global (OK=1000 KO=0 ) > request_1 (OK=1000 KO=0 ) ================================================================================ Simulation finished. Generating reports... Parsing log file(s)... Parsing log file(s) done ================================================================================ ---- Global Information -------------------------------------------------------- > numberOfRequests 1000 (OK=1000 KO=0 ) > minResponseTime 10 (OK=10 KO=- ) > maxResponseTime 30 (OK=30 KO=- ) > meanResponseTime 10 (OK=10 KO=- ) > stdDeviation 2 (OK=2 KO=- ) > percentiles1 10 (OK=10 KO=- ) > percentiles2 10 (OK=10 KO=- ) > meanNumberOfRequestsPerSecond 99 (OK=99 KO=- ) ---- Response Time Distribution ------------------------------------------------ > t < 800 ms 1000 (100%) > 800 ms < t < 1200 ms 0 ( 0%) > t > 1200 ms 0 ( 0%) > failed 0 ( 0%) ================================================================================ Reports generated in 0s. Please open the following file : ~/tmp/gatling-charts-highcharts-2.0.0-M3a/results/vespatutorial-20140409115432/index.html ``` The report gives graphs showing how the test progressed and summaries for failures and time spent. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/performance/container-tuning.html.md # Container Tuning A collection of configuration parameters to tune the Container as used in Vespa. Some configuration parameters have native [services.xml](../application-packages.html) support while others are configured through [generic config overrides](../reference/applications/config-files.html#generic-configuration-in-services-xml). ## Container worker threads The container uses multiple thread pools for its operations. 
Most components including request handlers use the container's [default thread pool](../reference/applications/services/container.html#threadpool), which is controlled by a shared executor instance. Any component can utilize the default pool by injecting an `java.util.concurrent.Executor` instance. Some built-in components have dedicated thread pools - such as the Jetty server, the [search handler](../reference/applications/services/search.html#threadpool) and [document-processing](../reference/applications/services/docproc.html#threadpool) chains. These thread pools are injected through special wiring in the config model and are not easily accessible from other components. The thread pools are by default scaled on the system resources as reported by the JVM (`Runtime.getRuntime().availableProcessors()`). It's paramount that the `-XX:ActiveProcessorCount`/`jvm_availableProcessors` configuration is correct for the container to work optimally. The [default thread pool](../reference/applications/services/container.html#threadpool) configuration can be overridden through services.xml. We recommend you keep the default configuration as it's tuned to work across a variety of workloads. Note that the default configuration and pool usage may change between minor versions. The container will pre-start the minimum number of worker threads, so even an idle container may report running several hundred threads. The [search handler](../reference/applications/services/search.html#threadpool) and [document processing handler](../reference/applications/services/docproc.html#threadpool) thread pools each pre-start the number of workers set in their configurations. Note that tuning the capacity upwards increases the risk of high GC pressure as concurrency becomes higher with more in-flight requests. The GC pressure is a function of number of in-flight requests, the time it takes to complete the request and the amount of garbage produced per request. Increasing the queue size will allow the application to handle shorter traffic bursts without rejecting requests, although increasing the average latency for those requests that are queued up. Large queues will also increase heap consumption in overload situations. For some thread pools, extra threads will be created once the queue is full (when [`max`](../reference/applications/services/search.html#threads.max) is specified), and are destroyed after an idle timeout. If all threads are occupied, requests are rejected with a 503 response. The effective thread pool configuration and utilization statistics can be observed through the [Container Metrics](/en/operations/metrics.html#container-metrics). See [Thread Pool Metrics](/en/operations/metrics.html#thread-pool-metrics) for a list of metrics exported. **Note:** If the queue size is set to 0 the metric measuring the queue size -`jdisc.thread_pool.work_queue.size` - will instead switch to measure how many threads are active. ### Recommendation A fixed size pool is preferable for stable latency during peak load, at a cost of a higher static memory load and increased context-switching overhead if excessive number of threads are configured. Variable size pool is mostly beneficial to minimize memory consumption during low-traffic periods, and in general if the size of peak load is somewhat unknown. The downside is that once all core threads are active, latency will increase as additional tasks are queued and launching extra threads is relatively expensive as it involves system calls to the OS. 
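As an illustration of a variable-size pool, the search handler thread pool can be given a fixed core size plus an upper bound that is only used when the queue fills up. A minimal services.xml sketch - the element layout follows the services reference linked above, but treat the exact element names and the numbers as assumptions to verify against that reference:
```
<container version="1.0">
    <search>
        <threadpool>
            <!-- core size 2 x vCPU, growing up to 8 x vCPU when the queue is full -->
            <threads max="8">2</threads>
            <queue>25</queue>
        </threadpool>
    </search>
</container>
```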
### Example Consider a container host with 8 vCPU. Setting `4` on the [search handler threadpool](../reference/applications/services/search.html#threadpool) yields `4 * 8 = 32` worker threads, and adding `25` gives the pool a total queue capacity of `32 * 25 = 800` requests. The same thread calculation applies to the [document processing handler threadpool](../reference/applications/services/docproc.html#threadpool), which does not support queue configuration. The example below shows a consistent configuration where the default thread pool, the search handler threadpool, and the document processing handler threadpool are all kept fixed. ``` ``` 5 25 4 25 2 ``` ``` ## Container memory usage > Help, my container nodes are using more than 70% memory! It's common to observe the container process utilizing its maximum configured heap size. This, by itself, is not necessarily an indication of a problem. The Java Virtual Machine (JVM) manages memory within the allocated heap, and it's designed to use as much of it as possible to reduce the frequency of garbage collection. To understand whether enough memory is allocated, look at the garbage collection activity. If GC is running frequently and using significant CPU or causing long pauses, it might indicate that the heap size is too small for the workload. In such cases, consider increasing the maximum heap size. However, if the garbage collector is running infrequently and efficiently, it's perfectly normal for the container to utilize most or all of its allocated heap, and even more (as some memory will also be allocated outside the heap; e.g. direct buffers for efficient data transfer). Vespa exports several metrics to allow you to monitor JVM GC performance, such as [jvm.gc.overhead](../reference/operations/metrics/container.html#jvm_gc_overhead) - if this exceeds 8-10% you should consider increasing heap memory and/or tuning GC settings. ## JVM heap size Change the default JVM heap size settings used by Vespa to better suit the specific hardware settings or application requirements. By setting the relative size of the total JVM heap in [percentage of available memory](../reference/applications/services/container.html#nodes), one does not know exactly what the heap size will be, but the configuration will be adaptable and ensure that the container can start even in environments with less available memory. The example below allocates 50% of available memory on the machine to the JVM heap: ``` ``` ``` ``` ## JVM Tuning Use _gc-options_ for controlling GC related parameters and _options_ for tuning other parameters. See [reference documentation](../reference/applications/services/container.html#nodes). Example: Running with 4 GB heap using G1 garbage collector and using NewRatio = 1 (equal size of old and new generation) and enabling verbose GC logging (logged to stdout to vespa.log file). ``` ``` ``` ``` The default heap size with docker image is 1.5g which can for high throughput applications be on the low side, causing frequent garbage collection. By default, the G1GC collector is used. ### Config Server and Config Proxy The config server and proxy are not executed based on the model in _services.xml_. On the contrary, they are used to bootstrap the services in that model. Consequently, one must use configuration variables to set the JVM parameters for the config server and config proxy. They also need to be restarted (_services_ in the config proxy's case) after a change, but one does _not_ need to _vespa prepare_ or _vespa activate_ first. 
Example: ``` VESPA_CONFIGSERVER_JVMARGS -Xlog:gc VESPA_CONFIGPROXY_JVMARGS -Xlog:gc -Xmx256m ``` Refer to [Setting Vespa variables](/en/operations/self-managed/files-processes-and-ports.html#environment-variables). ## Container warmup Some applications observe that the first queries made to a freshly started container take a long time to complete. This is typically due to some components performing lazy setup of data structures or connections. Lazy initialization should be avoided in favor of eager initialization in component constructor, but this is not always possible. A way to avoid problems with the first queries in such cases is to perform warmup queries at startup. This is done by issuing queries from the constructor of the Handler of regular queries. If using the default handler, [com.yahoo.search.handler.SearchHandler](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/handler/SearchHandler.java), subclass this and configure your subclass as the handler of query requests in _services.xml_. Add a call to a warmupQueries() method as the last line of your handler constructor. The method can look something like this: ``` ``` private void warmupQueries() { String[] requestUris = new String[] {"warmupRequestUri1", "warmupRequestUri2"}; int warmupIterations = 50; for (int i = 0; i < warmupIterations; i++) { for (String requestUri : requestUris) { handle(HttpRequest.createTestRequest(requestUri, com.yahoo.jdisc.http.HttpRequest.Method.GET)); } } } ``` ``` Since these queries will be executed before the container starts accepting external queries, they will cause the first external queries to observe a warmed up container instance. Use [metrics.ignore](../reference/api/query.html#metrics.ignore) in the warmup queries to eliminate them from being reported in metrics. ### Disabling warmups Warmups can be disabled by adding the following container http config to the container section in services.xml: ``` ``` false ``` ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Container worker threads](#container-worker-threads) - [Recommendation](#recommendation) - [Example](#container-worker-threads-example) - [Container memory usage](#container-memory-usage) - [JVM heap size](#jvm-heap-size) - [JVM Tuning](#jvm-tuning) - [Config Server and Config Proxy](#config-server-and-config-proxy) - [Container warmup](#container-warmup) - [Disabling warmups](#disabling-warmups) --- # Source: https://docs.vespa.ai/en/reference/operations/metrics/container.html.md # Source: https://docs.vespa.ai/en/reference/applications/services/container.html.md # Source: https://docs.vespa.ai/en/operations/self-managed/container.html.md # Container This is the Container service operational guide. ![Vespa Overview](/assets/img/vespa-overview.svg) Note that "container" is an overloaded concept in Vespa - in this guide it refers to service instance nodes in blue. Refer to [container metrics](../metrics.html#container-metrics). ## Endpoints Container service(s) hosts the query and feed endpoints - examples: - [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml) configures \_both\_ query and feed in the same container cluster (i.e. service): ``` ``` ``` ``` - [multinode-HA](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml) configures query and feed in separate container clusters (i.e. 
services).

Observe that `<search>` and `<document-api>` are located in separate clusters in the second example, and the endpoints are therefore different.

**Important:** The first thing to validate when troubleshooting query errors is to make sure that the endpoint is correct, i.e. that query requests hit the correct nodes. A query will be written to the [access log](../access-logging.html) on one of the nodes in the container cluster.

## Inspecting Vespa Java Services using JConsole

Determine the state of each running Java Vespa service using JConsole. JConsole is distributed along with the Java Development Kit. Start JConsole:

```
$ jconsole <host>:<port>
```

where the host and port determine which service to attach to. For security purposes, the JConsole tool cannot directly attach to Vespa services from external machines.

### Connecting to a Vespa instance

To attach a JConsole to a Vespa service running on another host, create a tunnel from the JConsole host to the Vespa service host. This can for example be done by setting up two SSH tunnels as follows:

```
$ ssh -N -L<port1>:localhost:<port1> <service-host> &
$ ssh -N -L<port2>:localhost:<port2> <service-host> &
```

where port1 and port2 are determined by the type of service (see below). A JConsole can then be attached to the service as follows:

```
$ jconsole localhost:<port1>
```

Port numbers:

| Service | Port 1 | Port 2 |
| --- | --- | --- |
| QRS | 19015 | 19016 |
| Docproc | 19123 | 19124 |

Updated port information can be found by running:

```
$ [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect) service <service-name>
```

where the resulting RMIREGISTRY and JMX lines determine port1 and port2, respectively.

### Examining thread states

The state of each container is available in JConsole by pressing the Threads tab and selecting the thread of interest in the threads list. Threads of interest include _search_, _connector_, _closer_, _transport_ and _acceptor_ (the latter four are used for backend communications).

---
# Source: https://docs.vespa.ai/en/applications/containers.html.md

# Container clusters

Vespa's Java container, JDisc, hosts all application components as well as the stateless logic of Vespa itself. Which particular components are hosted by a container cluster is configured in services.xml. The main features of JDisc are:

- HTTP serving out of the box from an embedded Jetty server, and support for plugging in other transport mechanisms.
- Integration with the config system of Vespa which allows components to [receive up-to-date config](configuring-components.html) (by constructor injection) resulting from application deployment.
- [Dependency injection based on Guice](dependency-injection.html) (Felix), but extended for configs and component collections.
- A component model based on [OSGi](bundles.html) which allows components to be (re)deployed to running servers, and to control which APIs they expose to others.
- The features above combine to allow application package changes (changes to components, configuration or data) to be applied by Vespa without disrupting request serving or requiring restarts.
- Standard component types exist for
  - [general request handling](request-handlers.html)
  - [chained request-response processing](processing.html)
  - [processing document writes](document-processors.html)
  - [intercepting queries and results](searchers.html)
  - [rendering responses](result-renderers.html)

  Application components can be of any other type as well and do not need to reference any Vespa API to be loaded and managed by the container.
- A general [chain composition](chaining.html) mechanism for components.

## Developing Components

- The JDisc container provides a framework for processing requests and responses, named _Processing_ - its building blocks are:
  - [Chains](chaining.html) of other components that are to be executed serially, with each providing some service or transform
  - [Processors](processing.html) that change the request and/or the response. They may also make multiple forward requests, in series or parallel, or manufacture the response content themselves
  - [Renderers](processing.html#response-rendering) that are used to serialize a Processor's response before returning it to a client
- Application lifecycle and unit testing:
  - [Configuring components](configuring-components.html) with custom configuration
  - [Component injection](dependency-injection.html) allows components to access other application components
  - Learn how to [build OSGi bundles](bundles.html) and how to [troubleshoot](bundles.html#troubleshooting) classloading issues
  - Using [Libraries for Pluggable Frameworks](pluggable-frameworks.html) from a component may result in class loading issues that require extra setup in the application
  - [Unit testing configurable components](unit-testing.html#unit-testing-configurable-components)
- Handlers and filters:
  - [Http servers and security filters](http-servers-and-filters.html) for incoming connections on HTTP and HTTPS
  - [Request handlers](request-handlers.html) to process incoming requests and generate responses
- Searchers and Document Processors:
  - [Searcher](searchers.html) and [search result renderer](result-renderers.html) development
  - [Document processing](document-processors.html)

## Reference documentation

- [services.xml](../reference/applications/services/container.html)

## Other related documents

- [Designing RESTful web services](web-services.html) as Vespa Components
- [healthchecks](../reference/operations/health-checks.html) - using the Container with a VIP
- [Vespa Component Reference](../reference/applications/components.html): The Container's request processing lifecycle

---
# Source: https://docs.vespa.ai/en/operations/self-managed/content-node-recovery.html.md

# Content node recovery

In exceptional cases, one or more content nodes may end up with corrupted data causing them to fail to restart. Possible reasons are

- the application configuring a higher memory or disk limit such that the node is allowed to accept more data than it can manage,
- hardware failure, or
- a bug in Vespa.

Normally a corrupted node can just be wiped of all data or removed from the cluster, but when this happens simultaneously to multiple nodes, or redundancy 1 is used, it may be necessary to recover the node(s) to avoid data loss. This document explains the procedure.

## Recovery steps

On each of the nodes needing recovery:

1. [Stop services](admin-procedures.html#vespa-start-stop-restart) on the node if running.
2.
Repair the node: - If the node cannot start due to needing more memory than available: Increase the memory available to the node, or if not possible stop all non-essential processes on the node using `vespa-sentinel-cmd list` and `vespa-sentinel-cmd stop [name]`, and (if necessary) start only the content node process using `vespa-sentinel-cmd start searchnode`. When the node is successfully started, issue delete operations or increase the cluster size to reduce the amount of data on the node if necessary. - If the node cannot start due to needing more disk than available: Increase the disk available to the node, or if not possible delete non-essential data such as logs and cached packages. When the node is successfully started, issue delete operations or increase the cluster size to reduce the amount of data on the node if necessary. - If the node cannot start for any other reason, repair the data manually as needed. This procedure will depend on the specific nature of the data corruption. 3. [Start services](admin-procedures.html#vespa-start-stop-restart) on the node. 4. Verify that the node is fully up before doing the next node - metrics/interfaces to be used to evaluate if the next node can be stopped: - Check if a node is up using [/state/v1/health](../../reference/api/state-v1.html#state-v1-health). - Check the `vds.idealstate.merge_bucket.pending.average` metric on content nodes. When 0, all buckets are in sync - see [example](../metrics.html). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/content/content-nodes.html.md # Content nodes, states and metrics ![Content cluster overview](/assets/img/elastic-feed.svg) Content cluster processes are _distributor_, _proton_ and _cluster controller_. The distributor calculates the correct content node using the distribution algorithm and the [cluster state](#cluster-state). With no known cluster state, the client library will send requests to a random node, which replies with the updated cluster state if the node was incorrect. Cluster states are versioned, such that clients hitting outdated distributors do not override updated states with old states. The [distributor](#distributor) keeps track of which content nodes that stores replicas of each bucket (maximum one replica each), based on [redundancy](../reference/applications/services/content.html#redundancy) and information from the _cluster controller_. A bucket maps to one distributor only. A distributor keeps a bucket database with bucket metadata. The metadata holds which content nodes store replicas of the buckets, the checksum of the bucket content and the number of documents and meta entries within the bucket. Each document is algorithmically mapped to a bucket and forwarded to the correct content nodes. The distributors detect whether there are enough bucket replicas on the content nodes and add/remove as needed. Write operations wait for replies from every replica and fail if less than redundancy are persisted within timeout. The [cluster controller](#cluster-controller) manages the state of the distributor and content nodes. This _cluster state_ is used by the document processing chains to know which distributor to send documents to, as well as by the distributor to know which content nodes should have which bucket. 
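To make these concepts concrete, the following is a minimal sketch (not taken from this page) of a content cluster definition in _services.xml_ - the `music` cluster id, document type and host aliases are hypothetical. The [redundancy](../reference/applications/services/content.html#redundancy) setting controls how many replicas the distributors maintain, and each node's `distribution-key` identifies it to the distribution algorithm:

```
<content id="music" version="1.0">
    <redundancy>2</redundancy>
    <documents>
        <document type="music" mode="index" />
    </documents>
    <nodes>
        <node hostalias="node0" distribution-key="0" />
        <node hostalias="node1" distribution-key="1" />
        <node hostalias="node2" distribution-key="2" />
    </nodes>
</content>
```

With redundancy 2, each bucket is stored on 2 of the 3 content nodes, and the distributors add or remove replicas as nodes come and go.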
## Cluster state There are three kinds of state: [unit state](../reference/api/cluster-v2.html#state-unit), [user state](../reference/api/cluster-v2.html#state-user) and [generated state](../reference/api/cluster-v2.html#state-generated) (a.k.a. _cluster state_). For new cluster states, the cluster state version is incremented, and the new cluster state is broadcast to all nodes. There is a minimum time between each cluster state change. It is possible to set a minimum capacity for the cluster state to be `up`. If a cluster has so many nodes unavailable that it is considered down, the state of each node is irrelevant, and thus new cluster states will not be created and broadcast before enough nodes are back for the cluster to come back up. A cluster state indicating the entire cluster is down, may thus have outdated data on the node level. ## Cluster controller The main task of the cluster controller is to maintain the [cluster state](#cluster-state). This is done by _polling_ nodes for state, _generating_ a cluster state, which is then _broadcast_ to all the content nodes in the cluster. Note that clients do not interface with the cluster controller - they get the cluster state from the distributors - [details](#distributor). | Task | Description | | --- | --- | | Node state polling | The cluster controller polls nodes, sending the current cluster state. If the cluster state is no longer correct, the node returns correct information immediately. If the state is correct, the request lingers on the node, such that the node can reply to it immediately if its state changes. After a while, the cluster controller will send a new state request to the node, even with one pending. This triggers a reply to the lingering request and makes the new one linger instead. Hence, nodes have a pending state request. During a controlled node shutdown, it starts the shutdown process by responding to the pending state request that it is now stopping. **Note:** As controlled restarts or shutdowns are implemented as TERM signals from the [config-sentinel](/en/operations/self-managed/config-sentinel.html), the cluster controller is not able to differ between controlled and other shutdowns. | | Cluster state generation | The cluster controller translates unit and user states into the generated _cluster state_ | | Cluster state broadcast | When node unit states are received, a cluster controller internal cluster state is updated. New cluster states are distributed with a minimum interval between. A grace period per unit state too - e.g., distributors and content nodes that are on the same node often stop at the same time. The version number is incremented, and the new cluster state is broadcast. If cluster state version is [reset](../operations/self-managed/admin-procedures.html#cluster-state), distributors and content node processes may have to be restarted in order for the system to converge to the new state. Nodes will reject lower cluster state versions to prevent race conditions caused by overlapping cluster controller leadership periods. | See [cluster controller configuration](../operations/self-managed/admin-procedures.html#cluster-controller-configuration). ### Master election Vespa can be configured with one cluster controller. Reads and writes will work well in case of cluster controller down, but other changes to the cluster (like a content node going down) will not be handled. It is hence recommended to configure a set of cluster controllers. 
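A common self-managed setup is therefore to run three cluster controllers, declared in the _admin_ section of _services.xml_. The snippet below is a sketch with hypothetical host aliases; see [cluster controller configuration](../operations/self-managed/admin-procedures.html#cluster-controller-configuration) for the authoritative syntax:

```
<admin version="2.0">
    <adminserver hostalias="node0" />
    <cluster-controllers>
        <cluster-controller hostalias="node0" />
        <cluster-controller hostalias="node1" />
        <cluster-controller hostalias="node2" />
    </cluster-controllers>
</admin>
```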
The cluster controller nodes elect a master, which does the node polling and cluster state broadcast. The other cluster controller nodes only exist to do master election and potentially take over if the master dies. All cluster controllers will vote for the cluster controller with the lowest index that says it is ready. If a cluster controller has more than half of the votes, it will be elected master. As a majority vote is required, the number of cluster controllers should be an odd number of 3 or greater. A fresh master will not broadcast states before a transition time is passed, allowing an old master to have some time to realize it is no longer the master. ## Distributor Buckets are mapped to distributors using the [ideal state algorithm](idealstate.html). As the cluster state changes, buckets are re-mapped immediately. The mapping does not overlap - a bucket is owned by one distributor. Distributors do not persist the bucket database, the bucket-to-content-node mapping is kept in memory in the distributor. Document count, persisted size and a metadata checksum per bucket is stored as well. At distributor (re)start, content nodes are polled for bucket information, and return which buckets are owned by this distributor (using the ideal state algorithm). There is no centralized bucket directory node. Likewise, at any distributor cluster state change, content nodes are polled for bucket handover - a distributor will then handle a new set of buckets. Document operations are mapped to content nodes based on bucket locations - each put/update/get/remove is mapped to a [bucket](buckets.html)and sent to the right content nodes. To manage the document set as it grows and nodes change, buckets move between content nodes. Document API clients (i.e. container nodes with[\](../reference/applications/services/container.html#document-api)) do not communicate directly with the cluster controller, and do not know the cluster state at startup. Clients therefore start out by sending requests to a random distributor. If the document operation hits the wrong distributor,`WRONG_DISTRIBUTION` is returned, with the current cluster state in the response.`WRONG_DISTRIBUTION` is hence expected and normal at cold start / state change events. ### Timestamps [Write operations](../writing/reads-and-writes.html)have a _last modified time_ timestamp assigned when passing through the distributor. The timestamp is guaranteed to be unique within the[bucket](buckets.html) where it is stored. The timestamp is used by the content layer to decide which operation is newest. These timestamps can be used when [visiting](../writing/visiting.html), to process/retrieve documents within a given time range. To guarantee unique timestamps, they are in microseconds - the microsecond part is generated to avoid conflicts with other documents. If documents are migrated _between_ clusters, the target cluster will have new timestamps for their entries. Also, when [reprocessing documents](../applications/document-processors.html) _within_ a cluster, documents will have new timestamps, even if not modified. ### Ordering The Document API uses the [document ID](../schemas/documents.html#document-ids) to order operations. A Document API client ensures that only one operation is pending at the same time. This ensures that if a client sends multiple operations for the same document, they will be processed in a defined order. This is done by queueing pending operations _locally_ at the client. 
**Note:** If sending two write operations to the same document, and the first operation fails, the enqueued operation is still sent. In other words, the client does not assume there exists any kind of dependency between separate operations to the same document. If you need to enforce this, use [test-and-set conditions](../writing/document-v1-api-guide.html#conditional-writes) for writes.

If _different_ clients have pending operations on the same document, the order is unspecified.

### Maintenance operations

Distributors track which content nodes have which buckets in their bucket database. Distributors then use the [ideal state algorithm](idealstate.html) to generate bucket _maintenance operations_. A stable system has all buckets located per the ideal state:

- If buckets have too few replicas, new replicas are generated on other content nodes.
- If the replicas differ, a bucket merge is issued to get the replicas consistent.
- If a bucket has too many replicas, the superfluous ones are deleted. Buckets are merged, if inconsistent, before deletion.
- If two buckets exist such that both may contain the same document, the buckets are split or joined to remove such overlapping buckets. Read more on [inconsistent buckets](buckets.html).
- If buckets are too small or too large, they will be joined or split.

The maintenance operations have different priorities. If no maintenance operations are needed, the cluster is said to be in the _ideal state_. The distributors synchronize maintenance load with user load, e.g. to remap requests to other buckets after bucket splitting and joining.

### Restart

When a distributor stops, it will try to respond to any pending cluster state request first. New incoming requests after shutdown has commenced will fail immediately, as the socket is no longer accepting requests. Cluster controllers will thus detect processes stopping almost immediately.

The cluster state will be updated with the new state internally in the cluster controller. Then the cluster controller will wait for at most [min\_time\_between\_new\_systemstates](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) before publishing the new cluster state - this is to reduce short-term state fluctuations.

The cluster controller has the option of setting states to make other distributors take over ownership of buckets, or mask the change, making the buckets owned by the restarting distributor unavailable for the time being.

If the distributor transitions from `up` to `down`, other distributors will request metadata from the content nodes to take over ownership of buckets previously owned by the restarting distributor. Until the distributors have gathered this new metadata from all the content nodes, requests for these buckets cannot be served, and will fail back to the client. When the restarting node comes back up and is marked `up` in the cluster state again, the additional nodes will discard knowledge of the extra buckets they previously acquired.

For requests with timeouts of several seconds, the transition should be invisible due to automatic client resending. Requests with a lower timeout might fail, and it is up to the application whether to resend or handle failed requests. Requests to buckets not owned by the restarting distributor will not be affected.

## Content node

The content node runs _proton_, which is the query backend.

### Restart

When a content node does a controlled restart, it marks itself in the `stopping` state and rejects new requests.
It will process its pending request queue before shutting down. Consequently, client requests are typically unaffected by content node restarts. The currently pending requests will typically be completed. New copies of buckets will be created on other nodes, to store new requests in appropriate redundancy. This happens whether node transitions through `down` or `maintenance` state. The difference being that if transitioning through `maintenance`, the distributor will not start any effort of synchronizing new copies with existing copies. They will just store the new requests until the maintenance node comes back up. When starting, content nodes will start with gathering information on what buckets it has data stored for. While this is happening, the service layer will expose that it is `down`. ## Metrics | Metric | Description | | --- | --- | | .idealstate.idealstate\_diff | This metric tries to create a single value indicating distance to the ideal state. A value of zero indicates that the cluster is in the ideal state. Graphed values of this metric gives a good indication for how fast the cluster gets back to the ideal state after changes. Note that some issues may hide other issues, so sometimes the graph may appear to stand still or even go a bit up again, as resolving one issue may have detected one or several others. | | .idealstate.buckets\_toofewcopies | Specifically lists how many buckets have too few copies. Compare to the _buckets_ metric to see how big a portion of the cluster this is. | | .idealstate.buckets\_toomanycopies | Specifically lists how many buckets have too many copies. Compare to the _buckets_ metric to see how big a portion of the cluster this is. | | .idealstate.buckets | The total number of buckets managed. Used by other metrics reporting bucket counts to know how big a part of the cluster they relate to. | | .idealstate.buckets\_notrusted | Lists how many buckets have no trusted copies. Without trusted buckets operations against the bucket may have poor performance, having to send requests to many copies to try and create consistent replies. | | .idealstate.delete\_bucket.pending | Lists how many buckets that needs to be deleted. | | .idealstate.merge\_bucket.pending | Lists how many buckets there are, where we suspect not all copies store identical document sets. | | .idealstate.split\_bucket.pending | Lists how many buckets are currently being split. | | .idealstate.join\_bucket.pending | Lists how many buckets are currently being joined. | | .idealstate.set\_bucket\_state.pending | Lists how many buckets are currently altered for active state. These are high priority requests which should finish fast, so these requests should seldom be seen as pending. | Example, using the [quickstart](../basics/deploy-an-application-local.html) - find the distributor port (look for HTTP): ``` $ docker exec vespa vespa-model-inspect service distributor distributor @ vespa-container : content music/distributor/0 tcp/vespa-container:19112 (MESSAGING) tcp/vespa-container:19113 (STATUS RPC) tcp/vespa-container:19114 (STATE STATUS HTTP) ``` Get the metric value: ``` $ docker exec vespa curl -s http://localhost:19114/state/v1/metrics | jq . 
| \ grep -A 10 idealstate.merge_bucket.pending "name": "vds.idealstate.merge_bucket.pending", "description": "The number of operations pending", "values": { "average": 0, "sum": 0, "count": 1, "rate": 0.016666, "min": 0, "max": 0, "last": 0 }, ``` ## /cluster/v2 API examples Examples of state manipulation using the [/cluster/v2 API](../reference/api/cluster-v2.html). List content clusters: ``` $ curl http://localhost:19050/cluster/v2/ ``` ``` ``` { "cluster": { "music": { "link": "/cluster/v2/music" }, "books": { "link": "/cluster/v2/books" } } } ``` ``` Get cluster state and list service types within cluster: ``` $ curl http://localhost:19050/cluster/v2/music ``` ``` ``` { "state": { "generated": { "state": "state-generated", "reason": "description" } } "service": { "distributor": { "link": "/cluster/v2/music/distributor" }, "storage": { "link": "/cluster/v2/music/storage" } } } ``` ``` List nodes per service type for cluster: ``` $ curl http://localhost:19050/cluster/v2/music/storage ``` ``` ``` { "node": { "0": { "link": "/cluster/v2/music/storage/0" }, "1": { "link": "/cluster/v2/music/storage/1" } } } ``` ``` Get node state: ``` $ curl http://localhost:19050/cluster/v2/music/storage/0 ``` ``` ``` { "attributes": { "hierarchical-group": "group0" }, "state": { "generated": { "state": "up", "reason": "" }, "unit": { "state": "up", "reason": "" }, "user": { "state": "up", "reason": "" } }, "metrics": { "bucket-count": 0, "unique-document-count": 0, "unique-document-total-size": 0 } } ``` ``` Get all nodes, including topology information (see `hierarchical-group`): ``` $ curl http://localhost:19050/cluster/v2/music/?recursive=true ``` ``` ``` { "state": { "generated": { "state": "up", "reason": "" } }, "service": { "storage": { "node": { "0": { "attributes": { "hierarchical-group": "group0" }, "state": { "generated": { "state": "up", "reason": "" }, "unit": { "state": "up", "reason": "" }, "user": { "state": "up", "reason": "" } }, "metrics": { "bucket-count": 0, "unique-document-count": 0, "unique-document-total-size": 0 } ``` ``` Set node user state: ``` curl -X PUT -H "Content-Type: application/json" --data ' { "state": { "user": { "state": "retired", "reason": "This node will be removed soon" } } }' \ http://localhost:19050/cluster/v2/music/storage/0 ``` ``` ``` { "wasModified": true, "reason": "ok" } ``` ``` ## Further reading - Refer to [administrative procedures](../operations/self-managed/admin-procedures.html) for configuration and state monitoring / management. - Try the [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample app to get familiar with interfaces and behavior. 
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Cluster state](#cluster-state) - [Cluster controller](#cluster-controller) - [Master election](#master-election) - [Distributor](#distributor) - [Timestamps](#timestamps) - [Ordering](#ordering) - [Maintenance operations](#maintenance-operations) - [Restart](#distributor-restart) - [Content node](#content-node) - [Restart](#content-node-restart) - [Metrics](#metrics) - [/cluster/v2 API examples](#cluster-v2-API-examples) - [Further reading](#further-reading) --- # Source: https://docs.vespa.ai/en/reference/applications/services/content.html.md # services.xml - 'content' ``` [content](#content)[documents](#documents)[document](#document)[document-processing](#document-processing)[min-redundancy](#min-redundancy)[redundancy](#redundancy)[coverage-policy](#coverage-policy)[nodes](services.html#nodes)[node](#node)[group](#group)[distribution](#distribution)[node](#node)[group](#group)[engine](#engine)[proton](#proton)[searchable-copies](#searchable-copies)[tuning](#tuning-proton)[searchnode](#searchnode)[lidspace](#lidspace)[max-bloat-factor](#lidspace-max-bloat-factor)[requestthreads](#requestthreads)[search](#requestthreads-search)[persearch](#requestthreads-persearch)[summary](#requestthreads-summary)[flushstrategy](#flushstrategy)[native](#flushstrategy-native)[total](#flushstrategy-native-total)[maxmemorygain](#flushstrategy-native-total-maxmemorygain)[diskbloatfactor](#flushstrategy-native-total-diskbloatfactor)[component](#flushstrategy-native-component)[maxmemorygain](#flushstrategy-native-component-maxmemorygain)[diskbloatfactor](#flushstrategy-native-component-diskbloatfactor)[maxage](#flushstrategy-native-component-maxage)[transactionlog](#flushstrategy-native-transactionlog)[maxsize](#flushstrategy-native-transactionlog-maxsize)[conservative](#flushstrategy-native-conservative)[memory-limit-factor](#flushstrategy-native-conservative-memory-limit-factor)[disk-limit-factor](#flushstrategy-native-conservative-disk-limit-factor)[initialize](#initialize)[threads](#initialize-threads)[feeding](#feeding)[concurrency](#feeding-concurrency)[niceness](#feeding-niceness)[index](#index)[io](#index-io)[search](#index-io-search)[warmup](#index-warmup)[time](#index-warmup-time)[unpack](#index-warmup-unpack)[removed-db](#removed-db)[prune](#removed-db-prune)[age](#removed-db-prune-age)[interval](#removed-db-prune-interval)[summary](#summary)[io](#summary-io)[read](#summary-io-read)[store](#summary-store)[cache](#summary-store-cache)[maxsize](#summary-store-cache-maxsize)[maxsize-percent](#summary-store-cache-maxsize-percent)[compression](#summary-store-cache-compression)[type](#summary-store-cache-compression-type)[level](#summary-store-cache-compression-level)[logstore](#summary-store-logstore)[maxfilesize](#summary-store-logstore-maxfilesize)[chunk](#summary-store-logstore-chunk)[maxsize](#summary-store-logstore-chunk-maxsize)[compression](#summary-store-logstore-chunk-compression)[type](#summary-store-logstore-chunk-compression-type)[level](#summary-store-logstore-chunk-compression-level)[sync-transactionlog](#sync-transactionlog)[flush-on-shutdown](#flush-on-shutdown)[resource-limits](#resource-limits-proton)[disk](#disk)[memory](#memory)[search](#search)[query-timeout](#query-timeout)[visibility-delay](#visibility-delay)[coverage](#coverage)[minimum](#minimum)[min-wait-after-coverage-factor](#min-wait-after-coverage-factor)[max-wait-after-coverage-factor](#max-wait-after-coverage-factor)[tuning](#tuning)[bucket-splitting](#bucket-
splitting)[min-node-ratio-per-group](#min-node-ratio-per-group)[distribution](#distribution_type)[maintenance](#maintenance)[max-document-size](#max-document-size)[merges](#merges)[persistence-threads](#persistence-threads)[resource-limits](#resource-limits)[visitors](#visitors)[max-concurrent](#max-concurrent)[dispatch](#dispatch-tuning)[max-hits-per-partition](#max-hits-per-partition)[dispatch-policy](#dispatch-policy)[prioritize-availability](#prioritize-availability)[min-active-docs-coverage](#min-active-docs-coverage)[top-k-probability](#top-k-probability)[cluster-controller](#cluster-controller)[init-progress-time](#init-progress-time)[transition-time](#transition-time)[max-premature-crashes](#max-premature-crashes)[stable-state-period](#stable-state-period)[min-distributor-up-ratio](#min-distributor-up-ratio)[min-storage-up-ratio](#min-storage-up-ratio)[groups-allowed-down-ratio](#groups-allowed-down-ratio) ``` ## content The root element of a Content cluster definition. Creates a content cluster. A content cluster stores and/or indexes documents. The xml file may have zero or more such tags. Contained in [services](services.html). | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | version | required | number | | 1.0 in this version of Vespa | | id | required for multiple clusters | string | | Name of the content cluster. If none is supplied, the cluster name will be `content`. Cluster names must be unique within the application, if multiple clusters are configured, the name must be set for all but one at minimum. **Note:** Renaming a cluster is the same as dropping the current cluster and adding a new one. This makes data unavailable or lost, depending on hosting model. Deploying with a changed cluster id will therefore fail with a validation override requirement: `Content cluster 'music' is removed. This will cause loss of all data in this cluster. To allow this add content-cluster-removal to validation-overrides.xml, see https://docs.vespa.ai/en/reference/validation-overrides.html`. | Subelements: - [documents](#documents) (required) - [min-redundancy](#min-redundancy) - [redundancy](#redundancy) - [coverage-policy](#coverage-policy) - [nodes](services.html#nodes) - [group](#group) - [engine](#engine) - [search](#search) - [tuning](#tuning) ## documents Contained in [content](#content). Defines which document types should be routed to this content cluster using the default route, and what documents should be kept in the cluster if the garbage collector runs. Read more on [expiring documents](../../../schemas/documents.html#document-expiry). Also have some backend specific configuration for whether documents should be searchable or not. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | selection | optional | string | | A [document selection](../../writing/document-selector-language.html), restricting documents that are routed to this cluster. Defaults to a selection expression matching everything. This selection can be specified to match document identifier specifics that are _independent_ of document types. For restrictions that apply only to a _specific_ document type, this must be done within that particular document type's [document](#document) element. Trying to use document type references in this selection makes an error during deployment. 
The selection given here will be merged with per-document type selections specified within document tags, if any, meaning that any document in the cluster must match _both_ selections to be accepted and kept. This feature is primarily used to [expire documents](../../../schemas/documents.html#document-expiry). | | garbage-collection | optional | true / false | false | If true, regularly verify the documents stored in the cluster to see if they belong in the cluster, and delete them if not. If false, garbage collection is not run. | | garbage-collection-interval | optional | integer | 3600 | Time (in seconds) between garbage collection cycles. Note that the deletion of documents is spread over this interval, so more resources will be used for deleting a set of documents with a small interval than with a larger interval. | Subelements: - [document](#document) (required) - [document-processing](#document-processing) (optional) ## document Contained in [documents](#documents). The document type to be routed to this content cluster. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | type | required | string | | [Document type name](../../schemas/schemas.html#document) | | mode | required | index / store-only / streaming | | The mode of storing and indexing. Refer to [streaming search](../../../performance/streaming-search.html) for _store-only_, as documents are stored the same way for both cases. Changing mode requires an _indexing-mode-change_[validation override](../validation-overrides.html), and documents must be re-fed. | | selection | optional | string | | A [document selection](../../writing/document-selector-language.html), restricting documents that are routed to this cluster. Defaults to a selection expression matching everything. This selection must apply to fields in _this document type only_. Selection will be merged together with selection for other types and global selection from [documents](#documents) to form a full expression for what documents belong to this cluster. | | global | optional | true / false | false | Set to _true_ to distribute all documents of this type to all nodes in the content cluster it is defined. Fields in global documents can be imported into documents to implement joins - read more in [parent/child](../../../schemas/parent-child.html). Vespa will detect when a new (or outdated) node is added to the cluster and prevent it from taking part in searches until it has received all global documents. Changing from _false_ to _true_ or vice versa requires a _global-document-change_[validation override](../validation-overrides.html). First, [stop services](/en/operations/self-managed/admin-procedures.html#vespa-start-stop-restart) on all content nodes. Then, deploy with the validation override. Finally, [start services](/en/operations/self-managed/admin-procedures.html#vespa-start-stop-restart) on all content nodes. Note: _global_ is only supported for _mode="index"_. | ## document-processing Contained in [documents](#documents). Vespa Search specific configuration for which document processing cluster and chain to run index preprocessing. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | cluster | optional | string | Container cluster on content node | Name of a [document-processing](docproc.html) container cluster that does index preprocessing. Use cluster to specify an alternative cluster, other than the default cluster on content nodes. 
| | chain | optional | string | `indexing` chain | A document processing chain in the container cluster specified by _cluster_ to use for index preprocessing. The chain must inherit the `indexing` chain. | Example - the container cluster enables [document-processing](docproc.html), referred to by the content cluster: ``` ``` ``` ``` To add document processors either before or after the indexer, declare a chain (inherit _indexing_) in a _document-processing_ container cluster and add document processors. Annotate document processors with `before=indexingStart` or `after=indexingEnd`. Configure this cluster and chain as the indexing chain in the content cluster - example: ``` ``` indexingStart indexingEnd ``` ``` **Important:** Note the [document-api](container.html#document-api) configuration. Set up this API on the same nodes as `document-processing` - find details in [indexing](../../../writing/indexing.html). ## min-redundancy Contained in [content](#content). The minimum total data copies the cluster will maintain. This can be set instead of (or in addition to) redundancy to ensure that a minimum number of copies are always maintained regardless of other configuration. `min-redundancy` can be changed without node restart - replicas will be added or removed automatically. ### min-redundancy and groups A group will always have minimum one copy of each document in the cluster. This is also the most commonly used configuration; Increase replica level with more groups to improve query capacity. - Example 1: If _min-redundancy_ is 2 and there is 1 content group, there will be 2 data copies in the group (2 copies for the cluster). If the number of groups is changed to 2 there will be 1 data copy in each group (still 2 copies for the cluster). - Example 2: A cluster is configured to [autoscale](../../../operations/autoscaling.html) using `groups="[2,3]"`. Here, configure min-redundancy to 2, as each group will have 1 replica irrespective of number of groups, here 2 or 3 - see [replicas](../../../content/elasticity.html#replicas). Setting the lower bound ensures correct replica level for 2 groups. For self-managed Vespa: Read more about the actual number of replicas when using [groups](#group) in [topology change](/en/content/elasticity.html#changing-topology). ## redundancy Contained in [content](#content). **Note:** Use [min-redundancy](#min-redundancy) instead of `redundancy`. Vespa Cloud: The number of data copies _per group_. Self-managed: The total data copies the cluster will maintain to avoid data loss. Example: with a redundancy of 2, the system tolerates 1 node failure before data becomes unavailable (until the system has managed to create new replicas on other online nodes). Redundancy can be changed without node restart - replicas will be added or removed automatically. ## coverage-policy Contained in [content](#content). Specifies the coverage policy for the content cluster. Valid values are `group` or `node`. The default value is `group`. If the policy is `group` coverage is maintained per group, meaning that when doing maintenance, upgrades etc. one group is allowed to be down at a time. If there is only one group in the cluster, coverage will be the same as policy `node`. If the policy is `node` coverage is maintained on a node level, meaning that when doing maintenance, upgrades etc. coverage will be maintained on a node level, so in practice 1 node in the whole cluster is allowed to be down at a time. 
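As a sketch, switching a cluster to per-node coverage maintenance is done by setting the policy to `node` (the surrounding cluster definition is omitted; `music` is a hypothetical cluster id):

```
<content id="music" version="1.0">
    <coverage-policy>node</coverage-policy>
    <!-- documents, redundancy and nodes omitted -->
</content>
```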
When having several groups the common reason for changing policy away from the default `group` policy is when the load added to the remaining groups will increase too much when a whole group is allowed to go down. In that case it will be better to use the `node` policy, as taking one node at a time will give just a minor increase in load. ## node Contained in [nodes](services.html#nodes) or [group](#group). Configures a content node to the cluster, see [node](services.html#node)in the general services.xml documentation. Additional node attributes for content nodes: | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | distribution-key | required | integer | | The unique data distribution id of this node. This **must** remain unchanged for the host's lifetime. Distribution keys of a fresh system should be contiguous and start from zero. Distribution keys are used to identify nodes and groups for the [distribution algorithm](../../../content/idealstate.html). If a node changes distribution key, the distribution algorithm regards it as a new node, so buckets are redistributed. | | capacity | optional | double | 1 | **Deprecated:** Capacity of this node, relative to other nodes. A node with capacity 2 will get double the data and feed requests of a node with capacity 1. This feature is deprecated and expert mode only. Don't use in production, Vespa assumes homogenous cluster capacity. | | baseport | optional | integer | | baseport The first port in the port range allocated by this node. | ## group Contained in [content](#content) or[group](#group) - groups can be nested. Defines the [hierarchical structure](../../../content/elasticity.html#grouped-distribution) of the cluster. Can not be used in conjunction with the [nodes](services.html#nodes) element. Groups can contain other groups or nodes, but not both. There can only be a single level of leaf groups under the top group. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | distribution-key | required | integer | | Sets the distribution key of a group. It is not allowed to change this for a given group. Group distribution keys only need to be unique among groups that share the same parent group. | | name | required | string | | The name of the group, used for access from status pages and the like. | **Important:** There is no deployment-time verification that the distribution key remains unchanged for any given node or group. Consequently, take great care when modifying the set of nodes in a content cluster. Assigning a new distribution key to an existing node is undefined behavior; Best case, the existing data will be temporarily unavailable until the error has been corrected. Worst case, risk crashes or data loss. See [Vespa Serving Scaling Guide](../../../performance/sizing-search.html)for when to consider using grouped distribution and [Examples](../../../performance/sizing-examples.html) for example deployments using flat and grouped distribution. ## distribution (in group) Contained in [group](#group). Defines the data distribution to subgroups of this group._distribution_ should not be in the lowest level group containing storage nodes, as here the ideal state algorithm is used directly. In higher level groups, _distribution_ is mandatory. 
| Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | partitions | required if there are subgroups in the group | string | | String conforming to the partition specification: | Partition specification | Description | | --- | --- | | \* | Distribute all copies over 1 of N groups | | 1|\* | Distribute all copies over 2 of N groups | | 1|1|\* | Distribute all copies over 3 of N groups | | The partition specification is used to evenly distribute content copies across groups. Set a number or `*` per group separated by pipes (e.g. `1|*` for two groups). See [sample deployment configurations](../../../performance/sizing-examples.html). ## engine Contained in [content](#content). Specify the content engine to use, and/or adjust tuning parameters for the engine. Allowed engines are `proton` and `dummy`, the latter being used for debugging purposes. If no engine is given, proton is used. Sub-element: [proton](#proton). ## proton Contained in [engine](#engine). If specified, the content cluster will use the Proton content engine. This engine supports storage, indexed search and secondary indices. Optional sub-elements are [searchable-copies](#searchable-copies),[tuning](#tuning-proton),[sync-transactionlog](#sync-transactionlog),[flush-on-shutdown](#flush-on-shutdown), and[resource-limits (in proton)](#resource-limits-proton). ## searchable-copies Contained in [proton](#proton). Default value is 2, or [redundancy](#redundancy), if lower. If set to less than redundancy, only some of the stored copies are ready for searching at any time. This means that node failures causes temporary data unavailability while the alternate copies are being indexed for search. The benefit is using less memory, trading off availability during transitions. Refer to [bucket move](../../../content/proton.html#bucket-move). If updating documents or using [document selection](#documents) for garbage collection, consider setting [fast-access](../../schemas/schemas.html#attribute)on the subset of attribute fields used for this to make sure that these attributes are always kept in memory for fast access. Note that this is only useful if `searchable-copies` is less than `redundancy`. Read more in [proton](../../../content/proton.html). `searchable-copies` can be changed without node restart. Note that when reducing `searchable-copies` resource usage will not be reduced until content nodes are restarted. ## tuning Contained in [proton](#proton), optional. Tune settings for the search nodes in a content cluster - sub-element: | Element | Required | Quantity | | --- | --- | --- | | [searchnode](#searchnode) | No | Zero or one | ## searchnode Contained in [tuning](#tuning-proton), optional. Tune settings for search nodes in a content cluster - sub-elements: | Element | Required | Quantity | | --- | --- | --- | | [lidspace](#lidspace) | No | Zero or one | | | | | [requestthreads](#requestthreads) | No | Zero or one | | [flushstrategy](#flushstrategy) | No | Zero or one | | [initialize](#initialize) | No | Zero or one | | [feeding](#feeding) | No | Zero or one | | [index](#index) | No | Zero or one | | [summary](#summary) | No | Zero or one | ``` ``` ``` ``` ## requestthreads Contained in [searchnode](#searchnode), optional. Tune the number of request threads used on a content node, see [thread-configuration](../../../performance/sizing-search.html#thread-configuration) for details. 
Sub-elements: | Element | Required | Default | Description | | --- | --- | --- | --- | | search | Optional | 64 | Number of search threads. | | persearch | Optional | 1 | Number of search threads. Number of search threads used per search, see the [Vespa serving scaling guide](../../../performance/sizing-search.html) for an introduction of using multiple threads per search per node to reduce query latency. Number of threads per search can be adjusted down per _rank-profile_ using [num-threads-per-search](../../schemas/schemas.html#num-threads-per-search). | | summary | Optional | 16 | Number of summary threads. | ``` ``` 64 1 16 ``` ``` ## flushstrategy Contained in [searchnode](#searchnode), optional. Tune the _native_-strategy for flushing components to disk - a smaller number means more frequent flush: - _Memory gain_ is how much memory can be freed by flushing a component - _Disk gain_ is how much disk space can be freed by flushing a component (typically by using compaction) Refer to [Proton maintenance jobs](../../../content/proton.html#proton-maintenance-jobs). Optional sub-elements: - `native`: - `total` - `maxmemorygain`: The total maximum memory gain (in bytes) for _all_ components before running flush, default 4294967296 (4 GB) - `diskbloatfactor`: Trigger flush if the total disk gain (in bytes) for _all_ components is larger than the factor times current total disk usage, default 0.25 - `component` - `maxmemorygain`: The maximum memory gain (in bytes) by _a single_ component before running flush, default 1073741824 (1 GB) - `diskbloatfactor`: Trigger flush if the disk gain (in bytes) by _a single_ component is larger than the given factor times the current disk usage by that component, default 0.25 - `maxage`: The maximum age (in seconds) of unflushed content for a single component before running flush, default 111600 (31h) - `transactionlog` - `maxsize`: The total maximum size (in bytes) of [transaction logs](../../../content/proton.html#transaction-log) for all document types before running flush, default 21474836480 (20 GB) - `conservative` - `memory-limit-factor`: When [resource-limits (in proton)](#resource-limits-proton) for memory is reached, flush more often by downscaling `total.maxmemorygain` and `component.maxmemorygain`, default 0.5 - `disk-limit-factor`: When [resource-limits (in proton)](#resource-limits-proton) for disk is reached, flush more often by downscaling `transactionlog.maxsize`, default 0.5 ``` ``` 4294967296 0.2 1073741824 0.2 111600 21474836480 0.5 0.5 ``` ``` ## initialize Contained in [searchnode](#searchnode), optional. Tune settings related to how the search node (proton) is initialized. Optional sub-elements: - `threads`: The number of initializer threads used for loading structures from disk at proton startup. The threads are shared between document databases when the value is larger than 0. Default value is the number of document databases + 1. - When set to larger than 1, document databases are initialized in parallel - When set to 1, document databases are initialized in sequence - When set to 0, 1 separate thread is used per document database, and they are initialized in parallel. ``` ``` 2 ``` ``` ## lidspace Contained in [searchnode](#searchnode), optional. Tune settings related to how lidspace is managed. Optional sub-elements: - `max-bloat-factor`: Maximum bloat allowed before lidspace compaction is started. Compaction is moving a document from a high lid to a lower lid. Cost is similar to feeding a document and removing it. 
Also see description in [lidspace compaction maintenance job](../../../content/proton.html#lid-space-compaction). Default value is 0.01 or 1% of total lidspace. Will be increased to target of 0.50 or 50%. ``` ``` 0.5 ``` ``` ## feeding Contained in [searchnode](#searchnode), optional. Tune [proton](../../../content/proton.html) settings for feed operations. Optional sub-elements: - `concurrency`: A number between 0.0 and 1.0 that specifies the concurrency when handling feed operations, default 0.5. When set to 1.0, all cores on the cpu can be used for feeding. Changing this value requires restart of node to take effect. - `niceness`: A number between 0.0 and 1.0 that specifies the niceness of the feeding threads, default 0.0 =\> not any nicer than anyone else. Increasing this number will reduce priority of feeding compared to search. The real world effect is hard to predict as the magic exists in the OS level scheduler. Changing this value requires restart of node to take effect. ``` ``` 0.8 0.5 ``` ``` ## index Contained in [searchnode](#searchnode), optional. Tune various aspect with the handling of disk and memory indexes. Optional sub-elements: - `io` - `search`: Controls io read options used during search, values={mmap,populate}, default `mmap`. Using `populate` will eagerly touch all pages when index is loaded (after re-start or after index fusion is complete). - `warmup` - `time`: Specifies in seconds how long the index shall be warmed up before being switched in for serving. During warmup, it will receive queries and posting lists will be iterated, but results ignored as they are duplicates of the live index. This will pull in the most important ones in the cache. However, as warming up an index will occupy more memory, do not turn it on unless you suspect you need it. And always benchmark to see if it is worth it. - `unpack`: Controls whether all posting features are pulled in to the cache, or only the most important. values={true, false}, default false. ``` ``` mmap true ``` ``` ## removed-db Contained in [searchnode](#searchnode), optional. Tune various aspect of the db of removed documents. Optional sub-elements: - `prune` - `age`: Specifies how long (in seconds) we must remember removed documents before we can prune them away. Default is 2 weeks. This sets the upper limit on how long a node can be down and still be accepted back in the system, without having the index wiped. There is no point in having this any higher than the age of the documents. If corpus is re-fed every day, there is no point in having this longer than 24 hours. - `interval`: Specifies how often (in seconds) to prune old documents. Default is 3.36 hours (prune age / 100). No need to change default. Exposed here for reference and for testing. ``` ``` 86400 ``` ``` ## summary Contained in [searchnode](#searchnode), optional. Tune various aspect with the handling of document summary. Optional sub-elements: - `io` - `read`: Controls io read options used during reading of stored documents. Values are `directio` `mmap` `populate`. Default is `mmap`. `populate` will do an eager mmap and touch all pages. - `store` - `cache`: Used to tune the cache used by the document store. Enabled by default, using up to 5% of available memory. - `maxsize`: The maximum size of the cache in bytes. If set, it takes precedence over [maxsize-percent](#summary-store-cache-maxsize-percent). Default is unset. - `maxsize-percent`: The maximum size of the cache in percent of available memory. Default is 5%. 
- `compression` - `type`: The compression type of the documents while in the cache. Possible values are , `none` `lz4` `zstd`. Default is `lz4` - `level`: The compression level of the documents while in cache. Default is 6 - `logstore`: Used to tune the actual document store implementation (log-based). - `maxfilesize`: The maximum size (in bytes) per summary file on disk. Default value is 1GB. [document-store-compaction](../../../content/proton.html#document-store-compaction) - `chunk` - `maxsize`: Maximum size (in bytes) of a chunk. Default value is 64KB. - `compression` - `type`: Compression type for the documents, `none` `lz4` `zstd`. Default is `zstd`. - `level`: Compression level for the documents. Default is 3. ``` ``` directio 5 none 16384 zstd 3 ``` ``` ## flush-on-shutdown Contained in [proton](#proton). Default value is true. If set to true, search nodes will flush a set of components (e.g. memory index, attributes) to disk before shutting down such that the time it takes to flush these components plus the time it takes to replay the [transaction log](../../../content/proton.html#transaction-log)after restart is as low as possible. The time it takes to replay the transaction log depends on the amount of data to replay, so by flushing, some components before restart the transaction log will be pruned, and we reduce the replay time significantly. Refer to [Proton maintenance jobs](../../../content/proton.html#proton-maintenance-jobs). ## sync-transactionlog Contained in [proton](#proton). Default value is true. If true, the transactionlog is synced to disk after every write. This enables the transactionlog to survive power failures and kernel panic. The sync cost is amortized over multiple feed operations. The faster you feed the more operations it is amortized over. So with a local disk this is not known to be a performance issue. However, if using NAS (Network Attached Storage) like EBS on AWS one can see significant feed performance impact. For one particular case, turning off sync-transactionlog for EBS gave a 60x improvement. With sync-transactionlog turned off, the risk of losing data depends on the kernel's [sysctl settings.](https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#dirty-background-bytes) For example, this is a common default: ``` # sysctl -a ... vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500 ... ``` With this configuration, the worse case scenario is to lose 35 seconds worth of transactionlog, but no more than 1/20 of the free memory. Because kernel flusher threads wake up every 5s (dirty\_writeback\_centisecs) and write data older than 30s (dirty\_expire\_centisecs) from memory to disk. But if un-synced data exceeds 1/20 of the free memory, the Vespa process will sync it (dirty\_ratio). The above also assumes that all copies of the data are lost at the same time **and** that kernels on all these nodes flush at the same time: realistic scenario only with one copy. Adjust these [sysctl settings](https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#dirty-background-bytes) to manage the trade-off between data loss and performance. You'll see more in those kernel docs: for example, thresholds can be expressed in bytes. ## resource-limits (in proton) Contained in [proton](#proton). Specifies resource limits used by proton to reject both external and internal write operations (on this content node) when a limit is reached. **Warning:** These proton limits should almost never be changed directly. 
Instead, change [resource-limits](#resource-limits), which controls when external write operations are blocked in the entire content cluster. Be aware of the risks of tuning resource limits, as described in the link. The local proton limits are derived from the cluster limits if not specified, using this formula:

$$L_{proton} = L_{cluster} + \frac{1 - L_{cluster}}{2}$$

For example, the default cluster disk limit of 0.75 gives a proton disk limit of 0.75 + (1 - 0.75)/2 = 0.875.

| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| disk | optional | float [0, 1] | 0.875 | Fraction of total space on the disk partition used before put and update operations are rejected |
| memory | optional | float [0, 1] | 0.9 | Fraction of physical memory that can be resident memory in anonymous mapping by proton before put and update operations are rejected |

Example:

```
<resource-limits>
  <disk>0.83</disk>
  <memory>0.82</memory>
</resource-limits>
```

## search

Contained in [content](#content), optional. Declares search configuration for this content cluster. Optional sub-elements are [query-timeout](#query-timeout), [visibility-delay](#visibility-delay) and [coverage](#coverage).

## query-timeout

Contained in [search](#search). Specifies the query timeout in seconds for queries against the search interface on the content nodes. The default is 0.5 (500 ms), the max is 600.0. For query timeout, also see the request parameter [timeout](../../api/query.html#timeout).

**Note:** One cannot override this value using the [timeout](../../api/query.html#timeout) request parameter.

## visibility-delay

Contained in [search](#search). Default 0, max 1 (seconds). This setting controls the TTL caching for [parent-child](../../../schemas/parent-child.html) imported fields. See [feature tuning](../../../performance/feature-tuning.html#parent-child-and-search-performance).

## coverage

Contained in [search](#search). Declares search coverage configuration for this content cluster. Optional sub-elements are [minimum](#minimum), [min-wait-after-coverage-factor](#min-wait-after-coverage-factor) and [max-wait-after-coverage-factor](#max-wait-after-coverage-factor). Search coverage configuration controls how many nodes the query dispatcher process should wait for, trading search coverage versus search performance.

## minimum

Contained in [coverage](#coverage). Declares the minimum search coverage required before returning the results of a query. This number is in the range `[0, 1]`, with 0 being no coverage and 1 being full coverage. The default is 1; unless configured otherwise, a query will not return until all search nodes have responded, or the timeout has been reached.

## min-wait-after-coverage-factor

Contained in [coverage](#coverage). Declares the minimum time for a query to wait for full coverage once the declared [minimum](#minimum) has been reached. This number is a factor that is multiplied with the time remaining at the time of reaching minimum coverage. The default is 0; unless configured otherwise, a query may return as soon as the minimum coverage has been reached, if the remaining search nodes appear to be lagging.

## max-wait-after-coverage-factor

Contained in [coverage](#coverage). Declares the maximum time for a query to wait for full coverage once the declared [minimum](#minimum) has been reached. This number is a factor that is multiplied with the time remaining at the time of reaching minimum coverage. The default is 1; unless configured otherwise, a query is allowed to wait its full timeout for full coverage even after reaching the minimum.
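The sketch below shows how these elements combine inside the content cluster's `search` element; the values (a 0.6 second timeout, 95% minimum coverage, hypothetical wait factors) are illustrative only and not recommendations:

```
<search>
  <query-timeout>0.6</query-timeout>
  <coverage>
    <minimum>0.95</minimum>
    <min-wait-after-coverage-factor>0.2</min-wait-after-coverage-factor>
    <max-wait-after-coverage-factor>0.3</max-wait-after-coverage-factor>
  </coverage>
</search>
```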
## tuning

Contained in [content](#content), optional. Optional tuning parameters are: [bucket-splitting](#bucket-splitting), [min-node-ratio-per-group](#min-node-ratio-per-group), [cluster-controller](#cluster-controller), [dispatch](#dispatch-tuning), [distribution](#distribution_type), [maintenance](#maintenance), [max-document-size](#max-document-size), [merges](#merges), [persistence-threads](#persistence-threads) and [visitors](#visitors).

## bucket-splitting

Contained in [tuning](#tuning). The [bucket](../../../content/buckets.html) is the fundamental unit of distribution and management in a content cluster. Buckets are auto-split; there is no need to configure this for most applications.

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| max-documents | optional | integer | 1024 | Maximum number of documents per content bucket. Buckets are split in two if they have more documents than this. Keep this value below 16K. |
| max-size | optional | integer | 32MiB | Maximum size (in bytes) of a bucket. This is the sum of the serialized size of all documents kept in the bucket. Buckets are split in two if they have a larger size than this. Keep this value below 100 MiB. |
| minimum-bits | optional | integer | | Override the ideal distribution bit count configured for this cluster. Prefer to use the [distribution type](#distribution_type) setting instead if the default distribution bit count does not fit the cluster. This variable is intended for testing and to work around possible distribution bit issues. Most users should not need this option. |

## min-node-ratio-per-group

**Important:** This is configuration for the cluster controller. Most users are normally looking for [min-active-docs-coverage](#min-active-docs-coverage), which controls how many nodes can be down before query load is routed to other groups.

Contained in [tuning](#tuning). States a lower bound requirement on the ratio of nodes within _individual_ [groups](#group) that must be online and able to accept traffic before the entire group is automatically taken out of service. Groups are automatically brought back into service when the availability of their nodes has been restored to a level equal to or above this limit.

Elastic content clusters are often configured to use multiple groups for the sake of horizontal traffic scaling and/or data availability. The content distribution system will try to ensure a configured number of replicas is always present within a group in order to maintain data redundancy. If the number of available nodes in a group drops too far, it is possible for the remaining nodes in the group to not have sufficient capacity to take over storage and serving for the replicas they now must assume responsibility for. Such situations are likely to result in increased latencies and/or feed rejections caused by resource exhaustion. Setting this tuning parameter allows the system to instead automatically take down the remaining nodes in the group, allowing feed and query traffic to fail completely over to the remaining groups.

The valid value is a decimal in the range [0, 1]. Default is 0, which means that the automatic group out-of-service functionality will _not_ take effect.

Example: assume a cluster has been configured with _n_ groups of 4 nodes each and the following tuning config:

```
<min-node-ratio-per-group>0.75</min-node-ratio-per-group>
```

This tuning allows 1 node in a group to be down. If 2 or more nodes go down, all nodes in the group will be marked as down, letting the _n-1_ remaining groups handle all the traffic.
This configuration can be changed live while the system is running, and altered limits will take effect immediately.

## distribution (in tuning)

Contained in [tuning](#tuning). Tune the distribution algorithm used in the cluster.

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| type | optional | loose / strict / legacy | loose | When the number of nodes configured in a system changes over certain limits, the system will automatically trigger major redistributions of documents. This is to ensure that the number of buckets is appropriate for the number of nodes in the cluster. This enum value specifies how aggressive the system should be in triggering such distribution changes. The default of `loose` strikes a balance between rarely altering the distribution of the cluster and keeping the skew in document distribution low. It is recommended that you use the default mode unless you have empirically observed that it causes too much skew in load or document distribution. Note that specifying `minimum-bits` under [bucket-splitting](#bucket-splitting) overrides this setting and effectively "locks" the distribution in place. |

## maintenance

Contained in [tuning](#tuning). Controls the running time of the bucket maintenance process. Bucket maintenance verifies bucket content for corruption. Most users should not need to tweak this.

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| start | required | HH:MM | | Start of daily maintenance window, e.g. 02:00 |
| stop | required | HH:MM | | End of daily maintenance window, e.g. 05:00 |
| high | required | day of week | | Day of week for starting the full file verification cycle, e.g. monday. The full cycle is more costly than partial file verification |

## max-document-size

Contained in [tuning](#tuning). Specifies the max document size in the content cluster, measured as the uncompressed size of a document operation arriving over the wire at the distributor service. The limit is used for all document types. A document larger than this limit will be rejected by the distributor. Note that some document operations that don't contain the entire document, like [document updates](../../../writing/document-api-guide.html#document-updates), might increase the size of a document above this limit. Valid values are numbers including a unit (e.g. _10MiB_), and the value must be between 1 MiB and 2048 MiB (inclusive). Values are rounded to the nearest MiB, so using MiB as the unit is preferable. It is strongly recommended to make sure this is not set too high: 10 MiB is a reasonable setting for most use cases, and setting it above 100 MiB is not recommended, as allowing large documents might impact operations, e.g. when restarting nodes, moving documents between nodes etc. Default value is 128 MiB. Example:

```
<max-document-size>10MiB</max-document-size>
```

## merges

Contained in [tuning](#tuning). Defines throttling parameters for bucket merge operations.

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| max-per-node | optional | number | | Maximum number of parallel active bucket merge operations. |
| max-queue-size | optional | number | | Maximum size of the merge bucket queue, before reporting BUSY back to the distributors. |

## persistence-threads

Contained in [tuning](#tuning). Defines the number of persistence threads per partition on each content node. A content node executes bucket operations against the persistence engine synchronously in each of these threads. 8 threads are used by default; override with the **count** attribute, as in the sketch below.
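A minimal sketch of overriding the thread count, assuming the element is placed under `tuning` as described above; the value 8 simply restates the default:

```
<persistence-threads count="8"/>
```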
## visitors

Contained in [tuning](#tuning). Tuning parameters for visitor operations. Might contain [max-concurrent](#max-concurrent).

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| thread-count | optional | number | | The maximum number of threads in which to execute visitor operations. A higher number of threads may increase performance, but may use more memory. |
| max-queue-size | optional | number | | Maximum size of the pending visitor queue, before reporting BUSY back to the distributors. |

## max-concurrent

Contained in [visitors](#visitors). Defines how many visitors can be active concurrently on each storage node. The number allowed depends on priority - lower priority visitors should not block higher priority visitors completely. To implement this, specify a fixed and a variable number. The maximum active count is calculated by adjusting the variable component using the priority, and adding the fixed component.

| Attribute | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| fixed | optional | number | [16](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/visiting/stor-visitor.def) | The fixed component of the maximum active count |
| variable | optional | number | [64](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/visiting/stor-visitor.def) | The variable component of the maximum active count |

## resource-limits

Contained in [tuning](#tuning). Specifies resource limits used to decide whether external write operations should be blocked in the entire content cluster, based on the resource usage reported by content nodes. See [feed block](../../../writing/feed-block.html) for more details.

**Warning:** The content nodes require resource headroom to handle extra documents as part of re-distribution during node failure, and spikes when running [maintenance jobs](../../../content/proton.html#proton-maintenance-jobs). Tuning these limits should be done with extreme care, and setting them too high might lead to permanent data loss. They are best left untouched, using the defaults, and cannot be set in Vespa Cloud.

| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| disk | optional | float [0, 1] | 0.75 | Fraction of total space on the disk partition used on a content node before feed is blocked |
| memory | optional | float [0, 1] | 0.8/0.75 | Fraction of physical memory that can be resident memory in anonymous mapping on a content node before feed is blocked. Total physical memory is sampled as the minimum of `sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE)` and the cgroup (v1 or v2) memory limit. Nodes with 8 GiB or less memory in Vespa Cloud have a limit of 0.75. |

Example - in the content tag:

```
<tuning>
  <resource-limits>
    <disk>0.78</disk>
    <memory>0.77</memory>
  </resource-limits>
</tuning>
```
## dispatch

Contained in [tuning](#tuning). Tune the query dispatch behavior - child elements:

| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| max-hits-per-partition | optional | Integer | No capping: Return all | Maximum number of hits to return from a content node. By default, a query returns the requested number of hits + offset from every content node to the container. The container orders the hits globally according to the query, then discards all hits beyond the number requested. In a system with a large fan-out, this consumes network bandwidth, and the container nodes can easily become network saturated. Containers will also sort and discard more hits than optimal. When there are sufficiently many search nodes, assuming an even distribution of the hits, it suffices to return only a fraction of the requested number of hits from each node. Note that changing this number will have a global ordering impact. See _top-k-probability_ below for improving performance with fewer hits. |
| dispatch-policy | optional | best-of-random-2 / adaptive | adaptive | Configures the policy for choosing which group shall receive the next query request. However, multiphase requests that either require or benefit from hitting the same group in all phases are always hashed. Relevant only for [grouped distribution](../../../performance/sizing-search.html#data-distribution): `best-of-random-2` selects 2 random groups and uses the one with the lowest latency; `adaptive` measures latency, preferring lower-latency groups, selecting a group with probability latency/(sum of latency over all groups). |
| prioritize-availability | optional | Boolean | true | With [grouped distribution](../../../performance/sizing-search.html#data-distribution): If true, or by default, all groups that are within min-active-docs-coverage of the **median** of the document count of other groups will be used to service queries. If set to false, only groups within min-active-docs-coverage of the **max** document count will be used, with the consequence that full coverage is prioritized over availability when multiple groups are lacking content, since the remaining groups may not be able to service the full query load. |
| min-active-docs-coverage | optional | A float percentage | 97 | With [grouped distribution](../../../performance/sizing-search.html#data-distribution): The percentage of active documents a group must have, relative to the median across all groups in the content cluster, to be considered active for serving queries. Because of measurement timing differences, it is not advisable to tune this above 99 percent. |
| top-k-probability | optional | Double | 0.9999 | Probability that the top K hits will be the globally best. Based on this probability, the dispatcher will fetch enough hits from each node to achieve it. The only way to guarantee a probability of 1.0 is to fetch K hits from each partition. However, by reducing the probability from 1.0 to 0.99999, one can significantly reduce the number of hits fetched and save both bandwidth and latency. The number of hits to fetch from each partition is computed as $$q = \frac{k}{n} + qT(p, 30) \times \sqrt{k \times \frac{1}{n} \times \left(1 - \frac{1}{n}\right)}$$ where qT is a Student's t-distribution. With n=10 partitions, k=200 hits and p=0.99999, only 45 hits per partition are needed, as opposed to 200 when p=1.0. Use this option to reduce network and container cpu/memory in clusters with many nodes per group - see the [Vespa Serving Scaling Guide](../../../performance/sizing-search.html). |
## cluster-controller

Contained in [tuning](#tuning). Tuning parameters for the cluster controller managing this cluster - child elements:

| Element | Required | Value | Default | Description |
| --- | --- | --- | --- | --- |
| init-progress-time | optional | | | If the initialization progress count has not changed for this number of seconds, the node is assumed to have deadlocked and is set down. Note that initialization may actually be prioritized lower now, so setting a low value here might cause false positives. If a node is set down for the wrong reason, it will be set up again once it finishes initialization. |
| transition-time | optional | | [storage\_transition\_time](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) [distributor\_transition\_time](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The transition time states how long (in seconds) a node will be in maintenance mode during what looks like a controlled restart. Keeping a node in maintenance mode during a restart allows a restart without the cluster trying to create new copies of all the data immediately. If the node has not started or come back up within the transition time, the node is set down, in which case new full bucket copies will be created. Note separate defaults for distributor and storage (i.e. search) nodes. |
| max-premature-crashes | optional | | [max\_premature\_crashes](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The maximum number of crashes allowed before a content node is permanently set down by the cluster controller. If the node has a stable up or down state for more than the _stable-state-period_, the crash count is reset. However, resetting the count will not re-enable the node again if it has been disabled - restart the cluster controller to reset. |
| stable-state-period | optional | | [stable\_state\_time\_period](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | If a content node's state doesn't change for this many seconds, its state is considered _stable_, clearing the premature crash count. |
| min-distributor-up-ratio | optional | | [min\_distributor\_up\_ratio](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The minimum ratio of distributors that are required to be _up_ for the cluster state to be _up_. |
| min-storage-up-ratio | optional | | [min\_storage\_up\_ratio](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | The minimum ratio of content nodes that are required to be _up_ for the cluster state to be _up_. |
| groups-allowed-down-ratio | optional | | [groups-allowed-down-ratio](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) | A ratio for the number of content groups that are allowed to be down simultaneously. A value of 0.5 means that 50% of the groups are allowed to be down. The default is to allow only one group to be down at a time. |
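As a sketch only, these elements are nested under the content cluster's `tuning` element; the values below are hypothetical and merely illustrate the structure of the child elements listed above:

```
<tuning>
  <cluster-controller>
    <transition-time>600</transition-time>
    <max-premature-crashes>4</max-premature-crashes>
    <stable-state-period>7200</stable-state-period>
  </cluster-controller>
</tuning>
```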
---

# Source: https://docs.vespa.ai/en/learn/contributing.html.md

# Contributing to Vespa

Contributions to [Vespa](http://github.com/vespa-engine/vespa) and the [Vespa documentation](http://github.com/vespa-engine/documentation) are welcome. This document tells you what you need to know to contribute.

## Open development

All work on Vespa happens directly on GitHub, using the [GitHub flow model](https://docs.github.com/en/get-started/quickstart/github-flow). We release the master branch a few times a week, and you should expect it to almost always work. In addition to the [builds seen on factory.vespa.ai](https://factory.vespa.ai), we have a large acceptance and performance test suite which is also run continuously.

### Pull requests

All pull requests are reviewed by a member of the Vespa Committers team. You can find a suitable reviewer in the OWNERS file upward in the source tree from where you are making the change (the OWNERS have a special responsibility for ensuring the long-term integrity of a portion of the code). If you want to become a committer/OWNER, making some quality contributions is the way to start. We require all pull request checks to pass.

## Versioning

Vespa uses semantic versioning - see [vespa versions](releases.html). Notice in particular that any Java API in a package having a @PublicAPI annotation in the package-info file cannot be changed in an incompatible way between major versions: Existing types and method signatures must be preserved (but can be marked deprecated).

## Issues

We track issues in [GitHub issues](https://github.com/vespa-engine/vespa/issues). It is fine to submit issues also for feature requests and ideas, whether you intend to work on them or not. There is also a [ToDo list](https://github.com/vespa-engine/vespa/blob/master/TODO.md) for larger things which no one is working on yet.
## Community

If you have questions, want to share your experience or help others, please join our community on the [Vespa Slack](https://slack.vespa.ai), or see Vespa on [Stack Overflow](http://stackoverflow.com/questions/tagged/vespa).

---

# Source: https://docs.vespa.ai/en/operations/self-managed/cpu-support.html.md

# CPU Support

For maximum performance, the current version of Vespa for x86\_64 is compiled only for [Haswell (2013)](https://en.wikipedia.org/wiki/Haswell_(microarchitecture)) or later CPUs. If trying to run on an older CPU, you will likely see error messages like the following:

```
Problem running program /opt/vespa/bin/vespa-runserver
=> died with signal: illegal instruction (you probably have an older CPU than required)
```

or in older versions of Vespa, something like

```
/usr/local/bin/start-container.sh: line 67: 10 Illegal instruction /opt/vespa/bin/vespa-start-configserver
```

If you would like to run Vespa on an older CPU, we provide a [generic x86 container image](https://hub.docker.com/r/vespaengine/vespa-generic-intel-x86_64/). This image is slower, receives less testing than the regular image, and is less frequently updated.

**To start a Vespa Docker container using this image:**

```
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa-generic-intel-x86_64
```

---

# Source: https://docs.vespa.ai/en/ranking/cross-encoders.html.md

# Ranking With Transformer Cross-Encoder Models

[Cross-Encoder Transformer](https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/) based text ranking models are generally more effective than [text embedding](../rag/embedding.html) models, as they take both the query and the document as input with full cross-attention between all the query and document tokens. The downside of cross-encoder models is the computational complexity. This document is a guide on how to export cross-encoder Transformer based models from [huggingface](https://huggingface.co/), and how to configure them for use in Vespa.

## Exporting cross-encoder models

For exporting models from HF to [ONNX](onnx), we recommend the [Optimum](https://huggingface.co/docs/optimum/main/en/index) library. Example usage for two relevant ranking models:

Export [intfloat/simlm-msmarco-reranker](https://huggingface.co/intfloat/simlm-msmarco-reranker), which is a BERT-based transformer model for English texts:

```
$ optimum-cli export onnx --task text-classification -m intfloat/simlm-msmarco-reranker ranker
```

Export [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base), which is a ROBERTA-based transformer model for English and Chinese texts (multilingual):

```
$ optimum-cli export onnx --task text-classification -m BAAI/bge-reranker-base ranker
```

These two example ranking models use different language model [tokenization](../reference/rag/embedding.html#huggingface-tokenizer-embedder) and also different transformer inputs. After the above Optimum export command, you have two important files that are needed for importing the model to Vespa:

```
├── ranker
│   └── model.onnx
└── tokenizer.json
```

The Optimum tool also supports various Transformer optimizations, including quantization to optimize the model for faster inference.
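Since the two models expose different transformer inputs, it can be useful to inspect the exported ONNX file before wiring it into a rank profile. A minimal sketch, assuming the `onnxruntime` Python package is installed and the export path from the commands above:

```
$ python3 -c "import onnxruntime; \
  print([i.name for i in onnxruntime.InferenceSession('ranker/model.onnx').get_inputs()])"
```

A BERT-style export typically lists `input_ids`, `attention_mask` and `token_type_ids`, while a ROBERTA-style export lists only the first two; this determines which of the rank-profile variants below applies.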
## Importing ONNX and tokenizer model files to Vespa

Add the generated `model.onnx` and `tokenizer.json` files from the `ranker` directory created by Optimum to the Vespa [application package](../basics/applications.html):

```
├── models
│   ├── model.onnx
│   └── tokenizer.json
├── schemas
│   └── doc.sd
└── services.xml
```

## Configure tokenizer embedder

To speed up inference, Vespa avoids re-tokenizing the document tokens, so we need to configure the [huggingface-tokenizer-embedder](../reference/rag/embedding.html#huggingface-tokenizer-embedder) in the `services.xml` file:

```
<container id="default" version="1.0">
    ...
    <component id="tokenizer" type="hugging-face-tokenizer">
        <model path="models/tokenizer.json"/>
    </component>
    ...
</container>
```

This allows us to use the tokenizer while indexing documents in Vespa and also at query time to map (embed) query text to language model tokens.

## Using tokenizer in schema

Assuming we have two fields that we want to index and use for re-ranking (title, body), we can use the `embed` indexing expression to invoke the tokenizer configured above:

```
schema my_document {
    document my_document {
        field title type string {..}
        field body type string {..}
    }
    field tokens type tensor(d0[512]) {
        indexing: (input title || "") . " " . (input body || "") | embed tokenizer | attribute
    }
}
```

The above concatenates the title and body input document fields and feeds the result to the `hugging-face-tokenizer` tokenizer, which stores the output token ids as floats (e.g. 101.0). To use the generated `tokens` tensor in ranking, the tensor field must be defined with [attribute](../content/attributes.html).

## Using the cross-encoder model in ranking

Cross-encoder models are not practical for _retrieval_ over large document volumes due to their complexity, so we configure them using [phased ranking](phased-ranking.html).

### Bert-based model

Bert-based models have three inputs:

- input\_ids
- token\_type\_ids
- attention\_mask

The [onnx-model](../reference/schemas/schemas.html#onnx-model) configuration specifies the input names of the model and how to calculate them. It also specifies the file `models/model.onnx`. Notice also the [GPU](../operations/self-managed/vespa-gpu-container.html) setting: GPU inference is not required, and Vespa will fall back to CPU if no GPU device is found. See the section on [performance](#performance).

```
rank-profile bert-ranker inherits default {
    inputs {
        query(q_tokens) tensor(d0[32])
    }
    onnx-model cross_encoder {
        file: models/model.onnx
        input input_ids: my_input_ids
        input attention_mask: my_attention_mask
        input token_type_ids: my_token_type_ids
        gpu-device: 0
    }
    function my_input_ids() {
        expression: tokenInputIds(256, query(q_tokens), attribute(tokens))
    }
    function my_token_type_ids() {
        expression: tokenTypeIds(256, query(q_tokens), attribute(tokens))
    }
    function my_attention_mask() {
        expression: tokenAttentionMask(256, query(q_tokens), attribute(tokens))
    }
    first-phase {
        expression: #depends on the retriever used
    }
    # The output of this model is a tensor of size ["batch", 1]
    global-phase {
        rerank-count: 25
        expression: onnx(cross_encoder){d0:0,d1:0}
    }
}
```

The example above limits the sequence length to `256` using the built-in [convenience functions](../reference/ranking/rank-features.html#tokenInputIds(length,%20input_1,%20input_2,%20...)) for generating token sequence input to Transformer models. Note that `tokenInputIds` uses 101 as start of sequence and 102 as padding. This is only compatible with BERT-based tokenizers. See the section on [performance](#performance) about sequence length and its impact on inference performance.

### Roberta-based model

ROBERTA-based models only have two inputs (input\_ids and attention\_mask).
In addition, the default tokenizer start of sequence token is 1 and end of sequence is 2. In this case we use the `customTokenInputIds` function in the `my_input_ids` function. See [customTokenInputIds](../reference/ranking/rank-features.html#customTokenInputIds(start_sequence_id, sep_sequence_id, length, input_1, input_2, ...)).

```
rank-profile roberta-ranker inherits default {
    inputs {
        query(q_tokens) tensor(d0[32])
    }
    onnx-model cross_encoder {
        file: models/model.onnx
        input input_ids: my_input_ids
        input attention_mask: my_attention_mask
        gpu-device: 0
    }
    function my_input_ids() {
        expression: customTokenInputIds(1, 2, 256, query(q_tokens), attribute(tokens))
    }
    function my_attention_mask() {
        expression: tokenAttentionMask(256, query(q_tokens), attribute(tokens))
    }
    first-phase {
        expression: #depends on the retriever used
    }
    # The output of this model is a tensor of size ["batch", 1]
    global-phase {
        rerank-count: 25
        expression: onnx(cross_encoder){d0:0,d1:0}
    }
}
```

## Using the cross-encoder model at query time

At query time, we need to tokenize the user query using the [embed](../rag/embedding.html#embedding-a-query-text) support. The `embed` of the query text sets the `query(q_tokens)` tensor that we defined in the rank profile.

```
{
    "yql": "select title,body from doc where userQuery()",
    "query": "semantic search",
    "input.query(q_tokens)": "embed(tokenizer, \"semantic search\")",
    "ranking": "bert-ranker"
}
```

The retriever (query + first-phase ranking) can be anything, including [nearest neighbor search](../querying/nearest-neighbor-search), a.k.a. dense retrieval using bi-encoders.

## Performance

There are three major scaling dimensions:

- The number of hits that are re-ranked, [rerank-count](../reference/schemas/schemas.html#globalphase-rerank-count). Complexity is linear with the number of hits that are re-ranked.
- The size of the transformer model used.
- The sequence input length. Transformer models scale quadratically with the input sequence length.

For models larger than 30-40M parameters, we recommend using GPU to accelerate inference. Quantization of model weights can drastically improve serving efficiency on CPU. See [Optimum Quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization).

## Examples

The [MS Marco](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking) sample application demonstrates using cross-encoders.

## Using cross-encoders with multi-vector indexing

When using [multi-vector indexing](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/), we can do the following to feed the best (closest) paragraph, using the [closest()](../reference/ranking/rank-features.html#closest(name)) feature, into re-ranking with the cross-encoder model:

```
schema my_document {
    document my_document {
        field paragraphs type array<string> {..}
    }
    field tokens type tensor(p{}, d0[512]) {
        indexing: input paragraphs | embed tokenizer | attribute
    }
    field embedding type tensor(p{}, x[768]) {
        indexing: input paragraphs | embed embedder | attribute
    }
}
```

Notice that both the `tokens` and the `embedding` fields use the same mapped tensor dimension name `p`.
```
rank-profile max-paragraph-into-cross-encoder inherits default {
    inputs {
        query(tokens) tensor(d0[32])
        query(q) tensor(x[768])
    }
    first-phase {
        expression: closeness(field, embedding)
    }
    function best_input() {
        expression: reduce(closest(embedding)*attribute(tokens), max, p)
    }
    function my_input_ids() {
        expression: tokenInputIds(256, query(tokens), best_input)
    }
    function my_token_type_ids() {
        expression: tokenTypeIds(256, query(tokens), best_input)
    }
    function my_attention_mask() {
        expression: tokenAttentionMask(256, query(tokens), best_input)
    }
    match-features: best_input my_input_ids my_token_type_ids my_attention_mask
    global-phase {
        rerank-count: 25
        expression: onnx(cross_encoder){d0:0,d1:0} #Slice
    }
}
```

The `best_input` function uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor, which returns the tokens of the best-matching (closest) paragraph. This tensor is used as the document tokens in the other Transformer-related functions (`tokenInputIds`, `tokenTypeIds`, `tokenAttentionMask`).

---

# Source: https://docs.vespa.ai/en/operations/kubernetes/custom-overrides-podtemplate.html.md

# Provide Custom Overrides

While services.xml defines the Vespa application specification, it abstracts away the underlying Kubernetes infrastructure. Advanced users often need to configure Kubernetes-specific settings for the Vespa application Pods to integrate Vespa within their broader platform ecosystem. The Pod Template mechanism allows you to inject custom configurations into the Vespa application pods created by the ConfigServer.

Common use cases for overriding the default pod configuration include:

- **Sidecar Injection**: Running auxiliary containers alongside Vespa for logging (e.g., Fluent Bit), monitoring (e.g., Datadog, Prometheus exporters), or service mesh proxies (e.g., Envoy, Istio).
- **Scheduling Constraints**: Using nodeSelector, affinity, or tolerations to pin Vespa pods to specific hardware (e.g., high-memory nodes, specific availability zones) or isolate them from other workloads.
- **Metadata Management**: Adding custom Labels or Annotations for cost allocation, team ownership, or integration with external inventory tools.
- **Security & Config**: Mounting Kubernetes Secrets or ConfigMaps that contain credentials or environment configurations required by custom sidecars.

## Configure Custom Overrides

Overrides are defined in the `VespaSet` Custom Resource under `spec.application.podTemplate` and `spec.configServer.podTemplate`. This field accepts a standard Kubernetes PodTemplateSpec. The Operator and ConfigServer treat this template as an overlay. When creating a ConfigServer or Application Pod, the base template of the main `vespa` container is merged with your custom overlay.
Vespa on Kubernetes enforces an `Add-Only` merge strategy. One cannot remove or downgrade core `vespa` container settings, but only augment them.

| Category | Allowed Actions | Restricted Actions |
| --- | --- | --- |
| **Containers** | Add new sidecar containers. Add env vars/mounts to the main container. | Cannot change the main container image, command, or args. Cannot override main container CPU/Memory resources (these are locked to `services.xml`). |
| **Volumes** | Add new Volumes (ConfigMap, Secret, EmptyDir). | Cannot modify operator-reserved volumes (e.g., `/data`). |
| **Metadata** | Add new Labels and Annotations. | Cannot overwrite operator-created labels and annotations. |

## Examples

### Example 1: Injecting a Logging Sidecar

This example adds a Fluent Bit sidecar to ship logs to a central system. It defines the sidecar container and mounts a shared volume that the Vespa container also writes to.

```
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
  name: my-vespa-cluster
spec:
  application:
    image: vespaengine/vespa:8.200.15
    # Define the Custom Overlay
    podTemplate:
      spec:
        containers:
          # 1. Define the Sidecar
          - name: fluent-bit
            image: fluent/fluent-bit:1.9
            volumeMounts:
              - name: vespa-logs
                mountPath: /opt/vespa/logs/vespa
        # 2. Define the Shared Volume
        volumes:
          - name: vespa-logs
            emptyDir: {}
```

### Example 2: Pinning Pods to Specific Nodes

This example uses a nodeSelector to ensure Vespa pods only run on nodes labeled with workload=high-performance.

```
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
  name: prod-vespa
spec:
  application:
    podTemplate:
      spec:
        # Schedule only on nodes with label 'workload: high-performance'
        nodeSelector:
          workload: high-performance
        # Tolerate the 'dedicated' taint if those nodes are tainted
        tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "search-team"
            effect: "NoSchedule"
```

### Example 3: Adding Cost Allocation Labels

This example adds custom labels that will appear on every tenant pod, enabling cost tracking by team.

```
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
  name: shared-vespa
spec:
  application:
    podTemplate:
      metadata:
        labels:
          cost-center: "engineering-search"
          owner: "team-alpha"
        annotations:
          # Example annotation for an external monitoring system
          monitoring.datadoghq.com/enabled: "true"
```

---

# Source: https://docs.vespa.ai/en/operations/data-management.html.md

# Data management and backup

This guide documents how to export data from a Vespa cloud application and how to do mass updates or removals. See [cloning applications and data](cloning) for how to copy documents from one application to another. Prerequisite: Use the latest version of the [vespa](../clients/vespa-cli.html) command-line client.

## Export documents

To export documents, configure the application to export from, then select zone, container cluster and schema - example:

```
$ vespa config set application vespa-team.vespacloud-docsearch.default
$ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc | head
```

Some of the parameters above are redundant if unambiguous. Here, the application is set up using a template found in [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) with multiple container clusters.
This example [visits](../../writing/visiting.html) documents from the `doc` schema. Use a [fieldset](../../schemas/documents.html#fieldsets) to export document IDs only:

```
$ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc --field-set '[id]' | head
```

As the name implies, fieldsets are useful to select a subset of fields to export. Note that this normally does not speed up the exporting process, as the same amount of data is read from the index. The data transfer out of the Vespa application is smaller with fewer fields.

## Backup

Use the _visit_ operations above to extract documents for backup. To back up documents to your own Google Cloud Storage, see [backup](https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions#backup---experimental) for a Google Cloud Function example.

## Feed

If a document feed is generated with `vespa visit` (above), it is already in [JSON Lines](https://jsonlines.org/) feed-ready format by default:

```
$ vespa visit | vespa feed - -t $ENDPOINT
```

Find more examples in [cloning applications and data](cloning). A document export generated using [/document/v1](../../writing/document-v1-api-guide.html) is slightly different from the .jsonl output from `vespa visit` (e.g., fields like a continuation token are added). Extract the `document` objects with [jq](https://stedolan.github.io/jq/) before feeding:

```
$ gunzip -c docs.gz | jq '.documents[]' | \
  vespa feed - -t $ENDPOINT
```

## Delete

To remove all documents in a Vespa deployment, or a selection of them, run a _deletion visit_. Use the `DELETE` HTTP method, and fetch only the continuation token from the response:

```
#!/bin/bash
set -x

# The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/'
ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"

NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation

# doc.path =~ "^/old/" -- all documents under the /old/ directory:
SELECTION='doc.path%3D~%22%5E%2Fold%2F%22'

continuation=""

while token=$( curl -X DELETE -s \
  --cert data-plane-public-cert.pem \
  --key data-plane-private-key.pem \
  "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&${continuation}" \
  | tee >( jq . > /dev/tty ) | jq -re .continuation )
do
  continuation="continuation=${token}"
done
```

Each request will return a response after roughly one minute; change this by specifying _timeChunk_ (default 60).

To purge all documents in a document export (above), generate a feed with `remove`-entries for each document ID, like:

```
$ gunzip -c docs.gz | jq '[.documents[] | {remove: .id} ]' | head
[
  {
    "remove": "id:open:doc::open/documentation/schemas.html"
  },
  {
    "remove": "id:open:doc::open/documentation/securing-your-vespa-installation.html"
  },
```

Complete example for a single chunk:

```
$ gunzip -c docs.gz | jq '[.documents[] | {remove: .id} ]' | \
  vespa feed - -t $ENDPOINT
```

## Update

To update all documents in a Vespa deployment, or a selection of them, run an _update visit_.
Use the `PUT` HTTP method, and specify a partial update in the request body:

```
#!/bin/bash
set -x

# The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/'
ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"

NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation

# doc.inlinks == "some-url" -- the weightedset inlinks has the key "some-url"
SELECTION='doc.inlinks%3D%3D%22some-url%22'

continuation=""

while token=$( curl -X PUT -s \
  --cert data-plane-public-cert.pem \
  --key data-plane-private-key.pem \
  --data '{ "fields": { "inlinks": { "remove": { "some-url": 0 } } } }' \
  "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&${continuation}" \
  | tee >( jq . > /dev/tty ) | jq -re .continuation )
do
  continuation="continuation=${token}"
done
```

Each request will return a response after roughly one minute; change this by specifying _timeChunk_ (default 60).

## Using /document/v1/ API

To get started with a document export, find the _namespace_ and _document type_ by listing a few IDs. Hit the [/document/v1/](../reference/api/document-v1.html) ENDPOINT. Restrict to one CLUSTER, see [content clusters](../reference/applications/services/content.html):

```
$ curl \
  --cert data-plane-public-cert.pem \
  --key data-plane-private-key.pem \
  "$ENDPOINT/document/v1/?cluster=$CLUSTER"
```

For ID export only, use a [fieldset](../schemas/documents.html#fieldsets):

```
$ curl \
  --cert data-plane-public-cert.pem \
  --key data-plane-private-key.pem \
  "$ENDPOINT/document/v1/?cluster=$CLUSTER&fieldSet=%5Bid%5D"
```

From an ID like _id:open:doc::open/documentation/schemas.html_, extract:

- NAMESPACE: open
- DOCTYPE: doc

Example script:

```
#!/bin/bash
set -x

# The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/'
ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"

NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation

continuation=""
idx=0

while
  ((idx+=1))
  echo "$continuation"
  printf -v out "%05g" $idx
  filename=${NAMESPACE}-${DOCTYPE}-${out}.data.gz
  echo "Fetching data..."
  token=$( curl -s \
    --cert data-plane-public-cert.pem \
    --key data-plane-private-key.pem \
    "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?wantedDocumentCount=1000&concurrency=4&cluster=${CLUSTER}&${continuation}" \
    | tee >( gzip > ${filename} ) | jq -re .continuation )
do
  continuation="continuation=${token}"
done
```

If only a few documents are returned per response, _wantedDocumentCount_ (default 1, max 1024) can be specified for a lower bound on the number of documents per response, if that many documents still remain. Specifying _concurrency_ (default 1, max 100) increases throughput, at the cost of resource usage. This also increases the number of documents per response, and _could_ lead to excessive memory usage in the HTTP container when many large documents are buffered to be returned in the same response.

---

# Source: https://docs.vespa.ai/en/reference/operations/metrics/default-metric-set.html.md

# Default Metric Set

This document provides reference documentation for the Default metric set, including the suffixes present per metric. If the suffix column contains "N/A", then the base name of the corresponding metric is used with no suffix.
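As an illustration of how suffixes combine with metric names, a metric such as `query_latency` with the suffixes average, count, max and sum is typically reported as one value per suffix, named `<metric>.<suffix>`. The snippet below is a hypothetical, simplified fragment showing this naming only; the exact response envelope depends on the metrics API used:

```
{
  "values": {
    "query_latency.average": 4.2,
    "query_latency.count": 1300,
    "query_latency.max": 58.0,
    "query_latency.sum": 5460.0
  }
}
```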
## ClusterController Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | cluster-controller.down.count | node | last, max | Number of content nodes down | | cluster-controller.maintenance.count | node | last, max | Number of content nodes in maintenance | | cluster-controller.up.count | node | last, max | Number of content nodes up | | cluster-controller.is-master | binary | last, max | 1 if this cluster controller is currently the master, or 0 if not | | cluster-controller.resource\_usage.nodes\_above\_limit | node | last, max | The number of content nodes above resource limit, blocking feed | | cluster-controller.resource\_usage.max\_memory\_utilization | fraction | last, max | Current memory utilisation, for content node with the highest value | | cluster-controller.resource\_usage.max\_disk\_utilization | fraction | last, max | Current disk space utilisation, for content node with the highest value | ## Container Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | http.status.1xx | response | rate | Number of responses with a 1xx status | | http.status.2xx | response | rate | Number of responses with a 2xx status | | http.status.3xx | response | rate | Number of responses with a 3xx status | | http.status.4xx | response | rate | Number of responses with a 4xx status | | http.status.5xx | response | rate | Number of responses with a 5xx status | | jdisc.gc.ms | millisecond | average, max | Time spent in JVM garbage collection | | jdisc.thread\_pool.work\_queue.capacity | thread | max | Capacity of the task queue | | jdisc.thread\_pool.work\_queue.size | thread | count, max, min, sum | Size of the task queue | | jdisc.thread\_pool.size | thread | max | Size of the thread pool | | jdisc.thread\_pool.active\_threads | thread | count, max, min, sum | Number of threads that are active | | jdisc.application.failed\_component\_graphs | item | rate | JDISC Application failed component graphs | | jdisc.singleton.is\_active | item | last, max | JDISC Singleton is active | | jdisc.http.ssl.handshake.failure.missing\_client\_cert | operation | rate | JDISC HTTP SSL Handshake failures due to missing client certificate | | jdisc.http.ssl.handshake.failure.incompatible\_protocols | operation | rate | JDISC HTTP SSL Handshake failures due to incompatible protocols | | jdisc.http.ssl.handshake.failure.incompatible\_chifers | operation | rate | JDISC HTTP SSL Handshake failures due to incompatible chifers | | jdisc.http.ssl.handshake.failure.unknown | operation | rate | JDISC HTTP SSL Handshake failures for unknown reason | | mem.heap.free | byte | average | Free heap memory | | athenz-tenant-cert.expiry.seconds | second | last, max, min | Time remaining until Athenz tenant certificate expires | | feed.operations | operation | rate | Number of document feed operations | | feed.latency | millisecond | count, sum | Feed latency | | queries | operation | rate | Query volume | | query\_latency | millisecond | average, count, max, sum | The overall query latency as seen by the container | | failed\_queries | operation | rate | The number of failed queries | | degraded\_queries | operation | rate | The number of degraded queries, e.g. 
due to some content nodes not responding in time | | hits\_per\_query | hit\_per\_query | average, count, max, sum | The number of hits returned | | docproc.documents | document | sum | Number of processed documents | | totalhits\_per\_query | hit\_per\_query | average, count, max, sum | The total number of documents found to match queries | | serverActiveThreads | thread | average | Deprecated. Use jdisc.thread\_pool.active\_threads instead. | ## Distributor Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | vds.distributor.docsstored | document | average | Number of documents stored in all buckets controlled by this distributor | | vds.bouncer.clock\_skew\_aborts | operation | count | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range | ## NodeAdmin Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | endpoint.certificate.expiry.seconds | second | N/A | Time until node endpoint certificate expires | | node-certificate.expiry.seconds | second | N/A | Time until node certificate expires | ## SearchNode Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | content.proton.documentdb.documents.total | document | last, max | The total number of documents in this documents db (ready + not-ready) | | content.proton.documentdb.documents.ready | document | last, max | The number of ready documents in this document db | | content.proton.documentdb.documents.active | document | last, max | The number of active / searchable documents in this document db | | content.proton.documentdb.disk\_usage | byte | last | The total disk usage (in bytes) for this document db | | content.proton.documentdb.memory\_usage.allocated\_bytes | byte | last | The number of allocated bytes | | content.proton.search\_protocol.query.latency | second | average, count, max, sum | Query request latency (seconds) | | content.proton.search\_protocol.docsum.latency | second | average, count, max, sum | Docsum request latency (seconds) | | content.proton.search\_protocol.docsum.requested\_documents | document | rate | Total requested document summaries | | content.proton.resource\_usage.disk | fraction | average | The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller | | content.proton.resource\_usage.memory | fraction | average | The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). 
Same value as reported to the cluster controller | | content.proton.resource\_usage.feeding\_blocked | binary | last, max | Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) | | content.proton.transactionlog.disk\_usage | byte | last | The disk usage (in bytes) of the transaction log | | content.proton.documentdb.matching.docs\_matched | document | rate | Number of documents matched | | content.proton.documentdb.matching.docs\_reranked | document | rate | Number of documents re-ranked (second phase) | | content.proton.documentdb.matching.rank\_profile.query\_latency | second | average, count, max, sum | Total average latency (sec) when matching and ranking a query | | content.proton.documentdb.matching.rank\_profile.query\_setup\_time | second | average, count, max, sum | Average time (sec) spent setting up and tearing down queries | | content.proton.documentdb.matching.rank\_profile.rerank\_time | second | average, count, max, sum | Average time (sec) spent on 2nd phase ranking |

## Sentinel Metrics

| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| sentinel.totalRestarts | restart | last, max, sum | Total number of service restarts done by the sentinel since the sentinel was started |

## Storage Metrics

| Name | Unit | Suffixes | Description |
| --- | --- | --- | --- |
| vds.filestor.allthreads.put.count | operation | rate | Number of requests processed. |
| vds.filestor.allthreads.remove.count | operation | rate | Number of requests processed. |
| vds.filestor.allthreads.update.count | request | rate | Number of requests processed. |

---

# Source: https://docs.vespa.ai/en/reference/querying/default-result-format.html.md

# Default JSON Result Format

The default Vespa query response format is used when [presentation.format](../api/query.html#presentation.format) is unset or set to `json`. An alternative binary [CBOR](https://cbor.io/) format is available by setting `format=cbor` or using `Accept: application/cbor`. CBOR is a drop-in replacement - when deserialized, the result is identical to JSON. CBOR is both more compact and faster to generate, especially for numeric data such as tensors and embeddings.

Results are rendered with one or more objects:

- `root`: mandatory object with the tree of returned data
- `timing`: optional object with query timing information
- `trace`: optional object for metadata about query execution

Refer to the [query API guide](../../querying/query-api.html#result-examples) for result and tracing examples. All object names are literal strings; the node `root` is the map key "root" in the returned JSON object. In other words, only strings are used as map keys.

| Element | Parent | Mandatory | Type | Description | | --- | --- | --- | --- | --- | | ## root | | root | | yes | Map of string to object | The root of the tree of returned data. | | children | root | no | Array of objects | Array of JSON objects with the same structure as `root`. | | fields | root | no | Map of string to object | | | totalCount | fields | no | Integer | Number of documents matching the query.
Not accurate when using _nearestNeighbor_, _wand_ or _weakAnd_ query operators. The value is the number of hits after [first-phase dropping](../schemas/schemas.html#rank-score-drop-limit). | | coverage | root | no | Map of string to string and number | Map of metadata about how much of the total corpus has been searched to return the given documents. | | coverage | coverage | yes | Integer | Percentage of total corpus searched (when lower than 100 this is an approximation and is a lower bound, as no info from nodes down is known) | | documents | coverage | yes | Long | The number of active documents searched. | | full | coverage | yes | Boolean | Whether the full corpus was searched. | | nodes | coverage | yes | Integer | The number of search nodes returning results. | | results | coverage | yes | Integer | The number of results merged creating the final rendered result. | | resultsFull | coverage | yes | Integer | The number of full result sets merged, e.g. when there are several sources/clusters for the results. | | degraded | coverage | no | Map of string to object | Map of match-phase degradation elements. | | match-phase | degraded | no | Boolean | Indicator whether [match-phase degradation](../schemas/schemas.html#match-phase) has occurred. | | timeout | degraded | no | Boolean | Indicator whether the query [timed out](../api/query.html#timeout) before completion. | | adaptive-timeout | degraded | no | Boolean | Indicator whether the query timed out with [adaptive timeout](../api/query.html#ranking.softtimeout.enable) before completion. | | non-ideal-state | degraded | no | Boolean | Indicator whether the content cluster is in [ideal state](../../content/idealstate.html). | | errors | root | no | Array of objects | Array of error messages with the fields given below. [Example](../../querying/query-api.html#error-result). | | code | errors | yes | Integer | Numeric identifier used by the container application. See [error codes](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/protect/Error.java) and [ErrorMessage.java](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/result/ErrorMessage.java) for a short description. | | message | errors | no | String | Full error message. | | source | errors | no | String | Which [data provider](../../querying/federation.html) logged the error condition. | | stackTrace | errors | no | String | Stack trace if an exception was involved. | | summary | errors | yes | String | Short description of error. | | transient | errors | no | Boolean | Whether the system is expected to recover from the faulty state on its own. If the flag is not present, this may or may not be the case, or the flag is not applicable. | | fields | root | no | Map of string to object | The named document (schema) [fields](../schemas/schemas.html#field). Fields without value are not rendered. In addition to the fields defined in the schema, the following might be returned: | Fieldname | Description | | --- | --- | | sddocname | Schema name. Returned in the [default document summary](../../querying/document-summaries.html). | | documentid | Document ID. Returned in the [default document summary](../../querying/document-summaries.html). | | summaryfeatures | Refer to [summary-features](../schemas/schemas.html#summary-features) and [observing values used in ranking](../../ranking/ranking-intro#observing-values-used-in-ranking). 
| | matchfeatures | Refer to [match-features](../schemas/schemas.html#match-features) and [example use](../../querying/nearest-neighbor-search-guide#strict-filters-and-distant-neighbors). | | | id | root | no | String | String identifying the hit, document or other data type. For document hits, this is the full string document id if the hit is filled with a document summary from disk. If it is not filled or only filled with data from memory (attributes), it is an internally generated unique id on the form `index:[source]/[node-index]/[hex-gid]`. Also see the [/document/v1/ guide](../../writing/document-v1-api-guide.html#troubleshooting) and [receiving-responses-of-different-formats-for-the-same-query-in-vespa](https://stackoverflow.com/questions/74033383/receiving-responses-of-different-formats-for-the-same-query-in-vespa). | | label | root | no | String | The label of a grouping list. | | limits | root | no | Object | Used in grouping, the limits of a bucket in histogram style data. | | from | limits | no | String | Lower bound of a bucket group. | | to | limits | no | String | Upper bound of a bucket group. | | relevance | root | yes | Double | Double value representing the rank score. | | source | root | no | String | Which data provider created this node. | | types | root | no | Array of string | Metadata about what kind of document or other kind of node in the result set this object is. | | value | root | no | String | Used in grouping for value groups, the argument for the grouping data which is in the fields. | | | | ## timing | | timing | | no | Map of string to object | Query timing information, enabled by [presentation.timing](../api/query.html#presentation.timing). The [query performance guide](../../performance/practical-search-performance-guide#basic-text-search-query-performance) is a useful resource to understand the values in its child elements. | | querytime | timing | no | Double | Time to execute the first protocol phase/matching phase, in seconds. | | summaryfetchtime | timing | no | Double | [Document summary](../../querying/document-summaries.html) fetch time, in seconds. This is the time to execute the summary fill protocol phase for the globally ordered top-k hits. | | searchtime | timing | no | Double | Approximately the sum of `querytime` and `summaryfetchtime` and is close to what a client will observe (except network latency). In seconds. | | | | ## trace **Note:** The tracing elements below is a subset of all elements. Refer to the [search performance guide](../../performance/practical-search-performance-guide#advanced-query-tracing) for examples. | | trace | | no | Map of string to object | Metadata about query execution. | | children | trace | no | Array of object | Array of maps with exactly the same structure as `trace` itself. | | timestamp | children | no | Long | Number of milliseconds since the start of query execution this node was added to the trace. | | message | children | no | String | Descriptive trace text regarding this step of query execution. | | message | children | no | Array of objects | Array of messages | | start\_time | message | no | String | Timestamp, e.g. 2022-07-27 09:51:21.938 UTC | | traces | message or threads | no | Array of traces or objects | | | distribution-key | message | no | Integer | The [distribution key](../applications/services/content.html#node) of the content node creating this span. 
| | duration\_ms | message | no | float | duration of span | | timestamp\_ms | traces | no | float | time since start of parent, see `start_time`. | | event | traces | no | String | Description of span | | tag | traces | no | String | Name of span | | threads | traces | no | Array of objects | Array of object that again has traces elements. | ## JSON Schema Formal schema for the query API default result format: ``` ``` { "$schema": "http://json-schema.org/draft-04/schema#", "title": "Result", "description": "Schema for Vespa results", "type": "object", "properties": { "root": { "type": "document_node", "required": true }, "trace": { "type": "trace_node", "required": false } }, "definitions": { "document_node": { "properties": { "children": { "type": "array", "items": { "type": "document_node" }, "required": false }, "coverage": { "type": "coverage", "required": false }, "errors": { "type": "array", "items": { "type": "error" }, "required": false }, "fields": { "type": "object", "additionalProperties": true, "required": false }, "id": { "type": "string", "required": false }, "relevance": { "type": "number", "required": true }, "types": { "type": "array", "items": { "type": "string" }, "required": false }, "source": { "type": "string", "required": false }, "value": { "type": "string", "required": false }, "limits": { "type": "object", "required": false }, "label": { "type": "string", "required": false } }, "additionalProperties": true, }, "trace_node": { "properties": { "children": { "type": "array", "items": { "type": "trace_node" }, "required": false }, "timestamp": { "type": "number", "required": false }, "message": { "type": "string", "required": false } } }, "fields": { "properties": { "totalCount": { "type": "number", "required": true } } }, "coverage": { "properties": { "coverage": { "type": "number", "required": true }, "documents": { "type": "number", "required": true }, "full": { "type": "boolean", "required": true }, "nodes": { "type": "number", "required": true }, "results": { "type": "number", "required": true }, "resultsFull": { "type": "number", "required": true } } }, "error": { "properties": { "code": { "type": "number", "required": true }, "message": { "type": "string", "required": false }, "source": { "type": "string", "required": false }, "stackTrace": { "type": "string", "required": false }, "summary": { "type": "string", "required": true }, "transient": { "type": "boolean", "required": false } } } } } ``` ``` ## Appendix: Legacy Vespa 7 JSON rendering There were some inconsistencies between search results and document rendering in Vespa 7, which are fixed in Vespa 8. This appendix describes the old behavior, what the changes are, and how to configure to select a specific rendering. ### Inconsistent weightedset rendering Fields with various weightedset types has a JSON input representation (for feeding) as a JSON object; for example `{"one":1, "two":2,"three":3}` for the value of a a `weightedset` field. The same format is used when rendering a document (for example when visiting). In search results however, there are intermediate processing steps during which the field value is represented as an array of item/weight pairs, so in a search result the field value would render as `[ {"item":"one", "weight":1}, {"item":"two", "weight":2}, {"item":"three", "weight":3} ]` In Vespa 8, the default JSON renderer for search results outputs the same format as document rendering. 
If you have code that depends on the old format, you can turn this off by setting `renderer.json.jsonWsets=false` in the query (usually via a [query profile](../../querying/query-profiles.html)).

### Inconsistent map rendering

Fields with various map types have a JSON input representation (for feeding) as a JSON object; for example `{"1001":1.0, "1002":2.0, "1003":3.0}` for the value of a `map` field. The same format is used when rendering a document (for example when visiting). In search results however, there are intermediate processing steps and the field value is represented as an array of key/value pairs, so in a search result the field value would (in some cases) render as `[ {"key":1001, "value":1.0}, {"key":1002, "value":2.0}, {"key":1003, "value":3.0} ]`. In Vespa 8, the default JSON renderer for search results outputs the same format as document rendering. For code that depends on the old format, one can turn this off by setting `renderer.json.jsonMaps=false` in the query (usually via a [query profile](../../querying/query-profiles.html)).

### Geo position rendering

Fields with the type `position` would in Vespa 7 be rendered using the internal fields "x" and "y". These are integers representing microdegrees, i.e. geographical degrees times 1 million, of longitude (for x) and latitude (for y). Also, any field _foo_ of type `position` would trigger addition of two extra synthetic summary fields _foo.position_ and _foo.distance_ (see below for details). In Vespa 8, positions are rendered with two JSON fields "lat" and "lng", both having a floating-point value. The "lat" field is latitude (going from -90.0 at the South Pole to +90.0 at the North Pole). The "lng" field is longitude (going from -180.0 at the dateline seen as extreme west, via 0.0 at the Greenwich meridian, to +180.0 at the dateline again, now as extreme east). The field names are chosen so the format is the same as used in the Google "places" API. A closely related change is the removal of two synthetic summary fields which would be returned in search results. For example, with this in the schema:

```
field mainloc type position {
    indexing: attribute | summary
}
```

Vespa 7 would include the _mainloc_ summary field, but also _mainloc.position_ and _mainloc.distance_; the latter only when the query actually had a position to take the distance from. The first of these (_mainloc.position_ in this case) was mainly useful for producing XML output in older Vespa versions, and now contains just the same information as the _mainloc_ summary field. The second (_mainloc.distance_ in this case) would return a distance in internal units, and can be replaced by a summary feature - here `distance(mainloc)` would give the same number, while `distance(mainloc).km` would be the recommended replacement with suitable code changes.

### Summary-features wrapped in "rankingExpression"

In Vespa 7, if a rank profile wanted a function `foobar` returned in summary-features (or match-features), it would be rendered as `rankingExpression(foobar)` in the output. For programmatic use, the `FeatureData` class has extra checking to allow lookup with `getDouble("foobar")` or `getTensor("foobar")`, but in Vespa 8 the feature is present and rendered with just the original name as specified. If an application needs the JSON rendering to look exactly as in Vespa 7, one can specify that in the rank profile.
For example, with this in the schema: ``` rank-profile whatever { function lengthScore() { expression: matchCount(title)/fieldLength(title) } summary-features { matchCount(title) lengthScore ... ``` could, in Vespa 7, yield JSON output containing: ``` summaryfeatures: { matchCount(title): 1, rankingExpression(lengthScore): 0.25, ... ``` in Vespa 8, you instead get the expected: ``` summaryfeatures: { matchCount(title): 1, lengthScore: 0.25, ... ``` But to get the old behavior one can specify: ``` rank-profile whatever { function lengthScore() { expression: matchCount(title)/fieldLength(title) } summary-features { matchCount(title) rankingExpression(lengthScore) ... ``` which gives you the same output as before. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [root](#root-header) - [timing](#timing-header) - [trace](#trace-heading) - [JSON Schema](#json-schema) - [Appendix: Legacy Vespa 7 JSON rendering](#appendix-legacy-vespa-7-json-rendering) - [Inconsistent weightedset rendering](#inconsistent-weightedset-rendering) - [Inconsistent map rendering](#inconsistent-map-rendering) - [Geo position rendering](#geo-position-rendering) - [Summary-features wrapped in "rankingExpression"](#summary-features-wrapped-in-rankingexpression) --- # Source: https://docs.vespa.ai/en/operations/deleting-applications.html.md # Deleting Applications **Warning:** Following these steps will remove production instances or regions and all data within them. Data will be unrecoverable. ## Deleting an application To delete an application, use the console: - navigate to the _application_ view at https://console.vespa-cloud.com/tenant/tenant-name/application where you can find the trash can icon to the far right, as an `ACTION`. - navigate to the _deploy_ view at_https://console.vespa-cloud.com/tenant/tenant-name/application/app-name/prod/deploy_. ![delete production deployment](/assets/img/console/delete-production-deployment.png) When the application deployments are deleted, delete the application in the [console](https://console.vespa-cloud.com). Remove the CI job that builds and deploys application packages, if any. ## Deleting an instance / region To remove an instance or a deployment to a region from an application: 1. Remove the `region` from `prod`, or the `instance` from `deployment`in [deployment.xml](../reference/applications/deployment.html#instance): 2. Add or modify [validation-overrides.xml](../reference/applications/validation-overrides.html), allowing Vespa Cloud to remove production instances: 3. Build and deploy the application package. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/applications/dependency-injection.html.md # Dependency injection The Container (a.k.a. JDisc container) implements a dependency injection framework that allows components to declare arbitrary dependencies on configuration and other components in the application. This document explains how to write a container component that depends on another component. See the [reference](../reference/applications/components.html#injectable-components)for a list of injectable components. The container relies on auto-injection instead of Guice modules. All components declared in the container cluster are available for injection, and the dependent component only needs to declare the dependency as a constructor parameter. 
In general, dependency injection involves at least three elements:

- a dependent consumer,
- a declaration of a component's dependencies,
- an injector that creates instances of classes that implement a given dependency on request.

Notes:

- The dependent object describes what software component it depends on to do its work. The injector decides what concrete classes satisfy the requirements of the dependent object, and provides them to the dependent.
- The Container encapsulates the injector, and the consumer and all its dependencies are considered to be components.
- The Container only supports constructor injection (i.e. all dependencies must be declared in a component's constructor).
- Circular dependencies are not supported.

Refer to the [multiple-bundles sample app](https://github.com/vespa-engine/sample-apps/tree/master/examples/multiple-bundles) for a practical example.

## Depending on another component

A component that depends on another is considered to be a _consumer_. A component's dependencies are whatever its `@Inject`-annotated constructor declares as arguments. E.g. the component:

```
package com.yahoo.example;

import com.yahoo.component.annotation.Inject;

public class MyComponent {

    private final MyDependency dependency;

    @Inject
    public MyComponent(MyDependency dependency) {
        this.dependency = dependency;
    }
}
```

has a dependency on the class `com.yahoo.example.MyDependency`. To deploy `MyComponent`, register `MyDependency` in `services.xml`:

```
```

Upon deployment, the Container will first instantiate `MyDependency`, and then pass that instance to the constructor of `MyComponent`. Multiple consumers can take the same dependency. One can also [inject configuration](configuring-components.html) into components.

**Note:** A component will be reconstructed only when one of its dependencies, its configuration, or its class changes - all of which only occur when you re-deploy your application package. Reconstruction is transitive; if component A depends on component B, and component B depends on component C, then a reconstruction of component B causes a reconstruction of A, but not of C. Reconstruction of C causes a reconstruction of both A and B.

### Extending components

When injecting two components where one extends the other, the dependency injection code does not know which of the two to use as the argument for the parent class. To resolve this, inject a `ComponentRegistry` (see below), and look up its entries, like `getComponent(XXX.class.getName())`.

### Specify the bundle

The example above assumes the bundle name can be deduced from the class name. This is not always the case, and you will get class loading problems like:

```
Caused by: java.lang.IllegalArgumentException: Could not create a component with id 'com.yahoo.example.My'.
Tried to load class directly, since no bundle was found for spec: com.yahoo.example.Dependency
```

To remedy, specify the jar file (i.e. bundle) with the component:

```
```

## Depending on all components of a specific type

Consider the use-case where a component chooses between various strategies, and each strategy is implemented as a separate component. Since the number and type of strategies are unknown when implementing the consumer, it is impossible to make a constructor that lists all of them. This is where the `ComponentRegistry` comes into play. E.g.
the following component:

```
package com.yahoo.example;

import com.yahoo.component.annotation.Inject;
import com.yahoo.component.provider.ComponentRegistry;

public class MyComponent {

    private final ComponentRegistry<Strategy> strategies;

    @Inject
    public MyComponent(ComponentRegistry<Strategy> strategies) {
        this.strategies = strategies;
    }
}
```

declares a dependency on the set of all components registered in `services.xml` that are instances of the class `Strategy` (including subclasses). The `ComponentRegistry` class provides accessors for components based on their [component id](../reference/applications/services/container.html#component).

## Special Components

There are cases where a component cannot be directly injected into its consumers - for example:

- The component must be instantiated via a factory method instead of its constructor
- Each consumer must have a unique instance of the dependency class
- The component uses native resources that must be cleaned up when the component goes out of scope

For these situations, JDisc supports injection, and optional deconstruction, via its `Provider` interface:

```
public interface Provider<T> {
    T get();
    void deconstruct();
}
```

`get()` is called by JDisc each time it needs to instantiate the specific component type. `deconstruct()` is only called after reconfiguring the system with a new application, where the current provider instance is either removed or replaced due to modified dependencies. Following the earlier example, declare a provider for the `MyDependency` class that returns a new instance for each consumer:

```
package com.yahoo.example;

import com.yahoo.container.di.componentgraph.Provider;

public class MyDependencyProvider implements Provider<MyDependency> {

    @Override
    public MyDependency get() {
        return new MyDependency();
    }

    @Override
    public void deconstruct() { }
}
```

Using this provider, `services.xml` has two instances of `MyComponent`, each getting a unique instance of `MyDependency`:

```
```

Upon deployment, the Container will first instantiate `MyDependencyProvider`, and then invoke `MyDependencyProvider.get()` for each instantiation of `MyComponent`. A provider can declare constructor dependencies, just like any other component.

Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Depending on another component](#depending-on-another-component) - [Extending components](#extending-components) - [Specify the bundle](#specify-the-bundle) - [Depending on all components of a specific type](#depending-on-all-components-of-a-specific-type) - [Special Components](#special-components) --- # Source: https://docs.vespa.ai/en/basics/deploy-an-application-java.html.md # Deploy an application having Java components Follow these steps to deploy a Vespa application which includes Java components to the [dev zone](../operations/environments.html#dev) on Vespa Cloud (for free). Alternative versions of this guide: - [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application without Java components](deploy-an-application.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application locally](deploy-an-application-local.html) - [Deploy an application having Java components locally](deploy-an-application-local-java.html) **Prerequisites:** - [Java 17](https://openjdk.org/projects/jdk/17/). - [Apache Maven](https://maven.apache.org/install.html) to build the application. Steps: 1. **Create a [tenant](../learn/tenant-apps-instances.html) on Vespa Cloud:** 2.
**Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Configure the Vespa client:** 4. **Get Vespa Cloud control plane access:** 5. **Clone a sample [application](applications.html):** 6. **Add a certificate for [data plane access](../security/guide#data-plane) to the application:** 7. **Build the application:** 8. **[Deploy](applications.html#deploying-applications) the application:** 9. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):** 10. **Run [queries](../querying/query-api.html):** Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](../operations/environments.html#dev)will by default keep running for 14 days after the last deployment. You can control this in the[console](https://console.vespa-cloud.com/). #### Next: [Vespa applications](applications.html) Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/basics/deploy-an-application-local-java.html.md # Deploy an application having Java components locally Follow these steps to deploy a Vespa application having Java components on your own machine. Alternative versions of this guide: - [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application](deploy-an-application.html) - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application without Java components locally](deploy-an-application-local.html) This is tested with _vespaengine/vespa:8.634.24_ container image. **Prerequisites:** - Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with [Podman Desktop](https://podman.io/) or [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed, with an engine running. - Alternatively, start the Podman daemon: ``` $ podman machine init --memory 6000 $ podman machine start ``` - See [Docker Containers](/en/operations/self-managed/docker-containers.html) for system limits and other settings. - For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html). - Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations/self-managed/node-setup.html#memory-settings). - Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/writing/feed-block.html). - [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/clients/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). - [Java 17](https://openjdk.org/projects/jdk/17/). - [Apache Maven](https://maven.apache.org/install.html) is used to build the application. Steps: 1. **Validate the environment:** 2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Set local target:** 4. **Start a Vespa Docker container:** 5. **Clone a sample [application](applications.html):** 6. **Build it:** 7. **[Deploy](applications.html#deploying-applications) the application:** 8. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):** 9. **Run [queries](../querying/query-api.html):** Congratulations, you have deployed your first Vespa application! 
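For reference, the steps above roughly condense to the following command sequence - a sketch only, assuming the _album-recommendation-java_ sample application; exact paths, flags and feed files may differ from the guide:

```
# Sketch of the local deployment flow; the sample-app name and feed path are assumptions
$ brew install vespa-cli                  # step 2: install the Vespa CLI
$ vespa config set target local           # step 3: deploy to a local container
$ docker run --detach --name vespa --hostname vespa-container \
    --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
    vespaengine/vespa                     # step 4: start a Vespa Docker container
$ vespa clone album-recommendation-java myapp && cd myapp   # step 5: clone a sample app
$ mvn -U package                          # step 6: build the Java components
$ vespa deploy --wait 300                 # step 7: deploy the application package
$ vespa feed ext/documents.jsonl          # step 8: feed documents
$ vespa query 'select * from music where true'   # step 9: run a query
```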
#### Next: [Vespa applications](applications.html) ``` $ docker rm -f vespa ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/basics/deploy-an-application-local.html.md # Deploy an application locally Follow these steps to deploy a Vespa application on your own machine. Alternative versions of this guide: - [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application](deploy-an-application.html) - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application having Java components locally](deploy-an-application-local-java.html) This is tested with _vespaengine/vespa:8.634.24_ container image. **Prerequisites:** - Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with [Podman Desktop](https://podman.io/) or [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed, with an engine running. - Alternatively, start the Podman daemon: ``` $ podman machine init --memory 6000 $ podman machine start ``` - See [Docker Containers](/en/operations/self-managed/docker-containers.html) for system limits and other settings. - For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html). - Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations/self-managed/node-setup.html#memory-settings). - Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/writing/feed-block.html). - [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/clients/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). Steps: 1. **Validate the environment:** 2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Set local target:** 4. **Start a Vespa Docker container:** 5. **Clone a sample [application](applications.html):** 6. **[Deploy](applications.html#deploying-applications) the application:** 7. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):** 8. **Run [queries](../querying/query-api.html):** 9. **Get documents:** Congratulations, you have deployed your first Vespa application! #### Next: [Vespa applications](applications.html) ``` $ docker rm -f vespa ``` Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/basics/deploy-an-application-shell.html.md # Deploy an application without Vespa CLI This lets you deploy an application to the [dev zone](../operations/environments.html#dev)on Vespa Cloud (for free). Alternative versions of this guide: - [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application](deploy-an-application.html) - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application locally](deploy-an-application-local.html) - [Deploy an application with Java components locally](deploy-an-application-local-java.html) **Prerequisites:** - git - or download the files from [album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation) - zip - or other tool to create a .zip file - curl - or other tool to send HTTP requests with security credentials - OpenSSL Steps: 1. 
**Create a [tenant](../learn/tenant-apps-instances.html) on Vespa Cloud:** 2. **Clone a sample [application](applications.html):** 3. **Add a certificate for [data plane access](../security/guide#data-plane) to the application:** 4. **Create a deployable application package zip:** 5. **Deploy the application:** 6. **Verify the application endpoint:** 7. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):** 8. **Run [queries](../querying/query-api.html):** Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](../operations/environments.html#dev)will by default keep running for 14 days after the last deployment. You can control this in the[console](https://console.vespa-cloud.com/). #### Next: [Vespa applications](applications.html) Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/basics/deploy-an-application.html.md # Deploy an application Follow these steps to deploy a Vespa application to the [dev zone](../operations/environments.html#dev)on Vespa Cloud (for free). Alternative versions of this guide: - [Deploy an application using pyvespa](https://vespa-engine.github.io/pyvespa/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application locally](deploy-an-application-local.html) - [Deploy an application having Java components locally](deploy-an-application-local-java.html) Steps: 1. **Create a [tenant](../learn/tenant-apps-instances.html) on Vespa Cloud:** 2. **Install the [Vespa CLI](../clients/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Configure the Vespa client:** 4. **Get Vespa Cloud control plane access:** 5. **Clone a sample [application](applications.html):** 6. **Add a certificate for [data plane access](../security/guide#data-plane) to the application:** 7. **[Deploy](applications.html#deploying-applications) the application:** 8. **[Feed](../writing/reads-and-writes.html) [documents](../schemas/documents.html):** 9. **Run [queries](../querying/query-api.html):** Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](../operations/environments.html#dev)will by default keep running for 14 days after the last deployment. You can control this in the[console](https://console.vespa-cloud.com/). #### Next: [Vespa applications](applications.html) Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/reference/api/deploy-v2.html.md # Deploy API This is the API specification and some examples for the HTTP Deploy API that can be used to deploy an application: - [upload](#create-session) - [prepare](#prepare-session) - [activate](#activate-session) The response format is JSON. Examples are found in the [use-cases](#use-cases). Also see the [deploy guide](/en/basics/applications.html#deploying-applications). **Note:** To build a multi-application system, use one or three config server(s) per application. Best practise is using a [containerized](/en/operations/self-managed/docker-containers.html) architecture, also see [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA). The current API version is 2. 
The API port is 19071 - use [vespa-model-inspect](/en/reference/operations/self-managed/tools.html#vespa-model-inspect) service configserver to find config server hosts. Example: `http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session`. Write operations return successfully after a majority of config servers have persisted changes (e.g. 2 out of 3 config servers). Entities: | session-id | The session-id used in this API is generated by the server and is required for all operations after [creating](#create-session) a session. The session-id is valid if it is an active session, or it was created before [session lifetime](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/configserver.def) has expired, the default value being 1 hour. | | path | An application file path in a request URL or parameter refers to a relative path in the application package. A path ending with "/" refers to a directory. | Use [Vespa CLI](../../clients/vespa-cli.html) to deploy from the command line. ## POST /application/v2/tenant/default/prepareandactivate Creates a new session with the application package that is included in the request, prepares it and then activates it. See details in the steps later in this document | Parameters | | Name | Default | Description | | --- | --- | --- | | | | | | | Request body | | Required | Content | Note | | --- | --- | --- | | Yes | A compressed [application package](../applications/application-packages.html) (with gzip or zip compression) | Set `Content-Type` HTTP header to `application/x-gzip` or `application/zip`. | | | Response | See [active](#activate-session). | Example: ``` $ (cd src/main/application && zip -r - .) | \ curl --header Content-Type:application/zip --data-binary @- \ localhost:19071/application/v2/tenant/default/prepareandactivate ``` ``` ``` { "log": [ { "time": 1619448107299, "level": "WARNING", "message": "Host named 'vespa-container' may not receive any config since it is not a canonical hostname. Disregard this warning when testing in a Docker container." } ], "tenant": "default", "session-id": "3", "url": "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default", "message": "Session 3 for tenant 'default' prepared and activated.", "configChangeActions": { "restart": [], "refeed": [], "reindex": [] } } ``` ``` ## POST /application/v2/tenant/default/session Creates a new session with the application package that is included in the request. | Parameters | | Name | Default | Description | | --- | --- | --- | | from | N/A | Use when you want to create a new session based on an active application. The value supplied should be a URL to an active application. | | | Request body | | Required | Content | Note | | --- | --- | --- | | Yes, unless `from` parameter is used | A compressed [application package](../applications/application-packages.html) (with gzip or zip compression) | It is required to set the `Content-Type` HTTP header to `application/x-gzip` or `application/zip`, unless the `from` parameter is used. | | | Response | The response contains: - A [session-id](#session-id) to the application that was created. - A [prepared](#prepare-session) URL for preparing the application. 
| Examples (both requests return the same response): - `POST /application/v2/tenant/default/session` - `POST /application/v2/tenant/default/session?from=http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default` ``` { "tenant": "default", "session-id": "1", "prepared": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/session-id/prepared/", "content": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/session-id/content/", "message": "Session 1 for tenant 'default' created." } ``` ## PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)] Writes the content to the given path, or creates a directory if the path ends with '/'. | Parameters | None | | Request body | - If path is a directory, none. - If path is a file, the contents of the file. | | Response | None - Any errors or warnings from writing the file/creating the directory. | ## GET /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)] Returns the content of the file at this path, or lists files and directories if `path` ends with '/'. | Parameters | | Name | Default | Description | | --- | --- | --- | | recursive | false | If _true_, directory content will be listed recursively. | | return | content | - If set to content and path refers to a file, the content will be returned. - If set to content and path refers to a directory, the files and subdirectories in the directory will be listed. - If set to status and path refers to a file, the file status and hash will be returned. - If set to status and path refers to a directory, a list of file/subdirectory statuses and hashes will be returned. | | | Request body | None. | | Response | - If path is a directory: a JSON array of URLs to the files and subdirectories of that directory. - If path is a file: the contents of the file. - If status parameter is set, the status and hash will be returned. 
| Examples: `GET /application/v2/tenant/default/session/3/content/` ``` ``` [ "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/" ] ``` ``` `GET /application/v2/tenant/default/session/3/content/?recursive=true` ``` ``` [ "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/music.sd", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/video.sd" ] ``` ``` `GET /application/v2/tenant/default/session/3/content/hosts.xml` ``` ``` vespa1 vespa2 ``` ``` `GET /application/v2/tenant/default/session/3/content/hosts.xml?return=status` ``` ``` { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml", "status": "new", "md5": "03d7cff861fcc2d88db70b7857d4d452" } ``` ``` `GET /application/v2/tenant/default/session/3/content/schemas/?return=status` ``` ``` [ { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/music.sd", "status": "new", "md5": "03d7cff861fcc2d88db70b7857d4d452" }, { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/video.sd", "status": "changed", "md5": "03d7cff861fcc2d88db70b7857d4d452" }, { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/book.sd", "status": "deleted", "md5": "03d7cff861fcc2d88db70b7857d4d452" } ] ``` ``` ## DELETE /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)] Deletes the resource at the given path. | Parameters | None | | Request body | None | | Response | Any errors or warnings from deleting the resource. | ## PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/prepared Prepares an application with the [session-id](#session-id) given. | Parameters | | Parameter | Default | Description | | --- | --- | --- | | applicationName | N/A | Name of the application to be deployed | | environment | default | Environment where application should be deployed | | region | default | Region where application should be deployed | | instance | default | Name of application instance | | debug | false | If true, include stack trace in response if prepare fails. | | timeout | 360 seconds | Timeout in seconds to wait for session to be prepared. | | | Request body | None | | Response | Returns a [session-id](#session-id) and a link to activate the session. - Log with any errors or warnings from preparing the application. - An [activate](#activate-session) URL for activating the application with this [session-id](#session-id), if there were no errors. - A list of actions (possibly empty) that must be performed in order to apply some config changes between the current active application and this next prepared application. 
These actions are organized into three categories; _restart_, _reindex_, and _refeed_: - _Restart_ actions are done after the application has been activated and are handled by restarting all listed services. See [schemas](../schemas/schemas.html#modifying-schemas) for details. - _Reindex_ actions are special refeed actions that Vespa [handles automatically](../../operations/reindexing.html), if the [reindex](#reindex) endpoint below is used. - _Refeed_ actions require several steps to handle. See [schemas](../schemas/schemas.html#modifying-schemas) for details. | Example: `PUT /application/v2/tenant/default/session/3/prepared` ``` ``` { "tenant": "default", "session-id": "3", "activate": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/active", "message": "Session 3 for tenant 'default' prepared.", "log": [ { "level": "WARNING", "message": "Warning message 1", "time": 1430134091319 }, { "level": "WARNING", "message": "Warning message 2", "time": 1430134091320 } ], "configChangeActions": { "restart": [ { "clusterName": "mycluster", "clusterType": "search", "serviceType": "searchnode", "messages": ["Document type 'test': Field 'f1' changed: add attribute aspect"], "services": [ { "serviceName": "searchnode", "serviceType": "searchnode", "configId": "mycluster/search/cluster.mycluster/0", "hostName": "myhost.mydomain.com" } ] } ], "reindex": [ { "documentType": "test", "clusterName": "mycluster", "messages": ["Document type 'test': Field 'f1' changed: add index aspect"], "services": [ { "serviceName": "searchnode", "serviceType": "searchnode", "configId": "mycluster/search/cluster.mycluster/0", "hostName": "myhost.mydomain.com" } ] } ] } } ``` ``` ## GET /application/v2/tenant/default/session/[[session-id](#session-id)]/prepared Returns the state of a prepared session. The response is the same as a successful [prepare](#prepare-session) operation (above), however the _configChangeActions_ element will be empty. ## PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/active Activates an application with the [session-id](#session-id) given. The [session-id](#session-id) must be for a [prepared session](#prepare-session). The operation will make sure the session is activated on all config servers. | Parameters | | Parameter | Default | Description | | --- | --- | --- | | timeout | 60 seconds | Timeout in seconds to wait for session to be activated (when several config servers are used, they might need to sync before activate can be done). | | | Request body | None | | Response | Returns a [session-id](#session-id), a message and a URL to the activated application. - [session-id](#session-id) - Message | Example: `PUT /application/v2/tenant/default/session/3/active` ``` ``` { "tenant": "default", "session-id": "3", "message": "Session 3 for tenant 'default' activated.", "url": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default" } ``` ``` ## GET /application/v2/tenant/default/application/ Returns a list of the currently active applications for the given tenant. 
| Parameters | None | | Request body | None | | Response | Returns a list of applications - Array of active applications | Example: `GET /application/v2/tenant/default/application/` ``` ``` { ["http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default"] } ``` ``` ## GET /application/v2/tenant/default/application/default Gets info about the application. | Parameters | None | | Request body | None | | Response | Returns information about the application specified. - config generation | Example: `GET /application/v2/tenant/default/application/default` ``` ``` { "generation": 2 } ``` ``` ## GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing Returns [reindexing](../../operations/reindexing.html) status for the given application. | Parameters | N/A | | Request body | N/A | | Response | JSON detailing current reindexing status for the application, with all its clusters and document types. - Status for each content cluster in the application, by name: - Status of each document type in the cluster, by name: - Last time reindexing was triggered for this document type. - Current status of reindexing. - Optional start time of reindexing. - Optional end time of reindexing. - Optional progress of reindexing, from 0 to 1. - Pseudo-speed of reindexing. | Example: `GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing` ``` ``` { "clusters": { "db": { "ready": { "test_artifact": { "readyMillis": 1607937250998, "startedMillis": 1607940060012, "state": "running", "speed": 1.0, "progress": 0.04013824462890625 }, "test_result": { "readyMillis": 1607688477294, "startedMillis": 1607690520026, "endedMillis": 1607709294236, "speed": 0.1, "state": "successful" }, "test_run": { "readyMillis": 1607937250998, "state": "pending" } } } } } ``` ``` ## POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex Marks specified document types in specified clusters of an application as ready for [reindexing](../../operations/reindexing.html). Reindexing itself starts with the next redeployment of the application. To stop an ongoing reindexing, see [updating reindexing](#update-reindexing) below. All document types in all clusters are reindexed unless restricted, using parameters as specified: | Parameters | | Name | Description | | --- | --- | | clusterId | A comma-separated list of content clusters to limit reindexing to. All clusters are reindexed if this is not present. | | documentType | A comma-separated list of document types to limit reindexing to. All document types are reindexed if this is not present. | | indexedOnly | Boolean: whether to mark reindexing ready only for document types with indexing mode _index_ and at least one field with the indexing statement `index`. Default is `false`. | | speed | Number (0–10], default 1: Indexing pseudo speed - balance speed vs. resource use. Example: speed=0.1 | | | Request body | N/A | | Response | A human-readable message indicating what reindexing was marked as ready. 
| Example: `POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex?clusterId=foo,bar&documentType=moo,baz&indexedOnly=true` ``` ``` { "message": "Reindexing document types [moo, baz] in 'foo', [moo] in 'bar' of application default.default" } ``` ``` ## PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex Modifies [reindexing](../../operations/reindexing.html) of specified document types in specified clusters of an application. Specifically, this can be used to alter the pseudo-speed of the reindexing, optionally halting it by specifying a speed of `0`; reindexing for the specified types will remain dormant until either speed is increased again, or a new reindexing is triggered (see [trigger reindexing](#reindex)). Speed changes become effective with the next redeployment of the application. Reindexing for all document types in all clusters are affected if no other parameters are specified: | Parameters | | Name | Description | | --- | --- | | clusterId | A comma-separated list of content clusters to limit the changes to. Reindexing for all clusters are modified if this is not present. | | documentType | A comma-separated list of document types to limit the changes to. Reindexing for all document types are modified if this is not present. | | indexedOnly | Boolean: whether to modify reindexing only for document types with indexing mode _index_ and at least one field with the indexing statement `index`. Default is `false`. | | speed | Number [0–10], required: Indexing pseudo speed - balance speed vs. resource use. Example: speed=0.1 | | | Request body | N/A | | Response | A human-readable message indicating what reindexing was modified. | Example: `PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex?clusterId=foo,bar&documentType=moo,baz&speed=0.618` ``` ``` { "message": "Set reindexing speed to '0.618' for document types [moo, baz] in 'foo', [moo] in 'bar' of application default.default" } ``` ``` ## GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/content/[[path](#path)] Returns content at the given path for an application. See [getting content](#content-get) for usage and response. ## DELETE /application/v2/tenant/default/application/default Deletes an active application. | Parameters | None | | Request body | None | | Response | Returns a message stating if the operation was successful or not | Example: `DELETE /application/v2/tenant/default/application/default` ``` ``` { "message": "Application 'default' was deleted" } ``` ``` ## GET /application/v2/host/[hostname] Gets information about which tenant and application a hostname is used by. | Parameters | None | | Request body | None | | Response | Returns a message with tenant and application details. | Example: `GET /application/v2/host/myhost.mydomain.com` ``` ``` { "tenant": "default" "application": "default" "environment": "default" "region": "default" "instance": "default" } ``` ``` ## Error Handling Errors are returned using standard HTTP status codes. Any additional info is included in the body of the return call, JSON-formatted. The general format for an error response is: ``` ``` { "error-code": "ERROR_CODE", "message": "An error message" } ``` ``` | HTTP status code | Error code | Description | | --- | --- | --- | | 400 | BAD\_REQUEST | Bad request. Client error. 
The error message should indicate the cause. | | 400 | INVALID\_APPLICATION\_PACKAGE | There is an error in the application package. The error message should indicate the cause. | | 400 | OUT\_OF\_CAPACITY | Not enough nodes available for the request to be fulfilled. | | 401 | | Not authorized. The error message should indicate the cause. | | 404 | NOT\_FOUND | Not found. E.g. when using a session-id that doesn't exist. | | 405 | METHOD\_NOT\_ALLOWED | Method not implemented. E.g. using GET where only POST or PUT is allowed. | | 409 | ACTIVATION\_CONFLICT | Conflict, returned when activating an application fails due to a conflict with other changes to the same application (in another session). Client should retry. | | 500 | INTERNAL\_SERVER\_ERROR | Internal server error. Generic error. The error message should indicate the cause. | ## Access log Requests are logged in the [access log](../../operations/access-logging.html) which can be found at _$VESPA\_HOME/logs/vespa/configserver/access-json.log_, example: ``` ``` { "ip": "172.17.0.2", "time": 1655665104.751, "duration": 1.581, "responsesize": 230, "requestsize": 0, "code": 200, "method": "PUT", "uri": "/application/v2/tenant/default/session/2/prepared", "version": "HTTP/2.0", "agent": "vespa-deploy", "host": "b614c9ff04d7:19071", "scheme": "https", "localport": 19071, "peeraddr": "172.17.0.2", "peerport": 47480, "attributes": { "http2-stream-id":"1" } } ``` ``` ## Use Cases It is assumed that the tenant _default_ is already created in these use cases, and the application package is in _app_. ### Create, prepare and activate an application Create a session with the application package: ``` $ (cd app && zip -r - .) | \ curl -s --header Content-Type:application/zip --data-binary @- \ "http://host:19071/application/v2/tenant/default/session" ``` Prepare the application with the URL in the _prepared_ link from the response: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/1/prepared?applicationName=default" ``` Activate the application with the URL in the _activate_ link from the response: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/1/active" ``` ### Modify the application package Dump _services.xml_ from session 1: ``` $ curl -s -X GET "http://host:19071/application/v2/tenant/default/session/1/content/services.xml" ``` ``` ``` 12345 ``` ``` Session 1 is activated and cannot be changed - create a new session based on the active session: ``` $ curl -s -X POST "http://host:19071/application/v2/tenant/default/session?from=http://host:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default" ``` Modify rpcport to 12346 in _services.xml_, deploy the change: ``` $ curl -s -X PUT --data-binary @app/services.xml \ "http://host:19071/application/v2/tenant/default/session/2/content/services.xml" ``` Get _services.xml_ from session 2 to validate: ``` $ curl -s -X GET "http://host:19071/application/v2/tenant/default/session/2/content/services.xml" ``` ``` ``` 12346 ``` ``` To add the file _files/test1.txt_, first create the directory, then add the file: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/content/files/" $ curl -s -X PUT --data-binary @app/files/test1.txt \ "http://host:19071/application/v2/tenant/default/session/2/content/files/test1.txt" ``` Prepare and activate the session: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/prepared?applicationName=fooapp" $ curl -s 
-X PUT "http://host:19071/application/v2/tenant/default/session/2/active" ``` ### Rollback If you need to roll back to a previous version of the application package this can be achieved by creating a new session based on the previous known working version by passing the corresponding session-id in the _from_ argument, see [creating a session](#create-session) Also see [rollback](/en/applications/deployment.html#rollback). Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [POST /application/v2/tenant/default/prepareandactivate](#prepareandactivate) - [POST /application/v2/tenant/default/session](#create-session) - [PUT /application/v2/tenant/default/session/[](#content-put) - [GET /application/v2/tenant/default/session/[](#content-get) - [DELETE /application/v2/tenant/default/session/[](#content-delete) - [PUT /application/v2/tenant/default/session/[](#prepare-session) - [GET /application/v2/tenant/default/session/[](#get-prepare-session) - [PUT /application/v2/tenant/default/session/[](#activate-session) - [GET /application/v2/tenant/default/application/](#get-application) - [GET /application/v2/tenant/default/application/default](#get-application-info) - [GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing](#reindexing) - [POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex](#reindex) - [PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex](#update-reindexing) - [GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/content/[](#get-application-content) - [DELETE /application/v2/tenant/default/application/default](#delete-application) - [GET /application/v2/host/[hostname]](#get-host-info) - [Error Handling](#error-handling) - [Access log](#access-log) - [Use Cases](#use-cases) - [Create, prepare and activate an application](#use-case-start) - [Modify the application package](#use-case-modify) - [Rollback](#rollback) --- # Source: https://docs.vespa.ai/en/operations/deployment-patterns.html.md # Deployment patterns Vespa Cloud's [automated deployments](automated-deployments.html)lets you design CD pipelines for staged rollouts and multi-zone deployments. This guide documents some of these patterns. ## Two regions, two AZs each, sequenced deployment This is the simplest pattern, deploy to a set of zones/regions, in a sequence: ![Two regions, two AZs each, sequenced deployment](/assets/img/pipeline-1.png) ``` aws-us-east-1c aws-use1-az4 aws-use2-az1 aws-use2-az3 ``` ## Two regions, two AZs each, parallel deployment Same as above, but deploying all zones in parallel: ![Two regions, two AZs each, parallel deployment](/assets/img/pipeline-2.png) ``` aws-us-east-1c aws-use1-az4 aws-use2-az1 aws-use2-az3 ``` ## Two regions, two AZs each, parallel deployment inside region Deploy to the use1 region first, both AZs in parallel, then the use2 region, both AZs in parallel: ![Two regions, two AZs each, parallel deployment inside region](/assets/img/pipeline-3.png) ``` aws-us-east-1c aws-use1-az4 aws-use2-az1 aws-use2-az3 ``` ## Deploy to a test instance first Deploy to a (downscaled) instance first, and add a delay before propagating to later instances and zones. 
![With a canary instance](/assets/img/canary-instance-one-app.png) ``` aws-use2-az1 aws-use2-az1 ```

### Deployment variants

[Deployment variants](deployment-variants.html) are useful to set up a downscaled instance. In [services.xml](../reference/applications/services/services.html), override settings per instance: ``` ```

## Test and prod instances as separate applications

In the previous section, we modeled the test and prod app as one pipeline. This lets users halt the pipeline (using the delay) before prod propagation. In some cases, this is better modeled as different applications:

- The CI pipeline is multistep, with approvals and use of different branches

The example below uses different _applications_ to model the flow; these are completely separate application instances. The application owner models the flow in their own tooling, and orchestrates deployments to Vespa Cloud as they see fit:

![canary app](/assets/img/canaryapp.png) ![prod app](/assets/img/prodapp.png)

The important point is that these are two _separate_ deploy commands to Vespa Cloud:

```
$ vespa config set application kkraunetenant1.canaryapp
$ vespa prod deploy app
```

```
aws-use2-az1
```

```
$ vespa config set application kkraunetenant1.prodapp
$ vespa prod deploy app
```

```
aws-use2-az1
```

## services.xml structure

It is possible to split _services.xml_ into multiple files using includes: ``` ```

Note: The include feature cannot be used in combination with [deployment variants](#deployment-variants).

## Next reads

- [Environments](environments.html)
- [Zones](zones.html)
- [Routing](endpoint-routing.html)

Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Two regions, two AZs each, sequenced deployment](#two-regions-two-azs-each-sequenced-deployment) - [Two regions, two AZs each, parallel deployment](#two-regions-two-azs-each-parallel-deployment) - [Two regions, two AZs each, parallel deployment inside region](#two-regions-two-azs-each-parallel-deployment-inside-region) - [Deploy to a test instance first](#deploy-to-a-test-instance-first) - [Deployment variants](#deployment-variants) - [Test and prod instances as separate applications](#test-and-prod-instances-as-separate-applications) - [services.xml structure](#servicesxml-structure) - [Next reads](#next-reads) --- # Source: https://docs.vespa.ai/en/operations/deployment-variants.html.md

# Instance, region, cloud and environment variants

Sometimes it is useful to create configuration that varies depending on properties of the deployment, for example to set region-specific endpoints of services used by [Searchers](/en/applications/searchers.html), or use smaller clusters for a "beta" instance. This is supported both for [services.xml](#services.xml-variants) and [query profiles](#query-profile-variants).

## services.xml variants

[services.xml](../reference/applications/services/services.html) files support different configuration settings for different _tags_, _instances_, _environments_, _clouds_ and _regions_. To use this, import the _deploy_ namespace: ``` ``` ``` ``` Deploy directives are used to specify with which tags, and in which instance, environment, cloud and/or [region](https://cloud.vespa.ai/en/reference/zones) an XML element should be included: ``` ``` 2 ``` ``` The example above configures different node counts/configurations depending on the deployment target. Deploying the application in the _dev_ environment gives: ``` ``` 2 ``` ``` Whereas in `aws-us-west-2a` it is: ``` ``` 2 ``` ``` This can be used to modify any config by deployment target.
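As an illustration of such deploy directives - the cluster id, document type, node counts and redundancy value below are assumptions, not values taken from this page's example - a _services.xml_ could look like:

```
<?xml version="1.0" encoding="utf-8" ?>
<!-- Illustrative sketch: ids, counts and redundancy are assumptions -->
<services version="1.0" xmlns:deploy="vespa">
    <content id="music" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="music" mode="index"/>
        </documents>
        <!-- Default node count, used when no more specific directive matches -->
        <nodes count="2"/>
        <!-- Larger cluster, only for this production region -->
        <nodes count="8" deploy:environment="prod" deploy:region="aws-us-west-2a"/>
    </content>
</services>
```

With such a setup, a deployment to `aws-us-west-2a` in the _prod_ environment uses the element carrying the matching deploy directives, while other targets fall back to the element without them.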
The `deploy` directives have a set of override rules:

- A directive specifying more conditions will override one specifying fewer.
- Directives are inherited in child elements.
- When multiple XML elements with the same name are specified (e.g. when specifying search or docproc chains), the _id_ attribute or the _idref_ attribute of the element is used together with the element name when applying directives.

Some overrides are applied by default in some environments, see [environments](/en/operations/environments.html). Any override made explicitly for an environment will override the defaults for it.

### Specifying multiple targets

More than one tag, instance, region or environment can be specified in the attribute, separated by space. Note that `tags` by default only apply in production instances, and are matched whenever the tags of the element and the tags of the instance intersect. To match tags in other environments, an explicit `deploy:environment` directive for that environment must also match. Use tags if you have a complex instance structure which you want config to vary by. The namespace can be applied to any element. Example:

```
Hello from application config Hello from east colo!
```

Above, the `container` element is configured for the 3 environments only (it will not apply to `dev`) - and in region `aws-us-east-1c`, the config is different.

## Query profile variants

[Query profiles](/en/querying/query-profiles.html) support different configuration settings for different _instances_, _environments_ and _regions_ through [query profile variants](/en/querying/query-profiles.html#query-profile-variants). This allows you to set different query parameters for a query type depending on these deployment attributes. To use this feature, create a regular query profile variant with any of `instance`, `environment` and `region` as dimension names and let your query profile vary by that. For example:

```
instance, environment, region My default value My beta value My dev value My main instance prod value
```

You can pick and combine these dimensions in any way you want with other dimensions sent as query parameters, e.g.:

```
device, instance, usecase
```

---

# Source: https://docs.vespa.ai/en/applications/deployment.html.md
# Source: https://docs.vespa.ai/en/reference/applications/deployment.html.md

# deployment.xml reference

_deployment.xml_ controls how an application is deployed. _deployment.xml_ is placed in the root of the [application package](../../basics/applications.html) and specifies which environments and regions the application is deployed to during [automated application deployment](../../operations/automated-deployments.html), and as which application instances. Deployment progresses through the `test` and `staging` environments to the `prod` environments listed in _deployment.xml_.

Simple example:

```
aws-us-east-1c aws-us-west-2a
```

More complex example:

```
aws-us-east-1c aws-us-east-1c aws-us-west-1c aws-eu-west-1a aws-us-west-2a aws-us-east-1c beta
```

Some of the elements can be declared _either_ under the `<deployment>` root, **or**, if one or more `<instance>` tags are listed, under these. These have a bold **or** when listing where they may be present.

## deployment

The root element.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| version | Yes | 1.0 |
| major-version | No | The major version number this application is valid for. |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). |

## instance

In `<deployment>` or `<parallel>` (which must be a direct descendant of the root). An instance of the application; several of these may be simultaneously deployed in the same zone. If no `<instance>` is specified, all children of the root are implicitly children of an `<instance>` with `id="default"`, as in the simple example at the top.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| id | Yes | The unique name of the instance. |
| tags | No | Space-separated tags which can be referenced to make [deployment variants](../../operations/deployment-variants.html). |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |

## block-change

In `<deployment>`, **or** `<instance>`. This blocks changes from being deployed to production in the matching time interval. Changes are nevertheless tested while blocked. By default, both application revision changes and Vespa platform changes (upgrades) are blocked. It is possible to block just one kind of change using the `revision` and `version` attributes. Any combination of the attributes below can be specified. Changes on a given date will be blocked if all conditions are met. Invalid `<block-change>` tags (i.e. tags that contain conditions that never match an actual date) are rejected by the system. This tag must be placed after any `<test>` and `<staging>` tags, and before `<prod>`. It can be declared multiple times.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| revision | No, default `true` | Set to `false` to allow application deployments |
| version | No, default `true` | Set to `false` to allow Vespa platform upgrades |
| days | No, default `mon-sun` | List of days this block is effective - a comma-separated list of single days or day intervals where the start and end day are separated by a dash and are inclusive. Each day is identified by its English name or three-letter abbreviation. |
| hours | No, default `0-23` | List of hours this block is effective - a comma-separated list of single hours or hour intervals where the start and end hour are separated by a dash and are inclusive. Each hour is identified by a number in the range 0 to 23. |
| time-zone | No, default UTC | The name of the time zone used to interpret the hours attribute. Time zones are full names or short forms, when the latter is unambiguous. See [ZoneId.of](https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-) for the full spec of acceptable values. |
| from-date | No | The inclusive starting date of this block (ISO-8601, `YYYY-MM-DD`). |
| to-date | No | The inclusive ending date of this block (ISO-8601, `YYYY-MM-DD`). |

The below example blocks all changes on weekends, and blocks revisions outside working hours, in the PST time zone:

```
```

The below example blocks:

- all changes on Sundays starting on 2022-03-01
- all changes in the hours 16-23 between 2022-02-10 and 2022-02-15
- all changes until 2022-01-05

```
```

## upgrade

In `<deployment>`, or `<instance>`. Determines the strategy for upgrading the application, or one of its instances. By default, application revision changes and Vespa platform changes are deployed separately. The exception is when an upgrade fails; then, the latest application revision is deployed together with the upgrade, as these may be necessary to fix the upgrade failure.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| rollout | No, default `separate` | - `separate` is the default. When a revision catches up to a platform upgrade, it stays behind, unless the upgrade alone fails. - `simultaneous` favors revision roll-out. When a revision catches up to a platform upgrade, it joins, and then passes the upgrade. |
| revision-target | No, default `latest` | - `latest` is the default. When rolling out a new revision to an instance, the latest available revision is chosen. - `next` trades speed for smaller changes. When rolling out a new revision to an instance, the next available revision is chosen. The available revisions for an instance are revisions which are not yet deployed, or revisions which have rolled out in previous instances. |
| revision-change | No, default `when-failing` | - `always` is the most aggressive setting. A new, available revision may always replace the one which is currently rolling out. - `when-failing` is the default. A new, available revision may replace the one which is currently rolling out if this is failing. - `when-clear` is the most conservative setting. A new, available revision may never replace one which is currently rolling out. Revision targets will never automatically change inside a [revision block window](#block-change), but may be set by manual intervention at any time. |
| max-risk | No, default `0` | May only be used with `revision-change="when-clear"` and `revision-target="next"`. The maximum amount of _risk_ to roll out per new revision target. The default of `0` results in the next build always being chosen, while a higher value allows skipping intermediate builds, as long as the cumulative risk does not exceed what is configured here. |
| min-risk | No, default `0` | Must be less than or equal to the configured `max-risk`. The minimum amount of _risk_ to start rolling out a new revision. The default of `0` results in a new revision rolling out as soon as anything is ready, while a higher value lets the system wait until enough cumulative risk is available. This can be used to avoid blocking a lengthy deployment process with trivial changes. |
| max-idle-hours | No, default `8` | May only be used when `min-risk` is specified, and greater than `0`. The maximum number of hours to wait for enough cumulative risk to be available, before rolling out a new revision. |

## test

Meaning depends on where it is located:

| Parent | Description |
| --- | --- |
| `<deployment>` `<instance>` | If present, the application is deployed to the [`test`](../../operations/environments.html#test) environment, and system tested there, even if no prod zones are deployed to. Also, when specified, system tests _must_ be present in the application test package. See guides for [getting to production](../../operations/production-deployment.html). If present in an `<instance>` element, system tests are run for that specific instance before any production deployments of the instance may proceed — otherwise, previous system tests for any instance are acceptable. |
| `<deployment>` `<instance>` `<prod>` | If present, production tests are run against the production region with id contained in this element. A test must be _after_ a corresponding [region](#region) element. When specified, production tests _must_ be present in the application test package. See guides for [getting to production](../../operations/production-deployment.html). |

| Attribute | Mandatory | Values |
| --- | --- | --- |
| cloud-account | No | For [system tests](../../operations/automated-deployments.html#system-tests) only: account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. Cloud account _must not_ be specified for [production tests](../../operations/automated-deployments.html#production-tests), which always run in the account of the corresponding deployment. |

## staging

In `<deployment>`, or `<instance>`. If present, the application is deployed to the [`staging`](../../operations/environments.html#staging) environment, and tested there, even if no prod zones are deployed to. If present in an `<instance>` element, staging tests are run for that specific instance before any production deployments of the instance may proceed — otherwise, previous staging tests for any instance are acceptable. When specified, staging tests _must_ be present in the application test package. See guides for [getting to production](../../operations/production-deployment.html).

| Attribute | Mandatory | Values |
| --- | --- | --- |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |

## prod

In `<deployment>`, **or** in `<instance>`. If present, the application is deployed to the production regions listed inside this element, under the specified instance, after deployments and tests in the `test` and `staging` environments.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |

## region

In `<prod>`, `<parallel>`, `<steps>`, or `<group>`. The application is deployed to the production [region](../../operations/zones.html) with id contained in this element.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| fraction | No | Only when this region is inside a group: The fractional membership in the group. |
| cloud-account | No | Account to deploy to with [Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |

## dev

In `<deployment>`. Optionally used to control deployment settings for the [dev environment](../../operations/environments.html). This can be used to specify a different cloud account, tags, and private endpoints.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| tags | No | Space-separated tags which can be referenced to make [deployment variants](../../operations/deployment-variants.html). |
| cloud-account | No | Account to deploy to with [Vespa Cloud Enclave](../../operations/enclave/enclave). Overrides parent's use of cloud-account. |

## delay

In `<deployment>`, `<instance>`, `<prod>`, `<parallel>`, or `<steps>`. Introduces a delay which must pass after completion of all previous steps, before subsequent steps may proceed. This may be useful to allow some grace time to discover errors before deploying a change in additional zones, or to gather higher-level metrics for a production deployment for a while, before evaluating these in a production test. The maximum total delay for the whole deployment spec is 48 hours. The delay is specified by any combination of the `hours`, `minutes` and `seconds` attributes.

## parallel

In `<deployment>`, `<instance>`, or `<prod>`. Runs the contained steps in parallel: instances if in `<deployment>`, or primitive steps (deployments, tests or delays) or a series of these (see [steps](#steps)) otherwise. Multiple `<parallel>` elements are permitted. The following example will deploy to `us-west-1` first, then to `us-east-3` and `us-central-1` simultaneously, and finally to `eu-west-1`, once both parallel deployments have completed:

```
us-west-1 us-east-3 us-central-1 eu-west-1
```

## steps

In `<parallel>`. Runs the contained parallel or primitive steps (deployments, tests or delays) serially. The following example will, in parallel:
1. deploy to `us-east-3`,
2. deploy to `us-west-1`, then delay 1 hour, and run tests for `us-west-1`, and
3. delay for two hours.

Thus, the parallel block is complete when both deployments are complete, tests are successful for the second deployment, and at least two hours have passed since the block began executing.

```
us-east-3 us-west-1 us-west-1
```

## tester

In `<test>`, `<staging>` and `<prod>`. Specifies container settings for the tester application container, which is used to run system, staging and production verification tests. The allowed elements inside this are [`<nodes>`](../applications/services/services.html#nodes).

```
```

## endpoints (global)

In `<deployment>`, without any `<instance>` declared, **or** in `<instance>`: This allows _global_ endpoints, via one or more [`<endpoint>`](#endpoint-global) elements; and [zone endpoint](#endpoint-zone) and [private endpoint](#endpoint-private) elements for cloud-native private network configuration.

## endpoints (dev)

In `<dev>`. This allows [zone endpoint](#endpoint-zone) elements for cloud-native private network configuration for [dev](../../operations/environments.html#dev) deployments. Note that [private endpoints](#endpoint-private) are only supported in `prod`.

## endpoint (global)

In `` or ``. Specifies a global endpoint for this application. Each endpoint will point to the regions that are declared in the endpoint. If no regions are specified, the endpoint defaults to the regions declared in the `<prod>` element. The following example creates a default endpoint to all regions, and a _us_ endpoint pointing only to US regions.

```
aws-us-east-1c aws-us-west-2a
```

| Attribute | Mandatory | Values |
| --- | --- | --- |
| id | No | The identifier for the endpoint. This will be part of the endpoint name that is generated. If not specified, the endpoint will be the default global endpoint for the application. |
| container-id | Yes | The id of the [container cluster](/en/reference/applications/services/container.html) to which requests to the global endpoint are forwarded. |

Global endpoints are implemented using Route 53 and healthchecks, to keep active zones in rotation. See [BCP](#bcp) for advanced configurations.

## endpoint (zone)

In `` or ``, with `type='zone'`. Used to disable public zone endpoints. _Non-public endpoints cannot be used in global endpoints, which require that all constituent endpoints are public._ The example disables the public zone endpoint for the `my-container` container cluster in all regions, except where it is explicitly enabled, in `region-1`. Changing endpoint visibility will make the service unavailable for a short period of time.

```
region-1
```

| Attribute | Mandatory | Values |
| --- | --- | --- |
| type | Yes | Zone endpoints are specified with `type='zone'`. |
| container-id | Yes | The id of the [container cluster](/en/reference/applications/services/container.html) to disable public endpoints for. |
| enabled | No | Whether a public endpoint for this container cluster should be enabled; default `true`. |

## endpoint (private)

In `` or ``, with `type='private'`. Specifies a private endpoint service for this application. Each service will be launched in the regions that are declared in the endpoint. If no regions are specified, the service is launched in all regions declared in the `<prod>` element that support any of the declared [access types](#allow). The following example creates a private endpoint in two specific regions.
```
aws-us-east-1c gcp-us-central1-f
```

| Attribute | Mandatory | Values |
| --- | --- | --- |
| type | Yes | Private endpoints are specified with `type='private'`. |
| container-id | Yes | The id of the [container cluster](/en/reference/applications/services/container.html) to which requests to the private endpoint service are forwarded. |
| auth-method | No | The authentication method to use with this [private endpoint](/en/operations/private-endpoints.html). Must be either `mtls` or `token`. Defaults to mTLS if not included. |

## allow

In `<endpoint>`. Allows a principal identified by the URN to set up a connection to the declared private endpoint service. This element must be repeated for each additional URN. An endpoint service will only consider allowed URNs of a compatible type, and will only be created if at least one compatible access type-and-URN is given:

- For AWS deployments, specify `aws-private-link`, and an _ARN_.
- For GCP deployments, specify `gcp-service-connect`, and a _project ID_.

```
```

| Attribute | Mandatory | Values |
| --- | --- | --- |
| with | Yes | The private endpoint access type; must be `aws-private-link` or `gcp-service-connect`. |
| arn | Maybe | Must be specified with `aws-private-link`. See [AWS documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html) for more details. |
| project | Maybe | Must be specified with `gcp-service-connect`. See [GCP documentation](https://cloud.google.com/vpc/docs/configure-private-service-connect-services) for more details. |

## bcp

In `<deployment>` or `<instance>`. Defines the BCP (Business Continuity Planning) structure of this instance: which zones should take over for which others during the outage of a zone, and how fast they must have the capacity ready. Autoscaling uses this information to decide the ideal cpu load of a zone. If this element is not defined, it is assumed that all regions cover for an equal share of the traffic of all other regions and must have that capacity ready at all times. If a bcp element is specified at the root, and explicit instances are used, that bcp element becomes the default for all instances that do not contain a bcp element themselves. If a BCP element contains no group elements, it will implicitly define a single group of all the regions of the instance in which it is used. See [BCP test](https://cloud.vespa.ai/en/reference/bcp-test.html) for a procedure to verify that your BCP configuration is correct.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| deadline | No | The max time after a region becomes unreachable until the other regions in its BCP group must be able to handle its traffic, given as a number followed by 'm', 'h' or 'd' (for minutes, hours or days). The default deadline is 0: Regions must at all times have capacity to handle BCP traffic immediately. By providing a deadline, autoscaling can avoid the cost of provisioning additional resources for BCP capacity if it predicts that it can grow to handle the traffic faster than the deadline in a given cluster. This is the default deadline to be used for all groups that don't specify one themselves. |

Example:

```
us-east1 us-east2 us-central1 us-west1 us-west2 us-central1
```

## group

In `<bcp>`. Defines a bcp group: A set of regions whose members cover for each other during a regional outage. Each region in a group will (as allowed, when autoscaling ranges are configured) provision resources sufficient to handle that any other single region in the group goes down.
The traffic of the region is assumed to be rerouted in equal amount to the remaining regions in the group. That is, if a group has one member, no resources will be provisioned to handle an outage in that member. If a group has two members, each will aim to provision sufficient resources to handle the actual traffic of the other. If a group has three members, each will provision to handle half of the traffic of whichever of the two other regions receives the most traffic. A region may have fractional membership in multiple groups, meaning it will handle just that fraction of the traffic of the remaining members, and vice versa. A region's total membership among groups must always sum to exactly 1. A group may also define global endpoints for the region members in the group. This is exactly the same as defining the endpoint separately and repeating the regions of the group under the endpoint. Endpoints under a group cannot contain explicit region sub-elements.

| Attribute | Mandatory | Values |
| --- | --- | --- |
| deadline | No | The deadline of this BCP group. See deadline on the BCP element. |

---

# Source: https://docs.vespa.ai/en/applications/developer-guide.html.md

# Developer Guide

This document explains how to develop applications, including basic terminology, tips on using the Vespa Cloud Console, and how to benchmark and size your application. See [deploy a sample application](../basics/deploy-an-application.html) to deploy a basic sample application, and [automated deployments](../operations/automated-deployments.html) on making production deployments safe, routine occurrences.

## Manual deployments

Developers will typically deploy their application to the `dev` [zone](../operations/zones.html) during development. Each deployment is owned by a _tenant_, and each specified _instance_ is a separate copy of the application; this lets developers work on independent copies of the same application, or collaborate on a shared one, as they prefer—more details [here](../learn/tenant-apps-instances.html). These values can be set in the Vespa Cloud UI when deploying, or with each of the build and deploy tools, as shown in the respective getting-started guides. Additionally, a deployment may specify a different [zone](../operations/zones.html) to deploy to, instead of the default `dev` zone.

### Auto downsizing

Deployments to `dev` are downscaled to one small node by default, so that applications can be deployed there without changing `services.xml`. See [performance testing](#performance-testing) for how to disable auto downsizing using `deploy:environment="dev"`.

### Availability

The `dev` zone is a sandbox and not for production serving; it has no uptime guarantees. An automated Vespa software upgrade can be triggered at any time, and this may lead to some downtime if you have only one node per cluster (as with the default [auto downsizing](#auto-downsizing)).
## Performance testing

For performance testing, to avoid auto downsizing, lock the [resources](../reference/applications/services/services.html) using `deploy:environment="dev"`:

```
```

Read more in [benchmarking](../performance/benchmarking-cloud.html) and [variants in services.xml](../operations/deployment-variants.html).

## Component overview

![Vespa Overview](/assets/img/vespa-overview.svg)

Application packages can contain Java components to be run in container clusters. The most common component types are:

- [Searchers](searchers.html), which can modify or build the query, modify the result, implement workflows issuing multiple queries etc.
- [Document processors](document-processors.html) that can modify incoming write operations.
- [Handlers](request-handlers.html) that can implement custom web service APIs.
- [Renderers](result-renderers.html) that are used to define custom result formats.

Components are constructed by dependency injection and are reloaded safely on deployment without restarts. See the [container documentation](containers.html) for more details. See [deploy an application having Java components](../basics/deploy-an-application-java.html), and [troubleshooting](../operations/self-managed/admin-procedures.html#troubleshooting).

## Developing Components

The development cycle consists of creating the component, deploying the application package to Vespa, writing tests, and iterating. These steps refer to files in [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java):

| Step | Description |
| --- | --- |
| Build | All the Vespa sample applications use the [bundle plugin](bundles.html#maven-bundle-plugin) to build the components. |
| Configure | A key Vespa feature is code and configuration consistency, deployed using an [application package](../basics/applications.html). This ensures that code and configuration are in sync, and loaded atomically when deployed. This is done by generating config classes from config definition files. In Vespa and application code, configuration is therefore accessed through generated config classes. The Maven target `generate-sources` (invoked by `mvn install`) uses [metal-names.def](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation-java/src/main/resources/configdefinitions/metal-names.def) to generate `target/generated-sources/vespa-configgen-plugin/com/mydomain/example/MetalNamesConfig.java`. After generating config classes, they will resolve in tools like [IntelliJ IDEA](https://www.jetbrains.com/idea/download/). |
| Tests | Example unit tests are found in [MetalSearcherTest.java](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation-java/src/test/java/ai/vespa/example/album/MetalSearcherTest.java). `testAddedOrTerm1` and `testAddedOrTerm2` illustrate two ways of doing the same test: the first sets up the minimal search chain for [YQL](../querying/query-language.html) programmatically; the second uses `com.yahoo.application.Application`, which sets up the application package and simplifies testing. Read more in [unit testing](unit-testing.html). |

## Debugging Components

**Important:** The debugging procedure only works for endpoints with an open debug port - most managed services don't do this for security reasons. Vespa Cloud does not allow debugging over the _Java Debug Wire Protocol (JDWP)_ due to the protocol's inherent lack of security measures.
If you need interactive debugging, deploy your application to a self-hosted Vespa installation (below) and manually [add the _JDWP_ agent to JVM options](#debugging-components).

You may debug your Java code by requesting either a JVM heap dump or a Java Flight Recorder recording through the [Vespa Cloud Console](https://console.vespa-cloud.com/). Go to your application's cluster overview and select _export JVM artifact_ on any _container_ node. The process will take up to a few minutes. You'll find the steps to download the dump on the Console once it's completed. Extract the files from the downloaded Zstandard-compressed archive, and use the free [JDK Mission Control](https://www.oracle.com/java/technologies/jdk-mission-control.html) utility to inspect the dump/recording.

![Generate JVM dump](/assets/img/jvm-dump.png)

To debug a [Searcher](searchers.html) / [Document Processor](document-processors.html) / [Component](components.html) running in a self-hosted container, set up a remote debugging configuration in the IDE - IntelliJ IDEA example:

1. Run -> Edit Configurations...
2. Click `+` to add a new configuration.
3. Select the "Remote JVM Debug" option in the left-most pane.
4. Set hostname to the host running the container, change the port if needed.
5. Set the container's [jvm options](../reference/applications/services/container.html#jvm) to the value in "Command line arguments for remote JVM":
```
```
6. Re-deploy the application, then restart Vespa on the node that runs the container. Make sure the port is published if using a Docker/Podman container, e.g.:
```
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 --publish 127.0.0.1:5005:5005 \
  vespaengine/vespa
```
7. Start debugging! Check _vespa.log_ for errors.

[![Video thumbnail](/assets/img/video-thumbs/deploying-a-vespa-searcher.png)](https://www.youtube.com/embed/dUCLKtNchuE)

**Vespa videos:** Find _Debugging a Vespa Searcher_ in the vespaengine [youtube channel](https://www.youtube.com/@vespaai)!

## Developing system and staging tests

When using Vespa Cloud, system and staging tests are most easily developed using a test deployment in a `dev` zone to run the tests against. Refer to the [general testing guide](testing.html) for a discussion of the different test types, and the [basic HTTP tests](../reference/applications/testing.html) or [Java JUnit tests](../reference/applications/testing-java.html) reference for how to write the relevant tests. If using the [Vespa CLI](../clients/vespa-cli.html) to deploy and run [basic HTTP tests](../reference/applications/testing.html), the same commands as in the test reference will just work, provided the CLI is configured to use the `cloud` target.

### Running Java tests

With Maven, and [Java JUnit tests](../reference/applications/testing-java.html), some additional configuration is required, to infuse the test runtime on the local machine with API and data plane credentials:

```
$ mvn test \
  -D test.categories=system \
  -D dataPlaneKeyFile=data-plane-private-key.pem -D dataPlaneCertificateFile=data-plane-public-cert.pem \
  -D apiKey="$API_KEY"
```

The `apiKey` is used to fetch the _dev_ instance's endpoints. The data plane key and certificate pair is used by [ai.vespa.hosted.cd.Endpoint](https://github.com/vespa-engine/vespa/blob/master/tenant-cd-api/src/main/java/ai/vespa/hosted/cd/Endpoint.java) to access the application endpoint.
Note that the `-D vespa.test.config` argument is gone; this configuration is automatically fetched from the Vespa Cloud API—hence the need for the API key.

When running Vespa self-hosted like in the [sample application](../basics/deploy-an-application-local.html), no authentication is required by default, to either API or container, and specifying a data plane key and certificate will instead cause the test to fail, since the correct SSL context is the Java default in this case. Make sure the TestRuntime is able to start: as it will initialize an SSL context, remove this configuration when running locally in order to use the default context - remove the properties from _pom.xml_ and from the IDE debug configuration.

Developers can also set these parameters in the IDE run configuration to debug system tests:

```
-D test.categories=system
-D tenant=my_tenant
-D application=my_app
-D instance=my_instance
-D apiKeyFile=/path/to/myname.mytenant.pem
-D dataPlaneCertificateFile=data-plane-public-cert.pem
-D dataPlaneKeyFile=data-plane-private-key.pem
```

## Tips and troubleshooting

- Vespa Cloud upgrades daily, and applications in `dev` also have their Vespa platform upgraded. This usually happens at the opposite time of day of when deployments are made to each instance, and takes some minutes. Deployments without redundancy will be unavailable during the upgrade.
- Failure to deploy, due to authentication (HTTP code 401) or authorization (HTTP code 403), is most often due to wrong configuration of `tenant` and/or `application` when using command line tools to deploy. Ensure the values set with Vespa CLI or in `pom.xml` match what is configured in the UI.
- In case of data plane failure, remember to copy the public certificate to `src/main/application/security/clients.pem` before building and deploying. This is handled by the Vespa CLI `vespa auth cert` command.
- To run Java [system and staging tests](../reference/applications/testing-java.html) in an IDE, ensure all API and data plane keys and certificates are configured in the IDE as well; not all IDEs pick up all settings from `pom.xml` correctly.

---

# Source: https://docs.vespa.ai/en/reference/operations/metrics/distributor.html.md

# Distributor Metrics

| Name | Unit | Description | | --- | --- | --- | | vds.idealstate.buckets\_rechecking | bucket | The number of buckets that we are rechecking for ideal state operations | | vds.idealstate.idealstate\_diff | bucket | A number representing the current difference from the ideal state.
This is a number that decreases steadily as the system is getting closer to the ideal state | | vds.idealstate.buckets\_toofewcopies | bucket | The number of buckets the distributor controls that have less than the desired redundancy | | vds.idealstate.buckets\_toomanycopies | bucket | The number of buckets the distributor controls that have more than the desired redundancy | | vds.idealstate.buckets | bucket | The number of buckets the distributor controls | | vds.idealstate.buckets\_notrusted | bucket | The number of buckets that have no trusted copies. | | vds.idealstate.bucket\_replicas\_moving\_out | bucket | Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority. | | vds.idealstate.bucket\_replicas\_copying\_out | bucket | Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data other nodes in a merge | | vds.idealstate.bucket\_replicas\_copying\_in | bucket | Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for | | vds.idealstate.bucket\_replicas\_syncing | bucket | Bucket replicas that need syncing due to mismatching metadata | | vds.idealstate.max\_observed\_time\_since\_last\_gc\_sec | second | Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. | | vds.idealstate.delete\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.delete\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.delete\_bucket.pending | operation | The number of operations pending | | vds.idealstate.delete\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.delete\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.merge\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.merge\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.merge\_bucket.pending | operation | The number of operations pending | | vds.idealstate.merge\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.merge\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.merge\_bucket.source\_only\_copy\_changed | operation | The number of merge operations where source-only copy changed | | vds.idealstate.merge\_bucket.source\_only\_copy\_delete\_blocked | operation | The number of merge operations where delete of unchanged source-only copies was blocked | | vds.idealstate.merge\_bucket.source\_only\_copy\_delete\_failed | operation | The number of merge operations where delete of unchanged source-only copies failed | | vds.idealstate.split\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.split\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.split\_bucket.pending | operation | The number of operations pending | | vds.idealstate.split\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.split\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.join\_bucket.done\_ok | 
operation | The number of operations successfully performed | | vds.idealstate.join\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.join\_bucket.pending | operation | The number of operations pending | | vds.idealstate.join\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.join\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.garbage\_collection.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.garbage\_collection.done\_failed | operation | The number of operations that failed | | vds.idealstate.garbage\_collection.pending | operation | The number of operations pending | | vds.idealstate.garbage\_collection.documents\_removed | document | Number of documents removed by GC operations | | vds.idealstate.garbage\_collection.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.garbage\_collection.throttled | operation | The number of operations throttled by throttling operation starter | | vds.distributor.puts.latency | millisecond | The latency of put operations | | vds.distributor.puts.ok | operation | The number of successful put operations performed | | vds.distributor.puts.failures.total | operation | Sum of all failures | | vds.distributor.puts.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.puts.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.puts.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.puts.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.puts.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.puts.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.puts.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.puts.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.puts.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.puts.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.puts.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.removes.latency | millisecond | The latency of remove operations | | vds.distributor.removes.ok | operation | The number of successful removes operations performed | | vds.distributor.removes.failures.total | operation | Sum of all failures | | vds.distributor.removes.failures.notfound | operation | The number of operations that failed because the document did not exist | | 
vds.distributor.removes.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.removes.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.removes.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.removes.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.removes.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.removes.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.removes.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.removes.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.removes.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.removes.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.updates.latency | millisecond | The latency of update operations | | vds.distributor.updates.ok | operation | The number of successful updates operations performed | | vds.distributor.updates.failures.total | operation | Sum of all failures | | vds.distributor.updates.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.updates.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.updates.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.updates.diverging\_timestamp\_updates | operation | Number of updates that report they were performed against divergent version timestamps on different replicas | | vds.distributor.updates.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.updates.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.updates.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.updates.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.updates.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.updates.failures.storagefailure | operation | The number of operations that failed in storage | | 
vds.distributor.updates.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.updates.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.updates.fast\_path\_restarts | operation | Number of safe path (write repair) updates that were restarted as fast path updates because all replicas returned documents with the same timestamp in the initial read phase | | vds.distributor.removelocations.ok | operation | The number of successful removelocations operations performed | | vds.distributor.removelocations.failures.total | operation | Sum of all failures | | vds.distributor.removelocations.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.removelocations.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.removelocations.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.removelocations.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.removelocations.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.removelocations.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.removelocations.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.removelocations.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.removelocations.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.removelocations.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.removelocations.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.removelocations.latency | millisecond | The average latency of removelocations operations | | vds.distributor.gets.latency | millisecond | The average latency of gets operations | | vds.distributor.gets.ok | operation | The number of successful gets operations performed | | vds.distributor.gets.failures.total | operation | Sum of all failures | | vds.distributor.gets.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.gets.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.gets.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.gets.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not 
found | | vds.distributor.gets.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.gets.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.gets.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.gets.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.gets.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.gets.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.gets.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.visitor.latency | millisecond | The average latency of visitor operations | | vds.distributor.visitor.ok | operation | The number of successful visitor operations performed | | vds.distributor.visitor.failures.total | operation | Sum of all failures | | vds.distributor.visitor.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.visitor.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.visitor.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.visitor.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.visitor.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.visitor.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.visitor.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.visitor.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.visitor.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.visitor.bytes\_per\_visitor | operation | The number of bytes visited on content nodes as part of a single client visitor command | | vds.distributor.visitor.docs\_per\_visitor | operation | The number of documents visited on content nodes as part of a single client visitor command | | vds.distributor.visitor.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.visitor.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.docsstored | document | Number of documents stored in all buckets controlled by this distributor 
| | vds.distributor.bytesstored | byte | Number of bytes stored in all buckets controlled by this distributor | | metricmanager.periodichooklatency | millisecond | Time in ms used to update a single periodic hook | | metricmanager.resetlatency | millisecond | Time in ms used to reset all metrics. | | metricmanager.sleeptime | millisecond | Time in ms worker thread is sleeping | | metricmanager.snapshothooklatency | millisecond | Time in ms used to update a single snapshot hook | | metricmanager.snapshotlatency | millisecond | Time in ms used to take a snapshot | | vds.distributor.activate\_cluster\_state\_processing\_time | millisecond | Elapsed time where the distributor thread is blocked on merging pending bucket info into its bucket database upon activating a cluster state | | vds.distributor.bucket\_db.memory\_usage.allocated\_bytes | byte | The number of allocated bytes | | vds.distributor.bucket\_db.memory\_usage.dead\_bytes | byte | The number of dead bytes (\<= used\_bytes) | | vds.distributor.bucket\_db.memory\_usage.onhold\_bytes | byte | The number of bytes on hold | | vds.distributor.bucket\_db.memory\_usage.used\_bytes | byte | The number of used bytes (\<= allocated\_bytes) | | vds.distributor.getbucketlists.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.getbucketlists.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.getbucketlists.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.getbucketlists.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.getbucketlists.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.getbucketlists.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.getbucketlists.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.getbucketlists.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.getbucketlists.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.getbucketlists.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.getbucketlists.failures.total | operation | Total number of failures | | vds.distributor.getbucketlists.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.getbucketlists.latency | millisecond | The average latency of getbucketlists operations | | vds.distributor.getbucketlists.ok | operation | The number of successful getbucketlists operations performed | | vds.distributor.recoverymodeschedulingtime | millisecond | Time spent scheduling operations in recovery mode after receiving new cluster state | | 
vds.distributor.set\_cluster\_state\_processing\_time | millisecond | Elapsed time where the distributor thread is blocked on processing its bucket database upon receiving a new cluster state | | vds.distributor.state\_transition\_time | millisecond | Time it takes to complete a cluster state transition. If a state transition is preempted before completing, its elapsed time is counted as part of the total time spent for the final, completed state transition | | vds.distributor.stats.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.stats.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.stats.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.stats.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.stats.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.stats.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.stats.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.stats.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.stats.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.stats.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.stats.failures.total | operation | The total number of failures | | vds.distributor.stats.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.stats.latency | millisecond | The average latency of stats operations | | vds.distributor.stats.ok | operation | The number of successful stats operations performed | | vds.distributor.update\_gets.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.update\_gets.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.update\_gets.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.update\_gets.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.update\_gets.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.update\_gets.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.update\_gets.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due 
to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.update\_gets.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.update\_gets.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.update\_gets.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.update\_gets.failures.total | operation | The total number of failures | | vds.distributor.update\_gets.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.update\_gets.latency | millisecond | The average latency of update\_gets operations | | vds.distributor.update\_gets.ok | operation | The number of successful update\_gets operations performed | | vds.distributor.update\_metadata\_gets.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.update\_metadata\_gets.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.update\_metadata\_gets.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.update\_metadata\_gets.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.update\_metadata\_gets.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.update\_metadata\_gets.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.update\_metadata\_gets.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.update\_metadata\_gets.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.update\_metadata\_gets.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.update\_metadata\_gets.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.update\_metadata\_gets.failures.total | operation | The total number of failures | | vds.distributor.update\_metadata\_gets.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.update\_metadata\_gets.latency | millisecond | The average latency of update\_metadata\_gets operations | | vds.distributor.update\_metadata\_gets.ok | operation | The number of successful update\_metadata\_gets operations performed | | vds.distributor.update\_puts.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.update\_puts.failures.concurrent\_mutations | operation | The number of operations that 
were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.update\_puts.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.update\_puts.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.update\_puts.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.update\_puts.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.update\_puts.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.update\_puts.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.update\_puts.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.update\_puts.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.update\_puts.failures.total | operation | The total number of put failures | | vds.distributor.update\_puts.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.update\_puts.latency | millisecond | The average latency of update\_puts operations | | vds.distributor.update\_puts.ok | operation | The number of successful update\_puts operations performed | | vds.distributor.mutating\_op\_memory\_usage | byte | Estimated amount of memory used by active mutating operations across all distributor stripes, in bytes | | vds.idealstate.nodes\_per\_merge | node | The number of nodes involved in a single merge operation. | | vds.idealstate.set\_bucket\_state.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.set\_bucket\_state.done\_failed | operation | The number of operations that failed | | vds.idealstate.set\_bucket\_state.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.set\_bucket\_state.pending | operation | The number of operations pending | | vds.idealstate.set\_bucket\_state.throttled | operation | The number of operations throttled by throttling operation starter | | vds.bouncer.clock\_skew\_aborts | operation | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range | Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/self-managed/docker-containers.html.md # Docker containers This document describes tuning and adaptions for running Vespa Docker containers, for developer use on laptop, and in production. ## Mounting persistent volumes The [quick start](../../basics/deploy-an-application-local.html) and [AWS ECS multinode](multinode-systems.html#aws-ecs) guides show how to run Vespa in Docker containers. In these examples, all the data is stored inside the container - the data is lost if the container is deleted. 
When running Vespa inside Docker containers in production, volume mappings to the parent host should be added to persist data and logs. - /opt/vespa/var - /opt/vespa/logs ``` $ mkdir -p /tmp/vespa/var; export VESPA_VAR_STORAGE=/tmp/vespa/var $ mkdir -p /tmp/vespa/logs; export VESPA_LOG_STORAGE=/tmp/vespa/logs $ docker run --detach --name vespa --hostname vespa-container \ --volume $VESPA_VAR_STORAGE:/opt/vespa/var \ --volume $VESPA_LOG_STORAGE:/opt/vespa/logs \ --publish 8080:8080 \ vespaengine/vespa ``` ## Start Vespa container with Vespa user You can start the container directly as the _vespa_ user. The _vespa_ user and group within the container are configured with user id _1000_ and group id _1000_. The vespa user and group must be the owner of the _/opt/vespa/var_ and _/opt/vespa/logs_ volumes that are mounted in the container for Vespa to start. This is required for Vespa to create the required directories and files within those directories. The start script will check that the correct owner uid and gid are set and fail if the wrong user or group is set as the owner. When using an isolated user namespace for the Vespa container, you must set the uid and gid of the directories on the host to the subordinate uid and gid, depending on your mapping. See the [Docker documentation](https://docs.docker.com/engine/security/userns-remap/) for more details. ``` $ mkdir -p /tmp/vespa/var; export VESPA_VAR_STORAGE=/tmp/vespa/var $ mkdir -p /tmp/vespa/logs; export VESPA_LOG_STORAGE=/tmp/vespa/logs $ sudo chown -R 1000:1000 $VESPA_VAR_STORAGE $VESPA_LOG_STORAGE $ docker run --detach --name vespa --user vespa:vespa --hostname vespa-container \ --volume $VESPA_VAR_STORAGE:/opt/vespa/var \ --volume $VESPA_LOG_STORAGE:/opt/vespa/logs \ --publish 8080:8080 \ vespaengine/vespa ``` ## System limits When Vespa starts inside Docker containers, the startup scripts will set [system limits](files-processes-and-ports.html#vespa-system-limits). Make sure that the environment starting the Docker engine is set up in such a way that these limits can be set inside the containers. For a CentOS/RHEL base host, Docker is usually started by [systemd](https://www.freedesktop.org/software/systemd/man/systemd.exec.html). In this case, `LimitNOFILE`, `LimitNPROC` and `LimitCORE` should be set to meet the minimum requirements in [system limits](files-processes-and-ports.html#vespa-system-limits). In general, when using Docker or Podman to run Vespa, the `--ulimit` option should be used to set limits according to [system limits](files-processes-and-ports.html#vespa-system-limits). The `--pids-limit` should be set to unlimited (`-1` for Docker and `0` for Podman). ## Transparent Huge Pages Vespa performance improves significantly by enabling [Transparent Huge Pages (THP)](https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html), especially for memory-intensive applications with large dense tensors with concurrent query and write workloads. One application improved query p99 latency from 950 ms to 150 ms during concurrent query and write by enabling THP. Using THP is even more important when running in virtualized environments like AWS and GCP due to nested page tables. When running Vespa using the container image, _THP_ settings must be set on the base host OS (Linux). 
The recommended settings are: ``` $ echo 1 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag $ echo always > /sys/kernel/mm/transparent_hugepage/enabled $ echo never > /sys/kernel/mm/transparent_hugepage/defrag ``` To verify that the setting is active, check that _AnonHugePages_ is non-zero. In this case, 75 GB has been allocated using AnonHugePages. ``` $ cat /proc/meminfo | grep AnonHuge AnonHugePages: 75986944 kB ``` Note that the Vespa container needs to be restarted after modifying the base host OS settings to make the changes effective. Vespa uses `MADV_HUGEPAGE` for memory allocations done by the [content node process (proton)](/en/content/proton.html). ## Controlling which services to start The Docker image _vespaengine/vespa_'s [start script](https://github.com/vespa-engine/docker-image/blob/master/include/start-container.sh) takes a parameter that controls which services are started inside the container. Starting a _configserver_ container: ``` $ docker run \ --env VESPA_CONFIGSERVERS= \ vespaengine/vespa configserver ``` Starting a _services_ container (configserver will not be started): ``` $ docker run \ --env VESPA_CONFIGSERVERS= \ vespaengine/vespa services ``` Starting a container with _both configserver and services_: ``` $ docker run \ --env VESPA_CONFIGSERVERS= \ vespaengine/vespa configserver,services ``` This is required in the case where the configserver container should run other services like an adminserver or logserver (see [services.html](/en/reference/applications/services/services.html)). If the [VESPA\_CONFIGSERVERS](files-processes-and-ports.html#environment-variables) environment variable is not specified, it will be set to the container hostname; also see [node setup](node-setup.html#hostname). Use the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application as a blueprint for how to set up config servers and services. ## Graceful stop Stopping a running _vespaengine/vespa_ container triggers a graceful shutdown, which saves time when starting the container again (i.e., data structures are flushed). If the container is shut down forcefully, the content nodes might need to restore the state from the transaction log, which might be time-consuming. There is no chance of data loss or data corruption as the data is always written and synced to persistent storage. The default timeout for the Docker daemon to wait for the shutdown might be too low for larger numbers of documents per node. The stop command below waits up to 120 seconds before terminating the running container forcefully; if the shutdown completes before the timeout has passed, the command takes less time than the timeout: ``` $ docker stop name -t 120 ``` It is also possible to configure the default Docker daemon timeout, see [--shutdown-timeout](https://docs.docker.com/reference/cli/dockerd/). A clean content node shutdown looks like: ``` [2025-05-02 10:07:52.052] EVENT searchnode proton.node.server stopping/1 name="storagenode" why="Stopped" [2025-05-02 10:07:52.056] EVENT searchnode proton stopping/1 name="servicelayer" why="clean shutdown" [2025-05-02 10:07:52.056] INFO searchnode proton.proton.server.rtchooks shutting down monitoring interface [2025-05-02 10:07:52.058] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8832 at 0.00 percent [2025-05-02 10:07:52.059] INFO searchnode proton.searchlib.docstore.logdatastore Flushing.
Disk bloat is now at 0 of 8832 at 0.00 percent [2025-05-02 10:07:52.060] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8840 at 0.00 percent [2025-05-02 10:07:52.066] INFO searchnode proton.transactionlog.server Stopping TLS [2025-05-02 10:07:52.066] INFO searchnode proton.transactionlog.server TLS Stopped [2025-05-02 10:07:52.071] EVENT searchnode proton stopping/1 name="proton" why="clean shutdown" [2025-05-02 10:07:52.078] EVENT config-sentinel sentinel.sentinel.service stopped/1 name="searchnode" pid=354 exitcode=0 ``` ## Memory The [sample applications](https://github.com/vespa-engine/sample-apps) and [local application deployment guide](../../basics/deploy-an-application-local.html) indicates the minimum memory requirements for the Docker containers. **Note:** Too little memory is a very common problem when testing Vespa in Docker containers. Use the below to troubleshoot before making a support request, and also see the [FAQ](../../learn/faq). As a rule of thumb, a single-node Vespa application requires a minimum of 4 GB for the Docker container. Using `docker stats` can be useful to track memory usage: ``` $ docker stats CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS 589bf5801b22 node0 213.25% 697.3MiB / 3.84GiB 17.73% 14.2kB / 11.5kB 617MB / 976MB 253 e108dde84679 node1 213.52% 492.7MiB / 3.84GiB 12.53% 15.7kB / 12.7kB 74.3MB / 924MB 252 be43aacd0bbb node2 191.22% 497.8MiB / 3.84GiB 12.66% 19.6kB / 21.6kB 64MB / 949MB 261 ``` It is not necessarily easy to verify that Vespa has started all services successfully. Symptoms of errors due to insufficient memory vary, depending on where it fails. Example: Inspect restart logs in a container named _vespa_, running the [quickstart](../../basics/deploy-an-application-local.html) with only 2G: ``` $ docker exec -it vespa sh -c "/opt/vespa/bin/vespa-logfmt -S config-sentinel -c sentinel.sentinel.service" INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 2.000 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 6.000 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 14.000 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 30.000 seconds INFO : config-sentinel sentinel.sentinel.service container: will delay start by 25.173 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 62.000 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 126.000 seconds INFO : config-sentinel sentinel.sentinel.service container: will delay start by 119.515 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 254.000 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 510.000 seconds INFO : config-sentinel sentinel.sentinel.service container: will delay start by 501.026 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 1022.000 seconds INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 1800.000 seconds INFO : config-sentinel sentinel.sentinel.service container: will delay start by 1793.142 seconds ``` Observe that the _container_ service restarts in a loop, with increasing pause. 
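If the cause is too little memory for the container, one remedy is to remove the container and start it again with a higher memory limit. A minimal sketch, reusing the volume mounts from the example above so data and logs survive the re-creation; the `6g` value is only an illustration - size it according to the rule of thumb above:

```
$ docker rm -f vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --memory 6g \
  --volume $VESPA_VAR_STORAGE:/opt/vespa/var \
  --volume $VESPA_LOG_STORAGE:/opt/vespa/logs \
  --publish 8080:8080 \
  vespaengine/vespa
```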
A common problem is [config servers](configuration-server.html) not starting or running properly due to a lack of memory. This manifests itself as nothing listening on 19071, or deployment failures. Some guides/sample applications have specific configurations to minimize resource usage. Example from [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA): ``` $ docker run --detach --name node0 --hostname node0.vespanet \ -e VESPA_CONFIGSERVERS=node0.vespanet,node1.vespanet,node2.vespanet \ -e VESPA_CONFIGSERVER_JVMARGS="-Xms32M -Xmx128M" \ -e VESPA_CONFIGPROXY_JVMARGS="-Xms32M -Xmx32M" \ --network vespanet \ --publish 19071:19071 --publish 19100:19100 --publish 19050:19050 --publish 20092:19092 \ vespaengine/vespa ``` Here [VESPA\_CONFIGSERVER\_JVMARGS](files-processes-and-ports.html#environment-variables) and [VESPA\_CONFIGPROXY\_JVMARGS](files-processes-and-ports.html#environment-variables) are tweaked to the minimum for a functional test only. **Important:** For production use, do not reduce memory settings in `VESPA_CONFIGSERVER_JVMARGS` and `VESPA_CONFIGPROXY_JVMARGS` unless you know what you are doing - the Vespa defaults are set for regular production use, and rarely need changing. Container memory settings are done in _services.xml_, example from [multinode-HA](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml): ``` \ ``` Make sure that the settings match the Docker container Vespa is running in. Also see [node memory settings](node-setup.html#memory-settings) for more settings. ## Network Vespa processes communicate over both fixed and ephemeral ports - in general, all ports must be accessible. See [example ephemeral use](../../writing/visiting.html#handshake-failed). Find an example application using a Docker network in [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA). ## Resource usage Note that CPU usage will not be zero even if there are zero documents and zero queries. Starting the _vespaengine/vespa_ container image means starting the [configuration server](configuration-server.html) and the [configuration sentinel](config-sentinel.html). When deploying an application, the sentinel starts the configured service processes, and they all listen for work to do, changes in the config, and so forth. Therefore, an "idle" container instance consumes CPU and memory. ## Troubleshooting The Vespa documentation examples use `docker`. The Vespa Team has good experience with using `podman`, too; in the examples, just change `docker` to `podman`. We recommend using Podman v5; see the [release notes](https://github.com/containers/podman/blob/main/RELEASE_NOTES.md). [emulating-docker-cli-with-podman](https://podman-desktop.io/docs/migrating-from-docker/emulating-docker-cli-with-podman) is a useful resource. Many startup failures are caused by a failed Vespa container start due to configuration or download errors. Use `docker logs vespa` to show the log (this example assumes a Docker container named `vespa`; use `docker ps` to list containers).
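To check whether the config server is listening on 19071, query its health endpoint from inside the container and inspect the most recent log lines. A minimal sketch, assuming a container named `vespa`:

```
$ docker exec vespa bash -c 'curl -s http://localhost:19071/state/v1/health'
$ docker logs vespa 2>&1 | tail -n 20
```

A healthy config server responds with a small JSON status document; if the request fails or hangs, look for configuration or memory errors in the log output.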
### Docker image Make sure to use a recent Vespa release (check [releases](https://factory.vespa.ai/releases)) and validate the downloaded image: ``` $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE docker.io/vespaengine/vespa latest 8cfb0da22c01 35 hours ago 1.2 GB ``` ### Model download failures If the application package depends on downloaded models, look for `RuntimeException: Not able to create config builder for payload` - [details](../../applications/components.html#component-load). Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Mounting persistent volumes](#mounting-persistent-volumes) - [Start Vespa container with Vespa user](#start-vespa-container-with-vespa-user) - [System limits](#system-limits) - [Transparent Huge Pages](#transparent-huge-pages) - [Controlling which services to start](#controlling-which-services-to-start) - [Graceful stop](#graceful-stop) - [Memory](#memory) - [Network](#network) - [Resource usage](#resource-usage) - [Troubleshooting](#troubleshooting) - [Docker image](#docker-image) - [Model download failures](#model-download-failures) --- # Source: https://docs.vespa.ai/en/reference/applications/services/docproc.html.md # services.xml - document-processing This is the [document-processing](../../../applications/document-processors.html)reference in [services.xml](services.html): ``` [container](container.html)document-processing [numnodesperclient, preferlocalnode, maxmessagesinqueue, maxqueuebytesize, maxqueuewait, maxconcurrentfactor, documentexpansionfactor, containercorememory][include](container.html#include)[documentprocessor [class, bundle, id, idref, provides, before, after]](#documentprocessor)[provides](#provides)[before](processing.html#before)[after](processing.html#after)[map](#map)[field [doctype, in-document, in-processor]](#map)[chain [name, id, idref, inherits, excludes, documentprocessors]](#chain)[map](#map)[field [doctype, in-document, in-processor]](#map)[inherits](processing.html#inherits)[chain](processing.html#chain)[exclude](processing.html#exclude)[documentprocessor [class, bundle, id, idref, provides, before, after]](#documentprocessor)[provides](#provides)[before](processing.html#before)[after](processing.html#after)[map](#map)[field [doctype, in-document, in-processor]](#map)[phase [id, idref, before, after]](processing.html#phase)[before](processing.html#before)[after](processing.html#after)[threadpool](#threadpool) ``` The root element of the _document-processing_ configuration model. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | numnodesperclient | optional | | | **Deprecated:** Ignored and deprecated, will be removed in Vespa 9. Set to some number below the amount of nodes in the cluster to limit how many nodes a single client can connect to. If you have many clients, this can reduce the memory usage on both document-processing and client nodes. | | preferlocalnode | optional | | false | **Deprecated:** Ignored and deprecated, will be removed in Vespa 9. Set to always prefer sending to a document-processing node running on the same host as the client. You should use this if you are running a client on each document-processing node. | | maxmessagesinqueue | | | | | | maxqueuebytesize | | | | **Deprecated:** Ignored and deprecated, will be removed in Vespa 9. | | maxqueuewait | optional | | | The maximum number of seconds a message should wait in queue before being processed. Docproc will adapt its queue size to adhere to this. 
If the queue is full, new messages will be replied to with SESSION\_BUSY. | | maxconcurrentfactor | | | | | | documentexpansionfactor | optional | | | | | containercorememory | | | | | ## Document Processor elements _documentprocessor_ elements are contained in [docproc chain elements](#chain)or in the _document-processing_ root. A documentprocessor element is either a document processor definition or document processor reference. The rest of this section deals with document processor definitions; document processor references are described in [docproc chain elements](#docproc-chain-elements). A documentprocessor definition causes the creation of exactly one document processor instance. This instance is set up according to the content of the documentprocessor element. A documentprocessor definition contained in a docproc chain element defines an_inner document processor_. Otherwise, it defines an _outer document processor._ For inner documentprocessors, the name must be unique inside the docproc chain. For outer documentprocessors, the component id must be unique. An inner documentprocessor is not permitted to have the same name as an outer documentprocessor. Optional sub-elements: - provides, a single name that should be added to the provides list - before, a single name that should be added to the before list - after, a single name that should be added to the after list - config (one or more) For more information on provides, before and after, see [Chained components](../../../applications/chaining.html). | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | class | | | | | | bundle | | | | | | id | required | | | The component id of the documentprocessor instance. | | idref | | | | | | provides | optional | | | A space-separated list of names that represents what this documentprocessor produces. | | before | optional | | | A space-separated list of phase or provided names. Phases or documentprocessors providing these names will be placed later in the docproc chain than this document processor. | | after | optional | | | A space-separated list of phase or provided names. Phases or documentprocessors providing these names will be placed earlier in the docproc chain than this document processor. | ### documentprocessor Defines a documentprocessor instance of a user specified class. ``` ... ``` | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | | | The component id of the documentprocessor instance. | | class | optional | | | A component specification containing the name of the class to instantiate to create the document processor instance. If missing, copied from id. | | bundle | optional | | | The bundle containing the class: The name in \ in pom.xml. If a bundle is not specified, the bundle containing document processors bundled with Vespa is used. | ## Docproc chain elements Specifies how a docproc chain should be instantiated, and how the contained document processors should be ordered. ### chain Contained in _document-processing_. Refer to the [chain reference](processing.html#chain). Chains can [inherit](processing.html#inherits) document processors from other chains and use [phases](processing.html#phase) for ordering. Optional sub-elements: - [documentprocessor element](#documentprocessor) (one or more), either a documentprocessor reference or documentprocessor definition. 
If the name given for a documentprocessor matches an _outer documentprocessor_, it is a _documentprocessor reference_ - otherwise, it is a _documentprocessor definition_. If it is a documentprocessor definition, it is also an implicit documentprocessor reference saying: use _exactly_ this documentprocessor. All these documentprocessor elements must have different name. - [phase](processing.html#phase) (one or more). - [config](../config-files.html#generic-configuration-in-services-xml) (one or more - will apply to all _inner_ documentprocessors in this docproc chain, unless overridden by individual inner documentprocessors). ## Map Set up a field name mapping from the name(s) of field(s) in the input documents to the names used in a deployed docproc. The purpose is to reuse functionality without changing the field names. The example below shows the configuration: ``` ``` In the example, a chain is deployed with 2 docprocs. For the chain, a mapping from _key_ to _id_ is set up. Imagine that some or all of the docprocs in the chain read and write to a field called _id_, but we want this functionality to the document field _key_. Furthermore, a similar thing is done for the `CityDocProc`: The docproc accesses the field_city_, whereas it's called _town_ in the feed. The mapping only applies to the document type _restaurant_. The `CarDocProc` accesses a field called _cyl_. In this example this is mapped to the field _cylinders_ of a struct _engine_using a dotted notation. If you specify mappings on different levels of the config (say both for a cluster and a docproc), the mapping closest to the actual docproc will take precedence. ## threadpool Available since Vespa 8.601.12 Specifies configuration for the thread pool used by document processor chains. All values scale with the number of vCPU—see the [container tuning example](../../../performance/container-tuning.html#container-worker-threads-example). When all workers are busy, new document processing requests are rejected. ### threads Number of worker threads per vCPU. Default value is `1`. The pool runs with `threads * vCPU` workers. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Document Processor elements](#document-processor-elements) - [documentprocessor](#documentprocessor) - [Docproc chain elements](#docproc-chain-elements) - [chain](#chain) - [Map](#map) - [threadpool](#threadpool) - [threads](#threadpool-threads) --- # Source: https://docs.vespa.ai/en/writing/document-api-guide.html.md # Document API This is an introduction to how to build and compile Vespa clients using the Document API. It can be used for feeding, updating and retrieving documents, or removing documents from the repository. See also the [Java reference](https://javadoc.io/doc/com.yahoo.vespa/documentapi). Use the [VESPA\_CONFIG\_SOURCES](../operations/self-managed/files-processes-and-ports.html#environment-variables) environment variable to set config servers to interface with. 
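For example, a command-line client can be pointed at a self-managed config server before it is started. A minimal sketch, where the host name is a placeholder, 19070 is the default config server RPC port, and the jar name stands in for your own build (`DocClient` is the example class shown below; see the linked reference for the exact value format):

```
$ export VESPA_CONFIG_SOURCES="tcp/cfg0.example.com:19070"
$ java -cp my-doc-client.jar DocClient
```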
The most common use case is using the async API in a [document processor](../applications/document-processors.html) - from the sample apps: - Async GET in [LyricsDocumentProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/document-processing/src/main/java/ai/vespa/example/album/LyricsDocumentProcessor.java) - Async UPDATE in [ReviewProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/src/main/java/ai/vespa/example/shopping/ReviewProcessor.java) ## Documents All data fed, indexed and searched in Vespa are instances of the `Document` class. A [document](../schemas/documents.html) is a composite object that consists of: - A `DocumentType` that defines the set of fields that can exist in a document. A document can only have a single _document type_, but document types can inherit the content of another. All fields of an inherited type is available in all its descendants. The document type is defined in the [schema](../reference/schemas/schemas.html), which is converted into a configuration file to be read by the `DocumentManager`. - A `DocumentId` which is a unique document identifier. The document distribution uses the document identifier, see the [reference](../content/buckets.html#distribution) for details. - A set of `(Field, FieldValue)` pairs, or "fields" for short. The `Field` class has methods for getting its name, data type and internal identifier. The field object for a given field name can be retrieved using the `getField()` method in the `DocumentType`. Use [DocumentAccess](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/DocumentAccess.html) javadoc. Sample app: ``` com.yahoo.vespa documentapi 8.634.24 ``` ``` import com.yahoo.document.DataType; import com.yahoo.document.Document; import com.yahoo.document.DocumentId; import com.yahoo.document.DocumentPut; import com.yahoo.document.DocumentType; import com.yahoo.document.DocumentUpdate; import com.yahoo.document.datatypes.StringFieldValue; import com.yahoo.document.datatypes.WeightedSet; import com.yahoo.document.update.FieldUpdate; import com.yahoo.documentapi.DocumentAccess; import com.yahoo.documentapi.SyncParameters; import com.yahoo.documentapi.SyncSession; public class DocClient { public static void main(String[] args) { // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc. 
DocumentAccess access = DocumentAccess.createForNonContainer(); DocumentType type = access.getDocumentTypeManager().getDocumentType("music"); DocumentId id = new DocumentId("id:namespace:music::0"); Document docIn = new Document(type, id); SyncSession session = access.createSyncSession(new SyncParameters.Builder().build()); // Put document with a1,1 WeightedSet wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING)); wset.put(new StringFieldValue("a1"), 1); docIn.setFieldValue("aWeightedset", wset); DocumentPut put = new DocumentPut(docIn); System.out.println(docIn.toJson()); session.put(put); // Update document with a1,10 and a2,20 DocumentUpdate upd1 = new DocumentUpdate(type, id); WeightedSet wset1 = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING)); wset1.put(new StringFieldValue("a1"), 10); wset1.put(new StringFieldValue("a2"), 20); upd1.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aWeightedset"), wset1)); System.out.println(upd1.toString()); session.update(upd1); Document docOut = session.get(id); System.out.println("document get:" + docOut.toJson()); session.destroy(); access.shutdown(); } } ``` To test using the [sample apps](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation), enable more ports for client to connect to config server and other processes on localhost - change docker command: ``` $ docker run --detach --name vespa --hostnamelocalhost--privileged \ --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps --publish 8080:8080 \--publish 19070:19070 --publish 19071:19071 --publish 19090:19090 --publish 19099:19099 --publish 19101:19101 --publish 19112:19112\ vespaengine/vespa ``` ## Fields Examples: ``` doc.setFieldValue("aByte", (byte)1); doc.setFieldValue("aInt", (int)1); doc.setFieldValue("aLong", (long)1); doc.setFieldValue("aFloat", 1.0); doc.setFieldValue("aDouble", 1.0); doc.setFieldValue("aBool", new BoolFieldValue(true)); doc.setFieldValue("aString", "Hello Field!"); doc.setFieldValue("unknownField", "Will not see me!"); Array intArray = new Array<>(doc.getField("aArray").getDataType()); intArray.add(new IntegerFieldValue(11)); intArray.add(new IntegerFieldValue(12)); doc.setFieldValue("aArray", intArray); Struct pos = PositionDataType.valueOf(1,2); pos = PositionDataType.fromString("N0.000002;E0.000001"); // two ways to set same value doc.setFieldValue("aPosition", pos); doc.setFieldValue("aPredicate", new PredicateFieldValue("aLong in [10..20]")); byte[] rawBytes = new byte[100]; for (int i = 0; i < rawBytes.length; i++) { rawBytes[i] = (byte)i; } doc.setFieldValue("aRaw", new Raw(ByteBuffer.wrap(rawBytes))); Tensor tensor = Tensor.Builder.of(TensorType.fromSpec("tensor>(x[2],y[2])")). cell().label("x", 0).label("y", 0).value(1.0). cell().label("x", 0).label("y", 1).value(2.0). cell().label("x", 1).label("y", 0).value(3.0). 
cell().label("x", 1).label("y", 1).value(5.0).build(); doc.setFieldValue("aTensor", new TensorFieldValue(tensor)); MapFieldValue map = new MapFieldValue<>(new MapDataType(DataType.STRING, DataType.STRING)); map.put(new StringFieldValue("key1"), new StringFieldValue("foo")); map.put(new StringFieldValue("key2"), new StringFieldValue("bar")); doc.setFieldValue("aMap", map); WeightedSet wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING)); wset.put(new StringFieldValue("strval 1"), 5); wset.put(new StringFieldValue("strval 2"), 10); doc.setFieldValue("aWeightedset", wset); ``` ## Document updates A document update is a request to modify a document, see [reads and writes](reads-and-writes.html). Primitive, and some multivalue fields (WeightedSet and Array\), are updated using a[FieldUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/update/FieldUpdate.html). Complex, multivalue fields like Map and Array\ are updated using[AddFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/AddFieldPathUpdate.html),[AssignFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/AssignFieldPathUpdate.html) and[RemoveFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/RemoveFieldPathUpdate.html).Field path updates are only supported on non-attribute[fields](../reference/schemas/schemas.html#field),[index](../reference/schemas/schemas.html#index) fields, or fields containing[struct field](../reference/schemas/schemas.html#struct-field) attributes. If a field is both an index field and an attribute, then the document is updated in the document store, the index is updated, but the attribute is not updated. Thus, you can get old values in document summary requests and old values being used in ranking and grouping. 
A [field path](../reference/schemas/document-field-path.html) string identifies fields to update - example: ``` upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "myMap{key2}", new StringFieldValue("abc"))); ``` _FieldUpdate_ examples: ``` // Simple assignment Field intField = type.getField("aInt"); IntegerFieldValue intFieldValue = new IntegerFieldValue(2); FieldUpdate assignUpdate = FieldUpdate.createAssign(intField, intFieldValue); upd.addFieldUpdate(assignUpdate); // Arithmetic FieldUpdate addUpdate = FieldUpdate.createIncrement(type.getField("aLong"), 3); upd.addFieldUpdate(addUpdate); // Composite - add one array element upd.addFieldUpdate(FieldUpdate.createAdd(type.getField("aArray"), new IntegerFieldValue(13))); // Composite - add two array elements upd.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aArray"), List.of(new IntegerFieldValue(14), new IntegerFieldValue(15)))); // Composite - add weightedset element upd.addFieldUpdate(FieldUpdate.createAdd(type.getField("aWeightedset"), new StringFieldValue("add_me"),101)); // Composite - add set to set WeightedSet wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING)); wset.put(new StringFieldValue("a1"), 3); wset.put(new StringFieldValue("a2"), 4); upd.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aWeightedset"), wset)); // Composite - update array element upd.addFieldUpdate(FieldUpdate.createMap(type.getField("aArray"), new IntegerFieldValue(1), // array index new AssignValueUpdate(new IntegerFieldValue(2)))); // value at index // Composite - increment weight upd3.addFieldUpdate(FieldUpdate.createIncrement(type.getField("aWeightedset"), new StringFieldValue("a1"), 1)); // Composite - add to set upd.addFieldUpdate(FieldUpdate.createMap(type.getField("aWeightedset"), new StringFieldValue("element1"), // value new AssignValueUpdate(new IntegerFieldValue(30)))); ``` _FieldPathUpdate_ examples: ``` // Add an element to a map Array stringArray = new Array(DataType.getArray(DataType.STRING)); stringArray.add(new StringFieldValue("my-val")); AddFieldPathUpdate addElement = new AddFieldPathUpdate(type, "aMap{key1}", stringArray); upd.addFieldPathUpdate(addElement); // Modify an element in a map upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "aMap{key2}", new StringFieldValue("abc"))); ``` ### Update reply semantics Sending in an update for which the system can not find a corresponding document to update is _not_ considered an error. These are returned with a successful status code (assuming that no actual error occurred during the update processing). Use[UpdateDocumentReply.wasFound()](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/UpdateResponse.html#wasFound()) to check if the update was known to have been applied. If the update returns with an error reply, the update _may or may not_ have been applied, depending on where in the platform stack the error occurred. ## Document Access The starting point of for passing documents and updates to Vespa is the `DocumentAccess` class. This is a singleton (see `get()` method) session factory (see `createXSession()` methods), that provides three distinct access types: - **Synchronous random access**: provided by the class `SyncSession`. Suitable for low-throughput proof-of-concept applications. - [**Asynchronous random access**](#asyncsession): provided by the class `AsyncSession`. It allows for document repository writes and random access with **high throughput**. 
- [**Visiting**](#visitorsession): provided by the class `VisitorSession`. Allows a set of documents to be accessed in order decided by the document repository, which gives higher read throughput than random access. ### AsyncSession This class represents a session for asynchronous access to a document repository. It is created by calling`myDocumentAccess.createAsyncSession(myAsyncSessionParams)`, and provides document repository writes and random access with high throughput. The usage pattern for an asynchronous session is like: 1. `put()`, `update()`, `get()` or `remove()` is invoked on the session, and it returns a synchronous `Result` object that indicates whether the request was successful or not. The `Result` object also contains a _request identifier_. 2. The client polls the session for a `Response` through its `getNext()` method. Any operation accepted by an asynchronous session will produce exactly one response within the configured timeout. 3. Once a response is available, it is matched to the request by inspecting the response's request identifier. The response may also contain data, either a retrieved document or a failed document put or update that needs to be handled. 4. Note that the client must process the response queue or your JVM will run into garbage collection issues, as the underlying session keeps track of all responses and unless they are consumed they will be kept alive and not be garbage collected. Example: ``` import com.yahoo.document.*; import com.yahoo.documentapi.*; public class MyClient { // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc. private final DocumentAccess access = DocumentAccess.createForNonContainer(); private final AsyncSession session = access.createAsyncSession(new AsyncParameters()); private boolean abort = false; private int numPending = 0; /** * Implements application entry point. * * @param args Command line arguments. */ public static void main(String[] args) { MyClient app = null; try { app = new MyClient(); app.run(); } catch (Exception e) { e.printStackTrace(); } finally { if (app != null) { app.shutdown(); } } if (app == null || app.abort) { System.exit(1); } } /** * This is the main entry point of the client. This method will not return until all available documents * have been fed and their responses have been returned, or something signaled an abort. */ public void run() { System.out.println("client started"); while (!abort) { flushResponseQueue(); Document doc = getNextDocument(); if (doc == null) { System.out.println("no more documents to put"); break; } System.out.println("sending doc " + doc); while (!abort) { Result res = session.put(doc); if (res.isSuccess()) { System.out.println("put has request id " + res.getRequestId()); ++numPending; break; // step to next doc. } else if (res.type() == Result.ResultType.TRANSIENT_ERROR) { System.out.println("send queue full, waiting for some response"); processNext(9999); } else { res.getError().printStackTrace(); abort = true; // this is a fatal error } } } if (!abort) { waitForPending(); } System.out.println("client stopped"); } /** * Shutdown the underlying api objects. */ public void shutdown() { System.out.println("shutting down document api"); session.destroy(); access.shutdown(); } /** * Returns the next document to feed to Vespa. This method should only return null when the end of the * document stream has been reached, as returning null terminates the client. 
This is the point at which * your application logic should block if it knows more documents will eventually become available. * * @return The next document to put, or null to terminate. */ public Document getNextDocument() { return null; // TODO: Implement at your discretion. } /** * Processes all immediately available responses. */ void flushResponseQueue() { System.out.println("flushing response queue"); while (processNext(0)) { // empty } } /** * Wait indefinitely for the responses of all sent operations to return. This method will only return * early if the abort flag is set. */ void waitForPending() { while (numPending != 0) { if (abort) { System.out.println("waiting aborted, " + numPending + " still pending"); break; } System.out.println("waiting for " + numPending + " responses"); processNext(9999); } } /** * Retrieves and processes the next response available from the underlying asynchronous session. If no * response becomes available within the given timeout, this method returns false. * * @param timeout The maximum number of seconds to wait for a response. * @return True if a response was processed, false otherwise. */ boolean processNext(int timeout) { Response res; try { res = session.getNext(timeout); } catch (InterruptedException e) { e.printStackTrace(); abort = true; return false; } if (res == null) { return false; } System.out.println("got response for request id " + res.getRequestId()); --numPending; if (!res.isSuccess()) { System.err.println(res.getTextMessage()); abort = true; return false; } return true; } } ``` ### VisitorSession This class represents a session for sequentially visiting documents with high throughput. A visitor is started when creating the `VisitorSession`through a call to `createVisitorSession`. A visitor target, that is a receiver of visitor data, can be created through a call to `createVisitorDestinationSession`. The `VisitorSession` is a receiver of visitor data. See [visiting reference](visiting.html) for details. The `VisitorSession`: - Controls the operation of the visiting process - Handles the data resulting from visiting data in the system Those two different tasks may be set up to be handled by a `VisitorControlHandler` and a `VisitorDataHandler` respectively. These handlers may be supplied to the `VisitorSession` in the `VisitorParameters` object, together with a set of other parameters for visiting. Example: To increase performance, let more separate visitor destinations handle visitor data, then specify the addresses to remote data handlers. The default `VisitorDataHandler` used by the `VisitorSession` returned from`DocumentAccess` is `VisitorDataQueue` which queues up incoming documents and implements a polling API. The documents can be extracted by calls to the session's `getNext()` methods and can be ack-ed by the `ack()` method. The default `VisitorControlHandler` can be accessed through the session's `getProgress()`,`isDone()`, and `waitUntilDone()` methods. Implement custom `VisitorControlHandler`and `VisitorDataHandler` by subclassing them and supplying these to the `VisitorParameters` object. The `VisitorParameters` object controls how and what data will be visited - refer to the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/VisitorParameters.html). Configure the[document selection](../reference/writing/document-selector-language.html) string to select what data to visit - the default is all data. 
You can specify what fields to return in a result by specifying a[fieldSet](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/VisitorParameters.html) - see [document field sets](../schemas/documents.html#fieldsets). Specifying only the fields you need may improve performance a lot, especially if you can make do with only in-memory fields or if you have large fields you don't need returned. Example: ``` import com.yahoo.document.Document; import com.yahoo.document.DocumentId; import com.yahoo.documentapi.DocumentAccess; import com.yahoo.documentapi.DumpVisitorDataHandler; import com.yahoo.documentapi.ProgressToken; import com.yahoo.documentapi.VisitorControlHandler; import com.yahoo.documentapi.VisitorParameters; import com.yahoo.documentapi.VisitorSession; import java.util.concurrent.TimeoutException; public class MyClient { public static void main(String[] args) throws Exception { VisitorParameters params = new VisitorParameters("true"); params.setLocalDataHandler(new DumpVisitorDataHandler() { @Override public void onDocument(Document doc, long timeStamp) { System.out.print(doc.toXML("")); } @Override public void onRemove(DocumentId id) { System.out.println("id=" + id); } }); params.setControlHandler(new VisitorControlHandler() { @Override public void onProgress(ProgressToken token) { System.err.format("%.1f %% finished.\n", token.percentFinished()); super.onProgress(token); } @Override public void onDone(CompletionCode code, String message) { System.err.println("Completed visitation, code " + code + ": " + message); super.onDone(code, message); } }); params.setRoute(args.length > 0 ? args[0] : "[Storage:cluster=storage;clusterconfigid=storage]"); params.setFieldSet(args.length > 1 ? args[1] : "[document]"); // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc. DocumentAccess access = DocumentAccess.createForNonContainer(); VisitorSession session = access.createVisitorSession(params); if (!session.waitUntilDone(0)) { throw new TimeoutException(); } session.destroy(); access.shutdown(); } } ``` The first optional argument to this client is the [route](document-routing.html) of the cluster to visit. The second is the [fieldset](../schemas/documents.html#fieldsets) set to retrieve. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Documents](#documents) - [Fields](#fields) - [Document updates](#document-updates) - [Update reply semantics](#update-reply-semantics) - [Document Access](#document-access) - [AsyncSession](#asyncsession) - [VisitorSession](#visitorsession) --- # Source: https://docs.vespa.ai/en/rag/document-enrichment.html.md # Document enrichment with LLMs Document enrichment enables automatic generation of document field values using large language models (LLMs) or custom code during feeding. It can be used to transform raw text into a more structured representation or expand it with additional contextual information. 
Examples of enrichment tasks include: - Extraction of named entities (e.g., names of people, organizations, locations, and products) for fuzzy matching and customized ranking - Categorization and tagging (e.g., sentiment and topic analysis) for filtering and faceting - Generation of relevant keywords, queries, and questions to improve search recall and search suggestions - Anonymization to remove personally identifiable information (PII) and reduction of bias in search results - Translation of content for multilingual search - LLM chunking These tasks are defined through prompts, which can be customized for a particular application. Generated fields are indexed and stored as normal fields and can be used for searching without additional latency associated with LLM inference. ## Setting up document enrichment components This section provides guidelines for configuring document enrichment, using the[LLM document enrichment sample app](https://github.com/vespa-engine/sample-apps/tree/master/field-generator) as an example. ### Defining generated fields Enrichments are defined in a schema using a [generate indexing expression](../reference/writing/indexing-language.html#generate). For example the following schema defines two [synthetic fields](../operations/reindexing.html) with `generate`: ``` schema passage { document passage { field id type string { indexing: summary | attribute } field text type string { indexing: summary | index index: enable-bm25 } } # Generate relevant questions to increase recall and search suggestions field questions type array { indexing: input text | generate questions_generator | summary | index index: enable-bm25 } # Extract named entities for fuzzy matching with ngrams field names type array { indexing: input text | generate names_extractor | summary | index match { gram gram-size: 3 } } } ``` Indexing statement `input text | generate questions_generator | summary | index` is interpreted as follows: 1. Take document field named `text` as an input 2. Pass the input to a field generator with id `questions_generator` 3. Store the output of the generator as summary 4. Index the output of the generator for lexical search Example of a document generated with this schema: ``` { "id": "71", "text": "Barley (Hordeum vulgare L.), a member of the grass family, is a major cereal grain. It was one of the first cultivated grains and is now grown widely. Barley grain is a staple in Tibetan cuisine and was eaten widely by peasants in Medieval Europe. Barley has also been used as animal fodder, as a source of fermentable material for beer and certain distilled beverages, and as a component of various health foods.", "questions": [ "What are the major uses of Barley (Hordeum vulgare L.) in different cultures and regions throughout history?", "How has the cultivation and consumption of Barley (Hordeum vulgare L.) evolved over time, from its initial cultivation to its present-day uses?", "What role has Barley (Hordeum vulgare L.) played in traditional Tibetan cuisine and Medieval European peasant diets?" ], "names": [ "Barley", "Hordeum vulgare L.", "Tibetan", "Medieval Europe" ] } ``` ### Configuring field generators A schema can contain multiple generated fields that use one or multiple field generators. All used field generators should be configured in `services.xml`, e.g. ``` ... ... local_llm Generate 3 questions relevant for this text: {input} openai files/names_extractor.txt ... ... 
``` All field generators must specify `` that references a language model client, which is either a local LLM, an OpenAI client or a custom component. In addition to the language model, field generators require a prompt. Prompts are constructed from three parts: 1. A prompt template, specified either inline inside `` or in a file within the application package with the path in ``. 2. Input from the indexing statement, e.g. `input text` where `text` is a document field name. 3. The output type of the field being generated. If neither `` nor `` are provided, the default prompt is set to the input part. When both are provided, `` has precedence. A prompt template must contain an `{input}` placeholder, which will be replaced with the input value. It is possible to combine several fields into one input by concatenating them into a single string, e.g. ``` input "title: " . title . " text: " . text | generate names_extractor | summary | index ``` A prompt template might also contain a `{jsonSchema}` placeholder, which will be replaced with a JSON schema based on the type of the field being generated; see the [structured output section](#structured-output) for details. Including a JSON schema in your prompt can help language models generate output in a specific format. However, it's important to understand that field generators already provide the JSON schema as a separate inference parameter to the underlying language model client. Both the local LLM and the OpenAI client use [structured output](#structured-output) functionality, which forces LLMs to produce outputs that conform to the schema. For this reason, explicitly including `{jsonSchema}` in your prompt template is unnecessary for most use cases. Structured output can be disabled by specifying `TEXT`. In this case, the generated field must have a `string` type. This is useful for very small models (less than a billion parameters) that struggle to generate structured output. For most use cases, it is recommended to use structured output even for `string` fields. The last parameter in the field generator configuration is ``, which specifies what to do when the output from the underlying language model can't be converted to the generated field type. This shouldn't happen when using structured output, but it can happen with the `TEXT` response format. The default value is `DISCARD`, which leaves the field empty (sets it to `null`). The other values, `WARN` and `FAIL`, log a warning and throw an exception, respectively. An overview of all field generator parameters is available in the [configuration definition file](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/resources/configdefinitions/language-model-field-generator.def). ## Configuring language models Field generators specify `` to reference a language model client to be used for generation, which is either a local LLM, an OpenAI client or a custom component. Configuration details for the local LLM and the OpenAI client are covered in the [local LLM](local-llms.html) and [OpenAI client](external-llms.html) documentation. This section focuses on configuration parameters that are important for document enrichment. Both the local LLM and the OpenAI client can be configured with different models. For efficient scaling of document enrichment, it is recommended to select the smallest model that delivers acceptable performance for the task at hand. In general, larger models produce better results but are more expensive and slower.
Document enrichment tasks such as information extraction, summarization, expansion and classification are often less complex than the problem-solving capabilities targeted by larger models. These tasks can be accomplished by smaller, cost-efficient models, such as [Microsoft Phi-3.5-mini](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) for a local model or [GPT-4o mini](https://platform.openai.com/docs/models/gpt-4o-mini) for the OpenAI API. Here is an example of an OpenAI client configured with the GPT-4o mini model:

```
... ... openai-key gpt-4o-mini ... ...
```

For the OpenAI client, model selection influences API cost and latency. In addition to the model, the local LLM client has several other parameters that are important for document enrichment performance. The following configuration is a good starting point:

```
... ... 5000 5 500 500 3 60000 60000 FAIL ... ...
```

There are three important aspects of this configuration in addition to the model used:

1. `model`, `contextSize` and `parallelRequests` determine the compute resources necessary to run the model.
2. `contextSize`, `parallelRequests`, `maxPromptTokens` and `maxTokens` should be configured to avoid context overflow - a situation where the context size is too small to process multiple parallel requests with the given number of prompt and completion tokens.
3. `maxQueueSize`, `maxEnqueueWait` and `maxQueueWait` manage the queue used for storing and feeding parallel requests into the LLM runtime (llama.cpp).

[Local LLMs documentation](local-llms.html) explains how to configure `model`, `contextSize` and `parallelRequests` with respect to the model and compute resources used. Memory usage (RAM or GPU VRAM) is especially important to consider when configuring these parameters. To avoid context overflow, configure the `contextSize`, `parallelRequests`, `maxPromptTokens` and `maxTokens` parameters so that `contextSize / parallelRequests >= maxPromptTokens + maxTokens`. For example, a context size of 5000 shared by 5 parallel requests leaves 1000 tokens per request, enough for 500 prompt tokens plus 500 completion tokens. Also consider that a larger `contextSize` takes longer to process. The queue-related parameters are used to balance latency against throughput. Values for these parameters depend heavily on the underlying compute resources. The local LLM configuration presented above is optimized for CPU nodes with 16 cores and 32 GB RAM, as well as GPU nodes with an NVIDIA T4 GPU (16 GB VRAM).

### Configuring compute resources

Provisioned compute resources only affect local LLM performance, as the OpenAI client merely calls a remote API that leverages the service provider's infrastructure. In practice, a GPU is highly recommended for running local LLMs, as it provides an order-of-magnitude speedup compared to CPU. For Vespa Cloud, a reasonable starting configuration is as follows:

```
... ... ... ...
```

This configuration provisions a container cluster with a single node containing an NVIDIA T4 GPU (16 GB VRAM). Local model throughput scales linearly with the number of nodes in the container cluster used for feeding. For example, with 8 GPU nodes and a throughput of 1.5 generations/second per node, the combined throughput will be close to 12 generations/second.

### Feeding configuration

Generated fields introduce considerable latency during feeding. A large number of high-latency parallel requests might lead to timeouts in the document processing pipeline. To avoid this, it is recommended to reduce the number of connections during feeding. A reasonable starting point is to use three connections per GPU node and one connection per CPU node.
Example for one GPU node:

```
vespa feed data.json --connections 3
```

## Structured output

Document enrichment generates field values based on the data types defined in a document schema. Both local LLMs and the OpenAI client support structured output, forcing LLMs to produce JSON that conforms to a specified schema. This JSON schema is automatically constructed by a field generator according to the data type of the field being created. For example, a JSON schema for `field questions type array<string>` in document `passage` will be as follows:

```
{
    "type": "object",
    "properties": {
        "passage.questions": {
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    },
    "required": [
        "passage.questions"
    ],
    "additionalProperties": false
}
```

Constructed schemas for different data types correspond to the [document JSON format](../reference/schemas/document-json-format.html) used for feeding. The following field types are supported:

- string
- bool
- int
- long
- byte
- float
- float16
- double
- array of the types mentioned above

Types that are not supported:

- map
- struct
- weightedset
- tensors
- references
- predicate
- position

## Custom field generator

As usual with Vespa, existing functionality can be extended by developing [custom application components](../applications/developer-guide.html). A custom generator component can be used to implement application-specific logic to construct prompts, transform and validate LLM inputs and outputs, combine outputs of several LLMs, or use other sources such as a knowledge graph. A custom field generator compatible with `generate` should implement the `com.yahoo.language.process.FieldGenerator` interface, whose `generate` method returns a field value. Here is a toy example of a custom field generator:

```
package ai.vespa.test;

import ai.vespa.llm.completion.Prompt;
import com.yahoo.document.datatypes.FieldValue;
import com.yahoo.document.datatypes.StringFieldValue;
import com.yahoo.language.process.FieldGenerator;

public class MockFieldGenerator implements FieldGenerator {

    private final MockFieldGeneratorConfig config;

    public MockFieldGenerator(MockFieldGeneratorConfig config) {
        this.config = config;
    }

    @Override
    public FieldValue generate(Prompt prompt, Context context) {
        var stringBuilder = new StringBuilder();
        for (int i = 0; i < config.repetitions(); i++) {
            stringBuilder.append(prompt.asString());
            if (i < config.repetitions() - 1) {
                stringBuilder.append(" ");
            }
        }
        return new StringFieldValue(stringBuilder.toString());
    }
}
```

The config definition for this component looks as follows:

```
namespace=ai.vespa.test
package=ai.vespa.test

repetitions int default=1
```

To be used with the `generate` indexing expression, this component should be added to `services.xml`:

```
... 2 ...
``` The last step is to use it in a document schema, e.g.: ``` schema passage { document passage { field text type string { indexing: summary | index index: enable-bm25 } } field mock_text type string { indexing: input text | generate mock_generator | summary } } ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Setting up document enrichment components](#setting-up-document-enrichment-components) - [Defining generated fields](#defining-generated-fields) - [Configuring field generators](#configuring-field-generators) - [Configuring language models](#configuring-language-models) - [Configuring compute resources](#configuring-compute-resources) - [Feeding configuration](#feeding-configuration) - [Structured output](#structured-output) - [Custom field generator](#custom-field-generator) --- # Source: https://docs.vespa.ai/en/reference/schemas/document-field-path.html.md # Document field path reference The field path syntax is used several places in Vespa to traverse documents through arrays, structs, maps and sets and generate a set of values matching the expression. Examples - If the document contains the field `mymap`, and it has a key `mykey`, the expression returns the value of the map for that key: ``` mymap{mykey} ``` Returns the value in index 3 of the `myarray` field, if set: ``` myarray[3] ``` Returns the value of the `value1` field in the struct field `mystruct`, if set: ``` mystruct.value1 ``` If mystructarray is an array field containing structs, returns the values of value1 for each of those structs: ``` mystructarray.value1 ``` The following syntax can be used for the different field types, and can be combined recursively as required: ## Maps/weighted Sets | \{\} | Retrieve the value of a specific key | | \{$\} | Retrieve all values, setting the [variable](#variables) to the key value for each | | \.key | Retrieve all key values | | \.value | Retrieve all values | | \ | Retrieve all keys | In the case of weighted sets, the value referenced above is the weight of the item. ## Array | \[\] | Retrieve the value in a specific index | | \[$\] | Retrieve all values in the array, setting the [variable](#variables) to the index of each | | \ | Retrieve all values in the array | ## Struct | \{.\} | Return the value of the struct field | | \ | Return the value of all subfields | Note that when specifying values of subscripts of maps, weighted sets and arrays, only primitive types (numbers and strings) may be used. ## Variables It can be useful to reference several field paths using a common variable. For instance, if you have an array of structs, you may want to use document selection on fields within the same array index together. This could be done by an expression like: ``` mydoctype.mystructarray{$x}.field1=="foo" AND mydoctype.mystructarray{$x}.field2=="bar" ``` Variables either have a `key` value (for maps and weighted sets), or an `index` value (for arrays). Variables cannot be used across such contexts (that is, a map key cannot be used to index into an array). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/reference/schemas/document-json-format.html.md # Document JSON format reference This document describes the JSON format used for sending document operations to Vespa. Field types are defined in the[schema reference](schemas.html#field). 
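For orientation, a complete put operation in this format looks like the following minimal sketch; the document type, ID and field values are illustrative and reappear in the examples later on this page:

```
{
    "put": "id:mynamespace:music::123",
    "fields": {
        "title": "Best of Bob Dylan",
        "sales": 999
    }
}
```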
This is a reference for: - JSON representation of [document operations](#document-operations) (put, get, remove, update) - JSON representation of [field types](#field-types) in Vespa documents - JSON representation of addressing fields for update, and [update operations](#update-operations) Also refer to [encoding troubleshooting](../../linguistics/troubleshooting-encoding.html). ``` [Document operations](#document-operations)[Put](#put)[Get](#get)[Remove](#remove)[Update](#update)[Test and set](#test-and-set)[Create](#create)[Field types](#field-types)[string](#string)[int](#int)[long](#long)[bool](#bool)[byte](#byte)[float](#float)[double](#double)[position](#position)[predicate](#predicate)[raw](#raw)[uri](#uri)[array](#array)[weightedset](#weightedset)[Tensors](#tensor)[Indexed tensors short form](#tensor-short-form-indexed)[Short form for tensors with a single mapped dimension](#tensor-short-form-mapped)[Mixed tensors short form](#tensor-short-form-mixed)[Cell values as binary data (hex dump format)](#tensor-hex-dump)[Tensor verbose form](#tensor-verbose-form)[struct](#struct)[map](#map)[reference](#reference)[Empty fields](#empty-fields)[Update operations](#update-operations)[assign](#assign)[Single value field](#single-value-field)[Assign tensor](#tensor-field)[Assign struct field](#(x[6]):[-1,0,17,-128,34,-2]`. For other cell types, it's possible to take the bits of the floating-point value, interpreted directly as an unsigned integer of appropriate width (16, 32, or 64 bits) and use the hex dump (respectively 4, 8, or 16 hex digits per cell) in a string. For "float" cells (32-bit IEE754 floating-point) a simple snippet for converting a cell could look like this: ``` ``` import struct def float_to_hex(f: float): return format(struct.unpack('=I', struct.pack('=f', f))[0], '08X') ``` ``` As an advanced combination example, if you have a tensor with type `tensor(tag{},x[3])` this input could be used, shown with corresponding output: ``` ``` "mixedtensor": { "foo": "3DE38E393E638E393EAAAAAB", "bar": "3EE38E393F0E38E43F2AAAAB", "baz": "3F471C723F638E393F800000" } "mixedtensor":{ "type":"tensor(tag{},x[3])", "blocks":{ "foo":[0.1111111119389534,0.2222222238779068,0.3333333432674408], "bar":[0.4444444477558136,0.5555555820465088,0.6666666865348816], "baz":[0.7777777910232544,0.8888888955116272,1.0] } } ``` ``` **Verbose:** [Tensor](../../ranking/tensor-user-guide.html) fields may be represented as an array of cells: ``` ``` "tensorfield": [ { "address": { "x": "a", "y": "0" }, "value": 2.0 }, { "address": { "x": "a", "y": "1" }, "value": 3.0 }, { "address": { "x": "b", "y": "0" }, "value": 4.0 }, { "address": { "x": "b", "y": "1" }, "value": 5.0 } ] ``` ``` This works for any tensor but is verbose, so shorter forms specific to various tensor types are also supported. Use the shortest form applicable to your tensor type for the best possible performance. The cells array can optionally be nested in an object under the key "cells". This is how tensor values are returned [by default](../api/document-v1.html#format.tensors), along with another key "type" containing the tensor type. 
| | struct | ``` ``` "mystruct": { "intfield": 123, "stringfield": "foo" } ``` ``` | | map | The JSON dictionary key must be a string, even if the map key type in the schema is not a string: ``` ``` "int_to_string_map": { "123": "foo", "456": "bar", "789": "foobar" } ``` ``` Feeding in an empty map ({}) for a field will have the same effect as not feeding a value for that field, and the field will not be rendered in the document API and in document summaries. | | reference | String with document ID referring to a [parent document](../../schemas/parent-child.html): ``` ``` "artist_ref": "id:mynamespace:artists::artist-1" ``` ``` | | | | ## Empty fields In general, fields that have not received a value during feeding will be ignored when rendering the document. They are considered as empty fields. However, certain field types have some values which causes them to be considered empty. For instance, the empty string ("") is considered empty, as well as the empty array ([]). See the above table for more information for each type. ## Document operations Refer to [reads and writes](../../writing/reads-and-writes.html) for details - alternatives: - Use the [Vespa CLI](../../clients/vespa-cli.html#documents). - [/document/v1/](../api/document-v1.html): This API accepts one operation per request, with the document ID encoded in the URL. - [Vespa feed client](../../clients/vespa-feed-client.html): Java APIs / command line tool to feed document operations asynchronously to Vespa, over HTTP. ### Put The "put" payload has a "put" operation and ["fields"](#field-types) containing field values; ([/document/v1/ example](../../writing/document-v1-api-guide.html#post)): ``` ``` { "put": "id:mynamespace:music::123", "fields": { "title": "Best of Bob Dylan" } } ``` ``` ### Get "get" does not have a payload - the response has the same "field" object as in "put", and also "id" and "pathId" fields ([/document/v1/ example](../../writing/document-v1-api-guide.html#get)): ``` ``` { "pathId": "/document/v1/mynamespace/music/docid/123", "id": "id:mynamespace:music::123", "fields": { "title": "Best of Bob Dylan" } } ``` ``` ### Remove The "remove" payload only has a "remove" operation ([/document/v1/ example](../../writing/document-v1-api-guide.html#delete)): ``` ``` { "remove": "id:mynamespace:music::123" } ``` ``` ### Update The "update" payload has an "update" operation and "fields". Note: Each field must contain an [update operation](#update-operations), not just the field value directly; ([/document/v1/ example](../../writing/document-v1-api-guide.html#put)): ``` ``` { "update": "id:mynamespace:music::123", "fields": { "title": { "assign": "The best of Bob Dylan" } } } ``` ``` Flags can be added to add a [test and set](#test-and-set) condition, or allow the update to [create](#create) a new document (a so-called "upsert" operation). #### Test and set An optional _condition_ can be added to operations to specify a _test and set_ condition - see [conditional writes](../../writing/document-v1-api-guide.html#conditional-writes). The value of the _condition_ is a [document selection](../writing/document-selector-language.html), encoded as a string. 
Example: Increment the _sales_ field only if it is already equal to 999 ([/document/v1/ example](../../writing/document-v1-api-guide.html#conditional-writes)): ``` ``` { "update": "id:mynamespace:music::bob/BestOf", "condition": "music.sales==999", "fields": { "sales": { "increment": 1 } } } ``` ``` **Note:** Use _documenttype.fieldname_ in the condition, not only _fieldname_. If the condition is not met, a 412 response code is returned. #### create (create if nonexistent) **Updates** to nonexistent documents are supported using _create_; ([/document/v1/ example](../../writing/document-v1-api-guide.html#create-if-nonexistent)): ``` ``` { "update": "id:mynamespace:music::bob/BestOf", "create": true, "fields": { "title": { "assign": "The best of Bob Dylan" } } } ``` ``` Since Vespa 8.178, _create_ can also be used together with conditional **Put** operations ([/document/v1/ example](../../writing/document-v1-api-guide.html#conditional-updates-and-puts-with-create) - review notes there before using): ``` ``` { "put": "id:mynamespace:music::123", "condition": "music.sales==999", "create": true, "fields": { "title": "Best of Bob Dylan" } } ``` ``` ## Update operations The update operations are: [`assign`](#assign), [`add`](#add), [`remove`](#composite-remove), [arithmetics](#arithmetic) (`increment` `decrement` `multiply` `divide`), [`match`](#match), [`modify`](#tensor-modify) ## assign `assign` is used to replace the value of a field (or an element of a collection) with a new value. When assigning, one can generally use the same syntax and structure as when feeding that field's value in a `put` operation. ### Single value field ``` field title type string { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:music::example", "fields": { "title": { "assign": "The best of Bob Dylan" } } } ``` ``` ### Tensor field ``` field tensorfield type tensor(x{},y{}) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "assign": { "cells": [ { "address": { "x": "a", "y": "b" }, "value": 2.0 }, { "address": { "x": "c", "y": "d" }, "value": 3.0 } ] } } } } ``` ``` This will fully replace the entire tensor stored in this field. ### Struct field #### Replacing all fields in a struct A full struct is replaced by assigning an object of struct key/value pairs. ``` struct person { field first_name type string {} field last_name type string {} } field contact type person { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:workers::example", "fields": { "contact": { "assign": { "first_name": "Bob", "last_name": "The Plumber" } } } } ``` ``` #### Individual struct fields Individual struct fields are updated using [field path](#fieldpath) syntax. Refer to the [reference](schemas.html#struct-name) for restrictions using structs. ``` ``` { "update": "id:mynamespace:workers::example", "fields": { "contact.first_name": { "assign": "Bob" }, "contact.last_name": { "assign": "The Plumber" } } } ``` ``` ### Map field Individual map entries can be updated using [field path](document-field-path.html) syntax. The following declaration defines a `map` where the `key` is an Integer and the value is a `person` struct. 
``` struct person { field first_name type string {} field last_name type string {} } field contact type map { indexing: summary } ``` Example updating part of an entry in the `contact` map: - `contact` is the name of the map field to be updated - `{0}` is the key that is going to be updated - `first_name` is the struct field to be updated inside the `person` struct ``` ``` { "update": "id:mynamespace:workers::example", "fields": { "contact{0}.first_name": { "assign": "John" } } } ``` ``` Assigning an element to a key in a map will insert the key/value mapping if it does not already exist, or overwrite it with the new value if it does exist. Refer to the [reference](schemas.html#map) for restrictions using maps. #### Map to primitive value ``` field my_food_scores type map { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:food::example", "fields": { "my_food_scores{Strawberries}": { "assign": "Delicious!" } } } ``` ``` #### Map to struct ``` struct contact_info { field phone_number type string {} field email type string {} } field contacts type map { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:people::d_duck", "fields": { "contacts{\"Uncle Scrooge\"}": { "assign": { "phone_number": "555-123-4567", "email": "number_one_dime_luvr1877@example.com" } } } } ``` ``` ### Array field #### Array of primitive values ``` field ingredients type array { indexing: summary } ``` Assign full array: ``` ``` { "update": "id:mynamespace:cakes:tasty_chocolate_cake", "fields": { "ingredients": { "assign": ["sugar", "butter", "vanilla", "flour"] } } } ``` ``` Assign existing elements in array: ``` ``` { "update": "id:mynamespace:cakes:tasty_chocolate_cake", "fields": { "ingredients[3]": { "assign": "2 cups of flour (editor's update: NOT asbestos!)" } } } ``` ``` Note that the index element 3 needs to exist. Alternative using match: ``` ``` { "update": "id:mynamespace:cakes:tasty_chocolate_cake", "fields": { "ingredients": { "match": { "element": 3, "assign": "2 cups of flour (editor's update: NOT asbestos!)" } } } } ``` ``` Individual array elements may be updated using [field path](document-field-path.html) or [match](#match) syntax. #### Array of struct Refer to the reference for restrictions using[array of structs](schemas.html#array). ``` struct person { field first_name type string {} field last_name type string {} } field people type array { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:students:example", "fields": { "people[34]": { "assign": { "first_name": "Bobby", "last_name": "Tables" } } } } ``` ``` Note that the element index needs to exist. Use [add](#add-array-elements) to add a new element. Alternative syntax using match: ``` ``` { "update": "id:mynamespace:students:example", "fields": { "people": { "match": { "element": 34, "assign": { "first_name": "Bobby", "last_name": "Tables" } } } } } ``` ``` ### Weighted set field Adding new elements to a weighted set can be done using [add](#add-weighted-set), or by assigning with `field{key}` syntax. 
Example of the latter: ``` field int_weighted_set type weightedset { indexing: summary } field string_weighted_set type weightedset { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:weightedsetdoctype::example1", "fields": { "int_weighted_set{123}": { "assign": 123 }, "int_weighted_set{456}": { "assign": 100 }, "string_weighted_set{\"item 1\"}": { "assign": 144 }, "string_weighted_set{\"item 2\"}": { "assign": 7 } } } ``` ``` Note that using the `field{key}` syntax for weighted sets _may_ be less efficient than using [add](#add-weighted-set). ### Clearing a field To clear a field, assign a `null` value to it. ``` ``` { "update": "id:mynamespace:music::example", "fields": { "title": { "assign": null } } } ``` ``` ## add `add` is used to add entries to arrays, weighted sets or to the mapped dimensions of tensors. ### Adding array elements The added entries are appended to the end of the array in the order specified. ``` field tracks type array { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf", "fields": { "tracks": { "add": [ "Lay Lady Lay", "Every Grain of Sand" ] } } } ``` ``` ### Add weighted set entries Add weighted set elements by using a JSON key/value syntax, where the value is the weight of the element. Adding a key/weight mapping that already exists will overwrite the existing weight with the new one. ``` field int_weighted_set type weightedset { indexing: summary } field string_weighted_set type weightedset { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:weightedsetdoctype::example1", "fields": { "int_weighted_set": { "add": { "123": 123, "456": 100 } }, "string_weighted_set": { "add": { "item 1": 144, "item 2": 7 } } } } ``` ``` ### Add tensor cells Add cells to mapped or mixed tensors. Invalid for tensors with only indexed dimensions. Adding a cell that already exists will overwrite the cell value with the new value. The address must be fully specified, but cells with bound indexed dimensions not specified will receive the default value of `0.0`. See system test[tensor add update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_add_remove_update)for more examples. ``` field tensorfield type tensor(x{},y[3]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "add": { "cells": [ { "address": { "x": "b", "y": "0" }, "value": 2.0 }, { "address": { "x": "b", "y": "1" }, "value": 3.0 } ] } } } } ``` ``` In this example, cell `{"x":"b","y":"2"}` will implicitly be set to 0.0. So if you started with the following tensor: ``` { {"x": "a", "y": "0"}: 0.2, {"x": "a", "y": "1"}: 0.3, {"x": "a", "y": "2"}: 0.5, } ``` You now end up with this tensor after the above add operation was applied: ``` { {"x": "a", "y": "0"}: 0.2, {"x": "a", "y": "1"}: 0.3, {"x": "a", "y": "2"}: 0.5, {"x": "b", "y": "0"}: 2.0, {"x": "b", "y": "1"}: 3.0, {"x": "b", "y": "2"}: 0.0, } ``` Prefer the _block short form_ for mixed tensors instead. This also avoids the problem where cells with indexed dimensions are not specified: ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "add": { "blocks": [ { "address": { "x": "b" }, "values": [2.0, 3.0, 5.0] } ] } } } } ``` ``` ## remove Remove elements from weighted sets, maps and tensors with `remove`. 
### Weighted set field ``` field string_weighted_set type weightedset { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:weightedsetdoctype::example1", "fields": { "string_weighted_set": { "remove": { "item 2": 0 } } } } ``` ``` ### Map field ``` field string_map type map { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:mapdoctype::example1", "fields": { "string_map{item 2}": { "remove": 0 } } } ``` ``` ### Tensor field Removes cells from mapped or mixed tensors. Invalid for tensors with only indexed dimensions. Only mapped dimensions should be specified for tensors with both mapped and indexed dimensions, as all indexed cells the mapped dimensions point to will be removed implicitly. See system test[tensor remove update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_add_remove_update)for more examples. ``` field tensorfield type tensor(x{},y[2]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "remove": { "addresses": [ {"x": "b"}, {"x": "c"} ] } } } } ``` ``` In this example, cells `{x:b,y:0},{x:b,y:1},{x:c,y:0},{x:c,y:1}` will be removed. It is also supported to specify only a subset of the mapped dimensions in the addresses. In that case, all cells that match the label values of the specified dimensions are removed. In the given example, all cells having label `b` for dimension `x` are removed. ``` field tensorfield type tensor(x{},y{},z[2]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "remove": { "addresses": [ {"x": "b"} ] } } } } ``` ``` ## Arithmetic The four arithmetic operators `increment`, `decrement`,`multiply` and `divide` are used to modify _single value_ numeric values without having to look up the current value before applying the update. Example: ``` field sales type int { indexing: summary | attribute } ``` ``` ``` { "update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf", "fields": { "sales": { "increment": 1 } } } ``` ``` ## match If an arithmetic operation is to be done for a specific key in a _weighted set or array_, use the `match` operation: ``` field track_popularity type weightedset { indexing: summary | attribute } ``` ``` ``` { "update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf", "fields": { "track_popularity": { "match": { "element": "Lay Lady Lay", "increment": 1 } } } } ``` ``` In other words, for the weighted set "track\_popularity",`match` the element "Lay Lady Lay", then `increment` its weight by 1. See the [weightedset properties](schemas.html#weightedset-properties)reference for how to make incrementing a non-existing key trigger auto-create of the key. If the updated field is an array, the `element` value would be a positive integer. **Note:** Only oneelement can be matched per operation. ## Modify tensors Individual cells in tensors can be modified using the `modify` update. The cells are modified according to the given operation: - `replace` - replaces a single cell value - `add` - adds a value to the existing cell value - `multiply` - multiples a value with the existing cell value The addresses of cells must be fully specified. If the cell does not exist, the update for that cell will be ignored. Use `"create": true` (see example below) to create non-existing cells before the modify update is applied. 
See system test[tensor modify update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_modify_update)for more examples. ``` field tensorfield type tensor(x[3]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "modify": { "operation": "replace", "addresses": [ { "address": { "x": "1" }, "value": 7.0 }, { "address": { "x": "2" }, "value": 8.0 } ] } } } } ``` ``` In this example, cell `{"x":"1"}` is replaced with value 7.0 and `{"x":"2"}` with value 8.0. If operation `add` or `multiply` was used instead, 7.0 and 8.0 would be added or multiplied to the current values of cells `{"x":"1"}` and `{"x":"2"}`. For tensors with a single mapped dimension the _cells short form_ can also be used: ``` field tensorfield type tensor(x{}) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "modify": { "operation": "add", "create": true, "cells": { "b": 5.0, "c": 6.0 } } } } } ``` ``` In this example, 5.0 is added to cell `{"x":"b"}` and 6.0 is added to cell `{"x":"c"}`. With `"create": true` non-existing cells in the input tensor are created before applying the modify update. The default cell value is 0.0 for `replace` and `add`, and 1.0 for `multiply`. This means a non-existing cell ends up with the value specified in the operation. For mixed tensors the _block short form_ can also be used to modify entire dense subspaces: ``` field tensorfield type tensor(x{},y[3]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "modify": { "operation": "replace", "blocks": { "a": [1,2,3], "b": [4,5,6] } } } } } ``` ``` ## Fieldpath Fieldpath is for accessing fields within composite structures - for structures that are not part of index or attribute, it is possible to access elements directly using fieldpaths. This is done by adding more information to the field value. For map structures, specify the key (see [example](#assign)). ``` mymap{mykey} ``` and then do operation on the element which is keyed by "mykey". Arrays can be accessed as well (see [details](#assign)). ``` myarray[3] ``` And this is also true for structs (see [details](#assign)).**Note:** Struct updates do not work for[index](../applications/services/content.html#document) mode: ``` mystruct.value1 ``` This also works for nested structures, e.g. a `map` of `map` to `array` of `struct`: ``` ``` { "update": "id:mynamespace:complexdoctype::foo", "fields": { "nested_structure{firstMapKey}{secondMapKey}[4].title": { "assign": "Look at me, mom! I'm hiding deep in a nested type!" 
} } } ``` ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Field types](#field-types) - [Empty fields](#empty-fields) - [Document operations](#document-operations) - [Put](#put) - [Get](#get) - [Remove](#remove) - [Update](#update) - [Update operations](#update-operations) - [assign](#assign) - [Single value field](#single-value-field) - [Tensor field](#tensor-field) - [Map field](#assign-map-field) - [Array field](#array-field) - [Weighted set field](#weighted-set-field) - [Clearing a field](#clearing-a-field) - [add](#add) - [Adding array elements](#add-array-elements) - [Add weighted set entries](#add-weighted-set) - [Add tensor cells](#tensor-add) - [remove](#composite-remove) - [Weighted set field](#weighted-set-field-remove) - [Map field](#map-field-remove) - [Tensor field](#tensor-remove) - [Arithmetic](#arithmetic) - [match](#match) - [Modify tensors](#tensor-modify) - [Fieldpath](#fieldpath) --- # Source: https://docs.vespa.ai/en/applications/document-processors.html.md # Document processors This document describes how to develop and deploy _Document Processors_, often called _docproc_ in this documentation. Document processing is a framework to create [chains](chaining.html) of configurable [components](components.html) that read and modify document operations. The input source splits the input data into logical units called [documents](../schemas/documents.html). A [feeder application](../writing/reads-and-writes.html) sends the documents into a document processing chain. This chain is an ordered list of document processors. Document processing examples range from language detection, HTML removal and natural language processing to mail attachment processing, character set transcoding and image thumbnailing. At the end of the processing chain, extracted data will typically be set in some fields in the document. The motivation for document processing is that code and configuration are deployed atomically, like all Vespa components. It is also easy to build components that access data in Vespa as part of processing. To get started, see the [sample application](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing). Read [indexing](../writing/indexing.html) to understand deployment and routing. As document processors are chained components just like Searchers, read [Searcher Development](searchers.html). For reference, see the [Javadoc](https://javadoc.io/doc/com.yahoo.vespa/docproc) and [services.xml](../reference/applications/services/docproc.html). ![Document Processing component in Vespa overview](/assets/img/vespa-overview-docproc.svg) ## Deploying a Document Processor Refer to [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) to get started; [LyricsDocumentProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/document-processing/src/main/java/ai/vespa/example/album/LyricsDocumentProcessor.java) is an example document processor. Add the document processor in [services.xml](../reference/applications/services/docproc.html), and then add it to a [chain](#chains). The type of processing done by the processor dictates what chain it should be part of: - If it does general data-processing, such as populating some document fields from others, looking up data in external services, etc., it should be added to a general docproc chain.
- If, and only if, it does processing required for _indexing_ - or requires this to have already been run — it should be added to a chain which inherits the _indexing_ chain, and which is used for indexing by a content cluster. An example that adds a general document processor to the "default" chain, and an indexing related processor to the chain for a particular content cluster: ``` \ \ ...\ ``` The "default" chain, if it exists, is run by default, before the chain used for indexing. The default indexing chain is called "indexing", and _must_ be inherited by any chain that is to replace it. To run through any chain, specify a [route](../writing/document-routing.html) which includes the chain. For example, the route `default/chain.my-chain indexing` would route feed operations through the chain "my-chain" in the "default" container cluster, and then to the "indexing" hop, which resolves to the specified indexing chain for each content cluster the document should be sent to. More details can be found in [indexing](../writing/document-routing.html#document-processing): ## Document Processors A document processor is a component extending `com.yahoo.docproc.DocumentProcessor`. All document processors must implement `process()`: ``` public Progress process(Processing processing); ``` When the container receives a document operation, it will create a new `Processing`, and add the `DocumentPut`s, `DocumentUpdate`s or `DocumentRemove`s to the `List` accessible through `Processing.getDocumentOperations()`. The latter is useful also where a processing should be stopped by doing `Processing.getDocumentOperations().clear()` before `Progress.DONE`, say for blocklist use, to stop a `DocumentPut/Update`. Furthermore, the call stack of the document processing chain in question will be _copied_ to `Processing.callStack()`, so that document processors may freely modify the flow of control for this processing without affecting all other processings going on. After creation, the `Processing` is added to an internal queue. A worker thread will retrieve a `Processing` from the input queue, and run its document operations through its call stack. A minimal, no-op document processor implementation is thus: ``` ``` import com.yahoo.docproc.*; public class SimpleDocumentProcessor extends DocumentProcessor { public Progress process(Processing processing) { return Progress.DONE; } } ``` ``` The `process()` method should loop through all document operations in `Processing.getDocumentOperations()`, do whatever it sees fit to them, and return a `Progress`: ``` ``` public Progress process(Processing processing) { for (DocumentOperation op : processing.getDocumentOperations()) { if (op instanceof DocumentPut) { DocumentPut put = (DocumentPut) op; // TODO do something to 'put here } else if (op instanceof DocumentUpdate) { DocumentUpdate update = (DocumentUpdate) op; // TODO do something to 'update' here } else if (op instanceof DocumentRemove) { DocumentRemove remove = (DocumentRemove) op; // TODO do something to 'remove' here } } return Progress.DONE; } ``` ``` | Return code | Description | | --- | --- | | `Progress.DONE` | Returned if a document processor has successfully processed a `Processing`. | | `Progress.FAILED` | Processing failed and the input message should return a _fatal_ failure back to the feeding application, meaning that this application will not try to re-feed this document operation. 
Return an error message/reason by calling `withReason()`: This result is represented as a `500 Internal Server Error` response in [Document v1](../writing/document-v1-api-guide.html). ``` ``` if (op instanceof DocumentPut) { return Progress.FAILED.withReason("PUT is not supported"); } ``` ``` | | `Progress.INVALID_INPUT` | Available since 8.584. Processing failed due to invalid input, like a malformed document operation. This result is represented as a `400 Bad Request` response in [Document v1](../writing/document-v1-api-guide.html). | | `Progress.LATER` | See [execution model](#execution-model). The document processor wants to release the calling thread and be called again later. This is useful if e.g. calling an external service with high latency. The document processor may then save its state in the `Processing` and resume when called again later. There are no guarantees as to _when_ the processor is called again with this `Processing`; it is simply appended to the back of the input queue. By the use of `Progress.LATER`, this is an asynchronous model, where the processing of a document operation does not need to consume one thread for its entire lifespan. Note, however, that the document processors themselves are shared between all processing operations in a chain, and must thus be implemented [thread-safe](#state). | | Exception | Description | | --- | --- | | `com.yahoo.docproc.TransientFailureException` | Processing failed and the input message should return a _transient_ failure back to the feeding application, meaning that this application _may_ try to re-feed this document operation. | | `RuntimeException` | Throwing any other `RuntimeException` means same behavior as for `Progress.FAILED`. | ## Chains The call stack mentioned above is another name for a _document processor chain_. Document processor chains are a special case of the general [component chains](chaining.html) - to avoid confusion some concepts are explained here as well. A document processor chain is nothing more than a list of document processor instances, having an id, and represented as a stack. The document processor chains are typically not created for every processing, but are part of the configuration. Multiple ones may exist at the same time, the chain to execute will be specified by the message bus destination of the incoming message. The same document processor instance may exist in multiple document processor chains, which is why the `CallStack` of the `Processing` is responsible for knowing the next document processor to invoke in a particular message. The execution order of the document processors in a chain are not ordered explicitly, but by [ordering constraints](chaining.html#ordering-components) declared in the document processors or their configuration. ## Execution model The Document Processing Framework works like this: 1. A thread from the message bus layer appends an incoming message to an internal priority queue, shared between all document processing chains configured on a node. The priority is set based on the message bus priority of the message. Messages of the same priority are ordered FIFO. 2. One worker thread from the docproc thread pool picks one message from the head of the queue, deserializes it, copies the call stack (chain) in question, and runs it through the document processors. 3. Processing finishes if **(a)** the document(s) has passed successfully through the whole chain, or **(b)** a document processor in the chain has returned `Progress.FAILED` or thrown an exception. 4. 
The same thread passes the message on to the message bus layer for further transport on to its destination. There is a single instance of each document processor chain. In every chain, there is a single instance of each document processor - unless a chain is configured with multiple, identical document processors - this is a rare case. As is evident from the model above, multiple worker threads execute the document processors in a chain concurrently. Thus, many threads of execution can be going through `process()` in a document processor, at the same time. This model places an important constraint on document processor classes: _instance variables are not safe._ They must be eliminated, or made thread-safe somehow. Also see [Resource management](components.html#resource-management), use `deconstruct()` in order to not leak resources. ### Asynchronous execution The execution model outlined above also shows one important restriction: If a document processor performs any high-latency operation in its process() method, a docproc worker thread will be occupied. With all _n_ worker threads blocking on an external resource, throughput will be limited. This can be fixed by saving the state in the Processing object, and returning `Progress.LATER`. A document processor doing a high-latency operation should use a pattern like this: 1. Check a self-defined context variable in Processing for status. Basically, _have we seen this Processing before?_ 2. If no: 1. We have been given a Processing object fresh off the network, we have not seen this before. Process it up until the high-latency operation. 2. Start the high-latency operation (possibly in a separate thread). 3. Save the state of the operation in a self-defined context variable in the Processing. 4. Return `Progress.LATER`. This Processing is the appended to the back of the input queue, and we will be called again later. 3. If yes: 1. Retrieve the reference that we set in our self-defined context variable in Processing. 2. Is the high-latency operation done? If so, return `Progress.DONE`. 3. Is it not yet done? Return `Progress.LATER` again. As is evident, this will let the finite set of document processing threads to do more work at the same time. ## State Any state in the document processor for the particular Processing should be kept as local variables in the process method, while state which should be shared by all Processings should be kept as member variables. As the latter kind will be accessed by multiple threads at any one time, the state of such member variables must be _thread-safe_. This critical restriction is similar to those of e.g. the Servlet API. Options for implementing a multithread-safe document processor with instance variables: 1. Use immutable (and preferably final) objects: they never change after they are constructed; no modifications to their state occurs after the DocumentProcessor constructor returns. 2. Use a single instance of a thread-safe class. 3. Create a single instance and synchronize access to it across all threads (but this will severely limit scalability). 4. Arrange for each thread to have its own instance, e.g. with a `ThreadLocal`. ### Processing Context Variables `Processing` has a map `String -> Object` that can be used to pass information between document processors. 
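Below is a minimal sketch of how this map can be combined with `Progress.LATER` to implement the asynchronous pattern described above. The class name, the context variable name and the external call are hypothetical and only illustrate the flow:

```
import com.yahoo.docproc.*;

import java.util.concurrent.CompletableFuture;

public class SlowLookupProcessor extends DocumentProcessor {

    // Self-defined context variable used to recognize a Processing we have seen before
    private static final String PENDING_LOOKUP = "pending-lookup";

    @Override
    public Progress process(Processing processing) {
        Object pending = processing.getVariable(PENDING_LOOKUP);
        if (pending == null) {
            // First time this Processing is seen: start the high-latency operation
            CompletableFuture<String> started = CompletableFuture.supplyAsync(this::slowExternalLookup);
            processing.setVariable(PENDING_LOOKUP, started);
            return Progress.LATER; // release the worker thread, re-queued for later
        }
        if (!((CompletableFuture<?>) pending).isDone()) {
            return Progress.LATER; // still waiting, release the thread again
        }
        // Result is ready - read it with ((CompletableFuture<?>) pending).join() and
        // apply it to the operations in processing.getDocumentOperations() before finishing.
        return Progress.DONE;
    }

    private String slowExternalLookup() {
        return "result"; // placeholder for a real high-latency remote call
    }
}
```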
It is also useful when using `Progress.LATER` to save the state of a processing - see [Processing.java](https://github.com/vespa-engine/vespa/blob/master/docproc/src/main/java/com/yahoo/docproc/Processing.java) for `get/setVariable` and more. The [sample application](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) uses such context variables, too. ## Operation ordering ### Feed ordering Ordering of feed operations is not guaranteed. Operations on different documents will be done concurrently and are therefore not ordered. However, Vespa guarantees that operations on the same document are processed in the order they were fed if they enter vespa at the _same_ feed endpoint. ### Document processing ordering Document operations that are produced inside a document processor obey the same rules as at feed time. If you either split the input into other documents or into multiple operations to the same document, Vespa will ensure that operations to the same document id are sequenced and are delivered in the order they enter. ## (Re)configuring Document Processing Consider the following configuration: ``` \\value\\ ``` Changing chain ids, components in a chain, component configuration and schema mapping all takes effect after deployment - no restart required. Changing a _cluster name_ (i.e. the container id) requires a restart of docproc services after _vespa activate_. Note when adding or modifying a processing chain in a running cluster; if at the same time deploying a _new_ document processor (i.e. a document processor that was unknown to Vespa at the time the cluster was started), the container must be restarted: ``` $[vespa-sentinel-cmd](../reference/operations/self-managed/tools.html#vespa-sentinel-cmd)restart container ``` ## Class diagram ![Document processing core class diagram](/assets/img/document-processing-class-diagram.svg) The framework core supports asynchronous processing, processing one or multiple documents or document updates at the same time, document processors that makes dynamic decisions about the processing flow and passing of information between processors outside the document or document update: - One or more named `Docproc Services` may be created. One of the services is the _default_. - A service accepts subclasses of `DocumentOperation` for processing, meaning `DocumentPuts`, `DocumentUpdates` and `DocumentRemoves`. It has a `Call Stack` which lists the calls to make to various `DocumentProcessors` to process each DocumentOperation handed to the service. - Call Stacks consist of `Calls`, which refer to the Document Processor instance to call. - Document puts and document updates are processed asynchronously, the state is kept in a `Processing` for its duration (instead of in a thread or process). A Document Processor may make some asynchronous calls (typically to remote services) and return to the framework that it should be called again later for the same Processing to handle the outcome of the calls. - A processing contains its own copy of the Call Stack of the Docproc Service to keep track of what to call next. Document Processors may modify this Call Stack to dynamically decide the processing steps required to process a DocumentOperation. - A Processing may contain one or more DocumentOperations to be processed as a unit. - A Processing has a `context`, which is a Map of named values which can be used to pass arguments between processors. 
- Processings are prepared to be stored to disk, to allow a high number of ongoing long-term processings per node. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Deploying a Document Processor](#deploying-a-document-processor) - [Document Processors](#document-processors) - [Chains](#chains) - [Execution model](#execution-model) - [Asynchronous execution](#asynchronous-execution) - [State](#state) - [Processing Context Variables](#processing-context-variables) - [Operation ordering](#operation-ordering) - [Feed ordering](#feed-ordering) - [Document processing ordering](#document-processing-ordering) - [(Re)configuring Document Processing](#reconfiguring-document-processing) - [Class diagram](#class-diagram) --- # Source: https://docs.vespa.ai/en/writing/document-routing.html.md # Routing _Routing_ is used to configure the paths that documents and updates written to Vespa take through the system. Vespa will automatically set up a routing configuration which is appropriate for most cases, so no explicit routing configuration is necessary. However, explicit routing can be used in advanced use cases such as sending different document streams to different document processing clusters, or through multiple consecutive clusters etc. There are other, more in-depth, articles on routing: - Use [vespa-route](../reference/operations/self-managed/tools.html#vespa-route) to inspect routes and services of a Vespa application, like in the [example](#example-reconfigure-the-default-route) - [Routing policies reference](#routing-policies-reference). See the [routing policies](#routing-policies) note for complex routes and default routing In Vespa, there is a transport layer and a programming interface that are available to clients that wish to communicate with a Vespa application. The transport layer is _Message Bus_.[Document API](document-api-guide.html) is implemented on top of Message Bus. Configuring the interface therefore exposes some features available in Message Bus. Refer to the [Vespa APIs and interfaces](../reference/api/api.html)for clients using the _Document API_. The atoms in Vespa routing are _routes_ and _hops_. [document-processing](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) is an example of custom document processing, and useful for testing routing. ## A route is a sequence of hops The sequence of hosts, routers, bridges, gateways, and other devices that network traffic takes, or could take, from its source to its destination is what is classically termed a _route_. As a verb, _to route_ means to determine the link down which to send a packet, that will minimize its total journey time according to some routing algorithm. In Vespa, a route is simply a sequence of named hops. Instead of leaving selection logic to a route, the responsibility of resolving recipients is given to the [hops](#a-hop-is-a-point-to-point-transmission)' [selectors](#selection-logic). A hop can do more or less whatever it wants to change a message's journey through your application; it can slightly alter itself by choosing among some predefined recipients, it can change itself completely by either rewriting or looking up another hop, or it can even modify the entire route from that branch onwards. In effect, a route can end up branching at several points along its path, resulting in complex routes. 
As the figure suggests, Message Bus supports both [unicasting](https://en.wikipedia.org/wiki/Unicast)and [multicasting](https://en.wikipedia.org/wiki/Multicast) - Message Bus allows for arbitrarily complex routes. Each node in the above graph represents a Vespa service: ![Illustration of routes with hops](/assets/img/routing.svg) ## A hop is a point-to-point transmission In telecommunication, a _hop_ is one step, from one router to the next, on the path of a packet on an Internet Protocol network. It is a direct host-to-host connection forming part of the route between two hosts in a routed network such as the Internet. In more general terms, a hop is a point-to-point transmission in a series required to get a message from point A to point B. With Message Bus the concept of hops was introduced as the smallest steps of the transmission of a message. A hop consists of a _name_ that is used by the messaging clients to select it, a list of _recipient_ services that it may transmit to, and a _selector_ that is used to select among those recipients. Unlike traditional hops, in Vespa a hop is a transmission from one sender to many recipients. Well, the above is only partially true; it is the easiest way to understand the hop concept. In fact, a hop's recipient list is nothing more than a configured list of strings that is made available to all [routing policies](#routing-policies)that are named in the selector string. See [selection logic](#selection-logic) below for details. A hop's recipient is the service name of a Message Bus client that has been registered in Vespa's service location broker (vespa-slobrok). These names are well-defined once their derivation logic is understood; they are "/"-separated sets of address-components whose values are given by a service's role in the application. An example of a recipient is: ``` search/cluster.foo/*/feed-destination ``` The marked components of the above recipient, `/search/cluster.foo/*`, resolves to a host's symbolic name. This is the name with which a Message Bus instance was configured. The unmarked component, `feed-destination`, is the local name of the running service that the hop transmits to, i.e. the name of the _session_ created on the running Message Bus instance. The Active Configuration page in Vespa's administration interface gives an insight into what symbolic names exist for any given application by looking at its current configuration subscriptions. All available Message Bus services use their `ConfigId` as their host's symbolic name. See [vespa-route](../reference/operations/self-managed/tools.html#vespa-route) for how to inspect this, or use the [config API](../reference/api/config-v2.html). A hop can be prefixed using the special character "?" to force it to behave as if its[ignore-result](#hop) attribute was configured to "true". ### Asterisk A service identifier may include the special character "\*" as an address component. A recipient that contains this character is a request for the network to choose _any one_ service that matches it. ## Routing policies A routing policy is a protocol-specific algorithm that chooses among a list of candidate recipients for a single address component - see [hop description](#a-hop-is-a-point-to-point-transmission) above. These policies are designed and implemented as key parts of a Message Bus protocol. E.g. for the "Document" protocol these are what make up the routing behavior for document transmission. 
Without policies, a hop would only be able to match verbatim to a recipient, and thus the only advanced selection logic would be that of the [asterisk](#asterisk). In addition to implementing a selection algorithm, a routing policy must also implement a merging algorithm that combines the replies returned from each selected recipient into a single sensible reply. This is needed because a client does not necessarily know whether a message has been sent to one or multiple recipients, and **Message Bus guarantees a single reply for every message**. More formally, a routing policy is an arbitrarily large (or small), named, stand-alone piece of code registered with a Message Bus protocol. As discussed [above](#selection-logic), an instance of a policy is run both when resolving a route to recipients, and when merging replies. The policy is passed a `RoutingContext` object that pretty much allows it to do whatever it pleases to the route and replies. The same policy object and the same context object are used for both selection and merging. Refer to the [routing policy reference](#routing-policies-reference).

## Selection logic

When Message Bus is about to route a message, at the last possible time, it inspects the **first** hop of the message's route to resolve a set of recipients. First, all of its [policies are resolved](#1-resolve-policy-directives). Second, the output service name is matched to the routing table to see if it maps to another hop or route. Finally, the message is [sent](#3-send-to-services) to all chosen recipient services. Because each policy can select multiple recipients, this can give rise to an arbitrarily complex routing tree. There are, of course, safeguards within Message Bus to prevent infinite recursions due to circular dependencies or misconfiguration.

**Note:** It **is** possible to develop a different protocol with other policies to run in the application, but since all of Vespa's components only support the "Document" protocol, it makes little sense to do so.

### 1. Resolve Policy Directives

The logic run at this step is actually simple; as long as the hop string contains a policy directive, i.e. some arbitrary string enclosed in square brackets, Message Bus will create and run an instance of that policy for the protocol of the message being routed.

```
Name:       storage/cluster.backup
Selector:   storage/cluster.backup/distributor/[Distributor]/default
Recipients: -
```

The above hop is probably the simplest hop you will encounter in Vespa; it has a single policy directive contained in a string that closely resembles the service names discussed above, and it has no recipients. When resolving this hop, Message Bus creates an instance of the "Distributor" policy and invokes its `select()` method. The "Distributor" policy will replace its own directive with a proper distributor identifier, yielding a hop string that is now an unambiguous service identifier.

```
Name:       indexing
Selector:   [DocumentRouteSelector]
Recipients: search/cluster.music
            search/cluster.books
```

This hop has a selector which is nothing more than a single policy directive, "[DocumentRouteSelector]", and it has two configured recipients, "search/cluster.music" and "search/cluster.books". This policy expands the hop to zero, one or two **new** routes by replacing its own directive with the content of the recipient routes. Each of these routes may have one or more hops themselves. In turn, these will be processed independently.
When replies are available from all chosen recipients, the policy's `merge()` method is invoked, and the resulting reply is passed upwards. ``` Name: default Selector: [AND:indexing storage/cluster.backup] Recipients: - ``` This hop has a selector but no recipients. The reason for this is best explained in the [routing policies reference](#routing-policies-reference), but it serves as an example of a hop that has no configured recipients. Notice how the policy directive contains a colon (":") which denotes that the remainder of the directive is a parameter to the policy constructor. This policy replaces the whole route of the message with the set of routes named in the parameter string. What routing policies are available depends on what protocol is currently running. As of this version the only supported protocol is "Document". This offers a set of routing policies discussed[below](#routing-policies-reference). ### 2. Resolve Hop- and Route names As soon as all policy directives have been resolved, Message Bus makes sure that the resulting string is, in fact, a service name and not the name of another hop or route (in that order) configured for the running protocol. The outcome is either: 1. The string is recognized as a hop name - The current hop is replaced by the named one, and processing returns to [step 1](#1-resolve-policy-directives). 2. The string is recognized as a route name - The current route, including all the hops following this, is replaced by the named one. Processing returns to [step 1](#1-resolve-policy-directives). 3. The string is accepted as a service name - This terminates the current branch of the routing tree. If all branches are terminated, processing proceeds to [step 3](#3-send-to-services). Because hop names are checked before route names, Message Bus also supports a "route:" prefix that forces the remainder of the string to resolve to a configured route or fail. ### 3. Send to Services When the route resolver reaches this point, the first hop of the message being sent has been resolved to an arbitrarily complex routing tree. Each leaf of this tree represents a service that is to receive the message, unless some policy has already generated a reply for it. No matter how many recipients are chosen, the message is serialized only once, and the network transmission is able to share the same chunk of memory between all recipients. As replies to the message arrive at the sender they are handed over to the corresponding leaf nodes of the routing tree, but merging will not commence until all leaf nodes are ready. Route resolving happens just before network transmission, after all resending logic. This means that if the route configuration changes while there are messages scheduled for resending, these will adhere to the new routes. If the resolution of a recipient passed through a hop that was configured to [ignore results](#hop), the network layer will reply immediately with a synthetic "OK". ## Example: Reconfigure the default route Assume that the application requires both search and storage capabilities, but that the default feed should only pass through to search. An imaginary scenario for this would be a system where there is a continuous feed being passed into Vespa with no filtering on spam. You would like a minimal storage-only cluster that stores a URL blocklist that can be used by a custom document processor to block incoming documents from offending sites. 
Apart from the blocklist and the document processor, add the following:

```
```

This overrides the default route to pass through any available blocklisting document processor before the documents are indexed. If the document processor decides to block a message, it must respond with an appropriate _ok_ reply, or your client software needs to accept whatever error reply you decide to return when blocking. When feeding blocklist information to storage, your application need only use the already available `storage` hop. See [#13193](https://github.com/vespa-engine/vespa/issues/13193) for a discussion on using _default_ as a name.

### The Document API

With the current implementation of the Document API running on Message Bus, configuring the API also implies configuring Message Bus. Most clients will only ever route through this API. To use the Document API, you need to instantiate a class that implements the `DocumentAccess` interface. At the time of writing only `MessageBusDocumentAccess` exists, and it requires a parameter set for creation. These parameters are contained in an instance of `MessageBusDocumentAccessParams` that looks somewhat like the following:

```
class MessageBusDocumentAccessParams {
    String documentManagerConfigId; // The id to resolve to document manager config.
    String oosServerPattern;        // The service pattern to resolve to fleet controller services.
    String appConfigId;             // The id to resolve to application config.
    String slobrokConfigId;         // The id to resolve to slobrok config.
    String routingConfigId;         // The id to resolve to messagebus routing config.
    String routeName;               // The name of the route to send to.
    int    traceLevel;              // The trace level to use when sending.

    class SourceSessionParams {
        int    maxPending;       // Maximum number of pending messages.
        int    maxPendingSize;   // Maximum size of pending messages.
        double timeout;          // Default timeout in seconds for messages that have no timeout set.
        double requestTimeoutA;  // Default request timeout in seconds, using
        double requestTimeoutB;  // the equation 'requestTimeout = a * retry + b'.
        double retryDelay;       // Number of seconds to wait before resending.
    };
}
```

The most obvious configuration parameter is `routeName`, which tells the `MessageBusDocumentAccess` object which route to use when sending documents and updates. The second parameter is `traceLevel`, which allows a client to see exactly how the data was transmitted.

**Note:** Tracing can be enabled on a level from 1-9, where a higher number means more tracing. Because the concept of tracing is not exposed by the Document API itself, its data will simply be printed to standard output when a reply arrives for the sender. This should therefore not be used in production, but can be helpful when debugging.

Refer to the [Document API JavaDoc](https://javadoc.io/doc/com.yahoo.vespa/documentapi).

## Routing services

This is the reference documentation for all elements in the _routing_ section of [services.xml](../reference/applications/services/services.html).

```
[routing [version]](#routing)
    [routingtable [protocol, verify]](#routingtable)
        [route [name, hops]](#route)
        [hop [name, selector, ignore-result]](#hop)
            [recipient [session]](#recipient)
    [services [protocol]](#services)
        [service [name]](#service)
```

## routing

Contained in [services](../reference/applications/services/services.html#services). The container element for all configuration related to routing.
| Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | version | required | number | | Must be set to "1.0" in this Vespa-version | Optional subelements: - [routingtable](#routingtable) - [services](#services) Example: ``` ``` ## routingtable Contained in [routing](#routing). Specifies a routing table for a specific protocol. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | protocol | required | | | Configure which protocol to use. Only the protocol _document_ is defined, so if you define a routing table for an unsupported protocol, the application will just log an INFO entry that contains the name of that protocol. | | verify | optional | boolean | | ToDo: document this | Optional subelements: - [route](#route) - [hop](#hop) Example: ``` ``` ## route Contained in [routingtable](#routingtable). Specifies a route for a message to its destination through a set of intermediate hops. If at least one hop in a route does not exist, the application will fail to start and issue an error that contains the name of that hop. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | name | required | | | Route name. | | hops | required | | | A whitespace-separated list of hop names, where each name must be a valid hop. | Subelements: none Example: ``` ``` ## hop Contained in [routingtable](#routingtable). Specifies a single hop that can be used to construct one or more routes. A hop must have a name that is unique within the routing table to which it belongs. A hop contains a selector string and a list of recipient sessions. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | name | required | | | Hop name. | | selector | required | | | Selector string. | | ignore-result | optional | | | If set to _true_, specifies that the result of routing through that hop should be ignored. | Optional subelements: - [recipient](#recipient) Example: ``` ``` ## recipient Contained in [hop](#hop). Specifies a recipient session of a hop. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | session | required | | | This attribute must correspond to a running instance of a service that can be routed to. All session identifiers consist of a location part and a name. A search node is always given a session name on the form _search/cluster.name/g#/r#/c#/feed-destination_, whereas a document processor service is always named _docproc/cluster.name/docproc/#/feed-processor_. | Subelements: none Example: ``` ``` ## services Contained in [routing](#routing). Specifies a set of services available for a specific protocol. At the moment the only supported protocol is _document_. The services specified are used by the route verification step to allow hops and routes to reference services known to exist, but that can not be derived from _services.xml_. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | protocol | required | | | Configure which protocol to use. Only the protocol _document_ is defined. | Optional subelements: - [service](#service) Example: ``` ``` ## service Contained in [services](#services). Specifies a single known service that can not be derived from the _services.xml_. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | name | required | | | The name of the service. 
| Subelements: none Example: ``` ``` ## Routingpolicies reference This article contains detailed descriptions of the behaviour of all routing policies available in Vespa. The _Document protocol_ is currently the only Message Bus protocol supported by Vespa. Furthermore, all routing policies that are part of this protocol share a common code path for [merging replies](#merge). The policies offered by the protocol are: - [AND](#and) - Selects all configured recipient hops. - [DocumentRouteSelector](#documentrouteselector) - Uses a [document selection string](../reference/writing/document-selector-language.html) to select compatible routes - [Content](#content) - Selects a content cluster distributor based on system state - [MessageType](#messagetype) - Selects a next hop based on message type - [Extern](#extern) - Selects a recipient by querying a remote Vespa application - [LocalService](#localservice) - Selects a recipient based on ip address - [RoundRobin](#roundrobin) - Selects one from the configured recipients in round-robin order - [SubsetService](#subsetservice) - Selects only among a subset of all matching services - [LoadBalancer](#loadbalancer) - A round-robin policy that chooses between the recipients by generating a weight according to their performance ### Common Document `merge()` logic The shared merge logic of most Document routing policies is an attempt to do the "right" thing when merging multiple replies into one. It works by first stepping through all replies, storing their content as either: 1. OK replies, 2. IGNORE replies, or 3. ERROR replies If at least one ERROR reply is found, return a new reply that contains all the errors of the others. If there is at least one OK reply, return the first OK reply, but transfer all feed information from the others to this (this is specific data for start- and end-of-feed messages). Otherwise, return a new reply that contains all the IGNORE errors. Pseudocode: ``` for each reply, do if reply has no errors, do store reply in OK list else, do if reply has only IGNORE errors copy all errors from reply to IGNORE list else, do copy all errors from reply to ERROR list if ERROR list is not empty, do return new reply with all errors else, do if OK list is not empty, do return first reply with all feed answers else, do return new reply with all IGNORE errors ``` ## Routing policies reference | Policy | Description | | --- | --- | | AND | This is a mostly a convenience policy that allows the user to fork a message's route to all configured recipients. It is not message-type aware, and will simply always select all recipients. Replies are merged according to the [shared logic](#merge). The optional string parameter is parsed as a space-separated list of hops. Configured recipients have precedence over parameter-given recipients, although this is likely to be changed in the future. | | DocumentRouteSelector | This policy is responsible for selecting among the policy's recipients according to the subscription rules defined by a content cluster's _documents_ element in [services.xml](../reference/applications/services/services.html). If the "selection" attribute is set in the "documents" element, its value is processed as a [document select](../reference/writing/document-selector-language.html) string, and run on documents and document updates to determine routes. If the "feedname" attribute is set, all feed commands are filtered through it. The recipient list of this policy is required to map directly to route names. E.g. 
if a recipient is "search/cluster.music", and a message is appropriate according to the selection criteria, the message is routed to the "search/cluster.music" route. If the route does not exist, this policy will reply with an error. In short, this policy selects one or more recipient routes based on document content and configured criteria. If more than one route is chosen, its replies are merged according to the [shared logic](#merge). This policy does not support any parameters. The configuration for this is "documentrouteselectorpolicy" available from config id "routing/documentapi". **Important:** Because GET messages do not contain any document on which to run the selection criteria, this policy returns an IGNORED reply that the merging logic processes. You can see this by attempting to retrieve a document from an application that does not have a content cluster. | | Content | This policy allows you to send a message to a content cluster. The policy uses a system state retrieved from the cluster in question in conjunction with slobrok information to pick the correct distributor for your message. In short; use this policy when communicating with document storage. This policy supports multiple parameters, up to one each of: | cluster | The name of the cluster you want to reach. Example: cluster=mycluster | | config | A comma-separated list of config servers or proxies you want to use to fetch configuration for the policy. This can be used to communicate with other clusters than the one you're currently in. Example: config=tcp/myadmin1:19070,tcp/myadmin2:19070 | Separate each parameter with a semicolon. | | MessageType | This policy will select the next hop based on the type of the message. You configure where all messages should go (defaultroute). Then you configure what messages types should be overridden and sent to alternative routes. It is currently only used internally by vespa when using the [content](../reference/applications/services/content.html#content) element. | | Extern | This policy implements the necessary logic to communicate with an external Vespa application and resolve a single service pattern using that other application's slobrok servers. Keep in mind that there might be some delay from the moment this policy is initially created and when it receives the response to its service query, so using this policy might cause a message to be resent a few times until it is resolved. If you disable retries, this policy might cause all messages to fail for the first seconds. This policy uses its parameter for both the address of the extern slobrok server to connect to, and also the pattern to use for querying. The parameter is required to be on the form `;`, where `spec` is a comma-separated list of slobrok connection specs on the form "tcp/hostname:port", and `service` is a service running on the remote Vespa application. **Important:** The remote application needs to have a version of both message bus and the document api that is binary compatible with the application sending from. This can be a problem even between patch releases, so keep the application versions in sync when using this policy. | | LocalService | This policy is used to select among all matching services, but preferring those running on the same host as the current one. The pattern used when querying for available services is the current one, but replacing the policy directive with an asterisk. E.g. 
the hop "docproc/cluster.default/[LocalService]/chain.default" would prefer local services among all those that match the pattern "docproc/cluster.default/\*/chain.default". If there are multiple matching services that run locally, this policy will do simple round-robin load balancing between them. If no matching services run locally, this policy simply returns the asterisk as a match to allow the underlying network logic to do load balancing among all available. This policy accepts an optional parameter which overrides the local hostname. Use this if you wish the hop to prefer some specific host. **Important:** There is no additional logic to replace other policy directives with an asterisk, meaning that if other policies directives are present in the hop string after "[LocalService]", no services can possibly be matched. | | RoundRobin | This policy is used to select among a configured set of recipients. For each configured recipient, this policy determines what online services are matched, and then selects one among all of those in round-robin order. If none of the configured recipients match any available service, this policy returns an error that indicates to the sender that it should retry later. Because this policy only selects a single recipient, it contains no merging logic. | | SubsetService | This policy is used to select among a subset of all matching services, and is used to minimize number of connections in the system. The pattern used when querying for available services is the current one, but replacing the policy directive with an asterisk. E.g. the hop "docproc/cluster.default/[SubsetService:3]/chain.default" would select among a subset of all those that match the pattern "docproc/cluster.default/\*/chain.default". Given that the pattern returns a set of matches, this policy stores a subset of these based on the hash-value of the running message bus' connection string (this is unique for each instance). If there are no matching services, this policy returns the asterisk as a match to allow the underlying network logic to fail gracefully. This policy parses its optional parameter as the size of the subset. If none is given, the subset defaults to size 5. **Important:** There is no additional logic to replace other policy directives with an asterisk, meaning that if other policies directives are present in the hop string after "[SubsetService]", no services can possibly be matched. | | LoadBalancer | This policy is used to send to a stateless cluster such as docproc, where any node can be chosen to process any message. Messages are sent between the nodes in a round-robin fashion, but each node is assigned a weight based on its performance. The weights are calculated by measuring the number of times the node had a full input-queue and returned a busy response. Use this policy to send to docproc clusters that have nodes with different performance characteristics. This policy supports multiple parameters, up to one each of: | cluster | The name of the cluster you want to reach. Example: cluster=docproc/cluster.default (mandatory) | | session | The destination session you want to reach. In the case of docproc, the name of the docproc chain. Example: session=chain.mychain (mandatory) | | config | A comma-separated list of config servers or proxies you want to use to fetch configuration for the policy. This can be used to communicate with other clusters than the one you're currently in. 
Example: config=tcp/myadmin1:19070,tcp/myadmin2:19070 | Separate each parameter with a semicolon. By default, this policy will use the current Vespa cluster for configuration. |

## Routing for indexing

A normal Vespa configuration has container and content cluster(s), with one or more document types defined in _schemas_. Routing document writes means routing documents to the _indexing_ container cluster, then to the right _content_ cluster. The indexing cluster is a container cluster - see [multiple container clusters](#multiple-container-clusters) for variants. Add the [document-api](../reference/applications/services/container.html#document-api) feed endpoint to this cluster. The mapping from document type to content cluster is in [document](../reference/applications/services/content.html#document) in the content cluster. From [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml):

```
\ 1
```

Given this configuration, Vespa knows which container cluster is used for indexing, and which content cluster stores the _music_ document type. Use [vespa-route](../reference/operations/self-managed/tools.html#vespa-route) to display the routing generated from this configuration:

```
$ vespa-route
There are 6 route(s):
 1. default
 2. default-get
 3. music
 4. music-direct
 5. music-index
 6. storage/cluster.music
There are 2 hop(s):
 1. container/chain.indexing
 2. indexing
```

Note the _default_ route. This route is auto-generated by Vespa, and is used when no other route is specified, e.g. when feeding through [/document/v1](../reference/api/document-v1.html). _default_ points to _indexing_:

```
$ vespa-route --route default
The route 'default' has 1 hop(s):
 1. indexing
```

```
$ vespa-route --hop indexing
The hop 'indexing' has selector:
    [DocumentRouteSelector]
And 1 recipient(s):
 1. music
```

```
$ vespa-route --route music
The route 'music' has 1 hop(s):
 1. [MessageType:music]
```

In short, the _default_ route handles documents of type _music_. Vespa will route to the container cluster with _document-api_ - note the _chain.indexing_ above. This is a set of built-in _document processors_ that do the indexing (see below). Refer to the [trace appendix](#appendix-trace) for routing details.

## chain.indexing

This indexing chain is set up on the container once a content cluster has `mode="index"`. The [IndexingProcessor](https://github.com/vespa-engine/vespa/blob/master/docprocs/src/main/java/com/yahoo/docprocs/indexing/IndexingProcessor.java) annotates the document based on the [indexing script](../reference/writing/indexing-language.html) generated from the schema.
Example: ``` $ vespa-get-config -n vespa.configdefinition.ilscripts \ -i container/docprocchains/chain/indexing/component/com.yahoo.docprocs.indexing.IndexingProcessor maxtermoccurrences 100 fieldmatchmaxlength 1000000 ilscript[0].doctype "music" ilscript[0].docfield[0] "artist" ilscript[0].docfield[1] "artistId" ilscript[0].docfield[2] "title" ilscript[0].docfield[3] "album" ilscript[0].docfield[4] "duration" ilscript[0].docfield[5] "year" ilscript[0].docfield[6] "popularity" ilscript[0].content[0] "clear_state | guard { input artist | tokenize normalize stem:"BEST" | summary artist | index artist; }" ilscript[0].content[1] "clear_state | guard { input artistId | summary artistId | attribute artistId; }" ilscript[0].content[2] "clear_state | guard { input title | tokenize normalize stem:"BEST" | summary title | index title; }" ilscript[0].content[3] "clear_state | guard { input album | tokenize normalize stem:"BEST" | index album; }" ilscript[0].content[4] "clear_state | guard { input duration | summary duration; }" ilscript[0].content[5] "clear_state | guard { input year | summary year | attribute year; }" ilscript[0].content[6] "clear_state | guard { input popularity | summary popularity | attribute popularity; }" ``` Refer to [linguistics](../linguistics/linguistics.html) for more details. By default, the indexing chain is set up on the _first_ container cluster in _services.xml_. When having multiple container clusters, it is recommended to configure this explicitly, see [multiple container clusters](#multiple-container-clusters). ## Document selection The [document](../reference/applications/services/content.html#document) can have a [selection](../reference/writing/document-selector-language.html) string, normally used to expire documents. This is also evaluated during feeding, so documents that would immediately expire are dropped. This is not an error, the document API will report 200 - but can be confusing. The evaluation is done in the [DocumentRouteSelector](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/messagebus/protocol/DocumentRouteSelectorPolicy.java) at the feeding endpoint - _before_ any processing/indexing. I.e. the document is evaluated using the selection string (drop it or not), then where to route it, based on document type. Example: the selection is configured to not match the document being fed: ``` 1 ``` ``` $ vespa-feeder --trace 6 doc.json [1564576570.693] Source session accepted a 4096 byte message. 1 message(s) now pending. [1564576570.713] Sequencer sending message with sequence id '-1163801147'. [1564576570.721] Recognized 'default' as route 'indexing'. [1564576570.727] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false). [1564576570.811] Running routing policy 'DocumentRouteSelector'. [1564576570.822] Policy 'DocumentRouteSelector' assigned a reply to this branch. [1564576570.828] Sequencer received reply with sequence id '-1163801147'. [1564576570.828] Source session received reply. 0 message(s) now pending. Messages sent to vespa (route default) : ---------------------------------------- PutDocument: ok: 0 msgs/sec: 0.00 failed: 0ignored: 1latency(min, max, avg): 9223372036854775807, -9223372036854775808, 0 ``` Without the selection (i.e. everything matches): ``` $ vespa-feeder --trace 6 doc.json [1564576637.147] Source session accepted a 4096 byte message. 1 message(s) now pending. 
[1564576637.168] Sequencer sending message with sequence id '-1163801147'.
[1564576637.176] Recognized 'default' as route 'indexing'.
[1564576637.180] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
[1564576637.256] Running routing policy 'DocumentRouteSelector'.
[1564576637.268] Component '[MessageType:music]' selected by policy 'DocumentRouteSelector'.
...

Messages sent to vespa (route default) :
----------------------------------------
PutDocument: ok: 1 msgs/sec: 1.05 failed: 0 ignored: 0 latency(min, max, avg): 845, 845, 845
```

In the last case, in the [DocumentRouteSelector](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/messagebus/protocol/DocumentRouteSelectorPolicy.java) routing policy, the document matched the selection string (or there was no selection string), and the document was forwarded to the next hop in the route.

## Document processing

Add custom processing of documents using [document processing](../applications/document-processors.html). The normal use case is to add document processors in the default route, before indexing. Example:

```
\\\\\ 1 type="music" mode="index" />
```

Note that a new hop _default/chain.default_ is added, and the default route is changed to include this:

```
$ vespa-route
There are 6 route(s):
 1. default
 2. default-get
 3. music
 4. music-direct
 5. music-index
 6. storage/cluster.music
There are 3 hop(s):
 1. default/chain.default
 2. default/chain.indexing
 3. indexing
```

```
$ vespa-route --route default
The route 'default' has 2 hop(s):
 1. default/chain.default
 2. indexing
```

Note that the document processing chain must be called _default_ to automatically be included in the default route.

### Inherit indexing chain

An alternative to the above is inheriting the indexing chain - use this when getting this error:

```
Indexing cluster 'XX' specifies the chain 'default' as indexing chain. As the 'default' chain is run by default,
using it as the indexing chain will run it twice. Use a different name for the indexing chain.
```

Call the chain something other than _default_, and let it inherit _indexing_:

```
\\\\\ 1\\\\
```

See [#13193](https://github.com/vespa-engine/vespa/issues/13193) for details.

## Multiple container clusters

Vespa can be configured to use more than one container cluster. Use cases include separating search and document processing, or having different document processing clusters due to capacity constraints or dependencies. Example with separate search and feeding/indexing container clusters:

```
\\ 1 \
```

Notes:

- The indexing route is explicit, using [document-processing](../reference/applications/services/content.html#document-processing) elements from the content to the container cluster
- Set up _document-api_ on the same cluster as indexing to avoid a network hop from the feed endpoint to the indexing processors
- If no _document-processing_ is configured, it defaults to a container cluster named _default_. When using multiple container clusters, it is best practice to explicitly configure _document-processing_.

Observe the _container-indexing/chain.indexing_ hop; the indexing chain is now set up on the _container-indexing_ cluster:

```
$ vespa-route
There are 6 route(s):
 1. default
 2. default-get
 3. music
 4. music-direct
 5. music-index
 6. storage/cluster.music
There are 2 hop(s):
 1. container-indexing/chain.indexing
 2.
indexing ``` ``` $ curl -s http://localhost:8081 | python -m json.tool | grep -C 3 chain.indexing { "bundle": "container-disc:7.0.0", "class": "com.yahoo.messagebus.jdisc.MbusClient", "id": "chain.indexing@MbusClient", "serverBindings": [] }, { -- "class": "com.yahoo.docproc.jdisc.DocumentProcessingHandler", "id": "com.yahoo.docproc.jdisc.DocumentProcessingHandler", "serverBindings": [ "mbus://*/chain.indexing" ] }, { ``` ## Appendix: trace Below is a trace example, no selection string: ``` $ cat doc.json [ { "put": "id:mynamespace:music::123", "fields": { "album": "Bad", "artist": "Michael Jackson", "title": "Bad", "year": 1987, "duration": 247 } } ] $ vespa-feeder --trace 6 doc.json [1564571762.403] Source session accepted a 4096 byte message. 1 message(s) now pending. [1564571762.420] Sequencer sending message with sequence id '-1163801147'. [1564571762.426] Recognized 'default' as route 'indexing'. [1564571762.429] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false). [1564571762.489] Running routing policy 'DocumentRouteSelector'. [1564571762.493] Component '[MessageType:music]' selected by policy 'DocumentRouteSelector'. [1564571762.493] Resolving '[MessageType:music]'. [1564571762.520] Running routing policy 'MessageType'. [1564571762.520] Component 'music-index' selected by policy 'MessageType'. [1564571762.520] Resolving 'music-index'. [1564571762.520] Recognized 'music-index' as route 'container/chain.indexing [Content:cluster=music]'. [1564571762.520] Recognized 'container/chain.indexing' as HopBlueprint(selector = { '[LoadBalancer:cluster=container;session=chain.indexing]' }, recipients = { }, ignoreResult = false). [1564571762.526] Running routing policy 'LoadBalancer'. [1564571762.538] Component 'tcp/vespa-container:19101/chain.indexing' selected by policy 'LoadBalancer'. [1564571762.538] Resolving 'tcp/vespa-container:19101/chain.indexing [Content:cluster=music]'. [1564571762.580] Sending message (version 7.83.27) from client to 'tcp/vespa-container:19101/chain.indexing' with 179.853 seconds timeout. [1564571762.581] Message (type 100004) received at 'container/container.0' for session 'chain.indexing'. [1564571762.581] Message received by MbusServer. [1564571762.582] Request received by MbusClient. [1564571762.582] Running routing policy 'Content'. [1564571762.582] Selecting route [1564571762.582] No cluster state cached. Sending to random distributor. [1564571762.582] Too few nodes seen up in state. Sending totally random. [1564571762.582] Component 'tcp/vespa-container:19114/default' selected by policy 'Content'. [1564571762.582] Resolving 'tcp/vespa-container:19114/default'. [1564571762.586] Sending message (version 7.83.27) from 'container/container.0' to 'tcp/vespa-container:19114/default' with 179.995 seconds timeout. [1564571762.587181] Message (type 100004) received at 'storage/cluster.music/distributor/0' for session 'default'. [1564571762.587245] music/distributor/0 CommunicationManager: Received message from message bus [1564571762.587510] Communication manager: Sending Put(BucketId(0x2000000000000020), id:mynamespace:music::123, timestamp 1564571762000000, size 275) [1564571762.587529] Communication manager: Passing message to source session [1564571762.587547] Source session accepted a 1 byte message. 1 message(s) now pending. 
[1564571762.587681] Sending message (version 7.83.27) from 'storage/cluster.music/distributor/0' to 'storage/cluster.music/storage/0/default' with 180.00 seconds timeout. [1564571762.587960] Message (type 10) received at 'storage/cluster.music/storage/0' for session 'default'. [1564571762.588052] music/storage/0 CommunicationManager: Received message from message bus [1564571762.588263] PersistenceThread: Processing message in persistence layer [1564571762.588953] Communication manager: Sending PutReply(id:mynamespace:music::123, BucketId(0x2000000000000020), timestamp 1564571762000000) [1564571762.589023] Sending reply (version 7.83.27) from 'storage/cluster.music/storage/0'. [1564571762.589332] Reply (type 11) received at 'storage/cluster.music/distributor/0'. [1564571762.589448] Source session received reply. 0 message(s) now pending. [1564571762.589459] music/distributor/0Communication manager: Received reply from message bus [1564571762.589679] Communication manager: Sending PutReply(id:music:music::123, BucketId(0x0000000000000000), timestamp 1564571762000000) [1564571762.589807] Sending reply (version 7.83.27) from 'storage/cluster.music/distributor/0'. [1564571762.590] Reply (type 200004) received at 'container/container.0'. [1564571762.590] Routing policy 'Content' merging replies. [1564571762.590] Reply received by MbusClient. [1564571762.590] Sending reply from MbusServer. [1564571762.590] Sending reply (version 7.83.27) from 'container/container.0'. [1564571762.612] Reply (type 200004) received at client. [1564571762.613] Routing policy 'LoadBalancer' merging replies. [1564571762.613] Routing policy 'MessageType' merging replies. [1564571762.615] Routing policy 'DocumentRouteSelector' merging replies. [1564571762.622] Sequencer received reply with sequence id '-1163801147'. [1564571762.622] Source session received reply. 0 message(s) now pending. Messages sent to vespa (route default) : ---------------------------------------- PutDocument: ok: 1 msgs/sec: 3.30 failed: 0 ignored: 0 latency(min, max, avg): 225, 225, 225 ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [A route is a sequence of hops](#a-route-is-a-sequence-of-hops) - [A hop is a point-to-point transmission](#a-hop-is-a-point-to-point-transmission) - [Asterisk](#asterisk) - [Routing policies](#routing-policies) - [Selection logic](#selection-logic) - [1. Resolve Policy Directives](#1-resolve-policy-directives) - [2. Resolve Hop- and Route names](#2-resolve-hop-and-route-names) - [3. 
Send to Services](#3-send-to-services) - [Example: Reconfigure the default route](#example-reconfigure-the-default-route) - [The Document API](#the-document-api) - [Routing services](#routing-services) - [routing](#routing) - [routingtable](#routingtable) - [route](#route) - [hop](#hop) - [recipient](#recipient) - [services](#services) - [service](#service) - [Routingpolicies reference](#routingpolicies-reference) - [Common Document](#merge) - [Routing policies reference](#routing-policies-reference) - [Routing for indexing](#routing-for-indexing) - [chain.indexing](#chain-indexing) - [Document selection](#document-selection) - [Document processing](#document-processing) - [Inherit indexing chain](#inherit-indexing-chain) - [Multiple container clusters](#multiple-container-clusters) - [Appendix: trace](#appendix-trace) --- # Source: https://docs.vespa.ai/en/reference/writing/document-selector-language.html.md # Document selector language reference This document describes the _document selector language_, used to select a subset of documents when feeding, dumping and garbage collecting data. It defines a text string format that can be parsed to build a parse tree, which in turn can answer whether a given document is contained within the subset or not. ## Examples Match all documents in the `music` schema: `music` As applications can have multiple schemas, match document type (schema) and then a specific value in the `artistname` field: `music and music.artistname == "Coldplay"` Below, the first condition states that the documents should be of type music, and the author field must exist. The second states that the field length must be set, and be less than or equal to 1000: `music.author and music.length <= 1000` The next expression selects all documents where either of the subexpressions are true. The first one states that the author field should include the name John Doe, with anything in between or in front. The `\n` escape is converted to a newline before the field comparison is done. Thus requiring the field to end with Doe and a newline for a match to be true. The second expression selects all books where no author is defined: `book.author = "*John*Doe\n" or not book.author` Here is an example of how parentheses are used to group expressions. Also, a constant value false has been used. Note that the `(false or music.test)` sub-expression could be exchanged with just `music.test` without altering the result of the selection. The sub-expression within the `not` clause selects all documents where the size field is above 1000 and the test field is defined. The `not` clause inverts the selection, thus selecting all documents with size less than or equal to 1000 or the test field undefined: `not (music.length > 1000) and (false or music.test)` Other examples: - `music.version() == 3 and (music.givenname + " " + music.surname).lowercase() = "bruce spring*"` - `id.user.hash().abs() % 300 % 7 = 1` - `music.wavstream.hash() == music.checksum` - `music.size / music.length > 10` - `music.expire > now() - 7200` ## Case sensitiveness The identifiers used in this language (`and or not true false null id scheme namespace specific user group`) are not case-sensitive. It is recommended to use lower cased identifiers for consistency with the documentation. ## Branch operators / precedence The branch operators are used to combine other nodes in the parse tree generated from the text format. The different branch nodes existing is listed in the table below in order of precedence. 
Operators listed in order of precedence: | Operator | Description | | --- | --- | | NOT | Unary prefix operator inverting the selection of the child node | | AND | Binary infix operator, which is true if all its children are | | OR | Binary infix operator, which is true if any of its children are | Use parentheses to define own precedence. `a and b or c and d` is equivalent to `(a and b) or (c and d) ` since and has higher precedence than or. The expression `a and (b or c) and d` is not equivalent to the previous two, since parentheses have been used to force the or-expression to be evaluated first. Parentheses can also be used in value calculations. Where modulo `%` has the highest precedence, multiplication `*` and division `/` next, addition `+` and subtractions `-` have lowest precedence. ## Primitives | Primitive | Description | | --- | --- | | Boolean constant | The boolean constants `true` and `false` can be used to match all/nothing | | Null constant | Referencing a field that is not present in a document returns a special `null` value. The expression `music.title` is shorthand for `music.title != null`. There are potentially subtle interactions with null values when used with comparisons, see [comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values). | | Document type | A document type can be used as a primitive to select a given type of documents - [example](/en/writing/visiting.html#analyzing-field-values). | | Document field specification | A document field specification (`doctype.field`) can be used as a primitive to select all documents that have field set - a shorter form of `doctype.field != null` | | Comparison | The comparison is a primitive used to compare two values | ## Comparison Comparisons operators compares two values using an operator. All the operators are infix and take two arguments. | Operator | Description | | --- | --- | | \> | This is true if the left argument is greater than the right one. Operators using greater than or less than notations only makes sense where both arguments are either numbers or strings. In case of strings, they are ordered by their binary (byte-wise) representation, with the first character being the most significant and the last character the least significant. If the argument is of mixed type or one of the arguments are not a number or a string, the comparison will be invalid and not match. | | \< | Matches if left argument is less than the right one | | \<= | Matches if the left argument is less than or equal to the right one | | \>= | Matches if the left argument is greater than or equal to the right one | | == | Matches if both arguments are exactly the same. Both arguments must be of the same type for a match | | != | Matches if both arguments are not the same | | = | String matching using a glob pattern. Matches only if the pattern given as the right argument matches the whole string given by the left argument. Asterisk `*` can be used to match zero or more of any character. Question mark `?` can be used to match any one character. The pattern matching operators, regex `=~` and glob `=`, only makes sense if both arguments are strings. The regex operator will never match anything else. The glob operator will revert to the behaviour of `==` if both arguments are not strings. | | =~ | String matching using a regular expression. Matches if the regular expression given as the right argument matches the string given as the left argument. Regex notation is like perl. 
Use '^' to indicate start of value, '$' to indicate end of value | ### Comparisons with missing fields (null values) The only comparison operators that are well-defined when one or both operands may be `null`(i.e. field is not present) are `==` and `!=`. Using any other comparison operators on a `null` value will yield a special _invalid_ value. Invalid values may "poison" any logical expression they are part of: - `AND` returns invalid if none of its operands are false and at least one is invalid - `OR` returns invalid if none of its operands are true and at least one is invalid - `NOT` returns invalid if the operand is invalid If an invalid value is propagated as the root result of a selection expression, the document is not considered a match. This is usually the behavior you want; if a field does not exist, any selection requiring it should not match either. However, in garbage collection, documents which results in an invalid selection are _not_ removed as that could be dangerous. One example where this may have _unexpected_ behavior: 1. You have many documents of type `foo` already fed into a cluster. 2. You add a new field `expires_at_time` to the document type and update a subset of the documents that you wish to keep. 3. You add a garbage collection selection to the `foo` document declaration to only keep non-expired documents: `foo.expires_at_time > now()` At this point, the old documents that _do not_ contain an `expires_at_time` field will _not_ be removed, as the expression will evaluate to invalid instead of `false`. To work around this issue, "short-circuiting" using a field presence check may be used: `(foo.expires_at_time != null) and (foo.expires_at_time > now())`. ## Null behavior with imported fields If your selection references imported fields, `null` will be returned for any imported field when the selection is evaluated in a context where the referenced document can't be retrieved. For GC expressions this will happen in the client as part of the feed routing logic, and it may also happen on backend nodes whose parent document set is incomplete (in case of node failures etc.). It is therefore important that you have this in mind when writing GC selections using imported fields. When you specify a selection criteria in a `` tag, you're stating what a document must satisfy in order to be fed into the content cluster and to be kept there. As an example, imagine a document type `music_recording` with an imported field `artist_is_cool` that points to a boolean field `is_cool` in a parent `artist` document. If you only want your cluster to retain recordings from artists that are certifiably cool, you might be tempted to write a selection like the following: ``` ``` ``` ``` **This won't work as expected**, because this expression is evaluated as part of the feeding pipeline to figure out if a cluster should accept a given document. At that point in time, there is no access to the parent document. Consequently, the field will return `null` and the document won't be routed to the cluster. Instead, write your expressions to handle the case where the parent document _may not exist_: ``` ``` ``` ``` With this selection, we explicitly let a document be accepted into the cluster if its imported field is _not_ available. However, if it _is_ available, we allow it to be used for GC. ## Locale / Character sets The language currently does not support character sets other than ASCII. 
Glob and regex matching of single characters are not guaranteed to match exactly one character, but might match a part of a character represented by multiple byte values. ## Values The comparison operator compares two values. A value can be any of the following: | Document field specification | Syntax: `.` Documents have a set of fields defined, depending on the document type. The field name is the identifier used for the field. This expression returns the value of the field, which can be an integer, a floating point number, a string, an array, or a map of these types. For multivalues, we support only the _equals_ operator for comparison. The semantics is that the array returned by the fieldvalue must _contain_ at least one element that matches the other side of the comparison. For maps, there must exist a key matching the comparison. The simplest use of the fieldpath is to specify a field, but for complex types please refer to [the field path syntax documentation](../schemas/document-field-path.html). | | Id | Syntax: ` id.[scheme|namespace|type|specific|user|group] ` Each document has a Document Id, uniquely identifying that document within a Vespa installation. The id operator returns the string identifier, or if an optional argument is given, a part of the id. - scheme (id) - namespace (to separate different users' data) - type (specified in the id scheme) - specific (User specified part to distinguish documents within a namespace) - user (The number specified in document ids using the n= modifier) - group (The string group specified in document ids using the g= modifier) | | null | The value null can be given to specify nothingness. For instance, a field specification for a document not containing the field will evaluate to null, so the comparison 'music.artist == null' will select all documents that don't have the artist field set. 'id.user == null' will match all documents that don't use the `n=`[document id scheme](../../schemas/documents.html#id-scheme). Tensor fields can _only_ be compared against null. It's not possible to write a document selection that uses the _contents_ of tensor fields—only their presence can be checked. | | Number | A value can be a number, either an integer or a floating point number. Type of number is insignificant. You don't have to use the same type of number on both sides of a comparison. For instance '3.0 \< 4' will match, and '3.0 == 3' will probably match (operator == is generally not advised for floating point numbers due to rounding issues). Numbers can be written in multiple ways - examples: ``` 1234 -234 +53 +534.34 543.34e4 -534E-3 0.2343e-8 ``` | | Strings | A string value is given quoted with double quotes (i.e. "mystring"). The string is interpreted as an ASCII string. that is, only ASCII values 32 to 126 can be used unescaped, apart from the characters \ and " which also needs to be escaped. Escape common special characters like: | Character | Escaped character | | --- | --- | | Newline | \n | | Carriage return | \r | | Tab | \t | | Form feed | \f | | " | \" | | Any other character | \x## (where ## is a two digit hexadecimal number specifying the ASCII value. | | ### Value arithmetics You can do arithmetics on values. The common arithmetics operators addition `+`, subtraction `-`, multiplication `*`, division `/` and modulo `%` are supported. 
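A few examples of arithmetics in selections, assuming the same numeric `music` fields used in the examples earlier in this document (`size`, `length`, `expire`):

```
music.size / music.length > 10
music.expire > now() - 60 * 60 * 24
(music.size + music.length) % 2 == 0
```

The precedence described above applies: `*` and `/` bind tighter than `+` and `-`, so the second expression subtracts a full day (86400 seconds) from `now()` before comparing. Use parentheses, as in the third expression, to force a different evaluation order.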
### Functions Functions are called on something and returns a value that can be used in comparison expressions: | Value functions | A value function takes a value, does something with it and returns a value which can be of any type. - _abs()_ Called on a numeric type, returns the absolute value of that numeric type. That is -3 returns 3 and -4.3 returns 4.3. - _hash()_ Calculates an MD5 hash of whatever value it is called on. The result is a signed 64-bit integer. (Use abs() after if you want to only get positive hash values). - _lowercase()_ Called on a string value to turn upper case characters into lower case ones. **NOTE:** This only works for the characters 'a' through 'z', no locale support. | | Document type functions | Some functions can take a document type instead of a value, and return a value based on the type. - _version()_ The `version()` function returns the version number of a document type. | #### Now function Document selection provides a _now()_ function, which returns the current date timestamp. Use this to filter documents by age, typically for [garbage collection](../applications/services/content.html#documents). **Example**: If you have a long field _inserttimestamp_ in your `music` schema, this expression will only match documents from the last two hours: `music.inserttimestamp > now() - 7200` ## Using imported fields in selections When using [parent-child](../../schemas/parent-child.html) you can refer to simple imported fields (i.e. top-level primitive fields) in selections as if they were regular fields in the child document type. Complex fields (collections, structures etc.) are not supported. **Important:** special care needs to be taken when using document selections referencing imported fields, especially if using these are part of garbage collection expressions. If an imported field references a document that cannot be accessed at evaluation time, the imported field behaves as if it had been a regular, non-present field in the child document. In other words, it will return the special `null` value. See [comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values)for a more detailed discussion of null-semantics and how to write selections that handle these in a well-defined manner. In particular, read [null behavior with imported fields](#null-behavior-with-imported-fields) if you're writing GC selections. ### Example The following is an example of a 3-level parent-child hierarchy. 
Grandparent schema: ``` schema grandparent { document grandparent { field a1 type int { indexing: attribute | summary } } } ``` Parent schema, with reference to grandparent: ``` schema parent { document parent { field a2 type int { indexing: attribute | summary } field ref type reference { indexing: attribute | summary } } import field ref.a1 as a1 {} } ``` Child schema, with reference to parent and (transitively) grandparent: ``` schema child { document child { field a3 type int { indexing: attribute | summary } field ref type reference { indexing: attribute | summary } } import field ref.a1 as a1 {} import field ref.a2 as a2 {} } ``` Using these in document selection expressions is easy: Find all child docs whose grandparents have an `a1` greater than 5: `child.a1 > 5` Find all child docs whose parents have an `a2` of 10 and grandparents have `a1` of 4: `child.a1 == 10 and child.a2 == 4` Find all child docs where the parent document cannot be found (or where the referenced field is not set in the parent): `child.a2 == null` Note that when visiting `child` documents we only ever access imported fields via the**child** document type itself. A much more complete list usage examples for the above document schemas and reference relations can be found in the[imported fields in selections](https://github.com/vespa-engine/system-test/blob/master/tests/search/parent_child/imported_fields_in_selections.rb) system test. This test covers both the visiting and GC cases. ## Constraints Language identifiers restrict what can be used as document type names. The following values are not valid document type names:_true, false, and, or, not, id, null_ ## Grammar - EBNF of the language To simplify, double casing of strings has not been included. The identifiers "null", "true", "false" etc. can be written in any case, including mixed case. ``` nil = "null" ; bool = "true" | "false" ; posdigit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ; digit = '0' | posdigit ; hexdigit = digit | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' ; integer = ['-' | '+'], posdigit, { digit } ; float = ['-' | '+'], digit, { digit }, ['.' , { digit }, [ ('e' | 'E'), posdigit, { digit }] ] ; number = float | integer ; stdchars = ? All ASCII chars except '\\', '"', 0 - 31 and 127 - 255 ? ; alpha = ? ASCII characters in the range a-z and A-Z ? ; alphanum = alpha | digit ; space = ( ' ' | '\t' | '\f' | '\r' | '\n' ) ; string = '"', { stdchars | ( '\\', ( 't' | 'n' | 'f' | 'r' | '"' ) ) | ( "\\x", hexdigit, hexdigit ) }, '"' ; doctype = alpha, { alphanum } ; fieldname = { alphanum '{' |'}' | '[' | ']' '.' 
} ; function = alpha, { alphanum } ; idarg = "scheme" | "namespace" | "type" | "specific" | "user" | "group" ; searchcolumnarg = integer ; operator = ">=" | ">" | "==" | "=~" | "=" | "<=" | "<" | "!=" ; idspec = "id", ['.', idarg] ; searchcolumnspec = "searchcolumn", ['.', searchcolumnarg] ; fieldspec = doctype, ( function | ('.', fieldname) ) ; value = ( valuegroup | nil | number | string | idspec | searchcolumnspec | fieldspec ), { function } ; valuefuncmod = ( valuegroup | value ), '%', ( valuefuncmod | valuegroup | value ) ; valuefuncmul = ( valuefuncmod | valuegroup | value ), ( '*' | '/' ), ( valuefuncmul | valuefuncmod | valuegroup | value ) ; valuefuncadd = ( valuefuncmul | valuefuncmod | valuegroup | value ), ( '+' | '-' ), ( valuefuncadd | valuefuncmul | valuefuncmod | valuegroup | value ) ; valuegroup = '(', arithmvalue, ')' ; arithmvalue = ( valuefuncadd | valuefuncmul | valuefuncmod | valuegroup | value ) ; comparison = arithmvalue, { space }, operator, { space }, arithmvalue ; leaf = bool | comparison | fieldspec | doctype ; not = "not", { space }, ( group | leaf ) ; and = ( not | group | leaf ), { space }, "and", { space }, ( and | not | group | leaf ) ; or = ( and | not | group | leaf ), { space }, "or", { space }, ( or | and | not | group | leaf ) ; group = '(', { space }, ( or | and | not | group | leaf ), { space }, ')' ; expression = ( or | and | not | group | leaf ) ; ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Examples](#examples) - [Case sensitiveness](#case-sensitiveness) - [Branch operators / precedence](#branch-operators-precedence) - [Primitives](#primitives) - [Comparison](#comparison) - [Comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values) - [Null behavior with imported fields](#null-behavior-with-imported-fields) - [Locale / Character sets](#locale-character-sets) - [Values](#values) - [Value arithmetics](#value-arithmetics) - [Functions](#functions) - [Using imported fields in selections](#using-imported-fields-in-selections) - [Example](#example) - [Constraints](#constraints) - [Grammar - EBNF of the language](#grammar-EBNF-of-the-language) --- # Source: https://docs.vespa.ai/en/querying/document-summaries.html.md # Document Summaries A _document summary_ is the information that is shown for each document in a query result. What information to include is determined by a _document summary class_: A named set of fields with config on which information they should contain. A special document summary named `default` is always present and used by default. This contains: - all fields which specifies in their indexing statements that they may be included in summaries - all fields specified in any document summary - [sddocname](../reference/querying/default-result-format.html#sddocname) - [documentid](../reference/querying/default-result-format.html#documentid). Summary classes are defined in the schema: ``` schema music { document music { field artist type string { indexing: summary | index } field album type string { indexing: summary | index index: enable-bm25 } field year type int { indexing: summary | attribute } field category_scores type tensor(cat{}) { indexing: summary | attribute } }document-summary my-short-summary {summary artist {}summary album {}}} ``` See the [schema reference](../reference/schemas/schemas.html#summary) for details. 
The summary class to use for a query is determined by the parameter [presentation.summary](../reference/api/query.html#presentation.summary):

```
$ vespa query "select * from music where album contains 'head'" \
    "presentation.summary=my-short-summary"
```

A common reason to define a document summary class is [performance](#performance): By configuring a document summary which only contains attributes, the result can be generated without disk accesses. Note that this is needed to ensure only memory is accessed even if all fields are attributes, because the [document id](../schemas/documents.html#document-ids) is not stored as an attribute. Document summaries may also contain [dynamic snippets and highlighted terms](#dynamic-snippets).

The document summary class to use can also be passed programmatically to the `fill()` method from a Searcher, and multiple fill operations can be interleaved with programmatic filtering in a Searcher to optimize data access and transfer.

## Selecting summary fields in YQL

A [YQL](query-language.html) statement can also be used to filter which fields from a document summary to include in results. Note that this is just a field filter in the container - a summary containing all fields of a summary class is always fetched from content nodes, so to optimize performance it is necessary to create custom summary classes.

```
$ vespa query "select artist, album, documentid, sddocname from music where album contains 'head'"
```

```
{
    "root": {
        "children": [
            {
                "id": "id:mynamespace:music::a-head-full-of-dreams",
                "relevance": 0.16343879032006284,
                "source": "mycontentcluster",
                "fields": {
                    "sddocname": "music",
                    "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                    "artist": "Coldplay",
                    "album": "A Head Full of Dreams"
                }
            }
        ]
    }
}
```

Use `*` to select all the fields of the chosen document summary class (which is `default` by default).

```
$ vespa query "select * from music where album contains 'head'"
```

```
{
    "root": {
        "children": [
            {
                "id": "id:mynamespace:music::a-head-full-of-dreams",
                "relevance": 0.16343879032006284,
                "source": "mycontentcluster",
                "fields": {
                    "sddocname": "music",
                    "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                    "artist": "Coldplay",
                    "album": "A Head Full of Dreams",
                    "year": 2015,
                    "category_scores": {
                        "type": "tensor(cat{})",
                        "cells": {
                            "pop": 1.0,
                            "rock": 0.20000000298023224,
                            "jazz": 0.0
                        }
                    }
                }
            }
        ]
    }
}
```

## Summary field rename

Summary classes may define fields by names not used in the document type:

```
document-summary rename-summary {
    summary artist_name {
        source: artist
    }
}
```

Refer to the [schema reference](../reference/schemas/schemas.html#source) for adding [attribute](../reference/schemas/schemas.html#add-or-remove-an-existing-document-field-from-document-summary) and [non-attribute](../reference/schemas/schemas.html#add-or-remove-a-new-non-attribute-document-field-from-document-summary) fields - some changes require re-indexing.

## Dynamic snippets

Use [dynamic](../reference/schemas/schemas.html#summary) to generate dynamic snippets from fields based on the query keywords. Example from Vespa Documentation Search - see the [schema](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/schemas/doc.sd):

```
document doc {
    field content type string {
        indexing: summary | index
        summary: dynamic
    }
}
```

A query for _document summary_ returns:

> Use **document summaries** to configure which fields ...
> indexing: **summary** | index } } **document-summary** titleyear { **summary** title ...

The example above creates a dynamic summary with the matched terms highlighted. The latter is called [bolding](../reference/schemas/schemas.html#bolding) and can be enabled independently of dynamic summaries.

Refer to the [reference](../reference/schemas/schemas.html#summary) for the response format.

### Dynamic snippet configuration

You can configure generation of dynamic snippets by adding an instance of the [vespa.config.search.summary.juniperrc config](https://github.com/vespa-engine/vespa/blob/master/searchsummary/src/vespa/searchsummary/config/juniperrc.def) in services.xml inside the `<content>` tag for the content cluster in question. E.g:

```
<content version="1.0" id="mycluster">
    ...
    <config name="vespa.config.search.summary.juniperrc">
        <max_matches>2</max_matches>
        <length>1000</length>
        <surround_max>500</surround_max>
        <min_length>300</min_length>
    </config>
    ...
</content>
```

Numbers here are in bytes.

## Performance

[Attribute](../content/attributes.html) fields are held in memory. This means summaries are memory-only operations if all fields requested are attributes, which is the optimal way to get high query throughput. The other document fields are stored as blobs in the [document store](../content/proton.html#document-store). Requesting these fields may therefore require a disk access, increasing latency.

**Important:** The default summary class will access the document store as it includes the [documentid](../reference/querying/default-result-format.html#documentid) field which is stored there. For maximum query throughput using memory-only access, use a dedicated summary class with attributes only.

When using additional summary classes to increase performance, only the network data size is changed - the data read from storage is unchanged. Having "debug" fields with summary enabled will hence also affect the amount of information that needs to be read from disk.

See [query execution](query-api.html#query-execution) - breakdown of the summary (a.k.a. result processing, rendering) phase:

- The document summary latency on the content node, tracked by [content\_proton\_search\_protocol\_docsum\_latency\_average](../operations/metrics.html).
- Getting data across from content nodes to containers.
- Deserialization from internal binary formats (potentially) to Java objects if touched in a [Searcher](../applications/searchers.html), and finally serialization to JSON (default rendering) + rendering and network.

The work, and thus latency, increases with more [hits](../reference/api/query.html#hits). Use [query tracing](query-api.html#query-tracing) to analyze performance.

Refer to [content node summary cache](../performance/caches-in-vespa.html#content-node-summary-cache).

### On this page:

- [Selecting summary fields in YQL](#selecting-summary-fields-in-yql)
- [Summary field rename](#summary-field-rename)
- [Dynamic snippets](#dynamic-snippets)
- [Dynamic snippet configuration](#dynamic-snippet-configuration)
- [Performance](#performance)

---

# Source: https://docs.vespa.ai/en/writing/document-v1-api-guide.html.md

# /document/v1 API guide

Use the _/document/v1/_ API to read, write, update and delete documents.

Refer to the [document/v1 API reference](../reference/api/document-v1.html) for API details. [Reads and writes](reads-and-writes.html) has an overview of alternative tools and APIs as well as the flow through the Vespa components when accessing documents. See [getting started](#getting-started) for how to work with the _/document/v1/ API_.
Examples: | GET | | Get | ``` $ curl http://localhost:8080/document/v1/my_namespace/music/docid/love-id-here-to-stay ``` | | Visit | [Visit](visiting.html) all documents with given namespace and document type: ``` $ curl http://localhost:8080/document/v1/namespace/music/docid ``` Visit all documents using continuation: ``` $ curl http://localhost:8080/document/v1/namespace/music/docid?continuation=AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA ``` Visit using a _selection_: ``` $ curl http://localhost:8080/document/v1/namespace/music/docid?selection=music.genre=='blues' ``` Visit documents across all _non-global_ document types and namespaces stored in content cluster `mycluster`: ``` $ curl http://localhost:8080/document/v1/?cluster=mycluster ``` Visit documents across all _[global](../reference/applications/services/content.html#document)_ document types and namespaces stored in content cluster `mycluster`: ``` $ curl http://localhost:8080/document/v1/?cluster=mycluster&bucketSpace=global ``` Read about [visiting throughput](#visiting-throughput) below. | | | POST | Post data in the [document JSON format](../reference/schemas/document-json-format.html). ``` $ curl -X POST -H "Content-Type:application/json" --data ' { "fields": { "artist": "Coldplay", "album": "A Head Full of Dreams", "year": 2015 } }' \ http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` | | PUT | Do a [partial update](partial-updates.html) for a document. ``` $ curl -X PUT -H "Content-Type:application/json" --data ' { "fields": { "artist": { "assign": "Warmplay" } } }' \ http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` | | DELETE | Delete a document by ID: ``` $ curl -X DELETE http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` Delete all documents in the `music` schema: ``` $ curl -X DELETE \ "http://localhost:8080/document/v1/mynamespace/music/docid?selection=true&cluster=my_cluster" ``` | ## Conditional writes A _test-and-set_ [condition](../reference/writing/document-selector-language.html) can be added to Put, Remove and Update operations. Example: ``` $ curl -X PUT -H "Content-Type:application/json" --data ' { "condition": "music.artist==\"Warmplay\"", "fields": { "artist": { "assign": "Coldplay" } } }' \ http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` **Important:** Use _documenttype.fieldname_ (e.g. music.artist) in the condition, not only _fieldname_. If the condition is not met, a _412 Precondition Failed_ is returned: ``` ``` { "pathId": "/document/v1/mynamespace/music/docid/a-head-full-of-dreams", "id": "id:mynamespace:music::a-head-full-of-dreams", "message": "[UNKNOWN(251013) @ tcp/vespa-container:19112/default]: ReturnCode(TEST_AND_SET_CONDITION_FAILED, Condition did not match document nodeIndex=0 bucket=20000000000000c4 ) " } ``` ``` Also see the [condition reference](../reference/schemas/document-json-format.html#test-and-set). ## Create if nonexistent ### Upserts Updates to nonexistent documents are supported using [create](../reference/schemas/document-json-format.html#create). This is often called an _upsert_ — insert a document if it does not already exist, or update it if it exists. An empty document is created on the content nodes, before the update is applied. This simplifies client code in the case of multiple writers. 
Example:

```
$ curl -X PUT -H "Content-Type:application/json" --data '
  { "fields": { "artist": { "assign": "Coldplay" } } }' \
  http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true
```

### Conditional updates and puts with create

Conditional updates and puts can be combined with [create](../reference/schemas/document-json-format.html#create). This has the following semantics:

- If the document already exists, the condition is evaluated against the most recent document version available. The operation is applied if (and only if) the condition matches.
- Otherwise (i.e. the document does not exist or the newest document version is a tombstone), the condition is _ignored_ and the operation is applied as if no condition was provided.

Support for conditional puts with create was added in Vespa 8.178.

```
$ curl -X POST -H "Content-Type:application/json" --data '
  { "fields": { "artist": { "assign": "Coldplay" } } }' \
  "http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true&condition=music.title%3D%3D%27best+of%27"
```

**Warning:** If all existing replicas of a document are missing when an operation with `"create": true` is executed, a new document will always be created. This happens even if a condition has been given. If the existing replicas become available later, their version of the document will be overwritten by the newest update since it has a higher timestamp.

**Note:** See [document expiry](../schemas/documents.html#document-expiry) for auto-created documents — it is possible to create documents that do not match the selection criterion.

**Note:** Specifying _create_ for a Put operation _without_ a condition has no observable effect, as unconditional Put operations will always write a new version of a document regardless of whether it existed already.

## Data dump

To iterate over documents, use [visiting](visiting.html) — sample output:

```
{
    "pathId": "/document/v1/namespace/doc/docid",
    "documents": [
        {
            "id": "id:namespace:doc::id-1",
            "fields": {
                "title": "Document title 1"
            }
        }
    ],
    "continuation": "AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA"
}
```

Note the _continuation_ token — use this in the next request for more data. Below is a sample script dumping all data using [jq](https://stedolan.github.io/jq/) for JSON parsing. It splits the corpus in 8 slices by default; using a number of slices at least four times the number of container nodes is recommended for high throughput. Timeout can be set lower for benchmarking. (Each request has a maximum timeout of 60s to ensure progress is saved at regular intervals)

```
#!/bin/bash
set -eo pipefail

if [ $# -gt 2 ]
then
  echo "Usage: $0 [number of slices, default 8] [timeout in seconds, default 31536000 (1 year)]"
  exit 1
fi

endpoint="https://my.vespa.endpoint"
cluster="db"
selection="true"
slices="${1:-8}"
timeout="${2:-31536000}"
curlTimeout="$((timeout > 60 ? 60 : timeout))"
url="$endpoint/document/v1/?cluster=$cluster&selection=$selection&stream=true&timeout=$curlTimeout&concurrency=8&slices=$slices"
## auth can be something like auth='--key data-plane-private-key.pem --cert data-plane-public-cert.pem'
auth="--key my-key --cert my-cert -H 'Authorization: my-auth'"
curl="curl -sS $auth"
start=$(date '+%s')
doom=$((start + timeout))

function visit {
  sliceId="$1"
  documents=0
  continuation=""
  while
    printf -v filename "data-%03g-%012g.json.gz" $sliceId $documents
    json="$(eval "$curl '$url&sliceId=$sliceId$continuation'" | tee >( gzip > $filename ) | jq '{ documentCount, continuation, message }')"
    message="$(jq -re .message <<< $json)" && echo "Failed visit for sliceId $sliceId: $message" >&2 && exit 1
    documentCount="$(jq -re .documentCount <<< $json)" && ((documents += $documentCount))
    [ "$(date '+%s')" -lt "$doom" ] && token="$(jq -re .continuation <<< $json)"
  do
    echo "$documentCount documents retrieved from slice $sliceId; continuing at $token"
    continuation="&continuation=$token"
  done
  time=$(($(date '+%s') - start))
  echo "$documents documents total retrieved in $time seconds ($((documents / time)) docs/s) from slice $sliceId" >&2
}

for ((sliceId = 0; sliceId < slices; sliceId++))
do
  visit $sliceId &
done
wait
```

### Visiting throughput

Note that visit with selection is a linear scan over all the music documents in the request examples at the start of this guide. Each complete visit thus requires the selection expression to be evaluated for all documents. Running concurrent visits with selections that match disjoint subsets of the document corpus is therefore a poor way of increasing throughput, as work is duplicated across each such visit. Fortunately, the API offers other options for increasing throughput:

- Split the corpus into any number of smaller [slices](../reference/api/document-v1.html#slices), each to be visited by a separate, independent series of HTTP requests. This is by far the most effective setting to change, as it allows visiting through all HTTP containers simultaneously, and from any number of clients—either of which is typically the bottleneck for visits through _/document/v1_. A good value for this setting is at least a handful per container.
- Increase backend [concurrency](../reference/api/document-v1.html#concurrency) so each visit HTTP response is promptly filled with documents. When using this together with slicing (above), take care to also stream the HTTP responses (below), to avoid buffering too much data in the container layer. When a high number of slices is specified, this setting may have no effect.
- [Stream](../reference/api/document-v1.html#stream) the HTTP responses. This lets you receive data earlier, and more of it per request, reducing HTTP overhead. It also minimizes memory usage due to buffering in the container, allowing higher concurrency per container. It is recommended to always use this, but the default is not to, due to backwards compatibility. A combined example follows below.
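As a minimal sketch of combining these parameters (assuming a local endpoint and a content cluster named `mycluster`), one sliced, streamed visit request could look like:

```
$ curl -sS 'http://localhost:8080/document/v1/?cluster=mycluster&selection=true&slices=8&sliceId=0&concurrency=8&stream=true'
```

A full visit would run one such request series per `sliceId` (0-7 here), typically from separate clients or processes, each following its own `continuation` tokens until none is returned.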
## Getting started

Pro-tip: It is easy to generate a `/document/v1` request by using the [Vespa CLI](../clients/vespa-cli.html), with the `-v` option to output a generated `/document/v1` request - example:

```
$ vespa document -v ext/A-Head-Full-of-Dreams.json
curl -X POST -H 'Content-Type: application/json' --data-binary @ext/A-Head-Full-of-Dreams.json http://127.0.0.1:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
Success: put id:mynamespace:music::a-head-full-of-dreams
```

See the [document JSON format](../reference/schemas/document-json-format.html) for creating JSON payloads. This is a quick guide into dumping random documents from a cluster to get started:

1. To get documents from a cluster, look up the content cluster name from the configuration, like `<content id="music" version="1.0">` in the [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml) example.

2. Use the cluster name to start dumping document IDs (skip `jq` for full json):

```
$ curl -s 'http://localhost:8080/document/v1/?cluster=music&wantedDocumentCount=10&timeout=60s' | \
  jq -r .documents[].id
```

```
id:mynamespace:music::love-is-here-to-stay
id:mynamespace:music::a-head-full-of-dreams
id:mynamespace:music::hardwired-to-self-destruct
```

`wantedDocumentCount` is useful to let the operation run longer to find documents, to avoid an empty result. This operation is a scan through the corpus, and it is normal to get an empty result and only the [continuation token](#data-dump).

3. Look up the document with id `id:mynamespace:music::love-is-here-to-stay`:

```
$ curl -s 'http://localhost:8080/document/v1/mynamespace/music/docid/love-is-here-to-stay' | jq .
```

```
{
    "pathId": "/document/v1/mynamespace/music/docid/love-is-here-to-stay",
    "id": "id:mynamespace:music::love-is-here-to-stay",
    "fields": {
        "artist": "Diana Krall",
        "year": 2018,
        "category_scores": {
            "type": "tensor(cat{})",
            "cells": {
                "pop": 0.4000000059604645,
                "rock": 0,
                "jazz": 0.800000011920929
            }
        },
        "album": "Love Is Here To Stay"
    }
}
```

4. Read more about [document IDs](../schemas/documents.html).

## Troubleshooting

- When troubleshooting documents not found using the query API, use [vespa visit](../clients/vespa-cli.html#documents) to export the documents. Then compare the `id` field with other user-defined `id` fields in the query.
- Document not found responses look like:
- Query results can have results like:
- Delete _all_ documents in _music_ schema, with security credentials:

## Request size limit

Starting from version 8.577.16, Vespa returns 413 (Content too large) as a response to POST and PUT requests that are above the request size limit. To avoid this, check document sizes client-side and truncate or split large documents before feeding. For optimal performance, it is recommended to keep the document size below 10 MB.

## Backpressure

Vespa returns response code 429 (Too Many Requests) as a backpressure signal whenever client feed throughput exceeds system capacity. Clients should implement retry strategies as described in the [HTTP best practices](../cloud/http-best-practices.html) document. Instead of implementing your own retry logic, consider using Vespa's feed clients which automatically handle retries and backpressure. See the [feed command](../clients/vespa-cli.html#documents) of the Vespa CLI and the [vespa-feed-client](../clients/vespa-feed-client.html).

The `/document/v1` API includes a configurable operation queue that by default is tuned to balance latency, throughput and memory.
Applications can adjust this balance by overriding the parameters defined in the [document-operation-executor](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/document-operation-executor.def) config definition. To optimize for higher throughput at the cost of increased latency and higher memory usage on the container, increase any of the `maxThrottled` (maximum queue capacity in number of operations), `maxThrottledAge` (maximum time in queue in seconds), and `maxThrottledBytes` (maximum memory usage in bytes) parameters. This allows the container to buffer more operations during temporary spikes in load, reducing the number of 429 responses while increasing request latency. Make sure to increase operation and client timeouts to accommodate for the increased latency. See the [config definition](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/document-operation-executor.def) for a detailed explanation of each parameter. Set the values to `0` for the opposite effect, i.e. to optimize for latency. Operations will be dispatched directly, and failed out immediately if the number of pending operations exceeds the dynamic window size of the document processing pipeline. _Example: overriding the default value of all 3 parameters to `0`._ ``` 0 0 0 ``` The effective operation queue configuration is logged when the container starts up, see below example. ``` INFO container Container.com.yahoo.document.restapi.resource.DocumentV1ApiHandler Operation queue: max-items=256, max-age=3000 ms, max-bytes=100 MB ``` You can observe the state of the operation queue through the metrics `httpapi_queued_operations`, `httpapi_queued_bytes` and `httpapi_queued_age`. ## Using number and group id modifiers Do not use group or number modifiers with regular indexed mode document types. These are special cases that only work as expected for document types with [mode=streaming or mode=store-only](../reference/applications/services/content.html#document). Examples: | Get | Get a document in a group: ``` $ curl http://localhost:8080/document/v1/mynamespace/music/number/23/some_key ``` ``` $ curl http://localhost:8080/document/v1/mynamespace/music/group/mygroupname/some_key ``` | | Visit | Visit all documents for a group: ``` $ curl http://localhost:8080/document/v1/namespace/music/number/23/ ``` ``` $ curl http://localhost:8080/document/v1/namespace/music/group/mygroupname/ ``` | Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Conditional writes](#conditional-writes) - [Create if nonexistent](#create-if-nonexistent) - [Upserts](#upserts) - [Conditional updates and puts with create](#conditional-updates-and-puts-with-create) - [Data dump](#data-dump) - [Visiting throughput](#visiting-throughput) - [Getting started](#getting-started) - [Troubleshooting](#troubleshooting) - [Request size limit](#request-size-limit) - [Backpressure](#backpressure) - [Using number and group id modifiers](#using-number-and-group-id-modifiers) --- # Source: https://docs.vespa.ai/en/reference/api/document-v1.html.md # /document/v1 API reference This is the /document/v1 API reference documentation. Use this API for synchronous [Document](../../schemas/documents.html) operations to a Vespa endpoint - refer to [reads and writes](../../writing/reads-and-writes.html) for other options. The [document/v1 API guide](../../writing/document-v1-api-guide.html) has examples and use cases. 
**Note:** Mapping from document IDs to /document/v1/ URLs is found in [document IDs](../../schemas/documents.html#id-scheme) - also see [troubleshooting](../../writing/document-v1-api-guide.html#troubleshooting). Some examples use _number_ and _group_[document id](../../schemas/documents.html#document-ids) modifiers. These are special cases that only work as expected for document types with [mode=streaming or mode=store-only](../applications/services/content.html#document). Do not use group or number modifiers with regular indexed mode document types. ## Configuration To enable the API, add `document-api` in the serving container cluster - [services.xml](../applications/services/container.html): ``` \ ``` ## HTTP requests | HTTP request | document/v1 operation | Description | | --- | --- | --- | | GET | _Get_ a document by ID or _Visit_ a set of documents by selection. | | | Get | Get a document: ``` /document/v1///docid/ /document/v1///number// /document/v1///group// ``` Optional parameters: - [cluster](#cluster) - [fieldSet](#fieldset) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Visit | Iterate over and get all documents, or a [selection](#selection) of documents, in chunks, using [continuation](#continuation) tokens to track progress. Visits are a linear scan over the documents in the cluster. ``` /document/v1/ ``` It is possible to specify namespace and document type with the visit path: ``` /document/v1///docid ``` Documents can be grouped to limit accesses to a subset. A group is defined by a numeric ID or string — see [id scheme](../../schemas/documents.html#id-scheme). ``` /document/v1///group/ /document/v1///number/ ``` Mandatory parameters: - [cluster](#cluster) - Visits can only retrieve data from _one_ content cluster, so `cluster` **must** be specified for requests at the root `/document/v1/` level, or when there is ambiguity. This is required even if the application has only one content cluster. Optional parameters: - [bucketSpace](#bucketspace) - Parent documents are [global](../applications/services/content.html#document) and in the `global` [bucket space](#bucketspace). By default, visit will visit non-global documents in the `default` bucket space, unless document type is indicated, and is a global document type. - [concurrency](#concurrency) - Use to configure backend parallelism for each visit HTTP request. - [continuation](#continuation) - [fieldSet](#fieldset) - [selection](#selection) - [sliceId](#sliceid) - [slices](#slices) - Split visiting of the document corpus across more than one HTTP request—thus allowing the concurrent use of more HTTP containers—use the `slices` and `sliceId` parameters. - [stream](#stream) - It's recommended enabling streamed HTTP responses, with the [stream](#stream) parameter, as this reduces memory consumption and reduces HTTP overhead. - [timeout](#timeout) - [tracelevel](#tracelevel) - [wantedDocumentCount](#wanteddocumentcount) - [fromTimestamp](#fromtimestamp) - [toTimestamp](#totimestamp) - [includeRemoves](#includeRemoves) Optional request headers: - [Accept](#accept) - specify the desired response format. | | POST | _Put_ a given document, by ID, or _Copy_ a set of documents by selection from one content cluster to another. | | | Put | Write the document contained in the request body in JSON format. ``` /document/v1///docid/ /document/v1///group/ /document/v1///number/ ``` Optional parameters: - [condition](#condition) - Use for conditional writes. 
- [route](#route) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Copy | Write documents visited in source [cluster](#cluster) to the [destinationCluster](#destinationcluster) in the same application. A [selection](#selection) is mandatory — typically the document type. Supported paths (see [visit](#visit) above for semantics): ``` /document/v1/ /document/v1///docid/ /document/v1///group/ /document/v1///number/ ``` Mandatory parameters: - [cluster](#cluster) - [destinationCluster](#destinationcluster) - [selection](#selection) Optional parameters: - [bucketSpace](#bucketspace) - [continuation](#continuation) - [timeChunk](#timechunk) - [timeout](#timeout) - [tracelevel](#tracelevel) | | PUT | _Update_ a document with the given partial update, by ID, or _Update where_ the given selection is true. | | | Update | Update a document with the partial update contained in the request body in the [document update JSON format](../schemas/document-json-format.html#update). ``` /document/v1///docid/ ``` Optional parameters: - [condition](#condition) - use for conditional writes - [create](#create) - use to create empty documents when updating non-existent ones. - [route](#route) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Update where | Update visited documents in [cluster](#cluster) with the partial update contained in the request body in the [document update JSON format](../schemas/document-json-format.html#update). Supported paths (see [visit](#visit) above for semantics): ``` /document/v1///docid/ /document/v1///group/ /document/v1///number/ ``` Mandatory parameters: - [cluster](#cluster) - [selection](#selection) Optional parameters: - [bucketSpace](#bucketspace) - See [visit](#visit), `default` or `global` bucket space - [continuation](#continuation) - [stream](#stream) - [timeChunk](#timechunk) - [timeout](#timeout) - [tracelevel](#tracelevel) | | DELETE | _Remove_ a document, by ID, or _Remove where_ the given selection is true. | | | Remove | Remove a document. ``` /document/v1///docid/ ``` Optional parameters: - [condition](#condition) - [route](#route) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Delete where | Delete visited documents from [cluster](#cluster). Supported paths (see [visit](#visit) above for semantics): ``` /document/v1/ /document/v1///docid/ /document/v1///group/ /document/v1///number/ ``` Mandatory parameters: - [cluster](#cluster) - [selection](#selection) Optional parameters: - [bucketSpace](#bucketspace) - See [visit](#visit), `default` or `global` bucket space - [continuation](#continuation) - [stream](#stream) - [timeChunk](#timechunk) - [timeout](#timeout) - [tracelevel](#tracelevel) | ## Request parameters | Parameter | Type | Description | | --- | --- | --- | | bucketSpace | String | Specify the bucket space to visit. Document types marked as `global` exist in a separate _bucket space_ from non-global document types. When visiting a particular document type, the bucket space is automatically deduced based on the provided type name. When visiting at a root `/document/v1/` level this information is not available, and the non-global ("default") bucket space is visited by default. Specify `global` to visit global documents instead. Supported values: `default` (for non-global documents) and `global`. | | cluster | String | Name of [content cluster](../../content/content-nodes.html) to GET from, or visit. 
| | concurrency | Integer | Sends the given number of visitors in parallel to the backend, improving throughput at the cost of resource usage. Default is 1. When `stream=true`, concurrency limits the maximum concurrency, which is otherwise unbounded, but controlled by a dynamic throttle policy. **Important:** Given a concurrency parameter of _N_, the worst case for memory used while processing the request grows linearly with _N_, unless [stream](#stream) mode is turned on. This is because the container currently buffers all response data in memory before sending them to the client, and all sent visitors must complete before the response can be sent. | | condition | String | For test-and-set. Run a document operation conditionally — if the condition fails, a _412 Precondition Failed_ is returned. See [example](../../writing/document-v1-api-guide.html#conditional-writes). | | continuation | String | When visiting, a continuation token is returned as the `"continuation"` field in the JSON response, as long as more documents remain. Use this token as the `continuation` parameter to visit the next chunk of documents. See [example](../../writing/document-v1-api-guide.html#data-dump). | | create | Boolean | If `true`, updates to non-existent documents will create an empty document to update. See [create if nonexistent](../../writing/document-v1-api-guide.html#create-if-nonexistent). | | destinationCluster | String | Name of [content cluster](../../content/content-nodes.html) to copy to, during a copy visit. | | dryRun | Boolean | Used by the [vespa-feed-client](../../clients/vespa-feed-client.html) using `--speed-test` for bandwidth testing, by setting to `true`. | | fieldSet | String | A [field set string](../../schemas/documents.html#fieldsets) with the set of document fields to fetch from the backend. Default is the special `[document]` fieldset, returning all _document_ fields. To fetch specific fields, use the name of the document type, followed by a comma-separated list of fields (for example `music:artist,song` to fetch two fields declared in `music.sd`). | | route | String | The route for single document operations, and for operations generated by [copy](#copy), [update](#update-where) or [deletion](#delete-where) visits. Default value is `default`. See [routes](../../writing/document-routing.html). | | selection | String | Select only a subset of documents when [visiting](../../writing/visiting.html) — details in [document selector language](../writing/document-selector-language.html). | | sliceId | Integer | The slice number of the visit represented by this HTTP request. This number must be non-negative and less than the number of [slices](#slices) specified for the visit - e.g., if the number of slices is 10, `sliceId` is in the range [0-9]. **Note:** If the number of distribution bits change during a sliced visit, the results are undefined. Thankfully, this is a very rare occurrence and is only triggered when adding content nodes. | | slices | Integer | Split the document corpus into this number of independent slices. This lets multiple, concurrent series of HTTP requests advance the same logical visit independently, by specifying a different [sliceId](#sliceid) for each. | | stream | Boolean | Whether to stream the HTTP response, allowing data to flow as soon as documents arrive from the backend. This obsoletes the [wantedDocumentCount](#wanteddocumentcount) parameter. The HTTP status code will always be 200 if the visit is successfully initiated. Default value is false. 
| | format.tensors | String | Controls how tensors are rendered in the result. | Value | Description | | --- | --- | | `short` | **Default**. Render the tensor value in an object having two keys, "type" containing the value, and "cells"/"blocks"/"values" ([depending on the type](../schemas/document-json-format.html#tensor)) containing the tensor content. Render the tensor content in the [type-appropriate short form](../schemas/document-json-format.html#tensor). | | `long` | Render the tensor value in an object having two keys, "type" containing the value, and "cells" containing the tensor content. Render the tensor content in the [general verbose form](../schemas/document-json-format.html#tensor). | | `short-value` | Render the tensor content directly. Render the tensor content in the [type-appropriate short form](../schemas/document-json-format.html#tensor). | | `long-value` | Render the tensor content directly. Render the tensor content in the [general verbose form](../schemas/document-json-format.html#tensor). | | | timeChunk | String | Target time to spend on one chunk of a copy, update or remove visit; with optional ks, s, ms or µs unit. Default value is 60. | | timeout | String | Request timeout in seconds, or with optional ks, s, ms or µs unit. Default value is 180s. | | tracelevel | Integer | Number in the range [0,9], where higher gives more details. The trace dumps which nodes and chains the document operation has touched. See [routes](../../writing/document-routing.html). | | wantedDocumentCount | Integer | Best effort attempt to not respond to the client before `wantedDocumentCount` number of documents have been visited. Response may still contain fewer documents if there are not enough matching documents left to visit in the cluster, or if the visiting times out. This parameter is intended for the case when you have relatively few documents in your cluster and where each visit request would otherwise process only a handful of documents. The maximum value of `wantedDocumentCount` is bounded by an implementation-specific limit to prevent excessive resource usage. If the cluster has many documents (on the order of tens of millions), there is no need to set this value. | | fromTimestamp | Integer | Filters the returned document set to only include documents that were last modified at a time point equal to or higher to the specified value, in microseconds from UTC epoch. Default value is 0 (include all documents). | | toTimestamp | Integer | Filters the returned document set to only include documents that were last modified at a time point lower than the specified value, in microseconds from UTC epoch. Default value is 0 (sentinel value; include all documents). If non-zero, must be greater than, or equal to, `fromTimestamp`. | | includeRemoves | Boolean | Include recently removed document IDs, along with the set of returned documents. By default, only documents currently present in the corpus are returned in the `"documents"` array of the response; when this parameter is set to `"true"`, documents that were recently removed, and whose tombstones still exist, are also included in that array, as entries on the form `{ "remove": "id:ns:type::foobar" }`. See [here](/en/operations/self-managed/admin-procedures.html#data-retention-vs-size) for specifics on tombstones, including their lifetime. 
| ## HTTP request headers | Header | Values | Description | | --- | --- | --- | | Accept | `application/json` or `application/jsonl` | The [Accept](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Accept) header lets the client specify to the server what [media (MIME) types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types) it accepts as the response format. All Document V1 API calls support `application/json` for returning [JSON](#json) responses. [Streaming visiting](#stream) additionally supports `application/jsonl` for returning [JSON Lines](#json-lines) (JSONL) since Vespa 8.593. To ensure compatibility with older versions, make sure to check the `Content-Type`[HTTP response header](#http-response-headers). A JSONL response will always have a `Content-Type` media type of `application/jsonl`, and JSON wil always have a media type of `application/json`. Multiple acceptable types can be specified. JSONL will be returned if (and only if) `application/jsonl` is part of the list _and_ no other media types have a higher [quality value](https://httpwg.org/specs/rfc9110.html#quality.values). Example: ``` Accept: application/jsonl ``` If the client accepts both JSON and JSONL, the server will respond with JSONL: ``` Accept: application/json, application/jsonl ``` For backwards compatibility, if no `Accept` header is provided (or if no provided media types are acceptable) `application/json` is assumed. | ## Request body POST and PUT requests must include a body for single document operations; PUT must also include a body for [update where](#update-where) visits. A field has a _value_ for a POST and an _update operation object_ for PUT. Documents and operations use the [document JSON format](../schemas/document-json-format.html). The document fields must match the [schema](../../basics/schemas.html): ``` ``` { "fields": { "": "" } } ``` ``` ``` ``` { "fields": { "": { "" : "" } } } ``` ``` The _update-operation_ is most often `assign` - see [update operations](../schemas/document-json-format.html#update-operations) for the full list. Values for `id` / `put` / `update` in the request body are silently dropped. The ID is generated from the request path, regardless of request body data - example: ``` ``` { "put" : "id:mynamespace:music::123", "fields": { "title": "Best of" } } ``` ``` This makes it easier to generate a feed file that can be used for both the [vespa-feed-client](../../clients/vespa-feed-client.html) and this API. ## HTTP status codes | Code | Description | | --- | --- | | 200 | OK. Attempts to remove or update a non-existent document also yield this status code (see 412 below). | | 204 | No Content. Successful response to OPTIONS request. | | 400 | Bad request. Returned for undefined document types + other request errors. See [13465](https://github.com/vespa-engine/vespa/issues/13465) for defined document types not assigned to a content cluster when using PUT. Inspect `message` for details. | | 404 | Not found; the document was not found. This is only used when getting documents. | | 405 | Method Not Allowed. HTTP method is not supported by the endpoint. Valid combinations are listed [above](#http-requests) | | 412 | [condition](#condition) is not met. Inspect `message` for details. This is also the result when a condition if specified, but the document does not exist. | | 413 | Content too large; used for POST and PUT requests that are above the [request size limit](../../writing/document-v1-api-guide.html#request-size-limit). 
| | 429 | Too many requests; the document API has too many inflight feed operations, retry later. | | 500 | Server error; an unspecified error occurred when processing the request/response. | | 503 | Service unavailable; the document API was unable to produce a response at this time. | | 504 | Gateway timeout; the document API failed to respond within the given (or default 180s) timeout. | | 507 | Insufficient storage; the content cluster is out of memory or disk space. | ## HTTP response headers | Header | Values | Description | | --- | --- | --- | | X-Vespa-Ignored-Fields | true | Will be present and set to 'true' only when a put or update contains one or more fields which were [ignored since they are not present in the document type](../applications/services/container.html#ignore-undefined-fields). Such operations will be applied exactly as if they did not contain the field operations referencing non-existing fields. References to non-existing fields in field _paths_ are not detected. | | Content-Type | `application/json` or `application/jsonl` | The [media type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types) (MIME type) of the response body. Either `application/json` for [JSON](#json) responses or `application/jsonl` for [JSON Lines](#json-lines) (JSONL) responses. The content type may include additional parameters such as `charset`. Example header: ``` Content-Type: application/json; charset=UTF-8 ``` | ## Response formats Responses are by default in JSON format. [Streaming visiting](#stream)supports an optional [JSON Lines](#json-lines) (JSONL) response format since Vespa 8.593. ### JSON JSON responses have the following fields: | Field | Description | | --- | --- | | pathId | Request URL path — always included. | | message | An error message — included for all failed requests. | | id | Document ID — always included for single document operations, including _Get_. | | fields | The requested document fields — included for successful _Get_ operations. | | documents[] | Array of documents in a visit result — each document has the _id_ and _fields_. | | documentCount | Number of visited and selected documents. If [includeRemoves](#includeRemoves) is `true`, this also includes the number of returned removes (tombstones). | | continuation | Token to be used to get the next chunk of the corpus - see [continuation](#continuation). | GET can include a `fields` object if a document was found in a _GET_ request ``` ``` { "pathId": "", "id": "", "fields": { } } ``` ``` A GET _visit_ result can include an array of `documents`plus a [continuation](#continuation): ``` ``` { "pathId": "", "documents": [ { "id": "", "fields": { } } ], "continuation": "", "documentCount": 123 } ``` ``` A continuation indicates the client should make further requests to get more data, while lack of a continuation indicates an error occurred, and that visiting should cease, or that there are no more documents. A `message` can be returned for failed operations: ``` ``` { "pathId": "", "message": "" } ``` ``` ### JSON Lines A JSON Lines (JSONL) response is a stream of newline-separated JSON objects. Each line contains exactly one JSON object, and each JSON object takes up exactly one line. No line breaks are allowed within an object. JSONL is an optional response format for [streaming visiting](#stream), enabling efficient client-side parsing and fine-grained, continuous tracking of visitor progress. 
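For example, a streamed visit asking for a JSONL response might look like this sketch, assuming a local endpoint and a content cluster named `mycluster` (the `Accept` header is what selects JSONL, as described below):

```
$ curl -sS -H 'Accept: application/jsonl' \
    'http://localhost:8080/document/v1/?cluster=mycluster&selection=true&stream=true'
```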
The JSONL response format is currently not supported for any other operations than streaming visiting. The JSONL response format is enabled by providing a HTTP [Accept](#accept) request header that specifies `application/jsonl` as the preferred response type, and will have a [Content-Type](#content-type) of `application/jsonl` if the server is on a version that supports JSONL visiting. Clients must check the `Content-Type` header to ensure they are getting the format they expect. JSONL support requires Vespa 8.593 or newer. Example response body: ``` ``` {"put":"id:ns:music::one","fields":{"foo":"bar"}} {"put":"id:ns:music::two","fields":{"foo":"baz"}} {"continuation":{"token":"...","percentFinished":40.0}} {"put":"id:ns:music::three","fields":{"foo":"zoid"}} {"remove":"id:ns:music::four"} {"continuation":{"token":"...","percentFinished":50.0}} {"continuation":{"token":"...","percentFinished":60.0}} {"put":"id:ns:music::five","fields":{"foo":"berg"}} {"continuation":{"token":"...","percentFinished":70.0}} {"sessionStats":{"documentCount":5}} {"continuation":{"percentFinished":100.0}} ``` ``` Note that the `"..."` values are placeholders for (from a client's perspective) opaque string values. #### JSONL response objects **Note:** To be forwards compatible with future extensions to the response format, ignore unknown objects and fields. | Object | Description | | --- | --- | | put | A document [Put](../schemas/document-json-format.html#put) operation in the same format as that accepted by Vespa's JSONL feed API. | | remove | A document [Remove](../schemas/document-json-format.html#remove) operation in the same format as that accepted by Vespa's JSONL feed API. Only present if [includeRemoves](#includeRemoves) is `true`. | | continuation | A visitor [continuation](#continuation). Possible sub-object fields: | Field name | Description | | --- | --- | | `token` | An opaque string value representing the current visitor progress through the data space. This value can be provided as part of a subsequent visitor request to continue visiting from where the last request left off. Clients should not attempt to parse the contents of this string, as it's considered an internal implementation detail and may be changed (in a backwards compatible way) without any prior announcement. | | `percentFinished` | A floating point number between 0 and 100 (inclusive) that gives an approximation of how far the visitor has progressed through the data space. | The last line of a successful request should always be a `continuation` object. If (and only if) visiting has completed, the last `continuation` object will have a `percentFinished` value of `100` and will _not_ have a `token` field. | | message | A message received from the backend visitor session. Can be used by clients to report problems encountered during visiting. Possible sub-object fields: | Field name | Description | | --- | --- | | `text` | The actual message, in unstructured text | | `severity` | The severity of the message. One of `info`, `warning` or `error`. | | | sessionStats | Statistics from the backend visitor session. Possible sub-object fields: | Field name | Description | | --- | --- | | `documentCount` | The number of visited and selected documents. If [includeRemoves](#includeRemoves) is `true`, this also includes the number of returned removes (tombstones). | | Note that it's possible for a successful response to contain zero `put` or `remove` objects if the [selection](#selection) did not match any documents. 
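A minimal client-side sketch, assuming the same local endpoint and cluster name as above: read the stream line by line, keep only the most recent continuation token (written here to a hypothetical `latest-continuation.txt`), and pass document operations on for further processing:

```
$ curl -sN -H 'Accept: application/jsonl' \
    'http://localhost:8080/document/v1/?cluster=mycluster&selection=true&stream=true' |
  while read -r line; do
    # Remember the most recent continuation token, if this line carries one
    token="$(jq -r '.continuation.token // empty' <<< "$line")"
    [ -n "$token" ] && printf '%s\n' "$token" > latest-continuation.txt
    # Forward put/remove operations; other objects (message, sessionStats) are ignored here
    jq -c 'select(.put or .remove)' <<< "$line"
  done
```

If the request fails before completing, a new request with the saved token as its `continuation` parameter resumes from roughly where the stream left off.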
#### Differences from the JSON format The biggest difference in semantics between the JSON and JSONL response formats is when, and how, [continuation](#continuation) objects are returned. In the JSON format a continuation is included _once_ at the very end of the response object and covers the progress made by the entire request. If the request somehow fails after receiving 99% of all documents but prior to receiving the continuation field, the client must retry the entire request from the previously known continuation value. This can result in getting many requested documents twice; once from the incomplete first request and once more from the second request that covers the same part of the data space. In the JSON Lines format, a contination object is emitted to the stream _every time_ a backend data [bucket](../../content/buckets.html) has been fully visited, as well as at the end of the response stream. This may happen many times in a response. Each continuation object _subsumes_ the progress of previously emitted continuations, meaning that a client only needs to remember the _most recent_ continuation value it observed in the response. If the request fails prior to completion, the client can specify the most recent continuation in the next request; it will then only receive duplicates for the data buckets that were actively being processed when the request failed. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Configuration](#configuration) - [HTTP requests](#http-requests) - [Request parameters](#request-parameters) - [HTTP request headers](#http-request-headers) - [Request body](#request-body) - [HTTP status codes](#http-status-codes) - [HTTP response headers](#http-response-headers) - [Response formats](#response-formats) - [JSON](#json) - [JSON Lines](#json-lines) --- # Source: https://docs.vespa.ai/en/schemas/documents.html.md # Documents Vespa models data as _documents_. A document has a string identifier, set by the application, unique across all documents. A document is a set of [key-value pairs](../writing/document-api-guide.html). A document has a schema (i.e. type), defined in the [schema](../basics/schemas.html). When configuring clusters, a [documents](../reference/applications/services/content.html#documents) element sets what document types a cluster is to store. This configuration is used to configure the garbage collector if it is enabled. Additionally, it is used to define default routes for documents sent into the application. By default, a document will be sent to all clusters having the document type defined. Refer to [routing](../writing/document-routing.html) for details. Vespa uses the document ID to distribute documents to nodes. From the document identifier, the content layer calculates a numeric location. A bucket contains all the documents, where a given amount of least-significant bits of the location are all equal. This property is used to enable co-localized storage of documents - read more in [buckets](../content/buckets.html) and [content cluster elasticity](../content/elasticity.html). Documents can be [global](../reference/applications/services/content.html#document), see [parent/child](parent-child.html). ## Document IDs The document identifiers are URIs, represented by a string, which must conform to a defined URI scheme for document identifiers. 
The document identifier string may only contain _text characters_, as defined by `isTextCharacter` in [com.yahoo.text.Text](https://github.com/vespa-engine/vespa/blob/master/vespajlib/src/main/java/com/yahoo/text/Text.java). ### id scheme Vespa currently has only one defined scheme, the _id scheme_: `id::::` **Note:** An example mapping from ID to the URL used in [/document/v1/](../writing/document-v1-api-guide.html) is from`id:mynamespace:mydoctype::user-defined-id` to`/document/v1/mynamespace/mydoctype/docid/user-defined-id`. Find examples and tools in [troubleshooting](../writing/document-v1-api-guide.html#document-not-found). Find examples in the [/document/v1/](../writing/document-v1-api-guide.html) guide. | Part | Required | Description | | --- | --- | --- | | namespace | Yes | Not used by Vespa, see [below](#namespace). | | document-type | Yes | Document type as defined in [services.xml](../reference/applications/services/content.html#document) and the [schema](../reference/schemas/schemas.html). | | key/value-pair | Optional | Modifiers to the id scheme, used to configure document distribution to [buckets](../content/buckets.html#document-to-bucket-distribution). With no modifiers, the id scheme distributes all documents uniformly. The key/value-pair field contains one of two possible key/value pairs; **n** and **g** are mutually exclusive: | n=_\_ | Number in the range [0,2^63-1] - only for testing of abnormal bucket distributions | | g=_\_ | The _groupname_ string is hashed and used to select the storage location | **Important:** This is only useful for document types with [mode=streaming or mode=store-only](../reference/applications/services/content.html#document). Do not use modifiers for regular indexed document types. See [streaming search](../performance/streaming-search.html). Using modifiers for regular indexed document will cause unpredictable feeding performance, in addition, search dispatch does not have support to limit the search to modifiers/buckets. | | user-specified | Yes | A unique ID string. | ### Document IDs in search results The full Document ID (as a string) will often contain redundant information and be quite long; a typical value may look like "id:mynamespace:mydoctype::user-specified-identifier" where only the last part is useful outside Vespa. The Document ID is therefore not stored in memory, and it **not always present** in [search results](../reference/querying/default-result-format.html#id). It is therefore recommended to put your own unique identifier (usually the "user-specified-identifier" above) in a document field, typically named "myid" or "shortid" or similar: ``` field shortid type string { indexing: attribute | summary } ``` This enables using a [document-summary](../querying/document-summaries.html) with only in-memory fields while still getting the identifier you actually care about. If the "user-specified-identifier" is just a simple number you could even use "type int" for this field for minimal memory overhead. The Document ID is stored on disk in the document summary. To return this value in search results, configure the schema like this: ``` schema music { document music { field ... } document-summary empty-summary { summary documentid { source: documentid } from-disk } ... ``` ... and use `presentation.summary=empty-summary` in the query API. The `from-disk` setting mutes a warning for document summary disk access; Use a higher query timeout when requesting many IDs like this. 
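For example (a sketch using the Vespa CLI; the query text is illustrative and assumes the `music` schema above), such a query could look like:

```
$ vespa query "select * from music where album contains 'head'" \
    "presentation.summary=empty-summary"
```

Each hit then carries only the `documentid` summary field, as defined by `empty-summary`.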
### Namespace The namespace in document ids is useful when you have multiple document collections that you want to be sure never end up with the same document id. It has no function in Vespa beyond this, and can just be set to any short constant value like for example "doc". Consider also letting synthetic documents used for testing use namespace "test" so it's easy to detect and remove them if they are present outside the test by mistake. Example - if feeding - document A by `curl -X POST https:.../document/v1/first_namespace/my_doc_type/docid/shakespeare` - document B by `curl -X POST https:.../document/v1/second_namespace/my_doc_type/docid/shakespeare` then those will be separate documents, both searchable, with different document IDs. The document ID differs not in the user specified part (this is `shakespeare` for both documents), but in the namespace part (`first_namespace` vs `second_namespace`). The full document ID for document A is `id:first_namespace:my_doc_type::shakespeare`. The namespace has no relation to other configuration elsewhere, like in _services.xml_ or in schemas. It is just like the user specified part of each document ID in that sense. Namespace can not be used in queries, other than as part of the full document ID. However, it can be used for [document selection](../reference/writing/document-selector-language.html), where `id.namespace` can be accessed and compared to a given string, for instance. An example use case is [visiting](../writing/visiting.html) a subset of documents. ## Fields Documents can have fields, see the [schema reference](../reference/schemas/schemas.html#field). A field can not be defined with a default value. Use a [choice ('||') indexing statement or a](../writing/indexing.html#choice-example)[document processor](../applications/document-processors.html) to assign a default to document put/update operations. ## Fieldsets Use _fieldset_ to limit the fields that are returned from a read operation, like _get_ or _visit_ - see [examples](../clients/vespa-cli.html#documents). Vespa may return more fields than specified if this does not impact performance. **Note:** Document field sets is a different thing than[searchable fieldsets](../reference/schemas/schemas.html#fieldset). There are two options for specifying a fieldset: - Built-in fieldset - Name of a document type, then a colon ":", followed by a comma-separated list of fields (for example `music:artist,song` to fetch two fields declared in `music.sd`) Built-in fieldsets: | Fieldset | Description | | --- | --- | | [all] | Returns all fields in the schema (generated fields included) and the document ID. | | [document] | Returns original fields in the document, including the document ID. | | [none] | Returns no fields at all, not even the document ID. _Internal, do not use_ | | [id] | Returns only the document ID | | \:[document] | **Deprecated:** Use `[document]` Same as `[document]` fieldset above: Returns only the original document fields (generated fields not included) together with the document ID. | If a built-in field set is not used, a list of fields can be specified. Syntax: ``` :field1,field2,… ``` Example: ``` music:title,artist ``` ## Document expiry To auto-expire documents, use a [selection](../reference/applications/services/content.html#documents.selection) with [now](../reference/writing/indexing-language.html#now). 
Example, set time-to-live (TTL) for _music_ documents to one day, using a field called _timestamp_:

```
<documents garbage-collection="true">
    <document type="music" selection="music.timestamp &gt; now() - 86400" />
</documents>
```

**Note:** The `selection` expression says which documents to _keep_, not which ones to delete.

The _timestamp_ field must have a value in seconds since EPOCH:

```
field timestamp type long {
    indexing: attribute
    attribute {
        fast-access
    }
}
```

When `garbage-collection="true"`, Vespa iterates over the document space to purge expired documents. Vespa will invoke the configured GC selection for each stored document once every [garbage-collection-interval](../reference/applications/services/content.html#documents.selection) seconds. It is unspecified when a particular document will be processed within the configured interval.

**Important:** This is a best-effort garbage collection feature to conserve CPU and space. Use query filters if it is important to exclude documents based on a criterion.

- Using a _selection_ with _now_ can have side effects when re-feeding or re-processing documents, as timestamps can be stale. A common problem is feeding with too old timestamps, resulting in no documents being indexed.
- Normally, documents that are already expired at write time are not persisted. When using [create](../writing/document-v1-api-guide.html#create-if-nonexistent) (Create if nonexistent), it is possible to create documents that are expired and will be removed in the next cycle.
- Deploying a configuration where the selection string selects no documents will cause all documents to be garbage collected. Use [visit](../writing/visiting.html) to test the selection string. Garbage-collected documents cannot be expected to be recoverable.
- The fields that are referenced in the selection expression should be attributes. Also, either the fields should be set with _"fast-access"_, or the number of [searchable copies](../reference/applications/services/content.html#searchable-copies) in the content cluster should be the same as the [redundancy](../reference/applications/services/content.html#redundancy). Otherwise, the document selection maintenance will be slow and have a major performance impact on the system.
- [Imported fields](../reference/schemas/schemas.html#import-field) can be used in the selection string to expire documents, but special care needs to be taken when using these. See [using imported fields in selections](../reference/writing/document-selector-language.html#using-imported-fields-in-selections) for more information and restrictions.
- Document garbage collection is a low priority background operation that runs continuously unless preempted by higher priority operations. If the cluster is too heavily loaded by client feed operations, there is a risk of starving GC from running. To verify that garbage collection is not starved, check the [vds.idealstate.max\_observed\_time\_since\_last\_gc\_sec.average](../operations/metrics.html) distributor metric. If it significantly exceeds `garbage-collection-interval`, it is an indication that GC is starved.

To batch remove, set a selection that matches no documents, like _"not music"_. Use [vespa visit](../writing/visiting.html) to test the selection.
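For illustration, a batch removal of all _music_ documents could be expressed like this - a sketch only, reusing the content cluster configuration assumed above; verify the selection with visit before deploying:

```
<!-- Sketch: the selection keeps no music documents, so garbage collection
     removes them all in the next GC cycle. -->
<documents garbage-collection="true">
    <document type="music" selection="not music" />
</documents>
```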
Dump the IDs of all documents that would be _preserved_:

```
$ vespa visit --selection 'music.timestamp > now() - 86400' --field-set "music.timestamp"
```

Negate the expression by wrapping it in a `not` to dump the IDs of all the documents that would be _removed_ during GC:

```
$ vespa visit --selection 'not (music.timestamp > now() - 86400)' --field-set "music.timestamp"
```

## Processing documents

To process documents, use [Document processing](../applications/document-processors.html). Examples are enriching documents (looking up data from other sources), transforming content (like linguistic transformations and tokenization), filtering data, and triggering other events based on the input data.

See the sample app [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) for use of Vespa APIs like:

- [Document API](../writing/document-api-guide.html) - work on documents and fields in documents, and create unit tests using the Application framework
- [Document Processing](../applications/document-processors.html) - chain independent processors with ordering constraints

The sample app [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search) has examples of processing PUTs or UPDATEs (using [create-if-nonexistent](../writing/document-v1-api-guide.html#create-if-nonexistent)) of documents in [OutLinksDocumentProcessor](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/java/ai/vespa/cloud/docsearch/OutLinksDocumentProcessor.java). It is also an introduction to using [multivalued fields](../searching-multi-valued-fields) like arrays, maps and tensors. Use the [VespaDocSystemTest](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/test/java/ai/vespa/cloud/docsearch/VespaDocSystemTest.java) to build code that feeds and tests an instance in the Vespa Developer Cloud / local Docker instance.

Both sample apps also use the Document API to GET/PUT/UPDATE other documents as part of processing, using asynchronous [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java). Use this as a starting point for applications that enrich data when writing.

Copyright © 2026 - [Cookie Preferences](#)

### On this page:

- [Document IDs](#document-ids)
- [id scheme](#id-scheme)
- [Document IDs in search results](#docid-in-results)
- [Namespace](#namespace)
- [Fields](#fields)
- [Fieldsets](#fieldsets)
- [Document expiry](#document-expiry)
- [Processing documents](#processing-documents)

---

# Source: https://docs.vespa.ai/en/learn/tutorials/e-commerce.html.md

# Use Case - shopping

The [e-commerce, or shopping, use case](https://github.com/vespa-engine/sample-apps/tree/master/use-case-shopping) is an example of an e-commerce site complete with sample data and a web front end to browse product data and reviews. To quick-start the application, follow the instructions in the [README](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/README.md) in the sample app.

![Shopping sample app screenshot](/assets/img/shopping-1.png)

To browse the application, navigate to [localhost:8080/site](http://localhost:8080/site). This site is implemented through a custom [request handler](../../applications/request-handlers.html) and is meant to be a simple example of creating a front end / middleware that sits in front of the Vespa back end.
As such it is fairly independent of Vespa features, and the code is designed to be fairly easy to follow and as non-magical as possible. All the queries against Vespa are sent as HTTP requests, and the JSON results from Vespa are parsed and rendered. This sample application is built around the Amazon product data set found at[https://cseweb.ucsd.edu/~jmcauley/datasets.html](https://cseweb.ucsd.edu/~jmcauley/datasets.html). A small sample of this data is included in the sample application, and full data sets are available from the above site. This sample application contains scripts to convert from the data set format to Vespa format:[convert\_meta.py](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/convert_meta.py) and[convert\_reviews.py](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/convert_reviews.py). See [README](https://github.com/vespa-engine/sample-apps/tree/master/use-case-shopping#readme) for example use. When feeding reviews, there is a custom [document processor](../../applications/document-processors.html)that intercepts document writes and updates the parent item with the review rating, so the aggregated review rating is kept stored with the item - see [ReviewProcessor](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/src/main/java/ai/vespa/example/shopping/ReviewProcessor.java). This is more an example of a custom document processor than a recommended way to do this, as feeding the reviews more than once will result in inflated values. To do this correctly, one should probably calculate this offline so a re-feed does not cause unexpected results. ### Highlighted features - [Multiple document types](../../basics/schemas.html) - [Custom document processor](../../applications/document-processors.html) - [Custom searcher processor](../../applications/searchers.html) - [Custom handlers](../../applications/request-handlers.html) - [Custom configuration](../../applications/configuring-components.html) - [Partial update](../../reference/schemas/document-json-format.html#update) - [Search using YQL](../../querying/query-language.html) - [Grouping](../../querying/grouping.html) - [Rank profiles](../../basics/ranking.html) - [Native embedders](../../rag/embedding.html) - [Vector search](../../querying/nearest-neighbor-search) - [Ranking functions](../../reference/schemas/schemas.html#function-rank) Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/content/elasticity.html.md # Content cluster elasticity Vespa clusters can be grown and shrunk while serving queries and writes. Documents in content clusters are automatically redistributed on changes to maintain an even distribution with minimal data movement. To resize, just change the [nodes](../reference/applications/services/services.html#nodes) and redeploy the application - no restarts needed. ![A cluster growing in two dimensions](/assets/img/elastic-grow.svg) Documents are managed by Vespa in chunks called [buckets](#buckets). The size and number of buckets are completely managed by Vespa and there is never any need to manually control sharding. The elasticity mechanism is also used to recover from a node loss: New replicas of documents are created automatically on other nodes to maintain the configured redundancy. Failed nodes is therefore not a problem that requires immediate attention - clusters will self-heal from node failures as long as there are sufficient resources. 
![A cluster with a node failure](/assets/img/elastic-fail.svg)

When you want to remove nodes from a content cluster, you can have the system migrate data off them in an orderly fashion prior to removal. This is done by marking nodes as _retired_. This is useful both when decommissioning nodes and when migrating a cluster to entirely new nodes while online: Add the new nodes, mark the old nodes retired, wait for the data to be redistributed, and remove the old nodes.

The auto-elasticity is configured for normal fail-safe operation, but there are tradeoffs like recovery speed and resource usage. Learn more in [procedures](../operations/self-managed/admin-procedures.html#content-cluster-configuration).

## Adding nodes

To add or remove nodes from a content cluster, just change the `nodes` tag of the [content](../reference/applications/services/content.html) cluster in [services.xml](../reference/applications/services/services.html) and [redeploy](../basics/applications.html#deploying-applications). Read more in [procedures](../operations/self-managed/admin-procedures.html).

When adding a new node, a new _ideal state_ is calculated for all buckets. The buckets mapped to the new node are moved, and the superfluous replicas are removed. See the redistribution example - add a new node to the system, with redundancy n=2:

![Bucket migration as a node is added to the cluster](/assets/img/add-node-move-buckets.svg)

The distribution algorithm generates a random node sequence for each bucket. In this example with n=2, replicas map to the two nodes sorted first. The illustration shows how placement onto two nodes changes as a third node is added. The new node takes over as primary for the buckets where it got sorted first, and as secondary for the buckets where it got sorted second. This ensures minimal data movement when nodes come and go, and allows capacity to be changed easily. No buckets are moved between the existing nodes when a new node is added. Based on the pseudo-random sequences, some buckets change from primary to secondary, or are removed. Multiple nodes can be added in the same deployment.

## Removing nodes

Whether a node fails or is _retired_, the same redistribution happens. If the node is retired, replicas are generated on the other nodes and the node stays up, but with no active replicas. Example of redistribution after node failure, n=2:

![Bucket migration as a node is removed from the cluster](/assets/img/lose-node-move-buckets.svg)

Here, node 2 fails. This node held the active replicas of buckets 2 and 6. Once the node fails, the secondary replicas are set active. If they were already in a _ready_ state, they start serving queries immediately, otherwise they will index the replicas, see [searchable-copies](../reference/applications/services/content.html#searchable-copies). All buckets that no longer have secondary replicas are merged to the remaining nodes according to the ideal state.

## Grouped distribution

Nodes in content clusters can be placed in [groups](../reference/applications/services/content.html#group). A group of nodes in a content cluster will have one or more complete replicas of the entire document corpus.

![A cluster changes from using one to many groups](/assets/img/query-groups.svg)

This is useful in the cases listed below: | Cluster upgrade | With multiple groups it becomes safe to take out a full group for upgrade instead of just one node at a time. [Read more](../operations/self-managed/live-upgrade.html).
| | Query throughput | Applications with high query rates and/or high static query cost can use groups to scale to higher query rates since Vespa will automatically send a query to just a single group. [Read more](../performance/sizing-search.html). | | Topology | By using groups you can control replica placement over network switches or racks to ensure there is redundancy at the switch and rack level. | Tuning group sizes and node resources enables applications to easily find the latency/cost sweet spot, the elasticity operations are automatic and queries and writes work as usual with no downtime. ## Changing topology A Vespa elasticity feature is the ability to change topology (i.e. grouped distribution) without service disruption. This is a live change, and will auto-redistribute documents to the new topology. Also read [topology change](../operations/self-managed/admin-procedures.html#topology-change) if running Vespa self-hosted - the below steps are general for all hosting options. ### Replicas When changing topology, pay attention to the [min-redundancy](../reference/applications/services/content.html#min-redundancy) setting - this setting configures a _minimum_ number of replicas in a cluster, the _actual_ number is topology dependent - example: A flat cluster with min-redundancy n=2 and 15 nodes is changed into a grouped cluster with 3 groups with 5 nodes each (total node count and n is kept unchanged). In this case, the actual redundancy will be 3 after the change, as each of the 3 groups will have at least 1 replica for full query coverage. The practical consequence is that disk and memory requirements per node _increases_ due to the change to topology. It is therefore important to calculate the actual replica count before reconfiguring topology. ### Query coverage Changing topology might cause query coverage loss in the transition, unless steps taken in the right order. If full coverage is not important, just make the change and wait for document redistribution to complete. To keep full query coverage, make sure not to change both group size and number of groups at the same time: 1. To add nodes for more data, or to have less data per node, increase group size. E.g., in a 2-group cluster with 8 nodes per group, add 4 nodes for a 25% capacity increase with 10 nodes per group. 2. If the goal is to add query capacity, add one or more groups, with the same node count as existing group(s). A flat cluster is the same as one group - if the flat cluster has 8 nodes, change to a grouped cluster with 2 groups of 8 nodes per group. This will add an empty group, which is put in query serving once populated. In short, if the end-state means both changing number of groups and node count per group, do this as separate steps, as a combination of the above. Between each step, wait for document redistribution to complete using the `merge_bucket.pending` metric - see [example](../writing/initial-batch-feed.html). ## Buckets To manage documents, Vespa groups them in _buckets_, using hashing or hints in the [document id](../schemas/documents.html). A document Put or Update is sent to all replicas of the bucket with the document. If bucket replicas are out of sync, a bucket merge operation is run to re-sync the bucket. A bucket contains [tombstones](../operations/self-managed/admin-procedures.html#data-retention-vs-size) of recently removed documents. Buckets are split when they grow too large, and joined when they shrink. 
This is a key feature for high performance in small to large instances, and eliminates the need for downtime or manual operations when scaling. Buckets are purely a content management concept, and data is not stored or indexed in separate buckets, nor do queries relate to buckets in any way. Read more in [buckets](buckets.html).

## Ideal state distribution algorithm

The [ideal state distribution algorithm](idealstate.html) uses a variant of the [CRUSH algorithm](https://ceph.io/assets/pdfs/weil-crush-sc06.pdf) to decide bucket placement. It moves a minimal number of documents when nodes are added or removed. Central to the algorithm is the assignment of a node sequence to each bucket:

![Assignment of a node sequence to each bucket](/assets/img/bucket-node-sequence.svg)

Steps to assign a bucket to a set of nodes:

1. Seed a random generator with the bucket ID to generate a pseudo-random sequence of numbers. Using the bucket ID as seed will then always generate the same sequence for the bucket.
2. Order the nodes by [distribution-key](../reference/applications/services/content.html#node) and assign the random numbers in that order. E.g. the node with distribution-key 0 gets the first random number, node 1 the second.
3. Sort the node list by the random number.
4. Select nodes in descending random number order - above, nodes 1, 3 and 0 will store bucket 0x3c000000000000a0 with n=3 (redundancy). For n=2, nodes 1 and 3 will store the bucket.

This specification of where to place a bucket is called the bucket's _ideal state_. Repeat this for all buckets in the system.

## Consistency

Consistency is maintained at bucket level. Content nodes calculate local checksums based on the bucket contents, and the distributors compare checksums across the bucket replicas. A _bucket merge_ is issued to resolve inconsistency when detected. While there are inconsistent bucket replicas, operations are routed to the "best" replica.

As buckets are split and joined, it is possible for replicas of a bucket to be split at different levels. A node may have been down while its buckets have been split or joined. This is called _inconsistent bucket splitting_. Bucket checksums cannot be compared across buckets with different split levels. Consequently, content nodes do not know whether all documents exist in enough replicas in this state. Due to this, inconsistent splitting is one of the highest maintenance priorities. After all buckets are split or joined back to the same level, the content nodes can verify that all the replicas are consistent and fix any detected issues with a merge. [Read more](consistency).

## Further reading

- [content nodes](content-nodes.html)
- [proton](proton.html) - see _ready_ state

Copyright © 2026 - [Cookie Preferences](#)

### On this page:

- [Adding nodes](#adding-nodes)
- [Removing nodes](#removing-nodes)
- [Grouped distribution](#grouped-distribution)
- [Changing topology](#changing-topology)
- [Replicas](#replicas)
- [Query coverage](#query-coverage)
- [Buckets](#buckets)
- [Ideal state distribution algorithm](#ideal-state-distribution-algorithm)
- [Consistency](#consistency)
- [Further reading](#further-reading)

---

# Source: https://docs.vespa.ai/en/rag/embedding.html.md
# Source: https://docs.vespa.ai/en/reference/rag/embedding.html.md

# Embedding Reference

Reference configuration for [embedders](../../rag/embedding.html).
## Model config reference Embedder models use the [model](../applications/config-files.html#model) type configuration which accepts the attributes `model-id`, `url` or `path`. Multiple of these can be specified as a single config value, where one is used depending on the deployment environment: - If a `model-id` is specified and the application is deployed on Vespa Cloud, the `model-id` is used. - Otherwise, if a `url` is specified, it is used - Otherwise, `path` is used. When using `path`, the model files must be supplied in the application package. ## Huggingface Embedder An embedder using any [Huggingface tokenizer](https://huggingface.co/docs/tokenizers/index), including multilingual tokenizers, to produce tokens which is then input to a supplied transformer model in ONNX model format. The Huggingface embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag: ``` ``` query: passage: ... ``` ``` ### Private Model Hub You may also use models hosted in a[private Huggingface model hub](https://huggingface.co/docs/hub/en/repositories-settings#private-repositories). Retrieve an API key from Huggingface with the appropriate permissions, and add it to the [vespa secret store.](../../security/secret-store)Add the secret to the container `` and refer to it in your Huggingface model configuration: ``` ``` ``` ``` ### Huggingface embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A | | max-tokens | One | The maximum number of tokens accepted by the transformer model | numeric | 512 | | transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids | | transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask | | transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids` use `` | string | token\_type\_ids | | transformer-output | One | The name or identifier for the transformer output | string | last\_hidden\_state | | pooling-strategy | One | How the output vectors of the ONNX model is pooled to obtain a single vector representation. Valid values are `mean`,`cls` and `none` | string | mean | | normalize | One | A boolean indicating whether to normalize the output embedding vector to unit length (length 1). Useful for `prenormalized-angular`[distance-metric](../schemas/schemas.html#distance-metric) | boolean | false | | prepend | Optional | Prepend instructions that are prepended to the text input before tokenization and inference. Useful for models that have been trained with specific prompt instructions. The instructions are prepended to the input text. - Element \ - Optional query prepend instruction. - Element \ - Optional document prepend instruction. ``` ``` query: passage: ``` ``` | Optional \ \ elements. 
| | ## Bert embedder The Bert embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag: ``` ``` ``` ``` ### Bert embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-vocab | One | Use to point to the Huggingface `vocab.txt` tokenizer file with valid wordpiece tokens. Does not support `tokenizer.json` format. | [model-type](#model-config-reference) | N/A | | max-tokens | One | The maximum number of tokens allowed in the input | integer | 384 | | transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids | | transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask | | transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids` use `` | string | token\_type\_ids | | transformer-output | One | The name or identifier for the transformer output | string | output\_0 | | transformer-start-sequence-token | One | The start of sequence token | numeric | 101 | | transformer-end-sequence-token | One | The start of sequence token | numeric | 102 | | pooling-strategy | One | How the output vectors of the ONNX model is pooled to obtain a single vector representation. Valid values are `mean` and `cls` | string | mean | ## colbert embedder The colbert embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag: ``` ``` 32 256 ``` ``` The Vespa colbert implementation works with default configurations for transformer models that use WordPiece tokenization. ### colbert embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ColBERT ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A | | max-tokens | One | Max length of token sequence the transformer-model can handle | numeric | 512 | | max-query-tokens | One | The maximum number of ColBERT query token embeddings. Queries are padded to this length. Must be lower than max-tokens | numeric | 32 | | max-document-tokens | One | The maximum number of ColBERT document token embeddings. Documents are not padded. 
Must be lower than max-tokens | numeric | 512 | | transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids | | transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask | | transformer-mask-token | One | The mask token id used for ColBERT query padding | numeric | 103 | | transformer-start-sequence-token | One | The start of sequence token id | numeric | 101 | | transformer-end-sequence-token | One | The end of sequence token id | numeric | 102 | | transformer-pad-token | One | The pad sequence token id | numeric | 0 | | query-token-id | One | The colbert query token marker id | numeric | 1 | | document-token-id | One | The colbert document token marker id | numeric | 2 | | transformer-output | One | The name or identifier for the transformer output | string | contextual | The Vespa colbert-embedder uses `[unused0]`token id 1 for `query-token-id`, and `[unused1]`, token id 2 for ` document-token-id`document marker. Document punctuation chars are filtered (not configurable). The following characters are removed `!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~`. ### splade embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A | | term-score-threshold | One | An optional threshold to increase sparseness, tokens/terms with a score lower than this is not retained. | numeric | N/A | | max-tokens | One | The maximum number of tokens accepted by the transformer model | numeric | 512 | | transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids | | transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask | | transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids` use `` | string | token\_type\_ids | | transformer-output | One | The name or identifier for the transformer output | string | logits | ## Huggingface tokenizer embedder The Huggingface tokenizer embedder is configured in [services.xml](../applications/services/services.html), within the `container` tag: ``` ``` ``` ``` ### Huggingface tokenizer reference config | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | model | One To Many | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file. Also supports `language`, which is only relevant if one wants to tokenize differently based on the document language. Use "unknown" for a model to be used for any language (i.e. by default). | [model-type](#model-config-reference) | N/A | ## Embedder ONNX reference config Vespa uses [ONNX Runtime](https://onnxruntime.ai/) to accelerate inference of embedding models. These parameters are valid for both [Bert embedder](#bert-embedder) and [Huggingface embedder](#huggingface-embedder). | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | onnx-execution-mode | One | Low level ONNX execution model. Valid values are `parallel` or `sequential`. Only relevant for inference on CPU. 
See [ONNX runtime documentation](https://onnxruntime.ai/docs/performance/tune-performance/threading.html) on threading. | string | sequential | | onnx-interop-threads | One | Low level ONNX setting.Only relevant for inference on CPU. | numeric | 1 | | onnx-intraop-threads | One | Low level ONNX setting. Only relevant for inference on CPU. | numeric | 4 | | onnx-gpu-device | One | The GPU device to run the model on. See [configuring GPU for Vespa container image](/en/operations/self-managed/vespa-gpu-container.html). Use `-1` to not use GPU for the model, even if the instance has available GPUs. | numeric | 0 | ## SentencePiece embedder A native Java implementation of [SentencePiece](https://github.com/google/sentencepiece). SentencePiece breaks text into chunks independent of spaces, which is robust to misspellings and works with CJK languages. Prefer the [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) over this for better compatibility with Huggingface models. This is suitable to use in conjunction with [custom components](../../applications/components.html), or the resulting tensor can be used in [ranking](../../basics/ranking.html). To use the [SentencePiece embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/sentencepiece/SentencePieceEmbedder.java), add it to [services.xml](../applications/services/services.html): ``` ``` ; unknown model/en.wiki.bpe.vs10000.model ``` ``` See the options available for configuring SentencePiece in [the full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.sentencepiece.sentence-piece.def). ## WordPiece embedder A native Java implementation of [WordPiece](https://github.com/google-research/bert#tokenization), which is commonly used with BERT models. Prefer the [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) over this for better compatibility with Huggingface models. This is suitable to use in conjunction with [custom components](../../applications/components.html), or the resulting tensor can be used in [ranking](../../basics/ranking.html). To use the [WordPiece embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/wordpiece/WordPieceEmbedder.java), add it to [services.xml](../applications/services/services.html) within the `container` tag: ``` ``` class="com.yahoo.language.wordpiece.WordPieceEmbedder" bundle="linguistics-components"> unknown models/bert-base-uncased-vocab.txt ``` ``` See the options available for configuring WordPiece in [the full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.wordpiece.word-piece.def). WordPiece is suitable to use in conjunction with custom components, or the resulting tensor can be used in [ranking](../../basics/ranking.html). 
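Spelled out, a WordPiece component declaration can look like the following - a sketch; the class, bundle, language value and vocabulary path match the example above, while the exact nesting of the config elements should be verified against the [full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.wordpiece.word-piece.def):

```
<container id="default" version="1.0">
    <!-- Sketch: WordPiece embedder with a single vocabulary used for any language -->
    <component id="wordpiece"
               class="com.yahoo.language.wordpiece.WordPieceEmbedder"
               bundle="linguistics-components">
        <config name="language.wordpiece.word-piece">
            <model>
                <item>
                    <language>unknown</language>
                    <path>models/bert-base-uncased-vocab.txt</path>
                </item>
            </model>
        </config>
    </component>
    ...
</container>
```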
## Using an embedder from Java When writing custom Java components (such as [Searchers](../../applications/searchers.html) or [Document processors](../../applications/document-processors.html#document-processors)), use embedders you have configured by [having them injected in the constructor](../../applications/dependency-injection.html), just as any other component: ``` ``` class MyComponent { @Inject public MyComponent(ComponentRegistry embedders) { // embedders contains all the embedders configured in your services.xml } } ``` ``` See a concrete example of using an embedder in a custom searcher in[LLMSearcher](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/java/ai/vespa/cloud/docsearch/LLMSearcher.java). ## Custom Embedders Vespa provides a Java interface for defining components which can provide embeddings of text:[com.yahoo.language.process.Embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/process/Embedder.java). To define a custom embedder in an application and make it usable by Vespa (see [embedding a query text](../../rag/embedding.html#embedding-a-query-text)), implement this interface and add it as a [component](../../applications/developer-guide.html#developing-components) to [services.xml](../applications/services/container.html): ``` ``` foo ``` ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Model config reference](#model-config-reference) - [Huggingface Embedder](#huggingface-embedder) - [Private Model Hub](#private-model-hub) - [Huggingface embedder reference config](#huggingface-embedder-reference-config) - [Bert embedder](#bert-embedder) - [Bert embedder reference config](#bert-embedder-reference-config) - [colbert embedder](#colbert-embedder) - [colbert embedder reference config](#colbert-embedder-reference-config) - [splade embedder reference config](#splade-embedder-reference-config) - [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) - [Huggingface tokenizer reference config](#huggingface-tokenizer-reference-config) - [Embedder ONNX reference config](#embedder-onnx-reference-config) - [SentencePiece embedder](#sentencepiece-embedder) - [WordPiece embedder](#wordpiece-embedder) - [Using an embedder from Java](#using-an-embedder-from-java) - [Custom Embedders](#custom-embedders) --- # Source: https://docs.vespa.ai/en/operations/enclave/enclave.html.md # Vespa Cloud Enclave ![enclave architecture](/assets/img/enclave-architecture.png) Vespa Cloud Enclave allows Vespa Cloud applications to run inside the tenant's own cloud accounts while everything is still fully managed by Vespa Cloud's automation, giving the tenant full access to Vespa Cloud features inside their own cloud account. This allows tenant data to always remain within the bounds of services controlled by the tenant, and also to build closer integrations with Vespa applications inside the cloud services. Vespa Cloud Enclave is available in AWS, Azure, and GCP. **Note:** As the Vespa Cloud Enclave resources run in _your_ account, this incurs resource costs from your cloud provider in _addition_ to the Vespa Cloud costs. 
## AWS - [Getting started](aws-getting-started.html) - [Architecture and security](aws-architecture) ## Azure - [Getting started](azure-getting-started.html) - [Architecture and security](azure-architecture) ## GCP - [Getting started](gcp-getting-started.html) - [Architecture and security](gcp-architecture) ## Guides - [Log archive](archive) - [Operations and Support](operations) ## FAQ **Which kind of permission is needed for the Vespa control plane to access my AWS accounts / Azure subscriptions / GCP projects?**The permissions required are coded into the Terraform modules found at: - [terraform-aws](https://github.com/vespa-cloud/terraform-aws-enclave/tree/main) - [terraform-azure](https://github.com/vespa-cloud/terraform-azure-enclave/tree/main) - [terraform-google](https://github.com/vespa-cloud/terraform-google-enclave/tree/main) Navigate to the _modules_ directory for details. **How can I configure agents/daemons on Vespa hosts securely?**Use terraform to grant Vespa hosts access to necessary secrets, and create an RPM that retrieves them and configures your application. See [enclave-examples](https://github.com/vespa-cloud/enclave-examples/tree/main/systemd-secrets)for a complete example. **Deployment failure: Could not provision …**This happens if you deploy to new zones _before_ running the Terraform/CloudFormation templates: ``` Deployment failed: Invalid application: In container cluster 'mycluster': Could not provision load balancer mytenant:myapp:myinstance:mycluster: Expected to find exactly 1 resource, but got 0 for subnet with service 'tenantelb' ``` **Do we need to take any actions when AWS sends us Amazon EC2 Instance Retirement, Amazon EC2 Instance Availability Issue, or Amazon EC2 Maintenance notifications,?** Vespa Cloud will take proactive actions on maintenance operations and replace instances that are scheduled for maintenance tasks ahead of time to reduce any impact the maintenance may incur. All EC2 instance failures are detected by our control plane, and the problematic instances are automatically replaced. The system will, as part of the replacement process, also ensure that the document distribution is kept in line with your application configuration. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [AWS](#aws) - [Azure](#azure) - [GCP](#gcp) - [Guides](#guides) - [FAQ](#faq) --- # Source: https://docs.vespa.ai/en/operations/endpoint-routing.html.md # Routing and endpoints Vespa Cloud supports multiple methods of routing requests to an application. This guide describes how these routing methods work, failover, and how to configure them. By default, each deployment of a Vespa Cloud application will have a zone endpoint. In addition to the default zone endpoint, one can configure global endpoints. All endpoints for an application are available under the _endpoints_ tab of each deployment in the console. ## Endpoint format Vespa Cloud endpoints are on the format: `{random}.{random}.{scope}.vespa-app.cloud`. ## Endpoint scopes ### Zone endpoint This is the default endpoint for a deployment. Requests through a zone endpoint are sent directly to the zone. Zone endpoints are created implicitly, one per container cluster declared in [services.xml](/en/reference/applications/services/container.html). Zone endpoints are not configurable. Zone endpoints have the suffix `z.vespa-app.cloud` ### Global endpoint A global endpoint is an endpoint that can route requests to multiple zones. 
It can be configured in [deployment.xml](/en/reference/applications/deployment.html#endpoints-global). Similar to how a [CDN](https://en.wikipedia.org/wiki/Content_delivery_network) works, requests through this endpoint will be routed to the nearest zone based on geo proximity, i.e. the zone that is nearest to the client. Global endpoints have the suffix `g.vespa-app.cloud`.

**Important:** Global endpoints do not support feeding. Feeding must be done through zone endpoints.

## Routing control

Vespa Cloud has two mechanisms for manually controlling routing of requests to a zone:

- Removing the `<region>` element from the relevant `<endpoint>` elements in [deployment.xml](../reference/applications/deployment.html) and deploying a new version of your application.
- Changing the status through the console.

This section describes the latter mechanism. Navigate to the relevant deployment of your application in the console. Hovering over the _GLOBAL ROUTING_ badge will display the current status and when it was last changed.

### Change status

In case of a production emergency, a zone can be manually set out to prevent it from receiving requests:

1. Hover over the _GLOBAL ROUTING_ badge for the problematic deployment and click _Deactivate_.
2. Inspection of the status will now show the status set to _OUT_.

To set the zone back in and have it continue receiving requests: Hover over the _GLOBAL ROUTING_ badge again and click _Activate_.

### Behaviour

Changing the routing status is independent of the endpoint scope used. You're technically overriding the routing status the deployment reports to the Vespa Cloud routing infrastructure. This means that a change to routing status affects both _zonal endpoints_ and _global endpoints_. Deactivating a deployment disables routing of requests to that deployment through global endpoints until the deployment is activated again. As routing through these endpoints is DNS-based, it may take between 5 and 15 minutes for all traffic to shift to other deployments.

If all deployments of an endpoint are deactivated, requests are distributed as if all deployments were active. This is because attempting to route traffic according to the original configuration is preferable to discarding all requests.

## AWS clients

While Vespa Cloud is hosted in AWS, clients that talk to Vespa Cloud from AWS nodes will be treated as any other client from the Internet. This means clients in AWS will generate regular Internet egress traffic even though they are talking to a service in AWS in the same zone.

Copyright © 2026 - [Cookie Preferences](#)

### On this page:

- [Endpoint format](#)
- [Endpoint scopes](#)
- [Zone endpoint](#zone-endpoint)
- [Global endpoint](#global-endpoint)
- [Routing control](#routing-control)
- [Change status](#)
- [Behaviour](#)
- [AWS clients](#aws-clients)

---

# Source: https://docs.vespa.ai/en/operations/environments.html.md

# Environments

Vespa Cloud has two kinds of environments:

- Manual environment for rapid development and test: `dev`
- Automated environment with integrated CD pipeline: `prod`

An application is deployed to one or more _zones_ (see [zone list](zones.html)), which is a combination of an _environment_ and a _region_, like `vespa deploy -z dev.aws-us-east-1c`.

## Dev

The dev environment is built for rapid development cycles, with auto-downscaling and auto-expiry for ease of use and cost control. The dev environment is the default; to deploy to it, use `vespa deploy`.
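For illustration, deploying an application package to a specific dev zone with the Vespa CLI can look like this - a sketch; the zone name is the example used above:

```
# Deploy to a dev zone (dev is the default environment for vespa deploy)
$ vespa deploy -z dev.aws-us-east-1c
```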
### Auto downscaling One use case for the dev environment is to take an application package from a prod environment and deploy to the dev environment to debug. To minimize cost and make this speedy, Vespa Cloud will by default ignore [nodes](../reference/applications/services/services.html#nodes) and [resources](../reference/applications/services/services.html#resources) settings. With this, you can safely download an application package from prod (that are normally large) and deploy to dev, with no changes. To override this behavior and control the resources, specify them explicitly for the dev environment as described in [deployment variants](deployment-variants.html#services.xml-variants). Example: ``` ``` > ``` ``` **Important:** The `dev` environment has redundancy 1 by default, and there are no availability or data persistence guarantees. Do not use applications deployed to these zones for production serving use cases. ### Auto expiry Deployments to `dev` expire after 14 days of inactivity, that is, 14 days after the last [deployment](../basics/applications.html#deploying-applications). **This applies to all plans**. To add 7 more days to the expiry period, redeploy the application or use the Vespa Cloud Console. ### Vespa version The latest active Vespa version is used when deploying to the dev environment. The deployment is upgraded at a time which is most likely at night for the developer in order to minimize downtime (based on the time when last deployments were made). An upgrade will be skipped if metrics indicate ongoing feed or query load, but will still be done if current version is more than a week old. ## Prod Applications are deployed to the `prod` environment for production serving. Deployments are passed through an integrated CD pipeline for system tests and staging tests. Read more in [automated deployments](automated-deployments.html). ## Test The `test` environment is used by the integrated CD pipeline for prod deployments, to run [system tests](automated-deployments.html#system-tests). The test capacity is ephemeral and only used during test. Nodes in test and staging environments do not have access to data in prod environments. Note that one cannot deploy directly to test and staging environments. For long-lived test applications (e.g., a QA system that is integrated with other services) use the prod environment. System tests are always invoked, even if there are no tests defined. In this case, an instance is just started and then stopped. This has value in itself, as it ensures that the application is able to start. Test runs can be [aborted](automated-deployments.html#disabling-tests). ## Staging See system tests above, this applies to the staging, too. [Staging tests](automated-deployments.html#staging-tests) use a fraction of the configured prod capacity, this can be overridden to using 1 node regardless of prod cluster size: ``` ``` ``` ``` ## Reference Environment settings: | Name | Description | Expiry | Cluster sizes | | --- | --- | --- | --- | | `dev` | Used for manual development testing. | 14 days | `1` | | `test` | Used for [automated system tests](../applications/testing.html#system-tests). | - | `1` | | `staging` | Used for [automated staging tests](../applications/testing.html#staging-tests). | - | `min(max(2, 0.05 * spec), spec)` | | `prod` | Hosts all production deployments. 
| No expiry | `max(2, spec)` | Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Dev](#dev) - [Auto downscaling](#auto-downscaling) - [Auto expiry](#auto-expiry) - [Vespa version](#vespa-version) - [Prod](#prod) - [Test](#test) - [Staging](#staging) - [Reference](#reference) --- # Source: https://docs.vespa.ai/en/schemas/exposing-schema-information.html.md # Exposing schema information Some applications need to expose information about schemas to data plane clients. This document explains how to add an API for that to your application. You need to know two things: - Your application can expose any custom API by implementing a [handler](../applications/request-handlers.html). - Information about the deployed schemas are available in the component _com.yahoo.search.schema.SchemaInfo_. With this information, we can add an API exposing schemas information through the following steps. ## 1. Make sure your application package can contain Java components Application packages containing Java components must follow Maven layout. If your application package root contains a `pom.xml` and `src/main`you're good, otherwise convert it to this layout by copying the pom.xml from[the album-recommendation.java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java)sample app and moving the files to follow this layout before moving on. ## 2. Add a handler exposing schema info Add the following handler (to a package of your choosing): ``` ``` package ai.vespa.example; import com.yahoo.container.jdisc.HttpRequest; import com.yahoo.container.jdisc.HttpResponse; import com.yahoo.container.jdisc.ThreadedHttpRequestHandler; import com.yahoo.jdisc.Metric; import com.yahoo.search.schema.SchemaInfo; import java.io.IOException; import java.io.OutputStream; import java.nio.charset.Charset; import java.util.concurrent.Executor; public class SchemaInfoHandler extends ThreadedHttpRequestHandler { private final SchemaInfo schemaInfo; public SchemaInfoHandler(Executor executor, Metric metric, SchemaInfo schemaInfo) { super(executor, metric); this.schemaInfo = schemaInfo; } @Override public HttpResponse handle(HttpRequest httpRequest) { // Creating JSON, handling different paths etc. left as an exercise for the reader StringBuilder response = new StringBuilder(); for (var schema : schemaInfo.schemas().values()) { response.append("schema: " + schema.name() + "\n"); for (var field : schema.fields().values()) response.append(" field: " + field.name() + "\n"); } return new Response(200, response.toString()); } private static class Response extends HttpResponse { private final byte[] data; Response(int code, byte[] data) { super(code); this.data = data; } Response(int code, String data) { this(code, data.getBytes(Charset.forName(DEFAULT_CHARACTER_ENCODING))); } @Override public String getContentType() { return "application/json"; } @Override public void render(OutputStream outputStream) throws IOException { outputStream.write(data); } } private static class ErrorResponse extends Response { ErrorResponse(int code, String message) { super(code, "{\"error\":\"" + message + "\"}"); } } } ``` ``` ## 3. Add the new API handler to your container cluster In your `services.xml` file, under ``, add: ``` ``` http://*/schema/v1/* ``` ``` ## 4. Deploy the modified application ``` $ mvn install $ vespa deploy ``` ## 5. Verify that it works ``` $ vespa curl "schema/v1/" ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [1. 
Make sure your application package can contain Java components](#) - [2. Add a handler exposing schema info](#) - [3. Add the new API handler to your container cluster](#) - [4. Deploy the modified application](#) - [5. Verify that it works](#)

---

# Source: https://docs.vespa.ai/en/rag/external-llms.html.md

# External LLMs in Vespa

Please refer to [Large Language Models in Vespa](llms-in-vespa.html) for an introduction to using LLMs in Vespa.

Vespa provides a client for integration with OpenAI-compatible APIs. This includes, but is not limited to, [OpenAI](https://platform.openai.com/docs/overview), [Google Gemini](https://ai.google.dev/), [Anthropic](https://www.anthropic.com/api), [Cohere](https://docs.cohere.com/docs/compatibility-api) and [Together.ai](https://docs.together.ai/docs/openai-api-compatibility). You can also host your own OpenAI-compatible server using for example [VLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#quickstart-online) or [llama-cpp-server](https://llama-cpp-python.readthedocs.io/en/latest/server/).

**Note:** This is currently a Beta feature, so changes can be expected.

### Configuring the OpenAI client

To set up a connection to an LLM service such as OpenAI's ChatGPT, you need to define a component in your application's [services.xml](../reference/applications/services/services.html):

```
<services version="1.0">
    <container id="default" version="1.0">
        ...
        <component id="openai" class="ai.vespa.llm.clients.OpenAI" />
        ...
    </container>
</services>
```

To see the full list of available configuration parameters, refer to the [llm-client config definition file](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/resources/configdefinitions/llm-client.def). This sets up a client component that can be used in a [searcher](../learn/glossary.html#searcher) or a [document processor](../learn/glossary.html#document-processor).

### API key configuration

Vespa provides several options to configure the API key used by the client.

1. Using the [Vespa Cloud secret store](../security/secret-store) to store the API key. This is done by setting the `apiKeySecretRef` configuration parameter to the name of the secret in the secret store. This is the recommended way for Vespa Cloud users.
2. Providing the API key in the `X-LLM-API-KEY` HTTP header of the Vespa query.
3. It is also possible to configure the API key in a custom component. For example, [this](https://github.com/vespa-engine/system-test/tree/master/tests/docproc/generate_field_openai) system-test shows how to retrieve the API key from a local file deployed with your Vespa application. Please note that this is NOT recommended for production use, as it is less secure than using the secret store, but it can be modified to suit your needs.

You can set up multiple connections with different settings. For instance, you might want to run different LLMs for different tasks. To distinguish between the connections, modify the `id` attribute in the component specification. We will see below how this is used to control which LLM is used for which task.

As a reminder, Vespa also has the option of running custom LLMs locally. Please refer to [running LLMs in your application](local-llms.html) for more information.

### Inference parameters

Please refer to the general discussion in [LLM parameters](llms-in-vespa.html#llm-parameters) for setting inference parameters.
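For illustration, the API key can be passed per query in the header mentioned above - a sketch; the endpoint, the `searchChain` name and the `llm.` property prefix are assumptions that depend on how your application is configured:

```
# Sketch: query a search chain that uses the OpenAI client, passing the API key
# in the X-LLM-API-KEY header and overriding one inference parameter.
$ curl --header "X-LLM-API-KEY: $OPENAI_API_KEY" \
    "https://<endpoint>/search/?query=what+is+vespa&searchChain=openai&llm.maxTokens=128"
```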
The OpenAI-client also has the following inference parameters that can be sent along with the query: | Parameter (Vespa) | Parameter (OpenAI) | Description | | --- | --- | --- | | `maxTokens` | `max_completion_tokens` | Maximum number of tokens that can be generated in the chat completion. | | `temperature` | `temperature` | Number between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. | | `topP` | `top_p` | An alternative to temperature sampling. Model considers tokens with top\_p probability mass (0-1). Value of 0.1 means only tokens comprising top 10% probability are considered. | | `seed` | `seed` | If specified, the system will attempt to sample deterministically, so repeated requests with the same seed should return similar results. Determinism is not guaranteed. | | `npredict` | `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all choices. | | `frequencypenalty` | `frequency_penalty` | Number between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the text so far, decreasing the likelihood of repetition. Negative values encourage repetition. | | `presencepenalty` | `presence_penalty` | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Negative values encourage repeating content from the prompt. | Any parameter sent with the query will override configuration specified for the client component in `services.xml`. Note that if you are not using OpenAI's API, the parameters may be handled differently than the descriptions above. ### Connecting to other OpenAI-compatible providers By default, this particular client connects to the OpenAI service, but can be used against any[OpenAI chat completion compatible API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api)by changing the `endpoint` configuration parameter. ### FAQ - **Q: How do I know if my LLM is compatible with the OpenAI client?** - A: The OpenAI client is compatible with any LLM that implements the OpenAI chat completion API. You can check the documentation of your LLM provider to see if they support this API. - **Q: Can I use the [Responses](https://platform.openai.com/docs/api-reference/responses/create) provided by OpenAI** - A: No, currently only the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) is supported. - **Q: Can I use the OpenAI client for reranking?** - A: Yes, but currently, you need to implement a [custom searcher](../applications/searchers.html) that uses the OpenAI client to rerank the results. - **Q: Can I use the OpenAI client for retrieving embeddings?** - A: No, currently, only the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat) is supported. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Configuring the OpenAI client](#configuring-the-openai-client) - [API key configuration](#api-key-configuration) - [Inference parameters](#inference-parameters) - [Connecting to other OpenAI-compatible providers](#connecting-to-other-openai-compatible-providers) - [FAQ](#faq) --- # Source: https://docs.vespa.ai/en/learn/faq.html.md # FAQ - frequently asked questions Refer to [Vespa Support](https://vespa.ai/support/) for more support options. 
## Ranking

### Does Vespa support a flexible ranking score?

[Ranking](../basics/ranking.html) is maybe the primary Vespa feature - we like to think of it as scalable, online computation. A rank profile is where the application's ranking logic is implemented, supporting simple types like `double` and complex types like `tensor`. Supply ranking data in queries as query features (e.g. different weights per customer), or look it up in a [Searcher](../applications/searchers.html). Typically, a document (e.g. product) "feature vector"/"weights" will be compared to a user-specific vector (tensor).

### Where would customer specific weightings be stored?

Vespa doesn't have specific support for storing customer data as such. You can store this data as a separate document type in Vespa and look it up before passing the query, or store this customer meta-data as part of the other meta-data for the customer (i.e. login information) and pass it along the query when you send it to the backend. Find an example of how to look up data in [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing).

### How to create a tensor on the fly in the ranking expression?

Create a tensor in the ranking function from arrays or weighted sets using `tensorFrom...` functions - see [document features](../reference/ranking/rank-features.html#document-features).

### How to set a dynamic (query time) ranking drop threshold?

Pass a ranking feature like `query(threshold)` and use an `if` statement in the ranking expression - see [retrieval and ranking](../ranking/ranking-intro#retrieval-and-ranking). Example:

```
rank-profile drop-low-score {
    function my_score() {
        expression: ..... # custom first phase score
    }
    rank-score-drop-limit: 0.0
    first-phase {
        expression: if(my_score() < query(threshold), -1, my_score())
    }
}
```

### Are ranking expressions or functions evaluated lazily?

No, ranking expressions and functions are not evaluated lazily - that would require lambda arguments. Only doubles and tensors are passed between functions. Example:

```
function inline foo(tensor, defaultValue) {
    expression: if (count(tensor) == 0, defaultValue, sum(tensor))
}

function bar() {
    expression: foo(tensor, sum(tensor1 * tensor2))
}
```

### Does Vespa support early termination of matching and ranking?

Yes, this can be accomplished by configuring [match-phase](../reference/schemas/schemas.html#match-phase) in the rank profile, or by adding a range query item using _hitLimit_ to the query tree, see [capped numeric range search](../reference/querying/yql.html#numeric). Both methods require an _attribute_ field with _fast-search_. The capped range query is faster, but beware that if there are other restrictive filters in the query, one might end up with 0 hits. The additional filters are applied as a post-filtering step over the hits from the capped range query. _match-phase_, on the other hand, is safe to use with filters or other query terms, and it also supports diversification, which the capped range query term does not support.

### What could cause the relevance field to be -Infinity

The returned [relevance](../reference/querying/default-result-format.html#relevance) for a hit can become "-Infinity" instead of a double. This can happen in two cases:

- The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low-ranking hits with `each(output(summary()))` that are outside of what Vespa computed and cached on a heap. This is controlled by the [keep-rank-count](../reference/schemas/schemas.html#keep-rank-count) setting.

### How to pin query results?

To hard-code documents to positions in the result set, see the [pin results example](../ranking/multivalue-query-operators.html#pin-results-example).

## Documents

### What limits apply to document size?

There is a [maximum document size](../reference/applications/services/content.html#max-document-size) of 128 MiB, which is configurable per content cluster in services.xml.

### Is there any size limitation for multivalued fields?

No enforced limit, except resource usage (memory).

### Can a document have lists (key value pairs)?

E.g. a product is offered in a list of stores with a quantity per store. Use [multivalue fields](../searching-multi-valued-fields.html) (array of struct) or [parent child](../schemas/parent-child.html). Which one to choose depends on the use case, see the discussion in the latter link.

### Does a whole document need to be updated and re-indexed?

E.g. price and quantity available per store may often change vs the actual product attributes. Vespa supports [partial updates](../writing/reads-and-writes.html) of documents. Also, the parent/child feature is implemented to support use-cases where child elements are updated frequently, while a more limited set of parent elements are updated less frequently.

### What ACID guarantees if any does Vespa provide for single writes / updates / deletes vs batch operations etc?

See the [Vespa Consistency Model](../content/consistency). Vespa is not transactional in the traditional sense; it doesn't have strict ACID guarantees. Vespa is designed for high-performance use-cases with eventual consistency as an acceptable (and to some extent configurable) trade-off.

### Does vespa support wildcard fields?

Wildcard fields are not supported in Vespa. A workaround is to use maps to store the wildcard fields. The map needs to be defined with `indexing: attribute` and will hence be stored in memory. Refer to [map](../reference/schemas/schemas.html#map).

### Can we set a limit for the number of elements that can be stored in an array?

Implement a [document processor](../applications/document-processors.html) for this.

### How to auto-expire documents / set up garbage collection?

Set a selection criterion on the `document` element in `services.xml`. The criterion selects documents to keep. I.e. to purge documents "older than two weeks", the expression should be "newer than two weeks". Read more about [document expiry](../schemas/documents.html#document-expiry).

### How to increase redundancy and track data migration progress?

Changing redundancy is a live and safe change (assuming there is headroom on disk / memory - e.g. going from 2 to 3 is 50% more). The time to migrate will be quite similar to what it took to feed initially - a bit hard to say generally, as it depends on IO and index settings, like whether an HNSW index is built. To monitor progress, take a look at the [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample application for the _clustercontroller_ status page - this shows buckets pending, live.
Finally, use the `.idealstate.merge_bucket.pending` metric to track progress - when it is 0, there are no more data-syncing operations - see [monitor distance to ideal state](../operations/self-managed/admin-procedures.html#monitor-distance-to-ideal-state). Nodes will work as normal during data sync, and query coverage will be the same.

### How does namespace relate to schema?

It does not. _Namespace_ is a mechanism to split the document space into parts that can be used for document selection - see the [documentation](../schemas/documents.html#namespace). The namespace is not indexed and cannot be searched using the query API, but it can be used by [visiting](../writing/visiting.html).

### Visiting does not dump all documents, and/or hangs.

There are multiple things that can cause this, see [visiting troubleshooting](../writing/visiting.html#troubleshooting).

### How to find number of documents in the index?

Run a query like `vespa query "select * from sources * where true"` and see the `totalCount` field. Alternatively, use metrics or `vespa visit` - see [examples](../writing/batch-delete.html#example).

### Can I define a default value for a field?

Not in the field definition, but it's possible to do this with the [choice](../writing/indexing.html#choice-example) expression in an indexing statement.

## Query

### Are hierarchical facets supported?

Faceting is called [grouping](../grouping.html) in Vespa. Groups can be multi-level.

### Are filters supported?

Add filters to the query using [YQL](../querying/query-language.html), using boolean, numeric and [text matching](../querying/text-matching.html) conditions. Query terms can be annotated as filters, which means that they are not highlighted when bolding results.

### How to query for similar items?

One way is to describe items using tensors and query for the [nearest neighbor](../reference/querying/yql.html#nearestneighbor) - using full precision or approximate (ANN) - the latter is used when the set is too large for an exact calculation. Apply filters to the query to limit the neighbor candidate set. [Dot products](../ranking/multivalue-query-operators.html) or [weak and](../ranking/wand.html) are alternatives.

### Does Vespa support stop-word removal?

Vespa does not have a stop-word concept inherently. See the [sample app](https://github.com/vespa-engine/sample-apps/pull/335/files) for how to use [filter terms](../reference/querying/yql.html#annotations). [Tripling the query performance of lexical search](https://blog.vespa.ai/tripling-the-query-performance-of-lexical-search/) is a good blog post on this subject.

### How to extract more than 400 hits / query and get ALL documents?

Requesting more than 400 hits in a query gives this error: `{'code': 3, 'summary': 'Illegal query', 'message': '401 hits requested, configured limit: 400.'}`.

- To increase the max result set size (i.e. allow a higher [hits](../reference/api/query.html#hits) value), configure `maxHits` in a [query profile](../reference/api/query.html#queryprofile), e.g. `500` in `search/query-profiles/default.xml` (create as needed) - see the example query profile below. The [query timeout](../reference/api/query.html#timeout) can be increased, but it will still be costly and likely impact other queries - a large limit more so than a large offset. It can be made cheaper by using a smaller [document summary](../querying/document-summaries.html), and avoiding fields on disk if possible.
- Using _visit_ in the [document/v1/ API](../writing/document-v1-api-guide.html) is usually a better option for dumping all the data.
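For the `maxHits` option above, a minimal query profile could look like this (the value 500 is just the number from the example above):

```
<!-- search/query-profiles/default.xml -->
<query-profile id="default">
    <field name="maxHits">500</field>
</query-profile>
```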
### How to make a sub-query to get data to enrich the query, like get a user profile?

See the [UserProfileSearcher](https://github.com/vespa-engine/sample-apps/blob/master/news/app-6-recommendation-with-searchers/src/main/java/ai/vespa/example/UserProfileSearcher.java) for how to create a new query to fetch data - this creates a new Query, sets a new root and parameters - then `fill`s the Hits.

### How to create a cache that refreshes itself regularly

See the sub-query question above; in addition, add something like:

```
public class ConfigCacheRefresher extends AbstractComponent {

    private final ScheduledExecutorService configFetchService = Executors.newSingleThreadScheduledExecutor();
    private final ExecutionFactory executionFactory;
    private Chain<Searcher> searcherChain;

    public ConfigCacheRefresher(ExecutionFactory executionFactory) {
        this.executionFactory = executionFactory;  // injected by the container
    }

    void initialize() {
        Runnable task = () -> refreshCache();
        configFetchService.scheduleWithFixedDelay(task, 1, 1, TimeUnit.MINUTES);
        searcherChain = executionFactory.searchChainRegistry().getChain(new ComponentId("configDefaultProvider"));
    }

    public void refreshCache() {
        Execution execution = executionFactory.newExecution(searcherChain);
        Query query = createQuery(execution);
        // ... run the query and update the cached data (createQuery not shown)
    }

    @Override
    public void deconstruct() {
        super.deconstruct();
        try {
            configFetchService.shutdown();
            configFetchService.awaitTermination(1, TimeUnit.MINUTES);
        } catch (Exception e) {
            // ignore
        }
    }
}
```

### Is it possible to query Vespa using a list of document ids?

Yes, using the [in query operator](../reference/querying/yql.html#in). Example:

```
select * from data where user_id in (10, 20, 30)
```

The best article on the subject is [multi-lookup set filtering](../performance/feature-tuning.html#multi-lookup-set-filtering). Refer to the [in operator example](../ranking/multivalue-query-operators.html#in-example) on how to use it programmatically in a [Java Searcher](../applications/searchers.html).

### How to query documents where one field matches any values in a list? Similar to using SQL IN operator

Use the [in query operator](../reference/querying/yql.html#in). Example:

```
select * from data where category in ('cat1', 'cat2', 'cat3')
```

See [multi-lookup set filtering](#is-it-possible-to-query-vespa-using-a-list-of-document-ids) above for more details.

### How to count hits / all documents without returning results?

Count all documents using a query like [select \* from doc where true](../querying/query-language.html) - this counts all documents from the "doc" source. Using `select * from doc where true limit 0` will return the count and no hits; alternatively, add [hits=0](../reference/api/query.html#hits). Pass [ranking.profile=unranked](../reference/api/query.html#ranking.profile) to make the query less expensive to run. If an _estimate_ is good enough, use [hitcountestimate=true](../reference/api/query.html#hitcountestimate).

### Must all fields in a fieldset have compatible type and matching settings?

Yes - a deployment warning with _This may lead to recall and ranking issues_ is emitted when fields with conflicting tokenization are put in the same [fieldset](../reference/schemas/schemas.html#fieldset). This is because a given query item searching one fieldset is tokenized just once, so there's no right choice of tokenization in this case.
If you have user input that you want to apply to multiple fields with different tokenization, include the userInput multiple times in the query:

```
select * from sources * where ({defaultIndex: 'fieldsetOrField1'}userInput(@query)) or ({defaultIndex: 'fieldsetOrField2'}userInput(@query))
```

More details on [stack overflow](https://stackoverflow.com/questions/72784136/why-vepsa-easily-warning-me-this-may-lead-to-recall-and-ranking-issues).

### How is the query timeout computed?

Find query timeout details in the [Query API Guide](../querying/query-api.html#timeout) and the [Query API Reference](../reference/api/query.html#timeout).

### How does backslash escapes work?

Backslash is used to escape special characters in YQL. For example, to query with a literal backslash, which is useful in regexes, you need to escape it with another backslash: `\\`. Unescaped backslashes in YQL will lead to a "token recognition error" from the YQL parser. In addition, the Vespa CLI unescapes double backslashes to single backslashes (while single backslashes are left alone), so if you query with the Vespa CLI you need to add yet another level of escaping: `\\\\`. The same applies to strings in Java. Also note that both log messages and JSON results escape backslashes, so a single `\` is rendered as `\\`.

### Is it possible to have multiple SELECT statements in a single call (subqueries)?

E.g. two select queries with slightly different filtering conditions and a limit operator for each subquery - this cannot be expressed with OR conditions selecting both collections of documents; it would need something equivalent to:

```
SELECT 1 AS x UNION ALL SELECT 2 AS y;
```

This isn't possible; you need to run two queries. Alternatively, split a single incoming query into two running in parallel in a [Searcher](../applications/searchers.html) - example:

```
FutureResult futureResult = new AsyncExecution(settings).search(query);
FutureResult otherFutureResult = new AsyncExecution(settings).search(otherQuery);
```

### Is it possible to query for the number of elements in an array

There is no index or attribute data structure that allows efficient _searching_ for documents where an array field has a certain number of elements or items. The _grouping language_ has a [size()](../reference/querying/grouping-language.html#list-expressions) operator that can be used in queries.

### Is it possible to query for fields with NaN/no value set/null/none

The [visiting](../writing/visiting.html#analyzing-field-values) API using document selections supports it, with a linear scan over all documents. If the field is an _attribute_, one can query using grouping to identify NaN values - see count and list [fields with NaN](../querying/grouping.html#count-fields-with-nan).

### How to retrieve random documents using YQL? Functionality similar to MySQL "ORDER BY rand()"

See the [random.match](../reference/ranking/rank-features.html#random.match) rank feature - example:

```
rank-profile random {
    first-phase {
        expression: random.match
    }
}
```

Run queries, seeding the random generator:

```
$ vespa query 'select * from music where true' \
    ranking=random \
    rankproperty.random.match.seed=2
```

### Some of the query results have too many hits from the same source, how to create a diverse result set?

See [result diversity](../querying/result-diversity) for strategies on how to create result sets from different sources.

### How to find most distant neighbor in a embedding field called clip\_query\_embedding?
If you want to search for the most dissimilar items, with angular distance you can multiply your `clip_query_embedding` by the scalar -1. Then you are searching for the points that are closest to the point which is farthest away from your `clip_query_embedding`. Also see a [pyvespa example](https://vespa-engine.github.io/pyvespa/examples/pyvespa-examples.html#Neighbors).

## Feeding

### How to debug a feeding 400 response?

The best option is to use the `--verbose` option, like `vespa feed --verbose myfile.jsonl` - see the [documentation](../clients/vespa-cli.html#documents). A common problem is a mismatch between schema names and [document IDs](../schemas/documents.html#document-ids) - a schema like:

```
schema article {
    document article {
        ...
    }
}
```

will have a document feed like:

```
{"put": "id:mynamespace:article::1234", "fields": { ... }}
```

Note that the [namespace](glossary.html#namespace) is not mentioned in the schema, and the schema name is the same as the document name.

### How to debug document processing chain configuration?

This configuration is a combination of content and container cluster configuration, see [indexing](../writing/indexing.html) and [feed troubleshooting](../operations/self-managed/admin-procedures.html#troubleshooting).

### I feed documents with no error, but they are not in the index

This is often a problem if using [document expiry](../schemas/documents.html#document-expiry), as documents that have already expired will not be persisted - they are silently dropped and ignored. Feeding stale test data with old timestamps in combination with document-expiry can cause this behavior.

### How to feed many files, avoiding 429 error?

Using too many HTTP clients can generate a 429 response code. The Vespa sample apps use [vespa feed](../clients/vespa-cli.html#documents), which uses HTTP/2 for high throughput - it is better to stream the feed files through this client.

### Can I use Kafka to feed to Vespa?

Vespa does not have a Kafka connector. Refer to third-party connectors like [kafka-connect-vespa](https://github.com/vinted/kafka-connect-vespa).

## Text Search

### Does Vespa support addition of flexible NLP processing for documents and search queries?

E.g. integrating NER, word sense disambiguation, specific intent detection. Vespa supports these things well:

- [Query (and result) processing](../applications/searchers.html)
- [Document processing](../applications/document-processors.html) and document processors working on semantic annotations of text

### Does Vespa support customization of the inverted index?

E.g. instead of using terms or n-grams as the unit, we might use terms with specific word senses - e.g. bark (dog bark) vs. bark (tree bark), or BCG (company) vs. BCG (vaccine name). Creating a new index _format_ means changing the core. However, for the examples above, one just needs control over the tokens which are indexed (and queried). That is easily done in some Java code. The simplest way to do this is to plug in a [custom tokenizer](../linguistics/linguistics.html). That gets called from the query parser and the bundled linguistics-processing [Searchers](../applications/searchers.html), as well as from the [Document Processor](../applications/document-processors.html) creating the annotations that are consumed by the indexing operation. Since all of this is Searchers and Docprocs, which you can replace and around which you can add custom components, you can also take full control over these things without modifying the platform itself - see the sketch after this section.
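As a sketch of the query side of this, a [Searcher](../applications/searchers.html) can rewrite terms before matching. The class below is illustrative only (not from a Vespa sample app); the term and the sense-tagged token it is replaced with are hypothetical:

```
import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class SenseRewritingSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        rewrite(query.getModel().getQueryTree().getRoot());
        return execution.search(query);
    }

    // Recursively replace selected terms with disambiguated tokens, e.g. "bark" -> "bark#tree"
    private void rewrite(Item item) {
        if (item instanceof WordItem word) {
            if (word.getWord().equals("bark"))
                word.setWord("bark#tree"); // illustrative sense-tagged token
        } else if (item instanceof CompositeItem composite) {
            for (int i = 0; i < composite.getItemCount(); i++)
                rewrite(composite.getItem(i));
        }
    }
}
```

The same idea applies on the feed side with a document processor producing the corresponding tokens or annotations before indexing.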
### Does vespa provide any support for named entity extraction? It provides the building blocks but not an out-of-the-box solution. We can write a [Searcher](../applications/searchers.html) to detect query-side entities and rewrite the query, and a [DocProc](../applications/document-processors.html) if we want to handle them in some special way on the indexing side. ### Does vespa provide support for text extraction? You can write a document processor for text extraction, Vespa does not provide it out of the box. ### How to do Text Search in an imported field? [Imported fields](../schemas/parent-child.html) from parent documents are defined as [attributes](../content/attributes.html), and have limited text match modes (i.e. `indexing: index` cannot be used).[Details](https://stackoverflow.com/questions/71936330/parent-child-mode-cannot-be-searched-by-parent-column). ## Semantic search ### Why is closeness 1 for all my vectors? If you have added vectors to your documents and queries, and see that the rank feature closeness(field, yourEmbeddingField) produces 1.0 for all documents, you are likely using[distance-metric](../reference/schemas/schemas.html#distance-metric): innerproduct/prenormalized-angular, but your vectors are not normalized, and the solution is normally to switch to[distance-metric: angular](../reference/schemas/schemas.html#angular)or use[distance-metric: dotproduct](../reference/schemas/schemas.html#dotproduct)(available from Vespa 8.170.18). With non-normalized vectors, you often get negative distances, and those are capped to 0, leading to closeness 1.0. Some embedding models, such as models from sbert.net, claim to output normalized vectors but might not. ## Programming Vespa ### Is Python plugins supported / is there a scripting language? Plugins have to run in the JVM - [jython](https://www.jython.org/) might be an alternative, however Vespa Team has no experience with it. Vespa does not have a language like[painless](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html) - it is more flexible to write application logic in a JVM-supported language, using[Searchers](../applications/searchers.html) and [Document Processors](../applications/document-processors.html). ### How can I batch-get documents by ids in a Searcher A [Searcher](../applications/searchers.html) intercepts a query and/or result. To get a number of documents by id in a Searcher or other component like a [Document processor](../applications/document-processors.html), you can have an instance of [com.yahoo.documentapi.DocumentAccess](../reference/applications/components.html#injectable-components)injected and use that to get documents by id instead of the HTTP API. ### Does Vespa work with Java 20? Vespa uses Java 17 - it will support 20 some time in the future. ### How to write debug output from a custom component? Use `System.out.println` to write text to the [vespa.log](../reference/operations/log-files.html). ## Performance ### What is the latency of documents being ingested vs indexed and available for search? Vespa has a near real-time indexing core with typically sub-second latencies from document ingestion to being indexed. This depends on the use-case, available resources and how the system is tuned. Some more examples and thoughts can be found in the [scaling guide](../performance/sizing-search.html). ### Is there a batch ingestion mode, what limits apply? 
Vespa does not have a concept of "batch ingestion" as it contradicts many of the core features that are the strengths of Vespa, including [serving elasticity](../content/elasticity.html) and sub-second indexing latency. That said, we have numerous use-cases in production that do high throughput updates to large parts of the (sometimes entire) document set. In cases where feed throughput is more important than indexing latency, you can tune this to meet your requirements. Some of this is detailed in the [feed sizing guide](../performance/sizing-feeding.html). ### Can the index support up to 512 GB index size in memory? Yes. The [content node](../content/proton.html) is implemented in C++ and not memory constrained other than what the operating system does. ### Get request for a document when document is not in sync in all the replica nodes? If the replicas are in sync the request is only sent to the primary content node. Otherwise, it's sent to several nodes, depending on replica metadata. Example: if a bucket has 3 replicas A, B, C and A & B both have metadata state X and C has metadata state Y, a request will be sent to A and C (but not B since it has the same state as A and would therefore not return a potentially different document). ### How to keep indexes in memory? [Attribute](../content/attributes.html) (with or without `fast-search`) is always in memory, but does not support tokenized matching. It is for structured data.[Index](../basics/schemas.html#document-fields) (where there’s no such thing as fast-search since it is always fast) is in memory to the extent there is available memory and supports tokenized matching. It is for unstructured text. It is possible to guarantee that fields that are defined with `index`have both the dictionary and the postings in memory by changing from `mmap` to `populate`, see [index \> io \> search](../reference/applications/services/content.html#index-io-search). Make sure that the content nodes run on nodes with plenty of memory available, during index switch the memory footprint will 2x. Familiarity with Linux tools like `pmap` can help diagnose what is mapped and if it’s resident or not. Fields that are defined with `attribute` are in-memory, fields that have both `index` and `attribute` have separate data structures, queries will use the default mapped on disk data structures that supports `text` matching, while grouping, summary and ranking can access the field from the `attribute` store. A Vespa query is executed in two phases as described in [sizing search](../performance/sizing-search.html), and summary requests can touch disk (and also uses `mmap` by default). Due to their potential size there is no populate option here, but one can define [dedicated document summary](../querying/document-summaries.html#performance)containing only fields that are defined with `attribute`. The [practical performance guide](../performance/practical-search-performance-guide)can be a good starting point as well to understand Vespa query execution, difference between `index` and `attribute` and summary fetching performance. ### Is memory freed when deleting documents? Deleting documents, by using the [document API](../writing/reads-and-writes.html)or [garbage collection](../schemas/documents.html#document-expiry) will increase the capacity on the content nodes. 
However, this is not necessarily observable in system metrics - this depends on many factors, like what kind of memory that is released, when [flush](../content/proton.html#proton-maintenance-jobs) jobs are run and document [schema](../basics/schemas.html). In short, Vespa is not designed to release memory once used. It is designed for sustained high throughput, low latency, keeping maximum memory used under control using features like [feed block](../writing/feed-block.html). When deleting documents, one can observe a slight increase in memory. A deleted document is represented using a [tombstone](../operations/self-managed/admin-procedures.html#content-cluster-configuration), that will later be removed, see [removed-db-prune-age](../reference/applications/services/content.html#removed-db-prune-age). When running garbage collection, the summary store is scanned using mmap and both VIRT and page cache memory usage increases. Read up on [attributes](../content/attributes.html) to understand more of how such fields are stored and managed.[Paged attributes](../content/attributes.html#paged-attributes) trades off memory usage vs. query latency for a lower max memory usage. ### Do empty fields consume memory? A field is of type _index_ or _attribute_ - [details](../querying/text-matching.html#index-and-attribute). Fields with _index_ use no incremental memory at deployment, if the field has no value. Fields with _attribute_ use memory, even if the field value is not set, Attributes are optimized for random access: To be able to jump to the value of any document in O(1) time. That requires allocating a constant amount of memory (the value, or a pointer) per document, regardless of whether there is a value. In short, knowing that a value is unset is a value in itself for attributes, so deploying new fields or new schemas with attributes will cause an incremental increase in memory. Applications with many unused schemas and fields can factor this in when sizing for memory. Refer to [attributes](../content/attributes.html#attribute-memory-usage) for details. ### What is the best practice for scaling Vespa for day vs night? [Autoscaling](../operations/autoscaling.html) is the best guide to understand how to size and autoscale the system. Container clusters are stateless and can be autoscaled more quickly than content clusters. ### We can spike 8x in 5 minutes in terms of throughput requirements. It is not possible to autoscale content clusters for 8x load increase in 5 minutes, as this requires both provisioning and data migration. Such use cases are best discussed with the Vespa Team to understand the resource bottlenecks, tradeoffs and mitigations. Also read [Graceful Degradation](../performance/graceful-degradation.html). ### How much lower-level configuration do we need to do? For example, do we need to alter the number of threads per container? It depends. Vespa aims to adapt to resources (like auto thread config based on virtual node thread count) and actual use (when to run maintenance jobs like compaction), but there are tradeoffs that applications owners can/should make. Start off by reading the [Vespa Serving Scaling Guide](../performance/sizing-search.html), then run [benchmarks](../performance/benchmarking-cloud.html) and use the [dashboards](../operations/monitoring.html). ## Administration ### Self-managed: Can one do a partial deploy to the config server / update the schema without deploying all the node configs? 
Yes, deployment is using this web service API, which allows you to create an edit session from the currently deployed package, make modifications, and deploy (prepare+activate) it: [deploy-rest-api-v2.html](../reference/api/deploy-v2.html). However, this is only useful in cases where you want to avoid transferring data to the config server unnecessarily. When you resend everything, the config server will notice that you did not actually change e.g. the node configs and avoid unnecessary noop changes. ### How fast can nodes be added and removed from a running cluster? [Elasticity](../content/elasticity.html) is a core Vespa strength - easily add and remove nodes with minimal (if any) serving impact. The exact time needed depends on how much data will need to be migrated in the background for the system to converge to [ideal data distribution](../content/idealstate.html). ### Should Vespa API search calls be load balanced or does Vespa do this automatically? You will need to load balance incoming requests between the nodes running the[stateless Java container cluster(s)](overview.html). This can typically be done using a simple network load balancer available in most cloud services. This is included when using [Vespa Cloud](https://cloud.vespa.ai/), with an HTTPS endpoint that is already load balanced - both locally within the region and globally across regions. ### Supporting index partitions [Search sizing](../performance/sizing-search.html) is the intro to this. Topology matters, and this is much used in the high-volume Vespa applications to optimise latency vs. cost. ### Can a running cluster be upgraded with zero downtime? With [Vespa Cloud](https://cloud.vespa.ai/), we do automated background upgrades daily without noticeable serving impact. If you host Vespa yourself, you can do this, but need to implement the orchestration logic necessary to handle this. The high level procedure is found in [live-upgrade](../operations/self-managed/live-upgrade.html). ### Can Vespa be deployed multi-region? [Vespa Cloud](https://cloud.vespa.ai/en/reference/zones) has integrated support - query a global endpoint. Writes will have to go to each zone. There is no auto-sync between zones. ### Can Vespa serve an Offline index? Building indexes offline requires the partition layout to be known in the offline system, which is in conflict with elasticity and auto-recovery (where nodes can come and go without service impact). It is also at odds with realtime writes. For these reasons, it is not recommended, and not supported. ### Does vespa give us any tool to browse the index and attribute data? Use [visiting](../writing/visiting.html) to dump all or a subset of the documents. See [data-management-and-backup](https://cloud.vespa.ai/en/data-management-and-backup) for more information. ### What is the response when data is written only on some nodes and not on all replica nodes (Based on the redundancy count of the content cluster)? Failure response will be given in case the document is not written on some replica nodes. ### When the doc is not written to some nodes, will the document become available due to replica reconciliation? Yes, it will be available, eventually. Also try [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode). ### Does vespa provide soft delete functionality? 
Yes just add a `deleted` attribute, add [fast-search](../content/attributes.html#fast-search) on it and create a searcher which adds an `andnot deleted` item to queries. ### Can we configure a grace period for bucket distribution so that buckets are not redistributed as soon as a node goes down? You can set a [transition-time](../reference/applications/services/content.html#transition-time) in services.xml to configure the cluster controller how long a node is to be kept in maintenance mode before being automatically marked down. ### What is the recommended redundant/searchable-copies config when using grouping distribution? Grouped distribution is used to reduce search latency. Content is distributed to a configured set of groups, such that the entire document collection is contained in each group. Setting the redundancy and searchable-copies equal to the number of groups ensures that data can be queried from all groups. ### How to set up for disaster recovery / backup? Refer to [#17898](https://github.com/vespa-engine/vespa/issues/17898) for a discussion of options. ### Self-managed: How to check Vespa version for a running instance? Use [/state/v1/version](../reference/api/state-v1.html#state-v1-version) to find Vespa version. ### Deploy rollback See [rollback](../applications/deployment.html#rollback) for options. ## Troubleshooting ### Deployment fails with response code 413 If deployment fails with error message "Deployment failed, code: 413 ("Payload Too Large.")" you might need to increase the config server's JVM heap size. The config server has a default JVM heap size of 2 Gb. When deploying an app with e.g. large models this might not be enough, try increasing the heap to e.g. 4 Gb when executing 'docker run …' by adding an environment variable to the command line: ``` docker run --env VESPA_CONFIGSERVER_JVMARGS=-Xmx4g ``` ### The endpoint does not come up after deployment When deploying an application package, with some kind of error, the endpoints might fail, like: ``` $ vespa deploy --wait 300 Uploading application package ... done Success: Deployed target/application.zip Waiting up to 5m0s for query service to become available ... Error: service 'query' is unavailable: services have not converged ``` Another example: ``` [INFO] [03:33:48] Failed to get 100 consecutive OKs from endpoint ... ``` There are many ways this can fail, the first step is to check the Vespa Container: ``` $ docker exec vespa vespa-logfmt -l error [2022-10-21 10:55:09.744] ERROR container Container.com.yahoo.container.jdisc.ConfiguredApplication Reconfiguration failed, your application package must be fixed, unless this is a JNI reload issue: Could not create a component with id 'ai.vespa.example.album.MetalSearcher'. Tried to load class directly, since no bundle was found for spec: album-recommendation-java. If a bundle with the same name is installed, there is a either a version mismatch or the installed bundle's version contains a qualifier string. ... ``` [Bundle plugin troubleshooting](../applications/bundles.html#bundle-plugin-troubleshooting) is a good resource to analyze Vespa container startup / bundle load problems. ### Starting Vespa using Docker on M1 fails Using an M1 MacBook Pro / AArch64 makes the Docker run fail: ``` WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested ``` Make sure you are running a recent version of the Docker image, do `docker pull vespaengine/vespa`. 
### Deployment fails / nothing is listening on 19071 Make sure all [Config servers](../operations/self-managed/configuration-server.html#troubleshooting) are started, and are able to establish ZooKeeper quorum (if more than one) - see the [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample application. Validate that the container has [enough memory](../operations/self-managed/docker-containers.html). ### Startup problems in multinode Kubernetes cluster - readinessProbe using 19071 fails The Config Server cluster with 3 nodes fails to start. The ZooKeeper cluster the Config Servers use waits for hosts on the network, the hosts wait for ZooKeeper in a catch 22 - see [sampleapp troubleshooting](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations#troubleshooting). ### How to display vespa.log? Use [vespa-logfmt](../reference/operations/self-managed/tools.html#vespa-logfmt) to dump logs. If Vespa is running in a local container (named "vespa"), run `docker exec vespa vespa-logfmt`. ### How to fix encoding problems in document text? See [encoding troubleshooting](../linguistics/troubleshooting-encoding.html)for how to handle and remove control characters from the document feed. ## Login, Tenants and Plans ### How to get started? [Deploy an application](../basics/deploy-an-application.html) to create a tenant and start your [free trial](https://vespa.ai/free-trial/). This tenant can be your personal tenant, or shared with others. It can not be renamed. ### How to create a company tenant? If the tenant is already created, add more users to it. Click the "account" button in the Vespa Cloud Console (top right in the tenant view), then "users". From this view you can manage users in the tenant, and their roles - from here, you can add/set tenant admins. ### How to accept Terms of Service? When starting the free trial, you are asked to accept Terms of Service. For paid plans, this is covered by the contract. ### How do I switch from free trial to a paid plan? Click "account", then "billing" in the console to enter information required for billing. Use [Vespa Support](https://vespa.ai/support/) if you need to provide this information without console login. ### Does Vespa Cloud support Single Sign-On (SSO)? Yes, contact [Vespa Support](https://vespa.ai/support/) to set it up. ## Vespa Cloud Operations ### How can I change the cost of my Vespa Cloud usage? See [node resources](../performance/node-resources) to assess current and auto-suggested resources and [autoscaling](../operations/autoscaling.html) for how to automate. ### How can I manually modify resources used? Managing resources is easy, as most changes are automated. Adding / removing / changing nodes starts automated data migration, see [elasticity](../content/elasticity.html). ### How to modify a schema? Schema changes might require data reindexing, which is automated, but takes some time. Other schema changes require data refeed - [details](../reference/schemas/schemas.html#modifying-schemas) ### How to evaluate how much memory a field is using? Use the [Memory Visualizer](../performance/memory-visualizer.html) to evaluate how memory is allocated to the fields. Fields can be `index`, `attribute` and `summary`, and combinations of these, with settings like `fast-search` that affects memory usage.[Attributes](../content/attributes.html) is a great read for understanding Vespa memory usage. 
### Archive access failed with Permission 'serviceusage.services.use' denied Listing archived objects can fail, e.g. `gsutil -u my_project ls gs://vespa-cloud-data-prod-gcp-us-central1-f-12345f/my_tenant` can fail with`AccessDeniedException: 403 me@mymail.com does not have serviceusage.services.use access to the Google Cloud project. Permission \'serviceusage.services.use\' denied on resource (or it may not exist).`This can be due to missing rights on your Google project (my\_project in the example above) - from the Google documentation:_"The user account accessing the Cloud Storage Bucket must be granted the Service Usage Consumer role (see [https://cloud.google.com/service-usage/docs/access-control](https://cloud.google.com/service-usage/docs/access-control)) in order to charge the specified user project for the bucket usage cost"_ ### How do I integrate with my current monitoring infrastructure? Vespa Cloud applications have a Prometheus endpoint. Find guides for how to integrate with Grafana and AWS Cloudwatch at [monitoring](../operations/monitoring.html). ### What is the best way to monitor instantaneously what is happening in Vespa? CPU usage? Memory usage? htop? Cloudwatch metrics? Vespa Cloud has detailed dashboards linked from the _monitoring_ tab in the Console, one for each zone the instance is deployed to. ### How are Vespa versions upgrades handled - only for new deploys? Vespa is normally upgraded daily. There are exceptions, like holidays and weekends. During upgrades, nodes are stopped one-by-one per cluster. As all clusters have one redundant node, serving and write traffic is not impacted by upgrades. Before the upgrade, the application's [system and staging tests](../operations/automated-deployments.html) are run, halting the upgrade if they fail. Documents are re-migrated to the upgraded node before doing the next node, see [Elastic Vespa](../content/elasticity.html) for details. ### How do we get alerted to issues like Feed Block? Searchable copy going offline? Issues like Feed Blocked, Deployment and Deprecation warnings show up in the console. There are no warnings on redundancy level / searchable copies, as redundant document buckets are activated for queries automatically, and auto data-migration kicks in for node failures / replacements. ### What actions are needed when deploying schema changes? - Schema changes that [require service restart](../reference/schemas/schemas.html#changes-that-require-restart-but-not-re-feed)are handled automatically by Vespa Cloud. A deployment job involves waiting for these to complete. - Schema changes that [require reindexing](../reference/schemas/schemas.html#changes-that-require-reindexing)of data require a validation override, and will trigger automatic reindexing. Status can be tracked in the console application view. Vespa Cloud also periodically re-indexes all data, with minimal resource usage, to account for changes in linguistics libraries. - Schema changes that [require refeeding](../reference/schemas/schemas.html#changes-that-require-re-feed)data require a validation override, and the user must refeed the data after deployment. ### What are the Vespa Cloud data retention policies? The management of data stored in an application running on Vespa Cloud is the responsibility of the application owner and, as such, Vespa Cloud does not have any retention policy for this data as long as it is stored by the application. 
The following data retention policies apply to Vespa Cloud:

- After a node previously allocated to an application has been deallocated (e.g. due to the application being deleted by the application owner), all application data will be deleted within _four hours_.
- All application log data will be deleted from Vespa servers after no more than _30 days_ (most often sooner), depending on log volume, allocated disk resources, etc.

_PLEASE NOTE:_ This is the theoretical maximum retention time - see the [archive guide](../cloud/archive-guide.html) for how to ensure access to your application logs.

### Is Vespa Cloud certified for ISO 27001 or SOC 2?

Yes, Vespa.ai has a SOC 2 attestation: [Trust Center](https://trust.vespa.ai).

### Is Vespa Cloud GDPR compliant?

Read more in [GDPR](https://cloud.vespa.ai/en/gdpr).

### Does Vespa store information from the information sources with which it is integrated?

Vespa is most often used for queries over data written from the information sources, although it can also be used without data, e.g. for model serving. It is the application owner that writes the integration with Vespa Cloud to write data.

### What is the encryption algorithm used at rest?

Vespa Cloud uses the following cloud providers:

- AWS EC2 instances, with local or remote storage
- GCP Compute instances, with local or remote storage
- Azure Compute instances, with local or remote storage

The storage devices are encrypted at rest, per cloud provider.

### Does the Vespa console have audit trails/logs module and can it be accessed by an Admin user?

See the [security guide](../security/guide.html) for roles and permissions. The Vespa Cloud Console has a log view tool, and logs / access logs can be exported to the customer's AWS account easily. Deployment operations are tracked in the deployment view, with a history. Vespa Cloud Operators do not have node access unless specifically granted by the customer; such access is audit logged.

### Once the service purchased with Vespa is terminated, is there a secure deletion procedure for the information collected from the customer?

At termination, all application instances are removed, with their data, before the tenant can be deactivated.

### Why is the CPU usage for my application above 100%?

In `dev` zones we use shared resources and hence have more than one node on each host/instance. In order to provide the best possible overall responsiveness, we do not restrict CPU resources for the individual application nodes.
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Ranking](#ranking) - [Does Vespa support a flexible ranking score?](#does-vespa-support-a-flexible-ranking-score) - [Where would customer specific weightings be stored?](#where-would-customer-specific-weightings-be-stored) - [How to create a tensor on the fly in the ranking expression?](#how-to-create-a-tensor-on-the-fly-in-the-ranking-expression) - [How to set a dynamic (query time) ranking drop threshold?](#how-to-set-a-dynamic-query-time-ranking-drop-threshold) - [Are ranking expressions or functions evaluated lazily?](#are-ranking-expressions-or-functions-evaluated-lazily) - [Does Vespa support early termination of matching and ranking?](#does-vespa-support-early-termination-of-matching-and-ranking) - [What could cause the relevance field to be -Infinity](#what-could-cause-the-relevance-field-to-be--infinity) - [How to pin query results?](#how-to-pin-query-results) - [Documents](#documents) - [What limits apply to document size?](#what-limits-apply-to-document-size) - [Is there any size limitation for multivalued fields?](#is-there-any-size-limitation-for-multivalued-fields) - [Can a document have lists (key value pairs)?](#can-a-document-have-lists-key-value-pairs) - [Does a whole document need to be updated and re-indexed?](#does-a-whole-document-need-to-be-updated-and-re-indexed) - [What ACID guarantees if any does Vespa provide for single writes / updates / deletes vs batch operations etc?](#what-acid-guarantees-if-any-does-vespa-provide-for-single-writes--updates--deletes-vs-batch-operations-etc) - [Does vespa support wildcard fields?](#does-vespa-support-wildcard-fields) - [Can we set a limit for the number of elements that can be stored in an array?](#can-we-set-a-limit-for-the-number-of-elements-that-can-be-stored-in-an-array) - [How to auto-expire documents / set up garbage collection?](#how-to-auto-expire-documents--set-up-garbage-collection) - [How to increase redundancy and track data migration progress?](#how-to-increase-redundancy-and-track-data-migration-progress) - [How does namespace relate to schema?](#how-does-namespace-relate-to-schema) - [Visiting does not dump all documents, and/or hangs.](#visiting-does-not-dump-all-documents-andor-hangs) - [How to find number of documents in the index?](#how-to-find-number-of-documents-in-the-index) - [Can I define a default value for a field?](#can-i-define-a-default-value-for-a-field) - [Query](#query) - [Are hierarchical facets supported?](#are-hierarchical-facets-supported) - [Are filters supported?](#are-filters-supported) - [How to query for similar items?](#how-to-query-for-similar-items) - [Does Vespa support stop-word removal?](#does-vespa-support-stop-word-removal) - [How to extract more than 400 hits / query and get ALL documents?](#how-to-extract-more-than-400-hits--query-and-get-all-documents) - [How to make a sub-query to get data to enrich the query, like get a user profile?](#how-to-make-a-sub-query-to-get-data-to-enrich-the-query-like-get-a-user-profile) - [How to create a cache that refreshes itself regularly](#how-to-create-a-cache-that-refreshes-itself-regularly) - [Is it possible to query Vespa using a list of document ids?](#is-it-possible-to-query-vespa-using-a-list-of-document-ids) - [How to query documents where one field matches any values in a list? 
Similar to using SQL IN operator](#how-to-query-documents-where-one-field-matches-any-values-in-a-list-similar-to-using-sql-in-operator) - [How to count hits / all documents without returning results?](#how-to-count-hits--all-documents-without-returning-results) - [Must all fields in a fieldset have compatible type and matching settings?](#must-all-fields-in-a-fieldset-have-compatible-type-and-matching-settings) - [How is the query timeout computed?](#how-is-the-query-timeout-computed) - [How does backslash escapes work?](#how-does-backslash-escapes-work) - [Is it possible to have multiple SELECT statements in a single call (subqueries)?](#is-it-possible-to-have-multiple-select-statements-in-a-single-call-subqueries) - [Is it possible to query for the number of elements in an array](#is-it-possible-to-query-for-the-number-of-elements-in-an-array) - [Is it possible to query for fields with NaN/no value set/null/none](#is-it-possible-to-query-for-fields-with-nanno-value-setnullnone) - [How to retrieve random documents using YQL? Functionality similar to MySQL "ORDER BY rand()"](#how-to-retrieve-random-documents-using-yql-functionality-similar-to-mysql-order-by-rand) - [Some of the query results have too many hits from the same source, how to create a diverse result set?](#some-of-the-query-results-have-too-many-hits-from-the-same-source-how-to-create-a-diverse-result-set) - [How to find most distant neighbor in a embedding field called clip\_query\_embedding?](#how-to-find-most-distant-neighbor-in-a-embedding-field-called-clip_query_embedding) - [Feeding](#feeding) - [How to debug a feeding 400 response?](#how-to-debug-a-feeding-400-response) - [How to debug document processing chain configuration?](#how-to-debug-document-processing-chain-configuration) - [I feed documents with no error, but they are not in the index](#i-feed-documents-with-no-error-but-they-are-not-in-the-index) - [How to feed many files, avoiding 429 error?](#how-to-feed-many-files-avoiding-429-error) - [Can I use Kafka to feed to Vespa?](#can-i-use-kafka-to-feed-to-vespa) - [Text Search](#text-search) - [Does Vespa support addition of flexible NLP processing for documents and search queries?](#does-vespa-support-addition-of-flexible-nlp-processing-for-documents-and-search-queries) - [Does Vespa support customization of the inverted index?](#does-vespa-support-customization-of-the-inverted-index) - [Does vespa provide any support for named entity extraction?](#does-vespa-provide-any-support-for-named-entity-extraction) - [Does vespa provide support for text extraction?](#does-vespa-provide-support-for-text-extraction) - [How to do Text Search in an imported field?](#how-to-do-text-search-in-an-imported-field) - [Semantic search](#semantic-search) - [Why is closeness 1 for all my vectors?](#why-is-closeness-1-for-all-my-vectors) - [Programming Vespa](#programming-vespa) - [Is Python plugins supported / is there a scripting language?](#is-python-plugins-supported--is-there-a-scripting-language) - [How can I batch-get documents by ids in a Searcher](#how-can-i-batch-get-documents-by-ids-in-a-searcher) - [Does Vespa work with Java 20?](#does-vespa-work-with-java-20) - [How to write debug output from a custom component?](#how-to-write-debug-output-from-a-custom-component) - [Performance](#performance) - [What is the latency of documents being ingested vs indexed and available for search?](#what-is-the-latency-of-documents-being-ingested-vs-indexed-and-available-for-search) - [Is there a batch ingestion mode, what limits 
apply?](#is-there-a-batch-ingestion-mode-what-limits-apply) - [Can the index support up to 512 GB index size in memory?](#can-the-index-support-up-to-512-gb-index-size-in-memory) - [Get request for a document when document is not in sync in all the replica nodes?](#get-request-for-a-document-when-document-is-not-in-sync-in-all-the-replica-nodes) - [How to keep indexes in memory?](#how-to-keep-indexes-in-memory) - [Is memory freed when deleting documents?](#is-memory-freed-when-deleting-documents) - [Do empty fields consume memory?](#do-empty-fields-consume-memory) - [What is the best practice for scaling Vespa for day vs night?](#what-is-the-best-practice-for-scaling-vespa-for-day-vs-night) - [We can spike 8x in 5 minutes in terms of throughput requirements.](#we-can-spike-8x-in-5-minutes-in-terms-of-throughput-requirements) - [How much lower-level configuration do we need to do? For example, do we need to alter the number of threads per container?](#how-much-lower-level-configuration-do-we-need-to-do-for-example-do-we-need-to-alter-the-number-of-threads-per-container) - [Administration](#administration) - [Self-managed: Can one do a partial deploy to the config server / update the schema without deploying all the node configs?](#self-managed-can-one-do-a-partial-deploy-to-the-config-server--update-the-schema-without-deploying-all-the-node-configs) - [How fast can nodes be added and removed from a running cluster?](#how-fast-can-nodes-be-added-and-removed-from-a-running-cluster) - [Should Vespa API search calls be load balanced or does Vespa do this automatically?](#should-vespa-api-search-calls-be-load-balanced-or-does-vespa-do-this-automatically) - [Supporting index partitions](#supporting-index-partitions) - [Can a running cluster be upgraded with zero downtime?](#can-a-running-cluster-be-upgraded-with-zero-downtime) - [Can Vespa be deployed multi-region?](#can-vespa-be-deployed-multi-region) - [Can Vespa serve an Offline index?](#can-vespa-serve-an-offline-index) - [Does vespa give us any tool to browse the index and attribute data?](#does-vespa-give-us-any-tool-to-browse-the-index-and-attribute-data) - [What is the response when data is written only on some nodes and not on all replica nodes (Based on the redundancy count of the content cluster)?](#what-is-the-response-when-data-is-written-only-on-some-nodes-and-not-on-all-replica-nodes-based-on-the-redundancy-count-of-the-content-cluster) - [When the doc is not written to some nodes, will the document become available due to replica reconciliation?](#when-the-doc-is-not-written-to-some-nodes-will-the-document-become-available-due-to-replica-reconciliation) - [Does vespa provide soft delete functionality?](#does-vespa-provide-soft-delete-functionality) - [Can we configure a grace period for bucket distribution so that buckets are not redistributed as soon as a node goes down?](#can-we-configure-a-grace-period-for-bucket-distribution-so-that-buckets-are-not-redistributed-as-soon-as-a-node-goes-down) - [What is the recommended redundant/searchable-copies config when using grouping distribution?](#what-is-the-recommended-redundantsearchable-copies-config-when-using-grouping-distribution) - [How to set up for disaster recovery / backup?](#how-to-set-up-for-disaster-recovery--backup) - [Self-managed: How to check Vespa version for a running instance?](#self-managed-how-to-check-vespa-version-for-a-running-instance) - [Deploy rollback](#deploy-rollback) - [Troubleshooting](#troubleshooting) - [Deployment fails with response code 
413](#deployment-fails-with-response-code-413) - [The endpoint does not come up after deployment](#the-endpoint-does-not-come-up-after-deployment) - [Starting Vespa using Docker on M1 fails](#starting-vespa-using-docker-on-m1-fails) - [Deployment fails / nothing is listening on 19071](#deployment-fails--nothing-is-listening-on-19071) - [Startup problems in multinode Kubernetes cluster - readinessProbe using 19071 fails](#startup-problems-in-multinode-kubernetes-cluster---readinessprobe-using-19071-fails) - [How to display vespa.log?](#how-to-display-vespalog) - [How to fix encoding problems in document text?](#how-to-fix-encoding-problems-in-document-text) - [Login, Tenants and Plans](#login-tenants-and-plans) - [How to get started?](#how-to-get-started) - [How to create a company tenant?](#how-to-create-a-company-tenant) - [How to accept Terms of Service?](#how-to-accept-terms-of-service) - [How do I switch from free trial to a paid plan?](#how-do-i-switch-from-free-trial-to-a-paid-plan) - [Does Vespa Cloud support Single Sign-On (SSO)?](#does-vespa-cloud-support-single-sign-on-sso) - [Vespa Cloud Operations](#vespa-cloud-operations) - [How can I change the cost of my Vespa Cloud usage?](#how-can-i-change-the-cost-of-my-vespa-cloud-usage) - [How can I manually modify resources used?](#how-can-i-manually-modify-resources-used) - [How to modify a schema?](#how-to-modify-a-schema) - [How to evaluate how much memory a field is using?](#how-to-evaluate-how-much-memory-a-field-is-using) - [Archive access failed with Permission 'serviceusage.services.use' denied](#archive-access-failed-with-permission-serviceusageservicesuse-denied) - [How do I integrate with my current monitoring infrastructure?](#how-do-i-integrate-with-my-current-monitoring-infrastructure) - [What is the best way to monitor instantaneously what is happening in Vespa? CPU usage? Memory usage? htop? Cloudwatch metrics?](#what-is-the-best-way-to-monitor-instantaneously-what-is-happening-in-vespa-cpu-usage-memory-usage-htop-cloudwatch-metrics) - [How are Vespa versions upgrades handled - only for new deploys?](#how-are-vespa-versions-upgrades-handled---only-for-new-deploys) - [How do we get alerted to issues like Feed Block? 
Searchable copy going offline?](#how-do-we-get-alerted-to-issues-like-feed-block-searchable-copy-going-offline) - [What actions are needed when deploying schema changes?](#what-actions-are-needed-when-deploying-schema-changes) - [What are the Vespa Cloud data retention policies?](#what-are-the-vespa-cloud-data-retention-policies) - [Is Vespa Cloud certified for ISO 27001 or SOC 2?](#is-vespa-cloud-certified-for-iso-27001-or-soc-2) - [Is Vespa Cloud GDPR compliant?](#is-vespa-cloud-gdpr-compliant) - [Does Vespa store information from the information sources with which it is integrated?](#does-vespa-store-information-from-the-information-sources-with-which-it-is-integrated) - [What is the encryption algorithm used at rest?](#what-is-the-encryption-algorithm-used-at-rest) - [Does the Vespa console have audit trails/logs module and can it be accessed by an Admin user?](#does-the-vespa-console-have-audit-trailslogs-module-and-can-it-be-accessed-by-an-admin-user) - [Once the service purchased with Vespa is terminated, is there a secure deletion procedure for the information collected from the customer?](#once-the-service-purchased-with-vespa-is-terminated-is-there-a-secure-deletion-procedure-for-the-information-collected-from-the-customer) - [Why is the CPU usage for my application above 100%?](#why-is-the-cpu-usage-for-my-application-above-100) --- # Source: https://docs.vespa.ai/en/performance/feature-tuning.html.md # Vespa Serving Tuning This document describes how to tune certain features of an application for high query serving performance, where the main focus is on content cluster search features; see [Container tuning](container-tuning.html) for tuning of container clusters. The [search sizing guide](sizing-search.html) is about _scaling_ an application deployment. ## Attribute vs index The [attribute](../content/attributes.html) documentation summarizes when to use [attribute](../reference/schemas/schemas.html#attribute) in the [indexing](../reference/schemas/schemas.html#indexing) statement. Also see the [procedure](/en/reference/schemas/schemas.html#modifying-schemas) for changing from attribute to index and vice-versa. ``` field timestamp type long { indexing: summary | attribute } ``` If both index and attribute are configured for string-type fields, Vespa will search and match against the index with default match `text`. All numeric type fields and tensor fields are attribute (in-memory) fields in Vespa. ## When to use fast-search for attribute fields By default, Vespa does not build any posting list index structures over _attribute_ fields. Adding _fast-search_ to the attribute definition as shown below will add an in-memory B-tree posting list structure which enables faster search for some cases (but not all, see next paragraph): ``` field timestamp type long { indexing: summary | attribute attribute: fast-search rank: filter } ``` When Vespa runs a query with multiple query items, it builds a query execution plan. It tries to optimize the plan so that the temporary result set is as small as possible. To do this, restrictive query tree items (matching few documents) are evaluated early. The query execution plan looks at hit count estimates for each part of the query tree using the index and B-tree dictionaries, which track the number of documents in which a given term occurs. 
However, for attribute fields without [fast-search](../content/attributes.html#fast-search) there is no hit count estimate, so the estimate becomes the total number of documents (matches all) and the query tree item is moved to the end of the query evaluation. A query with only one query term searching an attribute field without `fast-search` would be a linear scan over all documents and thus expensive: ``` select * from sources * where range(timestamp, 0, 100) ``` But if this query term is _and_-ed with another term that matches fewer documents, that term will determine the cost instead, and fast-search won't be necessary, e.g.: ``` select * from sources * where range(timestamp, 0, 100) and uuid contains "123e4567-e89b-12d3-a456-426655440000" ``` The general rules of thumb for when to use fast-search for an attribute field are: - Use _fast-search_ if the attribute field is searched without any other query terms - Use _fast-search_ if the attribute field could limit the total number of hits efficiently Changing fast-search aspect of the attribute is a [live change](/en/reference/schemas/schemas.html#modifying-schemas) which does not require any re-feeding, so testing the performance with and without is low effort. Adding or removing _fast-search_ requires restart. Note that _attribute_ fields with _fast-search_ that are not used in term based [ranking](../basics/ranking.html) should use _rank: filter_for optimal performance. See reference [rank: filter](../reference/schemas/schemas.html#rank). See optimization for sorting on a _single-value numeric attribute with fast-search_ using [sorting.degrading](../reference/api/query.html#sorting.degrading). ## Tuning query performance for lexical search Lexical search (or keyword-based search) is a method that matches query terms as they appear in indexed documents. It relies on the lexical representation of words rather than their meaning, and is one of the two retrieval methods used in [hybrid search](../learn/tutorials/hybrid-search.html). Lexical search in Vespa is done by querying string (text) [index](../basics/schemas.html#document-fields) fields, typically using the [weakAnd](../ranking/wand.html#weakand) query operator with [BM25](../ranking/bm25.html) ranking. The following schema represents a simple article document with _title_ and _content_ fields, that can represent Wikipedia articles as an example. A _default_ fieldset is specified such that user queries are matched against both the _title_ and _content_ fields. BM25 ranking combines the scores of both fields in the _default_ rank profile. In addition, the _optimized_ rank profile specifies tuning parameters to improve query performance: ``` schema article { document article { field title type string { indexing: index | summary index: enable-bm25 } field content type string { indexing: index | summary index: enable-bm25 } } fieldset default { fields: title, content } rank-profile default { first-phase { expression: bm25(title) + bm25(content) } } rank-profile optimized inherits default { filter-threshold: 0.05 weakand { stopword-limit: 0.6 adjust-target: 0.01 } } } ``` The following shows an example question-answer query against a collection of articles, using the _weakAnd_ query operator and the _optimized_ rank profile. Question-answer queries are often written in full sentences, and as a consequence, they tend to contain many stopwords that are present in many documents and of less relevance when it comes to ranking. 
E.g., terms as "the", "in", and "are" are typically present in more the 60% of the documents: ``` ``` { "yql": "select * from article where userQuery()", "ranking.profile": "optimized", "query": "what are the three highest mountains in the world" } ``` ``` The cost of evaluating such a query is primarily linear with the number of matched documents. The _AND_ operator is most effective, but often ends up being too restrictive by not returning enough matches. The _OR_ operator is less restrictive, but has the problem of returning too many matches, which is very costly. The _weakAnd_ operator is somewhere in between the two in cost. ### Posting Lists To find matching documents, the query operator uses the _posting lists_ associated with each query term. A posting list is part of the inverted index and contains all occurrences of a term within a collection of documents. It consists of document IDs for documents that contain the term, and additional information such as the positions of the term within those documents (used for ranking purposes). For common terms (e.g., stopwords), the posting lists are very large and can be expensive to use during evaluation and ranking. CPU work is required to iterate them, and I/O work is required to load portions of them from disk to memory with MMAP. The last part is especially problematic when all posting lists of a disk index cannot fit into physical memory, and the system must constantly swap parts of them in and out of memory, leading to high I/O wait times. To improve query performance, the following tuning parameters are available, as seen used in the _optimized_ rank profile. These are used to make tradeoffs between performance and quality. - **Use more compact posting lists for common terms**: Setting [filter-threshold](../reference/schemas/schemas.html#filter-threshold) to 0.05 ensures that all terms that are estimated to occur in more than 5% of the documents are handled with [compact posting lists (bitvectors)](../content/proton.html#index) instead of the full posting lists. This makes matching faster at the cost of producing less information for BM25 ranking (only a boolean signal is available). - **Avoid using large posting lists all together**: Setting [stopword-limit](../reference/schemas/schemas.html#weakand-stopword-limit) to 0.6, ensures that all terms that are estimated to occur in more than 60% of the documents are considered stopwords and dropped entirely from the query and also from ranking. - **Reduce the number of hits produced by _weakAnd_**: Setting [adjust-target](../reference/schemas/schemas.html#weakand-adjust-target) ensures that documents that only match terms that occur very frequently in the documents are not considered hits. This also removes the need to calculate _first-phase_ ranking for these documents, which is beneficial if _first-phase_ ranking is more complex and expensive. ### Performance The tuning parameters used in the _optimized_ rank profile have been shown to provide a good tradeoff between performance and quality in testing. A Wikipedia dataset with [SQuAD](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf) (Stanford Question Answering Dataset) queries was used to analyze performance, and [trec-covid](https://ir.nist.gov/trec-covid/), [MS MARCO](https://microsoft.github.io/msmarco/) and [nfcorpus](https://huggingface.co/datasets/BeIR/nfcorpus) from the BEIR dataset to analyze quality implications. 
For instance, the query performance was tripled without any measurable drop in quality with the Wikipedia dataset, using the tuning parameters in the _optimized_ rank profile. See the blog post [Tripling the query performance of lexical search](https://blog.vespa.ai/tripling-the-query-performance-of-lexical-search/) for more details. Note that testing should be conducted on your particular dataset to find the right tradeoff between performance and quality.

## Hybrid TAAT and DAAT query evaluation

Vespa supports **hybrid** query evaluation over inverted indexes, combining _TAAT_ and _DAAT_ evaluation to get the best of both query evaluation techniques. Hybrid evaluation is not enabled by default and is triggered by a run-time query parameter.

- **TAAT:** _Term At A Time_ scores documents one query term at a time. The entire posting iterator can be read per query term, and the score of a document is accumulated. It is CPU cache friendly as posting data is read sequentially without randomly seeking the posting list iterator. The downside is that _TAAT_ limits the term-based ranking function to be a linear sum of term scores. This downside is one reason why most search engines use _DAAT_.
- **DAAT:** _Document At A Time_ scores documents completely one at a time. This requires multiple seeks in the term posting lists, which is CPU cache unfriendly but allows non-linear ranking functions.

Generally, Vespa does _DAAT_ (document-at-a-time) query evaluation and not _TAAT_ (term-at-a-time) for the reason listed above. Ranking (score calculation) and matching (does the document match the query logic) are not two fully separate, disjoint phases, where one first finds matches and calculates the ranking score in a later phase. Matching and _first-phase_ score calculation are interleaved when using _DAAT_. The _first-phase_ ranking score is assigned to the hit when it satisfies the query constraints. At that point, the term iterators are positioned at the document id and one can unpack additional data from the term posting lists - e.g., for term proximity scoring used by the [nativeRank](../ranking/nativerank.html) ranking feature, which also requires unpacking of positions of the term within the document. The way hybrid query evaluation is done is that _TAAT_ is used for sub-branches of the overall query tree which are not used for term-based ranking. Using _TAAT_ can speed up query matching significantly (up to 30-50%) in cases where the query tree is large and complex, and where only parts of the query tree are used for term-based ranking. Examples of query tree branches that would require _DAAT_ are those using text ranking features like [bm25 or nativeRank](../reference/ranking/rank-features.html). The list of ranking features which can handle _TAAT_ is long, but using only [attribute or tensor](../ranking/tensor-user-guide.html) features can have the entire tree evaluated using _TAAT_. For example, for a query where there is a user text query from an end user, one can use _userQuery()_ YQL syntax and combine it with application-level constraints. The application-level filter constraints in the query could benefit from using _TAAT_.
Given the following document schema: ``` search news { document news { field title type string {} field body type string{} field popularity type float {} field market type string { rank:filter indexing: attribute attribute: fast-search } field language type string { rank:filter indexing: attribute attribute: fast-search } } fieldset default { fields: title,body } rank-profile text-and-popularity { first-phase { expression: attribute(popularity) + log10(bm25(title)) + log10(bm25(body)) } } } ``` In this case, the rank profile only uses two ranking features, the popularity attribute and the [bm25](../ranking/bm25.html) score of the userQuery(). These are used in the default fieldset containing the title and body. Notice how neither _market_ nor _language_ is used in the ranking expression. In this query example, there is a language constraint and a market constraint, where both language and market are queried with a long list of valid values using OR, meaning that the document should match any of the market constraints and any of the language constraints: ``` ``` { "hits": 10, "ranking.profile": "text-and-popularity", "yql": "select * from sources * where userQuery() and (language contains \"en\" or language contains \"br\") and (market contains \"us\" or market contains \"eu\" or market contains \"apac\" or market contains \"..\" )", "query": "cat video", "ranking.matching.termwiselimit": 0.1 } ``` ``` The language and the market constraints in the query tree are not used in the ranking score, and that part of the query tree could be evaluated using _TAAT_. See also [multi lookup set filter](#multi-lookup-set-filtering) for how to most efficiently search with large set filters. The subtree result is then passed as a bit vector into the _DAAT_ query evaluation, which could significantly speed up the overall evaluation. Enabling hybrid _TAAT_ is done by passing `ranking.matching.termwiselimit=0.1` as a request parameter. It's possible to evaluate the performance impact by changing this limit. Setting the limit to 0 will force termwise evaluation, which might hurt performance. One can evaluate if using the hybrid evaluation improves search performance by adding the above parameter. The limit is compared to the hit fraction estimate of the entire query tree. If the hit fraction estimate is higher than the limit, the termwise evaluation is used to evaluate the sub-branch of the query. ## Indexing uuids When configuring [string](../reference/schemas/schemas.html#string) type fields with `index`, the default [match](../reference/schemas/schemas.html#match) mode is `text`. This means Vespa will [tokenize](../linguistics/linguistics-opennlp.html#tokenization) the content and index the tokens. The string representation of an [Universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier) (UUID) is 32 hexadecimal (base 16) digits, in five groups, separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens). Example: Indexing `123e4567-e89b-12d3-a456-426655440000` with the above document definition, Vespa will tokenize this into 5 tokens: `[123e4567,e89b,12d3,a456,426655440000]`, each of which could be matched independently, leading to possible incorrect matches. 
To avoid this, change the mode to [match: word](../reference/schemas/schemas.html#word) to treat the entire uuid as _one_ token/word:

```
field uuid type string {
    indexing: summary | index
    match: word
    rank: filter
}
```

In addition, configure the `uuid` as a [rank: filter](../reference/schemas/schemas.html#rank) field - the field will then be represented as efficiently as possible during search and ranking. The `rank:filter` behavior can also be triggered at query time on a per-query item basis by calling `com.yahoo.prelude.query.Item.setRanked()` in a [custom searcher](../applications/searchers.html).

## Parent child and search performance

When searching imported attribute fields (with `fast-search`) from parent document types, there is an additional indirection that can be reduced significantly if the imported field is defined with `rank:filter` and [visibility-delay](../reference/applications/services/content.html#visibility-delay) is configured to be greater than 0. The [rank:filter](../reference/schemas/schemas.html#rank) setting impacts posting list granularity and `visibility-delay` enables a cache for the indirection between the child and parent document.

## Ranking and ML Model inferences

Vespa [scales](sizing-search.html) with the number of hits the query retrieves per node/search thread, which need to be evaluated by the first-phase ranking function. Read more on [phased ranking](../ranking/phased-ranking.html). Phased ranking enables using more resources during the second phase ranking step than in the first phase. The first phase should focus on getting decent recall (retrieving relevant documents in the top k), while the second phase should tune precision. For [text search](../ranking/nativerank.html) applications, consider using the [WAND](../ranking/wand.html) query operator - WAND can efficiently (sublinear) find the top-k documents using an inner scoring function.

## Multi Lookup - Set filtering

Several real-world search use cases are built around limiting or filtering based on a set filter. If the contents of a field in the document match any of the values in the query set, it should be retrieved. E.g., searching data for a set of users:

```
select * from sources * where user_id = 1 or user_id = 2 or user_id = 3 or user_id = 4 or user_id = 5 ...
```

For OR filters over the same field, it is strongly recommended to use the [in query operator](../reference/querying/yql.html#in) instead. It has considerably better performance than plain OR for set filtering:

```
select * from sources * where user_id in (1, 2, 3, 4, 5)
```

**Note:** Large sets can slow down YQL-parsing of the query - see [parameter substitution](../reference/querying/yql.html#parameter-substitution) for how to send the set in a compact, performance-effective way.

Attribute fields used like the above without other stronger query terms should have `fast-search` and `rank: filter`.
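As a sketch of the compact form mentioned in the note above, the set can be sent as a separate request parameter and referenced from the YQL with parameter substitution. The parameter name `user_ids` below is just an example; see the parameter substitution reference for the exact syntax:

```
{
    "yql": "select * from sources * where user_id in (@user_ids)",
    "user_ids": "1, 2, 3, 4, 5",
    "hits": 10
}
```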
If there is a large number of unique values in the field, it is also faster to use `hash` dictionary instead of `btree`, which is the default data structure for dictionaries for attribute fields with `fast-search`: ``` field user_id type long { indexing: summary | attribute attribute: fast-search dictionary: hash rank: filter } ``` For `string` fields, we also need to include [match](/en/reference/schemas/schemas.html#match) settings if using the `hash` dictionary: ``` field user_id_str type string { indexing: summary | attribute attribute: fast-search match: cased rank: filter dictionary { hash cased } } ``` If having 10M unique user\_ids in the dictionary and searching for 1000 users per query, the _btree_ dictionary would be 1000 lookup times log(10M), while _hash_ based would be 1000 lookups times O(1). Still, the _btree_ dictionary offers more flexibility in terms of [match](/en/reference/schemas/schemas.html#match) settings. The `in` query set filtering approach can be used in combination with hybrid _TAAT_ evaluation to further improve performance. See the [hybrid TAAT/DAAT](#hybrid-taat-daat) section. Also see the [dictionary schema reference](../reference/schemas/schemas.html#dictionary). **Note:** For most use cases, the time spent on dictionary traversal is negligible compared to the time spent on query evaluation (matching and ranking). If the query is very selective, for example, using vespa as a key-value lookup store with ranking support, the dictionary traversal time can be significant. ## Document summaries - hits If queries request many (thousands) of hits from a content cluster with few content nodes, increasing the [summary cache](caches-in-vespa.html) might reduce latency and cost. Using [explicit document summaries](../querying/document-summaries.html), Vespa can support memory-only summary fetching if all fields referenced in the document summary are **all** defined with `attribute`. Dedicated in-memory summaries avoid (potential) disk read and summary chunk decompression. Vespa document summaries are stored using compressed [chunks](../reference/applications/services/content.html#summary-store-logstore-chunk). See also the [practical search performance guide on hits fetching](practical-search-performance-guide.html#hits-and-summaries). ## Boolean, numeric, text attribute When using the attribute field type, considering performance, this is a rule of thumb: 1. Use boolean if a field is a boolean (max two values) 2. Use a string attribute if there is a set of values - only unique strings are stored 3. Use a numeric attribute for range searches 4. Use a numeric attribute if the data really is numeric; don't replace numeric with string numeric Refer to [attributes](../content/attributes.html) for details. ## Tensor ranking The ranking workload can be significant for large tensors - it is important to understand both the potential memory and computational cost for each query. ### Memory Assume the dot product of two tensors with 1000 values of 8 bytes each, as in `tensor(x[1000])`. With one query tensor and one document tensor, the dot product is `sum(query(tensor1) * attribute(tensor2))`. Given a Haswell CPU architecture, where the theoretical upper memory bandwidth is 68 GB/sec, this gives 68 GB/sec / 8 KB = 9M ranking evaluations/sec. In other words, for a 1 M index, 9 queries per second before being memory bound. 
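The same back-of-the-envelope estimate shows why smaller cell value types help: with 32-bit `float` cells (4 bytes per value), the data read per evaluation halves, roughly doubling the throughput the same memory bandwidth can sustain:

$$\frac{68\ \text{GB/s}}{1000 \times 4\ \text{B}} \approx 17\,\text{M evaluations/s} \;\Rightarrow\; \approx 17\ \text{queries/s over a 1M-document index}$$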
See below for using smaller [cell value types](#cell-value-types), and read more about [quantization](https://blog.vespa.ai/from-research-to-production-scaling-a-state-of-the-art-machine-learning-system/#model-quantization). ### Compute When using tensor types with at least one mapped dimension (sparse or mixed tensor), [attribute: fast-rank](../reference/schemas/schemas.html#attribute) can be used to optimize the tensor attribute for ranking expression evaluation at the cost of using more memory. This is a good tradeoff if benchmarking indicates significant latency improvements with `fast-rank`. When optimizing ranking functions with tensors, try to avoid temporary objects. Use the [Tensor Playground](https://docs.vespa.ai/playground/) to evaluate what the expressions map to, using the execution details to list the detailed steps - find examples below. ### Multiphase ranking To save both memory and compute resources, use [multiphase ranking](../ranking/phased-ranking.html). In short, use less expensive ranking evaluations to find the most promising candidates, then a high-precision evaluation for the top-k candidates. The blog post series on [Building Billion-Scale Vector Search](https://blog.vespa.ai/building-billion-scale-vector-search/) is a good read. ### Cell value types | Type | Description | | --- | --- | | double | The default tensor cell type is the 64-bit floating-point `double` format. It gives the best precision at the cost of high memory usage and somewhat slower calculations. Using a smaller value type increases performance, trading off precision, so consider changing to one of the cell types below before scaling the application. | | float | The 32-bit floating-point format `float` should usually be used for all tensors when scaling for production. Note that some frameworks like TensorFlow prefer 32-bit floats. A vector with 1000 dimensions, `tensor(x[1000])` uses approximately 4K memory per tensor value. | | bfloat16 | This type has the range as a normal 32-bit float but only 8 bits of precision and can be thought of as a "float with lossy compression" - see [Wikipedia](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format). If memory (or memory bandwidth) is a concern, change the most space-consuming tensors to use the `bfloat16` cell type. Some careful analysis of the data is required before using this type. When doing calculations, `bfloat16` will act as if it was a 32-bit float, but the smaller size comes with a potential computational overhead. In most cases, the `bfloat16` needs conversion to a 32-bit float before the actual calculation can occur, adding an extra conversion step. In some cases, having tensors with `bfloat16` cells might bypass some built-in optimizations (like matrix multiplication) that will be hardware-accelerated only if the cells are of the same type. To avoid this, use the [cell\_cast](../reference/ranking/ranking-expressions.html#cell_cast) tensor operation to make sure the cells are of the right type before doing the more expensive operations. | | int8 | If using machine learning to generate a model with data quantization, one can target the `int8` cell value type, which is a signed integer with a range from -128 to +127 only. This is also treated like a "float with limited range and lossy compression" by the Vespa tensor framework, and gives results as if it were a 32-bit float when any calculation is done. This type is also suitable when representing boolean values (0 or 1). 
**Note:** If the input for an `int8` cell is not directly representable, the resulting cell value is undefined, so take care to only input numbers in the `[-128,127]` range. It's also possible to use `int8` representing binary data for [hamming distance](../reference/schemas/schemas.html#distance-metric) Nearest-Neighbor search. Refer to [billion-scale-knn](https://blog.vespa.ai/billion-scale-knn/) for example use. | ### Inner/outer products The following is a primer into inner/outer products and execution details: | tensor a | tensor b | product | sum | comment | | --- | --- | --- | --- | --- | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(x[3]):[4.0, 5.0, 6.0] | tensor(x[3]):[4.0, 10.0, 18.0] | 32 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMmz5AFmUBWZQA2KU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). The dimension name and size are the same in both tensors - this is an inner product with a scalar result. | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(y[3]):[4.0, 5.0, 6.0] | tensor(x[3],y[3]):[ [4.0, 5.0, 6.0], [8.0, 10.0, 12.0], [12.0, 15.0, 18.0] ] | 90 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMmz5AFmUBWZQA2KU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). The dimension size is the same in both tensors, but dimensions have different names -\> this is an outer product; the result is a two-dimensional tensor. | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(x[2]):[4.0, 5.0] | undefined | | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMQrZ8gCzKArFKajroY+qxGmCacZtYU+JHk9Jyw5o7sDIwAVPbWoajhhnHRhA4WCdYQkMlQ0ACutADGZenOkHx1ALbC2bYywTT5OihSINpAA). Two tensors in the same dimension but with different lengths -\> undefined. | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(y[2]):[4.0, 5.0] | tensor(x[3],y[2]):[ [4.0, 5.0], [8.0, 10.0], [12.0, 15.0] ] | 54 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IcICeiFbPkAWZQBWKU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). Two tensors with different names and dimensions -\> this is an outer product; the result is a two-dimensional tensor. 
| Inner product - observe optimized into `DenseDotProductFunction` with no temporary objects: ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::DenseDotProductFunction", "symbol": "vespalib::eval::(anonymous namespace)::my_cblas_double_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` Outer product, parsed into a tensor multiplication (`DenseSimpleExpandFunction`), followed by a `Reduce` operation: ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::DenseSimpleExpandFunction", "symbol": "void vespalib::eval::(anonymous namespace)::my_simple_expand_op, true>(vespalib::eval::InterpretedFunction::State&, unsigned long)" }, { "class": "vespalib::eval::tensor_function::Reduce", "symbol": "void vespalib::eval::instruction::(anonymous namespace)::my_full_reduce_op >(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` Note that an inner product can also be run on mapped tensors ([Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPYAF8AlHGCiAjHHnEwogExw1K0QGY4OiZFYQJrCaQzVyuBkQttKaY3QYAjFjUwcoPfkLGSMnKKACwAdAAM2hoArJHaegBskYbOphjmrFaYNpx2zhT42eT0nACWtLQEgjiCWAAmAK4AxlwenoQMjABU7mlmKAC6IBJAA)): ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::SparseFullOverlapJoinFunction", "symbol": "void vespalib::eval::(anonymous namespace)::my_sparse_full_overlap_join_op, true>(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` ### Mapped lookups `sum(model_id * models, m_id)` | tensor name | tensor type | | --- | --- | | model\_id | `tensor(m_id{})` | | models | `tensor(m_id{}, x[3])` | Using a mapped dimension to select an indexed tensor can be considered a [mapped lookup](../ranking/tensor-examples.html#using-a-tensor-as-a-lookup-structure). This is similar to creating a slice but optimized into a single `MappedLookup` - see [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gFssATAgGwH0BLFksmpCIJIAFwK0AzlgBOACkY8WwAL4BKOMEYAmOAEYAdAAYVkARBUCVpDNXK4GRG4MppzdBszbtJ-GpmEocSlZBSVVYgAPRABmAF0NLT04RAAWY2IwAFYMsAA2YzjMnRSAdlyADlyATkLTd0sMawE7TAcRJ3cKfFbyehFJAFdGBVYOJTAAKjAvDklipTU-f0IGIZHZrl4pmbGfBd4lhqtnCF6odtXTzFdziEh+qEl2bgBjTpXVkVHvSUnNxZaJRwHT1fyNVDNWxdS5CZbkW7ue6PJgAQxwOAILE47CwWAA1oMcJwZFjBu94YJApBSSxyQQfnN-nslMR1sRFIczOCrCg4iAVEA) example. 
``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::MappedLookup", "symbol": "void vespalib::eval::(anonymous namespace)::my_mapped_lookup_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` ### Three-way dot product - mapped `sum(query(model_id) * model_weights * model_features)` | tensor name | tensor type | | --- | --- | | query(model\_id) | `tensor(model{})` | | model\_weights | `tensor(model{}, feature{})` | | model\_features | `tensor(feature{})` | Three-way mapped (sparse) dot product: [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEcBXAgJwE8AKAWywBMCAGwD6AS34BKEmRqQiCSABcCtAM5Y2AHmhCsAQyUA+XgOHAAvpLjAeABjgBGC5FkQLsi6QzVyuBkTecpRobnQMfIKiAO4EYgDmABZKajI0mApQKuqaOnqGJpHmXmDQBIbMbASW1sBgADq0tmZCcPbEpeVKlQRw0HYWTsSNzVFtdh1lFVV9znAATMNNRa08jpNdPX0DcADMS6PCbeud073QcwAsYC5hHhhesr6Y-oqBYRT4z+T0iisiU26VVSQXS8gY2Q02l0BmMXEBPRqNn6cAArJNHHAAGy3dL3VCPHwfV6ENLBL5hCCQX5QFjsbj-CSSMAAKjA-1iCWSalZ7JaAM2wLJYMyTFYnFMUXEUl5HLiSRSsv5CKFd08oNCYJJ4I1VJC33CijUzB4XDpEsZMrZcq5iutysFBDU0l1GQYxtN5oZ-KZSqlnIVPPtUpVTukaoeKAAuiALEA) ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::Sparse112DotProduct", "symbol": "void vespalib::eval::(anonymous namespace)::my_sparse_112_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` ### Three-way dot product - mixed `sum(query(model_id) * model_weights * model_features)` | tensor name | tensor type | | --- | --- | | query(model\_id) | `tensor(model{})` | | model\_weights | `tensor(model{}, feature[2])` | | model\_features | `tensor(feature[2])` | Three-way mapped (mixed) dot product: [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEcBXAgJwE8AKAWywBMCAGwD6AS34BKEmRqQiCSABcCtAM5Y2AHmhCsAQyUA+XgOHAAvpLjAeABjgBGC5FkQLsi6QzVyuBkTecpRobnQMfIKiAO4EYgDmABZKajI0mApQKuqaOnqGJpHmXmDQBIbMbASIAEwAutbAYAA6tLZmQnD2xKXlSpUEcHYWTsSt7VFddj1lFVVOIzVjbUWdPI4zfQNDIwDMyxPCXRu9c4POcAAsYC5hHhhesr6Y-oqBYRT4z+T0iqsis36VVSQXS8gY2Q02l0BmMXEBA1qDTgiAArMQAGx1Vzpe6oR4+D6vQhpYJfMIQSC-KAsdjcf4SSRgABUYH+sQSyTULLZHQBW2BpLBmSYrE4pii4ikPPZcSSKRlfIRgrunlBoTBxPB6spIW+4UUamYPC4tPFDOlrNlnIVVqVAoIamkOoyDCNJrN9L5jMVko58u5dslysd0lVDxQdRAFiAA) ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::Mixed112DotProduct", "symbol": "void vespalib::eval::(anonymous namespace)::my_mixed_112_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Attribute vs index](#attribute-vs-index) - [When to use fast-search for attribute fields](#when-to-use-fast-search-for-attribute-fields) - [Tuning query performance for lexical search](#tuning-query-performance-for-lexical-search) - [Posting Lists](#posting-lists) - [Performance](#performance) - [Hybrid TAAT and DAAT query evaluation](#hybrid-taat-daat) - [Indexing uuids](#indexing-uuids) - [Parent child and search performance](#parent-child-and-search-performance) - [Ranking and ML Model 
inferences](#ranking-and-ml-model-inferences) - [Multi Lookup - Set filtering](#multi-lookup-set-filtering) - [Document summaries - hits](#document-summaries-hits) - [Boolean, numeric, text attribute](#boolean-numeric-text-attribute) - [Tensor ranking](#tensor-ranking) - [Memory](#memory) - [Compute](#compute) - [Multiphase ranking](#multiphase-ranking) - [Cell value types](#cell-value-types) - [Inner/outer products](#Inner-outer-products) - [Mapped lookups](#mapped-lookups) - [Three-way dot product - mapped](#three-way-dot-product-mapped) - [Three-way dot product - mixed](#three-way-dot-product-mixed) --- # Source: https://docs.vespa.ai/ja/features.html.md # Source: https://docs.vespa.ai/en/learn/features.html.md # Features ## What is Vespa? Vespa is a platform for applications which need low-latency computation over large data sets. It allows you to write and persist any amount of data, and execute high volumes of queries over the data which typically complete in tens of milliseconds. Queries can use both structured filters conditions, text and nearest neighbor vector search to select data. All the matching data is then ranked according to ranking functions - typically machine learned - to implement such use cases as search relevance, recommendation, targeting and personalization. All the matching data can also be grouped into groups and subgroups where data is aggregated for each group to implement features like graphs, tag clouds, navigational tools, result diversity and so on. Application specific behavior can be included by adding Java components for processing queries, results and writes to the application package. Vespa is real time. It is architected to maintain constant response times with any data volume by executing queries in parallel over many data shards and cores, and with added query volume by executing queries in parallel over many copies of the same data (groups). It is optimized to return responses in tens of milliseconds. Writes to data becomes visible in a few milliseconds and can be handled at a rate of thousands to tens of thousands per node per second. A lot of work has gone into making Vespa easy to set up and operate. Any Vespa application - from single node systems to systems running on hundreds of nodes in data centers - are fully configured by a single artifact called an _application package_. Low level configuration of nodes, processes and components is done by the system itself based on the desired traits specified in the application package. Vespa is scalable. System sizes up to hundreds of nodes handling tens of billions of documents, and tens of thousands of queries per second are not uncommon, and no harder to set up and modify than single node systems. Since all system components, as well as stored data is redundant and self-correcting, hardware failures are not operational emergencies and can be handled by re-adding capacity when convenient. Vespa is self-repairing and dynamic. When machines are lost or new ones added, data is automatically redistributed over the machines, while continuing serving and accepting writes to the data. Changes to configuration and Java components can be made while serving by deploying a changed application package - no downtime or restarts required. ## Features This section provides an overview of the main features of Vespa. The remainder of the documentation goes into full detail. ### Data and writes - Documents in Vespa may be added, replaced, modified (single fields or any subset) and removed. 
- Writes are acknowledged back to the client issuing them when they are durable and visible in queries, in a few milliseconds. - Writes can be issued at a sustained volume of thousands to tens of thousands per node per second while serving queries. - Data is replicated with a configurable redundancy. - An even data distribution, with the desired redundancy is automatically maintained when nodes are added, removed or lost unexpectedly. - Data corruption is automatically repaired from an uncorrupted replica of the data. - Data is written over a simple HTTP/2 API, or (for high volume) using a small, standalone client. - Document data schemas allow fields of any of the usual primitive types as well as collections, structs and tensors. - Any number of data schemas can be used at the same time. - Documents may reference each other and field from referenced documents may be used in queries without performance penalty. - Write operations can be processed by adding custom Java components. - Data can be streamed out of the system for batch reprocessing. ### Queries - Queries may contain any combination of structured filters, free text and vector search operators. - Queries may contain large tensors and vectors (to represent e.g a user). - Queries choose how results should be ranked and specify how they should be organized (see sections below). - Queries and results may be processed by adding custom Java components - or any HTTP request may be turned into a query by custom request handlers. - Query response times are typically in tens of milliseconds and can be maintained given any load and data size by adding more hardware. - A _streaming search_ mode is available where search/selection is only supported on predefined groups of documents (e.g a user's document). In this mode each node can store and serve billions of documents while maintaining low response times. ### Ranking and inference - All results are ranked using a configured ranking function, selected in the query. - A ranking function may be any mathematical function over scalars or tensors (multidimensional arrays). - Scalar functions include an "if" function to express business logic and decision trees. - Tensor functions include a powerful set of primitives and composite functions which allows expression of advanced machine-learned ranking functions such as e.g. deep neural nets. - Functions can also refer to ONNX models invoked locally on the content nodes. - Multiple ranking phases are supported to allocate more CPU to ranking promising candidates. - A powerful set of text ranking features using positional information from the documents is provided out of the box. - Other ranking features include 2D distance and freshness. ### Organizing data and presenting results - Matches to a query can be grouped and aggregated according to a specification in the query. - All the matches are included, even though they reside on multiple machines executing in parallel. - Matches can be grouped by a unique value or by a numerical bucket. - Any level of groups and subgroups are supported, and multiple parallel groupings can be specified in one query. - Data can be aggregated (counted, averaged etc.) and selected within each group and subgroup. - Any selection of data from documents can be included with the final result returned to the client. - Search engine style keyword highlighting in matching fields is supported. 
## Configuration and operations

- Vespa can be installed using rpm files or a Docker image - on personal laptops, owned datacenters or in AWS.
- A Vespa application is fully specified as a separate buildable artifact: An _application package_ - individual machines or processes need never be configured individually.
- Systems may contain multiple clusters of each type (stateless and stateful), each containing any number of nodes.
- Systems of any size may be specified by two short configuration files in the application package.
- Document schemas, Java components and ranking functions/models are also configured in the application package.
- An application package is deployed as a single unit to Vespa to realize the system desired by the application.
- Most application changes (including Java component changes) can be performed by deploying a changed application package - the system will manage its own change process while serving and handling writes.
- Most document schema changes (excluding field type changes) can be made while the system is live.
- Application package changes are validated on deployment to prevent destructive changes to live systems.
- Vespa has no single point of failure and automatically routes around failing nodes.
- System logs are collected to a central server in real time.
- Selected metrics may be emitted to a third-party metrics/alerting system from all the nodes.

Copyright © 2026 - [Cookie Preferences](#)

### On this page:

- [What is Vespa?](#what-is-vespa)
- [Features](#features)
- [Data and writes](#data-and-writes)
- [Queries](#queries)
- [Ranking and inference](#ranking-and-inference)
- [Organizing data and presenting results](#organizing-data-and-presenting-results)
- [Configuration and operations](#configuration-and-operations)

---

# Source: https://docs.vespa.ai/en/querying/federation.html.md

# Federation

![Federation example](/assets/img/federation-simple.svg)

The Vespa Container allows multiple sources of data to be _federated_ to a common search service. The sources of data may be search clusters belonging to the same application, or external services backed by Vespa or any other kind of service. The container may be used as a pure _federation platform_ by setting up a system consisting solely of container nodes federating to external services. This document gives a short intro to federation, explains how to create an application package doing federation and shows what support is available for choosing the sources given a query, and the final result given the query and some source-specific results. _Federation_ allows users to access data from multiple sources of various kinds through one interface. This is useful to:

- enrich the results returned from an application with auxiliary data, like finding appropriate images to accompany news articles.
- provide more comprehensive results by finding data from alternative sources in the cases where the application has none, like back-filling web results.
- create applications whose main purpose is not to provide access to some data set but to provide users or frontend applications a single starting point to access many kinds of data from various sources. Examples are browse pages created dynamically for any topic by pulling together data from external sources.

The main tasks in creating a federation solution are: 1. creating connectors to the various sources 2. selecting the data sources which will receive a given query 3.
rewriting the received request to an executable query returning the desired data from each source 4. creating the final result by selecting from, organizing and combining the returned data from each selected source The container aids with these tasks by providing a way to organize a federated execution as a set of search chains which can be configured through the application package. Read the [Container intro](../applications/containers.html) and[Chained components](../applications/chaining.html) before proceeding. Refer to the `com.yahoo.search.federation`[Javadoc](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/federation/package-summary.html). ## Configuring Providers A _provider_ is a search chain that produces data (in the form of a Result) from a data source. The provider must contain a Searcher which connects to the data source and produces a Result from the returned data. Configure a provider as follows: ``` ``` You can add multiple searchers in the provider just like in other chains. Search chains that provide data from some content cluster in the same application are also _providers_. To explicitly configure a provider talking to internal content clusters, set the attribute type="local" on the provider. That will automatically add the searchers necessary to talk to internal content clusters to the search chain. Example: querying this provider will not lowercase / stem terms: ``` ``` ## Configuring Sources A single provider may be used to produce multiple kinds of results. To implement and present each kind of result, we can use _sources_. A _source_ is a search chain that provides a specific kind of result by extending or modifying the behavior of one or more providers. Suppose that we want to retrieve two kinds of results from my-provider: Web results and java API documentation: ``` ``` This results in two _source search chains_ being created,`web@my-provider` and `java-api@my-provider`. Each of them constitutes a source, namely `web` and `java-api` respectively. As the example suggests, these search chains are named after the source and the enclosing provider. The @-sign in the name should be read as _in_, so `web@my-provider` should for example be read as _web in my-provider_. The JavaApiSearcher is responsible for modifying the query so that we only get hits from the java API documentation. We added this searcher directly inside the source element; source search chains and providers are both instances of search chains. All the options for configuring regular search chains are therefore also available for them. How does the `web@my-provider`and `java-api@my-provider` source search chains use the`my-provider` provider to send queries to the external service? Internally, the source search chains _inherit_ from the enclosing provider. Since the provider contains searchers that know how to talk to the external service, the sources will also contain the same searchers. As an example, consider the "web" search chain; It will contain exactly the same searcher instances as the`my-provider` search chain. By organizing chains for talking to data providers, we can reuse the same connections and logic for talking to remote services ("providers") for multiple purposes ("sources"). The provider search chain `my-provider` is _not modified_ by adding sources. To verify this, try to send queries to the three search chains`my-provider`, `web@my-provider` and `java-api@my-provider`. 
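To recap the configuration discussed in this section, a rough sketch of a provider with `web` and `java-api` sources, declared inside the container's `search` element, could look like the following. The bundle name and searcher class names other than `JavaApiSearcher` are placeholders, not values from this guide:

```
<search>
    <provider id="my-provider">
        <searcher id="com.yahoo.example.MyProviderSearcher" bundle="my-bundle" />
        <source id="web" />
        <source id="java-api">
            <searcher id="com.yahoo.example.JavaApiSearcher" bundle="my-bundle" />
        </source>
    </provider>
</search>
```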
### Multiple Providers per Source

You can create a source that consists of source search chains from several providers. Effectively, this lets you vary which provider should be used to satisfy each request to the source: ``` ``` Here, the two source search chains `common-search@news-search` and `common-search@my-provider` constitute a single source `common-search`. The source search chains using the `idref` attribute are called participants, while the ones using the `id` attribute are called leaders. Each source must consist of a single leader and zero or more participants. By default, only the leader search chain is used when _federating_ to a source. To use one of the participants instead, use [sources](../reference/api/query.html#model.sources) and _source_:

```
http://[host]:[port]/?sources=common-search&source.common-search.provider=news-search
```

## Federation

Now we can search both the web and the java API documentation at the same time, and get a combined result set back. We achieve this by setting up a _federation_ searcher: ``` ``` Inside the Federation element, we list the sources we want to use. Do not let the name _source_ fool you; if it behaves like a source, then you can use it as a source (i.e. all types of search chains including providers are accepted). As an example, try replacing the _web_ reference with _my-provider_. When searching, select a subset of the sources specified in the federation element by specifying the [sources](../reference/api/query.html#model.sources) query parameter.

## Built-in Federation

The built-in search chains _native_ and _vespa_ contain a federation searcher named _federation_. This searcher has been configured to federate to:

- All sources
- All providers that do not contain a source

If configuring your own federation searcher, you are not limited to a subset of these sources - you can use any provider, source or search chain.

## Inheriting default Sources

To get the same sources as the built-in federation searcher, inherit the default source set: ``` ... ```

## Changing content cluster chains

With the information above, we can create a configuration where we modify the search chain sending queries to and receiving queries from a single content cluster (here, removing a searcher and adding another): ``` ```

## Timeout behavior

What if we want to limit how much time a provider is allowed to use to answer a query? ``` ``` The provider search chain will then be limited to use 100 ms to execute each query. The Federation layer allows all providers to continue until the non-optional provider with the longest timeout is finished or canceled. In some cases it is useful to be able to keep executing the request to a provider longer than we are willing to wait for it in that particular query. This allows us to populate caches inside sources which can only meet the timeout after caches are populated. To use this option, specify a [request timeout](../reference/applications/services/search.html#federationoptions) for the provider: ``` ... ``` Also see [Searcher timeouts](../applications/searchers.html#timeouts).

## Non-essential Providers

Now let us add a provider that retrieves ads: ``` ``` Suppose that it is more important to return the result to the user as fast as possible than to retrieve ads. To signal this, we mark the ads provider as _optional_: ``` ``` The Federation searcher will then only wait for ads as long as it waits for mandatory providers. If the ads are available in time, they are used, otherwise they are dropped.
If only optional providers are selected for Federation, they will all be treated as mandatory. Otherwise, they would not get a chance to return any results.

## Federation options inheritance

The sources automatically use the same federation options as the enclosing provider. Override one or more of the federation options in the sources: ``` ``` You can use a single source in different Federation searchers. If you send queries with different cost to the same source from different federation searchers, you might also want to _override_ the federation options for when they are used: ``` ```

## Selecting Search Chains programmatically

If we have complicated rules for when a search chain should be used, we can select search chains programmatically instead of setting up sources under federation in services.xml. The selection code is implemented as a [TargetSelector](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/federation/selection/TargetSelector.html). This TargetSelector is used by registering it on a federation searcher.

```
package com.yahoo.example;

import com.google.common.base.Preconditions;
import com.yahoo.component.chain.Chain;
import com.yahoo.processing.execution.chain.ChainRegistry;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.result.Hit;
import com.yahoo.search.Searcher;
import com.yahoo.search.federation.selection.FederationTarget;
import com.yahoo.search.federation.selection.TargetSelector;
import com.yahoo.search.searchchain.model.federation.FederationOptions;
import java.util.Arrays;
import java.util.Collection;

class MyTargetSelector implements TargetSelector<Object> {

    @Override
    public Collection<FederationTarget<Object>> getTargets(Query query, ChainRegistry<Searcher> searcherChainRegistry) {
        Chain<Searcher> searchChain = searcherChainRegistry.getComponent("my-chain");
        Preconditions.checkNotNull(searchChain, "No search chain named 'my-chain' exists in services.xml");
        return Arrays.asList(new FederationTarget<>(searchChain, new FederationOptions(), null));
    }

    @Override
    public void modifyTargetQuery(FederationTarget<Object> target, Query query) {
        query.setHits(10);
    }

    @Override
    public void modifyTargetResult(FederationTarget<Object> target, Result result) {
        for (Hit hit : result.hits()) {
            hit.setField("my-field", "hello-world");
        }
    }
}
```

The target selector chooses search chains for the federation searcher. In this example, MyTargetSelector.getTargets returns a single chain named "my-chain" that has been set up in `services.xml`. Before executing each search chain, the federation searcher allows the target selector to modify the query by calling modifyTargetQuery. In the example, the number of hits to retrieve is set to 10. After the search chain has been executed, the federation searcher allows the target selector to modify the result by calling modifyTargetResult. In the example, each hit gets a field called "my-field" with the value "hello-world". Configure a federation searcher to use a target selector in `services.xml`. Only a single target selector is supported. ``` ``` We can also set up both a target-selector and normal sources. The federation searcher will then send queries both to programmatically selected sources and those that would normally be selected without the target selector: ``` ...
## Example: Setting up a Federated Service

A federation application is created by providing custom searcher components performing the basic federation tasks and combining these into chains in a federation setup in [services.xml](../reference/applications/services/services.html). For example, this is a complete configuration which sets up a cluster of container nodes (having 1 node) which federates to another Vespa service (news) and to some web service:

```
```

This creates a configuration of search chains like:

![Federation example](/assets/img/federation.svg)

Each provider _is_ a search chain ending in a Searcher forwarding the query to a remote service. In addition, there is a main chain (included by default) ending in a FederationSearcher, which by default forwards the query to all the providers in parallel. The provider chains return their results upwards to the federation searcher, which merges them into a complete result that is returned up the main chain.

This services file, an implementation of the `example` classes (see below), and _[hosts.xml](../reference/applications/hosts.html)_ listing the container nodes, are all that is needed to set up and [deploy](../basics/applications.html#deploying-applications) an application federating to multiple sources. For a reference to these XML sections, see the [chains reference](../reference/applications/services/search.html#chain).

The following sections outline how this can be elaborated into a solution producing more user-friendly federated results.

### Selecting Sources

To do the best possible job of bringing relevant data to the user, we should send every query to all sources and decide what data to include when all the results are available and we have as much information as possible at hand. In general this is not advisable because of the resource cost involved, so we must select a subset based on information in the query. This is best viewed as a probabilistic optimization problem: the selected sources should be the ones having a high enough probability of being useful to offset the cost of querying them.

Any Searcher which is involved in selecting sources or processing the entire result should be added to the main search chain, which was created implicitly in the examples above. To do this, the main chain should be created explicitly:

```
```

This adds an explicit main chain to the configuration with two additional searchers beyond those inherited from the `native` chain, which includes the FederationSearcher. Note that if the full Vespa functionality is needed, the `vespa` chain should be inherited rather than `native`. The chain called `default` will be invoked if no `searchChain` parameter is given in the query.

To learn more about creating Searcher components, see [searcher development](../applications/searchers.html).

### Rewriting Queries to Individual Providers

The _provider_ searchers are responsible for accepting a Query object, translating it to a suitable request to the backend in question, and deserializing the response into a Result object.
There is often a need to modify the query to match the particulars of a provider before passing it on:

- To get results from the provider that match the determined interpretation and intent as well as possible, the query may need to be rewritten using detailed information about the provider
- Parameters beyond the basic ones supported by each provider searcher may need to be translated to the provider
- There may be a need for provider-specific business rules

These query changes may range in complexity from setting a query parameter, through applying some source-specific information to the query, to transferring all the relevant query state into a new object representation which is consumed by the provider searcher. This example shows a searcher adding a customer id to the `news` request:

```
package com.yahoo.example;

import com.yahoo.search.searchchain.Execution;
import com.yahoo.search.*;

public class NewsCustomerIdSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        String customerId = "provider.news.custid";
        if (query.properties().get(customerId) == null)
            query.properties().set(customerId, "yahoo/test");
        if (query.getTraceLevel() >= 3)
            query.trace("News provider: Will use " + customerId + "=" + query.properties().get(customerId), false, 3);
        return execution.search(query);
    }

}
```

This searcher should be added to the `news` source chain as shown above.

You may have noticed that we have referred to the search chains talking to a service as a **provider** while referring to selection of **sources**. The reason for making this distinction is that it is sometimes useful to treat different kinds of processing of queries and results to/from the same service as different sources. Hence, it is possible to create `source` search chains in addition to the provider chains in _services.xml_. Each such source will refer to a provider (by inheriting the provider chain) but include some searchers specific to that source. Selection and routing of the query from the federation searcher is always to sources, not providers. By default, if no source tags are added in the provider, each provider implicitly creates a source by the same name.

### Processing Results

When we have selected the sources, created queries fitted to get results from each source and executed those queries, we have produced a result which contains a HitGroup per source holding the list of hits from that source. These results may be returned as is in XML, preserving the structure, by requesting the [page](../reference/querying/page-result-format.html) result format:

```
http://[host]:[port]/search/?query=test&presentation.format=page
```

However, this is not suitable for presenting to the user in most cases. What we want to do is select the subset of the hits having the highest probable utility to the user, organized in a way that maximizes the user experience. This is not an easy task, and we will not attempt to solve it here, other than noting that any solution should make use of both the information in the intent model and the information within the results from each source, and that this is a highly connected optimization problem because the utility of including some data in the result depends on what other data is included.
Here we will just use a searcher which shows how this is done in principle: this searcher flattens the news and web service hit groups into a single list of hits, where only the highest ranked news ones are included:

```
package com.yahoo.example;

import com.yahoo.search.*;
import com.yahoo.search.result.*;
import com.yahoo.search.searchchain.Execution;

public class ResultBlender extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        HitGroup news = (HitGroup)result.hits().remove("source:news");
        HitGroup webService = (HitGroup)result.hits().remove("source:webService");
        if (webService == null) return result;
        result.hits().addAll(webService.asList());
        if (news == null) return result;
        for (Hit hit : news.asList())
            if (shouldIncludeNewsHit(hit))
                result.hits().add(hit);
        return result;
    }

    private boolean shouldIncludeNewsHit(Hit hit) {
        if (hit.isMeta()) return true;
        if (hit.getRelevance().getScore() > 0.7) return true;
        return false;
    }

}
```

The optimal result to return to the user is not necessarily one flattened list. In some cases it may be better to keep the source organization, or to pick some other organization. The [page result format](../reference/querying/page-result-format.html) requested in the query above is able to represent any hierarchical organization as XML. A more realistic version of this searcher will use that to choose between some predefined layouts which the frontend in question knows how to handle, and choose some way of grouping the available hits suitable for the selected layout.

This searcher should be added to the main (`default`) search chain in _services.xml_ together with the SourceSelector (the order does not matter).

### Unit Testing the Result Processor

Unit test example for the Searcher above:

```
package com.yahoo.search.example.test;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

import com.yahoo.search.searchchain.*;
import com.yahoo.search.example.ResultBlender;
import com.yahoo.search.*;
import com.yahoo.search.result.*;

public class ResultBlenderTestCase {

    @Test
    public void testBlending() {
        Chain<Searcher> chain = new Chain<>(new ResultBlender(), new MockBackend());
        Execution.Context context = Execution.Context.createContextStub(null);
        Result result = new Execution(chain, context).search(new Query("?query=test"));
        assertEquals(4, result.hits().size());
        assertEquals("webService:1", result.hits().get(0).getId().toString());
        assertEquals("news:1", result.hits().get(1).getId().toString());
        assertEquals("webService:2", result.hits().get(2).getId().toString());
        assertEquals("webService:3", result.hits().get(3).getId().toString());
    }

    private static class MockBackend extends Searcher {

        @Override
        public Result search(Query query, Execution execution) {
            Result result = new Result(query);
            HitGroup webService = new HitGroup("source:webService");
            webService.add(new Hit("webService:1", 0.9));
            webService.add(new Hit("webService:2", 0.7));
            webService.add(new Hit("webService:3", 0.5));
            result.hits().add(webService);
            HitGroup news = new HitGroup("source:news");
            news.add(new Hit("news:1", 0.8));
            news.add(new Hit("news:2", 0.6));
            news.add(new Hit("news:3", 0.4));
            result.hits().add(news);
            return result;
        }

    }
}
```

This shows how a search chain can be created programmatically, with a mock backend producing results suitable for exercising the functionality of the searcher being tested.
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Configuring Providers](#configuring-providers) - [Configuring Sources](#configuring-sources) - [Multiple Providers per Source](#multiple-providers-per-source) - [Federation](#federation) - [Built-in Federation](#built-in-federation) - [Inheriting default Sources](#inheriting-default-sources) - [Changing content cluster chains](#changing-content-cluster-chains) - [Timeout behavior](#timeout-behavior) - [Non-essential Providers](#non-essential-providers) - [Federation options inheritance](#federation-options-inheritance) - [Selecting Search Chains programmatically](#selecting-search-chains-programmatically) - [Example: Setting up a Federated Service](#setting-up-a-federated-service) - [Selecting Sources](#selecting-sources) - [Rewriting Queries to Individual Providers](#rewriting-queries-to-individual-providers) - [Processing Results](#processing-results) - [Unit Testing the Result Processor](#unit-testing-the-result-processor) --- # Source: https://docs.vespa.ai/en/writing/feed-block.html.md # Feed block A content cluster blocks external write operations when at least one content node has reached the [resource limit](../reference/applications/services/content.html#resource-limits) of disk or memory. This is done to avoid saturating resource usage on content nodes. The _Cluster controller_ monitors the resource usage of the content nodes and decides whether to block feeding. Transient resource usage (see details in the metrics below) is not included in the monitored usage. This ensures that transient resource usage is covered by the resource headroom on the content nodes, instead of leading to feed blocked due to natural fluctuations. **Note:** When running Vespa in a Docker image on a laptop, one can easily get `[UNKNOWN(251009) @ tcp/vespa-host:19112/default]: ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: in content cluster 'example': disk on node 0 [vespa-host] is 76.7% full (the configured limit is 75.0%, effective limit lowered to 74.0% until feed unblocked)`. Fix this by increasing allocated storage for the Docker daemon, clean up unused volumes or remove unused Docker images. HTTP clients will see _507 Server Error: Insufficient Storage_ when this happens. When feed is blocked, write operations are rejected by _Distributors_. All Put operations and most Update operations are rejected. These operations are still allowed: - Remove operations - Update [assign](../reference/schemas/document-json-format.html#assign) operations to numeric single-value fields To remedy, add nodes to the content cluster. The data will [auto-redistribute](../content/elasticity.html), and feeding is unblocked when all content nodes are below the limits. For self-managed Vespa you can configure [resource-limits](../reference/applications/services/content.html#resource-limits), although this is not recommended. Increasing them too much might lead to OOM and content nodes being unable to start. **Important:** Always **add** nodes, do not change node capacity - this is in practise safer and quicker. As most Vespa applications are set up on homogeneous nodes, changing node capacity can cause a full node set swap and more data copying than just adding more nodes of the same kind. Copying data will in itself stress nodes, adding one node is normally the smallest and safest change. 
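If you nevertheless choose to raise the limits on a self-managed system, here is a minimal sketch, assuming resource-limits is configured under the content cluster's tuning element as described in the [resource-limits reference](../reference/applications/services/content.html#resource-limits) (the values shown are illustrative only):

```
<content id="example" version="1.0">
    <tuning>
        <resource-limits>
            <!-- Fractions of total disk/memory on a content node; illustrative values -->
            <disk>0.80</disk>
            <memory>0.85</memory>
        </resource-limits>
    </tuning>
    <!-- documents, nodes, ... -->
</content>
```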
These [metrics](../operations/metrics.html) are used to monitor resource usage and whether feeding is blocked: | cluster-controller.resource\_usage.nodes\_above\_limit | The number of content nodes that are above one or more resource limits. When above 0, feeding is blocked. | | content.proton.resource\_usage.disk | A number between 0 and 1, indicating how much disk (of total available) is used on the content node. Transient disk used during [disk index fusion](../content/proton.html#disk-index-fusion) is not included. | | content.proton.resource\_usage.memory | A number between 0 and 1, indicating how much memory (of total available) is used on the content node. Transient memory used by [memory indexes](../content/proton.html#memory-index-flush) is not included. | When feeding is blocked, error messages are returned in write operation replies - example: ``` ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: in content cluster 'example': memory on node 0 [my-vespa-node-0.example.com] is 82.0% full (the configured limit is 80.0%, effective limit lowered to 79.0% until feed unblocked)) ``` Note that when feeding is blocked resource usage needs to decrease below another, lower limit before getting unblocked. This is to avoid flip-flopping between blocking and unblocking feed when being near the limit. This lower limit is 1% lower than the configured limit. The address space used by data structures in attributes (_Multivalue Mapping_, _Enum Store_, and _Tensor Store_) can also go full and block feeding - see [attribute data structures](../content/attributes.html#data-structures) for details. This will rarely happen. The following metric is used to monitor address space usage: | content.proton.documentdb.attribute.resource\_usage.address\_space.max | A number between 0 and 1, indicating how much address space is used by the worst attribute data structure on the content node. | An error is returned when the address space limit (default value is 0.90) is exceeded: ``` ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: in content cluster 'example': attribute-address-space:example.ready.a1.enum-store on node 0 [my-vespa-node-0.example.com] is 91.0% full (the configured limit is 90.0%)) ``` To remedy, add nodes to the content cluster to distribute documents with attributes over more nodes. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/self-managed/files-processes-and-ports.html.md # Files, Processes, Ports, Environment This is a reference of directories used in a Vespa installation, processes that run on the Vespa nodes and ports / environment variables used. Also see [log files](../../reference/operations/log-files.html). 
## Directories | Directory | Description | | --- | --- | | $VESPA\_HOME/bin/ | Command line utilities and scripts | | $VESPA\_HOME/libexec/vespa/ | Command line utilities and scripts | | $VESPA\_HOME/sbin/ | Server programs, daemons, etc | | $VESPA\_HOME/lib64/ | Dynamically linked libraries, typically third-party libraries | | $VESPA\_HOME/lib/jars/ | Java archives | | $VESPA\_HOME/logs/vespa/ | Log files | | $VESPA\_HOME/var/db/vespa/config\_server/serverdb/ | Config server database and user applications | | $VESPA\_HOME/share/vespa/ | A directory with config definitions and XML schemas for application package validation | | $VESPA\_HOME/conf/vespa | Various config files used by Vespa or libraries Vespa depend on | ## Processes and ports The following is an overview of which ports and port ranges are used by the different services in a Vespa system. Note that for services capable of running multiple instances on the same node, all instances will run within the listed port range. Processes are run as user `vespa`. Many services are allocated ports dynamically. So even though the allocation is deterministic, i.e. the same system will get the same ports on subsequent startups, a particular service instance may get different ports when the overall system setup is changed through [services.xml](../../reference/applications/services/services.html). Use [vespa-model-inspect](../../reference/operations/self-managed/tools.html#vespa-model-inspect) to see port allocations. - The number of ports used in a range depends on number of instances that are running - Not all ports within a range are used, but they are assigned each service to support future extensions - The range from 19100 is used for internal communication ports, i.e. ports that are not necessary to use from an external API - See [Configuring Http Servers and Filters](../../applications/http-servers-and-filters.html) for how to configure Container ports and [services.xml](../../reference/applications/services/services.html) for how to configure other ports | Process | Host | Port/range | ps | Function | | --- | --- | --- | --- | --- | | [Config server](configuration-server.html) | Config server nodes | 19070-19071 | java (...) -jar $VESPA\_HOME/lib/jars/standalone-container-jar-with-dependencies.jar | Vespa Configuration server | | 2181-2183 | | Embedded Zookeeper cluster ports, see [zookeeper-server.def](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/zookeeper-server.def) | | [Config sentinel](config-sentinel.html) | All nodes | 19098 | $VESPA\_HOME/sbin/vespa-config-sentinel | Sentinel that starts and stops vespa services and makes sure they are running unless they are manually stopped | | [Config proxy](config-proxy.html) | All nodes | 19090 | java (…) com.yahoo.vespa.config.proxy.ProxyServer | Communication liaison between Vespa processes and config server. Caches config in memory | | [Slobrok](slobrok.html) | Admin nodes | 19099 for RPC port, HTTP port dynamically allocated in the 19100-19899 range | $VESPA\_HOME/sbin/vespa-slobrok | Service location object broker | | [logd](../../reference/operations/log-files.html#logd) | All nodes | 19089 | $VESPA\_HOME/sbin/vespa-logd | Reads local log files and sends them to log server | | [Log server](../../reference/operations/log-files.html#log-server) | Log server node | 19080 | java (...) -jar lib/jars/logserver-jar-with-dependencies.jar | Vespa Log server | | [Metrics proxy](monitoring.html#metrics-proxy) | All nodes | 19092-19095 | java (...) 
-jar $VESPA\_HOME/lib/jars/container-disc-with-dependencies.jar | Provides a single access point for metrics from all services on a Vespa node | | [Distributor](../../content/content-nodes.html#distributor) | Content cluster | dynamically allocated in the 19100-19899 range | $VESPA\_HOME/sbin/vespa-distributord-bin | Content layer distributor processes | | [Cluster controller](../../content/content-nodes.html#cluster-controller) | Content cluster | 19050, plus ports dynamically allocated in the 19100-19899 range | java (...) -jar $VESPA\_HOME/lib/jars/container-disc-jar-with-dependencies.jar | Cluster controller processes, manages state for content nodes | | [proton](../../content/proton.html) | Content cluster | dynamically allocated in the 19100-19899 range | $VESPA\_HOME/sbin/vespa-proton-bin | Searchnode process, receives queries from the container and returns results from the indexes. Also receives feed and indexes documents | | [container](../../applications/containers.html) | Container cluster | 8080 | java (...) -jar $VESPA\_HOME/lib/jars/container-disc-with-dependencies.jar | Container running servers, handlers and processing components | ## System limits The [startup scripts](admin-procedures.html#vespa-start-stop-restart) checks that system limits are set, failing startup if not. Refer to [vespa-configserver.service](https://github.com/vespa-engine/vespa/blob/master/vespabase/src/vespa-configserver.service.in) and [vespa.service](https://github.com/vespa-engine/vespa/blob/master/vespabase/src/vespa.service.in) for minimum values. ## Core dumps Example settings: ``` $ mkdir -p /tmp/cores && chmod a+rwx /tmp/cores $ echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern ``` This will write files like _/tmp/cores/core.vespa-proton-bi.1721.localhost.1580387387_. ## Environment variables Vespa configuration is set in [application packages](../../basics/applications.html). Some configuration is used to bootstrap nodes - this is set in environment variables. Environment variables are only read at startup. _$VESPA\_HOME/conf/vespa/default-env.txt_ is read in Vespa start scripts - use this to modify variables ([example](multinode-systems.html#aws-ec2)). Each line has the format `action variablename value` where the items are: | Item | Description | | --- | --- | | action | One of `fallback`, `override`, or `unset`. `fallback` sets the variable if it is unset (or empty). `override` set the value regardless. `unset` unsets the variable. | | variablename | The name of the variable, e.g. `VESPA_CONFIGSERVERS` | | value | The rest of the line is the variable's value. | Refer to the [template](https://github.com/vespa-engine/vespa/blob/master/vespabase/conf/default-env.txt.in) for format. | Environment variable | Description | | --- | --- | | VESPA\_CONFIGSERVERS | A comma-separated list of hosts to run configservers, use fully qualified hostnames. Should always be set to the same value on all hosts in a multi-host setup. If not set, `localhost` is assumed. Refer to [configuration server operations](configuration-server.html). | | VESPA\_HOSTNAME | Vespa uses `hostname` for node identity. But sometimes this doesn't work properly, either because that name can't be used to find an IP address which works for connecting to services running on the node, or it's just that the name doesn't agree with what the config server thinks the node's host name is. In this case, override by setting the `VESPA_HOSTNAME`, to be used instead of running the `hostname` command. 
Note that `VESPA_HOSTNAME` will be used _both_ when a node identifies itself to the config server _and_ when a service on that node registers a network connection point that other services can connect to. An error message with "hostname detection failed" is emitted if the `VESPA_HOSTNAME` isn't set and the hostname isn't usable. If `VESPA_HOSTNAME` is set to something that cannot work, an error with "hostname validation failed" is emitted instead. | | VESPA\_CONFIG\_SOURCES | Used by libraries like the [Document API](../../writing/document-api-guide.html) to set config server endpoints. Refer to [configuration server operations](configuration-server.html#configuration) for example use. | | VESPA\_WEB\_SERVICE\_PORT | The port number where REST apis will run, default `8080`. This isn't strictly needed, as the port number can be set for each HTTP server in `services.xml`, but with a big application it can be easier to set the default port number just once. Also note that this needs to be set when starting the _configserver_, since the REST api implementation gets its port number from there. | | VESPA\_TLS\_CONFIG\_FILE | Absolute path to [TLS configuration file](../../security/mtls). | | VESPA\_CONFIGSERVER\_JVMARGS | JVM arguments for the config server - see [tuning](../../performance/container-tuning.html#config-server-and-config-proxy). | | VESPA\_CONFIGPROXY\_JVMARGS | JVM arguments for the config proxy - see [tuning](../../performance/container-tuning.html#config-server-and-config-proxy). | | VESPA\_LOG\_LEVEL | Tuning of log output from tools, see [controlling log levels](../../reference/operations/log-files.html#controlling-log-levels). | Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Directories](#directories) - [Processes and ports](#processes-and-ports) - [System limits](#vespa-system-limits) - [Core dumps](#core-dumps) - [Environment variables](#environment-variables) --- # Source: https://docs.vespa.ai/en/operations/enclave/gcp-architecture.html.md # Architecture for Vespa Cloud Enclave in GCP ### Architecture Each Vespa Cloud Enclave in the tenant GCP project corresponds to a Vespa Cloud[zone](../zones.html). Inside the tenant GCP project one enclave is contained within one single [VPC](https://cloud.google.com/vpc/). ![Enclave architecture](/assets/img/vespa-cloud-enclave-gcp.png) #### Compute Instances, Load Balancers, and Cloud Storage buckets Configuration Servers inside the Vespa Cloud zone makes the decision to create or destroy compute instances ("Vespa Hosts" in diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Network Load Balancers needed to communicate with the deployed Vespa application. Each Vespa Host will periodically sync its logs to a Cloud Storage bucket ("Log Archive"). This bucket is "local" to the enclave and provisioned by the Terraform module inside the tenant's GCP project. #### Networking The enclave VPC is very network restricted. Vespa Hosts do not have public IPv4 addresses and there is no[NAT gateway](https://cloud.google.com/nat/docs/overview) available in the VPC. Vespa Hosts have public IPv6 addresses and are able to make outbound connections. Inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers, and to report operational metrics back to Vespa Cloud. When a Vespa Host is booted it will set up an encrypted tunnel back to the Configuration Servers. 
All communication between Configuration Servers and the Vespa Hosts will be run over this tunnel after it is set up. ### Security The Vespa Cloud operations team does _not_ have any direct access to the resources that is part of the customer account. The only possible access is through the management APIs needed to run Vespa itself. In case it is needed for, e.g. incident debugging, direct access can only be granted to the Vespa team by the tenant itself. Enabling direct access is done by setting the`enable_ssh` input to true in the enclave module. For further details, see the documentation for the[enclave module inputs](https://registry.terraform.io/modules/vespa-cloud/enclave/google/latest/?tab=inputs). All communication between the enclave and the Vespa Cloud configuration servers is encrypted, authenticated and authorized using[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the[Athenz](https://www.athenz.io/) service. All data stored is encrypted at rest using[Cloud Key Management](https://cloud.google.com/security-key-management). All keys are managed by the tenant in the tenant's GCP project. The resources provisioned in the tenant GCP project are either provisioned by the Terraform module executed by the tenant, or by the orchestration services inside a Vespa Cloud zone. Resources are provisioned by the Vespa Cloud configuration servers, using the[`vespa_cloud_provisioner_role`](https://github.com/vespa-cloud/terraform-google-enclave/blob/main/main.tf)IAM role defined in the Terraform module. The tenant that registered the GCP project is the only tenant that can deploy applications targeting the enclave. For more general information about security in Vespa Cloud, see the[whitepaper](../../security/whitepaper). Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/operations/enclave/gcp-getting-started.html.md # Getting started with Vespa Cloud Enclave in GCP Setting up Vespa Cloud Enclave requires: 1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use a pre-existing tenant. 2. Registration of the GCP project in Vespa Cloud. 3. Running a [Terraform](https://www.terraform.io/) configuration to provision necessary GCP resources in the project. 4. Deployment of a Vespa application. ### 1. Vespa Cloud Tenant setup Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/). ### 2. Onboarding Contact [support@vespa.ai](mailto:support@vespa.ai) stating which tenant should be on-boarded to use Vespa Cloud Enclave. Also include the [GCP Project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects)to associate with the tenant. **Note:** We recommend using a _dedicated_ project for your Vespa Cloud Enclave. Resources in this project will be fully managed by Vespa Cloud. One project can host all your Vespa applications, there is no need for multiple tenants or projects. ### 3. Configure GCP Project The same project used in step two must be prepared for deploying Vespa applications. Use [Terraform](https://www.terraform.io/) to set up the necessary resources using the[modules](https://registry.terraform.io/modules/vespa-cloud/enclave/google/latest)published by the Vespa team. 
Modify the[multi-region example](https://github.com/vespa-cloud/terraform-google-enclave/blob/main/examples/multi-region/main.tf)for your deployment. If you are unfamiliar with Terraform: It is a tool to manage resources and their configuration in various cloud providers, like AWS and GCP. Terraform has published a[GCP](https://developer.hashicorp.com/terraform/tutorials/gcp-get-started)tutorial, and we strongly encourage enclave users to read and follow the Terraform recommendations for[CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform). The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. In order for your enclave applications to use the new features you must re-apply your terraform templates with the latest release. The [notification system](../notifications.html)will let you know when a new release is available. ### 4. Deploy a Vespa application By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your enclave account, update [deployment.xml](../../reference/applications/deployment.html) to reference the account used in step 1: ``` ``` Useful resources are [getting started](../../basics/deploy-an-application.html)and [migrating to Vespa Cloud](../../learn/migrating-to-cloud) - put _deployment.xml_ next to _services.xml_. ## Next steps After a successful deployment to the [dev](../environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles. For production serving, deploy to the [prod](../environments.html#prod) environment - follow the steps in [production deployment](../production-deployment.html). ## Enclave teardown To tear down a Vespa Cloud Enclave system, do the steps above in reverse order: 1. [Undeploy the application(s)](../deleting-applications.html) 2. Undeploy the Terraform changes It is important to undeploy the Vespa application(s) first. After running the Terraform, Vespa Cloud cannot manage the resources allocated, so you must clean up these yourself. ## Troubleshooting **Identities restricted by domain**: If your GCP organization is using[domain restriction for identities](https://cloud.google.com/resource-manager/docs/organization-policy/restricting-domains)you will need to permit Vespa.ai GCP identities to be added to your project. For Vespa Cloud the organization ID to allow identities from is: _1056130768533_, and the Google Customer ID is _C00u32w3e_. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [1. Vespa Cloud Tenant setup](#1-vespa-cloud-tenant-setup) - [2. Onboarding](#2-onboarding) - [3. Configure GCP Project](#3-configure-gcp-project) - [4. Deploy a Vespa application](#4-deploy-a-vespa-application) - [Next steps](#next-steps) - [Enclave teardown](#enclave-teardown) - [Troubleshooting](#troubleshooting) --- # Source: https://docs.vespa.ai/en/querying/geo-search.html.md # Geo Search To model a geographical position in documents, use a field where the type is [position](../reference/schemas/schemas.html#position) for a single, required position. To allow any number of positions (including none at all) use `array` instead. This can be used to limit hits (only those documents with a position inside a circular area will be hits), the distance from a point can be used as input to ranking functions, or both. 
A geographical point in Vespa is specified using the geographical [latitude](https://en.wikipedia.org/wiki/Latitude) and [longitude](https://en.wikipedia.org/wiki/Longitude). As an example, a location in [Sunnyvale, California](https://www.google.com/maps/place/721+1st+Ave,+Sunnyvale,+CA+94089/@37.4181488,-122.0256157,12z) could be latitude 37.4181488 degrees North, longitude 122.0256157 degrees West. This would be represented as `{ "lat": 37.4181488, "lng": -122.0256157 }` in JSON.

As seen above, positive numbers are used for north (latitudes) and east (longitudes); negative numbers are used for south and west. This is the usual convention.

**Note:** Old formats for position (those used in Vespa 5, 6, and 7) are still accepted as feed input; enabling legacy output is temporarily possible also. See [legacy flag v7-geo-positions](../reference/querying/default-result-format.html#geo-position-rendering).

## Sample schema and document

A sample schema could be a business directory, where every business has a position (for its main office or contact point):

```
schema biz {
    document biz {
        field title type string {
            indexing: index
        }
        field mainloc type position {
            indexing: attribute | summary
        }
    }
    fieldset default {
        fields: title
    }
}
```

Using this schema, one possible business entry with its location is:

```
{
    "put": "id:mynamespace:biz::business-1",
    "fields": {
        "title": "Yahoo Inc (main office)",
        "mainloc": { "lat": 37.4181488, "lng": -122.0256157 }
    }
}
```

## Restrict

The API for adding a geographical restriction is to use a [geoLocation](../reference/querying/yql.html#geolocation) clause in the YQL statement, specifying a point and a maximum distance from that point:

```
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where title contains \"office\" and geoLocation(mainloc, 37.416383, -122.024683, \"20 miles\")"}' \
    http://localhost:8080/search/
```

One can also build or modify the query programmatically by adding a [GeoLocationItem](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/GeoLocationItem.html) anywhere in the query tree.

To use a position for ranking only (without _any_ requirement for a matching position), specify it as a ranking-only term. Use the [rank()](../reference/querying/yql.html#rank) operation in YQL for this, or a [RankItem](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/RankItem.html) when building the query programmatically. At the _same time_, specify a negative radius (for example `-1 m`). This matches any position, and computes distance etc. for the closest position in the document. Example:

```
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where rank(title contains \"office\", geoLocation(mainloc, 37.416383, -122.024683, \"-1 m\"))"}' \
    http://localhost:8080/search/
```

## Ranking from a position match

The main rank feature to use for the example above would be [distance(mainloc).km](../reference/ranking/rank-features.html#distance(name).km), doing further calculation on it and giving better rank to documents that are closer to the wanted (query) position. Here one needs to take into consideration what sort of distances are practical; traveling on foot, by car, or by plane should have quite different ranking scales - using different rank profiles would be one natural way to support that. If the query specifies a maximum distance, that could be sent as an input to ranking as well, and used for scaling.
There is also a [closeness(mainloc)](../reference/ranking/rank-features.html#closeness(name)) feature which goes from 1.0 at the exact location to 0.0 at a tunable maximum distance, which is enough for many needs.

### Useful summary-features

To do further processing, it may be useful to get the computed distance back. The preferred way to do this is to use the associated rank features as [summary-features](../reference/schemas/schemas.html#summary-features). In particular, [distance(_fieldname_).km](../reference/ranking/rank-features.html#distance(name).km) gives the geographical distance in kilometers, while [distance(_fieldname_).latitude](../reference/ranking/rank-features.html#distance(name).latitude) and [distance(_fieldname_).longitude](../reference/ranking/rank-features.html#distance(name).longitude) give the geographical coordinates for the best location directly, in degrees. These are easy to use programmatically from a searcher, accessing [feature values in results](../ranking/ranking-expressions-features.html#accessing-feature-function-values-in-results) for further processing.

**Note:** `geoLocation` doesn't do proper great-circle-distance calculations. It works well for 'local' search (city or metro area), using simpler distance calculations. For positions which are very distant or close to the international date line (e.g. the Bering sea), the computed results may be inaccurate.

## Using multiple position fields

For some applications, it can be useful to have several position attributes that may be searched. For example, we could expand the above examples with the locations of subsidiary offices:

```
schema biz {
    document biz {
        field title type string {
            indexing: index
        }
        field mainloc type position {
            indexing: attribute | summary
        }
        field otherlocs type array<position> {
            indexing: attribute
        }
    }
    fieldset default {
        fields: title
    }
}
```

Expanding the example business with an office in Australia and one in Norway could look like:

```
{
    "put": "id:mynamespace:biz::business-1",
    "fields": {
        "title": "Yahoo Inc (some offices)",
        "mainloc": { "lat": 37.4, "lng": -122.0 },
        "otherlocs": [ { "lat": -33.9, "lng": 151.2 },
                       { "lat": 63.4, "lng": 10.4 } ]
    }
}
```

A single query item can only search in one of the position attributes. For a search that spans several fields, use YQL to combine several `geoLocation` items inside an `or` clause, or combine several fields into a combined array field (so in the above example, one could duplicate the "mainloc" position into the "otherlocs" array as well, possibly changing the name from "otherlocs" to "all_locs").
## Example with airport positions To give some more example positions, here is a list of some airports with their locations in JSON format: | Airport code | City | Location | | --- | --- | --- | | SFO | San Francisco, USA | { "lat": 37.618806, "lng": -122.375416 } | | LAX | Los Angeles, USA | { "lat": 33.942496, "lng": -118.408048 } | | JFK | New York, USA | { "lat": 40.639928, "lng": -73.778692 } | | LHR | London, UK | { "lat": 51.477500, "lng": -0.461388 } | | SYD | Sydney, Australia | { "lat": -33.946110, "lng": 151.177222 } | | TRD | Trondheim, Norway | { "lat": 63.457556, "lng": 10.924250 } | | OSL | Oslo, Norway | { "lat": 60.193917, "lng": 11.100361 } | | GRU | São Paulo, Brazil | { "lat": -23.435555, "lng": -46.473055 } | | GIG | Rio de Janeiro, Brazil | { "lat": -22.809999, "lng": -43.250555 } | | BLR | Bangalore, India | { "lat": 13.198867, "lng": 77.705472 } | | FCO | Rome, Italy | { "lat": 41.804475, "lng": 12.250797 } | | NRT | Tokyo, Japan | { "lat": 35.765278, "lng": 140.385556 } | | PEK | Beijing, China | { "lat": 40.073, "lng": 116.598 } | | CPT | Cape Town, South Africa | { "lat": -33.971368, "lng": 18.604292 } | | ACC | Accra, Ghana | { "lat": 5.605186, "lng": -0.166785 } | | TBU | Nuku'alofa, Tonga | { "lat": -21.237999, "lng": -175.137166 } | ## Distance to path This example provides an overview of the [DistanceToPath](../reference/ranking/rank-features.html#distanceToPath(name).distance) rank feature. This feature matches _document locations_ to a path given in the query. Not only does this feature return the closest distance for each document to the path, it also includes the length traveled _along_ the path before reaching the closest point, or _intersection_. This feature has been nick named the _gas_ feature because of its obvious use case of finding gas stations along a planned trip. In this example we have been traveling from the US to Bangalore, and we are now planning our trip back. We have decided to rent a car in Bangalore that we are to return upon arrival at the airport in Chennai. We are already quite hungry and wish to stop for a meal once we are outside of town. To avoid having to pay an additional fueling premium, we also wish to refuel just before reaching the airport. We need to figure out what roads to take, what restaurants are available outside of Bangalore, and what fuel stations are available once we get close to Chennai. Here we have plotted our trip from Bangalore to the airport: ![Trip from Bangalore to the airport](/assets/img/geo/path1.png) If we search for restaurants along the path, we only see a small subset of all restaurants present in the window of our quite large map. Here you see how the most relevant results are actually all in Bangalore or Chennai: ![Most relevant results](/assets/img/geo/path2.png) To find the best results, move the map window to just about where we expect to be eating, and redo the search: ![redo search with adjusted map](/assets/img/geo/path3.png) This has to be done similarly for finding a gas station near the airport. This illustrates searching for restaurants in a smaller window along the planned trip without _DistanceToPath_. Next, we outline how _DistanceToPath_ can be used to quickly and easily improve this type of planning to be more convenient for the user. The nature of this feature requires that the search corpus contains documents with position data. 
A [searcher component](../applications/searchers.html) needs to be written that is able to pass paths with the queries that lie in the same coordinate space as the searchable documents. Finally, a [rank-profile](../basics/ranking.html) needs to defined that scores documents according to how they match some target distance traveled and at the same time lies close "enough" to the path. ### Query Syntax This document does not describe how to write a searcher plugin for the Container, refer to the [container documentation](../applications/searchers.html). However, let us review the syntax expected by _DistanceToPath_. As noted in the [rank features reference](../reference/ranking/rank-features.html#distanceToPath(name).distance), the path is supplied as a query parameter by name of the feature and the `path` keyword: ``` yql=(…)&rankproperty.distanceToPath(_name_).path=(x1,y1,x2,y2,…,xN,yN) ``` Here `name` has to match the name of the position attribute that holds the positions data. The path itself is parsed as a list of `N` coordinate pairs that together form `N-1` line segments: $$(x\_1,y\_1) \rightarrow (x\_2,y\_2), (x\_2,y\_2) \rightarrow (x\_3,y\_3), (…), (x\_{N-1},y\_{N-1}) \rightarrow (x\_N,y\_N)$$ **Note:** The path is _not_ in a readable (latitude, longitude) format, but is a pair of integers in the internal format (degrees multiplied by 1 million). If a transform is required from geographic coordinates to this, the search plugin must do it; note that the first number in each pair (the 'x') is longitude (degrees East or West) while the second ('y') is latitude (degrees North or South), corresponding to the usual orientation for maps - _opposite_ to the usual order of latitude/longitude. ### Rank profile If we were to disregard our scenario for a few moments, we could suggest the following rank profile: ``` rank-profile default { first-phase { expression: nativeRank } second-phase { expression: firstPhase * if (distanceToPath(ll).distance < 10000, 1, 0) } } ``` This profile will first rank all documents according to Vespa's _nativeRank_ feature, and then do a second pass over the top 100 results and order these based on their distance to our path. If a document lies within 100 metres of our path it retains its relevancy, otherwise its relevancy is set to 0. Such a rank profile would indeed solve the current problem, but Vespa's ranking model allows for us to take this a lot further. The following is a rank profile that ranks documents according to a query-specified target distance to path and distance traveled: ``` rank-profile default { first-phase { expression { max(0, query(distance) - distanceToPath(ll).distance) * (1 - fabs(query(traveled) - distanceToPath(ll).traveled)) } } } ``` The expression is two-fold; a first component determines a rank based on the document's distance to the given path as compared to the [query parameter](../reference/ranking/ranking-expressions.html)`distance`. If the allowed distance is exceeded, this component's contribution is 0. The distance contribution is then multiplied by the difference of the actual distance traveled as compared to the query parameter `traveled`. In short, this profile will include all documents that lie close enough to the path, ranked according to their actual distance and traveled measure. **Note:**_DistanceToPath_ is only compatible with _2D coordinates_ because pathing in 1 dimension makes no sense. 
### Results For the sake of this example, assume that we have implemented a custom path searcher that is able to pass the path found by the user's initial directions query to Vespa's [query syntax](#query-syntax). There are then two more parameters that must be supplied by the user; `distance` and `traveled`. Vespa expects these parameters to be supplied in a scale compatible with the feature's output, and should probably also be mapped by the container plugin. The feature's _distance_ output is given in Vespa's internal resolution, which is approximately 10 units per meter. The _traveled_ output is a normalized number between 0 and 1, where 0 represents the beginning of the path, and 1 is the end of the path. This illustrates how these parameters can be used to return the most appropriate hits for our scenario. Note that the figures only show the top hit for each query: ![Top tip 1](/assets/img/geo/path4.png) ![Top tip 2](/assets/img/geo/path5.png) 1. Searching for restaurants with the DistanceToPath feature. `distance = 1000, traveled = 0.1` 2. Searching for gas stations with the DistanceToPath feature. `distance = 1000, traveled = 0.9` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Sample schema and document](#sample-schema-and-document) - [Restrict](#restrict) - [Ranking from a position match](#ranking-from-a-position-match) - [Useful summary-features](#useful-summary-features) - [Using multiple position fields](#using-multiple-position-fields) - [Example with airport positions](#example-with-airport-positions) - [Distance to path](#distance-to-path) - [Query Syntax](#query-syntax) - [Rank profile](#rank-profile) - [Results](#results) --- # Source: https://docs.vespa.ai/en/learn/glossary.html.md # Glossary This is a glossary of both Vespa-specific terminology, and general terms useful in this context. * * * - **Application** - **Attribute** - **Boolean Search** - **Cluster** - **Component** - **Configuration Server** - **Container** - **Content Node** - **Control Plane** - **Data Plane** - **Deploy** - **Deployment** - **Diversity** - **Docker** - **Document** - **Document frequency (normalized)** - **Document summary** - **Document Processor** - **Document Type** - **Elasticity** - **Enclave** - **Embedding** - **Estimated hit ratio** - **Federation** - **Field** - **Fieldset** - **Garbage Collection** - **Grouping** - **Handler** - **Indexing** - **Instance** - **Namespace** - **Nearest neighbor search** - **Node** - **Parent / Child** - **Partial Update** - **Posting List** - **Quantization** - **Query** - **Ranking** - **Schema** - **Searcher** - **Semantic search** - **Service** - **Streaming search** - **Tenant** - **Tensor** - **Visit** Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/performance/graceful-degradation.html.md # Graceful Query Coverage Degradation Ideally you want to query all data indexed in a Vespa cluster within the specified timeout, but that might not always be possible: - The system might be overloaded due to capacity constraints, and queries do not complete within the timeout, as they are sitting in a queue waiting for a resource. - A complex query might take longer time to execute than the specified timeout, or the timeout is too low given the complexity of the query and available resource capacity. This document describes how Vespa could gracefully degrade the result set if the query cannot be completed within the timeout specified. 
Definitions:

- **Coverage**: The percentage of documents indexed which were evaluated by the query. The ideal coverage is 100%.
- **Timeout**: The total time a query is allowed to run for, see [timeout](../reference/api/query.html#timeout) (default 500 ms). Vespa is a distributed system where multiple components are involved in the query execution.
- **Soft Timeout**: Soft timeout allows coverage to be less than 100%, but larger than 0%, if the query is approaching timeout. Soft timeout might also be considered an _early termination_ technique, and is enabled by default. Refer to [ranking.softtimeout.enable](../reference/api/query.html#ranking.softtimeout.enable).

## Detection

The default JSON renderer template will always render a _coverage_ element below the root element; it has a _degraded_ element if the query execution was degraded in some way, and in that case the _coverage_ field will be less than 100. Example request with a query timeout of 200 ms and _ranking.softtimeout.enable=true_:

```
/search/?searchChain=vespa&yql=select * from sources * where foo contains bar&presentation.format=json&timeout=200ms&ranking.softtimeout.enable=true
```

```
{
    "root": {
        "coverage": {
            "coverage": 99,
            "degraded": {
                "adaptive-timeout": false,
                "match-phase": false,
                "non-ideal-state": false,
                "timeout": true
            },
            "documents": 167006201,
            "full": false,
            "nodes": 11,
            "results": 1,
            "resultsFull": 0
        },
        "fields": {
            "totalCount": 16469732
        }
    }
}
```

The result was delivered in 200 ms, but the query was degraded as coverage is less than 100. In this case, 167,006,201 out of x documents were queried, and 16,469,732 documents were matched and ranked, using the first-phase ranking expression in the default rank profile.

The _degraded_ field contains the following fields, which explain why the result had coverage less than 100:

- _adaptive-timeout_ is true if [adaptive node timeout](#adaptive-node-timeout) has been enabled, and one or more nodes fail to produce a result at all within the timeout. This could be caused by nodes with degraded hardware making them slower than peers in the cluster.
- _match-phase_ is true if the rank profile has defined [match phase ranking degradation](../reference/schemas/schemas.html#match-phase). Match-phase can be used to control which documents are ranked within the timeout.
- _non-ideal-state_ is true in cases where the system is not in [ideal state](../content/idealstate.html). This case is extremely rare.
- _timeout_ is true if softtimeout was enabled, and not all documents could be matched and ranked within the query timeout.

Note that the degraded reasons are not mutually exclusive. In the example, the softtimeout was triggered and only 99% of the documents were queried before the time budget ran out. One could imagine scenarios where 10 out of 11 nodes involved in the query execution were healthy and triggered soft timeout and delivered a result, while the last node was in a bad state (e.g. hardware issues) and could not produce a result at all; that would cause both _timeout_ and _adaptive-timeout_ to be true.
When working on Results in a [Searcher](../applications/searchers.html), get the coverage information programmatically: ``` ``` @Override public Result search(Query query, Execution execution) { Result result = execution.search(query); Coverage coverage = result.getCoverage(false); if (coverage != null && coverage.isDegraded()) { logger.warning("Got a degraded result for query " + query + " : " + coverage.getResultPercentage() + "% was searched"); } return result; } ``` ``` ## Adaptive node timeout For a content cluster with [flat](sizing-search.html#data-distribution) data distribution, query performance is no better than the slowest node. The worst case scenario happens when a node in the cluster is experiencing underlying HW issues. In such a state, a node might answer health checks and pings, but still not be able to serve queries within the timeout. Using [adaptive coverage](../reference/applications/services/content.html#coverage) allows ignoring slow node(s). The following example demonstrates how to use adaptive timeout. The example uses a flat content cluster with 10 nodes: ``` ``` 0.9 0.2 0.3 ``` ``` - Assuming using the default vespa timeout of 500ms, the stateless container dispatches the query to all 10 nodes in parallel and waits until 9 out of 10 have replied (minimum coverage 0.9). - Assuming 9 could respond in 100ms, there is 400ms left. The dispatcher then waits minimum 80 ms (0.2\*400ms) for the last node to respond, and at maximum 120 (0.3\*400ms) before giving up waiting for the slowest node and return the result. - The min wait setting is used to allow some per node response time variance. Using min wait 0 will cause the query to return immediately when min coverage has been reached (9 out of 10 nodes replied). A higher than 0 value for min allows a node to be slightly slower than the peers and overall still reach 100% coverage. ## Match phase degradation Refer to the [match-phase reference](../reference/schemas/schemas.html#match-phase). Concrete examples of using match phase is found in the [practical performance guide](practical-search-performance-guide#match-phase-limit---early-termination). Match-phase works by specifying an `attribute` that measures document quality in some way (popularity, click-through rate, pagerank, ad bid value, price, text quality). In addition, a `max-hits` value is specified that specifies how many hits are "more than enough" for the application. Then an estimate is made after collecting a reasonable amount of hits for the query, and if the estimate is higher than the configured `max-hits` value, an extra limitation is added to the query, ensuring that only the highest quality documents can become hits. In effect, this limits the documents actually queried to the highest quality documents, a subset of the full corpus, where the size of subset is calculated in such a way that the query is estimated to give `max-hits` hits. Since some (low-quality) hits will already have been collected to do the estimation, the actual number of hits returned will usually be higher than max-hits. But since the distribution of documents isn't perfectly smooth, you risk sometimes getting less than the configured `max-hits` hits back. Note that limiting hits in the match-phase also affects [aggregation/grouping](../querying/grouping.html), and total-hit-count since it actually limits, so the query gets fewer hits. 
Also note that it doesn't really make sense to use this feature together with a [WAND operator](../ranking/wand.html) that also limit hits, since they both operate in the same manner, and you would get interference between them that could cause unpredictable results. The graph shows possible hits versus actual hits in a corpus with 100 000 documents, where `max-hits` is configured to 10 000. The corpus is a synthetic (slightly randomized) data set, in practice the graph will be less smooth: ![Plot of possible vs. actual hits](/assets/img/relevance/match-phase-max-hits.png) There is a content node metric per rank-profile named_content.proton.documentdb.matching.rank\_profile.limited\_queries_which can be used to see how many of the queries are actually affected by these settings; compare with the corresponding _content.proton.documentdb.matching.rank\_profile.queries_ metric to measure the percentage. ### Match Phase Tradeoffs There are some important things to consider before using _match-phase_. In a normal query scenario, latency is directly proportional to the number of hits the query matches: a query that matches few documents will have low latency and a query that matches many documents will have high latency. Match-phase has the **opposite** effect. This means that if you have queries that match few documents, match-phase might make these queries significantly slower. It might actually be faster to run the query without the filter. Example: Lets say you have a corpus with a document attribute named _created\_time_. For all queries you want the newest content surfaced, so you enable match-phase on _created\_time_. So far, so good - you get a great latency and always get your top-k hits. The problem might come if you introduce a filter. If you have a filter saying you only want documents from the last day, then match-phase can become suboptimal and in some cases much worse than running without match-phase. By design, Vespa will evaluate potential matches for a query by the order of their internal documentid. This means it will start evaluating documents in the order they were indexed on the node, and for most use-cases that means the oldest documents first. Without a filter, every document is a potential match, and match-phase will quickly figure out how it can optimize. With the filter, on the other hand, the algorithm need to evaluate almost the full corpus before it reaches potential matches (1 day old corpus), and because of the way the algorithm is implemented, end up with doing a lot of unnecessary work and can have orders of magnitude higher latencies than running the query without the filter. Another important thing to mention is that the reported total-hits will be different when doing queries with match-phase enabled. This is because match-phase works on an estimated "virtual" corpus, which might have much fewer hits than is actually in the full corpus. If used correctly match-phase can be a life-saver, however, it is not a straight forward fix-it-all silver bullet. Please test and measure your use of match-phase, and contact the Vespa team if your results are not what you expect. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/reference/querying/grouping-language.html.md # Grouping Reference Read the [Vespa grouping guide](/en/querying/grouping.html) first, for examples and an introduction to grouping - this is the Vespa grouping reference. 
Also note that using a [multivalued](/en/searching-multi-valued-fields.html) attribute (such as an array of doubles) in a grouping expression affects performance. Such operations can hit a memory bandwidth bottleneck, particularly if the set of hits to be processed is large, as more data is evaluated.

## Group

Group query results using a custom expression (using the `group` clause):

- A numerical or string constant (e.g., `group(1)` or `group("all")`), which makes one bucket with everything
- A document [attribute](../../content/attributes.html)
- A function over another expression (`xorbit`, `md5`, `cat`, `xor`, `and`, `or`, `add`, `sub`, `mul`, `div`, `mod`) or any other [expression](#expressions)
- The data type of an expression is resolved on a best-effort basis, similar to how common programming languages resolve arithmetic over operands of different data types
- The result of any expression is either a scalar or a single-dimensional array
  - `add(arr)` adds all elements of an array together to produce a scalar
  - `add(arr1, arr2)` adds the arrays element by element, producing a new array whose size is `max(|arr1|, |arr2|)`

Groups can contain subgroups (by using `each` and `group` operations), and may be nested to any level. Multiple sub-groupings or outputs can be created under the same group level, using multiple parallel `each` or `all` clauses, and each one may be labelled using [as(mylabel)](#labels).

When grouping results, _groups_ that contain _outputs_, _group lists_ and _hit lists_ are generated. Group lists contain subgroups, and hit lists contain hits that are part of the owning group. The identity of a group is held by its _id_. Scalar identities such as long, double, and string are directly available from the _id_, whereas range identities used for bucket aggregation are separated into the sub-nodes _from_ and _to_. Refer to the [result format reference](default-result-format.html).

### Multivalue attributes

A [multivalue](../../querying/searching-multivalue-fields) attribute is a [weighted set](../schemas/schemas.html#weightedset), [array](../schemas/schemas.html#array) or [map](../schemas/schemas.html#map). Most grouping functions will just handle the elements of multivalued attributes separately, as if they were all individual values in separate documents. If you are grouping over arrays of structs or maps, scoping will be used to preserve the structure. Each entry in the array/map will be treated as a separate sub-document.

The following syntax can be used when grouping on _map_ attribute fields.

Group on map keys:

```
all( group(mymap.key) each(output(count())) )
```

Group on map keys, then on map values:

```
all( group(mymap.key) each( group(mymap.value) each(output(count())) ))
```

Group on values for key _my\_key_:

```
all( group(my_map{"my_key"}) each(output(count())) )
```

Group on struct field _my\_field_ referenced in map element _my\_key_:

```
all( group(my_map{"my_key"}.my_field) each(output(count())) )
```

The key can either be specified directly (above) or indirectly via a key source attribute. The key is retrieved from the key source attribute for each document.
Note that the key source attribute must be single-valued and have the same data type as the key type of the map:

```
all( group(my_map{attribute(my_key_source)}) each(output(count())) )
```

Group on an array of integers field:

```
all( group(my_array) each(output(count())) )
```

Group on struct field _my\_field_ in the _my\_array_ array of structs:

```
all( group(my_array.my_field) each(output(count())) )
```

[Tensors](../schemas/schemas.html#tensor) cannot be used in grouping.

## Filtering groups

When grouping on multivalue attributes, it may be useful to filter the groups so that only some specific values are collected. This can be done by adding a filter. The `filter` clause expects a filter _predicate_:

- [regex("regular expression", input-expression)](#regex-filter)
- [range(min-limit, max-limit, input-expression)](#range-filter)
- [range(min-limit, max-limit, input-expression, bool, bool)](#range-filter)
- [not _predicate_](#logical-predicates-filter)
- [_predicate_ and _predicate_](#logical-predicates-filter)
- [_predicate_ or _predicate_](#logical-predicates-filter)

### Regex filter

Use a regular expression to match the input, and include only documents that match in the grouping. The input will usually be the same expression as in the `group` clause. Example:

```
all( group(my_array) filter(regex("foo.*", my_array)) ...)
```

Here, only the values in _my\_array_ that start with a "foo" prefix are collected into groups; all others are ignored. See also [this example](../../querying/grouping.html#structured-grouping).

Regex filtering works on the string representation of any field type. For example, you can also filter boolean fields by matching the strings "true" or "false".

### Range filter

Use a `range` filter to match documents where a field value is between a lower and an upper bound. Example:

```
all( group(some_field) filter(range(1990, 2012, year)) ...)
```

Here, the lower bound is _inclusive_ (year ≥ 1990) and the upper bound is _exclusive_ (year < 2012). Use the optional booleans at the end to control whether the lower and upper bounds are inclusive: the first sets the lower bound inclusive, and the second sets the upper bound inclusive.

```
all( group(some_field) filter(range(1990, 2012, year, true, true)) ...)
```

Here, both the lower and upper bounds are inclusive.

### Logical predicates

Use `not` to negate another filter expression. It takes a single sub-filter and matches when the sub-filter does not. Example:

```
all( group(my_field) filter( not regex("bar.*", my_other_field)) ...)
```

Use `or` to perform a logical disjunction across two sub-filters. The combined filter matches if any of the sub-filters evaluate to true. Example:

```
all( group(my_field) filter( regex("bar.*", my_field) or regex("baz.*", my_third_field) ) ...)
```

Use `and` to perform a logical conjunction across two sub-filters. The combined filter matches only if all of the sub-filters evaluate to true. Example:

```
all( group(my_field) filter( regex("bar.*", my_other_field) and regex("baz.*", my_third_field) ) ...)
```

These logical predicates can be nested to create complex filter conditions. Filter expressions follow _conventional precedence_ rules: `not` is evaluated before `and`, and `and` is evaluated before `or`. Operators of the same precedence are evaluated left-to-right. Use parentheses `(...)` to force a different grouping when needed. Example:

```
all( group(my_field)
     filter( (regex("bar.*", some_field) or regex("baz.*", other_field))
             and not regex(".*foo", some_field))
     each(...)
)
```
## Order / max

Each level of grouping may specify how to order its groups (using `order`):

- Ordering can be done using any of the available aggregates
- Multi-level grouping allows strict ordering where primary aggregates may be equal
- Ordering is either ascending or descending, specified per level of ordering
- Groups are sorted using [locale aware sorting](#uca)

Limit the number of groups returned for each level using `max`, returning only the first _n_ groups as specified by `order`:

- `order` changes the ordering of groups after a merge operation for the following aggregators: `count`, `avg` and `sum`
- `order` **will not** change the ordering of groups after a merge operation when `max` or `min` is used
- The default order, `-max(relevance())`, **does not** require use of [precision](#precision)

## Continuations

Pagination of grouping results is managed by `continuations`. These are opaque objects that can be combined and resubmitted using the `continuations` annotation on the grouping step of the query to move to the previous or next page in a result list.

All root groups contain a single _this_ continuation per `select`. That continuation represents the current view, and if submitted as the sole continuation, it will reproduce the exact same result as the one that contained it.

There is zero or one _prev_/_next_ continuation per group list and hit list. Submit any number of these to retrieve the next/previous pages of the corresponding lists.

Any number of continuations can be combined in a query, but the first must always be the _this_-continuation. E.g., one may simultaneously move to the next page of one list and to the previous page of another.

**Note:** If more than one continuation object is provided for the same group list or hit list, the one given last is the one that takes effect. This is because continuations are processed in the order given, and they replace whatever continuations they collide with.

If working programmatically with grouping, find the [Continuation](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/Continuation.html) objects within the [RootGroup](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/result/RootGroup.html), [GroupList](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/result/GroupList.html) and [HitList](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/result/HitList.html) result objects. These can then be added back into the continuation list of the [GroupingRequest](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/GroupingRequest.html) to paginate. Refer to the [grouping guide](../../querying/grouping.html#pagination) for an example.

## Labels

Lists created using the `each` keyword can be assigned a label using the construct `each(...) as(mylabel)`. The outputs created by each clause will be identified by this label.

## Aliases

Grouping expressions can be tagged with an _alias_. An alias allows the expression to be reused without having to repeat the expression verbatim.

```
all(group(a) alias(myalias, count()) each(output($myalias)))
```

is equivalent to

```
all(group(a) each(output(count())))
```

Similarly,

```
all(group(a) order($myalias=count()) each(output($myalias)))
```

is equivalent to

```
all(group(a) order(count()) each(output(count())))
```
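As a further illustration (a sketch reusing the `customer`, `price` and `tax` attributes from the grouping guide's purchase example), an alias can be used to avoid repeating a longer expression in both `order` and `output`:

```
all( group(customer)
     order(-$revenue=sum(mul(price, sub(1, tax))))
     each(output($revenue)) )
```

This orders the customer groups by their summed after-tax revenue, descending, and outputs the same value, while only writing the expression once.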
## Precision

The number of intermediate groups returned from each content node during expression evaluation, to give the container node more data to consider when selecting the groups that are to be evaluated further: `each(...) precision(1000)`. A higher number costs more bandwidth, but leads to higher accuracy in some cases.

## Query parameters

The following _query parameters_ are relevant for grouping. See the [Query API Reference](../api/query.html#parameters) for descriptions.

- [select](../api/query.html#select)
- [groupingSessionCache](../api/query.html#groupingsessioncache)
- [grouping.defaultMaxGroups](../api/query.html#grouping.defaultmaxgroups)
- [grouping.defaultMaxHits](../api/query.html#grouping.defaultmaxhits)
- [grouping.globalMaxGroups](../api/query.html#grouping.globalmaxgroups)
- [grouping.defaultPrecisionFactor](../api/query.html#grouping.defaultprecisionfactor)

## Grouping Session Cache

**Important:** The grouping session cache is **only useful if** the grouping expression uses default ordering. The **drawback** is that when `max` is specified in the grouping expression, it might cause inaccuracies in aggregated values such as `count`. It is recommended to test whether this is an issue or not, and if so, adjust the `precision` parameter to still get correct counts.

The session cache stores intermediate grouping results on the content nodes when using multi-level grouping expressions, in order to speed up grouping at a potential loss of accuracy. With the cache, the query and grouping expression are run only once. Without it, the search query is normally re-run for each level of a multi-level grouping expression; the drawback of this is that, with an expensive ranking function, the query will take more time than strictly necessary.

## Aggregators

Each level of grouping specifies a set of aggregates to collect for all documents that belong to that group (using the `output` operation):

- The documents in a group, retrieved using a specified summary class
- The count of documents in a group
- The sum, average, min, max, xor or standard deviation of an expression
- Multiple quantiles of an expression's value

When all arguments are numeric, the result type is resolved by looking at the argument types. If all arguments are longs, the result is a long. If at least one argument is a double, the result is a double.

When using `order`, aggregators can also be used in expressions in order to get increased control over group sorting. This does not work with expressions that take attributes as an argument, unless the expression is enclosed within an aggregator.

Using sum or max on a multivalued attribute: an operation such as `output(sum(myarray))` will run the sum over each element value in each document. The result is the sum of sums of values. Similarly, `max(myarray)` will yield the maximal element over all elements in all documents, and so on.

Compute quantiles by listing the desired quantile values (comma-separated) in brackets, followed by a comma and the expression (e.g., a field):

```
all( group(city) each(output(quantiles([0.5], delivery_days) as(median_delivery_days))) )
```

to compute the median, or

```
all( group(city) each(output(quantiles([0.5, 0.9], delivery_days))) )
```

to compute the median (p50) and 90th percentile (p90) time to delivery in days per city. Note that quantiles are computed using a [KLL Sketch](https://datasketches.apache.org/docs/KLL/KLLSketch.html), so they are approximate.

Multivalue fields, such as maps and arrays, can be used for grouping.
However, using aggregation functions such as `sum()` on such fields can give misleading results. Assume a map from strings to integers (`map<string, int>`), where the strings are some sort of key to use for grouping. The following expression will give the sum of the values over _all_ keys of the map, and not the sum of the values within each key, as one might expect:

```
all( group(mymap.key) each(output(sum(mymap.value))) )
```

It is still, however, possible to run the following expression to get the sum of values within a specific key:

```
all( group("my_group") each(output(sum(mymap{"foo"}))) )
```

Refer to the system test for [grouping on struct and map types](https://github.com/vespa-engine/system-test/blob/master/tests/search/struct_and_map_types/struct_and_map_grouping.rb) for more examples.

| ### Group list aggregators | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | count | Counts the number of unique groups (as produced by `group`). Note that `count` operates independently of `max`, and that this count is an estimate using HyperLogLog++, an algorithm for the count-distinct problem | None | Long | | ### Group aggregators | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | count | Increments a long counter every time it is invoked | None | Long | | sum | Sums the argument over all selected documents | Numeric | Numeric | | avg | Computes the average over all selected documents | Numeric | Numeric | | min | Keeps the minimum value of selected documents | Numeric | Numeric | | max | Keeps the maximum value of selected documents | Numeric | Numeric | | xor | XOR the values (their least significant 64 bits) of all selected documents | Any | Long | | stddev | Computes the population standard deviation over all selected documents | Numeric | Double | | quantiles | Computes one or multiple quantiles of the values of an expression. Each quantile must be a number between 0 and 1, inclusive.
| [Numeric+], Expr | [{"quantile":Double,"value":Double}+] | | ### Hit aggregators | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | summary | Produces a summary of the requested [summary class](/en/reference/schemas/schemas.html#document-summary) | Name of summary class | Summary | ## Expressions | ### Arithmetic expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | add | Add the arguments together | Numeric+ | Numeric | | + | Add left and right argument | Numeric, Numeric | Numeric | | mul | Multiply the arguments together | Numeric+ | Numeric | | \* | Multiply left and right argument | Numeric, Numeric | Numeric | | sub | Subtract second argument from first, third from result, etc | Numeric+ | Numeric | | - | Subtract right argument from left | Numeric, Numeric | Numeric | | div | Divide first argument by second, result by third, etc | Numeric+ | Numeric | | / | Divide left argument by right | Numeric, Numeric | Numeric | | mod | Modulo first argument by second, result by third, etc | Numeric+ | Numeric | | % | Modulo left argument by right | Numeric, Numeric | Numeric | | neg | Negate argument | Numeric | Numeric | | - | Negate right argument | Numeric | Numeric | | ### Bitwise expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | and | AND the arguments in order | Long+ | Long | | or | OR the arguments in order | Long+ | Long | | xor | XOR the arguments in order | Long+ | Long | | ### String expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | strlen | Count the number of bytes in argument | String | Long | | strcat | Concatenate arguments in order | String+ | String | | ### Type conversion expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | todouble | Convert argument to double | Any | Double | | tolong | Convert argument to long | Any | Long | | tostring | Convert argument to string | Any | String | | toraw | Convert argument to raw | Any | Raw | | ### Raw data expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | cat | Cat the binary representation of the arguments together | Any+ | Raw | | md5 | Does an MD5 over the binary representation of the argument, and keeps the lowest 'width' bits | Any, Numeric(width) | Raw | | xorbit | Does an XOR of 'width' bits over the binary representation of the argument. Width is rounded up to a multiple of 8 | Any, Numeric(width) | Raw | | ### Accessor expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | relevance | Return the computed rank of a document | None | Double | | \ | Return the value of the named attribute | None | Any | | array.at | Array element access. The expression `array.at(myarray, idx)` returns one value per document by evaluating the `idx` expression and using it as an index into the array. The expression can then be used to build bigger expressions such as `output(sum(array.at(myarray, 0)))` which will sum the first element in the array of each document. - The `idx` expression is capped to `[0, size(myarray)-1]` - If \> array size, the last element is returned - If \< 0, the first element is returned | Array, Numeric | Any | | interpolatedlookup | Counts elements in a sorted array that are less than an expression, with linear interpolation if the expression is between element values. The operation `interpolatedlookup(myarray, expr)` is intended for generic graph/function lookup. 
The data in `myarray` should be numerical values sorted in ascending order. The operation will then scan from the start of the array to find the position where the element values become equal to (or greater than) the value of the `expr` lookup argument, and return the index of that position. When the lookup argument's value is between two consecutive array element values, the returned position will be a linear interpolation between their respective indexes. The return value is always in the range `[0, size(myarray)-1]` of the valid index values for an array. Assume `myarray` is a sorted array of type `array` in each document: The expression `interpolatedlookup(myarray, 4.2)` is now a per-document expression that first evaluates the lookup argument, here a constant expression 4.2, and then looks at the contents of `myarray` in the document. The scan starts at the first element and proceeds until it hits an element value greater than 4.2 in the array. This means that: - If the first element in the array is greater than 4.2, the expression returns 0 - If the first element in the array is exactly 4.2, the expression still returns 0 - If the first element in the array is 1.7 while the **second** element value is exactly 4.2, the expression returns 1.0 - the index of the second element - If **all** the elements in the array are less than 4.2, the last valid array index `size(myarray)-1` is returned - If the first 5 elements in the array have values smaller than the lookup argument, and the lookup argument is halfway between the fifth and sixth element, a value of 4.5 is returned - halfway between the array indexes of the fifth and sixth elements - Similarly, if the elements in the array are `{0, 1, 2, 4, 8}` then passing a lookup argument of "5" would return 3.25 (linear interpolation between `indexOf(4)==3` and `indexOf(8)==4`) | Array, Numeric | Numeric | | ### Bucket expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | fixedwidth | Maps the value of the first argument into consecutive buckets whose width equals the second argument | Any, Numeric | NumericBucketList | | predefined | Maps the value of the first argument into the given buckets. - Standard mathematical start and end specifiers may be used to define the width of a `bucket`. The `(` and `)` evaluates to `[` and `>` by default. - The buckets assume the type of the start/end specifiers (`string`, `long`, `double` or `raw`). Values are converted to this type before being compared with these specifiers. (e.g., `double` values are rounded to the nearest integer for buckets of type `long`). - The end specifier can be skipped. The buckets `bucket(3)`/`bucket[3]` are the same as `bucket[3,4>`. This is allowed for string expressions as well; `bucket("c")` is identical to `bucket["c", "c ">`. | Any, Bucket+ | BucketList | | ### Time expressions The field must be a [long](../schemas/schemas.html#long), with second resolution (unix timestamp/epoch) - [examples](../../querying/grouping.html#time-and-date). Each of the time-functions will respect the [timezone](../api/query.html#timezone) query parameter. 
| | Name | Description | Arguments | Result | | --- | --- | --- | --- | | time.dayofmonth | Returns the day of month (1-31) for the given timestamp | Long | Long | | time.dayofweek | Returns the day of week (0-6) for the given timestamp, Monday being 0 | Long | Long | | time.dayofyear | Returns the day of year (0-365) for the given timestamp | Long | Long | | time.hourofday | Returns the hour of day (0-23) for the given timestamp | Long | Long | | time.minuteofhour | Returns the minute of hour (0-59) for the given timestamp | Long | Long | | time.monthofyear | Returns the month of year (1-12) for the given timestamp | Long | Long | | time.secondofminute | Returns the second of minute (0-59) for the given timestamp | Long | Long | | time.year | Returns the full year (e.g. 2009) of the given timestamp | Long | Long | | time.date | Returns the date (e.g. 2009-01-10) of the given timestamp | Long | Long | | ### List expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | size | Return the number of elements in the argument if it is a list. If not return 1 | Any | Long | | sort | Sort the elements in the argument in ascending order if the argument is a list. If not, it is a NOP | Any | Any | | reverse | Reverse the elements in the argument if the argument is a list. If not, it is a NOP | Any | Any | | ### Other expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | zcurve.x | Returns the X component of the given [zcurve](https://en.wikipedia.org/wiki/Z-order_curve) encoded 2d point. All fields of type "position" have an accompanying "\\_zcurve" attribute that can be decoded using this expression, e.g. `zcurve.x(foo_zcurve)` | Long | Long | | zcurve.y | Returns the Y component of the given zcurve encoded 2d point | Long | Long | | uca | Converts the attribute string using [unicode collation algorithm](https://www.unicode.org/reports/tr10/). Groups are sorted using locale-aware sorting, with the default and primary strength values, respectively: ``` all( group(s) order(max(uca(s, "sv"))) each(output(count())) ) ``` ``` all( group(s) order(max(uca(s, "sv", "PRIMARY"))) each(output(count())) ) ``` | Any, Locale(String), Strength(String) | Raw | | ### Single argument standard mathematical expressions These are the standard mathematical functions as found in the Java [Math](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html) class. | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | math.exp |   | Double | Double | | math.log |   | Double | Double | | math.log1p |   | Double | Double | | math.log10 |   | Double | Double | | math.sqrt |   | Double | Double | | math.cbrt |   | Double | Double | | math.sin |   | Double | Double | | math.cos |   | Double | Double | | math.tan |   | Double | Double | | math.asin |   | Double | Double | | math.acos |   | Double | Double | | math.atan |   | Double | Double | | math.sinh |   | Double | Double | | math.cosh |   | Double | Double | | math.tanh |   | Double | Double | | math.asinh |   | Double | Double | | math.acosh |   | Double | Double | | math.atanh |   | Double | Double | | ### Dual argument standard mathematical expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | math.pow | Return X^Y. 
| Double, Double | Double | | math.hypot | Return length of hypotenuse given X and Y sqrt(X^2 + Y^2) | Double, Double | Double | ## Filters | ### String filters | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | regex | Matches a field against a regular expression string. | String, Expression | Bool | | ### Numeric filters | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | range | Matches when a field is between a lower and upper bound. | Numeric, Numeric, Expression, Bool?, Bool? | Bool | | ### Predicate filters | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | and | Logical `and` between the arguments. | Filter, Filter | Bool | | not | Logical `not` on the argument. | Filter | Bool | | or | Logical `or` between the arguments. | Filter, Filter | Bool | ## Grouping language grammar ``` request ::= "all(" operations ")" group ::= ( "all" | "each") "(" operations ")" ["as" "(" identifier ")"] operations ::= ["group" "(" exp ")"] ( ( "alias" "(" identifier "," exp ")" ) | ( "filter" "(" filterOp ")" ) | ( "max" "(" ( number | "inf" ) ")" ) | ( "order" "(" expList | aggrList ")" ) | ( "output" "(" aggrList ")" ) | ( "precision" "(" number ")" ) )* group* aggrList ::= aggr ( "," aggr )* aggr ::= ( ( "count" "(" ")" ) | ( "sum" "(" exp ")" ) | ( "avg" "(" exp ")" ) | ( "max" "(" exp ")" ) | ( "min" "(" exp ")" ) | ( "xor" "(" exp ")" ) | ( "stddev" "(" exp ")" ) | ( "summary" "(" [identifier] ")" ) ) ["as" "(" identifier ")"] expList ::= exp ( "," exp )* exp ::= ( "+" | "-") ( "$" identifier ["=" math] ) | ( math ) | ( aggr ) filterOp ::= "regex" "(" string "," exp ")" math ::= value [( "+" | "-" | "*" | "/" | "%" ) value] value ::= ( "(" exp ")" ) | ( "add" "(" expList ")" ) | ( "and" "(" expList ")" ) | ( "cat" "(" expList ")" ) | ( "div" "(" expList ")" ) | ( "docidnsspecific" "(" ")" ) | ( "fixedwidth" "(" exp "," number ")" ) | ( "interpolatedlookup" "(" attributeName "," exp ")") | ( "math" "." ( ( "exp" | "log" | "log1p" | "log10" | "sqrt" | "cbrt" | "sin" | "cos" | "tan" | "asin" | "acos" | "atan" | "sinh" | "cosh" | "tanh" | "asinh" | "acosh" | "atanh" ) "(" exp ")" | ( "pow" | "hypot" ) "(" exp "," exp ")" )) | ( "max" "(" expList ")" ) | ( "md5" "(" exp "," number "," number ")" ) | ( "min" "(" expList ")" ) | ( "mod" "(" expList ")" ) | ( "mul" "(" expList ")" ) | ( "or" "(" expList ")" ) | ( "predefined" "(" exp "," "(" bucket ( "," bucket )* ")" ")" ) | ( "reverse" "(" exp ")" ) | ( "relevance" "(" ")" ) | ( "sort" "(" exp ")" ) | ( "strcat" "(" expList ")" ) | ( "strlen" "(" exp ")" ) | ( "size" "(" exp")" ) | ( "sub" "(" expList ")" ) | ( "time" "." ( "date" | "year" | "monthofyear" | "dayofmonth" | "dayofyear" | "dayofweek" | "hourofday" | "minuteofhour" | "secondofminute" ) "(" exp ")" ) | ( "todouble" "(" exp ")" ) | ( "tolong" "(" exp ")" ) | ( "tostring" "(" exp ")" ) | ( "toraw" "(" exp ")" ) | ( "uca" "(" exp "," string ["," string] ")" ) | ( "xor" "(" expList ")" ) | ( "xorbit" "(" exp "," number ")" ) | ( "zcurve" "." ( "x" | "y" ) "(" exp ")" ) | ( attributeName "." 
"at" "(" number ")") | ( attributeName ) bucket ::= "bucket" ( "(" | "[" | "<" ) ( "-inf" | rawvalue | number | string ) ["," ( "inf" | rawvalue | number | string )] ( ")" | "]" | ">" ) rawvalue ::= "{" ( ( string | number ) "," )* "}" ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Group](#group) - [Multivalue attributes](#multivalue-attributes) - [Filtering groups](#filtering-groups) - [Regex filter](#regex-filter) - [Range filter](#range-filter) - [Logical predicates](#logical-predicates-filter) - [Order / max](#order) - [Continuations](#continuations) - [Labels](#labels) - [Aliases](#aliases) - [Precision](#precision) - [Query parameters](#query-parameters) - [Grouping Session Cache](#grouping-session-cache) - [Aggregators](#aggregators) - [Group list aggregators](#group-list-aggregators) - [Group aggregators](#group-aggregators) - [Hit aggregators](#hit-aggregators) - [Expressions](#expressions) - [Arithmetic expressions](#arithmetic-expressions) - [Bitwise expressions](#bitwise-expressions) - [String expressions](#string-expressions) - [Type conversion expressions](#type-conversion-expressions) - [Raw data expressions](#raw-data-expressions) - [Accessor expressions](#accessor-expressions) - [Bucket expressions](#bucket-expressions) - [Time expressions](#time-expressions) - [List expressions](#list-expressions) - [Other expressions](#other-expressions) - [Single argument standard mathematical expressions](#single-argument-standard-mathematical-expressions) - [Dual argument standard mathematical expressions](#dual-argument-standard-mathematical-expressions) - [Filters](#filters) - [String filters](#string-filters) - [Numeric filters](#numeric-filters) - [Predicate filters](#bitwise-expressions) - [Grouping language grammar](#grouping-language-grammar) --- # Source: https://docs.vespa.ai/en/querying/grouping.html.md # Grouping and aggregation ## Grouping Interface Try running requests on the [grouping example data](https://github.com/vespa-engine/sample-apps/blob/master/examples/part-purchases-demo/ext/feed.jsonl): all( group(customer) each(output(sum(price))) ) Run Grouping The Vespa grouping language is a list-processing language which describes how the query hits should be grouped, aggregated, and presented in result sets. A grouping statement takes the list of all matches to a query as input and groups/aggregates it, possibly in multiple nested and parallel ways to produce the output. This is a logical specification and does not indicate how it is executed, as instantiating the list of all matches to the query somewhere would be too expensive, and execution is distributed instead. Refer to the [Query API reference](../reference/api/query.html#select) for how to set the _select_ parameter, and the [Grouping reference](../reference/querying/grouping-language.html) for details. Fields used in grouping must be defined as [attribute](../content/attributes.html) in the document schema. Grouping supports continuation objects for [pagination](#pagination). The [Grouping Results](https://github.com/vespa-engine/sample-apps/tree/master/examples/part-purchases-demo) sample application is a practical example. ## The grouping language structure The operations defining the structure of a grouping are: - `all(statement)`: Execute the nested statement once on the input list as a whole. - `each(statement)`: Execute the nested statement on each element of the input list. - `group(specification)`: Turn the input list into a list of lists according to the grouping specification. 
- `output`: Output some value(s) at the current location in the structure. The parallel and nested collection of these operations defines both the structure of the computation and of the result it produces. For example, `all(group(customer) each(output(count())))` will take all matches, group them by customer id, and for each group, output the count of hits in the group. Vespa distributes and executes the grouping program on content nodes and merges results on container nodes - in multiple phases, as needed. As realizing such programs over a distributed data set requires more network round-trips than a regular search query, these queries may be more expensive than regular queries - see [defaultMaxGroups](../reference/api/query.html#grouping.defaultmaxgroups) and the likes for how to control resource usage. ## Grouping by example For the entirety of this document, assume an index of engine part purchases: | Date | Price | Tax | Item | Customer | | --- | --- | --- | --- | --- | | 2006-09-06 09:00:00 | $1 000 | 0.24 | Intake valve | Smith | | 2006-09-07 10:00:00 | $1 000 | 0.12 | Rocker arm | Smith | | 2006-09-07 11:00:00 | $2 000 | 0.24 | Spring | Smith | | 2006-09-08 12:00:00 | $3 000 | 0.12 | Valve cover | Jones | | 2006-09-08 10:00:00 | $5 000 | 0.24 | Intake port | Jones | | 2006-09-08 11:00:00 | $8 000 | 0.12 | Head | Brown | | 2006-09-09 12:00:00 | $1 300 | 0.24 | Coolant | Smith | | 2006-09-09 10:00:00 | $2 100 | 0.12 | Engine block | Jones | | 2006-09-09 11:00:00 | $3 400 | 0.24 | Oil pan | Brown | | 2006-09-09 12:00:00 | $5 500 | 0.12 | Oil sump | Smith | | 2006-09-10 10:00:00 | $8 900 | 0.24 | Camshaft | Jones | | 2006-09-10 11:00:00 | $1 440 | 0.12 | Exhaust valve | Brown | | 2006-09-10 12:00:00 | $2 330 | 0.24 | Rocker arm | Brown | | 2006-09-10 10:00:00 | $3 770 | 0.12 | Spring | Brown | | 2006-09-10 11:00:00 | $6 100 | 0.24 | Spark plug | Smith | | 2006-09-11 12:00:00 | $9 870 | 0.12 | Exhaust port | Jones | | 2006-09-11 10:00:00 | $1 597 | 0.24 | Piston | Brown | | 2006-09-11 11:00:00 | $2 584 | 0.12 | Connection rod | Smith | | 2006-09-11 12:00:00 | $4 181 | 0.24 | Rod bearing | Jones | | 2006-09-11 13:00:00 | $6 765 | 0.12 | Crankshaft | Jones | ## Basic Grouping Example: _Return the total sum of purchases per customer_ - steps: 1. Select all documents: ``` /search/?yql=select * from sources * where true ``` 2. Take the list of all hits: ``` all(...) ``` 3. Turn it into a list of lists of all hits having the same customer id: ``` group(customer) ``` 4. For each of those lists of same-customer hits: each(...) 5. Output the sum (an aggregator) of the price over all items in that list of hits: ``` output(sum(price)) ``` Final query, producing the sum of the price of all purchases for each customer: ``` /search/?yql=select * from sources * where true limit 0 | all( group(customer) each(output(sum(price))) ) ``` Here, limit is set to zero to get the grouping output only. URL encoded equivalent: ``` /search/?yql=select%20%2A%20from%20sources%20%2A%20where%20true%20limit%200%20%7C%20 all%28%20group%28customer%29%20each%28output%28sum%28price%29%29%29%20%29 ``` Result: | GroupId | Sum(price) | | --- | --- | | Brown | $20 537 | | Jones | $39 816 | | Smith | $19 484 | Example: _Sum price of purchases [per date](#time-and-date):_ ``` select (…) | all(group(time.date(date)) each(output(sum(price)))) ``` Note: in examples above, _all_ documents are evaluated. 
Modify the query to add filters (and thus cut latency), for example (remember to URL encode):

```
/search/?yql=select * from sources * where customer contains "smith"
```

## Ordering and Limiting Groups

In many scenarios, a large collection of groups is produced, possibly too large to display or process. This is handled by ordering groups, then limiting the number of groups to return.

The `order` clause accepts a list of one or more expressions. Each of the arguments to `order` is prefixed by either a plus or minus, for ascending or descending order.

Limit the number of groups using `max` and `precision` - the latter is the number of groups returned per content node to be merged into the global result. The more skewed the document distribution across nodes is, the higher `precision` needs to be for accurate results.

An implicit limit can be specified through the [grouping.defaultMaxGroups](../reference/api/query.html#grouping.defaultmaxgroups) query parameter. This value is always overridden if `max` is explicitly specified in the query. Use `max(inf)` to retrieve all groups when the query parameter is set. If `precision` is not specified, it defaults to a factor times `max`. This factor can be overridden through the [grouping.defaultPrecisionFactor](../reference/api/query.html#grouping.defaultprecisionfactor) query parameter.

Example: To find the 2 globally best groups, make an educated guess on how many groups must be fetched from each node in order to end up with the right ones - this is the `precision`. An initial factor of 3 has proven to be quite good in most use cases. If, however, the data for customer 'Jones' was spread over 3 different content nodes, 'Jones' might be among the 2 best on only one node. But based on the distribution of the data, we may have concluded from earlier tests that if we fetch 5.67 times as many groups as we need, we will have a correct answer with at least 99.999% confidence. So we just use 6 times as many groups when doing the merge.

However, there is one exception: without an `order` constraint, `precision` is not required, as local ordering is then the same as global ordering, and ordering will not change after a merge operation.

### Example

Example: _The two customers with the most purchases, returning the sum for each:_

```
select (…) | all(group(customer) max(2) precision(12) order(-count()) each(output(sum(price))))
```

## Hits per Group

Use `summary` to print the fields of a hit, and `max` to limit the number of hits per group. An implicit limit can be specified through the [grouping.defaultMaxHits](../reference/api/query.html#grouping.defaultmaxhits) query parameter. This value is always overridden if `max` is explicitly specified in the query. Use `max(inf)` to retrieve all hits when the query parameter is set.

### Example

Example: Return the three most expensive parts per customer:

```
/search/?yql=select * from sources * where true | all(group(customer) each(max(3) each(output(summary()))))
```

Notes on ordering in the example above:

- The `order` clause is a directive for _group_ ordering, not _hit_ ordering. Here, there is no order clause on the groups, so the default ordering `-max(relevance())` is used. The _-_ denotes the sorting order; _-_ means descending (higher score first). In this case, the query is "all documents", so all groups are equally relevant and the group order is random.
- To order hits inside groups, use ranking.
Add `ranking=pricerank` to the query to use the pricerank [rank profile](../basics/ranking.html) to rank by price:

```
rank-profile pricerank inherits default {
    first-phase {
        expression: attribute(price)
    }
}
```

## Filter within a group

Use the `filter` clause to select which values to keep in a group. See the [reference](../reference/querying/grouping-language.html#filtering-groups) for details.

### Example

Example: Sum the price per customer, counting only purchases where the sales rep matches `Bonn.*` and the price is not in the 0-1000 range:

```
/search/?yql=select * from sources * where true |
    all(group(customer)
        filter(regex("Bonn.*", attributes{"sales_rep"}) and not range(0, 1000, price))
        each(output(sum(price)) each(output(summary()))))
```

## Global limit for grouping queries

Use the [grouping.globalMaxGroups](../reference/api/query.html#grouping.globalmaxgroups) query parameter to restrict execution of queries that are potentially too expensive in terms of compute and bandwidth. Queries that may return a result exceeding this threshold are failed preemptively. The limit is compared against the total number of groups and hits that the query could return in the worst case.

### Examples

The following query may return 5 groups and 0 hits. It will be rejected when `grouping.globalMaxGroups < 5`:

```
select (…) | all(group(item) max(5) each(output(count())))
```

The following query may return 5 groups and 35 hits. It will be rejected when `grouping.globalMaxGroups < 5+5*7`:

```
select (…) | all( group(customer) max(5) each( output(count()) max(7) each(output(summary())) ) )
```

The following query may return 6 groups and 30 hits. It will be rejected when `grouping.globalMaxGroups < 2*(3+3*5)`:

```
select (…) | all( all(group(item) max(3) each(output(count()) max(5) each(output(summary())))) all(group(customer) max(3) each(output(count()) max(5) each(output(summary())))))
```

### Combining with default limits for groups/hits

The `grouping.globalMaxGroups` restriction will utilize the [grouping.defaultMaxGroups](../reference/api/query.html#grouping.defaultmaxgroups)/[grouping.defaultMaxHits](../reference/api/query.html#grouping.defaultmaxhits) values for grouping statements without a `max`. The two queries below are identical, assuming `defaultMaxGroups=5` and `defaultMaxHits=7`, and both will be rejected when `globalMaxGroups < 5+5*7`.

```
select (…) | all( group(customer) max(5) each( output(count()) max(7) each(output(summary())) ) )
```

```
select (…) | all( group(customer) each( output(count()) each(output(summary())) ) )
```

A grouping without `max` combined with `defaultMaxGroups=-1`/`defaultMaxHits=-1` will be rejected unless `globalMaxGroups=-1`. This is because the query produces an unbounded result: an unlimited number of groups if `defaultMaxGroups=-1`, or an unlimited number of summaries if `defaultMaxHits=-1`. A query returning thousands of groups and summaries could then become an unintentional denial of service, which is why setting `globalMaxGroups=-1` is risky.

### Recommended settings

The best practice is to always specify `max` in groupings, making it easy to reason about the worst-case cardinality of the query results. Performance will also benefit. Set `globalMaxGroups` to the overall worst-case result cardinality with some margin. The `defaultMaxGroups`/`defaultMaxHits` values should be overridden in a query profile if some groupings do not use `max` and the default values are too low:

```
<query-profile id="default">
    <field name="grouping.defaultMaxGroups">20</field>
    <field name="grouping.defaultMaxHits">100</field>
    <field name="grouping.globalMaxGroups">8000</field>
</query-profile>
```

## Performance and Correctness

Grouping is, by default, tuned to favor performance over correctness.
Perfect correctness may not be achievable; result of queries using [non-default ordering](#ordering-and-limiting-groups) can be approximate, and correctness can only be partially achieved by a larger `precision` value that sacrifices performance. The [grouping session cache](../reference/querying/grouping-language.html#grouping-session-cache) is enabled by default. Disabling it will improve correctness, especially for queries using `order` and `max`. The cost of multi-level grouping expressions will increase, though. Consider increasing the [precision](#ordering-and-limiting-groups) value when using `max` in combination with `order`. The default precision may not achieve the required correctness for your use case. ## Nested Groups Groups can be nested. This offers great drilling capabilities, as there are no limits to nesting depth or presented information on any level. Example: How much each customer has spent per day by grouping on customer, then date: ``` select (…) | all(group(customer) each(group(time.date(date)) each(output(sum(price))))) ``` Use this to query for all items on a per-customer basis, displaying the most expensive hit for each customer, with subgroups of purchases on a per-date basis. Use the [summary](#hits-per-group) clause to show hits inside any group at any nesting level. Include the sum price for each customer, both as a grand total and broken down on a per-day basis: ``` /search/?yql=select * from sources * where true limit 0| all(group(customer) each(max(1) output(sum(price)) each(output(summary()))) each(group(time.date(date)) each(max(10) output(sum(price)) each(output(summary()))))) &ranking=pricerank ``` | GroupId | sum(price) | | | | | | | --- | --- | --- | --- | --- | --- | --- | | Brown | $20 537 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-08 11:00 | $8 000 | 0.12 | Head | Brown | | | | GroupId | Sum(price) | | | | | | | 2006-09-08 | $8 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-08 11:00 | $8 000 | 0.12 | Head | Brown | | | 2006-09-09 | $3 400 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-09 11:00 | $3 400 | 0.12 | Oil pan | Brown | | | 2006-09-10 | $7 540 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 10:00 | $3 770 | 0.12 | Spring | Brown | | | | 2006-09-10 12:00 | $2 330 | 0.24 | Rocker arm | Brown | | | | 2006-09-10 11:00 | $1 440 | 0.12 | Exhaust valve | Brown | | | 2006-09-11 | $1 597 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 10:00 | $1 597 | 0.24 | Piston | Brown | | Jones | $39 816 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 12:00 | $9 870 | 0.12 | Exhaust port | Jones | | | | GroupId | Sum(price) | | | | | | | 2006-09-08 | $8 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-08 10:00 | $5 000 | 0.24 | Intake port | Jones | | | | 2006-09-08 12:00 | $3 000 | 0.12 | Valve cover | Jones | | | 2006-09-09 | $2 100 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-09 10:00 | $2 100 | 0,12 | Engine block | Jones | | | 2006-09-10 | $8 900 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 10:00 | $8 900 | 0.24 | Camshaft | Jones | | | 2006-09-11 | $20 816 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 12:00 | $9 870 | 0.12 | Exhaust port | Jones | | | | 2006-09-11 13:00 | $6 765 | 0.12 | Crankshaft | Jones | | | | 2006-09-11 12:00 | $4 181 | 0.24 | Rod bearing | Jones | | Smith | $19 484 | | | | | | | | Date | 
Price | Tax | Item | Customer | | | | 2006-09-10 11:00 | $6 100 | 0.24 | Spark plug | Smith | | | | GroupId | Sum(price) | | | | | | | 2006-09-06 | $1 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-06 09:00 | $1 000 | 0.24 | Intake valve | Smith | | | 2006-09-07 | $3 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-07 11:00 | $2 000 | 0.24 | Spring | Smith | | | | 2006-09-07 10:00 | $1 000 | 0.12 | Rocker arm | Smith | | | 2006-09-09 | $6 800 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-09 12:00 | $5 500 | 0.12 | Oil sump | Smith | | | | 2006-09-09 12:00 | $1 300 | 0.24 | Coolant | Smith | | | 2006-09-10 | $6 100 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 11:00 | $6 100 | 0.24 | Spark plug | Smith | | | 2006-09-11 | $2 584 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 11:00 | $2 584 | 0.12 | Connection rod | Smith | ## Structured grouping Structured grouping is nested grouping over an array of structs or maps. In this case, each array element is treated as a sub-document and may be grouped separately. See the reference for grouping on[multivalue attributes](../reference/querying/grouping-language.html#multivalue-attributes)for details. It is also possible to[filter the groups](../reference/querying/grouping-language.html#filtering-groups)so only matching elements are considered. An example could be: ``` select (…) | all(group(attributes.value) filter(regex("delivery_method",attributes.key)) each(output(sum(price)) each(output(summary())))) ``` ## Range grouping In the examples above, results are grouped on distinct values, like customer or date. To group on price: ``` select (…) | all(group(price) each(each(output(summary())))) ``` This gives one group per price. To group on price _ranges_, one could compress the price range. This gives prices in $0 - $999 in bucket 0, $1 000 - $2 000 in bucket 1 and so on: ``` select (…) | all(group(price/1000) each(each(output(summary())))) ``` An alternative is using [bucket expressions](../reference/querying/grouping-language.html#bucket-expressions) - think of a bucket as the range per group. Group on price, make groups have a width of 1000: ``` select (…) | all(group(fixedwidth(price,1000)) each(each(output(summary())))) ``` Use `predefined` to configure group sizes individually (the two below are equivalent): ``` select (…) | all( group(predefined(price, bucket(0,1000), bucket(1000,2000), bucket(2000,5000), bucket(5000,inf))) each(each(output(summary()))) ) ``` This works with strings as well - put Jones and Smith in the second group: ``` select (…) | all(group(predefined(customer, bucket(-inf,"Jones"), bucket("Jones", inf))) each(each(output(summary())))) ``` ... or have Jones in his own group: ``` select (…) | all(group(predefined(customer, bucket<-inf,"Jones">, bucket["Jones"], bucket<"Jones", inf>)) each(each(output(summary())))) ``` Use decimal numbers in bucket definitions if the expression evaluates to a double or float: ``` select (…) | all( group(predefined(tax, bucket(0.0, 0.2), bucket(0.2, 0.5), bucket(0.5, inf))) each( each(output(summary())) ) ) ``` ## Pagination Grouping supports [continuation](../reference/querying/grouping-language.html#continuations) objects that are passed as annotations to the grouping statement. The `continuations` annotation is a list of zero or more continuation strings, returned in the grouping result. 
For example, given the result:

```
{
    "root": {
        "children": [
            {
                "children": [
                    {
                        "children": [
                            {
                                "fields": {
                                    "count()": 7
                                },
                                "value": "Jones",
                                "id": "group:string:Jones",
                                "relevance": 1.0
                            }
                        ],
                        "continuation": {
                            "next": "BGAAABEBEBC",
                            "prev": "BGAAABEABC"
                        },
                        "id": "grouplist:customer",
                        "label": "customer",
                        "relevance": 1.0
                    }
                ],
                "continuation": {
                    "this": "BGAAABEBCA"
                },
                "id": "group:root:0",
                "relevance": 1.0
            }
        ],
        "fields": {
            "totalCount": 20
        },
        "id": "toplevel",
        "relevance": 1.0
    }
}
```

reproduce the same result by passing the _this_-continuation along with the original select:

```
select (…) | { 'continuations':['BGAAABEBCA'] }all(…)
```

To display the next page of customers, pass the _this_-continuation of the root group and the _next_ continuation of the customer list:

```
select (…) | { 'continuations':['BGAAABEBCA', 'BGAAABEBEBC'] }all(…)
```

To display the previous page of customers, pass the _this_-continuation of the root group and the _prev_ continuation of the customer list:

```
select (…) | { 'continuations':['BGAAABEBCA', 'BGAAABEABC'] }all(…)
```

The `continuations` annotation is an ordered list of continuation strings. These are combined by replacement, so that a continuation given later will replace any shared state with a continuation given before. Also, when using the `continuations` annotation, always pass the _this_-continuation as its first element.

**Note:** Continuations work best when the ordering of hits is stable - which can be achieved by using [ranking](../basics/ranking.html) or [ordering](../reference/querying/grouping-language.html#order). Adding a tie-breaker might be needed - like [random.match](../reference/ranking/rank-features.html#random) or a random double value stored in each document - to keep the ordering stable in case of multiple documents that would otherwise get the same rank score or the same value used for ordering.

## Expressions

Instead of just grouping on some attribute value, the `group` clause may contain arbitrarily complex expressions - see `group` in the [grouping reference](../reference/querying/grouping-language.html) for an exhaustive list. Examples:

- Select everything. For example, `group("all") each(output(sum(price)))` gives total revenue
- Select the minimum or maximum of sub-expressions
- Addition, subtraction, multiplication, division, and even modulo of sub-expressions
- Bitwise operations on sub-expressions
- Concatenation of the results of sub-expressions

Sum the prices of purchases on a per-hour-of-day basis:

```
select (…) | all(group(mod(div(date,mul(60,60)),24)) each(output(sum(price))))
```

These types of expressions may also be used inside `output` operations, so instead of simply calculating the sum price of the grouped purchases, calculate the sum income after taxes per customer:

```
select (…) | all(group(customer) each(output(sum(mul(price,sub(1,tax))))))
```

Note that the validity of an expression depends on the current nesting level. For example, while `sum(price)` would be a valid expression for a group of hits, `price` would not. As a general rule, each operator within an expression either applies to a single hit or aggregates values across a group.

## Search Container API

As an alternative to the textual representation, one can use the programmatic API to execute grouping requests. This allows multiple grouping requests to run in parallel, and does not collide with the `yql` parameter - example:

```
@Override
public Result search(Query query, Execution execution) {
    // Create the grouping request.
    GroupingRequest request = GroupingRequest.newInstance(query);
    request.setRootOperation(new AllOperation()
            .setGroupBy(new AttributeValue("foo"))
            .addChild(new EachOperation()
                    .addOutput(new CountAggregator().setLabel("count"))));

    // Perform the grouping request.
    Result result = execution.search(query);

    // Process the grouping result.
    Group root = request.getResultGroup(result);
    GroupList foo = root.getGroupList("foo");
    for (Hit hit : foo) {
        Group group = (Group)hit;
        Long count = (Long)group.getField("count");
        // TODO: Process group and count.
    }

    // Pass results back to the calling searcher.
    return result;
}
```

Refer to the [API documentation](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/package-summary.html) for the complete reference.

## TopN / Full corpus

Simple grouping: count the number of documents in each group:

```
select * from purchase where true | all( group(customer) each(output(count())) )
```

Two parallel groupings:

```
select * from purchase where true | all( all( group(customer) each(output(count())) ) all( group(item) each(output(count())) ) )
```

Only the 1000 best hits will be grouped at each content node - lower accuracy, but higher speed:

```
select * from purchase where true limit 0 | all( max(1000) all( group(customer) each(output(count())) ) )
```

## Selecting groups

Do a modulo 3 operation before selecting the group:

```
select * from purchase where true limit 0 | all( group(price % 3) each(output(count())) )
```

Do `price + tax * price` before selecting the group:

```
select * from purchase where true limit 0 | all( group(price + tax * price) each(output(count())) )
```

## Ordering groups

Do a modulo 5 operation before selecting the group - the groups are then ordered by their aggregated sum of attribute "tax":

```
select * from purchase where true limit 0 | all( group(price % 5) order(sum(tax)) each(output(count())) )
```

Do `price + tax * price` before selecting the group. Ordering is given by the maximum value of attribute "price" in each group:

```
select * from purchase where true limit 0 | all( group(price + tax * price) order(max(price)) each(output(count())) )
```

Order the groups by the average relevance of the hits in each group multiplied by the number of hits in the group, giving a cumulative relevance:

```
select * from purchase where true limit 0 | all( group(customer) order(avg(relevance()) * count()) each(output(count())) )
```

One cannot directly reference an attribute in the `order` clause - this will not work:

```
select * from purchase where true limit 0 | all( group(customer) order(price * count()) each(output(count())) )
```

However, one can do this:

```
select * from purchase where true limit 0 | all( group(customer) order(max(price) * count()) each(output(count())) )
```

Ordering alphabetically works in a similar way:

```
select * from purchase where true limit 0 | all( group(customer) order(max(customer)) each(output(count())) )
```

**Note:** You can control non-ASCII character folding behavior with [unicode collation](../reference/querying/grouping-language.html#other-expressions).
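For example, a sketch of ordering the customer groups alphabetically with locale-aware collation (the locale `"en"` is chosen only for illustration):

```
select * from purchase where true limit 0 |
    all( group(customer) order(max(uca(customer, "en"))) each(output(count())) )
```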
## Collecting aggregates Simple grouping to count the number of documents in each group and return the best hit in each group: ``` select * from purchase where true limit 0 | all( group(customer) each( max(1) each(output(summary())) ) ) ``` Also return the sum of attribute "price": ``` select * from purchase where true limit 0 | all( group(customer) each(max(1) output(count(), sum(price)) each(output(summary()))) ) ``` Also, return an XOR of the 64 most significant bits of an MD5 over the concatenation of attributes "customer", "price" and "tax": ``` select * from purchase where true limit 0 | all(group(customer) each(max(1) output(count(), sum(price), xor(md5(cat(customer, price, tax), 64))) each(output(summary())))) ``` It is also possible to return quantiles, for instance, the p50 and p90 of the price. ``` select * from purchase where true limit 0 | all(group(customer) each(output(quantiles([0.5,0.9], price)))) ``` ## Grouping Single-level grouping on "customer" attribute, returning at most 5 groups with full hit count as well as the 69 best hits. ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(max(69) output(count()) each(output(summary())))) ``` Two level grouping on "customer" and "item" attribute: ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(output(count()) all(group(item) max(5) each(max(69) output(count()) each(output(summary())))))) ``` Three-level grouping on "customer", "item" and "attributes.key(coupon)" attribute: ``` select * from purchase where true limit 0 | all(group(customer) max(1) each(output(count()) all(group(item) max(1) each(output(count()) max(1) all(group(attributes.key) max(1) each(output(count()) each(output(summary())))))))) ``` As above, but also collect best hit in level 2: ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(output(count()) all(group(item) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(attributes.key) max(5) each(max(69) output(count()) each(output(summary())))))))) ``` As above, but also collect best hit in level 1: ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(item) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(attributes.key) max(5) each(max(69) output(count()) each(output(summary())))))))) ``` As above, but using different document summaries on each level: ``` select * from purchase where true limit 0 | all( group(customer) max(5) each(output(count()) all(max(1) each(output(summary(complexsummary)))) all(group(item) max(5) each(output(count()) all(max(1) each(output(summary(simplesummary)))) all(group(price) max(5) each(max(69) output(count()) each(output(summary(fastsummary)))))) ) ``` Deep grouping with counting and hit collection on all levels: ``` select * from purchase where true limit 0 | all( group(customer) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(item) each(output(count()) all(max(1) each(output(summary()))) all(group(price) each(output(count()) all(max(1) each(output(summary())))))))) ) ``` ## Time and date The field (`time` below, but can have any name) must be a [long](../reference/schemas/schemas.html#long), with second resolution (unix timestamp/epoch). See the [reference](../reference/querying/grouping-language.html#time-expressions) for all time-functions. 
Group by year: ``` select * from purchase where true limit 0 | all(group(time.year(date)) each(output(count()))) ``` Group by year, then by month: ``` select * from purchase where true limit 0 | all( group(time.year(date)) each(output(count()) all(group(time.monthofyear(date)) each(output(count())))) ) ``` Groups _today_, _yesterday_, _lastweek_, and _lastmonth_ using `predefined` aggregator, and groups each day within each of these separately: ``` select * from purchase where true limit 0 | all( group( predefined((now() - date) / (60 * 60 * 24), bucket(0,1), bucket(1,2), bucket(3,7), bucket(8,31)) ) each(output(count()) all(max(2) each(output(summary()))) all(group((now() - date) / (60 * 60 * 24)) each(output(count()) all(max(2) each(output(summary()))) ) ) ) ) ``` ### Timezones in grouping The `timezone` query parameter can be used to rewrite each time-function with a timezone offset. See the [reference](../reference/api/query.html#timezone). Example: ``` $ vespa query "select * from purchase where true | \ all( group(time.hourofday(date)) each(output(count()))" \ "timezone=America/Los_Angeles" ``` This query selects all documents from `purchase`, groups them by the hour they were made (adjusted to the local time in `America/Los_Angeles`), and counts how many purchases fall into each hour. ## Counting unique groups The `count` aggregator can be applied on a list of groups to determine the number of unique groups without having to explicitly retrieve all groups. Note that this count is an estimate using HyperLogLog++ which is an algorithm for the count-distinct problem. To get an accurate count, one needs to explicitly retrieve all groups and count them in a custom component or in the middle tier calling out to Vespa. This is network intensive and might not be feasible in cases with many unique groups. Another use case for this aggregator is counting the number of unique instances matching a given expression. Output an estimate of the number of groups, which is equivalent to the number of unique values for attribute "customer": ``` select * from purchase where true limit 0 | all( group(customer) each(output(count())) ) ``` Output an estimate of the number of unique string lengths for the attribute "item": ``` select * from purchase where true limit 0 | all(group(strlen(item)) each(output(count()))) ``` Output the sum of the "price" attribute for each group in addition to the accurate count of the overall number of unique groups as the inner each causes all groups to be returned. ``` select * from purchase where true limit 0 | all(group(customer) output(count()) each(output(sum(price)))) ``` The `max` clause is used to restrict the number of groups returned. The query outputs the sum for the 3 best groups. The `count` clause outputs the estimated number of groups (potentially \>3). 
The `count` becomes an estimate here as the number of groups is limited by max, while in the above example, it's not limited by max: ``` select * from purchase where true limit 0 | all(group(customer) max(3) output(count()) each(output(sum(price)))) ``` Output the number of top-level groups, and for the 10 best groups, output the number of unique values for attribute "item": ``` select * from purchase where true limit 0 | all(group(customer) max(10) output(count()) each(group(item) output(count()))) ``` ## Counting unique groups - multivalue fields A [multivalue](../searching-multi-valued-fields) attribute is a [weighted set](../reference/schemas/schemas.html#weightedset), [array](../reference/schemas/schemas.html#array) or [map](../reference/schemas/schemas.html#map). Most grouping functions will just handle the elements of multivalued attributes separately, as if they were all individual values in separate documents. If you are grouping over array of struct or maps, scoping will be used to preserve structure. Each entry in the array/map will be treated as a separate sub-document, so documents can be counted twice or more - see [#33646](https://github.com/vespa-engine/vespa/issues/33646) for details. This could be solved by adding an additional level of grouping, where you group on a field that is unique for each document (grouping on document id is not supported). You may then count the unique groups to determine the unique document count: ``` select * from purchase where true limit 0 | all(group(customer) each(group(item) output(count()))) ``` ## Impression forecasting Using impression logs for a given user, one can make a function that maps from rank score to the number of impressions an advertisement would get - example: ``` Score Integer (# impressions for this user) 0.200 0 0.210 1 0.220 2 0.240 3 0.320 4 0.420 5 0.560 6 0.700 7 0.800 8 0.880 9 0.920 10 0.940 11 0.950 12 ``` Storing just the first column (the rank scores, including a rank score for 0 impressions) in an array attribute named _impressions_, the grouping operation[interpolatedlookup(impressions, relevance())](../reference/querying/grouping-language.html#interpolatedlookup)can be used to figure out how many times a given advertisement would have been shown to this particular user. So if the rank score is 0.420 for a specific user/ad/bid combination, then `interpolatedlookup(impressions, relevance())` would return 5.0. If the bid is increased so the rank score gets to 0.490, it would get 5.5 as the return value instead. In this context, a count of 5.5 isn't meaningful for the past of a single user, but it gives more information that may be used as a forecast. Summing this across more, different users may then be used to forecast the total of future impressions for the advertisement. ## Aggregating over all documents Grouping is useful for analyzing data. To aggregate over the full document set, create _one_ group (which will have _all_ documents) by using a constant (here 1) - example: ``` select rating from restaurant where true | all(group(1) each(output(avg(price)))) ``` Make sure all documents have a value for the given field, if not, NaN is used, and the final result is also NaN: ``` ``` { "id": "group:long:1", "relevance": 0.0, "value": "1", "fields": { "avg(rating)": "NaN" } } ``` ``` ## Count fields with NaN Count number of documents missing a value for an [attribute](../content/attributes.html) field (actually, in this example, unset or less than 0, see the bucket expression below). 
Set a higher query timeout, just in case. Example, analyzing a field called _price_: ``` select rating from restaurant where true | all( group(predefined(price, bucket[-inf, 0>, bucket[0, inf>)) each(output(count())) ) ``` Example output, counting 2 documents with `-inf` in _rating_: ``` ``` "children": [ { "id": "group:long_bucket:-9223372036854775808:0", "relevance": 0.0, "limits": { "from": "-9223372036854775808", "to": "0" }, "fields": { "count()": 2 } }, { "id": "group:long_bucket:0:9223372036854775807", "relevance": 0.0, "limits": { "from": "0", "to": "9223372036854775807" }, "fields": { "count()": 8 } } ] ``` ``` See [analyzing field values](../writing/visiting.html#analyzing-field-values) for how to export ids of documents meeting given criteria from the full corpus. ## List fields with NaN This is similar to the counting of NaN above, but instead of aggregating the count, for each hit, print a [document summary](../reference/schemas/schemas.html#document-summary): ``` select rating from restaurant where true | all( group(predefined(price, bucket[-inf, 0>, bucket[0, inf>)) order(max(price)) max(1) each( max(100) each(output(summary()))) ) ``` Notes: - We are only interested in the first group, so order by `max(price)` and use `max(1)` to get only the first - Uses `max(100)` in order to limit result set sizes. Read more about [grouping.defaultmaxhits](../reference/api/query.html#grouping.defaultmaxhits). - Use the [continuation token](#pagination) to iterate over the result set. ## Grouping over a Map field In the example data, a record looks like: ``` ``` { "fields": { "attributes": { "delivery_method": "Curbside Pickup", "sales_rep": "Bonnie", "coupon": "SAVE10" }, "customer": "Smith", "date": 1157526000, "item": "Intake valve", "price": "1000", "tax": "0.24" } } ``` ``` The map field [schema definition](../reference/schemas/schemas.html#map) is: ``` field attributes type map { indexing: summary struct-field key { indexing: attribute } struct-field value { indexing: attribute } } ``` With this, one can group on both key (`delivery_method`, `sales_rep`, and `coupon`) and values (here counting each value). 
Try the link to see the output: ``` select * from purchase where true limit 0 | all( group(attributes.key) each( group(attributes.value) each(output(count()))) ) ``` A more interesting example is to see the sum per sales rep: ``` select * from purchase where true limit 0 | all( group(attributes.key) each( group(attributes.value) each(output(sum(price)))) ) ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Grouping Interface](#) - [The grouping language structure](#the-grouping-language-structure) - [Grouping by example](#grouping-by-example) - [Basic Grouping](#basic-grouping) - [Ordering and Limiting Groups](#ordering-and-limiting-groups) - [Example](#ordering-and-limiting-groups-example) - [Hits per Group](#hits-per-group) - [Example](#hits-per-group-example) - [Filter within a group](#filter-within-a-group) - [Example](#filter-example) - [Global limit for grouping queries](#global-limit) - [Examples](#resource-control-example) - [Combining with default limits for groups/hits](#global-limit-combining) - [Recommended settings](#global-limit-recommendation) - [Performance and Correctness](#performance-and-correctness) - [Nested Groups](#nested-groups) - [Structured grouping](#structured-grouping) - [Range grouping](#range-grouping) - [Pagination](#pagination) - [Expressions](#expressions) - [Search Container API](#search-container-api) - [TopN / Full corpus](#topn-full-corpus) - [Selecting groups](#selecting-groups) - [Ordering groups](#ordering-groups) - [Collecting aggregates](#collecting-aggregates) - [Grouping](#grouping) - [Time and date](#time-and-date) - [Timezones in grouping](#timezone-grouping) - [Counting unique groups](#counting-unique-groups) - [Counting unique groups - multivalue fields](#counting-unique-groups-multivalue-fields) - [Impression forecasting](#impression-forecasting) - [Aggregating over all documents](#aggregating-over-all-documents) - [Count fields with NaN](#count-fields-with-nan) - [List fields with NaN](#list-fields-with-nan) - [Grouping over a Map field](#grouping-over-a-map-field) --- # Source: https://docs.vespa.ai/en/security/guide.html.md # Security Guide Vespa Cloud has several security mechanisms it is important for developers to understand. Vespa Cloud has two different interaction paths, _Data Plane_ and_Control Plane_. Communication with the Vespa application goes through the _Data Plane_, while the _Control Plane_ is used to manage Vespa tenants and applications. The _Control Plane_ and the _Data Plane_ has different security mechanisms, described in this guide. ## SOC 2 Vespa.ai has a SOC 2 attestation - read more in the [Trust Center](https://trust.vespa.ai/). ## Data Plane Data plane requests are protected using mutual TLS, or optionally tokens. ### Configuring mTLS Certificates can be created using the[Vespa CLI](../clients/vespa-cli.html): ``` $ vespa auth cert --application .. ``` ``` $ vespa auth cert --application scoober.albums.default Success: Certificate written to security/clients.pem Success: Certificate written to $HOME/.vespa/scoober.albums.default/data-plane-public-cert.pem Success: Private key written to $HOME/.vespa/scoober.albums.default/data-plane-private-key.pem ``` The certificates can be created regardless of the application existence in Vespa Cloud. 
One can use this command to generate `security/clients.pem`for an application package: ``` $ cp $HOME/.vespa/scoober.albums.default/data-plane-public-cert.pem security/clients.pem ``` Certificates can also be created using OpenSSL: ``` $ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/clients.pem ``` The certificate is placed inside the application package in[security/clients.pem](../reference/applications/application-packages.html). Make sure`clients.pem` is placed correctly if the certificate is created with OpenSSL, while the Vespa CLI will handle this automatically. `security/clients.pem` files can contain multiple PEM encoded certificates by concatenating them. This allows you to have multiple clients with separate private keys, making it possible to rotate to a new certificate without any downtime. ### Permissions To support different permissions for clients, it is possible to limit the permissions of a client. Only `read` or `write` permissions are supported. #### Request mapping The request actions are mapped from HTTP method. The default mapping rule is: - GET → `read` - PUT, POST, DELETE → `write` For `/search/` this is replaced by: - GET, POST → `read` #### Example Create 3 different certificates, for three different use cases: - Serving - `read` - Ingest - `write` - Full access - `read, write` ``` $ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/serve.pem $ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/ingest.pem $ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/full_access.pem ``` Notes: - Files must be placed in the _security_ folder inside the application package - Certificates must be unique - Certificate chains are currently not supported - Files must be written using PEM encoding Reference the certificate files from services xml using the `clients` element: ``` ... ... ``` #### Custom request mapping The default mapping can be changed by overriding `requestHandlerSpec()`: ``` /** * Example overriding acl mapping of POST requests to read */ public class CustomAclHandler extends ThreadedHttpRequestHandler { private final static RequestHandlerSpec REQUEST_HANDLER_SPEC = RequestHandlerSpec.builder().withAclMapping( HttpMethodAclMapping.standard() .override(Method.POST, AclMapping.Action.READ) .build()) .build(); @Override public RequestHandlerSpec requestHandlerSpec() { return REQUEST_HANDLER_SPEC; } ``` ### Configuring tokens Application endpoints can also be configured with token based authentication. Note that it is still required to define at least one client for mTLS. **Note:** Token authentication must be explicitly enabled when used in combination with[Private Endpoints](../operations/private-endpoints.html). #### Creating tokens using the console Tokens are identified by a name, and can contain multiple versions to easily support token rotation. To create a new token: 1. In the [console](https://console.vespa.ai) tenant view, open **Account \> Tokens** 2. Click **Add token** 3. Enter a name you'll reference in the application later and click **Add**. Remember to copy the token value and store it securely. To add a new token _version_: 1. Find the existing token, click **Add version** 2. Select expiration and click **Add**. Copy the token value and store securely. To revoke a version: 1. Find the existing token version, click **Revoke** To manually rotate a token: 1. Add a new token _version_ following the above steps 2. 
Revoke the old version when no clients use the old version

#### Application configuration with token endpoints

After creating a token, it must be configured in your application's services.xml by adding the [clients](../reference/applications/services/container.html#clients) element to your container cluster(s). Here is an example with multiple container clusters and tokens (you may only have one):

```
...
...
...
...
```

#### Security recommendations

The cryptographic properties of token authentication vs mTLS are comparable. There are, however, a few key differences in how they are used:

- Tokens are sent as a header with every request - since they are part of the request, they are also more easily leaked in log outputs or source code (e.g. curl commands). It is therefore recommended to:
  - create tokens with a short expiry (keeping the default of 30 days).
  - keep tokens in a secret provider, and remember to hide output.
  - never commit secret tokens into source code repositories!

### Use endpoints

#### Using mTLS

Once the application is configured and deployed with a certificate in the application package, requests can be sent to the application. Again, the Vespa CLI can help use the correct certificate.

```
$ vespa curl --application .. /ApplicationStatus
```

```
$ curl --key $HOME/.vespa/scoober.albums.default/data-plane-private-key.pem \
  --cert $HOME/.vespa/scoober.albums.default/data-plane-public-cert.pem \
  $ENDPOINT
```

#### Using tokens

The token endpoint must be used when using tokens. After deployment is complete, the token endpoint will be available in the token endpoint list (marked “Token”). To use the token endpoint, send the token as a bearer authorization header:

```
$ vespa query \
  --header="Authorization: Bearer $TOKEN" \
  'yql=select * from music where album contains "head"'
```

```
curl -H "Authorization: Bearer $TOKEN" $ENDPOINT
```

#### Using a browser

In Vespa guides, curl is used in examples, like:

```
$ curl --cert ./data-plane-public-cert.pem --key ./data-plane-private-key.pem $ENDPOINT
```

To use a browser, install the key/cert pair into Keychain Access (macOS Sonoma), assuming the certificate Common Name is "cloud.vespa.example" (as in the guides):

1. Install the key/cert pair:
   ```
   $ cat data-plane-public-cert.pem data-plane-private-key.pem > pkcs12.pem
   $ openssl pkcs12 -export -out pkcs12.p12 -in pkcs12.pem
   ```
2. A new password will be requested; it will be used in the next steps.
3. In Keychain Access, with the login keychain:
   - Click "File" -> Import Items.
   - Choose the pkcs12.p12 file created before and type the password.
   - Double-click the imported certificate, open "Trust" and set "When using this certificate" to "Always Trust".
   - Right-click and "New Certificate Preference...", then add the $ENDPOINT.
4. Open the same URL in Chrome, choose the cloud.vespa.example certificate and allow Chrome to read the private key.

#### Using Postman

Many developers prefer interactive tools like [Postman](https://postman.com/). The Vespa blog has an article on [how to use Postman with Vespa](https://blog.vespa.ai/interface-with-vespa-apis-using-postman/).

#### Using Cloudflare Workers

See [Using Cloudflare Workers with Vespa Cloud](cloudflare-workers).

### Different credentials per instance

To use different credentials per [instance](../learn/tenant-apps-instances.html), use [services.xml variants](../operations/deployment-variants.html#services.xml-variants).
As an example, use this to have a separate mTLS keypair for production instances (use the same pattern if using tokens):

```
```

Depending on the [instance](../operations/automated-deployments.html) deployed to, a different keypair will be used for dataplane access. Use the same mechanism to have a dedicated credential for the [dev](../operations/environments.html#dev) environment, using `deploy:environment="dev"`.

## Control Plane

The control plane is used to manage the Vespa applications. There are two different ways to access the Control Plane: using `vespa auth login` to log in as a regular user, and using Application Keys. `vespa auth login` is intended for developers deploying manually to dev, while Application Keys are intended for deploying applications to production, typically by a continuous build tool. See more about these two methods below.

### Managing users

Tenant administrators manage user access through the Vespa Console.

![Vespa Console user management](/assets/img/manage-users.png)

Users have two different privilege levels:

- **Admin:** Can administrate the tenant's metadata and the users of the tenant.
- **Developer:** Can administrate the applications deployed in the tenant.

### User access to Control Plane

Outside the Vespa Console, the easiest way to communicate with the Control Plane is the [Vespa CLI](../clients/vespa-cli.html).

```
$ vespa auth login
Your Device Confirmation code is: ****-****
If you prefer, you can open the URL directly for verification
Your Verification URL: https://vespa.auth0.com/activate?user_code=****-****
Press Enter to open the browser to log in or ^C to quit...
Waiting for login to complete in browser ... done
Successfully logged in.
```

After logging in with the Vespa CLI, the CLI can be used to deploy applications. Users are logged in with the same privileges as their user has in the Vespa Console.

### Application Key

If programmatic access to the Control Plane is needed, for example from a CI/CD system like GitHub Actions, the Application Key can be used - see the example [deploy-vector-search.yaml](https://github.com/vespa-cloud/vector-search/blob/main/.github/workflows/deploy-vector-search.yaml).

#### Configuration

The Application Key can be generated in the Console from the Deployment Screen. The key is generated in the browser, and the private key appears as a download in the browser. The public key can be downloaded separately from the Deployment Screen. The private key is never persisted in Vespa Cloud, so it is important that the private key is kept securely. If lost, the private key is unrecoverable.

![Vespa Console application key management](/assets/img/application-key.png)

The Application Key can also be generated using the Vespa CLI:

```
$ vespa auth api-key -a ..
```

```
$ vespa auth api-key -a scoober.albums.default
Success: API private key written to $HOME/.vespa/scoober.api-key.pem
This is your public key:
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE5fQUq12J/IlQQdE8pWC5596S7x9f
HpPcyxCX2dXBS4aqKxnfN5HEyTkLCNGCo9HQljgLziqW1VFzshAdm3hHQg==
-----END PUBLIC KEY-----
Its fingerprint is:
91:1f:de:e3:9f:d3:21:28:1b:1b:05:40:52:72:81:4f
To use this key in Vespa Cloud click 'Add custom key' at
https://console.vespa-cloud.com/tenant/scoober/keys
and paste the entire public key including the BEGIN and END lines.
```

#### Using the application key

The Application Key can be used from the Vespa CLI to run requests against the Control Plane, such as deploying applications to Vespa Cloud:
``` $ vespa deploy -z dev.aws-us-east-1c ``` ## Dataplane access Vespa Cloud users on paid plans have access to Vespa Cloud Support. For cases where the Vespa Team needs access to the application's data to provide support, the Vespa support personnel can request access after an explicit approval from the customer in the open support case. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [SOC 2](#soc-2) - [Data Plane](#data-plane) - [Configuring mTLS](#configuring-mtls) - [Permissions](#permissions) - [Configuring tokens](#configuring-tokens) - [Use endpoints](#use-endpoints) - [Different credentials per instance](#different-credentials-per-instance) - [Control Plane](#control-plane) - [Managing users](#managing-users) - [User access to Control Plane](#user-access-to-control-plane) - [Application Key](#application-key) - [Dataplane access](#dataplane-access) --- # Source: https://docs.vespa.ai/en/reference/operations/health-checks.html.md # Health checks reference This is the reference for loadbalancer healthchecks to [containers](../../applications/containers.html). By default, a container configures an instance of [VipStatusHandler](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/handler/VipStatusHandler.java) to serve `/status.html`. This will respond with status code 200 and text _OK_ if content clusters are UP. See [VipStatus.java](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/handler/VipStatus.java) for details. Applications with multiple content clusters should implement custom handlers for healthchecks, if the built-in logic is inadequate for the usage. Also refer to [federation](../../querying/federation.html) for how to manage data sources. ## Override using a status file Use `container.core.vip-status` to make `VipStatusHandler` use a file for health status: ``` true /full-path-to/status-response.html ``` If the file exists, its contents will be served on `/status.html`, otherwise an error message will be generated. To remove a container from service, delete or rename the file to serve. ## Alternative / multiple paths `VipStatusHandler` only looks at a single file path by default. As it is independent of the URI path, it is possible to configure multiple handler instances to serve alternative or custom messages - example: ``` http://*:*/docproc/freshness-data.xml true /full-path-to/freshness-data.xml http://*:*/docproc/ClusteringDocproc.status true /full-path-to/ClusteringDocproc.status ``` The paths `/docproc/freshness-data.xml` and `/docproc/ClusteringDocproc.status` serves the files located at `/full-path-to/freshness-data.xml` and `/full-path-to/ClusteringDocproc.status`, respectively. As the handler instances are independent, a container can be taken out of one type of rotation without affecting another. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/reference/applications/hosts.html.md # hosts.xml _hosts.xml_ is a configuration file in an [application package](application-packages.html). Elements: ``` hosts[host [name]](#host)[alias](#alias) ``` The purpose of _hosts.xml_ is to add aliases for real hostnames to self-defined aliases. The aliases are used in [services.xml](services/services.html) to map service instances to hosts. It is only needed when deploying to multiple hosts. 
## host Sub-elements: - [`alias`](#alias) Example: ``` ``` SEARCH0 CONTAINER0 SEARCH1 CONTAINER1 ``` ``` ## alias Alias used in [services.xml](services/services.html) to refer to the host. Copyright © 2026 - [Cookie Preferences](#) --- # Source: https://docs.vespa.ai/en/learn/tutorials/http-api.html.md # Building an HTTP API using request handlers and processors This tutorial builds a simple application consisting of these pieces: - A custom REST API - implemented in a _request handler_. - Two pieces of request/response processing logic - implemented as two chained _processors_. - A _component_ shared by the above processors. - A custom output format - a _renderer_. The end result is to process incoming request of the form: ``` http://hostname:port/demo?terms=something%20completely%20different ``` into a nested structure response produced by the processors and serialized by the renderer. Use the sample application found at [http-api-using-request-handlers-and-processors](https://github.com/vespa-engine/sample-apps/tree/master/examples/http-api-using-request-handlers-and-processors). ## Request handler The custom request handler is required to implement a custom API. In many cases it is not necessary to add a custom handler as the Processors can access the request data directly. However, it is needed if e.g. your application wants more control over exactly which parameters are used to route to a particular processing chain. In this case, the request handler will simply add the request URI as a property and then forward to the built-in processing handler for processing. Review the code in [DemoHandler.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoHandler.java) ## Processors This application contains two processors, one for annotating the incoming request (using default values from config) and checking the result, and one for creating the result using the shared component. ### AnnotatingProcessor Review the code in [AnnotatingProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/AnnotatingProcessor.java) ### DataProcessor The other processor creates some structured Response Data from data handled to it in the request. This is done in cases where the web service is a processing service. In cases where the service is implementing some middleware on top of other services, similar processors will instead make outgoing requests to downstream web services to produce Response Data. Review the code in [DataProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DataProcessor.java) Notice how the task of the server is decomposed into separate Processing steps which can be composed by chaining at configuration time and which communicates through the Request and Response only. This structure enhances sharing, reuse and modularity and makes it easy to create variations where some logic encapsulated in a Processor is added, removed or modified. The order of the processors is decided by the @Before and @After annotations - refer to [chained components](../../applications/chaining.html). 
### Custom configuration The default terms used by the AnnotatingProcessor are placed in user configuration, where the definition is in [demo.def](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/resources/configdefinitions/demo.def): ``` package=com.mydomain.demo demo[].term string ``` In other words, a configuration class containing a single array named _demo_, containing a class Demo which only contains single string named _term_. ## Renderer The responsibility of the renderer is to serialize the structured result into bytes for transport back to the client. Rendering works by first creating a single instance of the renderer, invoking the constructor, then cloning a new renderer for each result set to be rendered. `init()` will be invoked once on each new clone before `render()` is invoked. Review the code in [DemoRenderer.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoRenderer.java) ## Shared component The responsibility of this custom component is to decouple some parts of the application from the Searcher. This makes it possible to reconfigure the Searcher without rebuilding the potentially costly custom component. In this case, what the component does is more than a little silly. More typical use would be an [FSA](/en/reference/operations/tools.html#vespa-makefsa) or complex, shared helper functionality. Review the code in [DemoComponent.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoComponent.java) ## Application Review the application's configuration in [services.xml](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/application/services.xml) ## Try it! Build the project, then [run a test](../../applications/developer-guide.html), querying [http://localhost:8080/demo?terms=1%202%203%204](http://localhost:8080/demo?terms=1%202%203%204) gives: ``` OK Renderer initialized: 1369733374898 http://localhost:8080/demo?terms=1%202%203%204 1 2 3 4 Rendering finished work: 1369733374902 ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Request handler](#request-handler) - [Processors](#processors) - [AnnotatingProcessor](#annotatingprocessor) - [DataProcessor](#dataprocessor) - [Custom configuration](#custom-configuration) - [Renderer](#renderer) - [Shared component](#shared-component) - [Application](#application) - [Try it!](#try-it) --- # Source: https://docs.vespa.ai/en/clients/http-best-practices.html.md # HTTP Best Practices ## Always re-use connections As connections to a JDisc container cluster are terminated at the individual container nodes, the cost of connection overhead will impact their serving capability. This is especially important for HTTPS/TLS as full TLS handshakes are expensive in terms of CPU cycles. A handshake also entails multiple network round-trips that certainly degrades request latency for new connections. A client instance should therefore re-use HTTPS connections if possible for subsequent requests. Note that some client implementation may not re-use connections by default. For instance _Apache HttpClient (Java)_[will by default not re-use connections when configured with a client X.509 certificate](https://stackoverflow.com/a/13049131/1615280). 
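As an illustration, the following minimal Python sketch (not part of the original text; the endpoint and certificate paths are assumptions) re-uses one HTTPS connection for several queries with `requests.Session`, avoiding a new TLS handshake per request:

```python
# Minimal sketch: keep a Session so the underlying TLS connection is pooled
# and re-used across requests. Endpoint and certificate paths are assumptions.
import requests

session = requests.Session()
session.cert = ("data-plane-public-cert.pem", "data-plane-private-key.pem")  # mTLS, if required

for user_query in ("test", "another test"):
    response = session.get(
        "https://my-app.example.com/search/",
        params={"yql": "select * from sources * where userQuery()", "query": user_query},
        timeout=5,
    )
    _ = response.text  # fully read the body so the connection can be re-used
```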
Most programmatic clients require the response content to be fully consumed/read for a connection to be reused.

## Use multiple connections

Clients performing feed/query must use a sufficient number of connections to spread the load evenly among all containers in a cluster. This is due to container clusters being served through a layer 4 load balancer (_Network Load Balancer_). Too few connections overall may result in an unbalanced workload, and some containers may not receive any traffic at all. This aspect is particularly relevant for applications with large container clusters and/or few client instances.

## Be aware of server-initiated connection termination

Vespa Cloud will terminate idle connections after a timeout and active connections after a max age threshold is exceeded. The latter is performed gracefully through mechanisms in the HTTP protocol.

- _HTTP/1.1_: A `Connection: close` header is added to the response for the subsequent request received after timeout.
- _HTTP/2_: A `GOAWAY` frame with error code `NO_ERROR (0x0)` is returned for the subsequent request received after timeout.

Be aware that some client implementations may not handle this scenario gracefully. Both the idle timeout and max age threshold are aggressive in order to regularly rebalance traffic. This ensures that new container nodes quickly receive traffic from existing client instances, for example when new resources are introduced by the [autoscaler](../operations/autoscaling.html).

To avoid connection termination issues, clients should either set the `Connection: close` header to explicitly close connections after each request, or configure client-side idle timeouts to **30 seconds or less**. Doing so proactively closes idle connections before the server does and helps prevent errors caused by server-initiated terminations.

## Prefer HTTP/2

We recommend _HTTP/2_ over _HTTP/1.1_. _HTTP/2_ multiplexes multiple concurrent requests over a single connection, and its binary protocol is more compact and efficient. See Vespa's documentation on [HTTP/2](../performance/http2.html) for more details.

## Be deliberate with timeouts and retries

Make sure to configure your clients with sensible timeouts and retry policies. Timeouts that are too low, combined with aggressive retries, may cause havoc on your Vespa application if latency increases due to overload. Handle _transient failures_ and _partial failures_ through a retry strategy with backoff, for instance _capped exponential backoff_ with a random _jitter_. Consider implementing a [_circuit-breaker_](https://martinfowler.com/bliki/CircuitBreaker.html) for failures persisting over a longer time-span.

Only retry requests on _server errors_ - not on _client errors_. A client should typically not retry requests after receiving a `400 Bad Request` response, or retry a TLS connection after the handshake fails because the client's X.509 certificate has expired.

Be careful when handling 5xx responses, especially `503 Service Unavailable` and `504 Gateway Timeout`. These responses typically indicate an overloaded system, and blindly retrying without backoff will only worsen the situation. Clients should reduce overall throughput when receiving such responses. The same principle applies to `429 Too Many Requests` responses from the [Document v1 API](../writing/document-v1-api-guide.html), which indicate that the client is exceeding the system's feed capacity.
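To make the backoff recommendation above concrete, here is a rough Python sketch, not an official client: it retries only on 429/503/504 responses, with capped exponential backoff and full jitter. The endpoint and payload are illustrative assumptions; prefer the Vespa CLI or vespa-feed-client for real feeding.

```python
# Minimal sketch of capped exponential backoff with jitter for retryable
# status codes. Endpoint, document and limits are illustrative assumptions.
import random
import time

import requests

RETRYABLE = {429, 503, 504}

def request_with_backoff(method, url, max_attempts=5, cap_seconds=30, **kwargs):
    for attempt in range(max_attempts):
        response = requests.request(method, url, timeout=10, **kwargs)
        if response.status_code not in RETRYABLE:
            return response  # success, or a client error that should not be retried
        # Sleep between 0 and min(cap, 2^attempt) seconds before retrying.
        time.sleep(random.uniform(0, min(cap_seconds, 2 ** attempt)))
    return response

response = request_with_backoff(
    "POST",
    "https://my-app.example.com/document/v1/mynamespace/music/docid/doc-1",
    json={"fields": {"album": "A Head Full of Dreams"}},
)
response.raise_for_status()
```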
Clients should implement strategies such as reducing the request rate by a specific percentage, introducing exponential backoff, or pausing requests for a short duration before retrying. These adjustments help prevent further overload and allow the system to recover. For more general advise on retries and timeouts see _Amazon Builder's Library_'s[excellent article](https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/) on the subject. Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Always re-use connections](#always-re-use-connections) - [Use multiple connections](#use-multiple-connections) - [Be aware of server-initiated connection termination](#be-aware-of-server-initiated-connection-termination) - [Prefer HTTP/2](#prefer-http2) - [Be deliberate with timeouts and retries](#be-deliberate-with-timeouts-and-retries) --- # Source: https://docs.vespa.ai/en/applications/http-servers-and-filters.html.md # Http servers and filters This document explains how to set up http servers and filters in the Container. Before proceeding, familiarize with the [Developer Guide](developer-guide.html). ## Set up Http servers To accept http requests on e.g. port 8090, add an `http` section with a server to _services.xml_: ``` ``` ``` ``` To verify that the new server is running, check the default handler on the root path, which will return a list of all http servers: ``` $ curl http://localhost:8090/ ``` Adding an `http` section to _services.xml_**disables the default http server** at port 8080. Binding to privileged ports (\< 1024) is supported. Note that this **only** works when running as a standalone container, and **not** when running as a Vespa cluster. ### Configure the HTTP Server Configuration settings for the server can be modified by setting values for the `jdisc.http.connector` config inside the `server` element: ``` ``` false ``` ``` Note that it is not allowed to set the `listenPort` in the http-server config, as it conflicts with the port that is set in the _port_ attribute in the _server_ element. For a complete list of configuration fields that can be set, refer to the config definition schema in [jdisc.http.connector.def](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/resources/configdefinitions/jdisc.http.jdisc.http.connector.def). ### TLS TLS can be configured using either the [ssl](../reference/applications/services/http.html#ssl) or the [ssl-provider](../reference/applications/services/http.html#ssl-provider) element. ``` ``` /path/to/private-key.pem /path/to/certificate.pem /path/to/ca-certificates.pem want TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLSv1.2,TLSv1.3 ``` ``` Refer to the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application for an example. ## Set up Filter Chains There are two main types of filters: - request filters - response filters Request filters run before the handler that processes the request, and response filters run after. They are used for tasks such as authentication, error checking and modifying headers. ### Using Filter Chains Filter chains are set up by using the `request-chain` and `response-chain` elements inside the [filtering](../reference/applications/services/http.html#filtering) element. 
Example setting up two request filter chains, and one response filter chain: ``` ``` ``` ``` Filters that should be used in more than one chain, must be defined directly in the `filtering` element, as shown with `request-filter1` in the example above. To actually use a filter chain, add one or more URI [bindings](../reference/applications/services/http.html#binding): ``` ``` http://*/* http://*/* ``` ``` These bindings say that both the request chain and the response chain should be used when the request URI matches `http://*/*`. So both a request filter chain and a response filter chain can be used on a single request. However, only one request chain will be used if there are multiple request chains that have a binding that matches a request. And vice versa for response chains. Refer to the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/jdisc_core/latest/com/yahoo/jdisc/application/UriPattern.html) for information about which chain that will be used in such cases. In order to bind a filter chain to a specific _server_, add the server port to the binding: ``` ``` http://*:8080/* http://*:9000/* ``` ``` A request must match a filter chain if any filter is configured. A 403 response is returned for non-matching request. This semantic can be disabled - see [strict-mode](../reference/applications/services/http.html#filtering). #### Excluding Filters from an Inherited Chain Say you have a request filter chain that you are binding to most of your URIs. Now, you want to run almost the same chain on another URI, but you need to exclude one of the filters. This is done by adding `excludes`, which takes a space separated list of filter ids, to the [chain element](../reference/applications/services/http.html#chain). Example where a security filter is excluded from an inherited chain for _status.html_: ``` ``` http://*/status.html ``` ``` ### Creating a custom Filter Create an [application package](developer-guide.html) with artifactId `filter-bundle`. Create a new file `filter-bundle/components/src/main/java/com/yahoo/demo/TestRequestFilter.java`: ``` ``` package com.yahoo.demo; import com.yahoo.jdisc.*; import com.yahoo.jdisc.handler.*; import com.yahoo.jdisc.http.*; import com.yahoo.jdisc.http.filter.RequestFilter; import java.net.*; import java.nio.ByteBuffer; public class TestRequestFilter extends AbstractResource implements RequestFilter { @Override public void filter(HttpRequest httpRequest, ResponseHandler responseHandler) { if (isLocalAddress(httpRequest.getRemoteAddress())) { rejectRequest(httpRequest, responseHandler); } else { httpRequest.context().put("X-NOT-LOCALHOST", "true"); } } private boolean isLocalAddress(SocketAddress socketAddress) { if (socketAddress instanceof InetSocketAddress) { InetAddress address = ((InetSocketAddress)socketAddress).getAddress(); return address.isAnyLocalAddress() || address.isLoopbackAddress(); } else { return false; } } private void rejectRequest(HttpRequest request, ResponseHandler responseHandler) { HttpResponse response = HttpResponse.newInstance(request, Response.Status.FORBIDDEN); ContentChannel channel = responseHandler.handleResponse(response); channel.write(ByteBuffer.wrap("Not accessible by localhost.".getBytes()), null); channel.close(null); } } ``` ``` Build a bundle, and place it in the [application package](../basics/applications.html)'s _components_ directory. 
Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Set up Http servers](#set-up-http-servers) - [Configure the HTTP Server](#configure-the-http-server) - [TLS](#tls) - [Set up Filter Chains](#set-up-filter-chains) - [Using Filter Chains](#using-filter-chains) - [Creating a custom Filter](#creating-a-custom-filter) --- # Source: https://docs.vespa.ai/en/reference/applications/services/http.html.md # services.xml - http This is the reference for the `http` subelement of[container](container.html) in [services.xml](services.html). The http block is used to configure http servers and filters - when this element is present, the default http server is disabled. ``` http[server [id, port]](#server)[ssl](#ssl)[private-key-file](#private-key-file)[certificate-file](#certificate-file)[ca-certificates-file](#ca-certificates-file)[client-authentication](#client-authentication)[protocols](#protocols)[cipher-suites](#cipher-suites)[ssl-provider [class, bundle]](#ssl-provider)[filtering](#filtering)[filter [id, class, bundle, provides, before, after]](#filter)[provides](#provides)[before](#before)[after](#after)[filter-config](#filter-config)[request-/response-chain [id, inherits, excludes]](#chain)[binding](#binding)[filter [id, class, bundle, provides, before, after]](#filter)[provides](#provides)[before](#before)[after](#after)[filter-config](#filter-config)[inherits](#inherits)[chain](#inheritedchain)[exclude](#exclude)[phase [id, before, after]](#phase)[before](#before)[after](#after) ``` Most elements takes optional [config](../config-files.html#generic-configuration-in-services-xml) elements, see example in [server](#server). Note: To bind the search handler port (i.e. the handler for queries), refer to [search bindings](search.html#binding). Example: ``` http://*/* http://*:8080/* http://*:9000/path ``` ## server The definition of a http server. Configure the server using[jdisc.http.connector.def](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/resources/configdefinitions/jdisc.http.jdisc.http.connector.def). | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | string | | The component ID | | port | optional | number | The web services port of the [environment variables](/en/operations/self-managed/files-processes-and-ports.html#environment-variables) | Server port | | default-request-chain | optional | string | | The default request chain to use for unmatched requests | | default-response-chain | optional | string | | The default response chain to use for unmatched requests | Example: ``` 90 ``` ## ssl Setup TLS on HTTP server using credentials provided in PEM format. ## private-key-file Path to private key file in PEM format. ## certificate-file Path to certificate file in PEM format. ## ca-certificates-file Path to CA certificates file in PEM format. ## client-authentication Client authentication. Supported values: _disabled_, _want_ or _need_. ## protocols Comma-separated list of TLS protocol versions to enable. Example: _TLSv1.2,TLSv1.3_. ## cipher-suites Comma-separated list of TLS cipher suites to enable. The specified ciphers must be supported by JDK installation. Example: _TLS\_AES\_256\_GCM\_SHA384,TLS\_ECDHE\_ECDSA\_WITH\_AES\_256\_GCM\_SHA384_. ## ssl-provider Setup TLS on the HTTP server through a programmatic Java interface. The specified class must implement the [SslProvider](https://javadoc.io/doc/com.yahoo.vespa/container-disc/latest/com/yahoo/jdisc/http/SslProvider.html) interface. 
| Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | class | required | string | | The class name | | bundle | required | string | | The bundle name | ## filtering `filtering` is for configuring http filter chains. Sub-elements: - [filter](#filter) - [request-chain](#chain) - [response-chain](#chain) Example: ``` http://*/ http://*/ ``` | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | strict-mode | optional | boolean | true | When set to true, all requests must match a filter. For any requests not matching, an HTTP 403 response is returned. | ## binding Specifies that requests/responses matching the given URI pattern should be sent through the [request-chain/response-chain](#chain). ## filter The definition of a single filter, for referencing when defining chains. If a single filter is to be used in different chains, it is cleaner to define it directly under `http` and then refer to it with `id`, than defining it inline separately for each chain. The following filter types are supported: - RequestFilter - ResponseFilter - SecurityRequestFilter - SecurityResponseFilter Security[Request/Response]Filters are automatically wrapped in Security[Request/Response]FilterChains. This makes them behave like regular Request/Response filters with respect to chaining. | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | string | | The component ID | | class | optional | string | id | The class of the component, defaults to id | | bundle | optional | string | id or class | The bundle to load the component from, defaults to class or id (if no class is given) | | before | optional | string | | Space separated list of phases and/or filters which should succeed this phase | | class | optional | string | id | Space separated list of phases and/or filters which should precede this phase | Sub-elements: - [provides](#provides) - [before](#before) - [after](#after) - [filter-config](#filter-config) Example: ``` ``` ## provides A name provided by a filter for phases and other filters to use as dependencies. Contained in [filter](#filter) and [filter](#filter) (in chain). ## before The name of a phase or filter which should succeed this phase or filter. `before` tags may be used if it is necessary to define filters or phases which always should succeed this filter or phase in a chain. In other words, the phase or filter defined is placed _before_ name in the tag. Contained in [filter](#filter), [filter](#filter) (in chain) and [phase](#phase). ## after The name of a phase or filter which should precede this phase or filter. `after` tags may be used if it is necessary to define filters or phases which always should precede this filter or phase in a chain. In other words, the phase or filter defined is placed _after_ the name in the tag. Contained in [filter](#filter), [filter](#filter) (in chain) and [phase](#phase). Example: ``` Authorization LastFilters Earlyfilters ``` ## filter-config Only used to configure filters that are configured with `com.yahoo.jdisc.http.filter.security.FilterConfig`. This is the case for all filters provided in JDisc bundles. ## request-chain/response-chain Defines a chain of request filters or response filters, respectively. A chain is a set ordered by dependencies. Dependencies are expressed through phases, which may depend upon other phases, or filters. 
| Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | inherits | | string | | A space separated list of chains this chain should include the contents of | | excludes | | string | | A space separated list of filters (contained in an inherited chain) this chain should not include | Sub-elements: - [binding](#binding) - [filter](#filter). Refer to or define a filter. _config_ or _filter-config_ can not be added to references, only filter definitions. - [inherits](#inherits) - [phase](#phase) Examples: ``` http://*/* http://*:8080/* http://*:9000/path ``` ## inherits Wrapper element for information about which chains, if any, a chain should inherit, and how. Contained in [request-chain](#chain) and [response-chain](#chain). Sub-elements: - (inherited) [chain](#inheritedchain) - [exclude](#exclude) ## (inherited) chain The ID of a chain which this chain should inherit, i.e. include all filters and phases from. Use multiple `chain` tags if it is necessary to combine the filters from multiple chains. Contained in [inherits](#inherits). ## exclude A filter the chain under definition should exclude from the chain or chains it inherits from. Use multiple `exclude` tags to exclude multiple filters. Contained in [inherits](#inherits). Example: ``` idOfSomeInheritedChain idOfUnwantedFilter idOfYetAnotherUnwantedFilter ``` ## phase Defines a phase, which is a checkpoint to help order filters. Filters and other phases may depend on a phase to be able to make assumptions about the order of filters. Contained in [chain](#chain). | Attribute | Required | Value | Default | Description | | --- | --- | --- | --- | --- | | id | required | string | | The ID, or name, which other phases and filters may depend upon as a [successor](#before) or [predecessor](#after) | | before | optional | string | | Space separated list of phases and/or filters which should succeed this phase | | after | optional | string | | Space separated list of phases and/or filters which should precede this phase | Sub-elements: - [before](#before) - [after](#after) Example: ``` Authorization ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [server](#server) - [ssl](#ssl) - [private-key-file](#private-key-file) - [certificate-file](#certificate-file) - [ca-certificates-file](#ca-certificates-file) - [client-authentication](#client-authentication) - [protocols](#protocols) - [cipher-suites](#cipher-suites) - [ssl-provider](#ssl-provider) - [filtering](#filtering) - [binding](#binding) - [filter](#filter) - [provides](#provides) - [before](#before) - [after](#after) - [filter-config](#filter-config) - [request-chain/response-chain](#chain) - [inherits](#inherits) - [(inherited) chain](#inheritedchain) - [exclude](#exclude) - [phase](#phase) --- # Source: https://docs.vespa.ai/en/performance/http2.html.md # HTTP/2 This document contains HTTP/2 performance considerations on the container—see [Container tuning](container-tuning.html) for general tuning of container clusters. ## Enabling HTTP/2 on container HTTP/2 is enabled by default on a container for all connectors. We recommend HTTP/2 with TLS, both for added security, but also for a more robust connection upgrade mechanism. Web browsers will typically only allow HTTP/2 over TLS. ### HTTP/2 with TLS Both HTTP/1.1 and HTTP/2 will be served over the same connector using the [TLS ALPN Extension](https://datatracker.ietf.org/doc/html/rfc7301). 
The Application-Layer Protocol Negotiation (ALPN) extension allows the client to send a list of supported protocols during the TLS handshake. The container selects a supported protocol from that list.

The [HTTP/2 specification](https://datatracker.ietf.org/doc/html/rfc7540) dictates multiple requirements for the TLS connection. Vespa may enforce some or all of these restrictions. See the HTTP/2 specification for the full list. The most significant are listed below:

- Client must use at least TLSv1.2.
- Client must provide target domain with the TLS Server Name Indication (SNI) Extension.
- Client must not use any of the banned [TLSv1.2 ciphers](https://datatracker.ietf.org/doc/html/rfc7540#appendix-A).

### HTTP/2 without TLS

The jdisc container supports both mechanisms for HTTP/2 without TLS - see [testing](#testing):

1. Upgrading to HTTP/2 from HTTP/1
2. HTTP/2 with prior knowledge

## Feeding over HTTP/2

One of the major improvements with HTTP/2 is multiplexing of multiple concurrent requests over a single TCP connection. This allows for high-throughput feeding through the [/document/v1/](../reference/api/document-v1.html) HTTP API, with a simple one-operation–one-request model, but without the overhead of hundreds of parallel connections that HTTP/1.1 would require for sufficient concurrency.

`vespa feed` in the [Vespa CLI](../clients/vespa-cli.html#documents) and [vespa-feed-client](../clients/vespa-feed-client.html) use /document/v1/ over HTTP/2.

## Performance tuning

### Client

The maximum number of concurrent requests per connection is typically adjustable in HTTP/2 clients/libraries. The Document v1 API is designed for high concurrency and can easily handle thousands of concurrent requests. Its implementation is asynchronous, and max concurrency is not restricted by a thread pool size, so configure your client to allow enough concurrent requests/streams to saturate the feed container. Other APIs, such as the [Query API](../querying/query-api.html), are backed by a synchronous implementation, and max concurrency is restricted by the [underlying thread pool size](container-tuning.html#container-worker-threads). Too many concurrent streams may result in the container rejecting requests with 503 responses.

There are also still some reasons to use multiple TCP connections—even with HTTP/2:

- **Utilize multiple containers**. A single container may not saturate the content layer. A client may have to use more connections than container nodes if the containers are behind a load balancer.
- **Higher throughput**. Many clients allow only a single thread to operate each connection. Multiple connections may be required to utilize several CPU cores.

## Client recommendations

Use [vespa-feed-client](../clients/vespa-feed-client.html) for feeding through the Document v1 API (JDK8+).

We recommend the [h2load benchmarking tool](https://nghttp2.org/documentation/h2load-howto.html) for load testing. [vespa-fbench](/en/reference/operations/tools.html#vespa-fbench) does not support HTTP/2 at the moment.

For Java there are 4 good alternatives:

1. [Jetty Client](https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/client/HttpClient.html)
2. [OkHttp](https://square.github.io/okhttp/)
3. [Apache HttpClient 5.x](https://hc.apache.org/httpcomponents-client-5.1.x/)
4. [java.net.http.HttpClient (JDK11+)](https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html)

## Testing

The server does not perform a protocol upgrade if a request contains content (POST, PUT, PATCH with payload).
This might be a limitation in Jetty, the HTTP server used in Vespa. Any client should assume HTTP/2 supported - example using `curl --http2-prior-knowledge`: ``` $ curl -i --http2-prior-knowledge \ -X POST -H 'Content-Type: application/json' \ --data-binary @ext/A-Head-Full-of-Dreams.json \ http://127.0.0.1:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreamsHTTP/2 200date: Tue, 06 Dec 2022 11:04:13 GMT content-type: application/json;charset=utf-8 vary: Accept-Encoding content-length: 122 ``` Copyright © 2026 - [Cookie Preferences](#) ### On this page: - [Enabling HTTP/2 on container](#enabling-http-2-on-container) - [HTTP/2 with TLS](#http-2-with-tls) - [HTTP/2 without TLS](#http-2-without-tls) - [Feeding over HTTP/2](#feeding-over-http-2) - [Performance tuning](#performance-tuning) - [Client](#client) - [Client recommendations](#client-recommendations) - [Testing](#testing) --- # Source: https://docs.vespa.ai/en/learn/tutorials/hybrid-search.html.md # Hybrid Text Search Tutorial Hybrid search combines different retrieval methods to improve search quality. This tutorial distinguishes between two core components of search: - **Retrieval**: Identifying a subset of potentially relevant documents from a large corpus. Traditional lexical methods like [BM25](../../ranking/bm25.html) excel at this, as do modern, embedding-based [vector search](../../querying/vector-search-intro.html) approaches. - **Ranking**: Ordering retrieved documents by relevance to refine the results. Vespa's flexible [ranking framework](../../basics/ranking.html) enables complex scoring mechanisms. This tutorial demonstrates building a hybrid search application with Vespa that leverages the strengths of both lexical and embedding-based approaches. We'll use the [NFCorpus](https://ir-datasets.com/nfcorpus.html) dataset from the [BEIR](https://github.com/beir-cellar/beir) benchmark and explore various hybrid search techniques using Vespa's query language and ranking features. The main goal is to set up a text search app that combines simple text scoring features such as [BM25](../../ranking/bm25.html) [1](#fn:1) with vector search in combination with text-embedding models. We demonstrate how to obtain text embeddings within Vespa using Vespa's [embedder](../../rag/embedding.html#huggingface-embedder)functionality. In this guide, we use [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs) as the text embedding model. It is a small model that is fast to run and has a small memory footprint. **Prerequisites:** - Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with [Podman Desktop](https://podman.io/) or [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed, with an engine running. - Alternatively, start the Podman daemon: ``` $ podman machine init --memory 6000 $ podman machine start ``` - See [Docker Containers](/en/operations/self-managed/docker-containers.html) for system limits and other settings. - For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html). - Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations/self-managed/node-setup.html#memory-settings). - Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/writing/feed-block.html). - [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/clients/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). 
- Python3
- `curl`

## Installing vespa-cli and ir\_datasets

This tutorial uses [Vespa-CLI](../../clients/vespa-cli.html) to deploy, feed, and query Vespa. We also use [ir-datasets](https://ir-datasets.com/) to obtain the NFCorpus relevance dataset.

```
$ pip3 install --ignore-installed vespacli ir_datasets ir_measures requests
```

We can quickly look at a document from [nfcorpus](https://ir-datasets.com/beir.html#beir/nfcorpus):

```
$ ir_datasets export beir/nfcorpus docs --format jsonl | head -1
```

Which outputs:

```
{"doc_id": "MED-10", "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995\u20132003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08\u20139.0 years) 6,011 participants died, of which 3,619 (60.2%) was due to breast cancer. After adjustment for age, tumor characteristics, and treatment selection, both post-diagnostic and pre-diagnostic statin use were associated with lowered risk of breast cancer death (HR 0.46, 95% CI 0.38\u20130.55 and HR 0.54, 95% CI 0.44\u20130.67, respectively). The risk decrease by post-diagnostic statin use was likely affected by healthy adherer bias; that is, the greater likelihood of dying cancer patients to discontinue statin use as the association was not clearly dose-dependent and observed already at low-dose/short-term use. The dose- and time-dependence of the survival benefit among pre-diagnostic statin users suggests a possible causal effect that should be evaluated further in a clinical trial testing statins\u2019 effect on survival in breast cancer patients.", "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland", "url": "http://www.ncbi.nlm.nih.gov/pubmed/25329299"}
```

The NFCorpus documents have four fields:

- The `doc_id` and `url`
- The `text` and the `title`

We are interested in the title and the text, and we want to be able to search across these two fields. We also need to store the `doc_id` to evaluate [ranking](../../basics/ranking.html) accuracy.

We will create a small script that converts the above output to [Vespa JSON document](../../reference/schemas/document-json-format.html) format. Create a `convert.py` file:

```
import sys
import json

for line in sys.stdin:
    doc = json.loads(line)
    del doc['url']
    vespa_doc = {
        "put": "id:hybrid-search:doc::%s" % doc['doc_id'],
        "fields": {
            **doc
        }
    }
    print(json.dumps(vespa_doc))
```

With this script, we convert the document dump to Vespa JSON format. Use the following command to convert the entire dataset to Vespa JSON format:

```
$ ir_datasets export beir/nfcorpus docs --format jsonl | python3 convert.py > vespa-docs.jsonl
```

Now, we will create the Vespa application package and schema to index the documents.
## Create a Vespa Application Package

A [Vespa application package](../../application-packages.html) is a set of configuration files and optional Java components that together define the behavior of a Vespa system. Let us define the minimum set of required files to create our hybrid text search application: `doc.sd` and `services.xml`.

```
$ mkdir -p app/schemas
```

### Schema

A [schema](../../basics/schemas.html) is a document-type configuration; a single Vespa application can have multiple schemas with document types. For this application, we define a schema `doc`, which must be saved in a file named `schemas/doc.sd` in the application package directory. Write the following to `app/schemas/doc.sd`:

```
schema doc {

    document doc {

        field language type string {
            indexing: "en" | set_language
        }

        field doc_id type string {
            indexing: attribute | summary
            match: word
        }

        field title type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }

        field text type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }
    }

    fieldset default {
        fields: title, text
    }

    field embedding type tensor<float>(v[384]) {
        indexing: input title." ".input text | embed | attribute
        attribute {
            distance-metric: angular
        }
    }

    rank-profile bm25 {
        first-phase {
            expression: bm25(title) + bm25(text)
        }
    }

    rank-profile semantic {
        inputs {
            query(e) tensor<float>(v[384])
        }
        first-phase {
            expression: closeness(field, embedding)
        }
    }
}
```

A lot is happening here; let us go through it in detail.

#### Document type and fields

The `document` section contains the fields of the document, their types, and how Vespa should index and [match](../../reference/schemas/schemas.html#match) them.

The field property `indexing` configures the _indexing pipeline_ for a field. For more information, see [schemas - indexing](../../basics/schemas.html#document-fields). The [string](../../reference/schemas/schemas.html#string) data type represents both unstructured and structured texts, and there are significant differences between [index and attribute](../../querying/text-matching.html#index-and-attribute). The above schema spells out the default `match` modes for the `attribute` and `index` fields for visibility. Note that we are enabling [BM25](../../ranking/bm25.html) for `title` and `text` by including `index: enable-bm25`.

The language field is the only field that is not part of the NFCorpus dataset. We hardcode its value to "en" since the dataset is English. Using `set_language` avoids automatic language detection and uses the value when processing the other text fields. Read more in [linguistics](../../linguistics/linguistics.html).

#### Fieldset for matching across multiple fields

[Fieldset](../../reference/schemas/schemas.html#fieldset) allows searching across multiple fields. Defining `fieldset` does not add indexing/storage overhead. String fields grouped using fieldsets must share the same [match](../../reference/schemas/schemas.html#match) and [linguistic processing](../../linguistics/linguistics.html) settings because the query processing that searches a field or fieldset uses _one_ type of transformation.

#### Embedding inference

Our `embedding` vector field is of [tensor](../../ranking/tensor-user-guide.html) type with a single named dimension (`v`) of 384 values.

```
field embedding type tensor<float>(v[384]) {
    indexing: input title." ".input text | embed arctic | attribute
    attribute {
        distance-metric: angular
    }
}
```

The `indexing` expression creates the input to the `embed` inference call (in our example the concatenation of the title and the text field).
Since the dataset is small, we do not specify `index`, which would build [HNSW](../../querying/approximate-nn-hnsw) data structures for faster (but approximate) vector search.

This guide uses [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs) as the text embedding model. The model is trained with cosine similarity, which maps to Vespa's `angular` [distance-metric](../../reference/schemas/schemas.html#distance-metric) for nearestNeighbor search.

#### Ranking to determine matched documents ordering

You can define many [rank profiles](../../basics/ranking.html), named collections of score calculations, and ranking phases. As a starting point, we have two simple rank profiles:

- a `bm25` rank-profile that uses [BM25](../../ranking/bm25.html). We sum the two field-level BM25 scores using a Vespa [ranking expression](../../ranking/ranking-expressions-features.html).
- a `semantic` rank-profile which is used in combination with Vespa's nearestNeighbor query operator (vector search).

Both profiles specify a single [ranking phase](../../ranking/phased-ranking.html).

### Services Specification

The [services.xml](../../reference/applications/services/services.html) defines the services that make up the Vespa application: which services to run and how many nodes per service. Write the following to `app/services.xml` (the embedder model URLs below point at the snowflake-arctic-embed-xs files on Hugging Face; adjust them if you host the model files elsewhere):

```
<?xml version="1.0" encoding="utf-8" ?>
<services version="1.0">

    <container id="default" version="1.0">
        <search/>
        <document-api/>
        <component id="arctic" type="hugging-face-embedder">
            <transformer-model url="https://huggingface.co/Snowflake/snowflake-arctic-embed-xs/resolve/main/onnx/model_quantized.onnx"/>
            <tokenizer-model url="https://huggingface.co/Snowflake/snowflake-arctic-embed-xs/raw/main/tokenizer.json"/>
            <pooling-strategy>cls</pooling-strategy>
            <prepend>
                <query>Represent this sentence for searching relevant passages: </query>
            </prepend>
        </component>
    </container>

    <content id="content" version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="doc" mode="index"/>
        </documents>
    </content>

</services>
```

Some notes about the elements above:

- `<container>` defines the [container cluster](../../applications/containers.html) for document, query and result processing.
- `<search>` sets up the [query endpoint](../../querying/query-api.html). The default port is 8080.
- `<document-api>` sets up the [document endpoint](../../reference/api/document-v1.html) for feeding.
- `<component>` with type `hugging-face-embedder` configures the embedder in the application package. This includes where to fetch the model files from, the prepend instructions, and the pooling strategy. See [huggingface-embedder](../../rag/embedding.html#huggingface-embedder) for details and other embedders supported.
- `<content>` defines how documents are stored and searched.
- `<redundancy>` denotes how many copies to keep of each document.
- `<documents>` assigns the document types in the _schema_ to content clusters.

## Deploy the application package

Once we have finished writing our application package, we can deploy it. We use settings similar to those in the [Vespa quick start guide](../../basics/deploy-an-application-local.html).

Start the Vespa container:

```
$ docker run --detach --name vespa-hybrid --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa
```

Notice that we publish two ports: 8080 is the data-plane where we write and query documents, and 19071 is the control-plane where we can deploy the application. Note that the data-plane port is inactive before deploying the application.

Configure the Vespa CLI to use the local container:

```
$ vespa config set target local
```

Starting the container can take a short while. Make sure that the configuration service is running by using `vespa status`:

```
$ vespa status deploy --wait 300
```

Now, deploy the Vespa application from the `app` directory:

```
$ vespa deploy --wait 300 app
```

## Feed the data

The data fed to Vespa must match the document type in the schema. This step performs embed inference inside Vespa using the snowflake arctic embedding model. Remember the `component` definition in `services.xml` and the `embed` call in the schema.
```
$ vespa feed -t http://localhost:8080 vespa-docs.jsonl
```

The output should look like this (rates may vary depending on your machine hardware):

```
{
  "feeder.operation.count": 3633,
  "feeder.seconds": 148.515,
  "feeder.ok.count": 3633,
  "feeder.ok.rate": 24.462,
  "feeder.error.count": 0,
  "feeder.inflight.count": 0,
  "http.request.count": 3633,
  "http.request.bytes": 2985517,
  "http.request.MBps": 0.020,
  "http.exception.count": 0,
  "http.response.count": 3633,
  "http.response.bytes": 348320,
  "http.response.MBps": 0.002,
  "http.response.error.count": 0,
  "http.response.latency.millis.min": 316,
  "http.response.latency.millis.avg": 787,
  "http.response.latency.millis.max": 1704,
  "http.response.code.counts": {
    "200": 3633
  }
}
```

Notice:

- `feeder.ok.rate`, which is the throughput (note that this step includes embedding inference). See [embedder-performance](../../rag/embedding.html#embedder-performance) for details on embedding inference performance. In this case, embedding inference is the bottleneck for overall indexing throughput.
- `http.response.code.counts` matches `feeder.ok.count`. The dataset has 3633 documents.

Note that if you observe any `429` responses, these are harmless. Vespa asks the client to slow down the feed speed because of resource contention.

## Sample queries

We can now run a few sample queries to demonstrate various ways to perform searches over this data using the [Vespa query language](../../querying/query-language.html).

```
$ ir_datasets export beir/nfcorpus/test queries --fields query_id text | head -1
```

```
PLAIN-2 Do Cholesterol Statin Drugs Cause Breast Cancer?
```

If you see a pipe related error from the above command, you can safely ignore it. Here, `PLAIN-2` is the query id of the first test query. We'll use this test query to demonstrate querying Vespa.

### Lexical search with BM25 scoring

The following query uses [weakAnd](../../ranking/wand.html), where `targetHits` is a hint of how many documents we want to expose to configurable [ranking phases](../../ranking/phased-ranking.html). Refer to the [text search tutorial](text-search.html#querying-the-data) for more on querying with `userInput`.

```
$ vespa query \
  'yql=select * from doc where {targetHits:10}userInput(@user-query)' \
  'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'hits=1' \
  'language=en' \
  'ranking=bm25'
```

Notice that we use `ranking` to specify which rank profile to rank the documents retrieved by the query. This query returns the following [JSON result response](../../reference/querying/default-result-format.html):

```
{
  "root": {
    "id": "toplevel",
    "relevance": 1.0,
    "fields": {
      "totalCount": 46
    },
    "coverage": {
      "coverage": 100,
      "documents": 3633,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:doc:doc::MED-10",
        "relevance": 25.521817426330887,
        "source": "content",
        "fields": {
          "sddocname": "doc",
          "documentid": "id:doc:doc::MED-10",
          "doc_id": "MED-10",
          "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland",
          "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995–2003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08–9.0 years) 6,011 participants died, of which 3,619 (60.2%) was due to breast cancer. After adjustment for age, tumor characteristics, and treatment selection, both post-diagnostic and pre-diagnostic statin use were associated with lowered risk of breast cancer death (HR 0.46, 95% CI 0.38–0.55 and HR 0.54, 95% CI 0.44–0.67, respectively). The risk decrease by post-diagnostic statin use was likely affected by healthy adherer bias; that is, the greater likelihood of dying cancer patients to discontinue statin use as the association was not clearly dose-dependent and observed already at low-dose/short-term use. The dose- and time-dependence of the survival benefit among pre-diagnostic statin users suggests a possible causal effect that should be evaluated further in a clinical trial testing statins’ effect on survival in breast cancer patients."
        }
      }
    ]
  }
}
```

The query retrieves and ranks `MED-10` as the most relevant document. Notice the `totalCount`, which is the number of documents that were retrieved for ranking phases. In this case, we exposed about 50 documents to first-phase ranking, which is higher than our target, but also fewer than the total number of documents that match any query terms. In the example below, we change the grammar from the default `weakAnd` to `any`, and the query matches 1780 documents, or almost 50% of the indexed documents.

```
$ vespa query \
  'yql=select * from doc where {targetHits:10, grammar:"any"}userInput(@user-query)' \
  'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'hits=1' \
  'language=en' \
  'ranking=bm25'
```

The bm25 rank profile calculates the relevance score (~25.521), which is configured in the schema as:

```
rank-profile bm25 {
    first-phase {
        expression: bm25(title) + bm25(text)
    }
}
```

So, in this case, `relevance` is the sum of the two BM25 scores. The retrieved document looks relevant; we can look at the graded judgment for this query `PLAIN-2`. The following exports the query relevance judgments (we grep for the query id that we are interested in):

```
$ ir_datasets export beir/nfcorpus/test qrels | grep "PLAIN-2 "
```

The following is the output from the above command. Notice line two: the `MED-10` document retrieved above is judged as very relevant with grade 2 (perfect) for the query\_id PLAIN-2. This dataset has graded relevance judgments, where a grade of 1 is less relevant than 2. Documents retrieved by the system without a relevance judgment are assumed to be irrelevant (grade 0).
``` PLAIN-2 0 MED-2427 2 PLAIN-2 0 MED-10 2 PLAIN-2 0 MED-2429 2 PLAIN-2 0 MED-2430 2 PLAIN-2 0 MED-2431 2 PLAIN-2 0 MED-14 2 PLAIN-2 0 MED-2432 2 PLAIN-2 0 MED-2428 1 PLAIN-2 0 MED-2440 1 PLAIN-2 0 MED-2434 1 PLAIN-2 0 MED-2435 1 PLAIN-2 0 MED-2436 1 PLAIN-2 0 MED-2437 1 PLAIN-2 0 MED-2438 1 PLAIN-2 0 MED-2439 1 PLAIN-2 0 MED-3597 1 PLAIN-2 0 MED-3598 1 PLAIN-2 0 MED-3599 1 PLAIN-2 0 MED-4556 1 PLAIN-2 0 MED-4559 1 PLAIN-2 0 MED-4560 1 PLAIN-2 0 MED-4828 1 PLAIN-2 0 MED-4829 1 PLAIN-2 0 MED-4830 1 ``` ### Dense search using text embedding Now, we turn to embedding-based retrieval, where we embed the query text using the configured text-embedding model and perform an exact `nearestNeighbor` search. We use [embed query](../../rag/embedding.html#embedding-a-query-text) to produce the input tensor `query(e)`, defined in the `semantic` rank-profile in the schema. ``` $ vespa query \ 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,e)' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'input.query(e)=embed(@user-query)' \ 'hits=1' \ 'ranking=semantic' ``` This query returns the following [JSON result response](../../reference/querying/default-result-format.html): ``` ``` { "root": { "id": "toplevel", "relevance": 1.0, "fields": { "totalCount": 64 }, "coverage": { "coverage": 100, "documents": 3633, "full": true, "nodes": 1, "results": 1, "resultsFull": 1 }, "children": [ { "id": "id:doc:doc::MED-2429", "relevance": 0.6061378635706601, "source": "content", "fields": { "sddocname": "doc", "documentid": "id:doc:doc::MED-2429", "doc_id": "MED-2429", "title": "Statin use and risk of breast cancer: a meta-analysis of observational studies.", "text": "Emerging evidence suggests that statins' may decrease the risk of cancers. However, available evidence on breast cancer is conflicting. We, therefore, examined the association between statin use and risk of breast cancer by conducting a detailed meta-analysis of all observational studies published regarding this subject. PubMed database and bibliographies of retrieved articles were searched for epidemiological studies published up to January 2012, investigating the relationship between statin use and breast cancer. Before meta-analysis, the studies were evaluated for publication bias and heterogeneity. Combined relative risk (RR) and 95 % confidence interval (CI) were calculated using a random-effects model (DerSimonian and Laird method). Subgroup analyses, sensitivity analysis, and cumulative meta-analysis were also performed. A total of 24 (13 cohort and 11 case-control) studies involving more than 2.4 million participants, including 76,759 breast cancer cases contributed to this analysis. We found no evidence of publication bias and evidence of heterogeneity among the studies. Statin use and long-term statin use did not significantly affect breast cancer risk (RR = 0.99, 95 % CI = 0.94, 1.04 and RR = 1.03, 95 % CI = 0.96, 1.11, respectively). When the analysis was stratified into subgroups, there was no evidence that study design substantially influenced the effect estimate. Sensitivity analysis confirmed the stability of our results. Cumulative meta-analysis showed a change in trend of reporting risk of breast cancer from positive to negative in statin users between 1993 and 2011. Our meta-analysis findings do not support the hypothesis that statins' have a protective effect against breast cancer. 
More randomized clinical trials and observational studies are needed to confirm this association with underlying biological mechanisms in the future." } } ] } }
```

The result of this vector-based search differed from the previous sparse keyword search, with a different relevant document at position 1. In this case, the relevance score is 0.606, calculated by the `closeness` function in the `semantic` rank-profile. Note that more documents were retrieved than the `targetHits`.

```
rank-profile semantic {
    inputs {
        query(e) tensor<float>(v[384])
    }
    first-phase {
        expression: closeness(field, embedding)
    }
}
```

[closeness(field, embedding)](../../reference/ranking/rank-features.html#attribute-match-features-normalized) is a ranking feature that calculates the cosine similarity between the query and the document embedding. It returns the inverse of the distance between the two vectors: a small distance gives a higher closeness. This is because Vespa sorts results in descending order of relevance, so the largest scores appear at the top of the ranked list. Note that similarity scores of embedding vectors are often optimized via contrastive or ranking losses, which makes them difficult to interpret.

## Evaluate ranking accuracy

The previous section demonstrated how to combine the Vespa query language with rank profiles to implement two different retrieval and ranking strategies.

In the following section, we evaluate all 323 test queries with both models to compare their overall effectiveness, measured using [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain). `nDCG@10` is the official evaluation metric of the BEIR benchmark and is an appropriate metric for test sets with graded relevance judgments.

For this evaluation task, we need to write a small script. The following script iterates over the queries in the test set, executes each query against the Vespa instance, and reads the response from Vespa. It then evaluates and prints the metric. The overall effectiveness is measured as the average of the per-query `nDCG@10` scores.
Save the following script as `evaluate_ranking.py`:

```
import requests
import ir_datasets
from ir_measures import calc_aggregate, nDCG, ScoredDoc
from enum import Enum
from typing import List


class RModel(Enum):
    SPARSE = 1
    DENSE = 2
    HYBRID = 3


def parse_vespa_response(response: dict, qid: str) -> List[ScoredDoc]:
    result = []
    hits = response['root'].get('children', [])
    for hit in hits:
        doc_id = hit['fields']['doc_id']
        relevance = hit['relevance']
        result.append(ScoredDoc(qid, doc_id, relevance))
    return result


def search(query: str, qid: str, ranking: str, hits=10, language="en", mode=RModel.SPARSE) -> List[ScoredDoc]:
    yql = "select doc_id from doc where ({targetHits:100}userInput(@user-query))"
    if mode == RModel.DENSE:
        yql = "select doc_id from doc where ({targetHits:10}nearestNeighbor(embedding, e))"
    elif mode == RModel.HYBRID:
        yql = "select doc_id from doc where ({targetHits:100}userInput(@user-query)) OR ({targetHits:10}nearestNeighbor(embedding, e))"
    query_request = {
        'yql': yql,
        'user-query': query,
        'ranking.profile': ranking,
        'hits': hits,
        'language': language
    }
    if mode == RModel.DENSE or mode == RModel.HYBRID:
        query_request['input.query(e)'] = "embed(@user-query)"
    response = requests.post("http://localhost:8080/search/", json=query_request)
    if response.ok:
        return parse_vespa_response(response.json(), qid)
    else:
        print("Search request failed with response " + str(response.json()))
        return []


def main():
    import argparse
    parser = argparse.ArgumentParser(description='Evaluate ranking models')
    parser.add_argument('--ranking', type=str, required=True, help='Vespa ranking profile')
    parser.add_argument('--mode', type=str, default="sparse", help='retrieval mode, valid values are sparse, dense, hybrid')
    args = parser.parse_args()
    mode = RModel.HYBRID
    if args.mode == "sparse":
        mode = RModel.SPARSE
    elif args.mode == "dense":
        mode = RModel.DENSE

    dataset = ir_datasets.load("beir/nfcorpus/test")
    results = []
    metrics = [nDCG@10]
    for query in dataset.queries_iter():
        qid = query.query_id
        query_text = query.text
        results.extend(search(query_text, qid, args.ranking, mode=mode))
    metrics = calc_aggregate(metrics, dataset.qrels, results)
    print("Ranking metric NDCG@10 for rank profile {}: {:.4f}".format(args.ranking, metrics[nDCG@10]))


if __name__ == "__main__":
    main()
```

Then execute the script:

```
$ python3 evaluate_ranking.py --ranking bm25 --mode sparse
```

The script will produce the following output:

```
Ranking metric NDCG@10 for rank profile bm25: 0.3210
```

Now, we can evaluate the dense model using the same script:

```
$ python3 evaluate_ranking.py --ranking semantic --mode dense
```

```
Ranking metric NDCG@10 for rank profile semantic: 0.3077
```

Note that the _average_ `nDCG@10` score is computed across all the 323 test queries. You can also experiment beyond a single metric and modify the script to calculate more [measures](https://ir-measur.es/en/latest/measures.html), for example, including precision with a relevance label cutoff of 2:

```
metrics = [nDCG@10, P(rel=2)@10]
```

Also note that the exact nDCG@10 values may vary slightly between runs.

## Hybrid Search & Ranking

We demonstrated and evaluated two independent retrieval and ranking strategies in the previous sections. Now, we want to explore hybrid search techniques where we combine:

- traditional lexical keyword matching with a text scoring method (BM25)
- embedding-based search using a text embedding model

With Vespa, there is a distinction between retrieval (matching) and configurable [ranking](../../basics/ranking.html).
In the Vespa ranking phases, we can express arbitrary scoring complexity with the full power of the Vespa [ranking](../../basics/ranking.html) framework. Meanwhile, top-k retrieval relies on simple built-in functions associated with Vespa's top-k query operators. These top-k operators aim to avoid scoring all documents in the collection for a query, using a simplistic scoring function to identify the top-k documents. They use `index` structures and heuristics to accelerate query evaluation while avoiding scoring every document.

In the context of hybrid text search, the following Vespa top-k query operators are relevant:

- YQL `{targetHits:k}nearestNeighbor()` for dense representations (text embeddings), using a configured [distance-metric](../../reference/schemas/schemas.html#distance-metric) as the scoring function.
- YQL `{targetHits:k}userInput(@user-query)`, which by default uses [weakAnd](../../ranking/wand.html) for sparse representations.

We can combine these operators using boolean query operators like AND/OR/RANK to express a hybrid search query. Then, there is a wide range of ways that we can combine various signals in [ranking](../../basics/ranking.html).

### Define our first simple hybrid rank profile

First, we add a simple hybrid rank profile that multiplies the dense and sparse components into a single score:

```
closeness(field, embedding) * (1 + bm25(title) + bm25(text))
```

- the [closeness(field, embedding)](../../reference/ranking/rank-features.html#attribute-match-features-normalized) rank-feature returns a normalized score in the range 0 to 1 inclusive
- the per-field BM25 scores are unbounded (0 to infinity)

We add a bias constant (1) to avoid the overall score becoming 0 if the document does not match any query terms, as the BM25 scores would then be 0. We also add `match-features` to be able to debug each of the scores.

```
schema doc {

    document doc {

        field language type string {
            indexing: "en" | set_language
        }

        field doc_id type string {
            indexing: attribute | summary
            match: word
        }

        field title type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }

        field text type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }
    }

    fieldset default {
        fields: title, text
    }

    field embedding type tensor<float>(v[384]) {
        indexing: input title." ".input text | embed | attribute
        attribute {
            distance-metric: angular
        }
    }

    rank-profile hybrid {
        inputs {
            query(e) tensor<float>(v[384])
        }
        first-phase {
            expression: closeness(field, embedding) * (1 + (bm25(title) + bm25(text)))
        }
        match-features: bm25(title) bm25(text) closeness(field, embedding)
    }
}
```

Now, re-deploy the Vespa application from the `app` directory:

```
$ vespa deploy --wait 300 app
```

After that, we can start experimenting with how to express hybrid queries using the Vespa query language.

### Hybrid query examples

The following demonstrates combining the two top-k query operators using the Vespa query language. In a later section, we will show how to combine the two retrieval strategies using the Vespa ranking framework. This section focuses on the top-k retrieval part that exposes matched documents to the Vespa [ranking](../../basics/ranking.html) phase(s).

#### Hybrid query using the OR operator

The following query exposes documents to ranking that match the query using _either (OR)_ the sparse or dense representation.
```
$ vespa query \
  'yql=select * from doc where ({targetHits:10}userInput(@user-query)) or ({targetHits:10}nearestNeighbor(embedding,e))' \
  'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'input.query(e)=embed(@user-query)' \
  'hits=1' \
  'language=en' \
  'ranking=hybrid'
```

The documents retrieved into ranking are scored by the `hybrid` rank-profile. Note that both top-k query operators might expose more than the `targetHits` setting. The above query returns the following [JSON result response](../../reference/querying/default-result-format.html):

```
{
  "root": {
    "id": "toplevel",
    "relevance": 1.0,
    "fields": {
      "totalCount": 87
    },
    "coverage": {
      "coverage": 100,
      "documents": 3633,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    "children": [
      {
        "id": "id:doc:doc::MED-10",
        "relevance": 15.898915593367988,
        "source": "content",
        "fields": {
          "matchfeatures": {
            "bm25(text)": 17.35556767018612,
            "bm25(title)": 8.166249756144769,
            "closeness(field,embedding)": 0.5994655395517325
          },
          "sddocname": "doc",
          "documentid": "id:doc:doc::MED-10",
          "doc_id": "MED-10",
          "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland",
          "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995–2003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08–9.0 years) 6,011 participants died, of which 3,619 (60.2%) was due to breast cancer. After adjustment for age, tumor characteristics, and treatment selection, both post-diagnostic and pre-diagnostic statin use were associated with lowered risk of breast cancer death (HR 0.46, 95% CI 0.38–0.55 and HR 0.54, 95% CI 0.44–0.67, respectively). The risk decrease by post-diagnostic statin use was likely affected by healthy adherer bias; that is, the greater likelihood of dying cancer patients to discontinue statin use as the association was not clearly dose-dependent and observed already at low-dose/short-term use. The dose- and time-dependence of the survival benefit among pre-diagnostic statin users suggests a possible causal effect that should be evaluated further in a clinical trial testing statins’ effect on survival in breast cancer patients."
        }
      }
    ]
  }
}
```

What is going on here is that we are combining the two top-k query operators using a boolean OR (disjunction). The `totalCount` is the number of documents retrieved into ranking (about 90, which is higher than 10 + 10). The `relevance` is the score assigned by the `hybrid` rank-profile. Notice that the `matchfeatures` field shows all the feature scores. This is useful for debugging and understanding the ranking behavior, and also for feature logging.
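To see how the `hybrid` rank profile produces this score, plug the `matchfeatures` values above into its first-phase expression (numbers rounded):

```
closeness(field, embedding) * (1 + bm25(title) + bm25(text))
  = 0.599466 * (1 + 8.166250 + 17.355568)
  ≈ 15.8989
```

which matches the `relevance` of the top hit.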
#### Hybrid query with AND operator

The following combines the two top-k operators using AND, meaning that the retrieved documents must match both the sparse and dense top-k operators:

```
$ vespa query \
  'yql=select * from doc where ({targetHits:10}userInput(@user-query)) and ({targetHits:10}nearestNeighbor(embedding,e))' \
  'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'input.query(e)=embed(@user-query)' \
  'hits=1' \
  'language=en' \
  'ranking=hybrid'
```

For the sparse keyword query matching, the `weakAnd` operator is used by default, and it requires that at least one term in the query matches the document (in the searched fieldset).

#### Hybrid query with rank query operator

The following combines the two top-k operators using the [rank](../../reference/querying/yql.html#rank) query operator, which retrieves using only the first operand, while the remaining operands allow computing (match) features that can be used in ranking phases. This query is meaningful because we can use the computed features in the ranking expressions but retrieve only by the dense representation. This is usually the most resource-effective way to combine the two representations.

```
$ vespa query \
  'yql=select * from doc where rank(({targetHits:10}nearestNeighbor(embedding,e)), ({targetHits:10}userInput(@user-query)))' \
  'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'input.query(e)=embed(@user-query)' \
  'hits=1' \
  'language=en' \
  'ranking=hybrid'
```

We can also invert the order of the operands to the `rank` query operator, so that we retrieve by the sparse representation but use the dense representation to compute features for ranking. This is very useful in cases where we do not want to build HNSW indexes (which add memory usage and slow down indexing), but still want to be able to use semantic signals in ranking phases.

```
$ vespa query \
  'yql=select * from doc where rank(({targetHits:10}userInput(@user-query)),({targetHits:10}nearestNeighbor(embedding,e)))' \
  'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'input.query(e)=embed(@user-query)' \
  'hits=1' \
  'language=en' \
  'ranking=hybrid'
```

This way of performing hybrid retrieval retrieves only by the sparse representation and uses the dense vector representation to compute features for ranking.

## Hybrid ranking

In the previous section, we demonstrated combining the two top-k query operators using boolean query operators. This section will show how to combine the two retrieval strategies using the Vespa ranking framework.

Let us first evaluate the effectiveness of the `hybrid` rank profile that combines the two retrieval strategies:

```
$ python3 evaluate_ranking.py --ranking hybrid --mode hybrid
```

Which outputs:

```
Ranking metric NDCG@10 for rank profile hybrid: 0.3287
```

The `nDCG@10` score is slightly higher than for the profiles that only use one of the ranking strategies.

Now, we can experiment with more complex ranking expressions that combine the two retrieval strategies. We add a few more rank profiles to the schema that combine the two retrieval strategies in different ways.
```
schema doc {

    document doc {

        field language type string {
            indexing: "en" | set_language
        }

        field doc_id type string {
            indexing: attribute | summary
            match: word
        }

        field title type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }

        field text type string {
            indexing: index | summary
            match: text
            index: enable-bm25
        }
    }

    fieldset default {
        fields: title, text
    }

    field embedding type tensor<float>(v[384]) {
        indexing: input title." ".input text | embed | attribute
        attribute {
            distance-metric: angular
        }
    }

    rank-profile hybrid {
        inputs {
            query(e) tensor<float>(v[384])
        }
        first-phase {
            expression: closeness(field, embedding) * (1 + (bm25(title) + bm25(text)))
        }
        match-features: bm25(title) bm25(text) closeness(field, embedding)
    }

    rank-profile hybrid-sum inherits hybrid {
        first-phase {
            expression: closeness(field, embedding) + ((bm25(title) + bm25(text)))
        }
    }

    rank-profile hybrid-normalize-bm25-with-atan inherits hybrid {
        function scale(val) {
            expression: 2*atan(val/8)/(3.14159)
        }
        function normalized_bm25() {
            expression: scale(bm25(title) + bm25(text))
        }
        function cosine() {
            expression: cos(distance(field, embedding))
        }
        first-phase {
            expression: normalized_bm25 + cosine
        }
        match-features {
            normalized_bm25
            cosine
            bm25(title)
            bm25(text)
        }
    }

    rank-profile hybrid-rrf inherits hybrid-normalize-bm25-with-atan {
        function bm25_score() {
            expression: bm25(title) + bm25(text)
        }
        global-phase {
            rerank-count: 100
            expression: reciprocal_rank(bm25_score) + reciprocal_rank(cosine)
        }
        match-features: bm25(title) bm25(text) bm25_score cosine
    }

    rank-profile hybrid-linear-normalize inherits hybrid-normalize-bm25-with-atan {
        function bm25_score() {
            expression: bm25(title) + bm25(text)
        }
        global-phase {
            rerank-count: 100
            expression: normalize_linear(bm25_score) + normalize_linear(cosine)
        }
        match-features: bm25(title) bm25(text) bm25_score cosine
    }
}
```

Now, re-deploy the Vespa application from the `app` directory:

```
$ vespa deploy --wait 300 app
```

Let us break down the new rank profiles:

- `hybrid-sum` combines the two retrieval strategies using addition. This is a simple way to combine the two strategies, but since the BM25 scores are not normalized (unbounded) while the closeness score is normalized (0-1), the BM25 scores will dominate the closeness score.
- `hybrid-normalize-bm25-with-atan` combines the two strategies using a normalized BM25 score and the cosine similarity. The BM25 scores are normalized using the `atan` function.
- `hybrid-rrf` combines the two strategies using reciprocal rank fusion: the `reciprocal_rank` normalizer turns the BM25 score and the cosine similarity into ranks, and the reciprocal ranks are summed.
- `hybrid-linear-normalize` combines the two strategies by normalizing the BM25 score and the cosine similarity linearly (`normalize_linear`) before adding them.

The last two profiles use `global-phase` to rerank the top 100 documents using the reciprocal rank and linear normalization functions. This can only be done in the global phase, as it requires access to all the documents that are retrieved into ranking; in a multi-node setup, this requires communication between the nodes and knowledge of the score distribution across all the nodes. In addition, each ranking phase can only order the documents by a single score.

### Evaluate the new rank profiles

Adding new rank-profiles is a hot change.
Once we have deployed the application, we can evaluate the new hybrid profiles using the script:

```
$ python3 evaluate_ranking.py --ranking hybrid-sum --mode hybrid
```

```
Ranking metric NDCG@10 for rank profile hybrid-sum: 0.3244
```

```
$ python3 evaluate_ranking.py --ranking hybrid-normalize-bm25-with-atan --mode hybrid
```

```
Ranking metric NDCG@10 for rank profile hybrid-normalize-bm25-with-atan: 0.3410
```

```
$ python3 evaluate_ranking.py --ranking hybrid-rrf --mode hybrid
```

```
Ranking metric NDCG@10 for rank profile hybrid-rrf: 0.3195
```

```
$ python3 evaluate_ranking.py --ranking hybrid-linear-normalize --mode hybrid
```

```
Ranking metric NDCG@10 for rank profile hybrid-linear-normalize: 0.3387
```

On this particular dataset, the `hybrid-normalize-bm25-with-atan` rank profile performs the best, but the difference is small. This also demonstrates that hybrid search and ranking is a complex problem and that the effectiveness of a hybrid model depends on the dataset and the retrieval strategies. These results (which profile performs best) might not transfer to your specific retrieval use case and dataset, so it is important to evaluate the effectiveness of a hybrid model on your specific dataset. See [Improving retrieval with LLM-as-a-judge](https://blog.vespa.ai/improving-retrieval-with-llm-as-a-judge/) for more information on how to collect relevance judgments for your dataset.

### Summary

We showed how to express hybrid queries using the Vespa query language and how to combine the two retrieval strategies using the Vespa ranking framework. We also showed how to evaluate the effectiveness of the hybrid ranking models using one of the datasets that are part of the BEIR benchmark. We hope this tutorial has given you a good understanding of how to combine different retrieval strategies using Vespa, and shown that there is not a single silver bullet for all retrieval problems.

## Cleanup

```
$ docker rm -f vespa-hybrid
```

1. Robertson, Stephen and Zaragoza, Hugo and others, 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval.
[↩](#fnref:1)

---

# Source: https://docs.vespa.ai/en/applications/ide-support.html.md

# IDE support

Vespa provides plugins for working with schemas and rank profiles in IDEs:

- VSCode: [VS Code extension](https://marketplace.visualstudio.com/items?itemName=vespaai.vespa-language-support)
- Cursor, code-server and other VS Code compatible IDEs: [VS Code extension in Open VSX registry](https://open-vsx.org/extension/vespaai/vespa-language-support)
- IntelliJ, PyCharm or WebStorm: [Jetbrains plugin](https://plugins.jetbrains.com/plugin/18074-vespa-schema-language-support)
- Vim: [neovim](https://blog.vespa.ai/interns-languageserver/#neovim-plugin)

If you are working with non-trivial Vespa applications, installing a plugin is highly recommended!

![IDE demo](/assets/img/ide.gif)

---

# Source: https://docs.vespa.ai/en/content/idealstate.html.md

# Distribution Algorithm

The distribution algorithm decides what nodes should be responsible for a given bucket. This is used directly in the clients to calculate the distributor to talk to. Content nodes need time to move buckets when the distribution is changing, so routing to content nodes is done using the tracked current state. The distribution algorithm decides which content nodes are wanted to store the bucket copies though, and due to this, the algorithm is also referred to as the ideal state algorithm.

The input to the distribution algorithm is a bucket identifier, together with knowledge about what nodes are available, and what their capacities are. The output of the distribution algorithm is a sorted list of the available nodes. The first node in the order is the node most preferred to handle a given bucket. Currently, the highest order distributor node will be the owning distributor, and the redundancy factor decides how many of the highest order content nodes are preferred to store copies for a bucket. To enable minimal transfer of buckets when the list of available nodes changes, the removal or addition of nodes should not alter the sort order of the remaining nodes.

Desired qualities for the ideal state algorithm:

- **Minimal reassignment on cluster state change**
  - If a node goes down, only buckets that resided on that node should be reassigned.
  - If a node comes up, only buckets that are moved to the new node should relocate.
  - Increasing the capacity of a single node should only move buckets to that node.
  - Reducing the capacity of a single node should only move buckets away from that node.
- **No skew in distribution**
  - Nodes should get an amount of data relative to their capacity.
- **Lightweight**
  - A simple algorithm that is easy to understand is a plus. Being lightweight to calculate is also a plus, giving more options for how to use it, without needing to cache results.

## Computational cost

When considering how efficient the algorithm has to be, it is important to consider how often we need to calculate the ideal locations. Calculations are needed for the following tasks:

- A client needs to map buckets to the distributors. If few buckets exist, all the results can be cached in clients, but for larger clusters, a lot of buckets may need to exist to create an even distribution, and caching becomes more memory intensive. Preferably the computational cost is cheap enough that no caching is needed. Currently, no caching is done by clients, but there are typically fewer than a million buckets, so caching all results would still have been viable.
- Distributors need to calculate the ideal state for a single bucket to verify that incoming operations are mapped to the correct distributor (clients have a cluster state matching the distributor). This could be eliminated for buckets pre-existing in the bucket database, which would be true in almost all cases. Currently, the calculation is done for all requests.
- Distributors need to calculate the correct content nodes to create bucket copies on when operations to currently non-existing buckets come in. This typically only happens at the start of the cluster lifetime though. Normally buckets are created through splitting or joining existing buckets.
- Distributors need to calculate the ideal state to check if any maintenance operations need to be done for a bucket.
- Content nodes need to calculate the ideal state for a single bucket to verify that the correct distributor sent the request. This could be cached or served through the bucket database, but currently there is no need.

As long as the algorithm is cheap, we can avoid needing to cache the result. The cache will then not limit scalability, and we have fewer dependencies and less complexity within the content layer. The current algorithm has shown itself cheap enough that little caching has been needed.

## A simple example: Modulo

A simple approach would be to use a modulo operation to find the most preferred node, and then just order the nodes in configured order from there, skipping nodes that are currently not available:

$$\text{most preferred node} = \text{bucket} \bmod \text{nodecount}$$

Properties:

- Computationally lightweight and easy to understand.
- Perfect distribution among nodes.
- Total redistribution on state change.

By just skipping currently unavailable nodes, nodes can go down and up with minimal movement. However, if the number of configured nodes changes, practically all buckets will be redistributed. As the content layer is intended to be scalable, this breaks with one of the intentions, and this algorithm has thus not been considered.

## Weighted random election

This is the algorithm that is currently used for distribution in the content layer, as it fits our use case well. To avoid a total redistribution on state change, the mapping can not be heavily dependent on the number of nodes in the cluster. By using random numbers, we can distribute the buckets randomly between the nodes, in such a fashion that altering the cluster state has a small impact.
As we need the result to be reproducible, we obviously need to use a pseudo-random number generator and not real random numbers. The idea is as follows: To find the location of a given bucket, seed a random number generator with the bucket identifier, then draw one number for each node. The drawn numbers decide the preferred node order for that specific bucket. For this to be reproducible, all nodes need to draw the same numbers each time.

Each node is assigned a distribution key in the configuration. This key decides which random number the node will be assigned. For instance, a node with distribution key 13 will be assigned the 14th random number generated (as the first goes to the node with key 0). The existence of this node then also requires us to always generate at least 14 random numbers to do the calculation. Thus, one may end up calculating random numbers for nodes that are currently not available, either because they are temporarily down, or because the configuration has left holes in the distribution key space. It is recommended not to leave too large holes in the distribution key space, to avoid wasting too much computation.

Using this approach, if you add another node to the cluster, a number will be drawn for it for every bucket. It should thus steal ownership of some of the buckets. As all the numbers are random, it will steal buckets from all the other nodes; given that the bucket count is large compared to the number of nodes, it will steal on average 1/n of the buckets from each pre-existing node, where n is the number of nodes in the current cluster. Likewise, if a node is removed from the cluster, the remaining nodes will divide the extra load between them.

### Weighting nodes

By enforcing all the numbers drawn to be floating point numbers between 0 and 1, we can introduce node weights using the following formula:

$${r}^{\frac{1}{c}}$$

Where r is the floating point number between 0 and 1 that was drawn for a given node, and c is the node capacity, which is the weight of the node. Proof not included here, but this ends up giving each node on average an amount of data proportional to its capacity. That is, for any two nodes X and Y, the number of buckets given to X should be equal to the number of buckets given to Y multiplied by capacity(X)/capacity(Y) (given a perfectly random distribution).

Altering the weight in a running system will also create a minimal redistribution of data. If we reduce the capacity of a node, its drawn numbers will be reduced, and some of its buckets will be taken over by the other nodes, and vice versa if the capacity is increased.

Properties:

- Minimum data movement on state changes.
- Some skew, depending on how good the random number generator is, the number of nodes we have to divide buckets between, and the number of buckets we have to divide between them.
- Fairly cheap to compute given a reasonable number of nodes and an inexpensive pseudo-random number generator.

### Distribution skew

The algorithm does generate a bit of skew in the distribution, as it is essentially random. The following attributes decrease the skew:

- Having more buckets to distribute.
- Having fewer targets (nodes and partitions) to distribute buckets to.
- Having a more uniform pseudo-random function.

The more buckets exist, the more metadata needs to be tracked in the distributors though, and operations that want to scan all the buckets will take longer.
Additionally, the backend may want buckets above a given size to improve performance, storage efficiency or similar. Consequently, we typically want to enforce enough buckets for a decent distribution, but not more. When the number of nodes increases, more buckets need to exist to keep the distribution even. If the number of nodes is doubled, the number of buckets must typically more than double to keep the distribution equally even. Thus, this scales worse than linearly. It does not scale much worse though, and this has not proved to be a practical problem for the cluster sizes we have used up until now (a cluster size of a thousand nodes does not seem to be any issue here).

Having a good and uniform pseudo-random function makes the distribution more even. However, this may require more computationally heavy generators. Currently, we are using a simple and fast algorithm, and it has proved more than sufficient for our needs.

The distribution to distributors is done to create an even distribution between the nodes. The distributors are free to split the buckets further if the backend wants buckets to contain less data. They can not use fewer buckets than are needed for distribution though. By using a minimal number of buckets for distribution, the distributors have more freedom to control the sizes of buckets.

### Distribution waste

To measure how many buckets are needed to create a decent distribution, a metric is needed. We have defined a waste metric for this purpose as follows: Distribute the units (buckets) to all the nodes. Assume the size of all units is identical. Assume the node with the most units assigned to it is at 100% capacity. The wasted space is the percentage of unused capacity compared to the total capacity. This definition seems useful as a cluster is considered at full capacity once one of its partitions is at full capacity. Having one node with more buckets than the rest is thus damaging, while having one node with fewer buckets than the rest is just fine.

Example: There are 4 nodes distributing 18 units. The node with the most units has 6. Distribution waste is `100% * (4 * 6 - 18) / (4 * 6) = 25%`.

Below we have calculated waste based on the number of nodes and the number of buckets to distribute between them. Bits refer to the distribution bits used. A distribution bit count of 16 indicates that there will be 2^16 buckets. The calculations assume all buckets have the same size. This is normally close to true as documents are randomly assigned to buckets. There will be lots of buckets per node too, so a little variance typically evens out fairly well. The tables below assume only one partition exists on each node. If you have 4 partitions on 16 nodes, you should rather use the values for `4 * 16 = 64` nodes. A higher redundancy factor indicates more buckets to distribute between the same number of nodes, resulting in a more even distribution. Doubling the redundancy has the same effect as adding one to the distribution bit count. To get values for redundancy 4, the redundancy 2 values can be used, and then the waste will be equal to the value with one less distribution bit used.

### Calculated waste from various cluster sizes

A value of 1 indicates 100% waste. A value of 0.1 indicates 10% waste. A waste below 1% is shown green, below 10% as yellow and below 30% as orange. Red indicates more than 30% waste.
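To make the waste metric concrete, the following is a minimal sketch in Python (not Vespa's actual implementation; the helper names are hypothetical, and Python's `random` module stands in for the real pseudo-random generator, so the numbers will not match the tables below exactly). It simulates the weighted random election for equal-capacity nodes with redundancy 1 and computes the resulting waste:

```
import random

def ideal_node(bucket: int, node_count: int, capacities=None) -> int:
    """Weighted random election: seed with the bucket id, draw one number per
    distribution key, optionally weight by capacity, and pick the highest."""
    rng = random.Random(bucket)                     # reproducible per bucket
    draws = [rng.random() for _ in range(node_count)]
    if capacities:                                  # r ** (1 / c) weighting
        draws = [r ** (1.0 / c) for r, c in zip(draws, capacities)]
    return max(range(node_count), key=lambda n: draws[n])

def waste(distribution_bits: int, node_count: int) -> float:
    """Waste = unused capacity / total capacity, where the node with the most
    buckets is assumed to be at 100% capacity and all buckets are equally sized."""
    buckets_per_node = [0] * node_count
    for bucket in range(2 ** distribution_bits):
        buckets_per_node[ideal_node(bucket, node_count)] += 1
    fullest = max(buckets_per_node)
    total_buckets = 2 ** distribution_bits
    return (node_count * fullest - total_buckets) / (node_count * fullest)

# Example: waste for 10 nodes and 16 distribution bits (65536 buckets)
print(round(waste(16, 10), 4))
```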
#### Distribution with redundancy 1:

| Bits \ Nodes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0000 | 0.0000 | 0.3333 | 0.5000 | 0.6000 | 0.6667 | 0.7143 | 0.7500 | 0.7778 | 0.8000 | 0.8182 | 0.8333 | 0.8462 | 0.8571 | 0.8667 |
| 2 | 0.0000 | 0.3333 | 0.3333 | 0.5000 | 0.2000 | 0.3333 | 0.4286 | 0.5000 | 0.5556 | 0.6000 | 0.6364 | 0.6667 | 0.6923 | 0.7143 | 0.7333 |
| 3 | 0.0000 | 0.2000 | 0.1111 | 0.3333 | 0.2000 | 0.3333 | 0.6190 | 0.6667 | 0.8222 | 0.8400 | 0.8545 | 0.8333 | 0.6923 | 0.7143 | 0.7333 |
| 4 | 0.0000 | 0.1111 | 0.1111 | 0.3333 | 0.3600 | 0.3333 | 0.4286 | 0.5000 | 0.7778 | 0.8000 | 0.8182 | 0.8095 | 0.6923 | 0.7143 | 0.6444 |
| 5 | - | 0.0588 | 0.1111 | 0.2727 | 0.2889 | 0.4074 | 0.2381 | 0.3333 | 0.8129 | 0.8316 | 0.8469 | 0.8519 | 0.8359 | 0.8367 | 0.8359 |
| 6 | - | 0.0000 | 0.0725 | 0.1579 | 0.1467 | 0.1111 | 0.1688 | 0.3846 | 0.7037 | 0.7217 | 0.7470 | 0.7460 | 0.7265 | 0.6952 | 0.6718 |
| 7 | - | 0.0725 | 0.0519 | 0.0857 | 0.0857 | 0.1111 | 0.2050 | 0.2000 | 0.4530 | 0.4667 | 0.5152 | 0.5152 | 0.4530 | 0.3905 | 0.3436 |
| 8 | - | 0.0000 | 0.0078 | 0.0725 | 0.0857 | 0.0922 | 0.1293 | 0.1351 | 0.1634 | 0.1742 | 0.1688 | 0.2381 | 0.2426 | 0.2967 | 0.3173 |
| 9 | - | 0.0039 | 0.0192 | 0.1467 | 0.1607 | 0.1203 | 0.1080 | 0.1111 | 0.1380 | 0.1322 | 0.1218 | 0.1795 | 0.1962 | 0.2381 | 0.2580 |
| 10 | - | 0.0019 | 0.0275 | 0.0922 | 0.0898 | 0.0623 | 0.0741 | 0.0922 | 0.1111 | 0.1018 | 0.1218 | 0.1203 | 0.1438 | 0.1688 | 0.1675 |
| 11 | - | 0.0019 | 0.0234 | 0.0430 | 0.0385 | 0.0248 | 0.0248 | 0.0483 | 0.0636 | 0.0648 | 0.0737 | 0.0725 | 0.0894 | 0.0800 | 0.0958 |
| 12 | - | - | 0.0121 | 0.0285 | 0.0282 | 0.0121 | 0.0149 | 0.0571 | 0.0577 | 0.0562 | 0.0549 | 0.0412 | 0.0510 | 0.0439 | 0.0616 |
| 13 | - | - | 0.0074 | 0.0019 | 0.0070 | 0.0177 | 0.0304 | 0.0303 | 0.0337 | 0.0189 | 0.0252 | 0.0358 | 0.0409 | 0.0501 | 0.0385 |
| 14 | - | - | 0.0041 | 0.0024 | 0.0037 | 0.0027 | 0.0145 | 0.0073 | 0.0101 | 0.0130 | 0.0220 | 0.0234 | 0.0290 | 0.0248 | 0.0195 |
| 15 | - | - | 0.0019 | 0.0021 | 0.0036 | 0.0083 | 0.0059 | 0.0056 | 0.0101 | 0.0097 | 0.0123 | 0.0163 | 0.0150 | 0.0186 | 0.0173 |
| 16 | - | - | 0.0010 | 0.0007 | 0.0010 | 0.0030 | 0.0049 | 0.0039 | 0.0085 | 0.0072 | 0.0097 | 0.0108 | 0.0135 | 0.0141 | 0.0115 |
| 17 | - | - | - | - | - | 0.0030 | 0.0033 | 0.0024 | 0.0036 | 0.0030 | 0.0055 | 0.0091 | 0.0135 | 0.0156 | 0.0143 |
| 18 | - | - | - | - | - | - | 0.0019 | - | 0.0029 | 0.0027 | 0.0043 | 0.0040 | 0.0066 | 0.0061 | 0.0060 |
| 19 | - | - | - | - | - | - | - | - | 0.0019 | - | 0.0021 | 0.0030 | 0.0023 | 0.0031 | 0.0042 |
| 20 | - | - | - | - | - | - | - | - | - | - | - | 0.0029 | 0.0025 | 0.0037 | 0.0044 |
| 21 | - | - | - | - | - | - | - | - | - | - | - | - | 0.0026 | 0.0035 | 0.0040 |

#### Distribution with redundancy 2:

| Bits \ Nodes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0000 | 0.0000 | 0.3333 | 0.5000 | 0.6000 | 0.6667 | 0.4286 | 0.5000 | 0.5556 | 0.6000 | 0.6364 | 0.6667 | 0.6923 | 0.7143 | 0.7333 |
| 2 | 0.0000 | 0.0000 | 0.3333 | 0.3333 | 0.2000 | 0.3333 | 0.4286 | 0.5000 | 0.5556 | 0.6000 | 0.6364 | 0.6667 | 0.6923 | 0.4286 | 0.4667 |
| 3 | 0.0000 | 0.0000 | 0.1111 | 0.2000 | 0.2000 | 0.3333 | 0.4286 | 0.5000 | 0.7037 | 0.7333 | 0.7576 | 0.7778 | 0.7949 | 0.7714 | 0.7333 |
| 4 | 0.0000 | 0.0000 | 0.1111 | 0.2000 | 0.2000 | 0.3333 | 0.3469 | 0.2000 | 0.7460 | 0.7714 | 0.7762 | 0.7778 | 0.7949 | 0.7714 | 0.7630 |
| 5 | - | - | 0.0725 | 0.1579 | 0.2471 | 0.2381 | 0.2967 | 0.2727 | 0.7265 | 0.7538 | 0.7673 | 0.7778 | 0.7949 | 0.7922 | 0.7968 |
| 6 | - | - | 0.0519 | 0.1111 | 0.1742 | 0.1467 | 0.2050 | 0.2381 | 0.6908 | 0.7023 | 0.7016 | 0.7117 | 0.7265 | 0.7229 | 0.7247 |
| 7 | - | - | 0.0303 | 0.0154 | 0.0340 | 0.0303 | 0.0857 | 0.1111 | 0.4921 | 0.4880 | 0.4828 | 0.4797 | 0.5077 | 0.4622 | 0.4667 |
| 8 | - | - | 0.0078 | 0.0303 | 0.0248 | 0.0623 | 0.0857 | 0.0725 | 0.0970 | 0.1322 | 0.1049 | 0.1293 | 0.1620 | 0.1873 | 0.2242 |
| 9 | - | - | 0.0019 | 0.0266 | 0.0519 | 0.0466 | 0.0682 | 0.0791 | 0.0824 | 0.0519 | 0.0691 | 0.0519 | 0.0623 | 0.0741 | 0.0898 |
| 10 | - | - | 0.0063 | 0.0173 | 0.0154 | 0.0275 | 0.0116 | 0.0340 | 0.0558 | 0.0294 | 0.0452 | 0.0466 | 0.0567 | 0.0501 | 0.0584 |
| 11 | - | - | 0.0078 | 0.0049 | 0.0154 | 0.0177 | 0.0149 | 0.0210 | 0.0275 | 0.0177 | 0.0252 | 0.0303 | 0.0305 | 0.0344 | 0.0317 |
| 12 | - | - | - | 0.0073 | 0.0112 | 0.0192 | 0.0231 | 0.0312 | 0.0296 | 0.0177 | 0.0278 | 0.0358 | 0.0245 | 0.0312 | 0.0385 |
| 13 | - | - | - | 0.0061 | 0.0049 | 0.0096 | 0.0112 | 0.0201 | 0.0218 | 0.0088 | 0.0077 | 0.0199 | 0.0138 | 0.0304 | 0.0317 |
| 14 | - | - | - | 0.0059 | 0.0058 | 0.0058 | 0.0057 | 0.0092 | 0.0128 | 0.0082 | 0.0139 | 0.0081 | 0.0096 | 0.0199 | 0.0213 |
| 15 | - | - | - | - | 0.0014 | 0.0039 | 0.0052 | 0.0034 | 0.0051 | 0.0085 | 0.0044 | 0.0072 | 0.0107 | 0.0101 | 0.0082 |
| 16 | - | - | - | - | 0.0016 | 0.0030 | 0.0026 | 0.0036 | 0.0065 | 0.0051 | 0.0061 | 0.0084 | 0.0065 | 0.0083 | 0.0100 |
| 17 | - | - | - | - | - | - | 0.0010 | 0.0020 | 0.0028 | - | 0.0040 | 0.0049 | 0.0067 | 0.0071 | 0.0062 |
| 18 | - | - | - | - | - | - | - | - | 0.0032 | - | 0.0024 | - | 0.0034 | 0.0056 | 0.0041 |
| 19 | - | - | - | - | - | - | - | - | - | - | - | - | 0.0025 | 0.0018 | - |

#### Distribution with redundancy 2, larger node counts:

| Bits \ Nodes | 16 | 20 | 32 | 48 | 64 | 100 | 128 | 160 | 200 | 256 | 350 | 500 | 800 | 1000 | 5000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | 0.2000 | 0.3081 | 0.2727 | 0.5152 | 0.5294 | 0.5733 | 0.6364 | 0.7091 | 0.7673 | 0.8000 | 0.8537 | 0.8862 | 0.8933 | 0.8976 | 0.9659 |
| 9 | 0.0725 | 0.2242 | 0.1795 | 0.1795 | 0.3043 | 0.3173 | 0.3846 | 0.5077 | 0.5345 | 0.6364 | 0.7340 | 0.7952 | 0.8400 | 0.8720 | 0.9317 |
| 10 | 0.0725 | 0.1322 | 0.1233 | 0.2099 | 0.1579 | 0.2415 | 0.3333 | 0.5733 | 0.4611 | 0.5789 | 0.6558 | 0.7269 | 0.8293 | 0.8425 | 0.8976 |
| 11 | 0.0340 | 0.0857 | 0.0922 | 0.1111 | 0.1233 | 0.1969 | 0.2558 | 0.5937 | 0.5643 | 0.5897 | 0.5965 | 0.6099 | 0.6587 | 0.7591 | 0.8830 |
| 12 | 0.0448 | 0.0385 | 0.0623 | 0.1065 | 0.0986 | 0.1285 | 0.3725 | 0.3831 | 0.4064 | 0.4074 | 0.4799 | 0.4880 | 0.5124 | 0.8328 | 0.8976 |
| 13 | 0.0340 | 0.0328 | 0.0554 | 0.0699 | 0.0623 | 0.0948 | 0.1049 | 0.2183 | 0.2344 | 0.3191 | 0.3498 | 0.4539 | 0.5733 | 0.6656 | 0.8870 |
| 14 | 0.0140 | 0.0189 | 0.0376 | 0.0452 | 0.0466 | 0.0717 | 0.0986 | 0.1057 | 0.1047 | 0.2242 | 0.2853 | 0.2798 | 0.4064 | 0.4959 | 0.8830 |
| 15 | 0.0094 | 0.0118 | 0.0385 | 0.0268 | 0.0331 | 0.0638 | 0.0708 | 0.0775 | 0.0898 | 0.1322 | 0.2133 | 0.2104 | 0.3550 | 0.4446 | 0.8752 |
| 16 | 0.0097 | 0.0081 | 0.0380 | 0.0303 | 0.0362 | 0.0577 | 0.0501 | 0.0627 | 0.0717 | 0.1033 | 0.1733 | 0.1678 | 0.2586 | 0.3101 | 0.8511 |
| 17 | 0.0075 | 0.0066 | 0.0346 | 0.0293 | 0.0154 | 0.0258 | 0.0466 | 0.0546 | 0.0704 | 0.1041 | 0.1469 | 0.1983 | 0.2702 | 0.2972 | 0.7740 |
| 18 | 0.0053 | 0.0057 | 0.0098 | 0.0098 | 0.0122 | 0.0149 | 0.0238 | 0.0300 | 0.0394 | 0.0353 | 0.0434 | 0.0553 | 0.0611 | 0.1782 | 0.6334 |
| 19 | - | 0.0022 | 0.0050 | 0.0162 | 0.0098 | 0.0133 | 0.0149 | 0.0220 | 0.0242 | 0.0252 | 0.0333 | 0.0398 | 0.0495 | 0.0999 | 0.5145 |
| 20 | - | - | 0.0030 | 0.0107 | 0.0088 | 0.0098 | 0.0144 | 0.0140 | 0.0148 | 0.0203 | 0.0195 | 0.0255 | 0.0348 | 0.1133 | 0.4481 |
| 21 | - | - | 0.0043 | 0.0063 | 0.0051 | 0.0074 | 0.0079 | 0.0085 | 0.0086 | 0.0113 | 0.0147 | 0.0170 | 0.0237 | 0.1068 | 0.4422 |
| 22 | - | - | - | 0.0026 | 0.0035 | 0.0037 | 0.0082 | 0.0061 | 0.0077 | 0.0087 | 0.0101 | 0.0134 | 0.0193 | 0.1140 | 0.4635 |
| 23 | - | - | - | 0.0019 | - | 0.0026 | 0.0080 | 0.0055 | 0.0056 | 0.0057 | 0.0063 | 0.0096 | 0.0155 | 0.1294 | 0.4982 |
| 24 | - | - | - | 0.0013 | - | - | 0.0074 | 0.0060 | 0.0058 | 0.0053 | 0.0049 | 0.0068 | 0.0112 | 0.0471 | 0.3219 |
| 25 | - | - | - | - | - | - | - | - | - | 0.0043 | 0.0043 | 0.0058 | 0.0067 | 0.0512 | 0.2543 |
| 26 | - | - | - | - | - | - | - | - | - | - | 0.0040 | 0.0042 | 0.0043 | 0.0051 | 0.0210 |
| 27 | - | - | - | - | - | - | - | - | - | - | - | - | 0.0028 | 0.0157 | 0.0814 |

### Default number of distribution bits used

Note that changing the number of distribution bits used will change which buckets exist, which will change the distribution considerably. We thus do not want to alter the distribution bit count too often. Ideally, users would be allowed to configure a minimum and maximum acceptable waste, and the current number of distribution bits could then just be calculated on the fly. But as computing the waste values above is computationally heavy, especially with many nodes and many distribution bits, currently only a couple of profiles are available to configure.

**Vespa Cloud note:** Vespa Cloud locks the distribution bit count to 16. This is because Vespa Cloud offers auto-scaling of nodes, and such a scaling decision should not implicitly lead to a full redistribution of data by crossing a distribution bit node count boundary. 16 bits strikes a good balance of low skew and high performance for most production deployments.

#### Loose mode (default)

The loose mode allows for more waste, allowing the number of nodes to change considerably without altering the distribution bit count.

| Node count | 1-4 | 5-199 | 200-\> |
| --- | --- | --- | --- |
| Distribution bit count | 8 | 16 | 24 |
| Max calculated waste \*) | 3.03 % | 7.17 % | ? |
| Minimum buckets/node \*\*) | 256 - 64 | 13108 - 329 | 83886 - |

#### Strict mode (not default)

The strict mode attempts to keep the waste below 1.0 %. When it needs to increase the bit count, it increases it significantly to allow considerably more growth before having to adjust the count again.

| Node count | 1-4 | 5-14 | 15-199 | 200-799 | 800-1499 | 1500-4999 | 5000-\> |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Distribution bit count | 8 | 16 | 21 | 25 | 28 | 30 | 32 |
| Max calculated waste \*) | 3 % | 0.83 % | 0.86 % | 0.67 % | ? | ? | ? |
| Minimum buckets/node \*\*) | 256 - 64 | 13107 - 4681 | 139810 - 10538 | 167772 - 41995 | 335544 - 179076 | 715827 - 214791 | 858993 - |

\*) Max calculated waste, given redundancy 2 and the max node count in the given range, as shown in the tables above. (Note that this assumes equal-sized buckets, and that every possible bucket exists. In a real system there will be random variation.)

\*\*) Given a node count and distribution bits, there is a minimum number of buckets enforced to exist. However, splitting due to bucket size may increase the count beyond this number. This value shows the maximum value of the minimum, that is, the number of buckets per node enforced for the lowest node count in the range.

Ideally one wants to have few buckets enforced by distribution and rather let bucket size split buckets, as that leaves more freedom to users.
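The two profiles boil down to a lookup from node count to distribution bit count. A minimal Python sketch of that lookup (an illustration of the tables above only, not the actual configuration mechanism; the names are hypothetical):

```python
# Node-count ranges and the distribution bit count they map to, per the
# loose and strict profile tables above.
LOOSE = [(1, 8), (5, 16), (200, 24)]
STRICT = [(1, 8), (5, 16), (15, 21), (200, 25), (800, 28), (1500, 30), (5000, 32)]

def distribution_bits(node_count, profile=LOOSE):
    """Return the distribution bit count for the range containing node_count."""
    bits = profile[0][1]
    for range_start, range_bits in profile:
        if node_count >= range_start:
            bits = range_bits
    return bits

print(distribution_bits(16))          # 16 with the loose profile
print(distribution_bits(16, STRICT))  # 21 with the strict profile
```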
## Q/A

**Q: I have a cluster with multiple groups, with the same number of nodes (more than one) in each group. Why does the first node in the first group store a slightly different number of documents than the first node in the second group (and so on)?**

A: This is both expected and intentional. To see why, we must look at how the ideal state algorithm works. As previously outlined, the ideal state algorithm requires 3 distinct inputs:

1. The ID of the bucket to be replicated across content nodes.
2. The set of all nodes (i.e. unique distribution keys) in the cluster _across_ all groups, and their current availability state (Down, Up, Maintenance etc.).
3. The cluster topology and replication configuration. The topology includes knowledge of all groups.

From this the algorithm returns a deterministic, ordered sequence of nodes (i.e. distribution keys) across all configured groups. The ordering of nodes is given by their individual pseudo-random node _score_, where higher-scoring nodes are considered more _ideal_ for storing replicas of a given bucket. The set of nodes in this sequence respects the constraints given by the configured group topology and replication level.

When computing node scores within a group, the _absolute_ distribution keys are used rather than a node's _relative_ ordering within the group. This means the individual node scores, and consequently the distribution of bucket replicas, within one group are different (with a very high probability) from those of all other groups. What the ideal state algorithm ensures is that there exists a deterministic, configurable number of replicas per bucket within each group and that they are evenly distributed across each group's nodes; the exact mapping can be considered an unspecified "implementation detail".

The rationale for using absolute distribution keys rather than relative ordering is closely related to the earlier discussion about why [modulo distribution](#a-simple-example-modulo) is a poor choice. Let \(N_g > 1\) be the number of nodes in a given group:

- A relative ordering means that removing (or just reordering) a single node in the configuration can potentially lead to a full redistribution of all data within that group, not just \( \frac{1}{N_g} \) of the data. Imagine for instance moving a node from being first in the group to being the last.
- If we require nodes with the same relative index in each group to store the same data set (i.e. a row-column strategy), this immediately suffers in failure scenarios even when just a single node becomes unavailable. Data coverage in the group remains reduced until the node is replaced, as no other nodes can take over responsibility for the data. This is because removing the node leads to the problem in the previous point, where a disproportionately large amount of data must be moved due to the relative ordering changing. With the ideal state algorithm, the remaining nodes in the group will transparently assume ownership of the data, with each node receiving an expected \( \frac{1}{N_g - 1} \) of the unavailable node's buckets.
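To make the answer concrete, here is a minimal Python sketch of the kind of per-node pseudo-random scoring described above. It is an illustration only, not Vespa's actual implementation: the hash choice, the function names, and the way group constraints are applied are assumptions.

```python
import hashlib

def node_score(bucket_id, distribution_key):
    """Deterministic pseudo-random score from the bucket id and the node's
    absolute distribution key (hypothetical hash choice)."""
    data = f"{bucket_id}:{distribution_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def ideal_nodes(bucket_id, groups, replicas_per_group, down=()):
    """groups: {group name: [distribution keys]}. Returns, per group, the
    highest-scoring available nodes for this bucket."""
    result = {}
    for group, keys in groups.items():
        available = [k for k in keys if k not in down]
        ranked = sorted(available, key=lambda k: node_score(bucket_id, k), reverse=True)
        result[group] = ranked[:replicas_per_group]
    return result

groups = {"group0": [0, 1, 2], "group1": [3, 4, 5]}
print(ideal_nodes(0x42, groups, replicas_per_group=1))
# With node 1 down, another node in its group is selected for this bucket:
print(ideal_nodes(0x42, groups, replicas_per_group=1, down={1}))
```

Because each node's score depends on its absolute distribution key, the per-group orderings differ between groups, which is exactly why nodes with the same relative index in different groups store slightly different document counts.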
---

# Source: https://docs.vespa.ai/index.html.md

# Source: https://docs.vespa.ai/en/performance/index.html.md

# Source: https://docs.vespa.ai/en/learn/tutorials/index.html.md

# Tutorials and use cases

### Text search

- [Tutorial: Text Search](text-search). A text search tutorial and introduction to text ranking with Vespa using traditional information retrieval techniques like BM25.
- [Tutorial: Improving Text Search with Machine Learning](text-search-ml). This tutorial builds on the text search tutorial but introduces Learning to Rank to improve relevance.

### Vector Search

Learn how to use Vespa Vector Search in the [practical nearest neighbor search guide](../../querying/nearest-neighbor-search-guide). It uses Vespa's support for [nearest neighbor search](../../querying/nearest-neighbor-search); there is also support for fast [approximate nearest neighbor search](../../querying/approximate-nn-hnsw) in Vespa. The guide covers combining vector search with filters and how to perform hybrid search, combining retrieval over inverted index structures with vector search.

### Hybrid Search

[Tutorial: Hybrid Text Search](hybrid-search). A search tutorial and introduction to hybrid text ranking with Vespa, combining BM25 with text embedding models.

### RAG (Retrieval-Augmented Generation)

- [Tutorial: The RAG Blueprint](rag-blueprint). A tutorial that provides a blueprint for building high-quality RAG applications with Vespa. Includes evaluation and learning-to-rank (LTR).
- [Retrieval-augmented generation (RAG) in Vespa](../../rag/rag).

### Combining search and recommendation: The News tutorial

Follow this series to learn how to build a complete application supporting both content recommendation/personalization, navigation, and search.

- [News 1: Getting Started](news-1-deploy-an-application)
- [News 2: Application Packages, Feeding, Query](news-2-basic-feeding-and-query)
- [News 3: Sorting, Grouping and Ranking](news-3-searching)
- [News 4: Embeddings](news-4-embeddings)
- [News 5: Partial Updates, ANNs, Filtering](news-5-recommendation)
- [News 6: Custom Searchers, Document Processors](news-6-recommendation-with-searchers)
- [News 7: Parent-Child, Tensor Ranking](news-7-recommendation-with-parent-child)

### ML Model Serving

Learn how to use Vespa for ML model serving in [Stateless Model Evaluation](../../ranking/stateless-model-evaluation.html). Vespa supports running inference with models from many popular ML frameworks, which can be used for ranking, query classification, question answering, multi-modal retrieval, and more.

- [Ranking with ONNX models](../../ranking/onnx). Export models from popular deep learning frameworks such as [PyTorch](https://pytorch.org/docs/stable/onnx.html) to [ONNX](https://onnx.ai/) format for serving in Vespa. Vespa integrates with [ONNX-Runtime](https://blog.vespa.ai/stateful-model-serving-how-we-accelerate-inference-using-onnx-runtime/) for [accelerated inference](https://blog.vespa.ai/stateless-model-evaluation/).
  Many ML frameworks support exporting models to ONNX, including [sklearn](http://onnx.ai/sklearn-onnx/).
- [Ranking with LightGBM models](../../ranking/lightgbm)
- [Ranking with XGBoost models](../../ranking/xgboost)
- [Ranking with TensorFlow models](../../ranking/tensorflow)

### Embedding Model Inference

Vespa supports integrating [embedding](../../rag/embedding.html) models, which avoids transferring large amounts of embedding vector data over the network and allows for efficient serving of embedding models.

- [Huggingface Embedder](../../rag/embedding.html#huggingface-embedder): Use single-vector embedding models from Hugging Face.
- [ColBERT Embedder](../../rag/embedding.html#colbert-embedder): Use multi-vector embedding models.
- [Splade Embedder](../../rag/embedding.html#splade-embedder): Use sparse learned single-vector embedding models.

### E-Commerce

The [e-commerce shopping sample application](e-commerce) demonstrates Vespa grouping, true in-place partial updates, custom ranking, and more.

### Building a custom HTTP API

The [HTTP API tutorial](http-api.html) shows how to build a custom HTTP API in an application.

### More examples and sample applications

There are many examples and starting applications on [GitHub](https://github.com/vespa-engine/sample-apps/) and in the [Pyvespa examples](https://vespa-engine.github.io/pyvespa/index.html).

---

# Source: https://docs.vespa.ai/en/reference/writing/indexing-language.html.md

# Indexing language reference

This reference documents the full Vespa _indexing language_. If more complex processing of input data is required, implement a [document processor](../../applications/document-processors.html).

The indexing language is analogous to UNIX pipes, in that statements consist of expressions separated by the _pipe_ symbol, where the output of each expression is the input of the next. Statements are terminated by a semicolon and are independent of each other (except when using variables).

Find examples in the [indexing](/en/writing/indexing.html) guide.

## Indexing script

An indexing script is a sequence of [indexing statements](#indexing-statement) separated by a semicolon (`;`). A script is executed statement-by-statement, in order, one document at a time.

Vespa derives one indexing script per search cluster based on the search definitions assigned to that cluster. As a document is fed to a search cluster, it passes through the corresponding [indexing cluster](../applications/services/content.html#document-processing), which runs the document through its indexing script. Note that this also happens whenever the document is [reindexed](../../operations/reindexing.html), so expressions such as [now](#now) must be thought of as the time the document was (last) _indexed_, not when it was _fed_.

You can examine the indexing script generated for a specific search cluster by retrieving the configuration of the indexing document processor:

```
$ vespa-get-config -i search/cluster. -n vespa.configdefinition.ilscripts
```

The current _execution value_ is set to `null` prior to executing a statement.
## Indexing statement

An indexing statement is a sequence of [indexing expressions](#indexing-expression) separated by a pipe (`|`). A statement is executed expression-by-expression, in order. Within a statement, the execution value is passed from one expression to the next.

The simplest of statements passes the value of an input field into an attribute:

```
input year | attribute year;
```

The above statement consists of two expressions: `input year` and `attribute year`. The former sets the execution value to the value of the "year" field of the input document. The latter writes the current execution value into the attribute "year".

## Indexing expression

### Primitives

A string, a numeric literal, and true/false can be used as an expression to explicitly set the execution value. Examples: `"foo"`, `69`, `true`.

### Outputs

An output expression is an expression that writes the current execution value to a document field. These expressions also double as the indicator for the type of field to construct (i.e. attribute, index or summary). It is important to note that you cannot assign different values to the same field in a single document (e.g. `attribute | lowercase | index` is **illegal** and will not deploy).

| Expression | Description |
| --- | --- |
| `attribute` | Writes the execution value to the current field. During deployment, this indicates that the field should be stored as an attribute. |
| `index` | Writes the execution value to the current field. During deployment, this indicates that the field should be stored as an index field. |
| `summary` | Writes the execution value to the current field. During deployment, this indicates that the field should be included in the document summary. |

### Arithmetics

Indexing statements can contain any combination of arithmetic operations, as long as the operands are numeric values. In case you need to convert from string to numeric, or from one numeric type to another, use the applicable [converter](#converters) expression. The supported arithmetic operators are:

| Operator | Description |
| --- | --- |
| ` + ` | Sets the execution value to the result of adding the execution value of the `lhs` expression to that of the `rhs` expression. |
| ` - ` | Sets the execution value to the result of subtracting the execution value of the `rhs` expression from that of the `lhs` expression. |
| ` * ` | Sets the execution value to the result of multiplying the execution value of the `lhs` expression by that of the `rhs` expression. |
| ` / ` | Sets the execution value to the result of dividing the execution value of the `lhs` expression by that of the `rhs` expression. |
| ` % ` | Sets the execution value to the remainder of dividing the execution value of the `lhs` expression by that of the `rhs` expression. |
| ` . ` | Sets the execution value to the concatenation of the execution value of the `lhs` expression with that of the `rhs` expression. If _both_ `lhs` and `rhs` are collection types, this operator will append `rhs` to `lhs` (if either operand is null, it is treated as an empty collection). If not, this operator concatenates the string representations of `lhs` and `rhs` (if either operand is null, the result is null). |

You may use parentheses to declare precedence of execution (e.g. `(1 + 2) * 3`). This also works for more advanced array concatenation statements such as `(input str_a | split ',') . (input str_b | split ',') | index arr`.

### Converters

These expressions let you convert from one data type to another.
| Converter | Input | Output | Description |
| --- | --- | --- | --- |
| `binarize [threshold]` | Any tensor | Any tensor | Replaces all values in a tensor by 0 or 1. Takes an optional argument specifying the threshold a value needs to be larger than to be replaced by 1 instead of 0. The default threshold is 0. This is useful to create a suitable input to [pack\_bits](#pack_bits). |
| `embed [id] [args]` | String | A tensor | Invokes an [embedder](../../rag/embedding.html) to convert a text to one or more vector embeddings. The type of the output tensor is what is required by the following expression (as supported by the specific embedder). Arguments are given space separated, as in `embed colbert chunk`. The first argument is the embedder id and can be omitted when only a single embedder is configured. Any additional arguments are passed to the embedder implementation. If the same embed expression with the same input occurs multiple times in a schema, its value will only be computed once. |
| `chunk id [args]` | String | Array of strings | Invokes a chunker, which converts a string into an array of strings. Arguments are given space separated, as in `chunk fixed-length 512`. The id of the chunker to use is required and can be a chunker bundled with Vespa, or any chunker component added in services.xml, see the [chunking reference](../rag/chunking.html). Any additional arguments are passed to the chunker implementation. If the same chunk expression with the same input occurs multiple times in a schema, its value will only be computed once. |
| `hash` | String | int or long | Converts the input to a hash value (using SipHash). The hash will be int or long depending on the target field. |
| `pack_bits` | A tensor | A tensor | Packs the values of a binary tensor into bytes, with 1 bit per value in big-endian order. The input tensor must have a single dense dimension. It can have any value type and any number of sparse dimensions. Values that are not 0 or 1 will be binarized with 0 as the threshold. The output tensor will have `int8` as the value type, the dense dimension size divided by 8 (rounded upwards to integer), and the same sparse dimensions as before. The resulting tensor can be unpacked during ranking using [unpack\_bits](../ranking/ranking-expressions.html#unpack-bits). A tensor can be converted to a binary form suitable as input to this by the [binarize function](#binarize). |
| `to_array` | Any | Array | Converts the execution value to a single-element array. |
| `to_byte` | Any | Byte | Converts the execution value to a byte. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_double` | Any | Double | Converts the execution value to a double. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_float` | Any | Float | Converts the execution value to a float. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_int` | Any | Integer | Converts the execution value to an int. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_long` | Any | Long | Converts the execution value to a long. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_bool` | Any | Bool | Converts the execution value to a boolean type. If the input is a string, it will become true if it is not empty. If the input is a number, it will become true if it is != 0. |
| `to_pos` | String | Position | Converts the execution value to a position struct. The input format must be either a) `[N\|S];[E\|W]`, or b) `x;y`. |
| `to_string` | Any | String | Converts the execution value to a string. |
| `to_uri` | String | Uri | Converts the execution value to a URI struct. |
| `to_wset` | Any | WeightedSet | Converts the execution value to a single-element weighted set with default weight. |
| `to_epoch_second` | String | Long | Converts an ISO-8601 instant formatted String to Unix epoch (or Unix time or POSIX time or Unix timestamp), which is the number of seconds elapsed since January 1, 1970, UTC. The converter uses [java.time.Instant.parse](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/time/Instant.html#parse(java.lang.CharSequence)) to parse the input string value. This will throw a DateTimeParseException if the input cannot be parsed. Examples: `2023-12-24T17:00:43.000Z` is converted to `1703437243L`; `2023-12-24T17:00:43Z` is converted to `1703437243L`; `2023-12-24T17:00:43.431Z` is converted to `1703437243L`; `2023-12-24T17:00:43.431+00:00` is converted to `1703437243L`. |

### Other expressions

The following are the unclassified expressions available:

| Expression | Description |
| --- | --- |
| `_` | Returns the current execution value. This is useful, e.g., to prepend some other value to the current execution value, see [this example](/en/writing/indexing.html#execution-value-example). |
| `attribute ` | Writes the execution value to the named attribute field. |
| `base64decode` | If the execution value is a string, it is base-64 decoded to a long integer. If it is not a string, the execution value is set to `Long.MIN_VALUE`. |
| `base64encode` | If the execution value is a long integer, it is base-64 encoded to a string. If it is not a long integer, the execution value is set to `null`. |
| `echo` | Prints the execution value to standard output, for debug purposes. |
| `flatten` | **Deprecated:** Use [tokens](/en/reference/schemas/schemas.html#tokens) in the schema instead. |
| `for_each {